2007-08-15 13:42:57

by Florin Iucha

Subject: USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

Today my USB keyboard stopped working in the middle of composing an
e-mail. I unplugged it and plugged it back, with no success. I
logged in remotely and found this lovely message:

[ 1301.567351] usb 1-4: USB disconnect, address 3
[ 1301.567356] usb 1-4.2: USB disconnect, address 5
[ 1301.567599] sysfs_remove_bin_file: bad dentry or inode or no such file: "descriptors"
[ 1301.567604]
[ 1301.567605] Call Trace:
[ 1301.567614] [<ffffffff802b89a6>] sysfs_remove_bin_file+0x39/0x3d
[ 1301.567619] [<ffffffff803f1d24>] device_remove_bin_file+0x15/0x17
[ 1301.567623] [<ffffffff8045ea0d>] usb_remove_sysfs_dev_files+0x89/0x9d
[ 1301.567627] [<ffffffff804625cc>] generic_disconnect+0x2e/0x32
[ 1301.567630] [<ffffffff8045b58d>] usb_unbind_device+0x15/0x19
[ 1301.567634] [<ffffffff803f4161>] __device_release_driver+0x93/0xb3
[ 1301.567637] [<ffffffff803f45af>] device_release_driver+0x31/0x49
[ 1301.567640] [<ffffffff803f39f1>] bus_remove_device+0x76/0x87
[ 1301.567644] [<ffffffff803f2014>] device_del+0x216/0x297
[ 1301.567648] [<ffffffff80455c8f>] usb_disconnect+0xc8/0x151
[ 1301.567651] [<ffffffff80455c56>] usb_disconnect+0x8f/0x151
[ 1301.567655] [<ffffffff804564c8>] hub_thread+0x442/0xc47
[ 1301.567659] [<ffffffff80553cdb>] _spin_unlock_irq+0x9/0xc
[ 1301.567664] [<ffffffff80244ad7>] autoremove_wake_function+0x0/0x38
[ 1301.567668] [<ffffffff80456086>] hub_thread+0x0/0xc47
[ 1301.567671] [<ffffffff802449cb>] kthread+0x49/0x76
[ 1301.567674] [<ffffffff8020c618>] child_rip+0xa/0x12
[ 1301.567679] [<ffffffff80244982>] kthread+0x0/0x76
[ 1301.567682] [<ffffffff8020c60e>] child_rip+0x0/0x12
[ 1301.567684]

I have rebooted, and while composing this message, I thought it useful to
include the output from 'lsusb'. Funnily enough, lsusb does not list any
devices. A 'cat /proc/bus/usb/devices' yields the following:

T: Bus=02 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh=10
B: Alloc= 25/900 us ( 3%), #Int= 5, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.23-rc3-1 ohci_hcd
S: Product=OHCI Host Controller
S: SerialNumber=0000:00:02.0
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=02 Lev=01 Prnt=01 Port=02 Cnt=01 Dev#= 2 Spd=12 MxCh= 4
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=05f3 ProdID=0081 Rev= 3.10
S: Manufacturer=PI Engineering
S: Product=Kinesis Keyboard Hub
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr= 50mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 1 Ivl=255ms

T: Bus=02 Lev=02 Prnt=02 Port=01 Cnt=01 Dev#= 3 Spd=12 MxCh= 0
D: Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=05f3 ProdID=0007 Rev= 3.10
C:* #Ifs= 2 Cfg#= 1 Atr=a0 MxPwr= 64mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=01 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 8 Ivl=8ms
I:* If#= 1 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid
E: Ad=82(I) Atr=03(Int.) MxPS= 4 Ivl=8ms

T: Bus=02 Lev=02 Prnt=02 Port=02 Cnt=02 Dev#= 4 Spd=1.5 MxCh= 0
D: Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=05bc ProdID=0102 Rev= 2.00
S: Manufacturer=Forward
S: Product=USB Optical Mouse
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 4 Ivl=10ms

T: Bus=02 Lev=02 Prnt=02 Port=03 Cnt=03 Dev#= 5 Spd=1.5 MxCh= 0
D: Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=045e ProdID=0039 Rev= 1.21
S: Manufacturer=Microsoft
S: Product=Microsoft IntelliMouse® Optical
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 4 Ivl=10ms

T: Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=480 MxCh=10
B: Alloc= 0/800 us ( 0%), #Int= 1, #Iso= 0
D: Ver= 2.00 Cls=09(hub ) Sub=00 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.23-rc3-1 ehci_hcd
S: Product=EHCI Host Controller
S: SerialNumber=0000:00:02.1
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 4 Ivl=256ms

T: Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#= 3 Spd=480 MxCh= 4
D: Ver= 2.00 Cls=09(hub ) Sub=00 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0409 ProdID=0058 Rev= 1.00
S: Manufacturer=NEC Corporation
S: Product=USB2.0 Hub Controller
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 1 Ivl=256ms

T: Bus=01 Lev=02 Prnt=03 Port=01 Cnt=01 Dev#= 5 Spd=1.5 MxCh= 0
D: Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=0fe9 ProdID=9010 Rev= 1.00
S: Manufacturer=DVICO
S: Product=DVICO USB HID Remocon V1.00
C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 3 Ivl=32ms
E: Ad=02(O) Atr=03(Int.) MxPS= 1 Ivl=32ms

T: Bus=01 Lev=01 Prnt=01 Port=08 Cnt=02 Dev#= 4 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=07cc ProdID=0501 Rev=91.44
S: Manufacturer=USB2.0
S: Product=CardReader
S: SerialNumber=1234609
C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
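
As a side illustration of how the topology ("T:") lines in the dump above are laid out, here is a minimal C sketch that parses one such line. The struct and function names are made up for this example; only the field labels (Bus, Lev, Prnt, Port, Cnt, Dev#, Spd, MxCh) come from the dump itself. The speed is kept as text because it can be fractional ("1.5").

```c
#include <assert.h>
#include <stdio.h>

/* Fields of one "T:" (topology) line. Names are illustrative only. */
struct usb_topo {
    int bus, level, parent, port, count, devnum, max_children;
    char speed[16];             /* "1.5", "12" or "480", kept as text */
};

/*
 * Parse a single "T:" line from /proc/bus/usb/devices.
 * Returns 0 on success, -1 if the line is not a complete T: line.
 * %d skips leading blanks, so padded values like "Dev#=  2" are accepted.
 */
int parse_topo_line(const char *line, struct usb_topo *t)
{
    int n;

    assert(line != NULL && t != NULL);
    n = sscanf(line,
               "T: Bus=%d Lev=%d Prnt=%d Port=%d Cnt=%d Dev#=%d Spd=%15s MxCh=%d",
               &t->bus, &t->level, &t->parent, &t->port,
               &t->count, &t->devnum, t->speed, &t->max_children);
    return n == 8 ? 0 : -1;
}
```

Calling parse_topo_line() on each line that starts with "T:" and printing bus, level and devnum would give a compact view of the device tree, which can help when lsusb itself is not working.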

I am testing each rcX kernel, and I did not see this problem so far.
Smells like a new regression.

florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-08-15 14:39:14

by Alan Stern

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Wed, 15 Aug 2007, Florin Iucha wrote:

> Today my USB keyboard stopped working in the middle of composing an
> e-mail. I unplugged it and plugged it back, with no success. I
> logged in remotely and found this lovely message:
>
> [ 1301.567351] usb 1-4: USB disconnect, address 3
> [ 1301.567356] usb 1-4.2: USB disconnect, address 5
> [ 1301.567599] sysfs_remove_bin_file: bad dentry or inode or no such file: "descriptors"
> [ 1301.567604]
> [ 1301.567605] Call Trace:
> [ 1301.567614] [<ffffffff802b89a6>] sysfs_remove_bin_file+0x39/0x3d
> [ 1301.567619] [<ffffffff803f1d24>] device_remove_bin_file+0x15/0x17
> [ 1301.567623] [<ffffffff8045ea0d>] usb_remove_sysfs_dev_files+0x89/0x9d
> [ 1301.567627] [<ffffffff804625cc>] generic_disconnect+0x2e/0x32
> [ 1301.567630] [<ffffffff8045b58d>] usb_unbind_device+0x15/0x19
> [ 1301.567634] [<ffffffff803f4161>] __device_release_driver+0x93/0xb3
> [ 1301.567637] [<ffffffff803f45af>] device_release_driver+0x31/0x49
> [ 1301.567640] [<ffffffff803f39f1>] bus_remove_device+0x76/0x87
> [ 1301.567644] [<ffffffff803f2014>] device_del+0x216/0x297
> [ 1301.567648] [<ffffffff80455c8f>] usb_disconnect+0xc8/0x151
> [ 1301.567651] [<ffffffff80455c56>] usb_disconnect+0x8f/0x151
> [ 1301.567655] [<ffffffff804564c8>] hub_thread+0x442/0xc47
> [ 1301.567659] [<ffffffff80553cdb>] _spin_unlock_irq+0x9/0xc
> [ 1301.567664] [<ffffffff80244ad7>] autoremove_wake_function+0x0/0x38
> [ 1301.567668] [<ffffffff80456086>] hub_thread+0x0/0xc47
> [ 1301.567671] [<ffffffff802449cb>] kthread+0x49/0x76
> [ 1301.567674] [<ffffffff8020c618>] child_rip+0xa/0x12
> [ 1301.567679] [<ffffffff80244982>] kthread+0x0/0x76
> [ 1301.567682] [<ffffffff8020c60e>] child_rip+0x0/0x12

I think we can simply remove the error message. There's no obvious
reason why sysfs_remove_bin_file() should complain about attempts to
remove a nonexistent file; sysfs_remove_file() doesn't.

This patch will get rid of the annoying error messages. It won't do
anything about your keyboard's tendency to spontaneously stop working,
alas.

Alan Stern


Index: usb-2.6/fs/sysfs/bin.c
===================================================================
--- usb-2.6.orig/fs/sysfs/bin.c
+++ usb-2.6/fs/sysfs/bin.c
@@ -248,12 +248,7 @@ int sysfs_create_bin_file(struct kobject
 
 void sysfs_remove_bin_file(struct kobject * kobj, struct bin_attribute * attr)
 {
-	if (sysfs_hash_and_remove(kobj->sd, attr->attr.name) < 0) {
-		printk(KERN_ERR "%s: "
-			"bad dentry or inode or no such file: \"%s\"\n",
-			__FUNCTION__, attr->attr.name);
-		dump_stack();
-	}
+	sysfs_hash_and_remove(kobj->sd, attr->attr.name);
 }
 
 EXPORT_SYMBOL_GPL(sysfs_create_bin_file);

2007-08-15 14:49:29

by Jiri Kosina

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Wed, 15 Aug 2007, Florin Iucha wrote:

> Today my USB keyboard stopped working in the middle of composing an
> e-mail. I unplugged it and plugged it back, with no success. I
> logged in remotely and found this lovely message:

The error message seems unrelated to your keyboard becoming dead.

> I am testing each rcX kernel, and I did not see this problem so far.
> Smells like a new regression.

Is that reproducible, or did it happen just once? Any error message
present in log prior to that sysfs dump please?

Thanks,

--
Jiri Kosina

2007-08-15 14:50:40

by Florin Iucha

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Wed, Aug 15, 2007 at 10:38:54AM -0400, Alan Stern wrote:
> This patch will get rid of the annoying error messages. It won't do
> anything about your keyboard's tendency to spontaneously stop working,
> alas.

My keyboard works fine for days, with kernels up to and including
2.6.23-rc2 . I have booted into 2.6.23-rc3-$whatever this morning, and
after 10-15 minutes the keyboard stopped working. The mice which were
plugged in the keyboard's built-in hub were fine though.

The first time it happened, I removed the keyboard and got the oops
that started this thread. The second time, I just logged-in remotely
and rebooted, and the reboot process stopped at "KILLING all
processes" step. I simply reset the box and rebooted into 2.6.23-rc2
and it is fine since (over an hour ago).

Regards,
florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-08-15 14:53:50

by Florin Iucha

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Wed, Aug 15, 2007 at 04:49:02PM +0200, Jiri Kosina wrote:
> On Wed, 15 Aug 2007, Florin Iucha wrote:
>
> > Today my USB keyboard stopped working in the middle of composing an
> > e-mail. I unplugged it and plugged it back, with no success. I
> > logged in remotely and found this lovely message:
>
> The error message seems unrelated to your keyboard becoming dead.

Yes, it was related to me unplugging it in the hopes that a re-plug
will make it work again ;)

> > I am testing each rcX kernel, and I did not see this problem so far.
> > Smells like a new regression.
>
> Is that reproducible, or did it happen just once? Any error message
> present in log prior to that sysfs dump please?

[See my message to Alan]: It happened twice, within 15 minutes of
boot+login, with 2.6.23-rc3-$whatever . It does not happen with
2.6.2[123](-rc*)? After the two incidents, I rebooted in 2.6.23-rc2
and it is working for an hour now.

Regards,
florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-08-15 14:55:16

by Tejun Heo

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

Alan Stern wrote:
> I think we can simply remove the error message. There's no obvious
> reason why sysfs_remove_bin_file() should complain about attempts to
> remove a nonexistent file; sysfs_remove_file() doesn't.
>
> This patch will get rid of the annoying error messages. It won't do
> anything about your keyboard's tendency to spontaneously stop working,
> alas.

Agreed but I think sysfs_remove_bin_file() should relay the return value
from sysfs_hash_and_remove() to the caller.

Thanks.

--
tejun

2007-08-15 14:59:01

by Jiri Kosina

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Wed, 15 Aug 2007, Florin Iucha wrote:

> [See my message to Alan]: It happened twice, within 15 minutes of
> boot+login, with 2.6.23-rc3-$whatever . It does not happen with
> 2.6.2[123](-rc*)? After the two incidents, I rebooted in 2.6.23-rc2 and
> it is working for an hour now.

It is not immediately clear what might be causing this, 2.6.23-rc3 didn't
get any USB nor HID updates at all compared to 2.6.23-rc2.

Could you please enable USB and HID debugging to see whether we can see
anything spurious in the logs at the time the keyboard gets stuck?

Bisecting this might be a bit painful if it is not reproducible in
predictable timeframes :(

--
Jiri Kosina

2007-08-15 15:21:18

by Cornelia Huck

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Wed, 15 Aug 2007 23:54:43 +0900,
Tejun Heo <[email protected]> wrote:

> Alan Stern wrote:
> > I think we can simply remove the error message. There's no obvious
> > reason why sysfs_remove_bin_file() should complain about attempts to
> > remove a nonexistent file; sysfs_remove_file() doesn't.
> >
> > This patch will get rid of the annoying error messages. It won't do
> > anything about your keyboard's tendency to spontaneously stop working,
> > alas.
>
> Agreed but I think sysfs_remove_bin_file() should relay the return value
> from sysfs_hash_and_remove() to the caller.

Three comments:

- Randy made sysfs_remove_bin_file() return void in commit
995982ca79d9262869513948ec7c540f32035491.

- For symmetry reasons, sysfs_remove_file() should then also pass the
return value on.

- I'm not sure who wants to care whether they removed an existing or
non-existing file. But maybe I'm just unimaginative.

2007-08-15 15:25:00

by Alan Stern

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Wed, 15 Aug 2007, Florin Iucha wrote:

> On Wed, Aug 15, 2007 at 10:38:54AM -0400, Alan Stern wrote:
> > This patch will get rid of the annoying error messages. It won't do
> > anything about your keyboard's tendency to spontaneously stop working,
> > alas.
>
> My keyboard works fine for days, with kernels up to and including
> 2.6.23-rc2 . I have booted into 2.6.23-rc3-$whatever this morning, and
> after 10-15 minutes the keyboard stopped working. The mice which were
> plugged in the keyboard's built-in hub were fine though.
>
> The first time it happened, I removed the keyboard and got the oops
> that started this thread.

It wasn't an oops, just a warning.

> The second time, I just logged-in remotely
> and rebooted, and the reboot process stopped at "KILLING all
> processes" step. I simply reset the box and rebooted into 2.6.23-rc2
> and it is fine since (over an hour ago).

To track this down, you might try building 2.6.23-rc3 with
CONFIG_USB_DEBUG enabled. Then retrieve the dmesg log after the
keyboard stops working and post it. You probably ought to CC: the
maintainer of the HID core layer as well (and you can trim the existing
CC: list).

Alan Stern

2007-08-15 15:30:28

by Tejun Heo

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

Cornelia Huck wrote:
> On Wed, 15 Aug 2007 23:54:43 +0900,
> Tejun Heo <[email protected]> wrote:
>
>> Alan Stern wrote:
>>> I think we can simply remove the error message. There's no obvious
>>> reason why sysfs_remove_bin_file() should complain about attempts to
>>> remove a nonexistent file; sysfs_remove_file() doesn't.
>>>
>>> This patch will get rid of the annoying error messages. It won't do
>>> anything about your keyboard's tendency to spontaneously stop working,
>>> alas.
>> Agreed but I think sysfs_remove_bin_file() should relay the return value
>> from sysfs_hash_and_remove() to the caller.
>
> Three comments:
>
> - Randy made sysfs_remove_bin_file() return void in commit
> 995982ca79d9262869513948ec7c540f32035491.
>
> - For symmetry reasons, sysfs_remove_file() should then also pass the
> return value on.
>
> - I'm not sure who wants to care whether they removed an existing or
> non-existing file. But maybe I'm just unimaginative.

Hmmm... Well, failure information is lost there, so I was a bit worried.
It probably doesn't really matter and can be easily changed later if
needed. If sysfs_remove_file() returns void, I have no objection.

Thanks.

--
tejun

2007-08-15 15:33:33

by Alan Stern

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Wed, 15 Aug 2007, Tejun Heo wrote:

> Alan Stern wrote:
> > I think we can simply remove the error message. There's no obvious
> > reason why sysfs_remove_bin_file() should complain about attempts to
> > remove a nonexistent file; sysfs_remove_file() doesn't.
> >
> > This patch will get rid of the annoying error messages. It won't do
> > anything about your keyboard's tendency to spontaneously stop working,
> > alas.
>
> Agreed but I think sysfs_remove_bin_file() should relay the return value
> from sysfs_hash_and_remove() to the caller.

Perhaps. But none of

sysfs_remove_one()
sysfs_remove_subdir()
sysfs_remove_dir()
sysfs_remove_file()
sysfs_remove_file_from_group()
sysfs_remove_group()
sysfs_remove_link()

return a value. Why should sysfs_remove_bin_file() be different? And
what callers would pay attention to the return value?
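
To make the two API shapes under discussion concrete, here is a small user-space C model. It is only a sketch: struct toy_dir and all toy_* functions are invented for illustration and are not kernel code, and the -ENOENT return for a missing entry is an assumption of this toy (the patch above only shows that the real helper returns a negative value on failure).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 8

/* Toy stand-in for a sysfs directory: a fixed table of attribute names. */
struct toy_dir {
    const char *names[MAX_ATTRS];
};

/* Plays the role of sysfs_hash_and_remove(): 0 on success,
 * -ENOENT (assumed for this toy) if no such entry exists. */
int toy_hash_and_remove(struct toy_dir *dir, const char *name)
{
    assert(dir != NULL && name != NULL);
    for (int i = 0; i < MAX_ATTRS; i++) {
        if (dir->names[i] && strcmp(dir->names[i], name) == 0) {
            dir->names[i] = NULL;
            return 0;
        }
    }
    return -ENOENT;
}

/* Shape after the patch: removing a missing file is silently ignored. */
void toy_remove_bin_file(struct toy_dir *dir, const char *name)
{
    (void)toy_hash_and_remove(dir, name);
}

/* Alternative raised in the thread: relay the result to the caller. */
int toy_remove_bin_file_checked(struct toy_dir *dir, const char *name)
{
    return toy_hash_and_remove(dir, name);
}
```

With the void variant, calling toy_remove_bin_file() twice on "descriptors" is harmless, which mirrors a disconnect path that may run after the file is already gone. The checked variant only pays off if some caller actually inspects the result, which is the question raised above.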

Alan Stern

2007-08-21 11:51:25

by Florin Iucha

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Wed, Aug 15, 2007 at 04:58:33PM +0200, Jiri Kosina wrote:
> On Wed, 15 Aug 2007, Florin Iucha wrote:
>
> > [See my message to Alan]: It happened twice, within 15 minutes of
> > boot+login, with 2.6.23-rc3-$whatever . It does not happen with
> > 2.6.2[123](-rc*)? After the two incidents, I rebooted in 2.6.23-rc2 and
> > it is working for an hour now.
>
> It is not immediately clear what might be causing this, 2.6.23-rc3 didn't
> get any USB nor HID updates at all compared to 2.6.23-rc2.
>
> Could you please enable USB and HID debugging to see whether we can see
> anything spurious in the logs at the time the keyboard gets stuck?

Jiri,

I have enabled USB debugging and I see a bunch (=46) of these messages:

[ $timestamp] usb 1-9: usb auto-suspend
[ $timestamp] usb 1-9: usb auto-resume
[ $timestamp] ehci_hcd 0000:00:02.1: GetStatus port 9 status 001005 POWER sig=se0 PE CONNECT
[ $timestamp] usb 1-9: finish resume

The messages continued to be logged, even after the keyboard has
become unresponsive.
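
For readers wondering what "status 001005 POWER sig=se0 PE CONNECT" encodes: the number is the hexadecimal value of the port's status register, and the words are its decoded bits. The following C sketch decodes it using the bit positions of the EHCI PORTSC register (connect = bit 0, enabled = bit 2, line status = bits 11:10, power = bit 12). The function names are made up for this example, and only the flags that appear in the log line are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* EHCI PORTSC bits used in the log line (subset only). */
#define PORT_CONNECT   (1u << 0)    /* device present on the port */
#define PORT_PE        (1u << 2)    /* port enabled */
#define PORT_LS_SHIFT  10           /* line status lives in bits 11:10 */
#define PORT_LS_MASK   (3u << PORT_LS_SHIFT)
#define PORT_POWER     (1u << 12)   /* port power is on */

/* Name of the line state: 0 = SE0, 1 = K, 2 = J, 3 = undefined. */
const char *ehci_line_state(uint32_t portsc)
{
    switch ((portsc & PORT_LS_MASK) >> PORT_LS_SHIFT) {
    case 0:  return "se0";
    case 1:  return "k";
    case 2:  return "j";
    default: return "?";
    }
}

/* Write the decoded flags into buf; returns the snprintf() result. */
int ehci_describe_port(uint32_t portsc, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%ssig=%s%s%s",
                    (portsc & PORT_POWER) ? "POWER " : "",
                    ehci_line_state(portsc),
                    (portsc & PORT_PE) ? " PE" : "",
                    (portsc & PORT_CONNECT) ? " CONNECT" : "");
}
```

For 0x001005 this yields 0x1000 (POWER) + 0x4 (PE) + 0x1 (CONNECT) with line state 0, that is, a powered, enabled port with a device attached, which matches a normal resume rather than an error.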

The entire kernel log is at http://iucha.net/usb/log-2.6.23-rc3-2 .
The dump of /proc/bus/usb/devices is at http://iucha.net/usb/devices .
The output of 'lsusb -t' is at http://iucha.net/usb/lsusb-t . Plain
lsusb is not working. The version of usbutils is '0.72-7ubuntu2' .

Do you need me to build a -rc2 with USB debug enabled to compare and
contrast?

Thanks,
florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-08-21 12:04:47

by Jiri Kosina

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Tue, 21 Aug 2007, Florin Iucha wrote:

> I have enabled USB debugging and I see a bunch (=46) of these messages:

> [ $timestamp] usb 1-9: usb auto-suspend
> [ $timestamp] usb 1-9: usb auto-resume
> [ $timestamp] ehci_hcd 0000:00:02.1: GetStatus port 9 status 001005 POWER sig=se0 PE CONNECT
> [ $timestamp] usb 1-9: finish resume
> The messages continued to be logged, even after the keyboard has
> become unresponsive.

I guess that this is the card reader being suspended and resumed
afterwards. Do you by any chance see any improvement when you

- rmmod ehci_hcd
- disable USB_AUTOSUSPEND

please? Thanks,

--
Jiri Kosina

2007-08-21 12:06:43

by Oliver Neukum

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

Am Dienstag 21 August 2007 schrieb Florin Iucha:
> On Wed, Aug 15, 2007 at 04:58:33PM +0200, Jiri Kosina wrote:
> > On Wed, 15 Aug 2007, Florin Iucha wrote:
> >
> > > [See my message to Alan]: It happened twice, within 15 minutes of
> > > boot+login, with 2.6.23-rc3-$whatever . It does not happen with
> > > 2.6.2[123](-rc*)? After the two incidents, I rebooted in 2.6.23-rc2 and
> > > it is working for an hour now.
> >
> > It is not immediately clear what might be causing this, 2.6.23-rc3 didn't
> > get any USB nor HID updates at all compared to 2.6.23-rc2.
> >
> > Could you please enable USB and HID debugging to see whether we can see
> > anything spurious in the logs at the time the keyboard gets stuck?
>
> Jiri,
>
> I have enabled USB debugging and I see a bunch (=46) of these messages:
>
> [ $timestamp] usb 1-9: usb auto-suspend
> [ $timestamp] usb 1-9: usb auto-resume
> [ $timestamp] ehci_hcd 0000:00:02.1: GetStatus port 9 status 001005 POWER sig=se0 PE CONNECT
> [ $timestamp] usb 1-9: finish resume
>
> The messages continued to be logged, even after the keyboard has
> become unresponsive.

[ 60.756730] usb 1-9: usb auto-resume
Did you hit a key at that time?


It looks like your keyboard gets autosuspended. But how can that happen?
Keyboards should never autosuspend, as they are always open.
The patch for minimum autosuspend support in HID did get in earlier,
didn't it?

Regards
Oliver


2007-08-21 12:09:19

by Jiri Kosina

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Tue, 21 Aug 2007, Oliver Neukum wrote:

> It looks like your keyboard gets autosuspended. But how can that happen?
> Keyboards should never autosuspend, as they are always open. The patch
> for minimum autosuspend support in HID did get in earlier, didn't it?

Hi Oliver,

it actually isn't even in mainline; it's queued in my tree for the next
merge window.

--
Jiri Kosina

2007-08-21 12:19:29

by Oliver Neukum

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

Am Dienstag 21 August 2007 schrieb Oliver Neukum:
> [ 60.756730] usb 1-9: usb auto-resume
> Did you hit a key at that time?
>
>
> It looks like your keyboard gets autosuspended. But how can that happen?
> Keyboards should never autosuspend, as they are always open.
> The patch for minimum autosuspend support in HID did get in earlier,
> didn't it?
>

Sorry disregard the question, I mistook your devices.

Regards
Oliver

2007-08-21 12:29:15

by Florin Iucha

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Tue, Aug 21, 2007 at 02:04:26PM +0200, Jiri Kosina wrote:
> > I have enabled USB debugging and I see a bunch (=46) of these messages:
>
> > [ $timestamp] usb 1-9: usb auto-suspend
> > [ $timestamp] usb 1-9: usb auto-resume
> > [ $timestamp] ehci_hcd 0000:00:02.1: GetStatus port 9 status 001005 POWER sig=se0 PE CONNECT
> > [ $timestamp] usb 1-9: finish resume
> > The messages continued to be logged, even after the keyboard has
> > become unresponsive.
>
> I guess that this is the card reader being suspended and resumed
> afterwards. Do you by any chance see any improvement when you
>
> - rmmod ehci_hcd

It's built-in. Should I build it as a module? This machine has only
usb 2.0 ports. If I rmmod it, will my USB keyboard still work?

[The card reader is one of those that fit into a 3.5" bay, connected
straight to the motherboard controller, so it's a bit of a pain to
disconnect.]

> - disable USB_AUTOSUSPEND

You mean CONFIG_USB_SUSPEND?

florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-08-21 12:57:28

by Florin Iucha

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Tue, Aug 21, 2007 at 06:51:15AM -0500, Florin Iucha wrote:
> On Wed, Aug 15, 2007 at 04:58:33PM +0200, Jiri Kosina wrote:
> > On Wed, 15 Aug 2007, Florin Iucha wrote:
> >
> > > [See my message to Alan]: It happened twice, within 15 minutes of
> > > boot+login, with 2.6.23-rc3-$whatever . It does not happen with
> > > 2.6.2[123](-rc*)? After the two incidents, I rebooted in 2.6.23-rc2 and
> > > it is working for an hour now.
> >
> > It is not immediately clear what might be causing this, 2.6.23-rc3 didn't
> > get any USB nor HID updates at all compared to 2.6.23-rc2.
> >
> > Could you please enable USB and HID debugging to see whether we can see
> > anything spurious in the logs at the time the keyboard gets stuck?
>
> Jiri,
>
> I have enabled USB debugging and I see a bunch (=46) of these messages:
>
> [ $timestamp] usb 1-9: usb auto-suspend
> [ $timestamp] usb 1-9: usb auto-resume
> [ $timestamp] ehci_hcd 0000:00:02.1: GetStatus port 9 status 001005 POWER sig=se0 PE CONNECT
> [ $timestamp] usb 1-9: finish resume
>
> The messages continued to be logged, even after the keyboard has
> become unresponsive.

[snip]

> Do you need me to build a -rc2 with USB debug enabled to compare and
> contrast?

With 2.6.23-rc2 and USB_DEBUG enabled, I see the same messages but no
keyboard "disappearance".

I have rebuilt 2.6.23-rc3 with 'CONFIG_USB_EHCI_HCD=m' and
'CONFIG_USB_SUSPEND is not set' and will use it for a while, to see if
the keyboard/usb behaves or not.

Regards,
florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-08-21 13:05:48

by Jiri Kosina

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Tue, 21 Aug 2007, Florin Iucha wrote:

> I have rebuilt 2.6.23-rc3 with 'CONFIG_USB_EHCI_HCD=m' and
> 'CONFIG_USB_SUSPEND is not set' and will use it for a while, to see if
> the keyboard/usb behaves or not.

Thanks. If this doesn't give us any hint, it would be useful if you could
do git-bisect between rc2 and rc3, I really can't immediately see anything
in the list of commits that might directly cause the behavior you are
seeing (most importantly because there were no USB and no HID updates in
this window).

There are approximately 290 commits, so it shouldn't require more than 9
reboots plus the time needed to check whether the bug triggers or not.
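
The "no more than 9 reboots" estimate follows from bisection halving the candidate range at each step: 2^8 = 256 is less than 290, while 2^9 = 512 covers it. A small C sketch of that arithmetic is below; the function name is made up, and it models an idealized linear history, so the step count git-bisect actually reports may differ slightly when merges are involved.

```c
#include <assert.h>

/*
 * Smallest k such that 2^k >= n, i.e. ceil(log2(n)) for n >= 1.
 * This is the worst-case number of good/bad tests needed to single
 * out one commit among n candidates by repeated halving.
 */
unsigned bisect_steps(unsigned n)
{
    unsigned steps = 0;
    unsigned long long span = 1;

    assert(n >= 1);
    while (span < n) {
        span *= 2;
        steps++;
    }
    return steps;
}
```

bisect_steps(290) evaluates to 9, in line with the estimate above; the dominant cost in practice is the time spent waiting to see whether the keyboard locks up on each kernel.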

Thanks,

--
Jiri Kosina

2007-08-21 13:18:17

by Florin Iucha

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Tue, Aug 21, 2007 at 03:05:25PM +0200, Jiri Kosina wrote:
> > I have rebuilt 2.6.23-rc3 with 'CONFIG_USB_EHCI_HCD=m' and
> > 'CONFIG_USB_SUSPEND is not set' and will use it for a while, to see if
> > the keyboard/usb behaves or not.
>
> Thanks. If this doesn't give us any hint, it would be useful if you could
> do git-bisect between rc2 and rc3, I really can't immediately see anything
> in the list of commits that might directly cause the behavior you are
> seeing (most importantly because there were no USB and no HID updates in
> this window).

The keyboard still locked up. There is absolutely nothing in the
kernel log.

> There are approximately 290 commits, so it shouldn't require more than 9
> reboots plus the time needed to check whether the bug triggers or not.

The top commit is not v2.6.23-rc3 but

commit 28e8351ac22de25034e048c680014ad824323c65
Merge: 3b993e8... d18c4d6...
Author: Linus Torvalds <[email protected]>
Date: Tue Aug 14 10:00:29 2007 -0700

Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes

I'll try to make time to bisect it...

Thanks,
florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-08-21 13:27:18

by Florin Iucha

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Tue, Aug 21, 2007 at 08:17:59AM -0500, Florin Iucha wrote:
> On Tue, Aug 21, 2007 at 03:05:25PM +0200, Jiri Kosina wrote:
> > > I have rebuilt 2.6.23-rc3 with 'CONFIG_USB_EHCI_HCD=m' and
> > > 'CONFIG_USB_SUSPEND is not set' and will use it for a while, to see if
> > > the keyboard/usb behaves or not.
> >
> > Thanks. If this doesn't give us any hint, it would be useful if you could
> > do git-bisect between rc2 and rc3, I really can't immediately see anything
> > in the list of commits that might directly cause the behavior you are
> > seeing (most importantly because there were no USB and no HID updates in
> > this window).
>
> The keyboard still locked up. There is absolutely nothing in the
> kernel log.
>
> > There are approximately 290 commits, so it shouldn't require more than 9
> > reboots plus the time needed to check whether the bug triggers or not.
>
> The top commit is not v2.6.23-rc3 but
>
> commit 28e8351ac22de25034e048c680014ad824323c65
> Merge: 3b993e8... d18c4d6...
> Author: Linus Torvalds <[email protected]>
> Date: Tue Aug 14 10:00:29 2007 -0700
>
> Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
>
> I'll try to make time to bisect it...

There is another interesting angle to this: in the past, every time I
had keyboard problems, it used to be caused by the VFS and/or NFS...
after much wrangling, a bunch of bugs were fixed (Hi Trond, Peter,
Alan!).

Now, after the keyboard "locked up", I used the mouse to close the
gnome session, then I logged-in remotely to reboot. The reboot
process locked up and I need to use the reset button! The second
time the keyboard "locked up" I listed my processes, and I noticed
that I had a couple of bash processes and a ssh process in "D" state.

Something is fishy again in the VFS ;)
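
For anyone wanting to spot such "D" (uninterruptible sleep) processes without scanning ps output by hand, here is a minimal C sketch that reads the state field from /proc/<pid>/stat. The function names are hypothetical; the only thing relied on is the documented "pid (comm) state ..." layout of that file. Because the command name can itself contain spaces or parentheses, the parser looks for the last ')' on the line.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return the state character from one /proc/<pid>/stat line,
 * or 0 if the line does not look like "pid (comm) state ...". */
char stat_state(const char *line)
{
    const char *p;

    assert(line != NULL);
    p = strrchr(line, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];
}

/* Count tasks currently in state 'D'; if out is non-NULL, echo their
 * stat lines to it. Returns -1 if /proc cannot be opened. */
int count_blocked_tasks(FILE *out)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    int blocked = 0;

    if (!proc)
        return -1;
    while ((de = readdir(proc)) != NULL) {
        char path[300], line[512];
        FILE *f;

        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                   /* not a PID directory */
        snprintf(path, sizeof path, "/proc/%s/stat", de->d_name);
        f = fopen(path, "r");
        if (!f)
            continue;                   /* process exited meanwhile */
        if (fgets(line, sizeof line, f) && stat_state(line) == 'D') {
            if (out)
                fputs(line, out);
            blocked++;
        }
        fclose(f);
    }
    closedir(proc);
    return blocked;
}
```

Calling count_blocked_tasks(stdout) a few times in a row distinguishes tasks that are briefly waiting on I/O from ones that stay in "D" state, which are the interesting ones here.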

florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-08-21 13:43:38

by Jiri Kosina

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Tue, 21 Aug 2007, Florin Iucha wrote:

> There is another interesting angle to this: in the past, every time I
> had keyboard problems, it used to be caused by the VFS and/or NFS...
> after much wrangling, a bunch of bugs were fixed (Hi Trond, Peter,
> Alan!). Now, after the keyboard "locked up", I used the mouse to close
> the gnome session, then I logged-in remotely to reboot. The reboot
> process locked up and I need to use the reset button! The second time
> the keyboard "locked up" I listed my processes, and I noticed that I had
> a couple of bash processes and a ssh process in "D" state. Something is
> fishy again in the VFS ;)

Yes, there were some NFS updates in between -rc2 and
28e8351ac22de25034e048c680014ad824323c65. I'd be now even more curious
what are you going to find by bisect, please let us know.

I added Trond to CC, full thread to be found at
http://lkml.org/lkml/2007/8/21/151 for reference.

Florin, it also might be useful to capture the states of stuck processes
via alt-sysrq-T (or better by echo t > /proc/sysrq-trigger), so that we
know better where are they stuck.
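
As a user-space complement to the sysrq-T dump, tasks in "D" state can be
picked out by reading the state field of /proc/<pid>/stat, whose layout is
"pid (comm) state ...". The following is only a sketch: the helper names
proc_stat_state() and is_uninterruptible() are made up for illustration and
do not come from this thread or from any library.

```c
#include <string.h>

/*
 * Return the one-letter state field from a line in /proc/<pid>/stat
 * format, or '\0' if the line cannot be parsed. The command name may
 * itself contain spaces or parentheses, so look for the LAST ')'.
 * (Hypothetical helper, for illustration only.)
 */
char proc_stat_state(const char *stat_line)
{
    const char *close = strrchr(stat_line, ')');

    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';
    return close[2];
}

/* 'D' is uninterruptible sleep, typically a task blocked on I/O. */
int is_uninterruptible(const char *stat_line)
{
    return proc_stat_state(stat_line) == 'D';
}
```

Feeding each /proc/<pid>/stat line to is_uninterruptible() gives the list of
stuck tasks; the kernel stacks still have to come from the sysrq-T output.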

--
Jiri Kosina

2007-08-21 14:51:52

by Alan Stern

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Tue, 21 Aug 2007, Jiri Kosina wrote:

> On Tue, 21 Aug 2007, Florin Iucha wrote:
>
> > I have enabled USB debugging and I see a bunch (=46) of these messages:
>
> > [ $timestamp] usb 1-9: usb auto-suspend
> > [ $timestamp] usb 1-9: usb auto-resume
> > [ $timestamp] ehci_hcd 0000:00:02.1: GetStatus port 9 status 001005 POWER sig=se0 PE CONNECT
> > [ $timestamp] usb 1-9: finish resume
> > The messages continued to be logged, even after the keyboard has
> > become unresponsive.
>
> I guess that this is the card reader being suspended and resumed
> afterwards. Do you by any chance see any improvement when you

FYI, the card reader suspend/resume problem should be fixed by this
patch:

http://marc.info/?l=linux-usb-devel&m=118764229910761&w=2

Alan Stern

2007-08-22 13:22:14

by Florin Iucha

Subject: Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Tue, Aug 21, 2007 at 03:42:26PM +0200, Jiri Kosina wrote:
> On Tue, 21 Aug 2007, Florin Iucha wrote:
>
> > There is another interesting angle to this: in the past, every time I
> > had keyboard problems, it used to be caused by the VFS and/or NFS...
> > after much wrangling, a bunch of bugs were fixed (Hi Trond, Peter,
> > Alan!). Now, after the keyboard "locked up", I used the mouse to close
> > the gnome session, then I logged-in remotely to reboot. The reboot
> > process locked up and I need to use the reset button! The second time
> > the keyboard "locked up" I listed my processes, and I noticed that I had
> > a couple of bash processes and a ssh process in "D" state. Something is
> > fishy again in the VFS ;)
>
> Yes, there were some NFS updates in between -rc2 and
> 28e8351ac22de25034e048c680014ad824323c65. I'd be now even more curious
> what are you going to find by bisect, please let us know.
>
> I added Trond to CC, full thread to be found at
> http://lkml.org/lkml/2007/8/21/151 for reference.
>
> Florin, it also might be useful to capture the states of stuck processess
> via alt-sysrq-T (or better by echo t > /proc/sysrq-trigger), so that we
> know better where are they stuck.

This morning it took a bit longer to hang, but it happened. The
backtraces are at http://iucha.net/2.6.23-rc3/backtraces.gz .

I'll try a bisect session this weekend.

Cheers,
florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-08-23 12:52:21

by Florin Iucha

Subject: NFS woes again Was: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

Trond,

Fess up... I'm closing in:

http://iucha.net/2.6.23-rc3/2.6.23-rc-bisect.png

[Dropping Jiri and linux-usb-devel from future postings. You are
included now just for communicating the conclusion of this thread.]

On Wed, Aug 22, 2007 at 08:22:00AM -0500, Florin Iucha wrote:
> On Tue, Aug 21, 2007 at 03:42:26PM +0200, Jiri Kosina wrote:
> > On Tue, 21 Aug 2007, Florin Iucha wrote:
> >
> > > There is another interesting angle to this: in the past, every time I
> > > had keyboard problems, it used to be caused by the VFS and/or NFS...
> > > after much wrangling, a bunch of bugs were fixed (Hi Trond, Peter,
> > > Alan!). Now, after the keyboard "locked up", I used the mouse to close
> > > the gnome session, then I logged-in remotely to reboot. The reboot
> > > process locked up and I need to use the reset button! The second time
> > > the keyboard "locked up" I listed my processes, and I noticed that I had
> > > a couple of bash processes and a ssh process in "D" state. Something is
> > > fishy again in the VFS ;)
> >
> > Yes, there were some NFS updates in between -rc2 and
> > 28e8351ac22de25034e048c680014ad824323c65. I'd be now even more curious
> > what are you going to find by bisect, please let us know.
> >
> > I added Trond to CC, full thread to be found at
> > http://lkml.org/lkml/2007/8/21/151 for reference.
> >
> > Florin, it also might be useful to capture the states of stuck processess
> > via alt-sysrq-T (or better by echo t > /proc/sysrq-trigger), so that we
> > know better where are they stuck.
>
> This morning it took a bit longer to hang, but it happened. The
> backtraces are at http://iucha.net/2.6.23-rc3/backtraces.gz .
>
> I'll try a bisect session this weekend.

florin@zeus $ git bisect bad
Bisecting: 5 revisions left to test after this
florin@zeus $ git bisect log
git-bisect start
# good: [d4ac2477fad0f2680e84ec12e387ce67682c5c13] Linux 2.6.23-rc2
git-bisect good d4ac2477fad0f2680e84ec12e387ce67682c5c13
# bad: [28e8351ac22de25034e048c680014ad824323c65] Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
git-bisect bad 28e8351ac22de25034e048c680014ad824323c65
# bad: [8f2ea1fd3f97ab7a809e939b5b9005a16f862439] [POWERPC] Fix initialization and usage of dma_mask
git-bisect bad 8f2ea1fd3f97ab7a809e939b5b9005a16f862439
# good: [ff95f3df54609d9d4b9572f8a67d09922a645043] sched: remove the 'u64 now' parameter from pick_next_task()
git-bisect good ff95f3df54609d9d4b9572f8a67d09922a645043
# good: [be12014dd7750648fde33e1e45cac24dc9a8be6d] Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
git-bisect good be12014dd7750648fde33e1e45cac24dc9a8be6d
# good: [6a0ed91e361a93ee1efb4c20c4967024ed2a8dd7] hexdump: use const notation
git-bisect good 6a0ed91e361a93ee1efb4c20c4967024ed2a8dd7
# bad: [6adb31c90c47262c8a25bf5097de9b3426caf3ae] remove dubious legal statment from uio-howto
git-bisect bad 6adb31c90c47262c8a25bf5097de9b3426caf3ae
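
The log above is a binary search for the first bad commit. A minimal sketch
of the idea follows, assuming a strictly linear history indexed 0..n and a
regression that stays bad once introduced. Real git bisect works on the
commit graph, so this is a simplification, and first_bad(), is_bad_fn and
bad_from() are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical test callback: returns nonzero if revision `idx` is bad. */
typedef int (*is_bad_fn)(size_t idx, void *ctx);

/*
 * Find the first bad revision, given that `good` is known good, `bad` is
 * known bad, good < bad, and every revision after the first bad one is
 * also bad. Each test halves the remaining range.
 */
size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;      /* like "git bisect bad" */
        else
            good = mid;     /* like "git bisect good" */
    }
    return bad;
}

/* Example predicate: everything at or after *(size_t *)ctx is bad. */
int bad_from(size_t idx, void *ctx)
{
    return idx >= *(const size_t *)ctx;
}
```

With a known-good revision 0 and a known-bad revision 100, calling
first_bad(0, 100, bad_from, &culprit) needs about seven tests, which is why
a kernel bisect over hundreds of commits is practical even when every test
means a rebuild and a reboot.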

florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-08-23 17:14:49

by Bret Towe

Subject: Re: NFS woes again Was: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On 8/23/07, Florin Iucha <[email protected]> wrote:
> Trond,
>
> Fess up... I'm closing in:
>
> http://iucha.net/2.6.23-rc3/2.6.23-rc-bisect.png
>
> [Dropping Jiri and linux-usb-devel from future postings. You are
> included now just for communicating the conclusion of this thread.]
>
> On Wed, Aug 22, 2007 at 08:22:00AM -0500, Florin Iucha wrote:
> > On Tue, Aug 21, 2007 at 03:42:26PM +0200, Jiri Kosina wrote:
> > > On Tue, 21 Aug 2007, Florin Iucha wrote:
> > >
> > > > There is another interesting angle to this: in the past, every time I
> > > > had keyboard problems, it used to be caused by the VFS and/or NFS...
> > > > after much wrangling, a bunch of bugs were fixed (Hi Trond, Peter,
> > > > Alan!). Now, after the keyboard "locked up", I used the mouse to close
> > > > the gnome session, then I logged-in remotely to reboot. The reboot
> > > > process locked up and I need to use the reset button! The second time
> > > > the keyboard "locked up" I listed my processes, and I noticed that I had
> > > > a couple of bash processes and a ssh process in "D" state. Something is
> > > > fishy again in the VFS ;)
> > >
> > > Yes, there were some NFS updates in between -rc2 and
> > > 28e8351ac22de25034e048c680014ad824323c65. I'd be now even more curious
> > > what are you going to find by bisect, please let us know.
> > >
> > > I added Trond to CC, full thread to be found at
> > > http://lkml.org/lkml/2007/8/21/151 for reference.
> > >
> > > Florin, it also might be useful to capture the states of stuck processess
> > > via alt-sysrq-T (or better by echo t > /proc/sysrq-trigger), so that we
> > > know better where are they stuck.
> >
> > This morning it took a bit longer to hang, but it happened. The
> > backtraces are at http://iucha.net/2.6.23-rc3/backtraces.gz .
> >
> > I'll try a bisect session this weekend.
>
> florin@zeus $ git bisect bad
> Bisecting: 5 revisions left to test after this
> florin@zeus $ git bisect log
> git-bisect start
> # good: [d4ac2477fad0f2680e84ec12e387ce67682c5c13] Linux 2.6.23-rc2
> git-bisect good d4ac2477fad0f2680e84ec12e387ce67682c5c13
> # bad: [28e8351ac22de25034e048c680014ad824323c65] Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
> git-bisect bad 28e8351ac22de25034e048c680014ad824323c65
> # bad: [8f2ea1fd3f97ab7a809e939b5b9005a16f862439] [POWERPC] Fix initialization and usage of dma_mask
> git-bisect bad 8f2ea1fd3f97ab7a809e939b5b9005a16f862439
> # good: [ff95f3df54609d9d4b9572f8a67d09922a645043] sched: remove the 'u64 now' parameter from pick_next_task()
> git-bisect good ff95f3df54609d9d4b9572f8a67d09922a645043
> # good: [be12014dd7750648fde33e1e45cac24dc9a8be6d] Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
> git-bisect good be12014dd7750648fde33e1e45cac24dc9a8be6d
> # good: [6a0ed91e361a93ee1efb4c20c4967024ed2a8dd7] hexdump: use const notation
> git-bisect good 6a0ed91e361a93ee1efb4c20c4967024ed2a8dd7
> # bad: [6adb31c90c47262c8a25bf5097de9b3426caf3ae] remove dubious legal statment from uio-howto
> git-bisect bad 6adb31c90c47262c8a25bf5097de9b3426caf3ae
>
> florin
>
> --
> Bruce Schneier expects the Spanish Inquisition.
> http://geekz.co.uk/schneierfacts/fact/163

This sounds a lot like the post I made yesterday titled 'nfs4 hang regression'.
I tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6

2007-08-23 17:36:48

by Florin Iucha

Subject: Re: NFS woes again Was: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Thu, Aug 23, 2007 at 10:14:38AM -0700, Bret Towe wrote:
> this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6

Yes, it certainly does -- all the symptoms match!

I'm not [alone in] seeing dead keyboards!

Now, if only somebody could explain to me why the bad NFS4 code
shoots the keyboard but not the mouse, that would be wonderful.

florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-08-27 13:17:48

by Trond Myklebust

Subject: Re: NFS woes again Was: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On Thu, 2007-08-23 at 12:36 -0500, Florin Iucha wrote:
> On Thu, Aug 23, 2007 at 10:14:38AM -0700, Bret Towe wrote:
> > this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
>
> Yes, it certainly does -- all the symptoms match!
>
> I'm not [alone in] seeing dead keyboards!
>
> Now, if only somebody could clarify to me the connection between
> the bad NFS4 shooting the keyboard but not the mouse, that would
> be wonderful.
>
> florin

Could you and Bret please check if the attached patch fixes the hang?

Cheers
Trond



Attachments:
linux-2.6.23-001-fix_cancel_work_hang.dif (2.27 kB)

2007-08-28 01:19:40

by Bret Towe

Subject: Re: NFS woes again Was: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351

On 8/27/07, Trond Myklebust <[email protected]> wrote:
> On Thu, 2007-08-23 at 12:36 -0500, Florin Iucha wrote:
> > On Thu, Aug 23, 2007 at 10:14:38AM -0700, Bret Towe wrote:
> > > this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> > > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
> >
> > Yes, it certainly does -- all the symptoms match!
> >
> > I'm not [alone in] seeing dead keyboards!
> >
> > Now, if only somebody could clarify to me the connection between
> > the bad NFS4 shooting the keyboard but not the mouse, that would
> > be wonderful.
> >
> > florin
>
> Could you and Bret please check if the attached patch fixes the hang?

No good for me; it still hangs after ~30 minutes.

> Cheers
> Trond
>
>
>
>
> ---------- Forwarded message ----------
> From: Trond Myklebust <[email protected]>
> To:
> Date: Mon, 27 Aug 2007 09:14:56 -0400
> Subject: No Subject
> We need to ensure that nobody adds anything to nfs_automount_list while we
> are killing off the work queue entry, or else nfs_expire_automounts will
> simply rearm it, and we hang.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>
> fs/nfs/namespace.c | 14 +++++++++++++-
> 1 files changed, 13 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
> index aea76d0..bcd0777 100644
> --- a/fs/nfs/namespace.c
> +++ b/fs/nfs/namespace.c
> @@ -22,6 +22,11 @@ static void nfs_expire_automounts(struct work_struct *work);
>
>  LIST_HEAD(nfs_automount_list);
>  static DECLARE_DELAYED_WORK(nfs_automount_task, nfs_expire_automounts);
> +/*
> + * The following mutex prevents nfs_follow_mountpoint from adding new
> + * entries to nfs_automount_list
> + */
> +static DEFINE_MUTEX(nfs_automount_mutex);
>  int nfs_mountpoint_expiry_timeout = 500 * HZ;
>
>  static struct vfsmount *nfs_do_submount(const struct vfsmount *mnt_parent,
> @@ -128,18 +133,21 @@ static void * nfs_follow_mountpoint(struct dentry *dentry, struct nameidata *nd)
>  		goto out_err;
>
>  	mntget(mnt);
> +	mutex_lock(&nfs_automount_mutex);
>  	err = do_add_mount(mnt, nd, nd->mnt->mnt_flags|MNT_SHRINKABLE, &nfs_automount_list);
>  	if (err < 0) {
> +		mutex_unlock(&nfs_automount_mutex);
>  		mntput(mnt);
>  		if (err == -EBUSY)
>  			goto out_follow;
>  		goto out_err;
>  	}
> +	schedule_delayed_work(&nfs_automount_task, nfs_mountpoint_expiry_timeout);
> +	mutex_unlock(&nfs_automount_mutex);
>  	mntput(nd->mnt);
>  	dput(nd->dentry);
>  	nd->mnt = mnt;
>  	nd->dentry = dget(mnt->mnt_root);
> -	schedule_delayed_work(&nfs_automount_task, nfs_mountpoint_expiry_timeout);
>  out:
>  	dprintk("%s: done, returned %d\n", __FUNCTION__, err);
>
> @@ -175,8 +183,12 @@ static void nfs_expire_automounts(struct work_struct *work)
>
>  void nfs_release_automount_timer(void)
>  {
> +	if (!list_empty(&nfs_automount_list))
> +		return;
> +	mutex_lock(&nfs_automount_mutex);
>  	if (list_empty(&nfs_automount_list))
>  		cancel_delayed_work_sync(&nfs_automount_task);
> +	mutex_unlock(&nfs_automount_mutex);
>  }
>
>  /*
>
>

2007-08-28 01:35:53

by Florin Iucha

Subject: Re: NFS woes again

On Mon, Aug 27, 2007 at 06:19:29PM -0700, Bret Towe wrote:
> On 8/27/07, Trond Myklebust <[email protected]> wrote:
> > > > this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> > > > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
> > >
> > > Yes, it certainly does -- all the symptoms match!
> >
> > Could you and Bret please check if the attached patch fixes the hang?
>
> no good for me still hangs after ~30minutes

I just booted into the new kernel
(3d39c691ff486142dd9aaeac12f553f4476b7a6 + Trond's patch) and it hangs
in 10-15 minutes.

Process traces available at http://iucha.net/nfs/23-rc2-nfs-fix-1/kernel.log.gz

Regards,
florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-08-28 13:28:59

by Trond Myklebust

Subject: Re: NFS woes again

On Mon, 2007-08-27 at 20:35 -0500, Florin Iucha wrote:
> On Mon, Aug 27, 2007 at 06:19:29PM -0700, Bret Towe wrote:
> > On 8/27/07, Trond Myklebust <[email protected]> wrote:
> > > > > this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> > > > > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
> > > >
> > > > Yes, it certainly does -- all the symptoms match!
> > >
> > > Could you and Bret please check if the attached patch fixes the hang?
> >
> > no good for me still hangs after ~30minutes
>
> I just booted into the new kernel
> (3d39c691ff486142dd9aaeac12f553f4476b7a6 + Trond's patch) and it hangs
> in 10-15 minutes.
>
> Process traces available at http://iucha.net/nfs/23-rc2-nfs-fix-1/kernel.log.gz
>
> Regards,
> florin

Doh! I see the problem: cancel_delayed_work_sync() shouldn't ever be
called recursively.

The following patch should be correct. Please just discard the previous
one...
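
The failure Trond describes can be shown with a toy user-space model. A
"sync" cancel waits for a running handler to finish, so if the handler
itself ends up calling it (here, through an unmount performed by
nfs_automount_task), it waits on itself forever. Everything below is a
hypothetical stand-in: struct toy_work and the toy_* functions are not
kernel APIs, and the model reports the would-be deadlock instead of
blocking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a work item; NOT the kernel's struct work_struct. */
struct toy_work {
    bool pending;   /* queued, waiting for its timer */
    bool running;   /* handler is currently executing */
};

/* The work item whose handler the current context is executing, if any. */
static struct toy_work *current_work;

/* Non-waiting cancel: only dequeues, so it is safe from any context. */
bool toy_cancel(struct toy_work *w)
{
    bool was_pending = w->pending;

    w->pending = false;
    return was_pending;
}

/*
 * Waiting cancel: dequeue, then wait for a running handler to finish.
 * Returns false instead of blocking when the wait could never end,
 * i.e. when the caller is the very handler it would wait for.
 */
bool toy_cancel_sync(struct toy_work *w)
{
    toy_cancel(w);
    if (w->running && current_work == w)
        return false;   /* would wait on ourselves forever */
    /* A real implementation would block here until the handler is done. */
    return true;
}

/* Run a handler the way a worker thread would. */
bool toy_run(struct toy_work *w, bool (*handler)(struct toy_work *))
{
    bool ok;

    w->pending = false;
    w->running = true;
    current_work = w;
    ok = handler(w);
    current_work = NULL;
    w->running = false;
    return ok;
}

/* Handler that (indirectly) cancels itself with the waiting variant. */
bool handler_sync(struct toy_work *w)
{
    return toy_cancel_sync(w);
}

/* Handler that uses the non-waiting variant instead. */
bool handler_plain(struct toy_work *w)
{
    toy_cancel(w);
    return true;
}
```

In this model toy_run(&w, handler_sync) reports the self-wait while
toy_run(&w, handler_plain) completes, which mirrors the switch from
cancel_delayed_work_sync() to cancel_delayed_work() in the revised patch.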

Trond


Attachments:
linux-2.6.23-001-fix_cancel_work_hang.dif (979.00 B)

2007-08-29 03:27:18

by Florin Iucha

Subject: Re: NFS woes again

On Tue, Aug 28, 2007 at 09:28:43AM -0400, Trond Myklebust wrote:
> Doh! I see the problem: cancel_delayed_work_sync() shouldn't ever be
> called recursively.
>
> The following patch should be correct. Please just discard the previous
> one...

So far so good. This patch got one hour of uptime... I'll stay with
this kernel for a few days, to keep an eye on it.

Thanks,
florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-08-29 05:52:37

by Bret Towe

Subject: Re: NFS woes again

On 8/28/07, Trond Myklebust <[email protected]> wrote:
> On Mon, 2007-08-27 at 20:35 -0500, Florin Iucha wrote:
> > On Mon, Aug 27, 2007 at 06:19:29PM -0700, Bret Towe wrote:
> > > On 8/27/07, Trond Myklebust <[email protected]> wrote:
> > > > > > this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> > > > > > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
> > > > >
> > > > > Yes, it certainly does -- all the symptoms match!
> > > >
> > > > Could you and Bret please check if the attached patch fixes the hang?
> > >
> > > no good for me still hangs after ~30minutes
> >
> > I just booted into the new kernel
> > (3d39c691ff486142dd9aaeac12f553f4476b7a6 + Trond's patch) and it hangs
> > in 10-15 minutes.
> >
> > Process traces available at http://iucha.net/nfs/23-rc2-nfs-fix-1/kernel.log.gz
> >
> > Regards,
> > florin
>
> Doh! I see the problem: cancel_delayed_work_sync() shouldn't ever be
> called recursively.
>
> The following patch should be correct. Please just discard the previous
> one...
>
> Trond
>

Uptime of 3 hours and the keyboard is still working fine.
I'll hopefully get to test this on the mini tomorrow, for at least 3 hours as well.

>
> ---------- Forwarded message ----------
> From: Trond Myklebust <[email protected]>
> To:
> Date: Mon, 27 Aug 2007 09:14:56 -0400
> Subject: No Subject
> Doh! We can't use cancel_delayed_work_sync because we may have been called
> from an unmount that was being performed by nfs_automount_task.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>
> fs/nfs/namespace.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
> index aea76d0..acfc56f 100644
> --- a/fs/nfs/namespace.c
> +++ b/fs/nfs/namespace.c
> @@ -176,7 +176,7 @@ static void nfs_expire_automounts(struct work_struct *work)
>  void nfs_release_automount_timer(void)
>  {
>  	if (list_empty(&nfs_automount_list))
> -		cancel_delayed_work_sync(&nfs_automount_task);
> +		cancel_delayed_work(&nfs_automount_task);
>  }
>
>  /*
>
>

2007-08-30 22:19:07

by Bret Towe

Subject: Re: NFS woes again

On 8/28/07, Bret Towe <[email protected]> wrote:
> On 8/28/07, Trond Myklebust <[email protected]> wrote:
> > On Mon, 2007-08-27 at 20:35 -0500, Florin Iucha wrote:
> > > On Mon, Aug 27, 2007 at 06:19:29PM -0700, Bret Towe wrote:
> > > > On 8/27/07, Trond Myklebust <[email protected]> wrote:
> > > > > > > this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> > > > > > > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
> > > > > >
> > > > > > Yes, it certainly does -- all the symptoms match!
> > > > >
> > > > > Could you and Bret please check if the attached patch fixes the hang?
> > > >
> > > > no good for me still hangs after ~30minutes
> > >
> > > I just booted into the new kernel
> > > (3d39c691ff486142dd9aaeac12f553f4476b7a6 + Trond's patch) and it hangs
> > > in 10-15 minutes.
> > >
> > > Process traces available at http://iucha.net/nfs/23-rc2-nfs-fix-1/kernel.log.gz
> > >
> > > Regards,
> > > florin
> >
> > Doh! I see the problem: cancel_delayed_work_sync() shouldn't ever be
> > called recursively.
> >
> > The following patch should be correct. Please just discard the previous
> > one...
> >
> > Trond
> >
>
> uptime of 3 hours and keyboard is still working fine
> I'll hopefully get to test this on the mini tomorrow for at least 3 hours also

Got 45 minutes on the mini before I had to go elsewhere.
The amd64 shut down fine and has been up for more than 3 hours.
I'd say the patch does it.

> >
> > ---------- Forwarded message ----------
> > From: Trond Myklebust <[email protected]>
> > To:
> > Date: Mon, 27 Aug 2007 09:14:56 -0400
> > Subject: No Subject
> > Doh! We can't use cancel_delayed_work_sync because we may have been called
> > from an unmount that was being performed by nfs_automount_task.
> >
> > Signed-off-by: Trond Myklebust <[email protected]>
> > ---
> >
> > fs/nfs/namespace.c | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
> > index aea76d0..acfc56f 100644
> > --- a/fs/nfs/namespace.c
> > +++ b/fs/nfs/namespace.c
> > @@ -176,7 +176,7 @@ static void nfs_expire_automounts(struct work_struct *work)
> >  void nfs_release_automount_timer(void)
> >  {
> >  	if (list_empty(&nfs_automount_list))
> > -		cancel_delayed_work_sync(&nfs_automount_task);
> > +		cancel_delayed_work(&nfs_automount_task);
> >  }
> >
> >  /*
> >
> >
>

2007-08-30 23:15:14

by Florin Iucha

Subject: Re: NFS woes again

On Thu, Aug 30, 2007 at 03:18:37PM -0700, Bret Towe wrote:
> > uptime of 3 hours and keyboard is still working fine
> > I'll hopefully get to test this on the mini tomorrow for at least 3 hours also
>
> got 45min on mini before I had to go elsewhere
> the amd64 shutdown fine and has been up for more than 3 hours
> I'd say the patch does it

Yup. Same here. Many startups, shutdowns and minutes of uptime,
with no problems observed. Check it in!

Thanks,
florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163

