2003-09-24 16:48:38

by Sergey Vlasov

Subject: [BUG] [PATCH 2.4] ieee1394 locking bug in nodemgr

Hello!

I have found a locking bug in ieee1394 nodemgr_host_thread() (in
Linux 2.4.22; checked at linux.bkbits.net - the code in question did
not change).

The bug manifests itself in a kudzu hang at system startup. kudzu
performs "modprobe ohci1394", then reads /proc/bus/ieee1394/devices,
then calls "modprobe -r ohci1394". At this point the modprobe
process stays in the D state forever (unkillable), and kudzu waits
forever for it to complete. Alt-SysRq-T shows this about the hung
modprobe process:

modprobe D C02C8500 0 1435 1 1690 1080 (NOTLB)
Call Trace: [<dec2a540>] [<c01076c4>] [<dec2a548>] [<dec2a548>] [<c0107810>]
[<dec2a540>] [<dec26e27>] [<dec2a598>] [<dec2a580>] [<dec233b2>] [<dec30291>]
[<dec31d40>] [<c01c245f>] [<dec3060a>] [<dec31d40>] [<c011ac5e>] [<c011a10c>]
[<c0108933>]

Proc; modprobe

>>EIP; c02c8500 <tasklist_lock+0/0> <=====

Trace; dec2a540 <[ieee1394]nodemgr_serialize+0/10>
Trace; c01076c4 <__down+54/a0>
Trace; dec2a548 <[ieee1394]nodemgr_serialize+8/10>
Trace; dec2a548 <[ieee1394]nodemgr_serialize+8/10>
Trace; c0107810 <__down_failed+8/c>
Trace; dec2a540 <[ieee1394]nodemgr_serialize+0/10>
Trace; dec26e27 <[ieee1394].text.lock.nodemgr+b9/d2>
Trace; dec2a598 <[ieee1394]nodemgr_highlevel+18/30>
Trace; dec2a580 <[ieee1394]nodemgr_highlevel+0/30>
Trace; dec233b2 <[ieee1394]highlevel_remove_host+22/40>
Trace; dec30291 <[ieee1394]dummy_max_addr+5bf1/a9c0>
Trace; dec31d40 <[ieee1394]dummy_max_addr+76a0/a9c0>
Trace; c01c245f <pci_unregister_driver+3f/60>
Trace; dec3060a <[ieee1394]dummy_max_addr+5f6a/a9c0>
Trace; dec31d40 <[ieee1394]dummy_max_addr+76a0/a9c0>
Trace; c011ac5e <free_module+1e/b0>
Trace; c011a10c <sys_delete_module+11c/1d0>
Trace; c0108933 <system_call+33/40>

This shows that the modprobe process is waiting for the
nodemgr_serialize mutex, which for some reason was never released.
I have found the point where this mutex might not be released; it is
in nodemgr_host_thread(). When the schedule_timeout(HZ/16) call in
this function is interrupted, the code jumps out of the loop (goto
caught_signal) without releasing the lock. Apparently kudzu
produces just the right timing to trigger this bug.

I have made a simple patch to fix this problem (adding
up(&nodemgr_serialize) before that goto); with this patch the kudzu
hang no longer occurs.
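
For readers without the source at hand, the shape of the bug and of the
fix can be sketched in plain userspace C. This is only a minimal model,
not the actual nodemgr code or the attached patch: struct model_sem,
model_down(), model_up(), host_thread_buggy(), host_thread_fixed() and
lock_is_free_after() are all hypothetical names, and the "interrupted"
flag merely stands in for schedule_timeout(HZ/16) being interrupted.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model of a binary semaphore:
 * count == 1 means free, count == 0 means held. It only tracks the
 * down/up balance; it does not block like the kernel semaphore. */
struct model_sem {
    int count;
};

static void model_down(struct model_sem *s) { s->count--; } /* stands in for down() */
static void model_up(struct model_sem *s)   { s->count++; } /* stands in for up() */

/* Buggy shape: the early exit skips the release. */
static void host_thread_buggy(struct model_sem *lock, bool interrupted)
{
    model_down(lock);
    if (interrupted)
        goto caught_signal;     /* lock is still held here */
    model_up(lock);
caught_signal:
    return;
}

/* Fixed shape: release the lock before taking the early exit. */
static void host_thread_fixed(struct model_sem *lock, bool interrupted)
{
    model_down(lock);
    if (interrupted) {
        model_up(lock);         /* the added release */
        goto caught_signal;
    }
    model_up(lock);
caught_signal:
    return;
}

/* Runs one thread body against a fresh lock and reports whether the
 * lock is free again afterwards. */
static bool lock_is_free_after(void (*body)(struct model_sem *, bool),
                               bool interrupted)
{
    struct model_sem lock = { 1 };
    body(&lock, interrupted);
    return lock.count == 1;
}
```

In this model the buggy variant leaves the lock held only when the
sleep is interrupted, which would match the hang being timing
dependent; anything that later does a down() on the same lock (here,
the module removal path) would then wait forever.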

--
Sergey Vlasov



2003-09-24 17:20:56

by Ben Collins

Subject: Re: [BUG] [PATCH 2.4] ieee1394 locking bug in nodemgr

On Wed, Sep 24, 2003 at 08:48:32PM +0400, Sergey Vlasov wrote:
> Hello!
>
> I have found a locking bug in ieee1394 nodemgr_host_thread() (in
> Linux 2.4.22; checked at linux.bkbits.net - the code in question did
> not change).

Good catch. Patch applied to all branches.


--
Debian - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
WatchGuard - http://www.watchguard.com/