2003-09-06 22:00:59

by Jeremy Fitzhardinge

Subject: 2.6.0-test4-mm6: locking imbalance with rtnl_lock/unlock?

I've been playing with Bryan O'Sullivan's netplug daemon
(http://www.red-bean.com/~bos/). It uses netlink to look at carrier
state changes on network interfaces.

I'm seeing a problem, however: after a while, all ifconfig commands just
block uninterruptibly in __down(). From strace, it seems to be in:

ioctl(4, 0x8915...

which is SIOCGIFADDR. It seems to me the down() is actually the
rtnl_lock() called at net/ipv4/devinet.c:536 in devinet_ioctl. This
happens even when netplugd is no longer running. It looks like someone
isn't releasing the lock.

I'm going over all the uses of rtnl_lock() to see if I can find a
problem, but no sign yet. I wonder if someone might have broken this
recently: I'm running 2.6.0-test4-mm6, but I think Bryan is running an
older kernel (2.6.0-test4?), and hasn't seen any problems.

J


2003-09-06 22:25:02

by Anton Blanchard

Subject: Re: 2.6.0-test4-mm6: locking imbalance with rtnl_lock/unlock?


> which is SIOCGIFADDR. It seems to me the down() is actually the
> rtnl_lock() called at net/ipv4/devinet.c:536 in devinet_ioctl. This
> happens even when netplugd is no longer running. It looks like someone
> isn't releasing the lock.
>
> I'm going over all the uses of rtnl_lock() to see if I can find a
> problem, but no sign yet. I wonder if someone might have broken this
> recently: I'm running 2.6.0-test4-mm6, but I think Bryan is running an
> older kernel (2.6.0-test4?), and hasn't seen any problems.

Yep, I saw this too when updating from test2 to BK from a few days ago.
From memory, the cpu that had the rtnl_lock was stuck in dev_close,
probably netif_poll_disable. I got sidetracked and wasn't able to look
into it.

Anton

2003-09-07 01:46:12

by Andrew Morton

Subject: Re: 2.6.0-test4-mm6: locking imbalance with rtnl_lock/unlock?

Anton Blanchard <[email protected]> wrote:
>
>
> > which is SIOCGIFADDR. It seems to me the down() is actually the
> > rtnl_lock() called at net/ipv4/devinet.c:536 in devinet_ioctl. This
> > happens even when netplugd is no longer running. It looks like someone
> > isn't releasing the lock.
> >
> > I'm going over all the uses of rtnl_lock() to see if I can find a
> > problem, but no sign yet. I wonder if someone might have broken this
> > recently: I'm running 2.6.0-test4-mm6, but I think Bryan is running an
> > older kernel (2.6.0-test4?), and hasn't seen any problems.
>
> Yep, I saw this too when updating from test2 to BK from a few days ago.
> From memory, the cpu that had the rtnl_lock was stuck in dev_close,
> probably netif_poll_disable. I got sidetracked and wasn't able to look
> into it.

If the caller of netif_poll_disable() has a signal pending,
netif_poll_disable() becomes a busy loop, which might be causing a
lockup. Probably not, but it needs to use TASK_UNINTERRUPTIBLE.

I doubt if that explains Jeremy's deadlock though...
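
The busy-loop concern above can be illustrated with a toy userspace model. This is not kernel code: every name below is hypothetical, and the model only assumes what the message states, namely that an interruptible one-tick sleep returns immediately when the caller has a signal pending, while an uninterruptible one sleeps for the full tick.

```c
#include <stdbool.h>

/* Toy model only; none of these names are kernel APIs. */
enum model_state { MODEL_INTERRUPTIBLE, MODEL_UNINTERRUPTIBLE };

struct model {
    bool signal_pending;  /* caller has an undelivered signal */
    long ticks;           /* simulated clock, in timer ticks */
    long release_tick;    /* tick at which the awaited bit is cleared */
    long spins;           /* "sleeps" that returned without sleeping */
};

/*
 * Models a one-tick sleep. An interruptible sleep with a signal pending
 * returns at once, so the simulated clock does not advance.
 */
static void model_sleep_one_tick(struct model *m, enum model_state st)
{
    if (st == MODEL_INTERRUPTIBLE && m->signal_pending) {
        m->spins++;
        return;
    }
    m->ticks++;
}

/*
 * Models the wait loop: keep sleeping until the bit is released.
 * max_iter caps the loop so the busy-loop case terminates in this model.
 * Returns the number of loop iterations executed.
 */
static long model_poll_disable(struct model *m, enum model_state st,
                               long max_iter)
{
    long iter = 0;

    while (m->ticks < m->release_tick && iter < max_iter) {
        model_sleep_one_tick(m, st);
        iter++;
    }
    return iter;
}
```

With signal_pending set, the interruptible variant runs until it hits the iteration cap without the clock ever advancing, while the uninterruptible variant finishes after exactly release_tick iterations. That is the behaviour the suggested switch to TASK_UNINTERRUPTIBLE is meant to restore.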