Hi
It looks like there is something weird as my systems stops when the
swap is mounted.
I've played bisection game and this is the commit which makes the
system unusable:
# bad: [45b503548210fe6f23e92b856421c2a3f05fd034] [RTNETLINK]: Send a
single notification on device state changes.
git-bisect bad 45b503548210fe6f23e92b856421c2a3f05fd034
I've tried to reverse this commit - and it has compiled & worked.
Zdenek
On Sun, 17 Feb 2008, Zdenek Kabelac wrote:
> It looks like there is something weird as my systems stops when the swap
> is mounted. I've played bisection game and this is the commit which
> makes the system unusable:
> # bad: [45b503548210fe6f23e92b856421c2a3f05fd034] [RTNETLINK]: Send a
> single notification on device state changes.
> git-bisect bad 45b503548210fe6f23e92b856421c2a3f05fd034
> I've tried to reverse this commit - and it has compiled & worked.
This commit is completely broken (it, for example, breaks locking around
dev->link_mode), as has been already mentioned by Rafael at
http://lkml.org/lkml/2008/2/15/542
Dave, do you have a proper fix queued? Otherwise I would propose just to
revert it completely from Linus' tree for now.
--
Jiri Kosina
From: Jiri Kosina <[email protected]>
Date: Sun, 17 Feb 2008 13:04:59 +0100 (CET)
> On Sun, 17 Feb 2008, Zdenek Kabelac wrote:
>
> > It looks like there is something weird as my systems stops when the swap
> > is mounted. I've played bisection game and this is the commit which
> > makes the system unusable:
> > # bad: [45b503548210fe6f23e92b856421c2a3f05fd034] [RTNETLINK]: Send a
> > single notification on device state changes.
> > git-bisect bad 45b503548210fe6f23e92b856421c2a3f05fd034
> > I've tried to reverse this commit - and it has compiled & worked.
>
> This commit is completely broken (it, for example, breaks locking around
> dev->link_mode), as has been already mentioned by Rafael at
> http://lkml.org/lkml/2008/2/15/542
>
> Dave, do you have a proper fix queued? Otherwise I would propose just to
> revert it completely from Linus' tree for now.
I just reverted and I'll push that to Linus.
David Miller wrote:
> From: Jiri Kosina <[email protected]>
> Date: Sun, 17 Feb 2008 13:04:59 +0100 (CET)
>
>> On Sun, 17 Feb 2008, Zdenek Kabelac wrote:
>>
>>> It looks like there is something weird as my systems stops when the swap
>>> is mounted. I've played bisection game and this is the commit which
>>> makes the system unusable:
>>> # bad: [45b503548210fe6f23e92b856421c2a3f05fd034] [RTNETLINK]: Send a
>>> single notification on device state changes.
>>> git-bisect bad 45b503548210fe6f23e92b856421c2a3f05fd034
>>> I've tried to reverse this commit - and it has compiled & worked.
>> This commit is completely broken (it, for example, breaks locking around
>> dev->link_mode), as has been already mentioned by Rafael at
>> http://lkml.org/lkml/2008/2/15/542
>>
>> Dave, do you have a proper fix queued? Otherwise I would propose just to
>> revert it completely from Linus' tree for now.
>
> I just reverted and I'll push that to Linus.
>
Okay, but I can't figure out what's the problem with it. I don't have
wireless card on my linux box also I can't test it but everything else
works. Swap is mounted.
The concurrency cannot be a problem because the write operation is
protected by a lock.
My only idea is the notification itself, because the netdev_state_change
function is used which calls rtmsg_ifinfo too.
void netdev_state_change(struct net_device *dev)
{
if (dev->flags & IFF_UP) {
call_netdevice_notifiers(NETDEV_CHANGE, dev);
rtmsg_ifinfo(RTM_NEWLINK, dev, 0);
}
}
Also last lines of my patch could be instead of
if (modified)
netdev_state_change(dev);
the following:
if (modified && dev->flags & IFF_UP) {
call_netdevice_notifiers(NETDEV_CHANGE, dev);
}
Regards,
Attila
On Mon, 18 Feb 2008, Laszlo Attila Toth wrote:
> Okay, but I can't figure out what's the problem with it. I don't have
> wireless card on my linux box also I can't test it but everything else
> works. Swap is mounted. The concurrency cannot be a problem because the
> write operation is protected by a lock.
- write_lock_bh(&dev_base_lock);
- dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
- write_unlock_bh(&dev_base_lock);
+ if (dev->link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
+ write_lock_bh(&dev_base_lock);
+ dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
+ write_lock_bh(&dev_base_lock);
+ modified = 1;
+ }
}
1) you are accessing dev->link_mode and tb[] outside the dev_base_lock
2) there is obvious and immediate deadlock -- you acquire the
dev_base_lock twice, without any unlock, just look at the chunk above
3) even with this deadlock fixed, Rafael states that either NM or
wpa_supplicant (I don't recall from top of my head) still don't work
--
Jiri Kosina
SUSE Labs
Jiri Kosina wrote:
> On Mon, 18 Feb 2008, Laszlo Attila Toth wrote:
>
>> Okay, but I can't figure out what's the problem with it. I don't have
>> wireless card on my linux box also I can't test it but everything else
>> works. Swap is mounted. The concurrency cannot be a problem because the
>> write operation is protected by a lock.
>
> - write_lock_bh(&dev_base_lock);
> - dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
> - write_unlock_bh(&dev_base_lock);
> + if (dev->link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
> + write_lock_bh(&dev_base_lock);
> + dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
> + write_lock_bh(&dev_base_lock);
> + modified = 1;
> + }
> }
>
> 1) you are accessing dev->link_mode and tb[] outside the dev_base_lock
yes, because tb[IFLA_LINKMODE] is not used by someone else in this case
only dev->link_mode. Although its value is unpredictable in case of a
concurrent access in the condition, it does not affect the final value
of dev->link_mode but the length of the critical section remains
minimal. The if statement may be inside the lock.
> 2) there is obvious and immediate deadlock -- you acquire the
> dev_base_lock twice, without any unlock, just look at the chunk above
Indeed:
"Feb 16 16:51:49 sandman kernel: BUG: rwlock recursion on CPU#0,"
I missed it. I copied the code from another patch which didn't contain
the two locking statements and when I copied them back it became a
copy-paste bug.
> 3) even with this deadlock fixed, Rafael states that either NM or
> wpa_supplicant (I don't recall from top of my head) still don't work
That's bad. Does my suggestion solve the problem? Again:
- if (modified)
- netdev_state_change(dev);
+ if (modified && dev->flags & IFF_UP)
+ call_netdevice_notifiers(NETDEV_CHANGE, dev)
Regards,
Attila
On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> Jiri Kosina wrote:
> > On Mon, 18 Feb 2008, Laszlo Attila Toth wrote:
> >
> >> Okay, but I can't figure out what's the problem with it. I don't have
> >> wireless card on my linux box also I can't test it but everything else
> >> works. Swap is mounted. The concurrency cannot be a problem because the
> >> write operation is protected by a lock.
> >
> > - write_lock_bh(&dev_base_lock);
> > - dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
> > - write_unlock_bh(&dev_base_lock);
> > + if (dev->link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
> > + write_lock_bh(&dev_base_lock);
> > + dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
> > + write_lock_bh(&dev_base_lock);
> > + modified = 1;
> > + }
> > }
> >
> > 1) you are accessing dev->link_mode and tb[] outside the dev_base_lock
>
> yes, because tb[IFLA_LINKMODE] is not used by someone else in this case
> only dev->link_mode. Although its value is unpredictable in case of a
> concurrent access in the condition, it does not affect the final value
> of dev->link_mode but the length of the critical section remains
> minimal. The if statement may be inside the lock.
>
> > 2) there is obvious and immediate deadlock -- you acquire the
> > dev_base_lock twice, without any unlock, just look at the chunk above
>
> Indeed:
> "Feb 16 16:51:49 sandman kernel: BUG: rwlock recursion on CPU#0,"
>
> I missed it. I copied the code from another patch which didn't contain
> the two locking statements and when I copied them back it became a
> copy-paste bug.
>
>
> > 3) even with this deadlock fixed, Rafael states that either NM or
> > wpa_supplicant (I don't recall from top of my head) still don't work
>
> That's bad. Does my suggestion solve the problem? Again:
>
> - if (modified)
> - netdev_state_change(dev);
> + if (modified && dev->flags & IFF_UP)
> + call_netdevice_notifiers(NETDEV_CHANGE, dev)
All in all, I gather you wanted me to test the patch below. :-)
Yes, that helps.
Thanks,
Rafael
---
Fix net/core/rtnetlink.c breakage caused by commit
45b503548210fe6f23e92b856421c2a3f05fd034
"[RTNETLINK]: Send a single notification on device state changes."
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
net/core/rtnetlink.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
Index: linux-2.6/net/core/rtnetlink.c
===================================================================
--- linux-2.6.orig/net/core/rtnetlink.c
+++ linux-2.6/net/core/rtnetlink.c
@@ -853,7 +853,7 @@ static int do_setlink(struct net_device
if (dev->link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
write_lock_bh(&dev_base_lock);
dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
- write_lock_bh(&dev_base_lock);
+ write_unlock_bh(&dev_base_lock);
modified = 1;
}
}
@@ -870,8 +870,8 @@ errout:
if (send_addr_notify)
call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
- if (modified)
- netdev_state_change(dev);
+ if (modified && dev->flags & IFF_UP)
+ call_netdevice_notifiers(NETDEV_CHANGE, dev);
return err;
}
Hello,
Rafael J. Wysocki wrote:
> On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
>
>
> All in all, I gather you wanted me to test the patch below. :-)
>
> Yes, that helps.
Thanks for the testing. Dave already reverted the patch, also I don't
know I should resend it or leave it as is.
Regards,
Attila
> Thanks,
> Rafael
>
> ---
> Fix net/core/rtnetlink.c breakage caused by commit
> 45b503548210fe6f23e92b856421c2a3f05fd034
> "[RTNETLINK]: Send a single notification on device state changes."
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> net/core/rtnetlink.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> Index: linux-2.6/net/core/rtnetlink.c
> ===================================================================
> --- linux-2.6.orig/net/core/rtnetlink.c
> +++ linux-2.6/net/core/rtnetlink.c
> @@ -853,7 +853,7 @@ static int do_setlink(struct net_device
> if (dev->link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
> write_lock_bh(&dev_base_lock);
> dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
> - write_lock_bh(&dev_base_lock);
> + write_unlock_bh(&dev_base_lock);
> modified = 1;
> }
> }
> @@ -870,8 +870,8 @@ errout:
> if (send_addr_notify)
> call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
>
> - if (modified)
> - netdev_state_change(dev);
> + if (modified && dev->flags & IFF_UP)
> + call_netdevice_notifiers(NETDEV_CHANGE, dev);
>
> return err;
> }
>
On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> Hello,
>
> Rafael J. Wysocki wrote:
> > On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> >
> >
> > All in all, I gather you wanted me to test the patch below. :-)
> >
> > Yes, that helps.
>
> Thanks for the testing. Dave already reverted the patch, also I don't
> know I should resend it or leave it as is.
Since the patch has been reverted, I think you can fold the fix into it
and send it again for the next merge window (unless it's _really_ urgent).
Of course, if that's fine by Dave. :-)
Thanks,
Rafael
From: Laszlo Attila Toth <[email protected]>
Date: Mon, 18 Feb 2008 18:03:47 +0100
> Hello,
>
> Rafael J. Wysocki wrote:
> > On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> >
> >
> > All in all, I gather you wanted me to test the patch below. :-)
> >
> > Yes, that helps.
>
> Thanks for the testing. Dave already reverted the patch, also I don't
> know I should resend it or leave it as is.
After this sits and gets testing and feedback for a few days
you'll need to resubmit the entire original patch with these
fixes added on top.
For now, it's staying reverted.
On Mon, Feb 18 2008, David Miller wrote:
> From: Laszlo Attila Toth <[email protected]>
> Date: Mon, 18 Feb 2008 18:03:47 +0100
>
> > Hello,
> >
> > Rafael J. Wysocki wrote:
> > > On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> > >
> > >
> > > All in all, I gather you wanted me to test the patch below. :-)
> > >
> > > Yes, that helps.
> >
> > Thanks for the testing. Dave already reverted the patch, also I don't
> > know I should resend it or leave it as is.
>
> After this sits and gets testing and feedback for a few days
> you'll need to resubmit the entire original patch with these
> fixes added on top.
>
> For now, it's staying reverted.
Can we please get the revert pushed out please? It renders my laptop
unusable and I keep forgetting to revert that pesky change when testing
new kernels.
--
Jens Axboe
From: Jens Axboe <[email protected]>
Date: Tue, 19 Feb 2008 11:54:19 +0100
> On Mon, Feb 18 2008, David Miller wrote:
> > From: Laszlo Attila Toth <[email protected]>
> > Date: Mon, 18 Feb 2008 18:03:47 +0100
> >
> > > Hello,
> > >
> > > Rafael J. Wysocki wrote:
> > > > On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> > > >
> > > >
> > > > All in all, I gather you wanted me to test the patch below. :-)
> > > >
> > > > Yes, that helps.
> > >
> > > Thanks for the testing. Dave already reverted the patch, also I don't
> > > know I should resend it or leave it as is.
> >
> > After this sits and gets testing and feedback for a few days
> > you'll need to resubmit the entire original patch with these
> > fixes added on top.
> >
> > For now, it's staying reverted.
>
> Can we please get the revert pushed out please? It renders my laptop
> unusable and I keep forgetting to revert that pesky change when testing
> new kernels.
I sent a push to Linus an hour or so ago, he should pick it up
in the morning.
On Tue, Feb 19 2008, David Miller wrote:
> From: Jens Axboe <[email protected]>
> Date: Tue, 19 Feb 2008 11:54:19 +0100
>
> > On Mon, Feb 18 2008, David Miller wrote:
> > > From: Laszlo Attila Toth <[email protected]>
> > > Date: Mon, 18 Feb 2008 18:03:47 +0100
> > >
> > > > Hello,
> > > >
> > > > Rafael J. Wysocki wrote:
> > > > > On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> > > > >
> > > > >
> > > > > All in all, I gather you wanted me to test the patch below. :-)
> > > > >
> > > > > Yes, that helps.
> > > >
> > > > Thanks for the testing. Dave already reverted the patch, also I don't
> > > > know I should resend it or leave it as is.
> > >
> > > After this sits and gets testing and feedback for a few days
> > > you'll need to resubmit the entire original patch with these
> > > fixes added on top.
> > >
> > > For now, it's staying reverted.
> >
> > Can we please get the revert pushed out please? It renders my laptop
> > unusable and I keep forgetting to revert that pesky change when testing
> > new kernels.
>
> I sent a push to Linus an hour or so ago, he should pick it up
> in the morning.
Great, thanks!
--
Jens Axboe