2008-02-17 10:55:17

by Zdenek Kabelac

[permalink] [raw]
Subject: My system stops during startup with curretn git tree of 2.6.25-rc2

Hi

It looks like there is something weird as my systems stops when the
swap is mounted.
I've played bisection game and this is the commit which makes the
system unusable:

# bad: [45b503548210fe6f23e92b856421c2a3f05fd034] [RTNETLINK]: Send a
single notification on device state changes.
git-bisect bad 45b503548210fe6f23e92b856421c2a3f05fd034

I've tried to reverse this commit - and it has compiled & worked.

Zdenek


2008-02-17 12:05:28

by Jiri Kosina

[permalink] [raw]
Subject: Re: My system stops during startup with curretn git tree of 2.6.25-rc2

On Sun, 17 Feb 2008, Zdenek Kabelac wrote:

> It looks like there is something weird as my systems stops when the swap
> is mounted. I've played bisection game and this is the commit which
> makes the system unusable:
> # bad: [45b503548210fe6f23e92b856421c2a3f05fd034] [RTNETLINK]: Send a
> single notification on device state changes.
> git-bisect bad 45b503548210fe6f23e92b856421c2a3f05fd034
> I've tried to reverse this commit - and it has compiled & worked.

This commit is completely broken (it, for example, breaks locking around
dev->link_mode), as has been already mentioned by Rafael at
http://lkml.org/lkml/2008/2/15/542

Dave, do you have a proper fix queued? Otherwise I would propose just to
revert it completely from Linus' tree for now.

--
Jiri Kosina

2008-02-18 02:35:52

by David Miller

[permalink] [raw]
Subject: Re: My system stops during startup with curretn git tree of 2.6.25-rc2

From: Jiri Kosina <[email protected]>
Date: Sun, 17 Feb 2008 13:04:59 +0100 (CET)

> On Sun, 17 Feb 2008, Zdenek Kabelac wrote:
>
> > It looks like there is something weird as my systems stops when the swap
> > is mounted. I've played bisection game and this is the commit which
> > makes the system unusable:
> > # bad: [45b503548210fe6f23e92b856421c2a3f05fd034] [RTNETLINK]: Send a
> > single notification on device state changes.
> > git-bisect bad 45b503548210fe6f23e92b856421c2a3f05fd034
> > I've tried to reverse this commit - and it has compiled & worked.
>
> This commit is completely broken (it, for example, breaks locking around
> dev->link_mode), as has been already mentioned by Rafael at
> http://lkml.org/lkml/2008/2/15/542
>
> Dave, do you have a proper fix queued? Otherwise I would propose just to
> revert it completely from Linus' tree for now.

I just reverted and I'll push that to Linus.

2008-02-18 13:06:52

by Laszlo Attila Toth

[permalink] [raw]
Subject: Re: My system stops during startup with curretn git tree of 2.6.25-rc2

David Miller wrote:
> From: Jiri Kosina <[email protected]>
> Date: Sun, 17 Feb 2008 13:04:59 +0100 (CET)
>
>> On Sun, 17 Feb 2008, Zdenek Kabelac wrote:
>>
>>> It looks like there is something weird as my systems stops when the swap
>>> is mounted. I've played bisection game and this is the commit which
>>> makes the system unusable:
>>> # bad: [45b503548210fe6f23e92b856421c2a3f05fd034] [RTNETLINK]: Send a
>>> single notification on device state changes.
>>> git-bisect bad 45b503548210fe6f23e92b856421c2a3f05fd034
>>> I've tried to reverse this commit - and it has compiled & worked.
>> This commit is completely broken (it, for example, breaks locking around
>> dev->link_mode), as has been already mentioned by Rafael at
>> http://lkml.org/lkml/2008/2/15/542
>>
>> Dave, do you have a proper fix queued? Otherwise I would propose just to
>> revert it completely from Linus' tree for now.
>
> I just reverted and I'll push that to Linus.
>

Okay, but I can't figure out what's the problem with it. I don't have
wireless card on my linux box also I can't test it but everything else
works. Swap is mounted.


The concurrency cannot be a problem because the write operation is
protected by a lock.

My only idea is the notification itself, because the netdev_state_change
function is used which calls rtmsg_ifinfo too.

void netdev_state_change(struct net_device *dev)
{
if (dev->flags & IFF_UP) {
call_netdevice_notifiers(NETDEV_CHANGE, dev);
rtmsg_ifinfo(RTM_NEWLINK, dev, 0);
}
}

Also last lines of my patch could be instead of
if (modified)
netdev_state_change(dev);

the following:
if (modified && dev->flags & IFF_UP) {
call_netdevice_notifiers(NETDEV_CHANGE, dev);
}

Regards,
Attila

2008-02-18 13:44:39

by Jiri Kosina

[permalink] [raw]
Subject: Re: My system stops during startup with curretn git tree of 2.6.25-rc2

On Mon, 18 Feb 2008, Laszlo Attila Toth wrote:

> Okay, but I can't figure out what's the problem with it. I don't have
> wireless card on my linux box also I can't test it but everything else
> works. Swap is mounted. The concurrency cannot be a problem because the
> write operation is protected by a lock.

- write_lock_bh(&dev_base_lock);
- dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
- write_unlock_bh(&dev_base_lock);
+ if (dev->link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
+ write_lock_bh(&dev_base_lock);
+ dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
+ write_lock_bh(&dev_base_lock);
+ modified = 1;
+ }
}

1) you are accessing dev->link_mode and tb[] outside the dev_base_lock
2) there is obvious and immediate deadlock -- you acquire the
dev_base_lock twice, without any unlock, just look at the chunk above
3) even with this deadlock fixed, Rafael states that either NM or
wpa_supplicant (I don't recall from top of my head) still don't work

--
Jiri Kosina
SUSE Labs

2008-02-18 14:13:42

by Laszlo Attila Toth

[permalink] [raw]
Subject: Re: My system stops during startup with curretn git tree of 2.6.25-rc2

Jiri Kosina wrote:
> On Mon, 18 Feb 2008, Laszlo Attila Toth wrote:
>
>> Okay, but I can't figure out what's the problem with it. I don't have
>> wireless card on my linux box also I can't test it but everything else
>> works. Swap is mounted. The concurrency cannot be a problem because the
>> write operation is protected by a lock.
>
> - write_lock_bh(&dev_base_lock);
> - dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
> - write_unlock_bh(&dev_base_lock);
> + if (dev->link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
> + write_lock_bh(&dev_base_lock);
> + dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
> + write_lock_bh(&dev_base_lock);
> + modified = 1;
> + }
> }
>
> 1) you are accessing dev->link_mode and tb[] outside the dev_base_lock

yes, because tb[IFLA_LINKMODE] is not used by someone else in this case
only dev->link_mode. Although its value is unpredictable in case of a
concurrent access in the condition, it does not affect the final value
of dev->link_mode but the length of the critical section remains
minimal. The if statement may be inside the lock.

> 2) there is obvious and immediate deadlock -- you acquire the
> dev_base_lock twice, without any unlock, just look at the chunk above

Indeed:
"Feb 16 16:51:49 sandman kernel: BUG: rwlock recursion on CPU#0,"

I missed it. I copied the code from another patch which didn't contain
the two locking statements and when I copied them back it became a
copy-paste bug.


> 3) even with this deadlock fixed, Rafael states that either NM or
> wpa_supplicant (I don't recall from top of my head) still don't work

That's bad. Does my suggestion solve the problem? Again:

- if (modified)
- netdev_state_change(dev);
+ if (modified && dev->flags & IFF_UP)
+ call_netdevice_notifiers(NETDEV_CHANGE, dev)

Regards,
Attila

2008-02-18 16:42:39

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: My system stops during startup with curretn git tree of 2.6.25-rc2

On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> Jiri Kosina wrote:
> > On Mon, 18 Feb 2008, Laszlo Attila Toth wrote:
> >
> >> Okay, but I can't figure out what's the problem with it. I don't have
> >> wireless card on my linux box also I can't test it but everything else
> >> works. Swap is mounted. The concurrency cannot be a problem because the
> >> write operation is protected by a lock.
> >
> > - write_lock_bh(&dev_base_lock);
> > - dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
> > - write_unlock_bh(&dev_base_lock);
> > + if (dev->link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
> > + write_lock_bh(&dev_base_lock);
> > + dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
> > + write_lock_bh(&dev_base_lock);
> > + modified = 1;
> > + }
> > }
> >
> > 1) you are accessing dev->link_mode and tb[] outside the dev_base_lock
>
> yes, because tb[IFLA_LINKMODE] is not used by someone else in this case
> only dev->link_mode. Although its value is unpredictable in case of a
> concurrent access in the condition, it does not affect the final value
> of dev->link_mode but the length of the critical section remains
> minimal. The if statement may be inside the lock.
>
> > 2) there is obvious and immediate deadlock -- you acquire the
> > dev_base_lock twice, without any unlock, just look at the chunk above
>
> Indeed:
> "Feb 16 16:51:49 sandman kernel: BUG: rwlock recursion on CPU#0,"
>
> I missed it. I copied the code from another patch which didn't contain
> the two locking statements and when I copied them back it became a
> copy-paste bug.
>
>
> > 3) even with this deadlock fixed, Rafael states that either NM or
> > wpa_supplicant (I don't recall from top of my head) still don't work
>
> That's bad. Does my suggestion solve the problem? Again:
>
> - if (modified)
> - netdev_state_change(dev);
> + if (modified && dev->flags & IFF_UP)
> + call_netdevice_notifiers(NETDEV_CHANGE, dev)

All in all, I gather you wanted me to test the patch below. :-)

Yes, that helps.

Thanks,
Rafael

---
Fix net/core/rtnetlink.c breakage caused by commit
45b503548210fe6f23e92b856421c2a3f05fd034
"[RTNETLINK]: Send a single notification on device state changes."

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
net/core/rtnetlink.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6/net/core/rtnetlink.c
===================================================================
--- linux-2.6.orig/net/core/rtnetlink.c
+++ linux-2.6/net/core/rtnetlink.c
@@ -853,7 +853,7 @@ static int do_setlink(struct net_device
if (dev->link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
write_lock_bh(&dev_base_lock);
dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
- write_lock_bh(&dev_base_lock);
+ write_unlock_bh(&dev_base_lock);
modified = 1;
}
}
@@ -870,8 +870,8 @@ errout:
if (send_addr_notify)
call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);

- if (modified)
- netdev_state_change(dev);
+ if (modified && dev->flags & IFF_UP)
+ call_netdevice_notifiers(NETDEV_CHANGE, dev);

return err;
}

2008-02-18 17:03:58

by Laszlo Attila Toth

[permalink] [raw]
Subject: Re: My system stops during startup with curretn git tree of 2.6.25-rc2

Hello,

Rafael J. Wysocki wrote:
> On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
>
>
> All in all, I gather you wanted me to test the patch below. :-)
>
> Yes, that helps.

Thanks for the testing. Dave already reverted the patch, also I don't
know I should resend it or leave it as is.

Regards,
Attila

> Thanks,
> Rafael
>
> ---
> Fix net/core/rtnetlink.c breakage caused by commit
> 45b503548210fe6f23e92b856421c2a3f05fd034
> "[RTNETLINK]: Send a single notification on device state changes."
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> net/core/rtnetlink.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> Index: linux-2.6/net/core/rtnetlink.c
> ===================================================================
> --- linux-2.6.orig/net/core/rtnetlink.c
> +++ linux-2.6/net/core/rtnetlink.c
> @@ -853,7 +853,7 @@ static int do_setlink(struct net_device
> if (dev->link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
> write_lock_bh(&dev_base_lock);
> dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
> - write_lock_bh(&dev_base_lock);
> + write_unlock_bh(&dev_base_lock);
> modified = 1;
> }
> }
> @@ -870,8 +870,8 @@ errout:
> if (send_addr_notify)
> call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
>
> - if (modified)
> - netdev_state_change(dev);
> + if (modified && dev->flags & IFF_UP)
> + call_netdevice_notifiers(NETDEV_CHANGE, dev);
>
> return err;
> }
>

2008-02-18 17:08:05

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: My system stops during startup with curretn git tree of 2.6.25-rc2

On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> Hello,
>
> Rafael J. Wysocki wrote:
> > On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> >
> >
> > All in all, I gather you wanted me to test the patch below. :-)
> >
> > Yes, that helps.
>
> Thanks for the testing. Dave already reverted the patch, also I don't
> know I should resend it or leave it as is.

Since the patch has been reverted, I think you can fold the fix into it
and send it again for the next merge window (unless it's _really_ urgent).
Of course, if that's fine by Dave. :-)

Thanks,
Rafael

2008-02-19 04:51:12

by David Miller

[permalink] [raw]
Subject: Re: My system stops during startup with curretn git tree of 2.6.25-rc2

From: Laszlo Attila Toth <[email protected]>
Date: Mon, 18 Feb 2008 18:03:47 +0100

> Hello,
>
> Rafael J. Wysocki wrote:
> > On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> >
> >
> > All in all, I gather you wanted me to test the patch below. :-)
> >
> > Yes, that helps.
>
> Thanks for the testing. Dave already reverted the patch, also I don't
> know I should resend it or leave it as is.

After this sits and gets testing and feedback for a few days
you'll need to resubmit the entire original patch with these
fixes added on top.

For now, it's staying reverted.

2008-02-19 10:54:31

by Jens Axboe

[permalink] [raw]
Subject: Re: My system stops during startup with curretn git tree of 2.6.25-rc2

On Mon, Feb 18 2008, David Miller wrote:
> From: Laszlo Attila Toth <[email protected]>
> Date: Mon, 18 Feb 2008 18:03:47 +0100
>
> > Hello,
> >
> > Rafael J. Wysocki wrote:
> > > On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> > >
> > >
> > > All in all, I gather you wanted me to test the patch below. :-)
> > >
> > > Yes, that helps.
> >
> > Thanks for the testing. Dave already reverted the patch, also I don't
> > know I should resend it or leave it as is.
>
> After this sits and gets testing and feedback for a few days
> you'll need to resubmit the entire original patch with these
> fixes added on top.
>
> For now, it's staying reverted.

Can we please get the revert pushed out please? It renders my laptop
unusable and I keep forgetting to revert that pesky change when testing
new kernels.

--
Jens Axboe

2008-02-19 11:11:20

by David Miller

[permalink] [raw]
Subject: Re: My system stops during startup with curretn git tree of 2.6.25-rc2

From: Jens Axboe <[email protected]>
Date: Tue, 19 Feb 2008 11:54:19 +0100

> On Mon, Feb 18 2008, David Miller wrote:
> > From: Laszlo Attila Toth <[email protected]>
> > Date: Mon, 18 Feb 2008 18:03:47 +0100
> >
> > > Hello,
> > >
> > > Rafael J. Wysocki wrote:
> > > > On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> > > >
> > > >
> > > > All in all, I gather you wanted me to test the patch below. :-)
> > > >
> > > > Yes, that helps.
> > >
> > > Thanks for the testing. Dave already reverted the patch, also I don't
> > > know I should resend it or leave it as is.
> >
> > After this sits and gets testing and feedback for a few days
> > you'll need to resubmit the entire original patch with these
> > fixes added on top.
> >
> > For now, it's staying reverted.
>
> Can we please get the revert pushed out please? It renders my laptop
> unusable and I keep forgetting to revert that pesky change when testing
> new kernels.

I sent a push to Linus an hour or so ago, he should pick it up
in the morning.

2008-02-19 11:12:46

by Jens Axboe

[permalink] [raw]
Subject: Re: My system stops during startup with curretn git tree of 2.6.25-rc2

On Tue, Feb 19 2008, David Miller wrote:
> From: Jens Axboe <[email protected]>
> Date: Tue, 19 Feb 2008 11:54:19 +0100
>
> > On Mon, Feb 18 2008, David Miller wrote:
> > > From: Laszlo Attila Toth <[email protected]>
> > > Date: Mon, 18 Feb 2008 18:03:47 +0100
> > >
> > > > Hello,
> > > >
> > > > Rafael J. Wysocki wrote:
> > > > > On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> > > > >
> > > > >
> > > > > All in all, I gather you wanted me to test the patch below. :-)
> > > > >
> > > > > Yes, that helps.
> > > >
> > > > Thanks for the testing. Dave already reverted the patch, also I don't
> > > > know I should resend it or leave it as is.
> > >
> > > After this sits and gets testing and feedback for a few days
> > > you'll need to resubmit the entire original patch with these
> > > fixes added on top.
> > >
> > > For now, it's staying reverted.
> >
> > Can we please get the revert pushed out please? It renders my laptop
> > unusable and I keep forgetting to revert that pesky change when testing
> > new kernels.
>
> I sent a push to Linus an hour or so ago, he should pick it up
> in the morning.

Great, thanks!

--
Jens Axboe