Hi,
there seems to be a race in register_netdevice(), which is reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=15606
This is visible at least with flimflam and ath6kl. Basically what
happens is this:
Apr 29 00:21:35 roska flimflamd[2598]: src/udev.c:add_net_device()
Apr 29 00:21:35 roska flimflamd[2598]: connman_inet_ifname: SIOCGIFNAME(index
4): No such device
Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() buf
0xbfefda3c len 1004
Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message()
NEWLINK len 1004 type 16 flags 0x0000 seq 0
(ignore the 10 s delay, I added that to reproduce the issue easily)
There are two ways to fix this, first is to move kobject registration
after the call to list_netdevice():
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5425,11 +5425,6 @@ int register_netdevice(struct net_device *dev)
if (ret)
goto err_uninit;
- ret = netdev_register_kobject(dev);
- if (ret)
- goto err_uninit;
- dev->reg_state = NETREG_REGISTERED;
-
netdev_update_features(dev);
/*
@@ -5443,6 +5438,11 @@ int register_netdevice(struct net_device *dev)
dev_hold(dev);
list_netdevice(dev);
+ ret = netdev_register_kobject(dev);
+ if (ret)
+ goto err_uninit;
+ dev->reg_state = NETREG_REGISTERED;
+
/* Notify protocols, that a new device appeared. */
ret = call_netdevice_notifiers(NETDEV_REGISTER, dev);
ret = notifier_to_errno(ret);
Other option, noticed by Jouni Malinen, is to take rtnl for
SIOCGIFNAME. For some reason it's currently unprotected:
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4917,8 +4917,12 @@ int dev_ioctl(struct net *net, unsigned int
cmd, void __user *arg)
rtnl_unlock();
return ret;
}
- if (cmd == SIOCGIFNAME)
- return dev_ifname(net, (struct ifreq __user *)arg);
+ if (cmd == SIOCGIFNAME) {
+ rtnl_lock();
+ ret = dev_ifname(net, (struct ifreq __user
- *)arg);
+ rtnl_unlock();
+ return ret;
+ }
if (copy_from_user(&ifr, arg, sizeof(struct ifreq)))
return -EFAULT;
I have confirmed that both of these patches fix the issue. Now I'm
wondering which one is the best way forward. Or is there a better way
to fix this?
--
Kalle Valo
Stephen Hemminger <[email protected]> writes:
> On Fri, 29 Apr 2011 01:36:37 +0300
>
>> there seems to be a race in register_netdevice(), which is reported here:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=15606
>>
>> This is visible at least with flimflam and ath6kl. Basically what
>> happens is this:
>>
>> Apr 29 00:21:35 roska flimflamd[2598]: src/udev.c:add_net_device()
>> Apr 29 00:21:35 roska flimflamd[2598]: connman_inet_ifname: SIOCGIFNAME(index
>> 4): No such device
>> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() buf
>> 0xbfefda3c len 1004
>> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message()
>> NEWLINK len 1004 type 16 flags 0x0000 seq 0
[...]
>> I have confirmed that both of these patches fix the issue. Now I'm
>> wondering which one is the best way forward. Or is there a better way
>> to fix this?
>>
>
> I see no problem with moving this.
> SIOCGIFNAME should not need to hold rtnl.
Ok, thanks for comments. I'll send a proper patch.
--
Kalle Valo
On Fri, 29 Apr 2011 01:36:37 +0300
Kalle Valo <[email protected]> wrote:
> Hi,
>
> there seems to be a race in register_netdevice(), which is reported here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=15606
>
> This is visible at least with flimflam and ath6kl. Basically what
> happens is this:
>
> Apr 29 00:21:35 roska flimflamd[2598]: src/udev.c:add_net_device()
> Apr 29 00:21:35 roska flimflamd[2598]: connman_inet_ifname: SIOCGIFNAME(index
> 4): No such device
> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() buf
> 0xbfefda3c len 1004
> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message()
> NEWLINK len 1004 type 16 flags 0x0000 seq 0
>
> (ignore the 10 s delay, I added that to reproduce the issue easily)
>
> There are two ways to fix this, first is to move kobject registration
> after the call to list_netdevice():
>
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5425,11 +5425,6 @@ int register_netdevice(struct net_device *dev)
> if (ret)
> goto err_uninit;
>
> - ret = netdev_register_kobject(dev);
> - if (ret)
> - goto err_uninit;
> - dev->reg_state = NETREG_REGISTERED;
> -
> netdev_update_features(dev);
>
> /*
> @@ -5443,6 +5438,11 @@ int register_netdevice(struct net_device *dev)
> dev_hold(dev);
> list_netdevice(dev);
>
> + ret = netdev_register_kobject(dev);
> + if (ret)
> + goto err_uninit;
> + dev->reg_state = NETREG_REGISTERED;
> +
> /* Notify protocols, that a new device appeared. */
> ret = call_netdevice_notifiers(NETDEV_REGISTER, dev);
> ret = notifier_to_errno(ret);
>
> Other option, noticed by Jouni Malinen, is to take rtnl for
> SIOCGIFNAME. For some reason it's currently unprotected:
>
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4917,8 +4917,12 @@ int dev_ioctl(struct net *net, unsigned int
> cmd, void __user *arg)
> rtnl_unlock();
> return ret;
> }
> - if (cmd == SIOCGIFNAME)
> - return dev_ifname(net, (struct ifreq __user *)arg);
> + if (cmd == SIOCGIFNAME) {
> + rtnl_lock();
> + ret = dev_ifname(net, (struct ifreq __user
> - *)arg);
> + rtnl_unlock();
> + return ret;
> + }
>
> if (copy_from_user(&ifr, arg, sizeof(struct ifreq)))
> return -EFAULT;
>
> I have confirmed that both of these patches fix the issue. Now I'm
> wondering which one is the best way forward. Or is there a better way
> to fix this?
>
I see no problem with moving this.
SIOCGIFNAME should not need to hold rtnl.
--
On Wed, 04 May 2011 02:18:11 +0300
Kalle Valo <[email protected]> wrote:
> Hi Stephen,
>
> Stephen Hemminger <[email protected]> writes:
>
> > On Fri, 29 Apr 2011 01:36:37 +0300
> > Kalle Valo <[email protected]> wrote:
> >
> >> there seems to be a race in register_netdevice(), which is reported here:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=15606
> >>
> >> This is visible at least with flimflam and ath6kl. Basically what
> >> happens is this:
> >>
> >> Apr 29 00:21:35 roska flimflamd[2598]: src/udev.c:add_net_device()
> >> Apr 29 00:21:35 roska flimflamd[2598]: connman_inet_ifname: SIOCGIFNAME(index
> >> 4): No such device
> >> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() buf
> >> 0xbfefda3c len 1004
> >> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message()
> >> NEWLINK len 1004 type 16 flags 0x0000 seq 0
>
> [...]
>
> >> I have confirmed that both of these patches fix the issue. Now I'm
> >> wondering which one is the best way forward. Or is there a better way
> >> to fix this?
> >>
> >
> > I see no problem with moving this.
> > SIOCGIFNAME should not need to hold rtnl.
>
> I'm having difficulties of fixing the race and exploring other
> options. Is there any particular issue why SIOCGIFNAME should not take
> rtnl?
None really, but the answer given by SIOCGIFNAME is going to race
anyway. I.e if ioctl returns a value, by the time user space sees it
the result may have changed.
Hi Stephen,
Stephen Hemminger <[email protected]> writes:
> On Fri, 29 Apr 2011 01:36:37 +0300
> Kalle Valo <[email protected]> wrote:
>
>> there seems to be a race in register_netdevice(), which is reported here:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=15606
>>
>> This is visible at least with flimflam and ath6kl. Basically what
>> happens is this:
>>
>> Apr 29 00:21:35 roska flimflamd[2598]: src/udev.c:add_net_device()
>> Apr 29 00:21:35 roska flimflamd[2598]: connman_inet_ifname: SIOCGIFNAME(index
>> 4): No such device
>> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() buf
>> 0xbfefda3c len 1004
>> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message()
>> NEWLINK len 1004 type 16 flags 0x0000 seq 0
[...]
>> I have confirmed that both of these patches fix the issue. Now I'm
>> wondering which one is the best way forward. Or is there a better way
>> to fix this?
>>
>
> I see no problem with moving this.
> SIOCGIFNAME should not need to hold rtnl.
I'm having difficulties of fixing the race and exploring other
options. Is there any particular issue why SIOCGIFNAME should not take
rtnl?
--
Kalle Valo