2011-04-28 22:36:38

by Kalle Valo

Subject: A race in register_netdevice()

Hi,

there seems to be a race in register_netdevice(), which is reported here:

https://bugzilla.kernel.org/show_bug.cgi?id=15606

This is visible at least with flimflam and ath6kl. Userspace receives
the udev add event before the device can be looked up by index, so
SIOCGIFNAME fails:

Apr 29 00:21:35 roska flimflamd[2598]: src/udev.c:add_net_device()
Apr 29 00:21:35 roska flimflamd[2598]: connman_inet_ifname: SIOCGIFNAME(index 4): No such device
Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() buf 0xbfefda3c len 1004
Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() NEWLINK len 1004 type 16 flags 0x0000 seq 0

(ignore the 10 s delay, I added that to reproduce the issue easily)
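
For reference, the failing call in the log is an index-to-name lookup.
A minimal userspace sketch of such a lookup could look like the
following. This is not flimflam's actual code, and the helper name
ifname_from_index() is made up for illustration:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Resolve an interface index to its name with SIOCGIFNAME.
 * Returns 0 on success or a negative errno value on failure.
 * (Hypothetical helper, for illustration only.)
 */
int ifname_from_index(int ifindex, char name[IFNAMSIZ])
{
	struct ifreq ifr;
	int fd, err = 0;

	assert(name != NULL);

	/* Any socket works as a handle for interface ioctls. */
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;

	memset(&ifr, 0, sizeof(ifr));
	ifr.ifr_ifindex = ifindex;

	if (ioctl(fd, SIOCGIFNAME, &ifr) < 0) {
		/* ENODEV if no listed device has this index. */
		err = -errno;
	} else {
		memcpy(name, ifr.ifr_name, IFNAMSIZ);
		name[IFNAMSIZ - 1] = '\0';
	}

	close(fd);
	return err;
}
```

If a lookup like this runs after the udev add event but before the
device is on the list, the ioctl fails with ENODEV, which matches the
"No such device" line in the log.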

There are two ways to fix this. The first is to move the kobject
registration after the call to list_netdevice():

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5425,11 +5425,6 @@ int register_netdevice(struct net_device *dev)
 	if (ret)
 		goto err_uninit;

-	ret = netdev_register_kobject(dev);
-	if (ret)
-		goto err_uninit;
-	dev->reg_state = NETREG_REGISTERED;
-
 	netdev_update_features(dev);

 	/*
@@ -5443,6 +5438,11 @@ int register_netdevice(struct net_device *dev)
 	dev_hold(dev);
 	list_netdevice(dev);

+	ret = netdev_register_kobject(dev);
+	if (ret)
+		goto err_uninit;
+	dev->reg_state = NETREG_REGISTERED;
+
 	/* Notify protocols, that a new device appeared. */
 	ret = call_netdevice_notifiers(NETDEV_REGISTER, dev);
 	ret = notifier_to_errno(ret);

The other option, noticed by Jouni Malinen, is to take rtnl for
SIOCGIFNAME. For some reason it's currently unprotected:

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4917,8 +4917,12 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 		rtnl_unlock();
 		return ret;
 	}
-	if (cmd == SIOCGIFNAME)
-		return dev_ifname(net, (struct ifreq __user *)arg);
+	if (cmd == SIOCGIFNAME) {
+		rtnl_lock();
+		ret = dev_ifname(net, (struct ifreq __user *)arg);
+		rtnl_unlock();
+		return ret;
+	}

 	if (copy_from_user(&ifr, arg, sizeof(struct ifreq)))
 		return -EFAULT;

I have confirmed that both of these patches fix the issue. Now I'm
wondering which one is the better way forward. Or is there a better
way to fix this?

--
Kalle Valo


2011-04-29 17:20:43

by Kalle Valo

Subject: Re: A race in register_netdevice()

Stephen Hemminger <[email protected]> writes:

> On Fri, 29 Apr 2011 01:36:37 +0300
>
>> there seems to be a race in register_netdevice(), which is reported here:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=15606
>>
>> This is visible at least with flimflam and ath6kl. Basically what
>> happens is this:
>>
>> Apr 29 00:21:35 roska flimflamd[2598]: src/udev.c:add_net_device()
>> Apr 29 00:21:35 roska flimflamd[2598]: connman_inet_ifname: SIOCGIFNAME(index
>> 4): No such device
>> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() buf
>> 0xbfefda3c len 1004
>> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message()
>> NEWLINK len 1004 type 16 flags 0x0000 seq 0

[...]

>> I have confirmed that both of these patches fix the issue. Now I'm
>> wondering which one is the best way forward. Or is there a better way
>> to fix this?
>>
>
> I see no problem with moving this.
> SIOCGIFNAME should not need to hold rtnl.

OK, thanks for the comments. I'll send a proper patch.

--
Kalle Valo

2011-04-28 23:52:42

by Stephen Hemminger

Subject: Re: A race in register_netdevice()

On Fri, 29 Apr 2011 01:36:37 +0300
Kalle Valo <[email protected]> wrote:

> Hi,
>
> there seems to be a race in register_netdevice(), which is reported here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=15606
>
> This is visible at least with flimflam and ath6kl. Basically what
> happens is this:
>
> Apr 29 00:21:35 roska flimflamd[2598]: src/udev.c:add_net_device()
> Apr 29 00:21:35 roska flimflamd[2598]: connman_inet_ifname: SIOCGIFNAME(index
> 4): No such device
> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() buf
> 0xbfefda3c len 1004
> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message()
> NEWLINK len 1004 type 16 flags 0x0000 seq 0
>
> (ignore the 10 s delay, I added that to reproduce the issue easily)
>
> There are two ways to fix this, first is to move kobject registration
> after the call to list_netdevice():
>
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5425,11 +5425,6 @@ int register_netdevice(struct net_device *dev)
> if (ret)
> goto err_uninit;
>
> - ret = netdev_register_kobject(dev);
> - if (ret)
> - goto err_uninit;
> - dev->reg_state = NETREG_REGISTERED;
> -
> netdev_update_features(dev);
>
> /*
> @@ -5443,6 +5438,11 @@ int register_netdevice(struct net_device *dev)
> dev_hold(dev);
> list_netdevice(dev);
>
> + ret = netdev_register_kobject(dev);
> + if (ret)
> + goto err_uninit;
> + dev->reg_state = NETREG_REGISTERED;
> +
> /* Notify protocols, that a new device appeared. */
> ret = call_netdevice_notifiers(NETDEV_REGISTER, dev);
> ret = notifier_to_errno(ret);
>
> Other option, noticed by Jouni Malinen, is to take rtnl for
> SIOCGIFNAME. For some reason it's currently unprotected:
>
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4917,8 +4917,12 @@ int dev_ioctl(struct net *net, unsigned int
> cmd, void __user *arg)
> rtnl_unlock();
> return ret;
> }
> - if (cmd == SIOCGIFNAME)
> - return dev_ifname(net, (struct ifreq __user *)arg);
> + if (cmd == SIOCGIFNAME) {
> + rtnl_lock();
> + ret = dev_ifname(net, (struct ifreq __user *)arg);
> + rtnl_unlock();
> + return ret;
> + }
>
> if (copy_from_user(&ifr, arg, sizeof(struct ifreq)))
> return -EFAULT;
>
> I have confirmed that both of these patches fix the issue. Now I'm
> wondering which one is the best way forward. Or is there a better way
> to fix this?
>

I see no problem with moving this.
SIOCGIFNAME should not need to hold rtnl.


--

2011-05-03 23:42:05

by Stephen Hemminger

Subject: Re: A race in register_netdevice()

On Wed, 04 May 2011 02:18:11 +0300
Kalle Valo <[email protected]> wrote:

> Hi Stephen,
>
> Stephen Hemminger <[email protected]> writes:
>
> > On Fri, 29 Apr 2011 01:36:37 +0300
> > Kalle Valo <[email protected]> wrote:
> >
> >> there seems to be a race in register_netdevice(), which is reported here:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=15606
> >>
> >> This is visible at least with flimflam and ath6kl. Basically what
> >> happens is this:
> >>
> >> Apr 29 00:21:35 roska flimflamd[2598]: src/udev.c:add_net_device()
> >> Apr 29 00:21:35 roska flimflamd[2598]: connman_inet_ifname: SIOCGIFNAME(index
> >> 4): No such device
> >> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() buf
> >> 0xbfefda3c len 1004
> >> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message()
> >> NEWLINK len 1004 type 16 flags 0x0000 seq 0
>
> [...]
>
> >> I have confirmed that both of these patches fix the issue. Now I'm
> >> wondering which one is the best way forward. Or is there a better way
> >> to fix this?
> >>
> >
> > I see no problem with moving this.
> > SIOCGIFNAME should not need to hold rtnl.
>
> I'm having difficulty fixing the race and am exploring other
> options. Is there any particular reason why SIOCGIFNAME should not
> take rtnl?

None really, but the answer given by SIOCGIFNAME is going to race
anyway. I.e., if the ioctl returns a value, the result may have changed
by the time user space sees it.

2011-05-03 23:18:13

by Kalle Valo

Subject: Re: A race in register_netdevice()

Hi Stephen,

Stephen Hemminger <[email protected]> writes:

> On Fri, 29 Apr 2011 01:36:37 +0300
> Kalle Valo <[email protected]> wrote:
>
>> there seems to be a race in register_netdevice(), which is reported here:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=15606
>>
>> This is visible at least with flimflam and ath6kl. Basically what
>> happens is this:
>>
>> Apr 29 00:21:35 roska flimflamd[2598]: src/udev.c:add_net_device()
>> Apr 29 00:21:35 roska flimflamd[2598]: connman_inet_ifname: SIOCGIFNAME(index
>> 4): No such device
>> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() buf
>> 0xbfefda3c len 1004
>> Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message()
>> NEWLINK len 1004 type 16 flags 0x0000 seq 0

[...]

>> I have confirmed that both of these patches fix the issue. Now I'm
>> wondering which one is the best way forward. Or is there a better way
>> to fix this?
>>
>
> I see no problem with moving this.
> SIOCGIFNAME should not need to hold rtnl.

I'm having difficulty fixing the race and am exploring other
options. Is there any particular reason why SIOCGIFNAME should not
take rtnl?

--
Kalle Valo