2014-04-16 07:08:56

by Li, ZhenHua

Subject: [PATCH 1/1] net: Add rtnl_lock for netif_device_attach/detach

From: "Li, Zhen-Hua" <[email protected]>

netif_device_attach() and netif_device_detach() both call netif_running().
They should therefore run between rtnl_lock() and rtnl_unlock(), so that the
device state cannot change while they execute.
I checked some NIC drivers: some of them call netif_device_attach/detach
between rtnl_lock/unlock, while others do not.

This patch is trying to find a generic way to fix this for all NIC drivers.

Signed-off-by: Li, Zhen-Hua <[email protected]>
---
net/core/dev.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 5b3042e..795bbc5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2190,10 +2190,19 @@ EXPORT_SYMBOL(__dev_kfree_skb_any);
  */
 void netif_device_detach(struct net_device *dev)
 {
+	/**
+	 * As netif_running is called , rtnl_lock and unlock are needed to
+	 * avoid __LINK_STATE_START bit changes during this function call.
+	 */
+	int need_unlock;
+
+	need_unlock = rtnl_trylock();
 	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
 	    netif_running(dev)) {
 		netif_tx_stop_all_queues(dev);
 	}
+	if (need_unlock)
+		rtnl_unlock();
 }
 EXPORT_SYMBOL(netif_device_detach);

@@ -2205,11 +2214,20 @@ EXPORT_SYMBOL(netif_device_detach);
  */
 void netif_device_attach(struct net_device *dev)
 {
+	/**
+	 * As netif_running is called , rtnl_lock and unlock are needed to
+	 * avoid __LINK_STATE_START bit changes during this function call.
+	 */
+	int need_unlock;
+
+	need_unlock = rtnl_trylock();
 	if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
 	    netif_running(dev)) {
 		netif_tx_wake_all_queues(dev);
 		__netdev_watchdog_up(dev);
 	}
+	if (need_unlock)
+		rtnl_unlock();
 }
 EXPORT_SYMBOL(netif_device_attach);

--
1.7.10.4


2014-04-16 07:38:21

by Veaceslav Falico

Subject: Re: [PATCH 1/1] net: Add rtnl_lock for netif_device_attach/detach

On Wed, Apr 16, 2014 at 03:08:02PM +0800, Li, Zhen-Hua wrote:
>From: "Li, Zhen-Hua" <[email protected]>
>
>As netif_running is called in netif_device_attach/detach. There should be
>rtnl_lock/unlock called, to avoid dev stat change during netif_device_attach
>and detach being called.
>I checked NIC some drivers, some of them have netif_device_attach/detach
>called between rtnl_lock/unlock, while some drivers do not.

It can race with any other thread that takes the lock - i.e. suppose you
have a driver that doesn't take the lock and calls netif_device_attach(),
while another thread (completely unrelated to the issue) holds rtnl_lock -
this way the trylock will return false, the thread that took rtnl releases
it - and you'll see the exact same behaviour as without your patch.

I'm not sure about the issue you're trying to fix here - there might be a
better approach which I'm not aware of, however with your approach you
should really either remove the rtnl locking from all drivers that use this
function (and insert a normal rtnl_lock here) or, vice-versa, add it to all
drivers and add an ASSERT_RTNL to netif_device_detach/attach.
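
As a rough illustration of the second option (callers take the lock, the
helper only checks that it is held), here is a user-space sketch. It uses
POSIX threads, not kernel APIs: cfg_lock, cfg_lock_held_by_me(),
struct fake_dev and fake_device_detach() are hypothetical stand-ins for the
RTNL mutex, an ASSERT_RTNL-style check, struct net_device and
netif_device_detach(). The per-thread flag is a choice made for this sketch
and is not how the kernel tracks the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the RTNL mutex. */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;
/* Per-thread flag: true while this thread holds cfg_lock (sketch only). */
static _Thread_local bool cfg_lock_mine;

static void cfg_lock_acquire(void)
{
	pthread_mutex_lock(&cfg_lock);
	cfg_lock_mine = true;
}

static void cfg_lock_release(void)
{
	cfg_lock_mine = false;
	pthread_mutex_unlock(&cfg_lock);
}

/* True only if the calling thread currently holds cfg_lock. */
static bool cfg_lock_held_by_me(void)
{
	return cfg_lock_mine;
}

/* Hypothetical, heavily reduced device state. */
struct fake_dev {
	bool present;
	bool running;
	bool tx_stopped;
};

/*
 * Detach helper that refuses to run unless the caller holds cfg_lock.
 * Returns 0 on success, -1 if the locking contract was violated.
 */
static int fake_device_detach(struct fake_dev *dev)
{
	if (!cfg_lock_held_by_me())
		return -1;

	if (dev->present) {
		dev->present = false;
		if (dev->running)
			dev->tx_stopped = true;
	}
	return 0;
}

/* Usage: the caller, not the helper, owns the locking. */
static void demo_locked_detach(struct fake_dev *dev)
{
	int ret;

	cfg_lock_acquire();
	ret = fake_device_detach(dev);
	cfg_lock_release();
	assert(ret == 0);
}
```

With this shape a driver that forgets the lock is caught on its first call,
instead of silently running unprotected.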


2014-04-16 08:35:34

by Li, ZhenHua

Subject: Re: [PATCH 1/1] net: Add rtnl_lock for netif_device_attach/detach

The problem I am trying to fix is this: when netif_device_attach/detach is
called, it gets a return value from netif_running(), but at that moment the
state of this dev may change in another thread. netif_device_attach does
not know that the state has changed, and this may cause bugs.


I think you are right: this patch cannot fix the race with another thread
that takes the lock. But that is what is happening now (without this
patch). I have not yet found a way to fix it completely.

Another problem is that we only need a lock for this dev, not for all
devs. So how about adding a single lock for each net device?
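
To make the per-device idea concrete, here is a user-space sketch with POSIX
threads. Everything in it is hypothetical: struct demo_dev and the demo_dev_*
helpers are invented for illustration and do not correspond to fields or
functions in struct net_device. It only shows that work on one device no
longer waits for a lock taken on behalf of another.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device with its own lock instead of one global lock. */
struct demo_dev {
	pthread_mutex_t lock;
	bool present;
	bool running;
	bool tx_stopped;
};

static int demo_dev_init(struct demo_dev *dev, bool running)
{
	dev->present = true;
	dev->running = running;
	dev->tx_stopped = false;
	return pthread_mutex_init(&dev->lock, NULL);
}

/* Reads "running" and updates the other flags under the device's own lock. */
static void demo_dev_detach(struct demo_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	if (dev->present) {
		dev->present = false;
		if (dev->running)
			dev->tx_stopped = true;
	}
	pthread_mutex_unlock(&dev->lock);
}

/* Changes "running" under the same lock, so it cannot interleave with detach. */
static void demo_dev_close(struct demo_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->running = false;
	pthread_mutex_unlock(&dev->lock);
}

/* Usage: holding device a's lock does not block work on device b. */
static void demo_two_devices(void)
{
	struct demo_dev a, b;

	assert(demo_dev_init(&a, true) == 0);
	assert(demo_dev_init(&b, true) == 0);

	pthread_mutex_lock(&a.lock);
	demo_dev_detach(&b);
	pthread_mutex_unlock(&a.lock);
	assert(!b.present && b.tx_stopped);

	demo_dev_close(&a);
	demo_dev_detach(&a);
	assert(!a.present && !a.tx_stopped);

	pthread_mutex_destroy(&a.lock);
	pthread_mutex_destroy(&b.lock);
}
```

A sleeping lock like this one could only be taken from contexts that are
allowed to block, which is a constraint the sketch does not address.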

Regards
Zhenhua


2014-04-18 19:01:27

by Sergei Shtylyov

Subject: Re: [PATCH 1/1] net: Add rtnl_lock for netif_device_attach/detach

Hello.

On 04/16/2014 11:08 AM, Li, Zhen-Hua wrote:

> From: "Li, Zhen-Hua" <[email protected]>

> As netif_running is called in netif_device_attach/detach. There should be
> rtnl_lock/unlock called, to avoid dev stat change during netif_device_attach
> and detach being called.
> I checked NIC some drivers, some of them have netif_device_attach/detach
> called between rtnl_lock/unlock, while some drivers do not.

> This patch is tring to find a generic way to fix this for all NIC drivers.

> Signed-off-by: Li, Zhen-Hua <[email protected]>
> ---
> net/core/dev.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5b3042e..795bbc5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2190,10 +2190,19 @@ EXPORT_SYMBOL(__dev_kfree_skb_any);
> */
> void netif_device_detach(struct net_device *dev)
> {
> + /**

Hm, why kernel-doc style comment here?

> + * As netif_running is called , rtnl_lock and unlock are needed to
> + * avoid __LINK_STATE_START bit changes during this function call.
> + */
> + int need_unlock;
> +
> + need_unlock = rtnl_trylock();
> if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
> netif_running(dev)) {
> netif_tx_stop_all_queues(dev);
> }
> + if (need_unlock)
> + rtnl_unlock();
> }
> EXPORT_SYMBOL(netif_device_detach);
>
> @@ -2205,11 +2214,20 @@ EXPORT_SYMBOL(netif_device_detach);
> */
> void netif_device_attach(struct net_device *dev)
> {
> + /**

... and here?

> + * As netif_running is called , rtnl_lock and unlock are needed to
> + * avoid __LINK_STATE_START bit changes during this function call.
> + */

WBR, Sergei

2014-04-21 06:31:45

by Li, ZhenHua

Subject: Re: [PATCH 1/1] net: Add rtnl_lock for netif_device_attach/detach

The comment is trying to explain why a lock is added here.


2014-04-21 11:54:50

by Sergei Shtylyov

Subject: Re: [PATCH 1/1] net: Add rtnl_lock for netif_device_attach/detach

Hello.

On 21-04-2014 10:30, Li, ZhenHua wrote:

> The comment is trying to explain why add a lock here.

I can read, thanks. :-)
I was wondering about the kernel-doc comment style you've used; AFAIK,
it's only good for documenting functions and data structures. The normal
multi-line comment style in the networking code is this:

/* bla
* bla
*/

>>> From: "Li, Zhen-Hua" <[email protected]>

>>> As netif_running is called in netif_device_attach/detach. There should be
>>> rtnl_lock/unlock called, to avoid dev stat change during netif_device_attach
>>> and detach being called.
>>> I checked NIC some drivers, some of them have netif_device_attach/detach
>>> called between rtnl_lock/unlock, while some drivers do not.

>>> This patch is tring to find a generic way to fix this for all NIC drivers.

>>> Signed-off-by: Li, Zhen-Hua <[email protected]>
>>> ---
>>> net/core/dev.c | 18 ++++++++++++++++++
>>> 1 file changed, 18 insertions(+)

>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 5b3042e..795bbc5 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -2190,10 +2190,19 @@ EXPORT_SYMBOL(__dev_kfree_skb_any);
>>> */
>>> void netif_device_detach(struct net_device *dev)
>>> {
>>> + /**

>> Hm, why kernel-doc style comment here?

>>> + * As netif_running is called , rtnl_lock and unlock are needed to

Space before comma not needed.

>>> + * avoid __LINK_STATE_START bit changes during this function call.
>>> + */
>>> + int need_unlock;
>>> +
>>> + need_unlock = rtnl_trylock();
>>> if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
>>> netif_running(dev)) {
>>> netif_tx_stop_all_queues(dev);
>>> }
>>> + if (need_unlock)
>>> + rtnl_unlock();
>>> }
>>> EXPORT_SYMBOL(netif_device_detach);
>>>
>>> @@ -2205,11 +2214,20 @@ EXPORT_SYMBOL(netif_device_detach);
>>> */
>>> void netif_device_attach(struct net_device *dev)
>>> {
>>> + /**

>> ... and here?

>>> + * As netif_running is called , rtnl_lock and unlock are needed to

Space before comma not needed.

>>> + * avoid __LINK_STATE_START bit changes during this function call.
>>> + */

WBR, Sergei

2014-04-22 17:26:59

by Ben Hutchings

Subject: Re: [PATCH 1/1] net: Add rtnl_lock for netif_device_attach/detach

On Wed, 2014-04-16 at 15:08 +0800, Li, Zhen-Hua wrote:
> From: "Li, Zhen-Hua" <[email protected]>
>
> As netif_running is called in netif_device_attach/detach. There should be
> rtnl_lock/unlock called, to avoid dev stat change during netif_device_attach
> and detach being called.
> I checked NIC some drivers, some of them have netif_device_attach/detach
> called between rtnl_lock/unlock, while some drivers do not.
>
> This patch is tring to find a generic way to fix this for all NIC drivers.

I don't think you can generically use the RTNL lock for this.

> Signed-off-by: Li, Zhen-Hua <[email protected]>
> ---
> net/core/dev.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5b3042e..795bbc5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2190,10 +2190,19 @@ EXPORT_SYMBOL(__dev_kfree_skb_any);
> */
> void netif_device_detach(struct net_device *dev)
> {
> + /**
> + * As netif_running is called , rtnl_lock and unlock are needed to
> + * avoid __LINK_STATE_START bit changes during this function call.
> + */
> + int need_unlock;
> +
> + need_unlock = rtnl_trylock();

It is never correct to use trylock and then continue even if it fails.
I think you're trying to simulate a reentrant mutex but this will fail
if *any* task already has the mutex.

Furthermore it is currently allowed and useful to call these functions
from atomic context (transmit or completion path) where it is not
possible to hold the mutex.
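
The first point can be reproduced in user space. The sketch below uses POSIX
threads rather than kernel primitives; big_lock and attach_with_trylock()
are hypothetical stand-ins for the RTNL mutex and for the patched
netif_device_attach(). An unrelated thread holds the lock, the attach path
fails its trylock, and then carries on without any protection.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the RTNL mutex. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Same shape as the patch: try the lock, then carry on regardless. */
static void *attach_with_trylock(void *arg)
{
	bool *ran_locked = arg;
	bool need_unlock = (pthread_mutex_trylock(&big_lock) == 0);

	/* The "protected" work would run here whether or not we own the lock. */
	*ran_locked = need_unlock;

	if (need_unlock)
		pthread_mutex_unlock(&big_lock);
	return NULL;
}

/*
 * Hold big_lock in the calling thread (the unrelated holder) and run the
 * attach path in a second thread.
 * Returns 1 if the attach path ran with the lock, 0 if it ran without it,
 * and -1 if the thread could not be created.
 */
static int attach_locked_under_contention(void)
{
	bool ran_locked = true;
	pthread_t worker;
	int err;

	pthread_mutex_lock(&big_lock);
	err = pthread_create(&worker, NULL, attach_with_trylock, &ran_locked);
	if (err == 0)
		pthread_join(worker, NULL);
	pthread_mutex_unlock(&big_lock);

	if (err != 0)
		return -1;
	return ran_locked ? 1 : 0;
}

/* Usage: uncontended, the trylock succeeds; contended, it does not. */
static void demo_trylock(void)
{
	bool ran_locked = false;

	attach_with_trylock(&ran_locked);
	assert(ran_locked);
	assert(attach_locked_under_contention() != 1);
}
```
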

> if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
> netif_running(dev)) {
> netif_tx_stop_all_queues(dev);
> }
> + if (need_unlock)
> + rtnl_unlock();
> }
> EXPORT_SYMBOL(netif_device_detach);

For netif_device_detach(), I wonder whether it is necessary to check
netif_running(). What are we trying to avoid?

> @@ -2205,11 +2214,20 @@ EXPORT_SYMBOL(netif_device_detach);
> */
> void netif_device_attach(struct net_device *dev)
> {
> + /**
> + * As netif_running is called , rtnl_lock and unlock are needed to
> + * avoid __LINK_STATE_START bit changes during this function call.
> + */
> + int need_unlock;
> +
> + need_unlock = rtnl_trylock();
> if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
> netif_running(dev)) {
> netif_tx_wake_all_queues(dev);
> __netdev_watchdog_up(dev);
> }
> + if (need_unlock)
> + rtnl_unlock();
> }
> EXPORT_SYMBOL(netif_device_attach);

I do see a problem if netif_device_detach() races with
dev_deactivate_many(), which is being mitigated but not avoided by the
test of netif_running().

I think a proper solution is going to involve changing
dev_deactivate_many() as well, removing the use of netif_running(), and
possibly using cmpxchg() to atomically manipulate multiple bits of
dev->state.
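
A user-space sketch of what such a compare-and-exchange loop could look like,
written with C11 atomics instead of the kernel's cmpxchg(). The ST_* masks
and state_detach() are assumptions made for this example; in particular
ST_TX_OFF is invented here to have a second bit that must change together
with the first, and is not a real dev->state flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit masks packed into one state word. */
#define ST_PRESENT	(1UL << 0)
#define ST_START	(1UL << 1)
#define ST_TX_OFF	(1UL << 2)	/* invented for this sketch */

/*
 * Clear ST_PRESENT and, if ST_START was set in the same snapshot, set
 * ST_TX_OFF, all in one atomic update. No separate "is it running?" test
 * can go stale between the check and the write.
 * Returns true if this call performed the present -> absent transition.
 */
static bool state_detach(_Atomic unsigned long *state)
{
	unsigned long old = atomic_load(state);
	unsigned long next;

	do {
		if (!(old & ST_PRESENT))
			return false;	/* already detached */
		next = old & ~ST_PRESENT;
		if (old & ST_START)
			next |= ST_TX_OFF;
		/* On failure, old is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(state, &old, next));

	return true;
}

/* Usage: the first call wins the transition, the second sees it is done. */
static void demo_state_detach(void)
{
	_Atomic unsigned long state = ST_PRESENT | ST_START;

	assert(state_detach(&state));
	assert(atomic_load(&state) == (ST_START | ST_TX_OFF));
	assert(!state_detach(&state));
}
```

Because the decision and the write are based on the same snapshot, a
concurrent change to the start bit forces a retry rather than being missed.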

Ben.

--
Ben Hutchings
Beware of programmers who carry screwdrivers. - Leonard Brandwein

