LinuxLists.cc - [PATCH v2 net-next] net: link_watch: prevent starvation when processing linkwatch wq

2019-05-31 09:03:20

Subject: [PATCH v2 net-next] net: link_watch: prevent starvation when processing linkwatch wq

When a user has configured a large number of virtual netdevs, such
as 4K VLANs, a carrier on/off operation on the real netdev also
causes the link state of its virtual netdevs to be processed in
linkwatch. Currently, the processing is done in a work queue,
which may cause CPU and rtnl lock starvation.

This patch releases the CPU and the rtnl lock after the link watch
worker has processed a fixed number of netdev link watch events.

Currently __linkwatch_run_queue is called with rtnl lock, so
enfore it with ASSERT_RTNL();

Signed-off-by: Yunsheng Lin <[email protected]>
---
V2: use cond_resched and rtnl_unlock after processing a fixed
number of events
---
net/core/link_watch.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 7f51efb..07eebfb 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -168,9 +168,18 @@ static void linkwatch_do_dev(struct net_device *dev)

 static void __linkwatch_run_queue(int urgent_only)
 {
+#define MAX_DO_DEV_PER_LOOP	100
+
+	int do_dev = MAX_DO_DEV_PER_LOOP;
 	struct net_device *dev;
 	LIST_HEAD(wrk);
 
+	ASSERT_RTNL();
+
+	/* Give urgent case more budget */
+	if (urgent_only)
+		do_dev += MAX_DO_DEV_PER_LOOP;
+
 	/*
 	 * Limit the number of linkwatch events to one
 	 * per second so that a runaway driver does not
@@ -200,6 +209,14 @@ static void __linkwatch_run_queue(int urgent_only)
 		}
 		spin_unlock_irq(&lweventlist_lock);
 		linkwatch_do_dev(dev);
+
+		if (--do_dev < 0) {
+			rtnl_unlock();
+			cond_resched();
+			do_dev = MAX_DO_DEV_PER_LOOP;
+			rtnl_lock();
+		}
+
 		spin_lock_irq(&lweventlist_lock);
 	}

--
2.8.1
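
The budget arithmetic in the hunk above can be checked outside the
kernel. What follows is a minimal user-space sketch, not kernel code:
count_yields() is a hypothetical helper, the lock and reschedule calls
are reduced to comments, and it assumes every loop iteration processes
one event (events skipped in the urgent-only pass are not modelled).

```c
#include <assert.h>

#define MAX_DO_DEV_PER_LOOP 100

/*
 * Count how often a loop shaped like the one in the patch would drop
 * the lock and offer up the CPU while processing nr_events events.
 * Hypothetical helper for illustration only.
 */
static int count_yields(int nr_events, int urgent_only)
{
	int do_dev = MAX_DO_DEV_PER_LOOP;
	int yields = 0;
	int i;

	assert(nr_events >= 0);

	/* Give urgent case more budget, as in the patch */
	if (urgent_only)
		do_dev += MAX_DO_DEV_PER_LOOP;

	for (i = 0; i < nr_events; i++) {
		/* linkwatch_do_dev(dev) would run here */

		if (--do_dev < 0) {
			/* patch: rtnl_unlock(); cond_resched(); rtnl_lock(); */
			do_dev = MAX_DO_DEV_PER_LOOP;
			yields++;
		}
	}
	return yields;
}
```

Under these assumptions, count_yields(4096, 0) returns 40 and
count_yields(4096, 1) returns 39: the test `--do_dev < 0` only fires
on the 101st event of a batch (the 201st for the first urgent batch),
so a budget of 100 lets 101 events through between yields.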

2019-05-31 09:55:50

by Salil Mehta

Subject: RE: [PATCH v2 net-next] net: link_watch: prevent starvation when processing linkwatch wq

> From: [email protected] On Behalf Of Yunsheng Lin
> Sent: Friday, May 31, 2019 10:01 AM
> To: [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; linux-
> [email protected]; Linuxarm <[email protected]>
> Subject: [PATCH v2 net-next] net: link_watch: prevent starvation when
> processing linkwatch wq
>
> When user has configured a large number of virtual netdev, such
> as 4K vlans, the carrier on/off operation of the real netdev
> will also cause it's virtual netdev's link state to be processed
> in linkwatch. Currently, the processing is done in a work queue,
> which may cause cpu and rtnl locking starvation problem.
>
> This patch releases the cpu and rtnl lock when link watch worker
> has processed a fixed number of netdev' link watch event.
>
> Currently __linkwatch_run_queue is called with rtnl lock, so
> enfore it with ASSERT_RTNL();

Typo enfore --> enforce ?

> Signed-off-by: Yunsheng Lin <[email protected]>
> ---
> V2: use cond_resched and rtnl_unlock after processing a fixed
> number of events
> ---
> net/core/link_watch.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> index 7f51efb..07eebfb 100644
> --- a/net/core/link_watch.c
> +++ b/net/core/link_watch.c
> @@ -168,9 +168,18 @@ static void linkwatch_do_dev(struct net_device
> *dev)
>
> static void __linkwatch_run_queue(int urgent_only)
> {
> +#define MAX_DO_DEV_PER_LOOP 100
> +
> + int do_dev = MAX_DO_DEV_PER_LOOP;
> struct net_device *dev;
> LIST_HEAD(wrk);
>
> + ASSERT_RTNL();
> +
> + /* Give urgent case more budget */
> + if (urgent_only)
> + do_dev += MAX_DO_DEV_PER_LOOP;
> +
> /*
> * Limit the number of linkwatch events to one
> * per second so that a runaway driver does not
> @@ -200,6 +209,14 @@ static void __linkwatch_run_queue(int urgent_only)
> }
> spin_unlock_irq(&lweventlist_lock);
> linkwatch_do_dev(dev);
> +

A comment like the one below would help explain the reason for this code.

/* This function is called with rtnl_lock held. If the watch list
 * holds an excessive number of events, processing them could
 * monopolize rtnl_lock and starve other modules that want to
 * acquire it. Hence, a co-operative scheme like the one below might
 * help mitigate the problem. It also tries to be fair CPU-wise
 * through conditional rescheduling.
 */

> + if (--do_dev < 0) {
> + rtnl_unlock();
> + cond_resched();
> + do_dev = MAX_DO_DEV_PER_LOOP;
> + rtnl_lock();
> + }
> +
> spin_lock_irq(&lweventlist_lock);
> }

2019-05-31 11:20:05

by Salil Mehta

Subject: RE: [PATCH v2 net-next] net: link_watch: prevent starvation when processing linkwatch wq

> From: [email protected] [mailto:netdev-
> [email protected]] On Behalf Of Yunsheng Lin
> Sent: Friday, May 31, 2019 10:01 AM
> To: [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; linux-
> [email protected]; Linuxarm <[email protected]>
> Subject: [PATCH v2 net-next] net: link_watch: prevent starvation when
> processing linkwatch wq
>
> When user has configured a large number of virtual netdev, such
> as 4K vlans, the carrier on/off operation of the real netdev
> will also cause it's virtual netdev's link state to be processed
> in linkwatch. Currently, the processing is done in a work queue,
> which may cause cpu and rtnl locking starvation problem.
>
> This patch releases the cpu and rtnl lock when link watch worker
> has processed a fixed number of netdev' link watch event.
>
> Currently __linkwatch_run_queue is called with rtnl lock, so
> enfore it with ASSERT_RTNL();
>
> Signed-off-by: Yunsheng Lin <[email protected]>
> ---
> V2: use cond_resched and rtnl_unlock after processing a fixed
> number of events
> ---
> net/core/link_watch.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> index 7f51efb..07eebfb 100644
> --- a/net/core/link_watch.c
> +++ b/net/core/link_watch.c
> @@ -168,9 +168,18 @@ static void linkwatch_do_dev(struct net_device
> *dev)
>
> static void __linkwatch_run_queue(int urgent_only)
> {
> +#define MAX_DO_DEV_PER_LOOP 100
> +
> + int do_dev = MAX_DO_DEV_PER_LOOP;
> struct net_device *dev;
> LIST_HEAD(wrk);
>
> + ASSERT_RTNL();
> +
> + /* Give urgent case more budget */
> + if (urgent_only)
> + do_dev += MAX_DO_DEV_PER_LOOP;
> +
> /*
> * Limit the number of linkwatch events to one
> * per second so that a runaway driver does not
> @@ -200,6 +209,14 @@ static void __linkwatch_run_queue(int urgent_only)
> }
> spin_unlock_irq(&lweventlist_lock);
> linkwatch_do_dev(dev);
> +
> + if (--do_dev < 0) {
> + rtnl_unlock();
> + cond_resched();

Sorry, I missed this in my earlier comment. I can see multiple problems
here; please correct me if I am wrong:

1. Releasing the rtnl_lock here and then rescheduling might not be
safe, especially if you already hold *lweventlist_lock* (which is
global and not per-netdev) at the point where you reschedule.
This can cause a *deadlock* with itself.

Reason: once you release the rtnl_lock(), a similar code path in
netdev_wait_allrefs() could run for some other netdevice, which
might end up waiting for the same global linkwatch event list lock,
i.e. *lweventlist_lock*.

2. After releasing the rtnl_lock() we have not ensured that all the RCU
operations are complete. Perhaps we need to call rcu_barrier() before
retaking the rtnl_lock().

> + do_dev = MAX_DO_DEV_PER_LOOP;

I think an rcu_barrier() call belongs here.

> + rtnl_lock();
> + }
> +
> spin_lock_irq(&lweventlist_lock);
> }

2019-06-03 01:22:31

by Yunsheng Lin

Subject: Re: [PATCH v2 net-next] net: link_watch: prevent starvation when processing linkwatch wq

On 2019/5/31 17:54, Salil Mehta wrote:
>> From: [email protected] On Behalf Of Yunsheng Lin
>> Sent: Friday, May 31, 2019 10:01 AM
>> To: [email protected]
>> Cc: [email protected]; [email protected];
>> [email protected]; [email protected]; linux-
>> [email protected]; Linuxarm <[email protected]>
>> Subject: [PATCH v2 net-next] net: link_watch: prevent starvation when
>> processing linkwatch wq
>>
>> When user has configured a large number of virtual netdev, such
>> as 4K vlans, the carrier on/off operation of the real netdev
>> will also cause it's virtual netdev's link state to be processed
>> in linkwatch. Currently, the processing is done in a work queue,
>> which may cause cpu and rtnl locking starvation problem.
>>
>> This patch releases the cpu and rtnl lock when link watch worker
>> has processed a fixed number of netdev' link watch event.
>>
>> Currently __linkwatch_run_queue is called with rtnl lock, so
>> enfore it with ASSERT_RTNL();
>
>
> Typo enfore --> enforce ?

My mistake.

thanks.

>
>
>
>> Signed-off-by: Yunsheng Lin <[email protected]>
>> ---
>> V2: use cond_resched and rtnl_unlock after processing a fixed
>> number of events
>> ---
>> net/core/link_watch.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
>> index 7f51efb..07eebfb 100644
>> --- a/net/core/link_watch.c
>> +++ b/net/core/link_watch.c
>> @@ -168,9 +168,18 @@ static void linkwatch_do_dev(struct net_device
>> *dev)
>>
>> static void __linkwatch_run_queue(int urgent_only)
>> {
>> +#define MAX_DO_DEV_PER_LOOP 100
>> +
>> + int do_dev = MAX_DO_DEV_PER_LOOP;
>> struct net_device *dev;
>> LIST_HEAD(wrk);
>>
>> + ASSERT_RTNL();
>> +
>> + /* Give urgent case more budget */
>> + if (urgent_only)
>> + do_dev += MAX_DO_DEV_PER_LOOP;
>> +
>> /*
>> * Limit the number of linkwatch events to one
>> * per second so that a runaway driver does not
>> @@ -200,6 +209,14 @@ static void __linkwatch_run_queue(int urgent_only)
>> }
>> spin_unlock_irq(&lweventlist_lock);
>> linkwatch_do_dev(dev);
>> +
>
>
> A comment like below would be helpful in explaining the reason of the code.
>
> /* This function is called with rtnl_lock held. If excessive events
> * are present as part of the watch list, their processing could
> * monopolize the rtnl_lock and which could lead to starvation in
> * other modules which want to acquire this lock. Hence, co-operative
> * scheme like below might be helpful in mitigating the problem.
> * This also tries to be fair CPU wise by conditional rescheduling.
> */

Yes, thanks for the helpful comment.

>
>
>> + if (--do_dev < 0) {
>> + rtnl_unlock();
>> + cond_resched();
>> + do_dev = MAX_DO_DEV_PER_LOOP;
>> + rtnl_lock();
>> + }
>> +
>> spin_lock_irq(&lweventlist_lock);
>> }
>
> .
>

2019-06-03 02:14:27

by Yunsheng Lin

Subject: Re: [PATCH v2 net-next] net: link_watch: prevent starvation when processing linkwatch wq

On 2019/5/31 19:17, Salil Mehta wrote:
>> From: [email protected] [mailto:netdev-
>> [email protected]] On Behalf Of Yunsheng Lin
>> Sent: Friday, May 31, 2019 10:01 AM
>> To: [email protected]
>> Cc: [email protected]; [email protected];
>> [email protected]; [email protected]; linux-
>> [email protected]; Linuxarm <[email protected]>
>> Subject: [PATCH v2 net-next] net: link_watch: prevent starvation when
>> processing linkwatch wq
>>
>> When user has configured a large number of virtual netdev, such
>> as 4K vlans, the carrier on/off operation of the real netdev
>> will also cause it's virtual netdev's link state to be processed
>> in linkwatch. Currently, the processing is done in a work queue,
>> which may cause cpu and rtnl locking starvation problem.
>>
>> This patch releases the cpu and rtnl lock when link watch worker
>> has processed a fixed number of netdev' link watch event.
>>
>> Currently __linkwatch_run_queue is called with rtnl lock, so
>> enfore it with ASSERT_RTNL();
>>
>> Signed-off-by: Yunsheng Lin <[email protected]>
>> ---
>> V2: use cond_resched and rtnl_unlock after processing a fixed
>> number of events
>> ---
>> net/core/link_watch.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
>> index 7f51efb..07eebfb 100644
>> --- a/net/core/link_watch.c
>> +++ b/net/core/link_watch.c
>> @@ -168,9 +168,18 @@ static void linkwatch_do_dev(struct net_device
>> *dev)
>>
>> static void __linkwatch_run_queue(int urgent_only)
>> {
>> +#define MAX_DO_DEV_PER_LOOP 100
>> +
>> + int do_dev = MAX_DO_DEV_PER_LOOP;
>> struct net_device *dev;
>> LIST_HEAD(wrk);
>>
>> + ASSERT_RTNL();
>> +
>> + /* Give urgent case more budget */
>> + if (urgent_only)
>> + do_dev += MAX_DO_DEV_PER_LOOP;
>> +
>> /*
>> * Limit the number of linkwatch events to one
>> * per second so that a runaway driver does not
>> @@ -200,6 +209,14 @@ static void __linkwatch_run_queue(int urgent_only)
>> }
>> spin_unlock_irq(&lweventlist_lock);
>> linkwatch_do_dev(dev);
>> +
>> + if (--do_dev < 0) {
>> + rtnl_unlock();
>> + cond_resched();
>
>
>
> Sorry, missed in my earlier comment. I could see multiple problems here
> and please correct me if I am wrong:
>
> 1. It looks like releasing the rtnl_lock here and then res-scheduling might
> not be safe, especially when you have already held *lweventlist_lock*
> (which is global and not per-netdev), and when you are trying to
> reschedule. This can cause *deadlock* with itself.
>
> Reason: once you release the rtnl_lock() the similar leg of function
> netdev_wait_allrefs() could be called for some other netdevice which
> might end up in waiting for same global linkwatch event list lock
> i.e. *lweventlist_lock*.

lweventlist_lock has already been released, by the spin_unlock_irq()
just before linkwatch_do_dev(), by the time the rtnl_lock is released
and cond_resched() is called.
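
That ordering can be illustrated with a minimal user-space sketch. It
is an assumption-laden analogy, not the kernel code: pthread mutexes
stand in for both the rtnl lock and lweventlist_lock, sched_yield()
stands in for cond_resched(), and every name below (outer_lock,
list_lock, queue_event, drain) is hypothetical. The trylock probe
records whether the list lock is free at the point where the outer
lock is dropped.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

struct event {
	struct event *next;
};

/* Stand-ins only: outer_lock plays rtnl, list_lock plays lweventlist_lock. */
static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct event *pending;

static int yields;             /* times the outer lock was dropped */
static int list_free_at_yield; /* times list_lock was free at that point */

static void queue_event(struct event *ev)
{
	pthread_mutex_lock(&list_lock);
	ev->next = pending;
	pending = ev;
	pthread_mutex_unlock(&list_lock);
}

/* Drain all pending events; returns the number handled. */
static int drain(int budget)
{
	int left = budget;
	int handled = 0;

	pthread_mutex_lock(&outer_lock);
	pthread_mutex_lock(&list_lock);
	while (pending) {
		struct event *ev = pending;

		pending = ev->next;
		ev->next = NULL;
		/* The list lock is dropped before the event is handled. */
		pthread_mutex_unlock(&list_lock);

		handled++;	/* per-event work would go here */

		if (--left < 0) {
			/* Probe: trylock succeeds only if list_lock is not held. */
			if (pthread_mutex_trylock(&list_lock) == 0) {
				list_free_at_yield++;
				pthread_mutex_unlock(&list_lock);
			}
			pthread_mutex_unlock(&outer_lock);
			sched_yield();
			left = budget;
			pthread_mutex_lock(&outer_lock);
			yields++;
		}

		pthread_mutex_lock(&list_lock);
	}
	pthread_mutex_unlock(&list_lock);
	pthread_mutex_unlock(&outer_lock);
	return handled;
}
```

In this sketch the list lock is only held while the list itself is
touched, so another thread calling queue_event() during the yield
window would not block on it for long. In the kernel the inner lock is
a spinlock taken with interrupts disabled, which must not be held
across a reschedule at all.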

>
> 2. After releasing the rtnl_lock() we have not ensured that all the rcu
> operations are complete. Perhaps we need to take rcu_barrier() before
> retaking the rtnl_lock()

Why do we need to ensure all the RCU operations are complete here?

>
>
>
>
>> + do_dev = MAX_DO_DEV_PER_LOOP;
>
>
>
> Here, I think rcu_barrier() should exist.

In netdev_wait_allrefs, rcu_barrier is indeed called between
__rtnl_unlock and rtnl_lock. It was added by commit
0115e8e30d6f ("net: remove delay at device dismantle") and
seems to work together with NETDEV_UNREGISTER_FINAL.

NETDEV_UNREGISTER_FINAL was later removed by commit
070f2d7e264a ("net: Drop NETDEV_UNREGISTER_FINAL"), whose
changelog comments on whether the rcu_barrier is still needed:

"dev_change_net_namespace() and netdev_wait_allrefs()
have rcu_barrier() before NETDEV_UNREGISTER_FINAL call,
and the source commits say they were introduced to
delemit the call with NETDEV_UNREGISTER, but this patch
leaves them on the places, since they require additional
analysis, whether we need in them for something else."

So the reason for calling rcu_barrier in netdev_wait_allrefs
is unclear now.

Also, the rcu_barrier in netdev_wait_allrefs was added to fix
the device dismantle problem, so it may not be needed for
linkwatch.

>
>
>
>> + rtnl_lock();
>> + }
>> +
>> spin_lock_irq(&lweventlist_lock);
>> }
>
>
> .
>