2019-05-27 01:51:29

by Yunsheng Lin

Subject: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

When a user has configured a large number of virtual netdevs, such
as 4K VLANs, a carrier on/off operation on the real netdev also
causes the link state of its virtual netdevs to be processed in
linkwatch. Currently, the processing is done in a work queue,
which may cause worker starvation for other work queues.

This patch releases the CPU after the link watch worker has
processed a fixed number of netdev link watch events, and
reschedules the work queue when link watch events still remain.

Signed-off-by: Yunsheng Lin <[email protected]>
---
net/core/link_watch.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 7f51efb..06276ff 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -168,9 +168,16 @@ static void linkwatch_do_dev(struct net_device *dev)

static void __linkwatch_run_queue(int urgent_only)
{
+#define MAX_DO_DEV_PER_LOOP 100
+
+ int do_dev = MAX_DO_DEV_PER_LOOP;
struct net_device *dev;
LIST_HEAD(wrk);

+ /* Give urgent case more budget */
+ if (urgent_only)
+ do_dev += MAX_DO_DEV_PER_LOOP;
+
/*
* Limit the number of linkwatch events to one
* per second so that a runaway driver does not
@@ -189,7 +196,7 @@ static void __linkwatch_run_queue(int urgent_only)
spin_lock_irq(&lweventlist_lock);
list_splice_init(&lweventlist, &wrk);

- while (!list_empty(&wrk)) {
+ while (!list_empty(&wrk) && do_dev > 0) {

dev = list_first_entry(&wrk, struct net_device, link_watch_list);
list_del_init(&dev->link_watch_list);
@@ -201,8 +208,13 @@ static void __linkwatch_run_queue(int urgent_only)
spin_unlock_irq(&lweventlist_lock);
linkwatch_do_dev(dev);
spin_lock_irq(&lweventlist_lock);
+
+ do_dev--;
}

+ /* Add the remaining work back to lweventlist */
+ list_splice_init(&wrk, &lweventlist);
+
if (!list_empty(&lweventlist))
linkwatch_schedule_work(0);
spin_unlock_irq(&lweventlist_lock);
--
2.8.1


2019-05-27 15:00:05

by Stephen Hemminger

Subject: Re: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

On Mon, 27 May 2019 09:47:54 +0800
Yunsheng Lin <[email protected]> wrote:

> When a user has configured a large number of virtual netdevs, such
> as 4K VLANs, a carrier on/off operation on the real netdev also
> causes the link state of its virtual netdevs to be processed in
> linkwatch. Currently, the processing is done in a work queue,
> which may cause worker starvation for other work queues.
>
> This patch releases the CPU after the link watch worker has
> processed a fixed number of netdev link watch events, and
> reschedules the work queue when link watch events still remain.
>
> Signed-off-by: Yunsheng Lin <[email protected]>

Why not put link watch in its own workqueue so it is scheduled
separately from the system workqueue?
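
For example, something along these lines (a rough sketch only; the
names and the init hook are illustrative, not existing code):

	static struct workqueue_struct *linkwatch_wq;

	static int __init linkwatch_init(void)
	{
		/* dedicated queue so linkwatch work is not serialized
		 * behind unrelated items on the system workqueue
		 */
		linkwatch_wq = alloc_workqueue("linkwatch", 0, 0);
		return linkwatch_wq ? 0 : -ENOMEM;
	}

	/* and queue linkwatch_work on it instead of calling
	 * schedule_delayed_work():
	 *
	 *	queue_delayed_work(linkwatch_wq, &linkwatch_work, delay);
	 */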

2019-05-28 01:05:59

by Yunsheng Lin

Subject: Re: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

On 2019/5/27 22:58, Stephen Hemminger wrote:
> On Mon, 27 May 2019 09:47:54 +0800
> Yunsheng Lin <[email protected]> wrote:
>
>> When a user has configured a large number of virtual netdevs, such
>> as 4K VLANs, a carrier on/off operation on the real netdev also
>> causes the link state of its virtual netdevs to be processed in
>> linkwatch. Currently, the processing is done in a work queue,
>> which may cause worker starvation for other work queues.
>>
>> This patch releases the CPU after the link watch worker has
>> processed a fixed number of netdev link watch events, and
>> reschedules the work queue when link watch events still remain.
>>
>> Signed-off-by: Yunsheng Lin <[email protected]>
>
> Why not put link watch in its own workqueue so it is scheduled
> separately from the system workqueue?

From testing and debugging, a normal (bound) workqueue runs its work
on the CPU where the work was queued, even when linkwatch uses its
own workqueue instead of the system workqueue. So if that CPU is busy
processing linkwatch events, work from other workqueues scheduled on
the same CPU cannot be processed.

Using an unbound workqueue may solve the CPU starvation problem.
But __linkwatch_run_queue() is called with the rtnl_lock held, so if
it takes a long time to run, others that need to take the rtnl_lock
cannot make progress.
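
For context, the linkwatch work item runs the whole queue with the
rtnl_lock held; roughly (from net/core/link_watch.c):

	static void linkwatch_event(struct work_struct *dummy)
	{
		rtnl_lock();
		/* everything below runs under rtnl, so a long run here
		 * delays every other rtnl_lock() user
		 */
		__linkwatch_run_queue(time_after(linkwatch_nextevent,
						 jiffies));
		rtnl_unlock();
	}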



2019-05-28 01:20:05

by Stephen Hemminger

Subject: Re: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

On Tue, 28 May 2019 09:04:18 +0800
Yunsheng Lin <[email protected]> wrote:

> On 2019/5/27 22:58, Stephen Hemminger wrote:
> > On Mon, 27 May 2019 09:47:54 +0800
> > Yunsheng Lin <[email protected]> wrote:
> >
> >> When a user has configured a large number of virtual netdevs, such
> >> as 4K VLANs, a carrier on/off operation on the real netdev also
> >> causes the link state of its virtual netdevs to be processed in
> >> linkwatch. Currently, the processing is done in a work queue,
> >> which may cause worker starvation for other work queues.
> >>
> >> This patch releases the CPU after the link watch worker has
> >> processed a fixed number of netdev link watch events, and
> >> reschedules the work queue when link watch events still remain.
> >>
> >> Signed-off-by: Yunsheng Lin <[email protected]>
> >
> > Why not put link watch in its own workqueue so it is scheduled
> > separately from the system workqueue?
>
> From testing and debugging, a normal (bound) workqueue runs its work
> on the CPU where the work was queued, even when linkwatch uses its
> own workqueue instead of the system workqueue. So if that CPU is busy
> processing linkwatch events, work from other workqueues scheduled on
> the same CPU cannot be processed.
>
> Using an unbound workqueue may solve the CPU starvation problem.
> But __linkwatch_run_queue() is called with the rtnl_lock held, so if
> it takes a long time to run, others that need to take the rtnl_lock
> cannot make progress.

Agree with the starvation issue. My concern is that a large number of
events that end up being delayed would impact things that are actually
watching for link events (like routing daemons).

Doing rtnl_unlock/sched_yield/rtnl_lock in the loop probably would not
be accepted, but that is another alternative.


2019-05-28 01:50:01

by Yunsheng Lin

Subject: Re: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

On 2019/5/28 9:17, Stephen Hemminger wrote:
> On Tue, 28 May 2019 09:04:18 +0800
> Yunsheng Lin <[email protected]> wrote:
>
>> On 2019/5/27 22:58, Stephen Hemminger wrote:
>>> On Mon, 27 May 2019 09:47:54 +0800
>>> Yunsheng Lin <[email protected]> wrote:
>>>
>>>> When a user has configured a large number of virtual netdevs, such
>>>> as 4K VLANs, a carrier on/off operation on the real netdev also
>>>> causes the link state of its virtual netdevs to be processed in
>>>> linkwatch. Currently, the processing is done in a work queue,
>>>> which may cause worker starvation for other work queues.
>>>>
>>>> This patch releases the CPU after the link watch worker has
>>>> processed a fixed number of netdev link watch events, and
>>>> reschedules the work queue when link watch events still remain.
>>>>
>>>> Signed-off-by: Yunsheng Lin <[email protected]>
>>>
>>> Why not put link watch in its own workqueue so it is scheduled
>>> separately from the system workqueue?
>>
>> From testing and debugging, a normal (bound) workqueue runs its work
>> on the CPU where the work was queued, even when linkwatch uses its
>> own workqueue instead of the system workqueue. So if that CPU is busy
>> processing linkwatch events, work from other workqueues scheduled on
>> the same CPU cannot be processed.
>>
>> Using an unbound workqueue may solve the CPU starvation problem.
>> But __linkwatch_run_queue() is called with the rtnl_lock held, so if
>> it takes a long time to run, others that need to take the rtnl_lock
>> cannot make progress.
>
> Agree with the starvation issue. My concern is that a large number of
> events that end up being delayed would impact things that are actually
> watching for link events (like routing daemons).

Agreed. I am not familiar with the above use cases; it would be very
helpful if someone could help test the impact on them.

>
> Doing rtnl_unlock/sched_yield/rtnl_lock in the loop probably would not
> be accepted, but that is another alternative.

Yes. But it does not seem very efficient to do
rtnl_unlock/sched_yield/rtnl_lock for every linkwatch_do_dev() call.


2019-05-29 07:00:21

by David Miller

Subject: Re: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

From: Yunsheng Lin <[email protected]>
Date: Mon, 27 May 2019 09:47:54 +0800

> When a user has configured a large number of virtual netdevs, such
> as 4K VLANs, a carrier on/off operation on the real netdev also
> causes the link state of its virtual netdevs to be processed in
> linkwatch. Currently, the processing is done in a work queue,
> which may cause worker starvation for other work queues.
>
> This patch releases the CPU after the link watch worker has
> processed a fixed number of netdev link watch events, and
> reschedules the work queue when link watch events still remain.
>
> Signed-off-by: Yunsheng Lin <[email protected]>

Why not rtnl_unlock(); yield(); rtnl_lock(); every "100" events
processed?

That seems better than adding all of this overhead to reschedule the
workqueue every 100 items.
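
Roughly something like this inside the existing while loop of
__linkwatch_run_queue(), with a local "processed" counter (an
untested sketch of the idea, not a final implementation):

	/* after handling each device */
	if (++processed >= 100 && !list_empty(&wrk)) {
		/* put the pending work back, drop the locks so other
		 * rtnl_lock() users can run, then pick it up again
		 */
		list_splice_init(&wrk, &lweventlist);
		spin_unlock_irq(&lweventlist_lock);
		rtnl_unlock();

		yield();

		rtnl_lock();
		spin_lock_irq(&lweventlist_lock);
		list_splice_init(&lweventlist, &wrk);
		processed = 0;
	}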

2019-05-29 08:13:56

by Salil Mehta

[permalink] [raw]
Subject: RE: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

> From: [email protected] [mailto:[email protected]] On Behalf Of Yunsheng Lin
> Sent: Tuesday, May 28, 2019 2:04 AM
>
> On 2019/5/27 22:58, Stephen Hemminger wrote:
> > On Mon, 27 May 2019 09:47:54 +0800
> > Yunsheng Lin <[email protected]> wrote:
> >
> >> When a user has configured a large number of virtual netdevs, such
> >> as 4K VLANs, a carrier on/off operation on the real netdev also
> >> causes the link state of its virtual netdevs to be processed in
> >> linkwatch. Currently, the processing is done in a work queue,
> >> which may cause worker starvation for other work queues.


I think we had already discussed this internally: using a separate
workqueue with WQ_UNBOUND should solve this problem. The HNS3 driver
was sharing the system workqueue.
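
Something along these lines on the driver side (an illustrative
sketch; the field and queue names here are made up):

	/* dedicated unbound workqueue so the driver's service tasks
	 * are neither pinned to one CPU nor serialized behind the
	 * system workqueue
	 */
	hdev->service_wq = alloc_workqueue("hns3_service",
					   WQ_UNBOUND | WQ_MEM_RECLAIM, 1);
	if (!hdev->service_wq)
		return -ENOMEM;

	queue_work(hdev->service_wq, &hdev->service_task);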


> >> This patch releases the CPU after the link watch worker has
> >> processed a fixed number of netdev link watch events, and
> >> reschedules the work queue when link watch events still remain.


We need proper examples/use-cases that require this kind of
co-operative scheduling. Touching the common shared queue logic
without a solid argument might invite more problems for other modules.


> >> Signed-off-by: Yunsheng Lin <[email protected]>
> >
> > Why not put link watch in its own workqueue so it is scheduled
> > separately from the system workqueue?
>
> From testing and debugging, a normal (bound) workqueue runs its work
> on the CPU where the work was queued, even when linkwatch uses its
> own workqueue instead of the system workqueue. So if that CPU is busy
> processing linkwatch events, work from other workqueues scheduled on
> the same CPU cannot be processed.
>
> Using an unbound workqueue may solve the CPU starvation problem.

[...]

> But __linkwatch_run_queue() is called with the rtnl_lock held, so if
> it takes a long time to run, others that need to take the rtnl_lock
> cannot make progress.

Please help me understand: are you trying to pitch this patch to
solve a more general system issue, or is your argument/concern still
related to the HNS3 driver problem mentioned in this patch?

Salil.







2019-05-29 08:43:51

by Yunsheng Lin

Subject: Re: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

On 2019/5/29 16:12, Salil Mehta wrote:
>> From: [email protected] [mailto:[email protected]] On Behalf Of Yunsheng Lin
>> Sent: Tuesday, May 28, 2019 2:04 AM
>>
>> On 2019/5/27 22:58, Stephen Hemminger wrote:
>>> On Mon, 27 May 2019 09:47:54 +0800
>>> Yunsheng Lin <[email protected]> wrote:
>>>
>>>> When a user has configured a large number of virtual netdevs, such
>>>> as 4K VLANs, a carrier on/off operation on the real netdev also
>>>> causes the link state of its virtual netdevs to be processed in
>>>> linkwatch. Currently, the processing is done in a work queue,
>>>> which may cause worker starvation for other work queues.
>
>
> I think we had already discussed this internally: using a separate
> workqueue with WQ_UNBOUND should solve this problem. The HNS3 driver
> was sharing the system workqueue.

Yes, using a WQ_UNBOUND workqueue in hns3 solved the CPU starvation
for the hns3 workqueue.

But the rtnl_lock taken by linkwatch is still a problem for hns3's
reset workqueue, which needs to take the rtnl_lock to do the down
operation.

>
>
>>>> This patch releases the CPU after the link watch worker has
>>>> processed a fixed number of netdev link watch events, and
>>>> reschedules the work queue when link watch events still remain.
>
>
> We need proper examples/use-cases that require this kind of
> co-operative scheduling. Touching the common shared queue logic
> without a solid argument might invite more problems for other modules.
>
>
>>>> Signed-off-by: Yunsheng Lin <[email protected]>
>>>
>>> Why not put link watch in its own workqueue so it is scheduled
>>> separately from the system workqueue?
>>
>> From testing and debugging, a normal (bound) workqueue runs its work
>> on the CPU where the work was queued, even when linkwatch uses its
>> own workqueue instead of the system workqueue. So if that CPU is busy
>> processing linkwatch events, work from other workqueues scheduled on
>> the same CPU cannot be processed.
>>
>> Using an unbound workqueue may solve the CPU starvation problem.
>
> [...]
>
>> But __linkwatch_run_queue() is called with the rtnl_lock held, so if
>> it takes a long time to run, others that need to take the rtnl_lock
>> cannot make progress.
>
> Please help me understand: are you trying to pitch this patch to
> solve a more general system issue, or is your argument/concern still
> related to the HNS3 driver problem mentioned in this patch?

As above.

>
> Salil.

2019-05-29 09:01:30

by Yunsheng Lin

Subject: Re: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

On 2019/5/29 14:58, David Miller wrote:
> From: Yunsheng Lin <[email protected]>
> Date: Mon, 27 May 2019 09:47:54 +0800
>
>> When a user has configured a large number of virtual netdevs, such
>> as 4K VLANs, a carrier on/off operation on the real netdev also
>> causes the link state of its virtual netdevs to be processed in
>> linkwatch. Currently, the processing is done in a work queue,
>> which may cause worker starvation for other work queues.
>>
>> This patch releases the CPU after the link watch worker has
>> processed a fixed number of netdev link watch events, and
>> reschedules the work queue when link watch events still remain.
>>
>> Signed-off-by: Yunsheng Lin <[email protected]>
>
> Why not rtnl_unlock(); yield(); rtnl_lock(); every "100" events
> processed?
>
> That seems better than adding all of this overhead to reschedule the
> workqueue every 100 items.

One minor concern: the above solution does not seem to solve the CPU
starvation for other normal workqueues scheduled on the same CPU as
linkwatch. Maybe I misunderstand workqueues, or is there another
consideration here? :)

Anyway, I will implement it as you suggested and test it before
posting V2. Thanks.


2019-06-25 04:11:13

by Yunsheng Lin

Subject: Re: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

On 2019/5/29 16:59, Yunsheng Lin wrote:
> On 2019/5/29 14:58, David Miller wrote:
>> From: Yunsheng Lin <[email protected]>
>> Date: Mon, 27 May 2019 09:47:54 +0800
>>
>>> When a user has configured a large number of virtual netdevs, such
>>> as 4K VLANs, a carrier on/off operation on the real netdev also
>>> causes the link state of its virtual netdevs to be processed in
>>> linkwatch. Currently, the processing is done in a work queue,
>>> which may cause worker starvation for other work queues.
>>>
>>> This patch releases the CPU after the link watch worker has
>>> processed a fixed number of netdev link watch events, and
>>> reschedules the work queue when link watch events still remain.
>>>
>>> Signed-off-by: Yunsheng Lin <[email protected]>
>>
>> Why not rtnl_unlock(); yield(); rtnl_lock(); every "100" events
>> processed?
>>
>> That seems better than adding all of this overhead to reschedule the
>> workqueue every 100 items.
>
> One minor concern: the above solution does not seem to solve the CPU
> starvation for other normal workqueues scheduled on the same CPU as
> linkwatch. Maybe I misunderstand workqueues, or is there another
> consideration here? :)
>
> Anyway, I will implement it as you suggested and test it before
> posting V2. Thanks.

Hi, David

I stress tested the above solution with a lot of VLAN devices and
qemu-kvm in VF passthrough mode. The linkwatch wq sometimes blocks
the irqfd_inject wq when they are scheduled on the same CPU, which
may cause interrupt delay problems for the VM.

Rescheduling the workqueue every 100 items does allow the irqfd_inject
wq to run sooner, which alleviates the interrupt delay problems for
the VM.

So it is ok for me to fall back to rescheduling the link watch wq
every 100 items, or is there a better way to fix it properly?




2019-06-27 18:17:45

by David Miller

Subject: Re: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

From: Yunsheng Lin <[email protected]>
Date: Tue, 25 Jun 2019 10:28:04 +0800

> So it is ok for me to fall back to rescheduling the link watch wq
> every 100 items, or is there a better way to fix it properly?

Yes, that is fine for now.