2023-05-08 06:28:25

by Paolo Abeni

Subject: [PATCH] revert: "softirq: Let ksoftirqd do its job"

Due to the mentioned commit, when the ksoftirqd processes take charge
of softirq processing, the system can experience high latencies.

In the past a few workarounds have been implemented for specific
side-effects of the above:

commit 1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred")
commit 8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
commit 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")
commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous")

but the latency problem still exists in real-life workloads, see the
link below.

The reverted commit intended to solve a live-lock scenario that can now
be addressed with the NAPI threaded mode, introduced with commit
29863d41bb6e ("net: implement threaded-able napi poll loop support"),
and nowadays in a pretty stable status.

While a complete solution to put softirq processing under nice resource
control would be preferable, that has proven to be a very hard task. In
the short term, remove the main pain point, and also simplify a bit the
current softirq implementation.

Note that this change also reverts commit 3c53776e29f8 ("Mark HI and
TASKLET softirq synchronous") and commit 1342d8080f61 ("softirq: Don't
skip softirq execution when softirq thread is parking"), which are
direct follow-ups of the feature commit. A single change is preferred to
avoid known bad intermediate states introduced by a patch series
reverting them individually.

Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Paolo Abeni <[email protected]>
Tested-by: Jason Xing <[email protected]>
---
kernel/softirq.c | 22 ++--------------------
1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 1b725510dd0f..807b34ccd797 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -80,21 +80,6 @@ static void wakeup_softirqd(void)
 		wake_up_process(tsk);
 }

-/*
- * If ksoftirqd is scheduled, we do not want to process pending softirqs
- * right now. Let ksoftirqd handle this at its own rate, to get fairness,
- * unless we're doing some of the synchronous softirqs.
- */
-#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
-static bool ksoftirqd_running(unsigned long pending)
-{
-	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
-
-	if (pending & SOFTIRQ_NOW_MASK)
-		return false;
-	return tsk && task_is_running(tsk) && !__kthread_should_park(tsk);
-}
-
 #ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_PER_CPU(int, hardirqs_enabled);
 DEFINE_PER_CPU(int, hardirq_context);
@@ -236,7 +221,7 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
 		goto out;

 	pending = local_softirq_pending();
-	if (!pending || ksoftirqd_running(pending))
+	if (!pending)
 		goto out;

 	/*
@@ -432,9 +417,6 @@ static inline bool should_wake_ksoftirqd(void)

 static inline void invoke_softirq(void)
 {
-	if (ksoftirqd_running(local_softirq_pending()))
-		return;
-
 	if (!force_irqthreads() || !__this_cpu_read(ksoftirqd)) {
 #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
 		/*
@@ -468,7 +450,7 @@ asmlinkage __visible void do_softirq(void)

 	pending = local_softirq_pending();

-	if (pending && !ksoftirqd_running(pending))
+	if (pending)
 		do_softirq_own_stack();

 	local_irq_restore(flags);
--
2.40.0
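
The effect of the patch above on the inline-processing decision can be sketched as a small user-space model. This is only an illustration, not kernel code: `struct ksoftirqd_model`, `ksoftirqd_running_model()`, `runs_inline_before()` and `runs_inline_after()` are hypothetical names, and the three booleans stand in for the `tsk`, `task_is_running(tsk)` and `__kthread_should_park(tsk)` checks of the removed helper. It models the condition in do_softirq(); invoke_softirq() additionally depends on force_irqthreads(), which is left out here.

```c
#include <stdbool.h>

/* Bit positions as in the kernel's softirq enum (include/linux/interrupt.h). */
enum { HI_SOFTIRQ = 0, NET_RX_SOFTIRQ = 3, TASKLET_SOFTIRQ = 6 };

#define SOFTIRQ_NOW_MASK ((1UL << HI_SOFTIRQ) | (1UL << TASKLET_SOFTIRQ))

/* Hypothetical stand-in for the state read from the per-CPU ksoftirqd task. */
struct ksoftirqd_model {
	bool exists;       /* tsk != NULL */
	bool running;      /* task_is_running(tsk) */
	bool should_park;  /* __kthread_should_park(tsk) */
};

/* Mirrors the logic of the removed ksoftirqd_running() helper. */
bool ksoftirqd_running_model(const struct ksoftirqd_model *k,
			     unsigned long pending)
{
	/* HI and TASKLET softirqs were treated as synchronous. */
	if (pending & SOFTIRQ_NOW_MASK)
		return false;
	return k->exists && k->running && !k->should_park;
}

/* Old do_softirq() condition: pending && !ksoftirqd_running(pending). */
bool runs_inline_before(const struct ksoftirqd_model *k, unsigned long pending)
{
	return pending && !ksoftirqd_running_model(k, pending);
}

/* New do_softirq() condition: any pending softirq is processed inline. */
bool runs_inline_after(unsigned long pending)
{
	return pending != 0;
}
```

In this model, a pending NET_RX softirq with a runnable ksoftirqd was deferred before the patch and is processed inline after it, while HI and TASKLET softirqs were processed inline in both cases.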

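The commit message above names NAPI threaded mode as the current way to address the original live-lock scenario. As a rough sketch of how that mode is switched on per device, the following assumes the `threaded` attribute under /sys/class/net/<iface>/ (available on kernels that include the threaded NAPI support) and root privileges. The helper names `napi_threaded_path()` and `napi_set_threaded()` are hypothetical, and "eth0" in the usage note is a placeholder interface name.

```c
#include <stdio.h>

/*
 * Build "<base>/<iface>/threaded" into buf.
 * base is normally "/sys/class/net"; it is a parameter so the helper
 * can be exercised against a scratch directory.
 * Returns 0 on success, -1 if the result does not fit.
 */
int napi_threaded_path(char *buf, size_t len, const char *base,
		       const char *iface)
{
	int n = snprintf(buf, len, "%s/%s/threaded", base, iface);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write "1" (enable) or "0" (disable) to the given attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int napi_set_threaded(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would build the path with napi_threaded_path(buf, sizeof(buf), "/sys/class/net", "eth0") and then pass it to napi_set_threaded(buf, 1). The same effect can be had by writing 1 to that file from a shell.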

2023-05-08 21:36:14

by Thomas Gleixner

Subject: Re: [PATCH] revert: "softirq: Let ksoftirqd do its job"

Paolo!

On Mon, May 08 2023 at 08:17, Paolo Abeni wrote:
> Due to the mentioned commit, when the ksoftirqd processes take charge
> of softirq processing, the system can experience high latencies.
>
> In the past a few workarounds have been implemented for specific
> side-effects of the above:
>
> commit 1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred")
> commit 8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
> commit 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")
> commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous")
>
> but the latency problem still exists in real-life workloads, see the
> link below.
>
> The reverted commit intended to solve a live-lock scenario that can now
> be addressed with the NAPI threaded mode, introduced with commit
> 29863d41bb6e ("net: implement threaded-able napi poll loop support"),
> and nowadays in a pretty stable status.
>
> While a complete solution to put softirq processing under nice resource
> control would be preferable, that has proven to be a very hard task. In
> the short term, remove the main pain point, and also simplify a bit the
> current softirq implementation.
>
> Note that this change also reverts commit 3c53776e29f8 ("Mark HI and
> TASKLET softirq synchronous") and commit 1342d8080f61 ("softirq: Don't
> skip softirq execution when softirq thread is parking"), which are
> direct follow-ups of the feature commit. A single change is preferred to
> avoid known bad intermediate states introduced by a patch series
> reverting them individually.

I'm fine with this change, but I definitely want that to be
acked/reviewed by the other stakeholders in the networking arena.

Thanks,

tglx

2023-05-09 01:51:44

by Jakub Kicinski

Subject: Re: [PATCH] revert: "softirq: Let ksoftirqd do its job"

On Mon, 8 May 2023 08:17:44 +0200 Paolo Abeni wrote:
> Due to the mentioned commit, when the ksoftirqd processes take charge
> of softirq processing, the system can experience high latencies.
>
> In the past a few workarounds have been implemented for specific
> side-effects of the above:
>
> commit 1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred")
> commit 8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
> commit 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")
> commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous")
>
> but the latency problem still exists in real-life workloads, see the
> link below.
>
> The reverted commit intended to solve a live-lock scenario that can now
> be addressed with the NAPI threaded mode, introduced with commit
> 29863d41bb6e ("net: implement threaded-able napi poll loop support"),
> and nowadays in a pretty stable status.
>
> While a complete solution to put softirq processing under nice resource
> control would be preferable, that has proven to be a very hard task. In
> the short term, remove the main pain point, and also simplify a bit the
> current softirq implementation.
>
> Note that this change also reverts commit 3c53776e29f8 ("Mark HI and
> TASKLET softirq synchronous") and commit 1342d8080f61 ("softirq: Don't
> skip softirq execution when softirq thread is parking"), which are
> direct follow-ups of the feature commit. A single change is preferred to
> avoid known bad intermediate states introduced by a patch series
> reverting them individually.
>
> Link: https://lore.kernel.org/netdev/[email protected]/
> Signed-off-by: Paolo Abeni <[email protected]>
> Tested-by: Jason Xing <[email protected]>

Reviewed-by: Jakub Kicinski <[email protected]>

2023-05-09 09:28:02

by Eric Dumazet

Subject: Re: [PATCH] revert: "softirq: Let ksoftirqd do its job"

On Tue, May 9, 2023 at 3:42 AM Jakub Kicinski <[email protected]> wrote:
>
> On Mon, 8 May 2023 08:17:44 +0200 Paolo Abeni wrote:
> > Due to the mentioned commit, when the ksoftirqd processes take charge
> > of softirq processing, the system can experience high latencies.
> >
> > In the past a few workarounds have been implemented for specific
> > side-effects of the above:
> >
> > commit 1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred")
> > commit 8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
> > commit 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")
> > commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous")
> >
> > but the latency problem still exists in real-life workloads, see the
> > link below.
> >
> > The reverted commit intended to solve a live-lock scenario that can now
> > be addressed with the NAPI threaded mode, introduced with commit
> > 29863d41bb6e ("net: implement threaded-able napi poll loop support"),
> > and nowadays in a pretty stable status.
> >
> > While a complete solution to put softirq processing under nice resource
> > control would be preferable, that has proven to be a very hard task. In
> > the short term, remove the main pain point, and also simplify a bit the
> > current softirq implementation.
> >
> > Note that this change also reverts commit 3c53776e29f8 ("Mark HI and
> > TASKLET softirq synchronous") and commit 1342d8080f61 ("softirq: Don't
> > skip softirq execution when softirq thread is parking"), which are
> > direct follow-ups of the feature commit. A single change is preferred to
> > avoid known bad intermediate states introduced by a patch series
> > reverting them individually.
> >
> > Link: https://lore.kernel.org/netdev/[email protected]/
> > Signed-off-by: Paolo Abeni <[email protected]>
> > Tested-by: Jason Xing <[email protected]>
>
> Reviewed-by: Jakub Kicinski <[email protected]>

Reviewed-by: Eric Dumazet <[email protected]>

Thanks.

Subject: Re: [PATCH] revert: "softirq: Let ksoftirqd do its job"

On 2023-05-08 08:17:44 [+0200], Paolo Abeni wrote:
> Due to the mentioned commit, when the ksoftirqd processes take charge
> of softirq processing, the system can experience high latencies.

Yes. RT-wise, I tried hard to keep ksoftirqd from getting scheduled.
This change makes life a lot easier.

Reviewed-by: Sebastian Andrzej Siewior <[email protected]>

Sebastian