2018-11-15 17:43:31

by Zhivich, Michael

[permalink] [raw]
Subject: [PATCH] softirq: don't push timer softirq handling to ksoftirqd

Require TIMER_SOFTIRQ to be handled immediately instead of delaying until
ksoftirqd runs, thus preventing problems with reading clocksources that
wrap often (e.g. acpi_pm).

If acpi_pm is used as the clocksource watchdog, and machine is under heavy
load, the time period for the watchdog check may be significantly longer
than the requested 0.5 seconds. If the watchdog check is delayed by 2
seconds (observed behavior), then acpi_pm time delta will be

2.5 sec * 3579545 ticks/sec = 8948863 = 0x888c3f

which will be treated as negative (since acpi_pm is only 24-bits wide) and
truncated to 0. This behavior will cause tsc to be incorrectly declared
unstable in clocksource_watchdog(), as it no longer agrees with acpi_pm.
If the clocksource watchdog check is delayed by more than 4.7 sec, then the
acpi_pm clocksource will wrap altogether and produce incorrect time delta.

The likely cause of this delay is that timer interrupts are serviced in
ksoftirqd when the machine is very busy.

Per Linus' comment in commit 3c53776e29f8 ("Mark HI and TASKLET softirq
synchronous"):
...
We should probably also consider the timer softirqs to be synchronous
and not be delayed to ksoftirqd (since they were the issue with the
earlier watchdog problems), but that should be done as a separate patch.
...

Signed-off-by: Michael Zhivich <[email protected]>
---
kernel/softirq.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index d28813306b2c..6d517ce0fba8 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -82,7 +82,8 @@ static void wakeup_softirqd(void)
* right now. Let ksoftirqd handle this at its own rate, to get fairness,
* unless we're doing some of the synchronous softirqs.
*/
-#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
+#define SOFTIRQ_NOW_MASK \
+ ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ) | (1 << TIMER_SOFTIRQ))
static bool ksoftirqd_running(unsigned long pending)
{
struct task_struct *tsk = __this_cpu_read(ksoftirqd);
--
2.19.1



2018-11-15 17:21:43

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH] softirq: don't push timer softirq handling to ksoftirqd

On Thu, Nov 15, 2018 at 9:07 AM, Michael Zhivich <[email protected]> wrote:
> Require TIMER_SOFTIRQ to be handled immediately instead of delaying until
> ksoftirqd runs, thus preventing problems with reading clocksources that
> wrap often (e.g. acpi_pm).
>
> If acpi_pm is used as the clocksource watchdog, and machine is under heavy
> load, the time period for the watchdog check may be significantly longer
> than the requested 0.5 seconds. If the watchdog check is delayed by 2
> seconds (observed behavior), then acpi_pm time delta will be
>
> 2.5 sec * 3579545 ticks/sec = 8948863 = 0x888c3f
>
> which will be treated as negative (since acpi_pm is only 24-bits wide) and
> truncated to 0. This behavior will cause tsc to be incorrectly declared
> unstable in clocksource_watchdog(), as it no longer agrees with acpi_pm.
> If the clocksource watchdog check is delayed by more than 4.7 sec, then the
> acpi_pm clocksource will wrap altogether and produce incorrect time delta.
>
> The likely cause of this delay is that timer interrupts are serviced in
> ksoftirqd when the machine is very busy.
>
> Per Linus' comment in commit 3c53776e29f8 ("Mark HI and TASKLET softirq
> synchronous"):
> ...
> We should probably also consider the timer softirqs to be synchronous
> and not be delayed to ksoftirqd (since they were the issue with the
> earlier watchdog problems), but that should be done as a separate patch.
> ...
>
> Signed-off-by: Michael Zhivich <[email protected]>
> ---
> kernel/softirq.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index d28813306b2c..6d517ce0fba8 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -82,7 +82,8 @@ static void wakeup_softirqd(void)
> * right now. Let ksoftirqd handle this at its own rate, to get fairness,
> * unless we're doing some of the synchronous softirqs.
> */
> -#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
> +#define SOFTIRQ_NOW_MASK \
> + ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ) | (1 << TIMER_SOFTIRQ))
> static bool ksoftirqd_running(unsigned long pending)
> {
> struct task_struct *tsk = __this_cpu_read(ksoftirqd);

Thanks so much for sending this along! Sorry I didn't get back to your
mail earlier this week, I've been at Plumbers.

So while this does try to attack the reliability issue w/ the
clocksource watchdog being delayed, I worry this will have to many
side-effects elsewhere.

Would a more focused fix be to move the clocksource watchdog from a
normal timer to a hrtimer?

thanks
-john

2018-11-16 18:48:45

by Zhivich, Michael

[permalink] [raw]
Subject: Re: [PATCH] softirq: don't push timer softirq handling to ksoftirqd

On 11/15/18, 12:17 PM, "John Stultz" <[email protected]> wrote:

On Thu, Nov 15, 2018 at 9:07 AM, Michael Zhivich <[email protected]> wrote:
> Require TIMER_SOFTIRQ to be handled immediately instead of delaying until
> ksoftirqd runs, thus preventing problems with reading clocksources that
> wrap often (e.g. acpi_pm).
>
> If acpi_pm is used as the clocksource watchdog, and machine is under heavy
> load, the time period for the watchdog check may be significantly longer
> than the requested 0.5 seconds. If the watchdog check is delayed by 2
> seconds (observed behavior), then acpi_pm time delta will be
>
> 2.5 sec * 3579545 ticks/sec = 8948863 = 0x888c3f
>
> which will be treated as negative (since acpi_pm is only 24-bits wide) and
> truncated to 0. This behavior will cause tsc to be incorrectly declared
> unstable in clocksource_watchdog(), as it no longer agrees with acpi_pm.
> If the clocksource watchdog check is delayed by more than 4.7 sec, then the
> acpi_pm clocksource will wrap altogether and produce incorrect time delta.
>
> The likely cause of this delay is that timer interrupts are serviced in
> ksoftirqd when the machine is very busy.
>
> Per Linus' comment in commit 3c53776e29f8 ("Mark HI and TASKLET softirq
> synchronous"):
> ...
> We should probably also consider the timer softirqs to be synchronous
> and not be delayed to ksoftirqd (since they were the issue with the
> earlier watchdog problems), but that should be done as a separate patch.
> ...
>
> Signed-off-by: Michael Zhivich <[email protected]>
> ---
> kernel/softirq.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index d28813306b2c..6d517ce0fba8 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -82,7 +82,8 @@ static void wakeup_softirqd(void)
> * right now. Let ksoftirqd handle this at its own rate, to get fairness,
> * unless we're doing some of the synchronous softirqs.
> */
> -#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
> +#define SOFTIRQ_NOW_MASK \
> + ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ) | (1 << TIMER_SOFTIRQ))
> static bool ksoftirqd_running(unsigned long pending)
> {
> struct task_struct *tsk = __this_cpu_read(ksoftirqd);

Thanks so much for sending this along! Sorry I didn't get back to your
mail earlier this week, I've been at Plumbers.

So while this does try to attack the reliability issue w/ the
clocksource watchdog being delayed, I worry this will have to many
side-effects elsewhere.

Would a more focused fix be to move the clocksource watchdog from a
normal timer to a hrtimer?

thanks
-john

Hi John,

That's an interesting idea - it would get clocksource watchdog out of ksoftirqd. However, clocksource watchdog iterates over available CPUs to check the TSC on each core (see add_timer_on() call in clocksource_watchdog()). I'm not seeing an API to start an hrtimer on a specific CPU - is this possible and I'm missing it? Or would something like this have to be added to hrtimer?

Thanks,
~ Michael

2018-11-28 15:23:18

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] softirq: don't push timer softirq handling to ksoftirqd

Michael,

On Fri, 16 Nov 2018, Zhivich, Michael wrote:
> On 11/15/18, 12:17 PM, "John Stultz" <[email protected]> wrote:
> Would a more focused fix be to move the clocksource watchdog from a
> normal timer to a hrtimer?
>
> That's an interesting idea - it would get clocksource watchdog out of
> ksoftirqd. However, clocksource watchdog iterates over available CPUs to
> check the TSC on each core (see add_timer_on() call in
> clocksource_watchdog()). I'm not seeing an API to start an hrtimer on a
> specific CPU - is this possible and I'm missing it? Or would something
> like this have to be added to hrtimer?

It's surely an interesting idea, but it's not trivial.

There is no way to reliably queue hrtimers remote when high resolution mode
is enabled. It only works when the to be queued timer is not the first to
expire one. If it ends up being the first to expire timer, then there is
currently no way to rearm the remote per cpu clockevent device because it's
not remotely accesible.

It would need an SMP function call, but that needs to be asynchronous due
to locking/interrupt disabled constraints. Maybe ...

Thanks,

tglx

2018-11-28 17:59:28

by Zhivich, Michael

[permalink] [raw]
Subject: Re: [kreview] [PATCH] softirq: don't push timer softirq handling to ksoftirqd

On 11/28/18, 10:22 AM, "Thomas Gleixner" <[email protected]> wrote:

Michael,

On Fri, 16 Nov 2018, Zhivich, Michael wrote:
> On 11/15/18, 12:17 PM, "John Stultz" <[email protected]> wrote:
> Would a more focused fix be to move the clocksource watchdog from a
> normal timer to a hrtimer?
>
> That's an interesting idea - it would get clocksource watchdog out of
> ksoftirqd. However, clocksource watchdog iterates over available CPUs to
> check the TSC on each core (see add_timer_on() call in
> clocksource_watchdog()). I'm not seeing an API to start an hrtimer on a
> specific CPU - is this possible and I'm missing it? Or would something
> like this have to be added to hrtimer?

It's surely an interesting idea, but it's not trivial.

There is no way to reliably queue hrtimers remote when high resolution mode
is enabled. It only works when the to be queued timer is not the first to
expire one. If it ends up being the first to expire timer, then there is
currently no way to rearm the remote per cpu clockevent device because it's
not remotely accesible.

It would need an SMP function call, but that needs to be asynchronous due
to locking/interrupt disabled constraints. Maybe ...

Thanks,

tglx


Thomas,

Thanks for the feedback - it does sound tricky to get right. I'll spend some more time thinking about it.

My original patch tries to ensure that timer softirqs are serviced immediately, not punted to ksoftirqd. It is perhaps too heavy-handed (all softirqs will get serviced if a timer softirq fired), but I imagine the clocksource watchdog is not the only timer that doesn't want to be arbitrarily delayed when the machine is busy.

Would it make sense to modify the patch such that only timer softirqs are serviced immediately if fired? Or are most timers that require precision wakeups using hrtimer framework?

Thanks,
~ Michael