2021-04-27 08:39:42

by Thomas Gleixner

Subject: [patch 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()

There are more indicators of whether the SMP function calls on clock_was_set()
can be avoided:

- When the remote CPU is currently handling hrtimer_interrupt(). In
that case the remote CPU will update offsets and reevaluate the timer
bases before reprogramming anyway, so nothing to do.

By unconditionally updating the offsets the following checks are possible:

- When the offset update already happened on the remote CPU then the
remote update attempt will yield the same sequence number and no
IPI is required.

- After updating it can be checked whether the first expiring timer in
the affected clock bases moves before the first expiring (softirq)
timer of the CPU. If that's not the case then sending the IPI is not
required.

Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/time/hrtimer.c | 66 +++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 57 insertions(+), 9 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -880,6 +880,60 @@ static void hrtimer_reprogram(struct hrt
         tick_program_event(expires, 1);
 }
 
+static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base,
+                             unsigned int active)
+{
+        struct hrtimer_clock_base *base;
+        unsigned int seq;
+        ktime_t expires;
+
+        /*
+         * If the remote CPU is currently handling an hrtimer interrupt, it
+         * will update and reevaluate the first expiring timer of all clock
+         * bases before reprogramming. Nothing to do here.
+         */
+        if (cpu_base->in_hrtirq)
+                return false;
+
+        /*
+         * Update the base offsets unconditionally so the following quick
+         * check whether the SMP function call is required works.
+         */
+        seq = cpu_base->clock_was_set_seq;
+        hrtimer_update_base(cpu_base);
+
+        /*
+         * If the sequence did not change over the update then the
+         * remote CPU already handled it.
+         */
+        if (seq == cpu_base->clock_was_set_seq)
+                return false;
+
+        /*
+         * Walk the affected clock bases and check whether the first expiring
+         * timer in a clock base is moving ahead of the first expiring timer of
+         * @cpu_base. If so, the IPI must be invoked because per CPU clock
+         * event devices cannot be remotely reprogrammed.
+         */
+        for_each_active_base(base, cpu_base, active) {
+                struct timerqueue_node *next;
+
+                next = timerqueue_getnext(&base->active);
+                expires = ktime_sub(next->expires, base->offset);
+                if (expires < cpu_base->expires_next)
+                        return true;
+
+                /* Extra check for softirq clock bases */
+                if (base->clockid < HRTIMER_BASE_MONOTONIC_SOFT)
+                        continue;
+                if (cpu_base->softirq_activated)
+                        continue;
+                if (expires < cpu_base->softirq_expires_next)
+                        return true;
+        }
+        return false;
+}
+
 /*
  * Clock was set. This might affect CLOCK_REALTIME, CLOCK_TAI and
  * CLOCK_BOOTTIME (for late sleep time injection).
@@ -914,16 +968,10 @@ void clock_was_set(unsigned int bases)
                 unsigned long flags;
 
                 raw_spin_lock_irqsave(&cpu_base->lock, flags);
-                /*
-                 * Only send the IPI when there are timers queued in one of
-                 * the affected clock bases. Otherwise update the base
-                 * remote to ensure that the next enqueue of a timer on
-                 * such a clock base will see the correct offsets.
-                 */
-                if (cpu_base->active_bases & bases)
+
+                if (update_needs_ipi(cpu_base, bases))
                         cpumask_set_cpu(cpu, mask);
-                else
-                        hrtimer_update_base(cpu_base);
+
                 raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
         }
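
For context, a condensed sketch of the clock_was_set() loop which the second
hunk modifies. Only the locked section between raw_spin_lock_irqsave() and
raw_spin_unlock_irqrestore() is taken from the hunk above; the surrounding
mask handling and the IPI dispatch via retrigger_next_event() (the IPI
handler in Marcelo's call-site listing further down the thread) are a
reconstruction and may differ in detail from the actual tree:

void clock_was_set(unsigned int bases)
{
        cpumask_var_t mask;
        int cpu;

        if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
                /* No memory for the mask: fall back to IPIing everyone */
                on_each_cpu(retrigger_next_event, NULL, 1);
                return;
        }

        /* Collect only the CPUs which actually need the IPI */
        for_each_online_cpu(cpu) {
                struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
                unsigned long flags;

                raw_spin_lock_irqsave(&cpu_base->lock, flags);

                if (update_needs_ipi(cpu_base, bases))
                        cpumask_set_cpu(cpu, mask);

                raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
        }

        /* retrigger_next_event() updates the offsets and reprograms the CPU */
        preempt_disable();
        smp_call_function_many(mask, retrigger_next_event, NULL, 1);
        preempt_enable();

        free_cpumask_var(mask);
}

The point of the patch is to shrink the cpumask handed to the SMP function
call as far as possible; every CPU filtered out by update_needs_ipi() is one
interrupt less on an otherwise idle or isolated CPU.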



2021-04-27 15:18:51

by Marcelo Tosatti

Subject: Re: [patch 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()

On Tue, Apr 27, 2021 at 10:25:45AM +0200, Thomas Gleixner wrote:
> There are more indicators whether the SMP function calls on clock_was_set()
> can be avoided:
>
> - When the remote CPU is currently handling hrtimer_interrupt(). In
> that case the remote CPU will update offsets and reevaluate the timer
> bases before reprogramming anyway, so nothing to do.
>
> By unconditionally updating the offsets the following checks are possible:
>
> - When the offset update already happened on the remote CPU then the
> remote update attempt will yield the same sequence number and no
> IPI is required.
>
> - After updating it can be checked whether the first expiring timer in
> the affected clock bases moves before the first expiring (softirq)
> timer of the CPU. If that's not the case then sending the IPI is not
> required.
>
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
> kernel/time/hrtimer.c | 66 +++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 57 insertions(+), 9 deletions(-)
>
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -880,6 +880,60 @@ static void hrtimer_reprogram(struct hrt
> tick_program_event(expires, 1);
> }
>
> +static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base,
> + unsigned int active)
> +{
> + struct hrtimer_clock_base *base;
> + unsigned int seq;
> + ktime_t expires;
> +
> + /*
> + * If the remote CPU is currently handling an hrtimer interrupt, it
> + * will update and reevaluate the first expiring timer of all clock
> + * bases before reprogramming. Nothing to do here.
> + */
> + if (cpu_base->in_hrtirq)
> + return false;

Thomas,

The base offsets are updated at

Cscope tag: hrtimer_update_base
# line filename / context / line
1 736 kernel/time/hrtimer.c <<retrigger_next_event>> (IPI handler)
hrtimer_update_base(base);
2 1617 kernel/time/hrtimer.c <<hrtimer_run_softirq>> (softirq handler)
now = hrtimer_update_base(cpu_base);
3 1645 kernel/time/hrtimer.c <<hrtimer_interrupt>> (hrtimer_interrupt entry)
entry_time = now = hrtimer_update_base(cpu_base);
4 1695 kernel/time/hrtimer.c <<hrtimer_interrupt>> (after tick_program_event failure)
now = hrtimer_update_base(cpu_base);
5 1768 kernel/time/hrtimer.c <<hrtimer_run_queues>> (sched_tick)
now = hrtimer_update_base(cpu_base);

Consider the following interleaving (hrtimer_interrupt() running on the
remote CPU, settimeofday() on another):

CPU A (in hrtimer_interrupt)            CPU B

hrtimer_interrupt
  in_hrtirq = 1
  __run_hrtimer
    raw_spin_unlock_irqrestore(&cpu_base->lock, flags)
                                        settimeofday
                                          clock_was_set
                                            lock cpu_base->lock
                                            update_needs_ipi returns false
                                            (in_hrtirq is set, so the
                                             offsets are not updated and
                                             no IPI is sent)
  continue to process hrtimers with stale base->offset

No?
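
To make the ordering explicit, here is the V1 check condensed from the hunk
above, with annotations added for illustration (the comments are not part of
the patch):

static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base,
                             unsigned int active)
{
        unsigned int seq;

        /*
         * V1 tests in_hrtirq first. In the interleaving above this is
         * true, so the function returns false here: no IPI is sent and
         * the offset update below is skipped entirely.
         */
        if (cpu_base->in_hrtirq)
                return false;

        /*
         * Not reached in that case, so base->offset stays stale while
         * the remote CPU finishes expiring timers and reprograms its
         * clock event device.
         */
        seq = cpu_base->clock_was_set_seq;
        hrtimer_update_base(cpu_base);

        /* ... sequence and expiry checks as in the hunk above ... */
        return false;
}

The V2 patch further down the thread addresses this by performing the offset
update before looking at in_hrtirq.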

2021-04-27 20:04:03

by Thomas Gleixner

Subject: Re: [patch 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()

On Tue, Apr 27 2021 at 12:11, Marcelo Tosatti wrote:
> On Tue, Apr 27, 2021 at 10:25:45AM +0200, Thomas Gleixner wrote:
> Consider
>
>
> hrtimer_interrupt
>   in_hrtirq = 1
>   __run_hrtimer
>     raw_spin_unlock_irqrestore(&cpu_base->lock, flags)
>                                         settimeofday
>                                           clock_was_set
>                                             lock cpu_base->lock
>                                             update_needs_ipi returns false
>   continue to process hrtimers with stale base->offset

Bah. I somehow convinced myself that hrtimer_interrupt() rechecks the
offset before clearing in_hrtirq, but that's not true. It does so when
reprogramming fails, but not for the regular case.

Lemme stare at it some more.

Thanks,

tglx

2021-04-30 07:13:05

by Thomas Gleixner

Subject: [patch V2 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()

By unconditionally updating the offsets there are more indicators of
whether the SMP function calls on clock_was_set() can be avoided:

- When the offset update already happened on the remote CPU then the
remote update attempt will yield the same sequence number and no
IPI is required.

- When the remote CPU is currently handling hrtimer_interrupt(). In
that case the remote CPU will reevaluate the timer bases before
reprogramming anyway, so nothing to do.

- After updating it can be checked whether the first expiring timer in
the affected clock bases moves before the first expiring (softirq)
timer of the CPU. If that's not the case then sending the IPI is not
required.

Signed-off-by: Thomas Gleixner <[email protected]>
---
V2: Fix the in_hrtirq thinko (Marcelo)
Add the missing masking (reported by 0day)

P.S.: The git branch is updated as well

---
kernel/time/hrtimer.c | 74 +++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 65 insertions(+), 9 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -880,6 +880,68 @@ static void hrtimer_reprogram(struct hrt
         tick_program_event(expires, 1);
 }
 
+static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base,
+                             unsigned int active)
+{
+        struct hrtimer_clock_base *base;
+        unsigned int seq;
+        ktime_t expires;
+
+        /*
+         * Update the base offsets unconditionally so the following
+         * checks whether the SMP function call is required works.
+         *
+         * The update is safe even when the remote CPU is in the hrtimer
+         * interrupt or the hrtimer soft interrupt and expiring affected
+         * bases. Either it will see the update before handling a base or
+         * it will see it when it finishes the processing and reevaluates
+         * the next expiring timer.
+         */
+        seq = cpu_base->clock_was_set_seq;
+        hrtimer_update_base(cpu_base);
+
+        /*
+         * If the sequence did not change over the update then the
+         * remote CPU already handled it.
+         */
+        if (seq == cpu_base->clock_was_set_seq)
+                return false;
+
+        /*
+         * If the remote CPU is currently handling an hrtimer interrupt, it
+         * will reevaluate the first expiring timer of all clock bases
+         * before reprogramming. Nothing to do here.
+         */
+        if (cpu_base->in_hrtirq)
+                return false;
+
+        /*
+         * Walk the affected clock bases and check whether the first expiring
+         * timer in a clock base is moving ahead of the first expiring timer of
+         * @cpu_base. If so, the IPI must be invoked because per CPU clock
+         * event devices cannot be remotely reprogrammed.
+         */
+        active &= cpu_base->active_bases;
+
+        for_each_active_base(base, cpu_base, active) {
+                struct timerqueue_node *next;
+
+                next = timerqueue_getnext(&base->active);
+                expires = ktime_sub(next->expires, base->offset);
+                if (expires < cpu_base->expires_next)
+                        return true;
+
+                /* Extra check for softirq clock bases */
+                if (base->clockid < HRTIMER_BASE_MONOTONIC_SOFT)
+                        continue;
+                if (cpu_base->softirq_activated)
+                        continue;
+                if (expires < cpu_base->softirq_expires_next)
+                        return true;
+        }
+        return false;
+}
+
 /*
  * Clock was set. This might affect CLOCK_REALTIME, CLOCK_TAI and
  * CLOCK_BOOTTIME (for late sleep time injection).
@@ -914,16 +976,10 @@ void clock_was_set(unsigned int bases)
                 unsigned long flags;
 
                 raw_spin_lock_irqsave(&cpu_base->lock, flags);
-                /*
-                 * Only send the IPI when there are timers queued in one of
-                 * the affected clock bases. Otherwise update the base
-                 * remote to ensure that the next enqueue of a timer on
-                 * such a clock base will see the correct offsets.
-                 */
-                if (cpu_base->active_bases & bases)
+
+                if (update_needs_ipi(cpu_base, bases))
                         cpumask_set_cpu(cpu, mask);
-                else
-                        hrtimer_update_base(cpu_base);
+
                 raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
         }

2021-04-30 16:51:47

by Marcelo Tosatti

Subject: Re: [patch V2 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()

On Fri, Apr 30, 2021 at 09:12:15AM +0200, Thomas Gleixner wrote:
> By unconditionally updating the offsets there are more indicators
> whether the SMP function calls on clock_was_set() can be avoided:
>
> - When the offset update already happened on the remote CPU then the
> remote update attempt will yield the same sequence number and no
> IPI is required.
>
> - When the remote CPU is currently handling hrtimer_interrupt(). In
> that case the remote CPU will reevaluate the timer bases before
> reprogramming anyway, so nothing to do.
>
> - After updating it can be checked whether the first expiring timer in
> the affected clock bases moves before the first expiring (softirq)
> timer of the CPU. If that's not the case then sending the IPI is not
> required.
>
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
> V2: Fix the in_hrtirq thinko (Marcelo)
> Add the missing masking (reported by 0day)
>
> P.S.: The git branch is updated as well
>
> ---
> kernel/time/hrtimer.c | 74 +++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 65 insertions(+), 9 deletions(-)
>
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -880,6 +880,68 @@ static void hrtimer_reprogram(struct hrt
> tick_program_event(expires, 1);
> }
>
> +static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base,
> + unsigned int active)
> +{
> + struct hrtimer_clock_base *base;
> + unsigned int seq;
> + ktime_t expires;
> +
> + /*
> + * Update the base offsets unconditionally so the following
> + * checks whether the SMP function call is required works.
> + *
> + * The update is safe even when the remote CPU is in the hrtimer
> + * interrupt or the hrtimer soft interrupt and expiring affected
> + * bases. Either it will see the update before handling a base or
> + * it will see it when it finishes the processing and reevaluates
> + * the next expiring timer.
> + */
> + seq = cpu_base->clock_was_set_seq;
> + hrtimer_update_base(cpu_base);
> +
> + /*
> + * If the sequence did not change over the update then the
> + * remote CPU already handled it.
> + */
> + if (seq == cpu_base->clock_was_set_seq)
> + return false;
> +
> + /*
> + * If the remote CPU is currently handling an hrtimer interrupt, it
> + * will reevaluate the first expiring timer of all clock bases
> + * before reprogramming. Nothing to do here.
> + */
> + if (cpu_base->in_hrtirq)
> + return false;

Looks good, thanks.

> +
> + /*
> + * Walk the affected clock bases and check whether the first expiring
> + * timer in a clock base is moving ahead of the first expiring timer of
> + * @cpu_base. If so, the IPI must be invoked because per CPU clock
> + * event devices cannot be remotely reprogrammed.
> + */
> + active &= cpu_base->active_bases;
> +
> + for_each_active_base(base, cpu_base, active) {
> + struct timerqueue_node *next;
> +
> + next = timerqueue_getnext(&base->active);
> + expires = ktime_sub(next->expires, base->offset);
> + if (expires < cpu_base->expires_next)
> + return true;
> +
> + /* Extra check for softirq clock bases */
> + if (base->clockid < HRTIMER_BASE_MONOTONIC_SOFT)
> + continue;
> + if (cpu_base->softirq_activated)
> + continue;
> + if (expires < cpu_base->softirq_expires_next)
> + return true;
> + }
> + return false;
> +}
> +
> /*
> * Clock was set. This might affect CLOCK_REALTIME, CLOCK_TAI and
> * CLOCK_BOOTTIME (for late sleep time injection).
> @@ -914,16 +976,10 @@ void clock_was_set(unsigned int bases)
> unsigned long flags;
>
> raw_spin_lock_irqsave(&cpu_base->lock, flags);
> - /*
> - * Only send the IPI when there are timers queued in one of
> - * the affected clock bases. Otherwise update the base
> - * remote to ensure that the next enqueue of a timer on
> - * such a clock base will see the correct offsets.
> - */
> - if (cpu_base->active_bases & bases)
> +
> + if (update_needs_ipi(cpu_base, bases))
> cpumask_set_cpu(cpu, mask);
> - else
> - hrtimer_update_base(cpu_base);
> +
> raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
> }
>
>
>

2021-05-13 07:51:48

by Peter Zijlstra

Subject: Re: [patch V2 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()

On Fri, Apr 30, 2021 at 09:12:15AM +0200, Thomas Gleixner wrote:
> +static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base,
> + unsigned int active)
> +{
> + struct hrtimer_clock_base *base;
> + unsigned int seq;
> + ktime_t expires;
> +
> + /*
> + * Update the base offsets unconditionally so the following
> + * checks whether the SMP function call is required works.
> + *
> + * The update is safe even when the remote CPU is in the hrtimer
> + * interrupt or the hrtimer soft interrupt and expiring affected
> + * bases. Either it will see the update before handling a base or
> + * it will see it when it finishes the processing and reevaluates
> + * the next expiring timer.
> + */
> + seq = cpu_base->clock_was_set_seq;
> + hrtimer_update_base(cpu_base);
> +
> + /*
> + * If the sequence did not change over the update then the
> + * remote CPU already handled it.
> + */
> + if (seq == cpu_base->clock_was_set_seq)
> + return false;
> +

So far so simple: if there's nothing to update, we're done.

> + /*
> + * If the remote CPU is currently handling an hrtimer interrupt, it
> + * will reevaluate the first expiring timer of all clock bases
> + * before reprogramming. Nothing to do here.
> + */
> + if (cpu_base->in_hrtirq)
> + return false;

This one gives me a headache though; if we get here, that means
hrtimer_interrupt()'s hrtimer_update_base() happened before the change.
It also means that the CPU is in __run_hrtimer() running a fn(), since we
own cpu_base->lock.

That in turn means it is in __hrtimer_run_queues(), possibly on the last
base.

Now, if I understand it right, the thing that saves us is that
hrtimer_update_next_event() -- right after returning from
__hrtimer_run_queues() -- will re-evaluate all bases (with the
hrtimer_update_base() we just did visible to it) and we'll eventually
goto retry if time moved such that we now have timers that should've run
but were missed due to this concurrent shift in time.

However, since that retries thing is limited to 3, could we not trigger
that by generating a stream of these updates, causing the timer to keep
having to be reset? I suppose updating time is a root-only thing, and
root can shoot its own foot off any time it damn well likes, so who
cares.
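
For reference, the shape of hrtimer_interrupt() this reasoning leans on,
heavily condensed (reconstructed from the call sites quoted earlier in the
thread; softirq activation and hang detection are elided, and details may
differ from the actual source):

void hrtimer_interrupt(struct clock_event_device *dev)
{
        struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
        ktime_t expires_next, now;
        unsigned long flags;
        int retries = 0;

        raw_spin_lock_irqsave(&cpu_base->lock, flags);
        /* Offsets are read once on entry */
        now = hrtimer_update_base(cpu_base);
retry:
        cpu_base->in_hrtirq = 1;

        /* Drops and reacquires cpu_base->lock around each callback */
        __hrtimer_run_queues(cpu_base, now, flags, HRTIMER_ACTIVE_HARD);

        /* Re-evaluates all bases with whatever offsets are visible now */
        expires_next = hrtimer_update_next_event(cpu_base);
        cpu_base->expires_next = expires_next;
        cpu_base->in_hrtirq = 0;
        raw_spin_unlock_irqrestore(&cpu_base->lock, flags);

        /* Reprogramming succeeded? Then we are done. */
        if (!tick_program_event(expires_next, 0))
                return;

        /*
         * expires_next was already in the past, e.g. because time moved
         * concurrently. Re-read the offsets and retry, but only a
         * limited number of times.
         */
        raw_spin_lock_irqsave(&cpu_base->lock, flags);
        now = hrtimer_update_base(cpu_base);
        if (++retries < 3)
                goto retry;

        /* ... hang detection and forced reprogramming elided ... */
        raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
}

So a clock_was_set() that slips in while __hrtimer_run_queues() has the lock
dropped is picked up by the hrtimer_update_next_event() pass, and a stream of
time changes can at worst burn the three retries, which only root can provoke.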

> + /*
> + * Walk the affected clock bases and check whether the first expiring
> + * timer in a clock base is moving ahead of the first expiring timer of
> + * @cpu_base. If so, the IPI must be invoked because per CPU clock
> + * event devices cannot be remotely reprogrammed.
> + */
> + active &= cpu_base->active_bases;
> +
> + for_each_active_base(base, cpu_base, active) {
> + struct timerqueue_node *next;
> +
> + next = timerqueue_getnext(&base->active);
> + expires = ktime_sub(next->expires, base->offset);
> + if (expires < cpu_base->expires_next)
> + return true;
> +
> + /* Extra check for softirq clock bases */
> + if (base->clockid < HRTIMER_BASE_MONOTONIC_SOFT)
> + continue;
> + if (cpu_base->softirq_activated)
> + continue;
> + if (expires < cpu_base->softirq_expires_next)
> + return true;
> + }

Fair enough..

> + return false;
> +}
> +
> /*
> * Clock was set. This might affect CLOCK_REALTIME, CLOCK_TAI and
> * CLOCK_BOOTTIME (for late sleep time injection).


2021-05-15 07:38:06

by Thomas Gleixner

Subject: Re: [patch V2 8/8] hrtimer: Avoid more SMP function calls in clock_was_set()

On Thu, May 13 2021 at 09:47, Peter Zijlstra wrote:
> On Fri, Apr 30, 2021 at 09:12:15AM +0200, Thomas Gleixner wrote:
>> + /*
>> + * If the remote CPU is currently handling an hrtimer interrupt, it
>> + * will reevaluate the first expiring timer of all clock bases
>> + * before reprogramming. Nothing to do here.
>> + */
>> + if (cpu_base->in_hrtirq)
>> + return false;
>
> This one gives me a head-ache though; if we get here, that means
> hrtimer_interrupt()'s hrtimer_update_base() happened before the change.
> It also means that CPU is in __run_hrtimer() running a fn(), since we
> own cpu_base->lock.
>
> That in turn means it is in __hrtimer_run_queues(), possibly on the last
> base.
>
> Now, if I understand it right, the thing that saves us, is that
> hrtimer_update_next_event() -- right after returning from
> __hrtimer_run_queues() -- will re-evaluate all bases (with the
> hrtimer_update_base() we just did visible to it) and we'll eventually
> goto retry if time moved such that we now have timers that should've run
> but were missed due to this concurrent shift in time.

Correct.

> However, since that retries thing is limited to 3; could we not trigger
> that by generating a stream of these updates, causing the timer to keep
> having to be reset? I suppose updating time is a root only thing, and
> root can shoot its own foot off any time it damn well likes, so who
> cares.

It's root only. So you could argue that a borked NTPd can cause this to
happen, but then you surely have other problems aside from hitting the
retries limit.

Thanks,

tglx