2020-07-07 01:33:44

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 0/9] timer: Reduce timers softirq v2

Hi,

No huge change here, just addressed reviews and fixed warnings:

* Reposted patch 1 separately with appropriate "Fixes:" tag and stable Cc'ed:
https://lore.kernel.org/lkml/[email protected]/

* Fix missing initialization of next_expiry in 4/9 (thanks Juri)

* Dropped "timer: Simplify LVL_START() and calc_index()" and added comments
to explain current layout instead in 2/9 (thanks Thomas)

* Rewrote changelog of 9/9 (Thanks Thomas)

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/softirq-v2

HEAD: 5545d80b7b9bd69ede1c17fda194ac6620e7063f

Thanks,
Frederic
---

Frederic Weisbecker (9):
timer: Move trigger_dyntick_cpu() to enqueue_timer()
timer: Add comments about calc_index() ceiling work
timer: Optimize _next_timer_interrupt() level iteration
timers: Always keep track of next expiry
timer: Reuse next expiry cache after nohz exit
timer: Expand clk forward logic beyond nohz
timer: Spare timer softirq until next expiry
timer: Remove must_forward_clk
timer: Lower base clock forwarding threshold


kernel/time/timer.c | 169 ++++++++++++++++++----------------------------------
1 file changed, 58 insertions(+), 111 deletions(-)


2020-07-07 01:33:45

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 1/9] timer: Move trigger_dyntick_cpu() to enqueue_timer()

Consolidate the code by calling trigger_dyntick_cpu() from
enqueue_timer() instead of calling it from all its callers.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: Juri Lelli <[email protected]>
---
kernel/time/timer.c | 46 +++++++++++++++++++--------------------------
1 file changed, 19 insertions(+), 27 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 9a838d38dbe6..4c977df3610b 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -529,29 +529,6 @@ static int calc_wheel_index(unsigned long expires, unsigned long clk)
return idx;
}

-/*
- * Enqueue the timer into the hash bucket, mark it pending in
- * the bitmap and store the index in the timer flags.
- */
-static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
- unsigned int idx)
-{
- hlist_add_head(&timer->entry, base->vectors + idx);
- __set_bit(idx, base->pending_map);
- timer_set_idx(timer, idx);
-
- trace_timer_start(timer, timer->expires, timer->flags);
-}
-
-static void
-__internal_add_timer(struct timer_base *base, struct timer_list *timer)
-{
- unsigned int idx;
-
- idx = calc_wheel_index(timer->expires, base->clk);
- enqueue_timer(base, timer, idx);
-}
-
static void
trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
{
@@ -596,13 +573,29 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
wake_up_nohz_cpu(base->cpu);
}

-static void
-internal_add_timer(struct timer_base *base, struct timer_list *timer)
+/*
+ * Enqueue the timer into the hash bucket, mark it pending in
+ * the bitmap and store the index in the timer flags.
+ */
+static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
+ unsigned int idx)
{
- __internal_add_timer(base, timer);
+ hlist_add_head(&timer->entry, base->vectors + idx);
+ __set_bit(idx, base->pending_map);
+ timer_set_idx(timer, idx);
+
+ trace_timer_start(timer, timer->expires, timer->flags);
trigger_dyntick_cpu(base, timer);
}

+static void internal_add_timer(struct timer_base *base, struct timer_list *timer)
+{
+ unsigned int idx;
+
+ idx = calc_wheel_index(timer->expires, base->clk);
+ enqueue_timer(base, timer, idx);
+}
+
#ifdef CONFIG_DEBUG_OBJECTS_TIMERS

static struct debug_obj_descr timer_debug_descr;
@@ -1059,7 +1052,6 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
*/
if (idx != UINT_MAX && clk == base->clk) {
enqueue_timer(base, timer, idx);
- trigger_dyntick_cpu(base, timer);
} else {
internal_add_timer(base, timer);
}
--
2.26.2

2020-07-07 01:34:00

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 9/9] timer: Lower base clock forwarding threshold

There is nothing that prevents us from forwarding the base clock if
it's one jiffy off. This reason for this arbitrary limit is historical
and doesn't seem to stand anymore.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: Juri Lelli <[email protected]>
---
kernel/time/timer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 60126d5c79e1..814eaf42a8b5 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -891,7 +891,7 @@ static inline void forward_timer_base(struct timer_base *base)
* Also while executing timers, base->clk is 1 offset ahead
* of jiffies to avoid endless requeuing to current jffies.
*/
- if ((long)(jnow - base->clk) < 2)
+ if ((long)(jnow - base->clk) < 1)
return;

/*
--
2.26.2

2020-07-07 01:34:02

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 8/9] timer: Remove must_forward_clk

There is no reason to keep this guard around. The code makes sure that
base->clk remains sane and won't be forwarded beyond jiffies nor set
backward.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: Juri Lelli <[email protected]>
---
kernel/time/timer.c | 22 ++++++----------------
1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index cbc5ac7f772d..60126d5c79e1 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -205,7 +205,6 @@ struct timer_base {
unsigned long next_expiry;
unsigned int cpu;
bool is_idle;
- bool must_forward_clk;
DECLARE_BITMAP(pending_map, WHEEL_SIZE);
struct hlist_head vectors[WHEEL_SIZE];
} ____cacheline_aligned;
@@ -885,12 +884,13 @@ get_target_base(struct timer_base *base, unsigned tflags)

static inline void forward_timer_base(struct timer_base *base)
{
- unsigned long jnow;
+ unsigned long jnow = READ_ONCE(jiffies);

- if (!base->must_forward_clk)
- return;
-
- jnow = READ_ONCE(jiffies);
+ /*
+ * No need to forward if we are close enough below jiffies.
+ * Also while executing timers, base->clk is 1 offset ahead
+ * of jiffies to avoid endless requeuing to current jffies.
+ */
if ((long)(jnow - base->clk) < 2)
return;

@@ -1721,16 +1721,8 @@ static inline void __run_timers(struct timer_base *base)
timer_base_lock_expiry(base);
raw_spin_lock_irq(&base->lock);

- /*
- * timer_base::must_forward_clk must be cleared before running
- * timers so that any timer functions that call mod_timer() will
- * not try to forward the base.
- */
- base->must_forward_clk = false;
-
while (time_after_eq(jiffies, base->clk) &&
time_after_eq(jiffies, base->next_expiry)) {
-
levels = collect_expired_timers(base, heads);
base->clk++;
base->next_expiry = __next_timer_interrupt(base);
@@ -1738,7 +1730,6 @@ static inline void __run_timers(struct timer_base *base)
while (levels--)
expire_timers(base, heads + levels);
}
- base->must_forward_clk = true;
raw_spin_unlock_irq(&base->lock);
timer_base_unlock_expiry(base);
}
@@ -1934,7 +1925,6 @@ int timers_prepare_cpu(unsigned int cpu)
base->clk = jiffies;
base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
base->is_idle = false;
- base->must_forward_clk = true;
}
return 0;
}
--
2.26.2

2020-07-07 01:34:59

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 3/9] timer: Optimize _next_timer_interrupt() level iteration

If a level has a timer that expires before we reach the next level,
there is no need to iterate further.

The next level is reached when the 3 lower bits of the current level are
cleared. If the next event happens before/during that, the next levels
won't give us an earlier expiration.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: Juri Lelli <[email protected]>
---
kernel/time/timer.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ae1259f67486..acf7cb8c09f8 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1523,6 +1523,7 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
clk = base->clk;
for (lvl = 0; lvl < LVL_DEPTH; lvl++, offset += LVL_SIZE) {
int pos = next_pending_bucket(base, offset, clk & LVL_MASK);
+ unsigned long lvl_clk = clk & LVL_CLK_MASK;

if (pos >= 0) {
unsigned long tmp = clk + (unsigned long) pos;
@@ -1530,6 +1531,13 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
tmp <<= LVL_SHIFT(lvl);
if (time_before(tmp, next))
next = tmp;
+
+ /*
+ * If the next expiration happens before we reach
+ * the next level, no need to check further.
+ */
+ if (pos <= ((LVL_CLK_DIV - lvl_clk) & LVL_CLK_MASK))
+ break;
}
/*
* Clock for the next level. If the current level clock lower
@@ -1567,7 +1575,7 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
* So the simple check whether the lower bits of the current
* level are 0 or not is sufficient for all cases.
*/
- adj = clk & LVL_CLK_MASK ? 1 : 0;
+ adj = lvl_clk ? 1 : 0;
clk >>= LVL_CLK_SHIFT;
clk += adj;
}
--
2.26.2

2020-07-07 01:35:34

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 6/9] timer: Expand clk forward logic beyond nohz

As for next_expiry, we want to expand base->clk catch up logic beyond
nohz in order to avoid triggering useless softirqs.

If we want to fire softirqs only to expire pending timers, periodic
base->clk increments must be skippable for random amounts of time.
Therefore we must prepare to catch-up with missing updates whenever we
need an up-to-date base clock.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: Juri Lelli <[email protected]>
---
kernel/time/timer.c | 26 ++++----------------------
1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 501d36ef0b75..ffa2c956d968 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -885,19 +885,12 @@ get_target_base(struct timer_base *base, unsigned tflags)

static inline void forward_timer_base(struct timer_base *base)
{
-#ifdef CONFIG_NO_HZ_COMMON
unsigned long jnow;

- /*
- * We only forward the base when we are idle or have just come out of
- * idle (must_forward_clk logic), and have a delta between base clock
- * and jiffies. In the common case, run_timers will take care of it.
- */
- if (likely(!base->must_forward_clk))
+ if (!base->must_forward_clk)
return;

jnow = READ_ONCE(jiffies);
- base->must_forward_clk = base->is_idle;
if ((long)(jnow - base->clk) < 2)
return;

@@ -912,7 +905,6 @@ static inline void forward_timer_base(struct timer_base *base)
return;
base->clk = base->next_expiry;
}
-#endif
}


@@ -1666,10 +1658,8 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
* logic is only maintained for the BASE_STD base, deferrable
* timers may still see large granularity skew (by design).
*/
- if ((expires - basem) > TICK_NSEC) {
- base->must_forward_clk = true;
+ if ((expires - basem) > TICK_NSEC)
base->is_idle = true;
- }
}
raw_spin_unlock(&base->lock);

@@ -1768,16 +1758,7 @@ static inline void __run_timers(struct timer_base *base)
/*
* timer_base::must_forward_clk must be cleared before running
* timers so that any timer functions that call mod_timer() will
- * not try to forward the base. Idle tracking / clock forwarding
- * logic is only used with BASE_STD timers.
- *
- * The must_forward_clk flag is cleared unconditionally also for
- * the deferrable base. The deferrable base is not affected by idle
- * tracking and never forwarded, so clearing the flag is a NOOP.
- *
- * The fact that the deferrable base is never forwarded can cause
- * large variations in granularity for deferrable timers, but they
- * can be deferred for long periods due to idle anyway.
+ * not try to forward the base.
*/
base->must_forward_clk = false;

@@ -1790,6 +1771,7 @@ static inline void __run_timers(struct timer_base *base)
while (levels--)
expire_timers(base, heads + levels);
}
+ base->must_forward_clk = true;
raw_spin_unlock_irq(&base->lock);
timer_base_unlock_expiry(base);
}
--
2.26.2

2020-07-07 01:35:40

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 5/9] timer: Reuse next expiry cache after nohz exit

Now that we track the next expiry unconditionally when a timer is added,
we can reuse that information on a tick firing after exiting nohz.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: Juri Lelli <[email protected]>
---
kernel/time/timer.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 8a4138e47aa4..501d36ef0b75 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1705,13 +1705,11 @@ static int collect_expired_timers(struct timer_base *base,
* the next expiring timer.
*/
if ((long)(now - base->clk) > 2) {
- unsigned long next = __next_timer_interrupt(base);
-
/*
* If the next timer is ahead of time forward to current
* jiffies, otherwise forward to the next expiry time:
*/
- if (time_after(next, now)) {
+ if (time_after(base->next_expiry, now)) {
/*
* The call site will increment base->clk and then
* terminate the expiry loop immediately.
@@ -1719,7 +1717,7 @@ static int collect_expired_timers(struct timer_base *base,
base->clk = now;
return 0;
}
- base->clk = next;
+ base->clk = base->next_expiry;
}
return __collect_expired_timers(base, heads);
}
--
2.26.2

2020-07-07 01:35:43

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 4/9] timers: Always keep track of next expiry

So far next expiry was only tracked while the CPU was in nohz_idle mode
in order to cope with missing ticks that can't increment the base->clk
periodically anymore.

We are going to expand that logic beyond nohz in order to spare timers
softirqs so do it unconditionally.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: Juri Lelli <[email protected]>
---
kernel/time/timer.c | 36 +++++++++++++++++++-----------------
1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index acf7cb8c09f8..8a4138e47aa4 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -558,8 +558,22 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
* timer is not deferrable. If the other CPU is on the way to idle
* then it can't set base->is_idle as we hold the base lock:
*/
- if (!base->is_idle)
- return;
+ if (base->is_idle)
+ wake_up_nohz_cpu(base->cpu);
+}
+
+/*
+ * Enqueue the timer into the hash bucket, mark it pending in
+ * the bitmap and store the index in the timer flags.
+ */
+static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
+ unsigned int idx)
+{
+ hlist_add_head(&timer->entry, base->vectors + idx);
+ __set_bit(idx, base->pending_map);
+ timer_set_idx(timer, idx);
+
+ trace_timer_start(timer, timer->expires, timer->flags);

/* Check whether this is the new first expiring timer: */
if (time_after_eq(timer->expires, base->next_expiry))
@@ -578,21 +592,7 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
} else {
base->next_expiry = timer->expires;
}
- wake_up_nohz_cpu(base->cpu);
-}

-/*
- * Enqueue the timer into the hash bucket, mark it pending in
- * the bitmap and store the index in the timer flags.
- */
-static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
- unsigned int idx)
-{
- hlist_add_head(&timer->entry, base->vectors + idx);
- __set_bit(idx, base->pending_map);
- timer_set_idx(timer, idx);
-
- trace_timer_start(timer, timer->expires, timer->flags);
trigger_dyntick_cpu(base, timer);
}

@@ -1490,7 +1490,6 @@ static int __collect_expired_timers(struct timer_base *base,
return levels;
}

-#ifdef CONFIG_NO_HZ_COMMON
/*
* Find the next pending bucket of a level. Search from level start (@offset)
* + @clk upwards and if nothing there, search from start of the level
@@ -1582,6 +1581,7 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
return next;
}

+#ifdef CONFIG_NO_HZ_COMMON
/*
* Check, if the next hrtimer event is before the next timer wheel
* event:
@@ -1787,6 +1787,7 @@ static inline void __run_timers(struct timer_base *base)

levels = collect_expired_timers(base, heads);
base->clk++;
+ base->next_expiry = __next_timer_interrupt(base);

while (levels--)
expire_timers(base, heads + levels);
@@ -2039,6 +2040,7 @@ static void __init init_timer_cpu(int cpu)
base->cpu = cpu;
raw_spin_lock_init(&base->lock);
base->clk = jiffies;
+ base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
timer_base_init_expiry_lock(base);
}
}
--
2.26.2

2020-07-07 01:37:04

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 7/9] timer: Spare timer softirq until next expiry

Now that the core timer infrastructure doesn't depend anymore on
periodic base->clk increments, even when the CPU is not in NO_HZ mode,
we can delay the timer softirqs until we have actual timers to expire.

Some spurious softirqs can still remain since base->next_expiry doesn't
keep track of canceled timers but we are still way ahead of the
unconditional periodic softirqs (~15 times less of them with 1000 Hz
and ~5 times less with 100 Hz).

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: Juri Lelli <[email protected]>
---
kernel/time/timer.c | 49 ++++++++-------------------------------------
1 file changed, 8 insertions(+), 41 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ffa2c956d968..cbc5ac7f772d 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1457,10 +1457,10 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
}
}

-static int __collect_expired_timers(struct timer_base *base,
- struct hlist_head *heads)
+static int collect_expired_timers(struct timer_base *base,
+ struct hlist_head *heads)
{
- unsigned long clk = base->clk;
+ unsigned long clk = base->clk = base->next_expiry;
struct hlist_head *vec;
int i, levels = 0;
unsigned int idx;
@@ -1683,40 +1683,6 @@ void timer_clear_idle(void)
*/
base->is_idle = false;
}
-
-static int collect_expired_timers(struct timer_base *base,
- struct hlist_head *heads)
-{
- unsigned long now = READ_ONCE(jiffies);
-
- /*
- * NOHZ optimization. After a long idle sleep we need to forward the
- * base to current jiffies. Avoid a loop by searching the bitfield for
- * the next expiring timer.
- */
- if ((long)(now - base->clk) > 2) {
- /*
- * If the next timer is ahead of time forward to current
- * jiffies, otherwise forward to the next expiry time:
- */
- if (time_after(base->next_expiry, now)) {
- /*
- * The call site will increment base->clk and then
- * terminate the expiry loop immediately.
- */
- base->clk = now;
- return 0;
- }
- base->clk = base->next_expiry;
- }
- return __collect_expired_timers(base, heads);
-}
-#else
-static inline int collect_expired_timers(struct timer_base *base,
- struct hlist_head *heads)
-{
- return __collect_expired_timers(base, heads);
-}
#endif

/*
@@ -1749,7 +1715,7 @@ static inline void __run_timers(struct timer_base *base)
struct hlist_head heads[LVL_DEPTH];
int levels;

- if (!time_after_eq(jiffies, base->clk))
+ if (time_before(jiffies, base->next_expiry))
return;

timer_base_lock_expiry(base);
@@ -1762,7 +1728,8 @@ static inline void __run_timers(struct timer_base *base)
*/
base->must_forward_clk = false;

- while (time_after_eq(jiffies, base->clk)) {
+ while (time_after_eq(jiffies, base->clk) &&
+ time_after_eq(jiffies, base->next_expiry)) {

levels = collect_expired_timers(base, heads);
base->clk++;
@@ -1797,12 +1764,12 @@ void run_local_timers(void)

hrtimer_run_queues();
/* Raise the softirq only if required. */
- if (time_before(jiffies, base->clk)) {
+ if (time_before(jiffies, base->next_expiry)) {
if (!IS_ENABLED(CONFIG_NO_HZ_COMMON))
return;
/* CPU is awake, so check the deferrable base. */
base++;
- if (time_before(jiffies, base->clk))
+ if (time_before(jiffies, base->next_expiry))
return;
}
raise_softirq(TIMER_SOFTIRQ);
--
2.26.2

2020-07-07 01:37:38

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 2/9] timer: Add comments about calc_index() ceiling work

calc_index() adds 1 unit of the level granularity to the expiry passed
in parameter to ensure that the timer doesn't expire too early. Add a
comment to explain that and the resulting layout in the wheel.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: Juri Lelli <[email protected]>
---
kernel/time/timer.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 4c977df3610b..ae1259f67486 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -156,7 +156,8 @@ EXPORT_SYMBOL(jiffies_64);

/*
* The time start value for each level to select the bucket at enqueue
- * time.
+ * time. We start from the last possible delta of the previous level
+ * so that we can later add an extra LVL_GRAN(n) to n (see calc_index()).
*/
#define LVL_START(n) ((LVL_SIZE - 1) << (((n) - 1) * LVL_CLK_SHIFT))

@@ -489,6 +490,13 @@ static inline void timer_set_idx(struct timer_list *timer, unsigned int idx)
*/
static inline unsigned calc_index(unsigned expires, unsigned lvl)
{
+ /*
+ * Time may have past since the clock last reached an index of
+ * this @lvl. And that time, below LVL_GRAN(@lvl), is going to
+ * be substracted from the delta until we reach @expires. To
+ * fix that we must add one level granularity unit to make sure
+ * we rather expire late than early. Prefer ceil over floor.
+ */
expires = (expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl);
return LVL_OFFS(lvl) + (expires & LVL_MASK);
}
--
2.26.2

2020-07-09 07:03:19

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH 0/9] timer: Reduce timers softirq v2

Hi,

On 07/07/20 03:32, Frederic Weisbecker wrote:
> Hi,
>
> No huge change here, just addressed reviews and fixed warnings:
>
> * Reposted patch 1 separately with appropriate "Fixes:" tag and stable Cc'ed:
> https://lore.kernel.org/lkml/[email protected]/
>
> * Fix missing initialization of next_expiry in 4/9 (thanks Juri)
>
> * Dropped "timer: Simplify LVL_START() and calc_index()" and added comments
> to explain current layout instead in 2/9 (thanks Thomas)
>
> * Rewrote changelog of 9/9 (Thanks Thomas)
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> timers/softirq-v2
>
> HEAD: 5545d80b7b9bd69ede1c17fda194ac6620e7063f
>
> Thanks,
> Frederic
> ---

Testing of this set looks good (even with RT). Feel free to add

Tested-by: Juri Lelli <[email protected]>

to all the patches and to the patch you posted separately (old 01).

Thanks!

Juri

2020-07-09 12:18:52

by Anna-Maria Behnsen

[permalink] [raw]
Subject: Re: [PATCH 1/9] timer: Move trigger_dyntick_cpu() to enqueue_timer()

Hi Frederic,

On Tue, 7 Jul 2020, Frederic Weisbecker wrote:

> Consolidate the code by calling trigger_dyntick_cpu() from
> enqueue_timer() instead of calling it from all its callers.

Looks good, but maybe you could also update the comments in the code (see
remarks below)?

> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Anna-Maria Gleixner <[email protected]>
> Cc: Juri Lelli <[email protected]>
> ---
> kernel/time/timer.c | 46 +++++++++++++++++++--------------------------
> 1 file changed, 19 insertions(+), 27 deletions(-)
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 9a838d38dbe6..4c977df3610b 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -596,13 +573,29 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
> wake_up_nohz_cpu(base->cpu);
> }
>
> -static void
> -internal_add_timer(struct timer_base *base, struct timer_list *timer)
> +/*
> + * Enqueue the timer into the hash bucket, mark it pending in
> + * the bitmap and store the index in the timer flags.
> + */

enqueue_timer() also calls trigger_dyntick_cpu().

> +static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
> + unsigned int idx)
> {
> - __internal_add_timer(base, timer);
> + hlist_add_head(&timer->entry, base->vectors + idx);
> + __set_bit(idx, base->pending_map);
> + timer_set_idx(timer, idx);
> +
> + trace_timer_start(timer, timer->expires, timer->flags);
> trigger_dyntick_cpu(base, timer);
> }
>
> +static void internal_add_timer(struct timer_base *base, struct timer_list *timer)
> +{
> + unsigned int idx;
> +
> + idx = calc_wheel_index(timer->expires, base->clk);
> + enqueue_timer(base, timer, idx);
> +}
> +
> #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
>
> static struct debug_obj_descr timer_debug_descr;
> @@ -1059,7 +1052,6 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
> */

The comment above still mentions that trigger_dyntick_cpu() is called which
is now part of enqueue_timer(). Might be a little confusing.

> if (idx != UINT_MAX && clk == base->clk) {
> enqueue_timer(base, timer, idx);
> - trigger_dyntick_cpu(base, timer);
> } else {
> internal_add_timer(base, timer);
> }


Thanks,

Anna-Maria

2020-07-14 08:52:36

by Anna-Maria Behnsen

[permalink] [raw]
Subject: Re: [PATCH 4/9] timers: Always keep track of next expiry

Hi Frederic,

On Tue, 7 Jul 2020, Frederic Weisbecker wrote:

> So far next expiry was only tracked while the CPU was in nohz_idle mode
> in order to cope with missing ticks that can't increment the base->clk
> periodically anymore.
>
> We are going to expand that logic beyond nohz in order to spare timers
> softirqs so do it unconditionally.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Anna-Maria Gleixner <[email protected]>
> Cc: Juri Lelli <[email protected]>
> ---
> kernel/time/timer.c | 36 +++++++++++++++++++-----------------
> 1 file changed, 19 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index acf7cb8c09f8..8a4138e47aa4 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -558,8 +558,22 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
> * timer is not deferrable. If the other CPU is on the way to idle
> * then it can't set base->is_idle as we hold the base lock:
> */
> - if (!base->is_idle)
> - return;
> + if (base->is_idle)
> + wake_up_nohz_cpu(base->cpu);
> +}
> +
> +/*
> + * Enqueue the timer into the hash bucket, mark it pending in
> + * the bitmap and store the index in the timer flags.
> + */
> +static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
> + unsigned int idx)
> +{
> + hlist_add_head(&timer->entry, base->vectors + idx);
> + __set_bit(idx, base->pending_map);
> + timer_set_idx(timer, idx);
> +
> + trace_timer_start(timer, timer->expires, timer->flags);
>
> /* Check whether this is the new first expiring timer: */
> if (time_after_eq(timer->expires, base->next_expiry))
> @@ -578,21 +592,7 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
> } else {
> base->next_expiry = timer->expires;
> }
> - wake_up_nohz_cpu(base->cpu);
> -}
>
> -/*
> - * Enqueue the timer into the hash bucket, mark it pending in
> - * the bitmap and store the index in the timer flags.
> - */
> -static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
> - unsigned int idx)
> -{
> - hlist_add_head(&timer->entry, base->vectors + idx);
> - __set_bit(idx, base->pending_map);
> - timer_set_idx(timer, idx);
> -
> - trace_timer_start(timer, timer->expires, timer->flags);
> trigger_dyntick_cpu(base, timer);
> }
>

Could you please split those two hunks which do only a restructuring into a
separate patch?

Thanks,

Anna-Maria

2020-07-14 09:16:09

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 2/9] timer: Add comments about calc_index() ceiling work

Frederic Weisbecker <[email protected]> writes:
> static inline unsigned calc_index(unsigned expires, unsigned lvl)
> {
> + /*
> + * Time may have past since the clock last reached an index of
> + * this @lvl. And that time, below LVL_GRAN(@lvl), is going to
> + * be substracted from the delta until we reach @expires. To
> + * fix that we must add one level granularity unit to make sure
> + * we rather expire late than early. Prefer ceil over floor.

This comment confuses the hell out of me.

/*
* The timer wheel has to guarantee that a timer does not fire
* early. Early expiry can happen due to:
* - Timer is armed at the edge of a tick
* - Truncation of the expiry time in the outer wheel levels
*
* Round up with level granularity to prevent this.
*/

Hmm?

Thanks,

tglx

2020-07-15 13:21:18

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 1/9] timer: Move trigger_dyntick_cpu() to enqueue_timer()

On Thu, Jul 09, 2020 at 02:17:35PM +0200, Anna-Maria Behnsen wrote:
> Hi Frederic,
>
> On Tue, 7 Jul 2020, Frederic Weisbecker wrote:
>
> > Consolidate the code by calling trigger_dyntick_cpu() from
> > enqueue_timer() instead of calling it from all its callers.
>
> Looks good, but maybe you could also update the comments in the code (see
> remarks below)?

Ok, fixing that, thanks!

2020-07-17 12:52:43

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 4/9] timers: Always keep track of next expiry

On Tue, Jul 14, 2020 at 10:49:28AM +0200, Anna-Maria Behnsen wrote:
> Hi Frederic,
>
> On Tue, 7 Jul 2020, Frederic Weisbecker wrote:
>
> > So far next expiry was only tracked while the CPU was in nohz_idle mode
> > in order to cope with missing ticks that can't increment the base->clk
> > periodically anymore.
> >
> > We are going to expand that logic beyond nohz in order to spare timers
> > softirqs so do it unconditionally.
> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Anna-Maria Gleixner <[email protected]>
> > Cc: Juri Lelli <[email protected]>
> > ---
> > kernel/time/timer.c | 36 +++++++++++++++++++-----------------
> > 1 file changed, 19 insertions(+), 17 deletions(-)
> >
> > diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> > index acf7cb8c09f8..8a4138e47aa4 100644
> > --- a/kernel/time/timer.c
> > +++ b/kernel/time/timer.c
> > @@ -558,8 +558,22 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
> > * timer is not deferrable. If the other CPU is on the way to idle
> > * then it can't set base->is_idle as we hold the base lock:
> > */
> > - if (!base->is_idle)
> > - return;
> > + if (base->is_idle)
> > + wake_up_nohz_cpu(base->cpu);
> > +}
> > +
> > +/*
> > + * Enqueue the timer into the hash bucket, mark it pending in
> > + * the bitmap and store the index in the timer flags.
> > + */
> > +static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
> > + unsigned int idx)
> > +{
> > + hlist_add_head(&timer->entry, base->vectors + idx);
> > + __set_bit(idx, base->pending_map);
> > + timer_set_idx(timer, idx);
> > +
> > + trace_timer_start(timer, timer->expires, timer->flags);
> >
> > /* Check whether this is the new first expiring timer: */
> > if (time_after_eq(timer->expires, base->next_expiry))
> > @@ -578,21 +592,7 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
> > } else {
> > base->next_expiry = timer->expires;
> > }
> > - wake_up_nohz_cpu(base->cpu);
> > -}
> >
> > -/*
> > - * Enqueue the timer into the hash bucket, mark it pending in
> > - * the bitmap and store the index in the timer flags.
> > - */
> > -static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
> > - unsigned int idx)
> > -{
> > - hlist_add_head(&timer->entry, base->vectors + idx);
> > - __set_bit(idx, base->pending_map);
> > - timer_set_idx(timer, idx);
> > -
> > - trace_timer_start(timer, timer->expires, timer->flags);
> > trigger_dyntick_cpu(base, timer);
> > }
> >
>
> Could you please split those two hunks which do only a restructuring into a
> separate patch?

The problem is that those hunks are not only a restructuring but they also
change the way we update next_expiry, since we do it outside idle context.
And that update won't make sense without the proper initialization of next_expiry
that comes later in the patch.

Thanks.

2020-07-17 12:58:57

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 2/9] timer: Add comments about calc_index() ceiling work

On Tue, Jul 14, 2020 at 11:13:26AM +0200, Thomas Gleixner wrote:
> Frederic Weisbecker <[email protected]> writes:
> > static inline unsigned calc_index(unsigned expires, unsigned lvl)
> > {
> > + /*
> > + * Time may have past since the clock last reached an index of
> > + * this @lvl. And that time, below LVL_GRAN(@lvl), is going to
> > + * be substracted from the delta until we reach @expires. To
> > + * fix that we must add one level granularity unit to make sure
> > + * we rather expire late than early. Prefer ceil over floor.
>
> This comment confuses the hell out of me.

Me too...

>
> /*
> * The timer wheel has to guarantee that a timer does not fire
> * early. Early expiry can happen due to:
> * - Timer is armed at the edge of a tick
> * - Truncation of the expiry time in the outer wheel levels
> *
> * Round up with level granularity to prevent this.
> */
>
> Hmm?

That's relieving, I'm updating the patch.

Thanks!