2021-06-04 11:33:19

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 0/6] posix-cpu-timers: Bunch of fixes

The first is a race due to locklessly starting process wide cputime
counting.

The others stop the process wide cputime counting and tick dependency
when they are not necessary anymore.

Note I don't really like patch 5/6 and in fact I hope we could manage to
remove this early inline cpu_timer_fire() call from timer_settime(). All
we get from it is headaches. Besides, the handler isn't invoked from the
actual target and that doesn't sound ideal.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/urgent

HEAD: f8f908c9d2b2f1e100cd549206c95a4e65e5f023

Thanks,
Frederic
---

Frederic Weisbecker (6):
posix-cpu-timers: Fix rearm racing against process tick
posix-cpu-timers: Don't start process wide cputime counter if timer is disabled
posix-cpu-timers: Force next_expiration recalc after timer deletion
posix-cpu-timers: Force next_expiration recalc after timer reset
posix-cpu-timers: Force next expiration recalc after early timer firing
posix-cpu-timers: Force next expiration recalc after itimer reset


include/linux/posix-timers.h | 11 ++++-
kernel/time/posix-cpu-timers.c | 97 +++++++++++++++++++++++++++++++++---------
2 files changed, 87 insertions(+), 21 deletions(-)


2021-06-04 11:33:23

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 1/6] posix-cpu-timers: Fix rearm racing against process tick

Since the process wide cputime counter is started locklessly from
posix_cpu_timer_rearm(), it can be concurrently stopped by operations
on other timers from the same thread group, such as in the following
unlucky scenario:

CPU 0 CPU 1
----- -----
timer_settime(TIMER B)
posix_cpu_timer_rearm(TIMER A)
cpu_clock_sample_group()
(pct->timers_active already true)

handle_posix_cpu_timers()
check_process_timers()
stop_process_timers()
pct->timers_active = false
arm_timer(TIMER A)

tick -> run_posix_cpu_timers()
// sees !pct->timers_active, ignore
// our TIMER A

Fix this with simply locking process wide cputime counting start and
timer arm in the same block.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Eric W. Biederman <[email protected]>
---
kernel/time/posix-cpu-timers.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 3bb96a8b49c9..aa52fc85dbcb 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -991,6 +991,11 @@ static void posix_cpu_timer_rearm(struct k_itimer *timer)
if (!p)
goto out;

+ /* Protect timer list r/w in arm_timer() */
+ sighand = lock_task_sighand(p, &flags);
+ if (unlikely(sighand == NULL))
+ goto out;
+
/*
* Fetch the current sample and update the timer's expiry time.
*/
@@ -1001,11 +1006,6 @@ static void posix_cpu_timer_rearm(struct k_itimer *timer)

bump_cpu_timer(timer, now);

- /* Protect timer list r/w in arm_timer() */
- sighand = lock_task_sighand(p, &flags);
- if (unlikely(sighand == NULL))
- goto out;
-
/*
* Now re-arm for the new expiry time.
*/
--
2.25.1

2021-06-04 11:33:52

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 2/6] posix-cpu-timers: Don't start process wide cputime counter if timer is disabled

If timer_settime() is called with a 0 expiration on a timer that is
already disabled, the process wide cputime counter will be started
and won't ever get a chance to be stopped by stop_process_timer() since
no timer is actually armed to be processed.

This process wide counter might bring some performance hit due to the
concurrent atomic additions at the thread group scope.

The following snippet is enough to trigger the issue.

void trigger_process_counter(void)
{
timer_t id;
struct itimerspec val = { };

timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
timer_settime(id, TIMER_ABSTIME, &val, NULL);
timer_delete(id);
}

So make sure we don't needlessly start it.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Eric W. Biederman <[email protected]>
---
kernel/time/posix-cpu-timers.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index aa52fc85dbcb..132fd56fb1cd 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -632,10 +632,15 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
* times (in arm_timer). With an absolute time, we must
* check if it's already passed. In short, we need a sample.
*/
- if (CPUCLOCK_PERTHREAD(timer->it_clock))
+ if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
val = cpu_clock_sample(clkid, p);
- else
- val = cpu_clock_sample_group(clkid, p, true);
+ } else {
+ /*
+ * Sample group but only start the process wide cputime counter
+ * if the timer is to be enabled.
+ */
+ val = cpu_clock_sample_group(clkid, p, !!new_expires);
+ }

if (old) {
if (old_expires == 0) {
--
2.25.1

2021-06-04 11:34:10

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 3/6] posix-cpu-timers: Force next_expiration recalc after timer deletion

A timer deletion only dequeues the timer but it doesn't shutdown
the related costly process wide cputimer counter and the tick dependency.

The following code snippet keeps this overhead around for one week after
the timer deletion:

void trigger_process_counter(void)
{
timer_t id;
struct itimerspec val = { };

val.it_value.tv_sec = 604800;
timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
timer_settime(id, 0, &val, NULL);
timer_delete(id);
}

Make sure the next target's tick recalculates the nearest expiration and
clears the process wide counter and tick dependency if necessary.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Eric W. Biederman <[email protected]>
---
include/linux/posix-timers.h | 4 +++-
kernel/time/posix-cpu-timers.c | 29 ++++++++++++++++++++++++++++-
2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 896c16d2c5fb..4cf1fbe8d1bc 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -82,12 +82,14 @@ static inline bool cpu_timer_enqueue(struct timerqueue_head *head,
return timerqueue_add(head, &ctmr->node);
}

-static inline void cpu_timer_dequeue(struct cpu_timer *ctmr)
+static inline bool cpu_timer_dequeue(struct cpu_timer *ctmr)
{
if (ctmr->head) {
timerqueue_del(ctmr->head, &ctmr->node);
ctmr->head = NULL;
+ return true;
}
+ return false;
}

static inline u64 cpu_timer_getexpires(struct cpu_timer *ctmr)
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 132fd56fb1cd..bb1f862c785e 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -405,6 +405,33 @@ static int posix_cpu_timer_create(struct k_itimer *new_timer)
return 0;
}

+/*
+ * Dequeue the timer and reset the base if it was its earliest expiration.
+ * It makes sure the next tick recalculates the base next expiration so we
+ * don't keep the costly process wide cputime counter around for a random
+ * amount of time, along with the tick dependency.
+ */
+static void disarm_timer(struct k_itimer *timer, struct task_struct *p)
+{
+ struct cpu_timer *ctmr = &timer->it.cpu;
+ struct posix_cputimer_base *base;
+ int clkidx;
+
+ if (!cpu_timer_dequeue(ctmr))
+ return;
+
+ clkidx = CPUCLOCK_WHICH(timer->it_clock);
+
+ if (CPUCLOCK_PERTHREAD(timer->it_clock))
+ base = p->posix_cputimers.bases + clkidx;
+ else
+ base = p->signal->posix_cputimers.bases + clkidx;
+
+ if (cpu_timer_getexpires(ctmr) == base->nextevt)
+ base->nextevt = 0;
+}
+
+
/*
* Clean up a CPU-clock timer that is about to be destroyed.
* This is called from timer deletion with the timer already locked.
@@ -439,7 +466,7 @@ static int posix_cpu_timer_del(struct k_itimer *timer)
if (timer->it.cpu.firing)
ret = TIMER_RETRY;
else
- cpu_timer_dequeue(ctmr);
+ disarm_timer(timer, p);

unlock_task_sighand(p, &flags);
}
--
2.25.1

2021-06-04 11:34:26

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 4/6] posix-cpu-timers: Force next_expiration recalc after timer reset

A timer reset only dequeues the timer but it doesn't shutdown
the related costly process wide cputimer counter and the tick dependency.

The following code snippet keeps this overhead around for one week after
the timer reset:

void trigger_process_counter(void)
{
timer_t id;
struct itimerspec val = { };

val.it_value.tv_sec = 1;
timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
timer_settime(id, 0, &val, NULL);
val.it_value.tv_sec = 0;
timer_settime(id, 0, &val, NULL);
}

Make sure the next target's tick recalculates the nearest expiration and
clears the process wide counter and tick dependency if necessary.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Eric W. Biederman <[email protected]>
---
kernel/time/posix-cpu-timers.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index bb1f862c785e..0b5715c8db04 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -414,10 +414,14 @@ static int posix_cpu_timer_create(struct k_itimer *new_timer)
static void disarm_timer(struct k_itimer *timer, struct task_struct *p)
{
struct cpu_timer *ctmr = &timer->it.cpu;
+ u64 old_expires = cpu_timer_getexpires(ctmr);
struct posix_cputimer_base *base;
+ bool queued;
int clkidx;

- if (!cpu_timer_dequeue(ctmr))
+ queued = cpu_timer_dequeue(ctmr);
+ cpu_timer_setexpires(ctmr, 0);
+ if (!queued)
return;

clkidx = CPUCLOCK_WHICH(timer->it_clock);
@@ -427,7 +431,7 @@ static void disarm_timer(struct k_itimer *timer, struct task_struct *p)
else
base = p->signal->posix_cputimers.bases + clkidx;

- if (cpu_timer_getexpires(ctmr) == base->nextevt)
+ if (old_expires == base->nextevt)
base->nextevt = 0;
}

@@ -647,8 +651,6 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
if (unlikely(timer->it.cpu.firing)) {
timer->it.cpu.firing = -1;
ret = TIMER_RETRY;
- } else {
- cpu_timer_dequeue(ctmr);
}

/*
@@ -713,9 +715,13 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
* For a timer with no notification action, we don't actually
* arm the timer (we'll just fake it for timer_gettime).
*/
- cpu_timer_setexpires(ctmr, new_expires);
- if (new_expires != 0 && val < new_expires) {
- arm_timer(timer, p);
+ if (new_expires != 0) {
+ cpu_timer_dequeue(ctmr);
+ cpu_timer_setexpires(ctmr, new_expires);
+ if (val < new_expires)
+ arm_timer(timer, p);
+ } else {
+ disarm_timer(timer, p);
}

unlock_task_sighand(p, &flags);
--
2.25.1

2021-06-04 11:34:33

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 5/6] posix-cpu-timers: Force next expiration recalc after early timer firing

If we fire a per-process oneshot timer early and inline from the actual
call to timer_settime(), two issues can happen:

1) If the timer was initially deactivated, this call to timer_settime()
may have started the process wide cputime counter even though the
timer hasn't been queued and armed. As a result the process wide
cputime counter may never stop until a new timer is ever armed in
the future.

The following code snippet can reproduce this:

void trigger_process_counter(void)
{
timer_t id;
struct itimerspec val = { };

signal(SIGALRM, SIG_IGN);
timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
val.it_value.tv_nsec = 1;
timer_settime(id, TIMER_ABSTIME, &val, NULL);
}

2) If the timer was initially armed with a former expiration value
before this call to timer_settime(), it must have been dequeued
before firing inline with its new expiration value. Yet it hasn't
been disarmed in this case. So the process wide cputime counter and
the tick dependency may still be around for a while even after the
timer fired.

The following code snippet can reproduce this:

void trigger_process_counter(void)
{
timer_t id;
struct itimerspec val = { };

signal(SIGALRM, SIG_IGN);
timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
val.it_value.tv_sec = 100;
timer_settime(id, TIMER_ABSTIME, &val, NULL);
val.it_value.tv_sec = 0;
val.it_value.tv_nsec = 1;
timer_settime(id, TIMER_ABSTIME, &val, NULL);
}

To solve the first case, the base next event value needs to be explicilty
reset so that the target's next tick deactivates the process cputime
counter if necessary.

To solve the second case, the timer with its former value is explicitly
disarmed and not just dequeued so that the target's next tick deactivates
the process cputime counter and the tick dependency if necessary.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Eric W. Biederman <[email protected]>
---
include/linux/posix-timers.h | 7 ++++-
kernel/time/posix-cpu-timers.c | 51 ++++++++++++++++++++++++----------
2 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 4cf1fbe8d1bc..00fef0064355 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -82,9 +82,14 @@ static inline bool cpu_timer_enqueue(struct timerqueue_head *head,
return timerqueue_add(head, &ctmr->node);
}

+static inline bool cpu_timer_queued(struct cpu_timer *ctmr)
+{
+ return !!ctmr->head;
+}
+
static inline bool cpu_timer_dequeue(struct cpu_timer *ctmr)
{
- if (ctmr->head) {
+ if (cpu_timer_queued(ctmr)) {
timerqueue_del(ctmr->head, &ctmr->node);
ctmr->head = NULL;
return true;
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0b5715c8db04..d8325a906314 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -405,6 +405,21 @@ static int posix_cpu_timer_create(struct k_itimer *new_timer)
return 0;
}

+static void __disarm_timer(struct k_itimer *timer, struct task_struct *p,
+ u64 old_expires)
+{
+ int clkidx = CPUCLOCK_WHICH(timer->it_clock);
+ struct posix_cputimer_base *base;
+
+ if (CPUCLOCK_PERTHREAD(timer->it_clock))
+ base = p->posix_cputimers.bases + clkidx;
+ else
+ base = p->signal->posix_cputimers.bases + clkidx;
+
+ if (old_expires == base->nextevt)
+ base->nextevt = 0;
+}
+
/*
* Dequeue the timer and reset the base if it was its earliest expiration.
* It makes sure the next tick recalculates the base next expiration so we
@@ -415,24 +430,14 @@ static void disarm_timer(struct k_itimer *timer, struct task_struct *p)
{
struct cpu_timer *ctmr = &timer->it.cpu;
u64 old_expires = cpu_timer_getexpires(ctmr);
- struct posix_cputimer_base *base;
bool queued;
- int clkidx;

queued = cpu_timer_dequeue(ctmr);
cpu_timer_setexpires(ctmr, 0);
if (!queued)
return;

- clkidx = CPUCLOCK_WHICH(timer->it_clock);
-
- if (CPUCLOCK_PERTHREAD(timer->it_clock))
- base = p->posix_cputimers.bases + clkidx;
- else
- base = p->signal->posix_cputimers.bases + clkidx;
-
- if (old_expires == base->nextevt)
- base->nextevt = 0;
+ __disarm_timer(timer, p, old_expires);
}


@@ -686,8 +691,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
u64 exp = bump_cpu_timer(timer, val);

if (val < exp) {
- old_expires = exp - val;
- old->it_value = ns_to_timespec64(old_expires);
+ old->it_value = ns_to_timespec64(exp - val);
} else {
old->it_value.tv_nsec = 1;
old->it_value.tv_sec = 0;
@@ -748,9 +752,28 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
* accumulate more time on this clock.
*/
cpu_timer_fire(timer);
+
+ sighand = lock_task_sighand(p, &flags);
+ if (sighand == NULL)
+ goto out;
+ if (!cpu_timer_queued(&timer->it.cpu)) {
+ /*
+ * Disarm the previous timer to deactivate the tick
+ * dependency and process wide cputime counter if
+ * necessary.
+ */
+ __disarm_timer(timer, p, old_expires);
+ /*
+ * If the previous timer was deactivated, we might have
+ * just started the process wide cputime counter. Make
+ * sure we poke the tick to deactivate it then.
+ */
+ if (!old_expires && !CPUCLOCK_PERTHREAD(timer->it_clock))
+ p->signal->posix_cputimers.bases[clkid].nextevt = 0;
+ }
+ unlock_task_sighand(p, &flags);
}

- ret = 0;
out:
rcu_read_unlock();
if (old)
--
2.25.1

2021-06-04 11:35:17

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 6/6] posix-cpu-timers: Force next expiration recalc after itimer reset

When an itimer deactivates a previously armed expiration, it simply
doesn't do anything. As a result the process wide cputime counter keeps
running and the tick dependency stays set until we reach the old
ghost expiration value.

This can be reproduced with the following snippet:

void trigger_process_counter(void)
{
struct itimerval n = {};

n.it_value.tv_sec = 100;
setitimer(ITIMER_VIRTUAL, &n, NULL);
n.it_value.tv_sec = 0;
setitimer(ITIMER_VIRTUAL, &n, NULL);
}

Fix this with resetting the relevant base expiration. This is similar
to disarming a timer.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Eric W. Biederman <[email protected]>
---
kernel/time/posix-cpu-timers.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index d8325a906314..8a1ec78a7ebb 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1407,8 +1407,6 @@ void set_process_cpu_timer(struct task_struct *tsk, unsigned int clkid,
}
}

- if (!*newval)
- return;
*newval += now;
}

--
2.25.1

2021-06-09 17:22:15

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 1/6] posix-cpu-timers: Fix rearm racing against process tick

On Fri, Jun 04, 2021 at 01:31:54PM +0200, Frederic Weisbecker wrote:
> Since the process wide cputime counter is started locklessly from
> posix_cpu_timer_rearm(), it can be concurrently stopped by operations
> on other timers from the same thread group, such as in the following
> unlucky scenario:
>
> CPU 0 CPU 1
> ----- -----
> timer_settime(TIMER B)
> posix_cpu_timer_rearm(TIMER A)
> cpu_clock_sample_group()
> (pct->timers_active already true)
>
> handle_posix_cpu_timers()
> check_process_timers()
> stop_process_timers()
> pct->timers_active = false
> arm_timer(TIMER A)
>
> tick -> run_posix_cpu_timers()
> // sees !pct->timers_active, ignore
> // our TIMER A
>
> Fix this with simply locking process wide cputime counting start and
> timer arm in the same block.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra (Intel) <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Eric W. Biederman <[email protected]>

Fixes: 60f2ceaa8111 ("posix-cpu-timers: Remove unnecessary locking around cpu_clock_sample_group")
Cc: [email protected]

2021-06-09 18:43:46

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 2/6] posix-cpu-timers: Don't start process wide cputime counter if timer is disabled

On Fri, Jun 04, 2021 at 01:31:55PM +0200, Frederic Weisbecker wrote:
> If timer_settime() is called with a 0 expiration on a timer that is
> already disabled, the process wide cputime counter will be started
> and won't ever get a chance to be stopped by stop_process_timer() since
> no timer is actually armed to be processed.
>
> This process wide counter might bring some performance hit due to the
> concurrent atomic additions at the thread group scope.
>
> The following snippet is enough to trigger the issue.
>
> void trigger_process_counter(void)
> {
> timer_t id;
> struct itimerspec val = { };
>
> timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
> timer_settime(id, TIMER_ABSTIME, &val, NULL);
> timer_delete(id);
> }
>
> So make sure we don't needlessly start it.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra (Intel) <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Eric W. Biederman <[email protected]>

No Fixes tag for this one. It has been there since year 1 AG.

I suspect it's the same for most other commits in the series, checking...

2021-06-10 10:25:45

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 2/6] posix-cpu-timers: Don't start process wide cputime counter if timer is disabled

On Wed, Jun 09, 2021 at 02:18:34PM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 04, 2021 at 01:31:55PM +0200, Frederic Weisbecker wrote:
> > If timer_settime() is called with a 0 expiration on a timer that is
> > already disabled, the process wide cputime counter will be started
> > and won't ever get a chance to be stopped by stop_process_timer() since
> > no timer is actually armed to be processed.
> >
> > This process wide counter might bring some performance hit due to the
> > concurrent atomic additions at the thread group scope.
> >
> > The following snippet is enough to trigger the issue.
> >
> > void trigger_process_counter(void)
> > {
> > timer_t id;
> > struct itimerspec val = { };
> >
> > timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
> > timer_settime(id, TIMER_ABSTIME, &val, NULL);
> > timer_delete(id);
> > }
> >
> > So make sure we don't needlessly start it.
> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Oleg Nesterov <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Peter Zijlstra (Intel) <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Eric W. Biederman <[email protected]>
>
> No Fixes tag for this one. It has been there since year 1 AG.
>
> I suspect it's the same for most other commits in the series, checking...

Right, so only the first commit needs one.

Thanks.

2021-06-11 11:53:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/6] posix-cpu-timers: Fix rearm racing against process tick

On Wed, Jun 09, 2021 at 01:54:00PM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 04, 2021 at 01:31:54PM +0200, Frederic Weisbecker wrote:
> > Since the process wide cputime counter is started locklessly from
> > posix_cpu_timer_rearm(), it can be concurrently stopped by operations
> > on other timers from the same thread group, such as in the following
> > unlucky scenario:
> >
> > CPU 0 CPU 1
> > ----- -----
> > timer_settime(TIMER B)
> > posix_cpu_timer_rearm(TIMER A)
> > cpu_clock_sample_group()
> > (pct->timers_active already true)
> >
> > handle_posix_cpu_timers()
> > check_process_timers()
> > stop_process_timers()
> > pct->timers_active = false
> > arm_timer(TIMER A)
> >
> > tick -> run_posix_cpu_timers()
> > // sees !pct->timers_active, ignore
> > // our TIMER A
> >
> > Fix this with simply locking process wide cputime counting start and
> > timer arm in the same block.
> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Oleg Nesterov <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Peter Zijlstra (Intel) <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Eric W. Biederman <[email protected]>
>
> Fixes: 60f2ceaa8111 ("posix-cpu-timers: Remove unnecessary locking around cpu_clock_sample_group")
> Cc: [email protected]

Acked-by: Peter Zijlstra (Intel) <[email protected]>


Problem seems to be calling cpu_clock_sample_group(.start = true)
without sighand locked. Do we want a lockdep assertion for that?

2021-06-11 12:40:19

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 1/6] posix-cpu-timers: Fix rearm racing against process tick

On Fri, Jun 11, 2021 at 01:49:08PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 09, 2021 at 01:54:00PM +0200, Frederic Weisbecker wrote:
> > On Fri, Jun 04, 2021 at 01:31:54PM +0200, Frederic Weisbecker wrote:
> > > Since the process wide cputime counter is started locklessly from
> > > posix_cpu_timer_rearm(), it can be concurrently stopped by operations
> > > on other timers from the same thread group, such as in the following
> > > unlucky scenario:
> > >
> > > CPU 0 CPU 1
> > > ----- -----
> > > timer_settime(TIMER B)
> > > posix_cpu_timer_rearm(TIMER A)
> > > cpu_clock_sample_group()
> > > (pct->timers_active already true)
> > >
> > > handle_posix_cpu_timers()
> > > check_process_timers()
> > > stop_process_timers()
> > > pct->timers_active = false
> > > arm_timer(TIMER A)
> > >
> > > tick -> run_posix_cpu_timers()
> > > // sees !pct->timers_active, ignore
> > > // our TIMER A
> > >
> > > Fix this with simply locking process wide cputime counting start and
> > > timer arm in the same block.
> > >
> > > Signed-off-by: Frederic Weisbecker <[email protected]>
> > > Cc: Oleg Nesterov <[email protected]>
> > > Cc: Thomas Gleixner <[email protected]>
> > > Cc: Peter Zijlstra (Intel) <[email protected]>
> > > Cc: Ingo Molnar <[email protected]>
> > > Cc: Eric W. Biederman <[email protected]>
> >
> > Fixes: 60f2ceaa8111 ("posix-cpu-timers: Remove unnecessary locking around cpu_clock_sample_group")
> > Cc: [email protected]
>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>
>
>
> Problem seems to be calling cpu_clock_sample_group(.start = true)
> without sighand locked. Do we want a lockdep assertion for that?

It's part of the problem. The other part is that it must be locked in the
same sequence than arm_timer(). So yes, a lockdep assertion would already be
a good indicator that something goes wrong.

Thanks.

2021-06-16 08:55:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/6] posix-cpu-timers: Don't start process wide cputime counter if timer is disabled

On Fri, Jun 04, 2021 at 01:31:55PM +0200, Frederic Weisbecker wrote:
> If timer_settime() is called with a 0 expiration on a timer that is
> already disabled, the process wide cputime counter will be started
> and won't ever get a chance to be stopped by stop_process_timer() since
> no timer is actually armed to be processed.
>
> This process wide counter might bring some performance hit due to the
> concurrent atomic additions at the thread group scope.
>
> The following snippet is enough to trigger the issue.
>
> void trigger_process_counter(void)
> {
> timer_t id;
> struct itimerspec val = { };
>
> timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
> timer_settime(id, TIMER_ABSTIME, &val, NULL);
> timer_delete(id);
> }
>
> So make sure we don't needlessly start it.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra (Intel) <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Eric W. Biederman <[email protected]>
> ---
> kernel/time/posix-cpu-timers.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
> index aa52fc85dbcb..132fd56fb1cd 100644
> --- a/kernel/time/posix-cpu-timers.c
> +++ b/kernel/time/posix-cpu-timers.c
> @@ -632,10 +632,15 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
> * times (in arm_timer). With an absolute time, we must
> * check if it's already passed. In short, we need a sample.
> */
> - if (CPUCLOCK_PERTHREAD(timer->it_clock))
> + if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
> val = cpu_clock_sample(clkid, p);
> - else
> - val = cpu_clock_sample_group(clkid, p, true);
> + } else {
> + /*
> + * Sample group but only start the process wide cputime counter
> + * if the timer is to be enabled.
> + */
> + val = cpu_clock_sample_group(clkid, p, !!new_expires);
> + }

The cpu_timer_enqueue() is in arm_timer() and the condition for calling
that is:

'new_expires != 0 && val < new_expires'

Which is not the same as the one you add.

I'm thinking the fundamental problem here is the disconnect between
cpu_timer_enqueue() and pct->timers_active ?

2021-06-16 10:53:53

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 2/6] posix-cpu-timers: Don't start process wide cputime counter if timer is disabled

On Wed, Jun 16, 2021 at 10:51:21AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 04, 2021 at 01:31:55PM +0200, Frederic Weisbecker wrote:
> > If timer_settime() is called with a 0 expiration on a timer that is
> > already disabled, the process wide cputime counter will be started
> > and won't ever get a chance to be stopped by stop_process_timer() since
> > no timer is actually armed to be processed.
> >
> > This process wide counter might bring some performance hit due to the
> > concurrent atomic additions at the thread group scope.
> >
> > The following snippet is enough to trigger the issue.
> >
> > void trigger_process_counter(void)
> > {
> > timer_t id;
> > struct itimerspec val = { };
> >
> > timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
> > timer_settime(id, TIMER_ABSTIME, &val, NULL);
> > timer_delete(id);
> > }
> >
> > So make sure we don't needlessly start it.
> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Oleg Nesterov <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Peter Zijlstra (Intel) <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Eric W. Biederman <[email protected]>
> > ---
> > kernel/time/posix-cpu-timers.c | 11 ++++++++---
> > 1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
> > index aa52fc85dbcb..132fd56fb1cd 100644
> > --- a/kernel/time/posix-cpu-timers.c
> > +++ b/kernel/time/posix-cpu-timers.c
> > @@ -632,10 +632,15 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
> > * times (in arm_timer). With an absolute time, we must
> > * check if it's already passed. In short, we need a sample.
> > */
> > - if (CPUCLOCK_PERTHREAD(timer->it_clock))
> > + if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
> > val = cpu_clock_sample(clkid, p);
> > - else
> > - val = cpu_clock_sample_group(clkid, p, true);
> > + } else {
> > + /*
> > + * Sample group but only start the process wide cputime counter
> > + * if the timer is to be enabled.
> > + */
> > + val = cpu_clock_sample_group(clkid, p, !!new_expires);
> > + }
>
> The cpu_timer_enqueue() is in arm_timer() and the condition for calling
> that is:
>
> 'new_expires != 0 && val < new_expires'
>
> Which is not the same as the one you add.

There are two different things here:

1) the threadgroup cputime counter, activated by cpu_clock_sample_group(clkid,
p, true)

2) the expiration set (+ the callback enqueued) in arm_timer()

The issue here is that we go through 1) but not through 2)

>
> I'm thinking the fundamental problem here is the disconnect between
> cpu_timer_enqueue() and pct->timers_active ?

You're right it's the core issue. But what prevents the whole to be
fundamentally connected is a circular dependency: we need to know the
threadgroup cputime before arming the timer, but we would need to know
if we arm the timer before starting the threadgroup cputime counter

To sum up, the current sequence is:

* fetch the threadgroup cputime AND start the whole threadgroup counter

* arm the timer if it isn't zero and it hasn't yet expired

While the ideal sequence should be:

* fetch the threadgroup cputime (without starting the whole threadgroup counter
yet)

* arm the timer if it isn't zero and it hasn't yet expired

* iff we armed the timer, start the whole theadgroup counter

But that means re-iterating the whole threadgroup and update atomically
the group counter with each task's time.

2021-06-16 11:43:37

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 4/6] posix-cpu-timers: Force next_expiration recalc after timer reset

On Wed, Jun 16, 2021 at 11:23:33AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 04, 2021 at 01:31:57PM +0200, Frederic Weisbecker wrote:
>
> > @@ -647,8 +651,6 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
> > if (unlikely(timer->it.cpu.firing)) {
> > timer->it.cpu.firing = -1;
> > ret = TIMER_RETRY;
> > - } else {
> > - cpu_timer_dequeue(ctmr);
> > }
> >
> > /*
> > @@ -713,9 +715,13 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
> > * For a timer with no notification action, we don't actually
> > * arm the timer (we'll just fake it for timer_gettime).
> > */
> > - cpu_timer_setexpires(ctmr, new_expires);
> > - if (new_expires != 0 && val < new_expires) {
> > - arm_timer(timer, p);
> > + if (new_expires != 0) {
> > + cpu_timer_dequeue(ctmr);
> > + cpu_timer_setexpires(ctmr, new_expires);
> > + if (val < new_expires)
> > + arm_timer(timer, p);
> > + } else {
> > + disarm_timer(timer, p);
> > }
> >
> > unlock_task_sighand(p, &flags);
>
> AFAICT there's an error path in between where you've removed
> cpu_timer_dequeue() and added it back. This error path will now leave
> the timer enqueued.

Ah that's the case where the timer is firing. In this case it can't be queued
anyway. Also it's a retry path so we'll eventually dequeue it in any case
(should it be concurrently requeued after firing).

Thanks.

2021-06-16 11:43:37

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/6] posix-cpu-timers: Don't start process wide cputime counter if timer is disabled

On Wed, Jun 16, 2021 at 12:51:16PM +0200, Frederic Weisbecker wrote:
> On Wed, Jun 16, 2021 at 10:51:21AM +0200, Peter Zijlstra wrote:

> > The cpu_timer_enqueue() is in arm_timer() and the condition for calling
> > that is:
> >
> > 'new_expires != 0 && val < new_expires'
> >
> > Which is not the same as the one you add.
>
> There are two different things here:
>
> 1) the threadgroup cputime counter, activated by cpu_clock_sample_group(clkid,
> p, true)
>
> 2) the expiration set (+ the callback enqueued) in arm_timer()
>
> The issue here is that we go through 1) but not through 2)

Correct, but then I would think the cleanup would need the same
conditions as 2, and not something slightly different, which is what
confused me.

> > I'm thinking the fundamental problem here is the disconnect between
> > cpu_timer_enqueue() and pct->timers_active ?
>
> You're right it's the core issue. But what prevents the whole to be
> fundamentally connected is a circular dependency: we need to know the
> threadgroup cputime before arming the timer, but we would need to know
> if we arm the timer before starting the threadgroup cputime counter
>
> To sum up, the current sequence is:
>
> * fetch the threadgroup cputime AND start the whole threadgroup counter
>
> * arm the timer if it isn't zero and it hasn't yet expired
>
> While the ideal sequence should be:
>
> * fetch the threadgroup cputime (without starting the whole threadgroup counter
> yet)
>
> * arm the timer if it isn't zero and it hasn't yet expired
>
> * iff we armed the timer, start the whole theadgroup counter
>
> But that means re-iterating the whole threadgroup and update atomically
> the group counter with each task's time.

Right, so by the time patch #5 comes around, you seem to be at the point
where you can do:

* fetch cputime and start threadgroup counter

* possibly arm timer

* if expired:
- fire now
- if armed, disarm (which leads to stop)

Which is the other 'obvious' solution to not starting it.

2021-06-16 12:27:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/6] posix-cpu-timers: Don't start process wide cputime counter if timer is disabled

On Wed, Jun 16, 2021 at 01:26:30PM +0200, Peter Zijlstra wrote:
> Right, so by the time patch #5 comes around, you seem to be at the point
> where you can do:
>
> * fetch cputime and start threadgroup counter
>
> * possibly arm timer

- possibly

>
> * if expired:
> - fire now
> - if armed, disarm (which leads to stop)
>
> Which is the other 'obvious' solution to not starting it.

So we unconditionally start and arm, and then have the early expire do
the same things as regular expire.

2021-06-16 13:21:36

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 5/6] posix-cpu-timers: Force next expiration recalc after early timer firing

On Wed, Jun 16, 2021 at 11:42:53AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 04, 2021 at 01:31:58PM +0200, Frederic Weisbecker wrote:
> > diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
> > index 0b5715c8db04..d8325a906314 100644
> > --- a/kernel/time/posix-cpu-timers.c
> > +++ b/kernel/time/posix-cpu-timers.c
> > @@ -405,6 +405,21 @@ static int posix_cpu_timer_create(struct k_itimer *new_timer)
> > return 0;
> > }
> >
> > +static void __disarm_timer(struct k_itimer *timer, struct task_struct *p,
> > + u64 old_expires)
> > +{
> > + int clkidx = CPUCLOCK_WHICH(timer->it_clock);
> > + struct posix_cputimer_base *base;
> > +
> > + if (CPUCLOCK_PERTHREAD(timer->it_clock))
> > + base = p->posix_cputimers.bases + clkidx;
> > + else
> > + base = p->signal->posix_cputimers.bases + clkidx;
> > +
> > + if (old_expires == base->nextevt)
> > + base->nextevt = 0;
> > +}
> > +
> > /*
> > * Dequeue the timer and reset the base if it was its earliest expiration.
> > * It makes sure the next tick recalculates the base next expiration so we
> > @@ -415,24 +430,14 @@ static void disarm_timer(struct k_itimer *timer, struct task_struct *p)
> > {
> > struct cpu_timer *ctmr = &timer->it.cpu;
> > u64 old_expires = cpu_timer_getexpires(ctmr);
> > - struct posix_cputimer_base *base;
> > bool queued;
> > - int clkidx;
> >
> > queued = cpu_timer_dequeue(ctmr);
> > cpu_timer_setexpires(ctmr, 0);
> > if (!queued)
> > return;
> >
> > - clkidx = CPUCLOCK_WHICH(timer->it_clock);
> > -
> > - if (CPUCLOCK_PERTHREAD(timer->it_clock))
> > - base = p->posix_cputimers.bases + clkidx;
> > - else
> > - base = p->signal->posix_cputimers.bases + clkidx;
> > -
> > - if (old_expires == base->nextevt)
> > - base->nextevt = 0;
> > + __disarm_timer(timer, p, old_expires);
> > }
> >
> >
> > @@ -686,8 +691,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
> > u64 exp = bump_cpu_timer(timer, val);
> >
> > if (val < exp) {
> > - old_expires = exp - val;
> > - old->it_value = ns_to_timespec64(old_expires);
> > + old->it_value = ns_to_timespec64(exp - val);
> > } else {
> > old->it_value.tv_nsec = 1;
> > old->it_value.tv_sec = 0;
> > @@ -748,9 +752,28 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
> > * accumulate more time on this clock.
> > */
> > cpu_timer_fire(timer);
> > +
> > + sighand = lock_task_sighand(p, &flags);
> > + if (sighand == NULL)
> > + goto out;
> > + if (!cpu_timer_queued(&timer->it.cpu)) {
> > + /*
> > + * Disarm the previous timer to deactivate the tick
> > + * dependency and process wide cputime counter if
> > + * necessary.
> > + */
> > + __disarm_timer(timer, p, old_expires);
> > + /*
> > + * If the previous timer was deactivated, we might have
> > + * just started the process wide cputime counter. Make
> > + * sure we poke the tick to deactivate it then.
> > + */
> > + if (!old_expires && !CPUCLOCK_PERTHREAD(timer->it_clock))
> > + p->signal->posix_cputimers.bases[clkid].nextevt = 0;
> > + }
> > + unlock_task_sighand(p, &flags);
> > }
>
> I'm thinking this is a better fix than patch #2. AFAICT you can now go
> back to unconditionally doing start, and then if we fire it early, we'll
> disarm the thing.
>
> That would avoid the disconnect between the start condition and the fire
> condition.

Right but the drawback is that we unconditionally start the threadgroup
counter while initializing the timer to 0 (deactivated).

Then in the next tick at least one thread will need to lock the sighand
and re-evaluate the whole list.

2021-06-16 20:23:25

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 5/6] posix-cpu-timers: Force next expiration recalc after early timer firing

On Wed, Jun 16, 2021 at 03:23:50PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 16, 2021 at 01:59:23PM +0200, Frederic Weisbecker wrote:
> > On Wed, Jun 16, 2021 at 11:42:53AM +0200, Peter Zijlstra wrote:
> > > I'm thinking this is a better fix than patch #2. AFAICT you can now go
> > > back to unconditionally doing start, and then if we fire it early, we'll
> > > disarm the thing.
> > >
> > > That would avoid the disconnect between the start condition and the fire
> > > condition.
> >
> > Right but the drawback is that we unconditionally start the threadgroup
> > counter while initializing the timer to 0 (deactivated).
> >
> > Then in the next tick at least one thread will need to lock the sighand
> > and re-evaluate the whole list.
>
> Yes.. but how common is it to enqueue expired timers? Surely that's an
> unlikely corner case. All normal timers will have to suffer one extra
> tick and iteration on exit, so I find it hard to justify complexity to
> optimize an unlikely case.
>
> I would rather have more obvious code.

Ok, I'm having a try at it.

Thanks!