2023-03-16 12:32:01

by Marco Elver

[permalink] [raw]
Subject: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

From: Dmitry Vyukov <[email protected]>

POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main
thread of a thread group for signal delivery. However, this has a
significant downside: it requires waking up a potentially idle thread.

Instead, prefer to deliver signals to the current thread (in the same
thread group) if SIGEV_THREAD_ID is not set by the user. This does not
change guaranteed semantics, since POSIX process CPU time timers have
never guaranteed that signal delivery is to a specific thread (without
SIGEV_THREAD_ID set).

The effect is that we no longer wake up potentially idle threads, and
the kernel is no longer biased towards delivering the timer signal to
any particular thread (which better distributes the timer signals esp.
when multiple timers fire concurrently).

Signed-off-by: Dmitry Vyukov <[email protected]>
Suggested-by: Oleg Nesterov <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
---
v6:
- Split test from this patch.
- Update wording on what this patch aims to improve.

v5:
- Rebased onto v6.2.

v4:
- Restructured checks in send_sigqueue() as suggested.

v3:
- Switched to the completely different implementation (much simpler)
based on the Oleg's idea.

RFC v2:
- Added additional Cc as Thomas asked.
---
kernel/signal.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 8cb28f1df294..605445fa27d4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
/*
* Now find a thread we can wake up to take the signal off the queue.
*
- * If the main thread wants the signal, it gets first crack.
- * Probably the least surprising to the average bear.
+ * Try the suggested task first (may or may not be the main thread).
*/
if (wants_signal(sig, p))
t = p;
@@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)

ret = -1;
rcu_read_lock();
+ /*
+ * This function is used by POSIX timers to deliver a timer signal.
+ * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID
+ * set), the signal must be delivered to the specific thread (queues
+ * into t->pending).
+ *
+ * Where type is not PIDTYPE_PID, signals must just be delivered to the
+ * current process. In this case, prefer to deliver to current if it is
+ * in the same thread group as the target, as it avoids unnecessarily
+ * waking up a potentially idle task.
+ */
t = pid_task(pid, type);
- if (!t || !likely(lock_task_sighand(t, &flags)))
+ if (!t)
+ goto ret;
+ if (type != PIDTYPE_PID && same_thread_group(t, current))
+ t = current;
+ if (!likely(lock_task_sighand(t, &flags)))
goto ret;

ret = 1; /* the signal is ignored */
@@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
q->info.si_overrun = 0;

signalfd_notify(t, sig);
+ /*
+ * If the type is not PIDTYPE_PID, we just use shared_pending, which
+ * won't guarantee that the specified task will receive the signal, but
+ * is sufficient if t==current in the common case.
+ */
pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
list_add_tail(&q->list, &pending->list);
sigaddset(&pending->signal, sig);
--
2.40.0.rc1.284.g88254d51c5-goog



2023-03-16 12:32:08

by Marco Elver

[permalink] [raw]
Subject: [PATCH v6 2/2] selftests/timers/posix_timers: Test delivery of signals across threads

From: Dmitry Vyukov <[email protected]>

Test that POSIX timers using CLOCK_PROCESS_CPUTIME_ID eventually deliver
a signal to all running threads. This effectively tests that the kernel
doesn't prefer any one thread (or subset of threads) for signal delivery.

Signed-off-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
---
v6:
- Update wording on what the test aims to test.
- Fix formatting per checkpatch.pl.
---
tools/testing/selftests/timers/posix_timers.c | 77 +++++++++++++++++++
1 file changed, 77 insertions(+)

diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index 0ba500056e63..8a17c0e8d82b 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -188,6 +188,80 @@ static int check_timer_create(int which)
return 0;
}

+int remain;
+__thread int got_signal;
+
+static void *distribution_thread(void *arg)
+{
+ while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+ return NULL;
+}
+
+static void distribution_handler(int nr)
+{
+ if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
+ __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
+}
+
+/*
+ * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
+ * timer signals. This primarily tests that the kernel does not favour any one.
+ */
+static int check_timer_distribution(void)
+{
+ int err, i;
+ timer_t id;
+ const int nthreads = 10;
+ pthread_t threads[nthreads];
+ struct itimerspec val = {
+ .it_value.tv_sec = 0,
+ .it_value.tv_nsec = 1000 * 1000,
+ .it_interval.tv_sec = 0,
+ .it_interval.tv_nsec = 1000 * 1000,
+ };
+
+ printf("Check timer_create() per process signal distribution... ");
+ fflush(stdout);
+
+ remain = nthreads + 1; /* worker threads + this thread */
+ signal(SIGALRM, distribution_handler);
+ err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
+ if (err < 0) {
+ perror("Can't create timer\n");
+ return -1;
+ }
+ err = timer_settime(id, 0, &val, NULL);
+ if (err < 0) {
+ perror("Can't set timer\n");
+ return -1;
+ }
+
+ for (i = 0; i < nthreads; i++) {
+ if (pthread_create(&threads[i], NULL, distribution_thread, NULL)) {
+ perror("Can't create thread\n");
+ return -1;
+ }
+ }
+
+ /* Wait for all threads to receive the signal. */
+ while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+
+ for (i = 0; i < nthreads; i++) {
+ if (pthread_join(threads[i], NULL)) {
+ perror("Can't join thread\n");
+ return -1;
+ }
+ }
+
+ if (timer_delete(id)) {
+ perror("Can't delete timer\n");
+ return -1;
+ }
+
+ printf("[OK]\n");
+ return 0;
+}
+
int main(int argc, char **argv)
{
printf("Testing posix timers. False negative may happen on CPU execution \n");
@@ -217,5 +291,8 @@ int main(int argc, char **argv)
if (check_timer_create(CLOCK_PROCESS_CPUTIME_ID) < 0)
return ksft_exit_fail();

+ if (check_timer_distribution() < 0)
+ return ksft_exit_fail();
+
return ksft_exit_pass();
}
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-30 10:21:52

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Thu, 16 Mar 2023 at 13:31, Marco Elver <[email protected]> wrote:
>
> From: Dmitry Vyukov <[email protected]>
>
> POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main
> thread of a thread group for signal delivery. However, this has a
> significant downside: it requires waking up a potentially idle thread.
>
> Instead, prefer to deliver signals to the current thread (in the same
> thread group) if SIGEV_THREAD_ID is not set by the user. This does not
> change guaranteed semantics, since POSIX process CPU time timers have
> never guaranteed that signal delivery is to a specific thread (without
> SIGEV_THREAD_ID set).
>
> The effect is that we no longer wake up potentially idle threads, and
> the kernel is no longer biased towards delivering the timer signal to
> any particular thread (which better distributes the timer signals esp.
> when multiple timers fire concurrently).
>
> Signed-off-by: Dmitry Vyukov <[email protected]>
> Suggested-by: Oleg Nesterov <[email protected]>
> Reviewed-by: Oleg Nesterov <[email protected]>
> Signed-off-by: Marco Elver <[email protected]>

Gentle ping...

Thanks,
-- Marco

2023-04-06 14:14:31

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Thu, 16 Mar 2023 at 13:31, Marco Elver <[email protected]> wrote:
>
> From: Dmitry Vyukov <[email protected]>
>
> POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main
> thread of a thread group for signal delivery. However, this has a
> significant downside: it requires waking up a potentially idle thread.
>
> Instead, prefer to deliver signals to the current thread (in the same
> thread group) if SIGEV_THREAD_ID is not set by the user. This does not
> change guaranteed semantics, since POSIX process CPU time timers have
> never guaranteed that signal delivery is to a specific thread (without
> SIGEV_THREAD_ID set).
>
> The effect is that we no longer wake up potentially idle threads, and
> the kernel is no longer biased towards delivering the timer signal to
> any particular thread (which better distributes the timer signals esp.
> when multiple timers fire concurrently).
>
> Signed-off-by: Dmitry Vyukov <[email protected]>
> Suggested-by: Oleg Nesterov <[email protected]>
> Reviewed-by: Oleg Nesterov <[email protected]>
> Signed-off-by: Marco Elver <[email protected]>
> ---
> v6:
> - Split test from this patch.
> - Update wording on what this patch aims to improve.
>
> v5:
> - Rebased onto v6.2.
>
> v4:
> - Restructured checks in send_sigqueue() as suggested.
>
> v3:
> - Switched to the completely different implementation (much simpler)
> based on the Oleg's idea.
>
> RFC v2:
> - Added additional Cc as Thomas asked.
> ---
> kernel/signal.c | 25 ++++++++++++++++++++++---
> 1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 8cb28f1df294..605445fa27d4 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
> /*
> * Now find a thread we can wake up to take the signal off the queue.
> *
> - * If the main thread wants the signal, it gets first crack.
> - * Probably the least surprising to the average bear.
> + * Try the suggested task first (may or may not be the main thread).
> */
> if (wants_signal(sig, p))
> t = p;
> @@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
>
> ret = -1;
> rcu_read_lock();
> + /*
> + * This function is used by POSIX timers to deliver a timer signal.
> + * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID
> + * set), the signal must be delivered to the specific thread (queues
> + * into t->pending).
> + *
> + * Where type is not PIDTYPE_PID, signals must just be delivered to the
> + * current process. In this case, prefer to deliver to current if it is
> + * in the same thread group as the target, as it avoids unnecessarily
> + * waking up a potentially idle task.
> + */
> t = pid_task(pid, type);
> - if (!t || !likely(lock_task_sighand(t, &flags)))
> + if (!t)
> + goto ret;
> + if (type != PIDTYPE_PID && same_thread_group(t, current))
> + t = current;
> + if (!likely(lock_task_sighand(t, &flags)))
> goto ret;
>
> ret = 1; /* the signal is ignored */
> @@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
> q->info.si_overrun = 0;
>
> signalfd_notify(t, sig);
> + /*
> + * If the type is not PIDTYPE_PID, we just use shared_pending, which
> + * won't guarantee that the specified task will receive the signal, but
> + * is sufficient if t==current in the common case.
> + */
> pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
> list_add_tail(&q->list, &pending->list);
> sigaddset(&pending->signal, sig);
> --

One last semi-gentle ping. ;-)

1. We're seeing that in some applications that use POSIX timers
heavily, but where the main thread is mostly idle, the main thread
receives a disproportional amount of the signals along with being
woken up constantly. This is bad, because the main thread usually
waits with the help of a futex or really long sleeps. Now the main
thread will steal time (to go back to sleep) from another thread that
could have instead just proceeded with whatever it was doing.

2. Delivering signals to random threads is currently way too
expensive. We need to resort to this crazy algorithm: 1) receive timer
signal, 2) check if main thread, 3) if main thread (which is likely),
pick a random thread and do tgkill. To find a random thread, iterate
/proc/self/task, but that's just abysmal for various reasons. Other
alternatives, like inherited task clock perf events are too expensive
as soon as we need to enable/disable the timers (does IPIs), and
maintaining O(#threads) timers is just as horrible.

This patch solves both the above issues.

We acknowledge the unfortunate situation of attributing this patch to
one clear subsystem and owner: it straddles into signal delivery and
POSIX timers territory, and perhaps some scheduling. The patch itself
only touches kernel/signal.c.

If anyone has serious objections, please shout (soon'ish). Given the
patch has been reviewed by Oleg, and scrutinized by Dmitry and myself,
presumably we need to find a tree that currently takes kernel/signal.c
patches?

Thanks!

-- Marco

2023-04-06 15:26:41

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

Le Thu, Apr 06, 2023 at 04:12:04PM +0200, Marco Elver a ?crit :
> On Thu, 16 Mar 2023 at 13:31, Marco Elver <[email protected]> wrote:
> One last semi-gentle ping. ;-)
>
> 1. We're seeing that in some applications that use POSIX timers
> heavily, but where the main thread is mostly idle, the main thread
> receives a disproportional amount of the signals along with being
> woken up constantly. This is bad, because the main thread usually
> waits with the help of a futex or really long sleeps. Now the main
> thread will steal time (to go back to sleep) from another thread that
> could have instead just proceeded with whatever it was doing.
>
> 2. Delivering signals to random threads is currently way too
> expensive. We need to resort to this crazy algorithm: 1) receive timer
> signal, 2) check if main thread, 3) if main thread (which is likely),
> pick a random thread and do tgkill. To find a random thread, iterate
> /proc/self/task, but that's just abysmal for various reasons. Other
> alternatives, like inherited task clock perf events are too expensive
> as soon as we need to enable/disable the timers (does IPIs), and
> maintaining O(#threads) timers is just as horrible.
>
> This patch solves both the above issues.
>
> We acknowledge the unfortunate situation of attributing this patch to
> one clear subsystem and owner: it straddles into signal delivery and
> POSIX timers territory, and perhaps some scheduling. The patch itself
> only touches kernel/signal.c.
>
> If anyone has serious objections, please shout (soon'ish). Given the
> patch has been reviewed by Oleg, and scrutinized by Dmitry and myself,
> presumably we need to find a tree that currently takes kernel/signal.c
> patches?
>
> Thanks!

Thanks for the reminder!

In the very unlikely case Thomas ignores this before the next merge window,
I'll tentatively do a pull request to Linus.

Thanks.

>
> -- Marco

2023-04-06 20:26:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Thu, Mar 16, 2023 at 01:30:27PM +0100, Marco Elver wrote:
> From: Dmitry Vyukov <[email protected]>
>
> POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main
> thread of a thread group for signal delivery. However, this has a
> significant downside: it requires waking up a potentially idle thread.
>
> Instead, prefer to deliver signals to the current thread (in the same
> thread group) if SIGEV_THREAD_ID is not set by the user. This does not
> change guaranteed semantics, since POSIX process CPU time timers have
> never guaranteed that signal delivery is to a specific thread (without
> SIGEV_THREAD_ID set).
>
> The effect is that we no longer wake up potentially idle threads, and
> the kernel is no longer biased towards delivering the timer signal to
> any particular thread (which better distributes the timer signals esp.
> when multiple timers fire concurrently).
>
> Signed-off-by: Dmitry Vyukov <[email protected]>
> Suggested-by: Oleg Nesterov <[email protected]>
> Reviewed-by: Oleg Nesterov <[email protected]>
> Signed-off-by: Marco Elver <[email protected]>

Acked-by: Peter Zijlstra (Intel) <[email protected]>

> ---
> kernel/signal.c | 25 ++++++++++++++++++++++---
> 1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 8cb28f1df294..605445fa27d4 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
> /*
> * Now find a thread we can wake up to take the signal off the queue.
> *
> - * If the main thread wants the signal, it gets first crack.
> - * Probably the least surprising to the average bear.
> + * Try the suggested task first (may or may not be the main thread).
> */
> if (wants_signal(sig, p))
> t = p;
> @@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
>
> ret = -1;
> rcu_read_lock();
> + /*
> + * This function is used by POSIX timers to deliver a timer signal.
> + * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID
> + * set), the signal must be delivered to the specific thread (queues
> + * into t->pending).
> + *
> + * Where type is not PIDTYPE_PID, signals must just be delivered to the
> + * current process. In this case, prefer to deliver to current if it is
> + * in the same thread group as the target, as it avoids unnecessarily
> + * waking up a potentially idle task.
> + */
> t = pid_task(pid, type);
> - if (!t || !likely(lock_task_sighand(t, &flags)))
> + if (!t)
> + goto ret;
> + if (type != PIDTYPE_PID && same_thread_group(t, current))
> + t = current;
> + if (!likely(lock_task_sighand(t, &flags)))
> goto ret;
>
> ret = 1; /* the signal is ignored */
> @@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
> q->info.si_overrun = 0;
>
> signalfd_notify(t, sig);
> + /*
> + * If the type is not PIDTYPE_PID, we just use shared_pending, which
> + * won't guarantee that the specified task will receive the signal, but
> + * is sufficient if t==current in the common case.
> + */
> pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
> list_add_tail(&q->list, &pending->list);
> sigaddset(&pending->signal, sig);
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>

Subject: [tip: timers/core] posix-timers: Prefer delivery of signals to the current thread

The following commit has been merged into the timers/core branch of tip:

Commit-ID: bcb7ee79029dcaeb09668a4d1489de256829a7cc
Gitweb: https://git.kernel.org/tip/bcb7ee79029dcaeb09668a4d1489de256829a7cc
Author: Dmitry Vyukov <[email protected]>
AuthorDate: Thu, 16 Mar 2023 13:30:27 +01:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Sun, 16 Apr 2023 09:00:18 +02:00

posix-timers: Prefer delivery of signals to the current thread

POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main
thread of a thread group for signal delivery. However, this has a
significant downside: it requires waking up a potentially idle thread.

Instead, prefer to deliver signals to the current thread (in the same
thread group) if SIGEV_THREAD_ID is not set by the user. This does not
change guaranteed semantics, since POSIX process CPU time timers have
never guaranteed that signal delivery is to a specific thread (without
SIGEV_THREAD_ID set).

The effect is that queueing the signal no longer wakes up potentially idle
threads, and the kernel is no longer biased towards delivering the timer
signal to any particular thread (which better distributes the timer signals
esp. when multiple timers fire concurrently).

Suggested-by: Oleg Nesterov <[email protected]>
Signed-off-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
kernel/signal.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 8cb28f1..8f6330f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
/*
* Now find a thread we can wake up to take the signal off the queue.
*
- * If the main thread wants the signal, it gets first crack.
- * Probably the least surprising to the average bear.
+ * Try the suggested task first (may or may not be the main thread).
*/
if (wants_signal(sig, p))
t = p;
@@ -1970,8 +1969,24 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)

ret = -1;
rcu_read_lock();
+
+ /*
+ * This function is used by POSIX timers to deliver a timer signal.
+ * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID
+ * set), the signal must be delivered to the specific thread (queues
+ * into t->pending).
+ *
+ * Where type is not PIDTYPE_PID, signals must be delivered to the
+ * process. In this case, prefer to deliver to current if it is in
+ * the same thread group as the target process, which avoids
+ * unnecessarily waking up a potentially idle task.
+ */
t = pid_task(pid, type);
- if (!t || !likely(lock_task_sighand(t, &flags)))
+ if (!t)
+ goto ret;
+ if (type != PIDTYPE_PID && same_thread_group(t, current))
+ t = current;
+ if (!likely(lock_task_sighand(t, &flags)))
goto ret;

ret = 1; /* the signal is ignored */

Subject: [tip: timers/core] selftests/timers/posix_timers: Test delivery of signals across threads

The following commit has been merged into the timers/core branch of tip:

Commit-ID: e797203fb3ba8c0ed2f4a8800d626c9d54fedfbf
Gitweb: https://git.kernel.org/tip/e797203fb3ba8c0ed2f4a8800d626c9d54fedfbf
Author: Dmitry Vyukov <[email protected]>
AuthorDate: Thu, 16 Mar 2023 13:30:28 +01:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Sun, 16 Apr 2023 09:00:18 +02:00

selftests/timers/posix_timers: Test delivery of signals across threads

Test that POSIX timers using CLOCK_PROCESS_CPUTIME_ID eventually deliver
a signal to all running threads. This effectively tests that the kernel
doesn't prefer any one thread (or subset of threads) for signal delivery.

Signed-off-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
tools/testing/selftests/timers/posix_timers.c | 77 ++++++++++++++++++-
1 file changed, 77 insertions(+)

diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index 0ba5000..8a17c0e 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -188,6 +188,80 @@ static int check_timer_create(int which)
return 0;
}

+int remain;
+__thread int got_signal;
+
+static void *distribution_thread(void *arg)
+{
+ while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+ return NULL;
+}
+
+static void distribution_handler(int nr)
+{
+ if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
+ __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
+}
+
+/*
+ * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
+ * timer signals. This primarily tests that the kernel does not favour any one.
+ */
+static int check_timer_distribution(void)
+{
+ int err, i;
+ timer_t id;
+ const int nthreads = 10;
+ pthread_t threads[nthreads];
+ struct itimerspec val = {
+ .it_value.tv_sec = 0,
+ .it_value.tv_nsec = 1000 * 1000,
+ .it_interval.tv_sec = 0,
+ .it_interval.tv_nsec = 1000 * 1000,
+ };
+
+ printf("Check timer_create() per process signal distribution... ");
+ fflush(stdout);
+
+ remain = nthreads + 1; /* worker threads + this thread */
+ signal(SIGALRM, distribution_handler);
+ err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
+ if (err < 0) {
+ perror("Can't create timer\n");
+ return -1;
+ }
+ err = timer_settime(id, 0, &val, NULL);
+ if (err < 0) {
+ perror("Can't set timer\n");
+ return -1;
+ }
+
+ for (i = 0; i < nthreads; i++) {
+ if (pthread_create(&threads[i], NULL, distribution_thread, NULL)) {
+ perror("Can't create thread\n");
+ return -1;
+ }
+ }
+
+ /* Wait for all threads to receive the signal. */
+ while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+
+ for (i = 0; i < nthreads; i++) {
+ if (pthread_join(threads[i], NULL)) {
+ perror("Can't join thread\n");
+ return -1;
+ }
+ }
+
+ if (timer_delete(id)) {
+ perror("Can't delete timer\n");
+ return -1;
+ }
+
+ printf("[OK]\n");
+ return 0;
+}
+
int main(int argc, char **argv)
{
printf("Testing posix timers. False negative may happen on CPU execution \n");
@@ -217,5 +291,8 @@ int main(int argc, char **argv)
if (check_timer_create(CLOCK_PROCESS_CPUTIME_ID) < 0)
return ksft_exit_fail();

+ if (check_timer_distribution() < 0)
+ return ksft_exit_fail();
+
return ksft_exit_pass();
}

2024-04-01 20:17:32

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Thu, Mar 16, 2023 at 5:30 AM Marco Elver <[email protected]> wrote:
>
> From: Dmitry Vyukov <[email protected]>
>
> POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main
> thread of a thread group for signal delivery. However, this has a
> significant downside: it requires waking up a potentially idle thread.
>
> Instead, prefer to deliver signals to the current thread (in the same
> thread group) if SIGEV_THREAD_ID is not set by the user. This does not
> change guaranteed semantics, since POSIX process CPU time timers have
> never guaranteed that signal delivery is to a specific thread (without
> SIGEV_THREAD_ID set).
>
> The effect is that we no longer wake up potentially idle threads, and
> the kernel is no longer biased towards delivering the timer signal to
> any particular thread (which better distributes the timer signals esp.
> when multiple timers fire concurrently).
>
> Signed-off-by: Dmitry Vyukov <[email protected]>
> Suggested-by: Oleg Nesterov <[email protected]>
> Reviewed-by: Oleg Nesterov <[email protected]>
> Signed-off-by: Marco Elver <[email protected]>

Apologies for drudging up this old thread.

I wanted to ask if anyone had objections to including this in the -stable trees?

After this and the follow-on patch e797203fb3ba
("selftests/timers/posix_timers: Test delivery of signals across
threads") landed, folks testing older kernels with the latest
selftests started to see the new test checking for this behavior to
stall. Thomas did submit an adjustment to the test here to avoid the
stall: https://lore.kernel.org/lkml/[email protected]/,
but it didn't seem to land, however that would just result in the test
failing instead of hanging.

This change does seem to cherry-pick cleanly back to at least
stable/linux-5.10.y cleanly, so it looks simple to pull this change
back. But I wanted to make sure there wasn't anything subtle I was
missing before sending patches.

thanks
-john

2024-04-02 09:08:01

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Mon, 1 Apr 2024 at 22:17, John Stultz <[email protected]> wrote:
>
> On Thu, Mar 16, 2023 at 5:30 AM Marco Elver <[email protected]> wrote:
> >
> > From: Dmitry Vyukov <[email protected]>
> >
> > POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main
> > thread of a thread group for signal delivery. However, this has a
> > significant downside: it requires waking up a potentially idle thread.
> >
> > Instead, prefer to deliver signals to the current thread (in the same
> > thread group) if SIGEV_THREAD_ID is not set by the user. This does not
> > change guaranteed semantics, since POSIX process CPU time timers have
> > never guaranteed that signal delivery is to a specific thread (without
> > SIGEV_THREAD_ID set).
> >
> > The effect is that we no longer wake up potentially idle threads, and
> > the kernel is no longer biased towards delivering the timer signal to
> > any particular thread (which better distributes the timer signals esp.
> > when multiple timers fire concurrently).
> >
> > Signed-off-by: Dmitry Vyukov <[email protected]>
> > Suggested-by: Oleg Nesterov <[email protected]>
> > Reviewed-by: Oleg Nesterov <[email protected]>
> > Signed-off-by: Marco Elver <[email protected]>
>
> Apologies for drudging up this old thread.
>
> I wanted to ask if anyone had objections to including this in the -stable trees?
>
> After this and the follow-on patch e797203fb3ba
> ("selftests/timers/posix_timers: Test delivery of signals across
> threads") landed, folks testing older kernels with the latest
> selftests started to see the new test checking for this behavior to
> stall. Thomas did submit an adjustment to the test here to avoid the
> stall: https://lore.kernel.org/lkml/[email protected]/,
> but it didn't seem to land, however that would just result in the test
> failing instead of hanging.
>
> This change does seem to cherry-pick cleanly back to at least
> stable/linux-5.10.y cleanly, so it looks simple to pull this change
> back. But I wanted to make sure there wasn't anything subtle I was
> missing before sending patches.

I don't have objections per se. But I wonder how other tests deal with
such situations. It should happen for any test for new functionality.
Can we do the same other tests are doing?

2024-04-02 14:57:54

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Mon, Apr 01 2024 at 13:17, John Stultz wrote:
> Apologies for drudging up this old thread.
> I wanted to ask if anyone had objections to including this in the -stable trees?
>
> After this and the follow-on patch e797203fb3ba
> ("selftests/timers/posix_timers: Test delivery of signals across
> threads") landed, folks testing older kernels with the latest
> selftests started to see the new test checking for this behavior to
> stall. Thomas did submit an adjustment to the test here to avoid the
> stall: https://lore.kernel.org/lkml/[email protected]/,
> but it didn't seem to land, however that would just result in the test
> failing instead of hanging.

Thanks for reminding me about this series. I completely forgot about it.

> This change does seem to cherry-pick cleanly back to at least
> stable/linux-5.10.y cleanly, so it looks simple to pull this change
> back. But I wanted to make sure there wasn't anything subtle I was
> missing before sending patches.

This test in particular exercises new functionality/behaviour, which
really has no business to be backported into stable just to make the
relevant test usable on older kernels.

Why would testing with latest tests against an older kernel be valid per
se?

Thanks,

tglx


2024-04-02 17:24:15

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Tue, Apr 2, 2024 at 7:57 AM Thomas Gleixner <[email protected]> wrote:
> On Mon, Apr 01 2024 at 13:17, John Stultz wrote:
> > This change does seem to cherry-pick cleanly back to at least
> > stable/linux-5.10.y cleanly, so it looks simple to pull this change
> > back. But I wanted to make sure there wasn't anything subtle I was
> > missing before sending patches.
>
> This test in particular exercises new functionality/behaviour, which
> really has no business to be backported into stable just to make the
> relevant test usable on older kernels.

That's fair. I didn't have all the context around what motivated the
change and the follow-on test, which is why I'm asking here.

> Why would testing with latest tests against an older kernel be valid per
> se?

So yeah, it definitely can get fuzzy trying to split hairs between
when a change in behavior is a "new feature" or a "fix".

Greg could probably articulate it better, but my understanding is the
main point for running newer tests on older kernels is that newer
tests will have more coverage of what is expected of the kernel. For
features that older kernels don't support, ideally the tests will
check for that functionality like userland applications would, and
skip that portion of the test if it's unsupported. This way, we're
able to find issues (important enough to warrant tests having been
created) that have not yet been patched in the -stable trees.

In this case, there is a behavioral change combined with a compliance
test, which makes it look a bit more like a fix, rather than a feature
(additionally the lack of a way for userland to probe for this new
"feature" makes it seem fix-like). But the intended result of this is
just spurring this discussion to see if it makes sense to backport or
not. Disabling/ignoring the test (maybe after Thomas' fix to avoid it
from hanging :) is a fine solution too, but not one I'd want folks to
do until they've synced with maintainers and had full context.

thanks
-john

2024-04-03 12:41:29

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Tue, Apr 02 2024 at 10:23, John Stultz wrote:
> On Tue, Apr 2, 2024 at 7:57 AM Thomas Gleixner <[email protected]> wrote:
>> This test in particular exercises new functionality/behaviour, which
>> really has no business to be backported into stable just to make the
>> relevant test usable on older kernels.
>
> That's fair. I didn't have all the context around what motivated the
> change and the follow-on test, which is why I'm asking here.

It's a performance enhancement to avoid waking up idle threads for
signal delivery instead of just delivering it to the current running
thread which made the CPU timer fire. So it does not qualify for fix.

>> Why would testing with latest tests against an older kernel be valid per
>> se?
>
> So yeah, it definitely can get fuzzy trying to split hairs between
> when a change in behavior is a "new feature" or a "fix".
>
> Greg could probably articulate it better, but my understanding is the
> main point for running newer tests on older kernels is that newer
> tests will have more coverage of what is expected of the kernel. For
> features that older kernels don't support, ideally the tests will
> check for that functionality like userland applications would, and
> skip that portion of the test if it's unsupported. This way, we're
> able to find issues (important enough to warrant tests having been
> created) that have not yet been patched in the -stable trees.
>
> In this case, there is a behavioral change combined with a compliance
> test, which makes it look a bit more like a fix, rather than a feature
> (additionally the lack of a way for userland to probe for this new
> "feature" makes it seem fix-like). But the intended result of this is
> just spurring this discussion to see if it makes sense to backport or
> not. Disabling/ignoring the test (maybe after Thomas' fix to avoid it
> from hanging :) is a fine solution too, but not one I'd want folks to
> do until they've synced with maintainers and had full context.

I was staring at this test because it hangs even on upstream on a
regular base, at least in a VM. The timeout change I posted prevents the
hang, but still the posixtimer test will not have 0 fails.

The test if fragile as hell as there is absolutely no guarantee that the
signal target distribution is as expected. The expectation is based on a
statistical assumption which does not really hold.

So I came up with a modified variant of that, which can deduce pretty
reliably that the test runs on an older kernel.

Thanks,

tglx
---
Subject: selftests/timers/posix_timers: Make signal distribution test less fragile
From: Thomas Gleixner <[email protected]>
Date: Mon, 15 May 2023 00:40:10 +0200

The signal distribution test has a tendency to hang for a long time as the
signal delivery is not really evenly distributed. In fact it might never be
distributed across all threads ever in the way it is written.

Address this by:

1) Adding a timeout which aborts the test

2) Letting the test threads do a usleep() once they got a signal instead
of running continuously. That ensures that the other threads will expire
the timer and get the signal

3) Adding a detection whether all signals arrvied at the main thread,
which allows to run the test on older kernels.

While at it get rid of the pointless atomic operation on a the thread local
variable in the signal handler.

Signed-off-by: Thomas Gleixner <[email protected]>
---
tools/testing/selftests/timers/posix_timers.c | 48 +++++++++++++++++---------
1 file changed, 32 insertions(+), 16 deletions(-)

--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -184,18 +184,22 @@ static int check_timer_create(int which)
return 0;
}

-int remain;
-__thread int got_signal;
+static int remain;
+static __thread int got_signal;

static void *distribution_thread(void *arg)
{
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
- return NULL;
+ while (__atomic_load_n(&remain, __ATOMIC_RELAXED) && !done) {
+ if (got_signal)
+ usleep(10);
+ }
+
+ return (void *)got_signal;
}

static void distribution_handler(int nr)
{
- if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
+ if (++got_signal == 1)
__atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
}

@@ -205,8 +209,6 @@ static void distribution_handler(int nr)
*/
static int check_timer_distribution(void)
{
- int err, i;
- timer_t id;
const int nthreads = 10;
pthread_t threads[nthreads];
struct itimerspec val = {
@@ -215,7 +217,11 @@ static int check_timer_distribution(void
.it_interval.tv_sec = 0,
.it_interval.tv_nsec = 1000 * 1000,
};
+ int err, i, nsigs;
+ time_t start, now;
+ timer_t id;

+ done = 0;
remain = nthreads + 1; /* worker threads + this thread */
signal(SIGALRM, distribution_handler);
err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
@@ -231,7 +237,7 @@ static int check_timer_distribution(void

for (i = 0; i < nthreads; i++) {
err = pthread_create(&threads[i], NULL, distribution_thread,
- NULL);
+ thread_sigs + i);
if (err) {
ksft_print_msg("Can't create thread: %s (%d)\n",
strerror(errno), errno);
@@ -240,23 +246,33 @@ static int check_timer_distribution(void
}

/* Wait for all threads to receive the signal. */
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+ now = start = time(NULL);
+ while (__atomic_load_n(&remain, __ATOMIC_RELAXED)) {
+ now = time(NULL);
+ if (now - start > 5)
+ break;
+ }
+ done = 1;

- for (i = 0; i < nthreads; i++) {
+ if (timer_delete(id)) {
+ ksft_perror("Can't delete timer\n");
+ return -1;
+ }
+
+ for (i = 0, nsigs = 0; i < nthreads; i++) {
err = pthread_join(threads[i], NULL);
if (err) {
ksft_print_msg("Can't join thread: %s (%d)\n",
strerror(errno), errno);
return -1;
}
+ nsigs += thread_sigs[i];
}

- if (timer_delete(id)) {
- ksft_perror("Can't delete timer");
- return -1;
- }
-
- ksft_test_result_pass("check_timer_distribution\n");
+ if (!nsigs)
+ ksft_test_result_skip("No signal distribution. Assuming old kernel\n");
+ else
+ ksft_test_result(now - start < 5, "check_timer_distribution\n");
return 0;
}




2024-04-03 15:09:07

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On 04/03, Thomas Gleixner wrote:
>
> The test if fragile as hell as there is absolutely no guarantee that the
> signal target distribution is as expected. The expectation is based on a
> statistical assumption which does not really hold.

Agreed. I too never liked this test-case.

I forgot everything about this patch and test-case, I can't really read
your patch right now (sorry), so I am sure I missed something, but

> static void *distribution_thread(void *arg)
> {
> - while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
> - return NULL;
> + while (__atomic_load_n(&remain, __ATOMIC_RELAXED) && !done) {
> + if (got_signal)
> + usleep(10);
> + }
> +
> + return (void *)got_signal;
> }

Why distribution_thread() can't simply exit if got_signal != 0 ?

See https://lore.kernel.org/all/[email protected]/

Oleg.


2024-04-03 15:46:37

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Wed, Apr 03 2024 at 17:03, Oleg Nesterov wrote:
> On 04/03, Thomas Gleixner wrote:
>> The test if fragile as hell as there is absolutely no guarantee that the
>> signal target distribution is as expected. The expectation is based on a
>> statistical assumption which does not really hold.
>
> Agreed. I too never liked this test-case.
>
> I forgot everything about this patch and test-case, I can't really read
> your patch right now (sorry), so I am sure I missed something, but
>
>> static void *distribution_thread(void *arg)
>> {
>> - while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
>> - return NULL;
>> + while (__atomic_load_n(&remain, __ATOMIC_RELAXED) && !done) {
>> + if (got_signal)
>> + usleep(10);
>> + }
>> +
>> + return (void *)got_signal;
>> }
>
> Why distribution_thread() can't simply exit if got_signal != 0 ?
>
> See https://lore.kernel.org/all/[email protected]/

Indeed. It's too obvious :)

2024-04-03 16:32:13

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Wed, Apr 03 2024 at 17:43, Thomas Gleixner wrote:
> On Wed, Apr 03 2024 at 17:03, Oleg Nesterov wrote:
>>
>> Why distribution_thread() can't simply exit if got_signal != 0 ?
>>
>> See https://lore.kernel.org/all/[email protected]/
>
> Indeed. It's too obvious :)

Revised simpler version below.

Thanks,

tglx
---
Subject: selftests/timers/posix_timers: Make signal distribution test less fragile
From: Thomas Gleixner <[email protected]>

The signal distribution test has a tendency to hang for a long time as the
signal delivery is not really evenly distributed. In fact it might never be
distributed across all threads ever in the way it is written.

Address this by:

1) Adding a timeout which aborts the test

2) Letting the test threads exit once they got a signal instead of
running continuously. That ensures that the other threads will
have a chance to expire the timer and get the signal.

3) Adding a detection whether all signals arrvied at the main thread,
which allows to run the test on older kernels and emit 'SKIP'.

While at it get rid of the pointless atomic operation on a the thread local
variable in the signal handler.

Signed-off-by: Thomas Gleixner <[email protected]>
---
tools/testing/selftests/timers/posix_timers.c | 41 ++++++++++++++++----------
1 file changed, 26 insertions(+), 15 deletions(-)

--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -184,18 +184,19 @@ static int check_timer_create(int which)
return 0;
}

-int remain;
-__thread int got_signal;
+static int remain;
+static __thread int got_signal;

static void *distribution_thread(void *arg)
{
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+ while (!done && !got_signal);
+
return NULL;
}

static void distribution_handler(int nr)
{
- if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
+ if (++got_signal == 1)
__atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
}

@@ -205,8 +206,6 @@ static void distribution_handler(int nr)
*/
static int check_timer_distribution(void)
{
- int err, i;
- timer_t id;
const int nthreads = 10;
pthread_t threads[nthreads];
struct itimerspec val = {
@@ -215,7 +214,11 @@ static int check_timer_distribution(void
.it_interval.tv_sec = 0,
.it_interval.tv_nsec = 1000 * 1000,
};
+ time_t start, now;
+ timer_t id;
+ int err, i;

+ done = 0;
remain = nthreads + 1; /* worker threads + this thread */
signal(SIGALRM, distribution_handler);
err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
@@ -230,8 +233,7 @@ static int check_timer_distribution(void
}

for (i = 0; i < nthreads; i++) {
- err = pthread_create(&threads[i], NULL, distribution_thread,
- NULL);
+ err = pthread_create(&threads[i], NULL, distribution_thread, NULL);
if (err) {
ksft_print_msg("Can't create thread: %s (%d)\n",
strerror(errno), errno);
@@ -240,7 +242,18 @@ static int check_timer_distribution(void
}

/* Wait for all threads to receive the signal. */
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+ now = start = time(NULL);
+ while (__atomic_load_n(&remain, __ATOMIC_RELAXED)) {
+ now = time(NULL);
+ if (now - start > 2)
+ break;
+ }
+ done = 1;
+
+ if (timer_delete(id)) {
+ ksft_perror("Can't delete timer\n");
+ return -1;
+ }

for (i = 0; i < nthreads; i++) {
err = pthread_join(threads[i], NULL);
@@ -251,12 +264,10 @@ static int check_timer_distribution(void
}
}

- if (timer_delete(id)) {
- ksft_perror("Can't delete timer");
- return -1;
- }
-
- ksft_test_result_pass("check_timer_distribution\n");
+ if (__atomic_load_n(&remain, __ATOMIC_RELAXED) == nthreads)
+ ksft_test_result_skip("No signal distribution. Assuming old kernel\n");
+ else
+ ksft_test_result(now - start <= 2, "check signal distribution\n");
return 0;
}


2024-04-03 18:16:34

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Wed, Apr 3, 2024 at 9:32 AM Thomas Gleixner <[email protected]> wrote:
> Subject: selftests/timers/posix_timers: Make signal distribution test less fragile
> From: Thomas Gleixner <[email protected]>
>
> The signal distribution test has a tendency to hang for a long time as the
> signal delivery is not really evenly distributed. In fact it might never be
> distributed across all threads ever in the way it is written.
>
> Address this by:
>
> 1) Adding a timeout which aborts the test
>
> 2) Letting the test threads exit once they got a signal instead of
> running continuously. That ensures that the other threads will
> have a chance to expire the timer and get the signal.
>
> 3) Adding a detection whether all signals arrvied at the main thread,
> which allows to run the test on older kernels and emit 'SKIP'.
>
> While at it get rid of the pointless atomic operation on a the thread local
> variable in the signal handler.
>
> Signed-off-by: Thomas Gleixner <[email protected]>

Thanks for this, Thomas!

Just FYI: testing with 6.1, the test no longer hangs, but I don't see
the SKIP behavior. It just fails:
not ok 6 check signal distribution
# Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0

I've not had time yet to dig into what's going on, but let me know if
you need any further details.

thanks
-john

2024-04-03 19:10:13

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Wed, Apr 03 2024 at 11:16, John Stultz wrote:
> On Wed, Apr 3, 2024 at 9:32 AM Thomas Gleixner <[email protected]> wrote:
> Thanks for this, Thomas!
>
> Just FYI: testing with 6.1, the test no longer hangs, but I don't see
> the SKIP behavior. It just fails:
> not ok 6 check signal distribution
> # Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0
>
> I've not had time yet to dig into what's going on, but let me know if
> you need any further details.

That's weird. I ran it on my laptop with 6.1.y ...

What kind of machine is that?

2024-04-03 22:25:13

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Wed, Apr 03 2024 at 12:35, John Stultz wrote:
> On Wed, Apr 3, 2024 at 12:10 PM Thomas Gleixner <[email protected]> wrote:
>>
>> On Wed, Apr 03 2024 at 11:16, John Stultz wrote:
>> > On Wed, Apr 3, 2024 at 9:32 AM Thomas Gleixner <tglx@linutronixde> wrote:
>> > Thanks for this, Thomas!
>> >
>> > Just FYI: testing with 6.1, the test no longer hangs, but I don't see
>> > the SKIP behavior. It just fails:
>> > not ok 6 check signal distribution
>> > # Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0
>> >
>> > I've not had time yet to dig into what's going on, but let me know if
>> > you need any further details.
>>
>> That's weird. I ran it on my laptop with 6.1.y ...
>>
>> What kind of machine is that?
>
> I was running it in a VM.
>
> Interestingly with 64cpus it sometimes will do the skip behavior, but
> with 4 cpus it seems to always fail.

Duh, yes. The problem is that any thread might grab the signal as it is
process wide.

What was I thinking? Not much obviously.

The distribution mechanism is only targeting the wakeup at signal
queuing time and therefore avoids the wakeup of idle tasks. But it does
not guarantee that the signal is evenly distributed to the threads on
actual signal delivery.

Even with the change to stop the worker threads when they got a signal
it's not guaranteed that the last worker will actually get one within
the timeout simply because the main thread can win the race to collect
the signal every time. I just managed to make the patched test fail in
one out of 100 runs.

IOW, we cannot test this reliably at all with the current approach.

I'll think about it tomorrow again with brain awake.

Thanks,

tglx


2024-04-04 08:55:39

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Wed, 3 Apr 2024 at 17:43, Thomas Gleixner <[email protected]> wrote:
>
> On Wed, Apr 03 2024 at 17:03, Oleg Nesterov wrote:
> > On 04/03, Thomas Gleixner wrote:
> >> The test if fragile as hell as there is absolutely no guarantee that the
> >> signal target distribution is as expected. The expectation is based on a
> >> statistical assumption which does not really hold.
> >
> > Agreed. I too never liked this test-case.
> >
> > I forgot everything about this patch and test-case, I can't really read
> > your patch right now (sorry), so I am sure I missed something, but
> >
> >> static void *distribution_thread(void *arg)
> >> {
> >> - while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
> >> - return NULL;
> >> + while (__atomic_load_n(&remain, __ATOMIC_RELAXED) && !done) {
> >> + if (got_signal)
> >> + usleep(10);
> >> + }
> >> +
> >> + return (void *)got_signal;
> >> }
> >
> > Why distribution_thread() can't simply exit if got_signal != 0 ?
> >
> > See https://lore.kernel.org/all/[email protected]/
>
> Indeed. It's too obvious :)

This test models the intended use-case that was the motivation for the change:
We want to sample execution of a running multi-threaded program, it
has multiple active threads (that don't exit), since all threads are
running and consuming CPU, they all should get a signal eventually.

If threads will exit once they get a signal, then the test will pass
even if signal delivery is biased towards a single running thread all
the time (the previous kernel impl).

2024-04-04 13:46:04

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

Perhaps I am totally confused, but.

On 04/04, Dmitry Vyukov wrote:
>
> On Wed, 3 Apr 2024 at 17:43, Thomas Gleixner <[email protected]> wrote:
> >
> > > Why distribution_thread() can't simply exit if got_signal != 0 ?
> > >
> > > See https://lore.kernel.org/all/[email protected]/
> >
> > Indeed. It's too obvious :)
>
> This test models the intended use-case that was the motivation for the change:
> We want to sample execution of a running multi-threaded program, it
> has multiple active threads (that don't exit), since all threads are
> running and consuming CPU,

Yes,

> they all should get a signal eventually.

Well, yes and no.

No, in a sense that the motivation was not to ensure that all threads
get a signal, the motivation was to ensure that cpu_timer_fire() paths
will use the current task as the default target for signal_wake_up/etc.
This is just optimization.

But yes, all should get a signal eventually. And this will happen with
or without the commit bcb7ee79029dca ("posix-timers: Prefer delivery of
signals to the current thread"). Any thread can dequeue a shared signal,
say, on return from interrupt.

Just without that commit this "eventually" means A_LOT_OF_TIME statistically.

> If threads will exit once they get a signal,

just in case, the main thread should not exit ...

> then the test will pass
> even if signal delivery is biased towards a single running thread all
> the time (the previous kernel impl).

See above.

But yes, I agree, if thread exits once it get a signal, then A_LOT_OF_TIME
will be significantly decreased. But again, this is just statistical issue,
I do not see how can we test the commit bcb7ee79029dca reliably.

OTOH. If the threads do not exit after they get signal, then _in theory_
nothing can guarantee that this test-case will ever complete even with
that commit. It is possible that one of the threads will "never" have a
chance to run cpu_timer_fire().

In short, I leave this to you and Thomas. I have no idea how to write a
"good" test for that commit.

Well... perhaps the main thread should just sleep in pause(), and
distribution_handler() should check that gettid() != getpid() ?
Something like this maybe... We need to ensure that the main thread
enters pause before timer_settime().

Oleg.


2024-04-04 15:10:57

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Thu, Apr 04 2024 at 15:43, Oleg Nesterov wrote:
> On 04/04, Dmitry Vyukov wrote:
>> they all should get a signal eventually.
>
> Well, yes and no.
>
> No, in a sense that the motivation was not to ensure that all threads
> get a signal, the motivation was to ensure that cpu_timer_fire() paths
> will use the current task as the default target for signal_wake_up/etc.
> This is just optimization.
>
> But yes, all should get a signal eventually. And this will happen with
> or without the commit bcb7ee79029dca ("posix-timers: Prefer delivery of
> signals to the current thread"). Any thread can dequeue a shared signal,
> say, on return from interrupt.
>
> Just without that commit this "eventually" means A_LOT_OF_TIME
> statistically.

bcb7ee79029dca only directs the wakeup to current, but the signal is
still queued in the process wide shared pending list. So the thread
which sees sigpending() first will grab and deliver it to itself.

> But yes, I agree, if thread exits once it get a signal, then A_LOT_OF_TIME
> will be significantly decreased. But again, this is just statistical issue,
> I do not see how can we test the commit bcb7ee79029dca reliably.

We can't.

What we can actually test is the avoidance of waking up the main thread
by doing the following in the main thread:

start_threads();
barrier_wait();
nanosleep(2 seconds);
done = 1;
stop_threads();

and in the first thread which is started:

first_thread()
barrier_wait();
start_timer();
loop()

On a pre 6.3 kernel nanosleep() will return early because the main
thread is woken up and will eventually win the race to deliver the
signal.

On a 6.3 and later kernel nanosleep() will not return early because the
main thread is not woken up as the wake up is directed at current,
i.e. a worker thread, which is running anyway and will consume the
signal.

> OTOH. If the threads do not exit after they get signal, then _in theory_
> nothing can guarantee that this test-case will ever complete even with
> that commit. It is possible that one of the threads will "never" have a
> chance to run cpu_timer_fire().

Even with the exit I managed to make one out of 100 runs run into the
timeout because the main thread always won the race.

> In short, I leave this to you and Thomas. I have no idea how to write a
> "good" test for that commit.
>
> Well... perhaps the main thread should just sleep in pause(), and
> distribution_handler() should check that gettid() != getpid() ?
> Something like this maybe... We need to ensure that the main thread
> enters pause before timer_settime().

I'm testing a modification which implements something like the above and
the success condition is that the main thread does not return early from
nanosleep() and has no signal accounted. It survived 2000 iterations by
now.

Let me polish it up.

Thanks,

tglx


2024-04-04 15:24:42

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On 04/04, Thomas Gleixner wrote:
>
> IOW, we cannot test this reliably at all with the current approach.

Agreed!

So how about a REALLY SIMPLE test-case below?

Lacks error checking, should be updated to match tools/testing/selftests.

Without commit bcb7ee79029dca assert(sig_cnt > SIG_CNT) fails, the very
1st tick wakes the leader up.

With that commit it doesn't fail.

Oleg.

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>
#include <time.h>
#include <assert.h>

#define SIG_CNT 100
static volatile int sig_cnt;

static void alarm_func(int sig)
{
++sig_cnt;
}

static void *thread_func(void *arg)
{
// one second before the 1st tick to ensure the leader sleeps
struct itimerspec val = {
.it_value.tv_sec = 1,
.it_value.tv_nsec = 0,
.it_interval.tv_sec = 0,
.it_interval.tv_nsec = 1000 * 1000,
};
timer_t id;

timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
timer_settime(id, 0, &val, NULL);

while (sig_cnt < SIG_CNT)
;

// wake up the leader
kill(getpid(), SIGALRM);

return NULL;
}

int main(void)
{
pthread_t thread;

signal(SIGALRM, alarm_func);

pthread_create(&thread, NULL, thread_func, NULL);

pause();

assert(sig_cnt > SIG_CNT); // likely SIG_CNT + 1

return 0;
}


2024-04-04 15:33:33

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On 04/04, Thomas Gleixner wrote:
>
> On Thu, Apr 04 2024 at 15:43, Oleg Nesterov wrote:
>
> > And this will happen with
> > or without the commit bcb7ee79029dca ("posix-timers: Prefer delivery of
> > signals to the current thread"). Any thread can dequeue a shared signal,
> > say, on return from interrupt.
> >
> > Just without that commit this "eventually" means A_LOT_OF_TIME
> > statistically.
>
> bcb7ee79029dca only directs the wakeup to current, but the signal is
> still queued in the process wide shared pending list. So the thread
> which sees sigpending() first will grab and deliver it to itself.

This is what I tried to say above.

> What we can actually test is the avoidance of waking up the main thread
> by doing the following in the main thread:

Hmm... I think it can be even simpler,

> I'm testing a modification which implements something like the above and
> the success condition is that the main thread does not return early from
> nanosleep() and has no signal accounted. It survived 2000 iterations by
> now.

Yes, but please see a trivial test-case I sent you few minutes ago.

Oleg.


2024-04-04 18:08:48

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Thu, Apr 04 2024 at 16:54, Oleg Nesterov wrote:

> On 04/04, Thomas Gleixner wrote:
>>
>> IOW, we cannot test this reliably at all with the current approach.
>
> Agreed!
>
> So how about a REALLY SIMPLE test-case below?
>
> Lacks error checking, should be updated to match tools/testing/selftests.
>
> Without commit bcb7ee79029dca assert(sig_cnt > SIG_CNT) fails, the very
> 1st tick wakes the leader up.
>
> With that commit it doesn't fail.

Clever!

2024-04-03 20:13:30

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Wed, Apr 3, 2024 at 12:10 PM Thomas Gleixner <[email protected]> wrote:
>
> On Wed, Apr 03 2024 at 11:16, John Stultz wrote:
> > On Wed, Apr 3, 2024 at 9:32 AM Thomas Gleixner <tglx@linutronixde> wrote:
> > Thanks for this, Thomas!
> >
> > Just FYI: testing with 6.1, the test no longer hangs, but I don't see
> > the SKIP behavior. It just fails:
> > not ok 6 check signal distribution
> > # Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0
> >
> > I've not had time yet to dig into what's going on, but let me know if
> > you need any further details.
>
> That's weird. I ran it on my laptop with 6.1.y ...
>
> What kind of machine is that?

I was running it in a VM.

Interestingly with 64cpus it sometimes will do the skip behavior, but
with 4 cpus it seems to always fail.

thanks
-john

2024-04-05 04:28:33

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

On Thu, 4 Apr 2024 at 15:45, Oleg Nesterov <[email protected]> wrote:
>
> Perhaps I am totally confused, but.
>
> On 04/04, Dmitry Vyukov wrote:
> >
> > On Wed, 3 Apr 2024 at 17:43, Thomas Gleixner <[email protected]> wrote:
> > >
> > > > Why distribution_thread() can't simply exit if got_signal != 0 ?
> > > >
> > > > See https://lore.kernel.org/all/[email protected]/
> > >
> > > Indeed. It's too obvious :)
> >
> > This test models the intended use-case that was the motivation for the change:
> > We want to sample execution of a running multi-threaded program, it
> > has multiple active threads (that don't exit), since all threads are
> > running and consuming CPU,
>
> Yes,
>
> > they all should get a signal eventually.
>
> Well, yes and no.
>
> No, in a sense that the motivation was not to ensure that all threads
> get a signal, the motivation was to ensure that cpu_timer_fire() paths
> will use the current task as the default target for signal_wake_up/etc.
> This is just optimization.
>
> But yes, all should get a signal eventually. And this will happen with
> or without the commit bcb7ee79029dca ("posix-timers: Prefer delivery of
> signals to the current thread"). Any thread can dequeue a shared signal,
> say, on return from interrupt.
>
> Just without that commit this "eventually" means A_LOT_OF_TIME statistically.

I agree that any thread can pick the signal, but this A_LOT_OF_TIME
makes it impossible for the test to reliably repeatedly pass w/o the
change in any reasonable testing system.
With the change the test was finishing/passing for me immediately all the time.

Again, if the test causes practical problems (flaky), then I don't
mind relaxing it (flaky tests suck). I was just against giving up on
testing proactively just in case.



> > If threads will exit once they get a signal,
>
> just in case, the main thread should not exit ...
>
> > then the test will pass
> > even if signal delivery is biased towards a single running thread all
> > the time (the previous kernel impl).
>
> See above.
>
> But yes, I agree, if thread exits once it get a signal, then A_LOT_OF_TIME
> will be significantly decreased. But again, this is just statistical issue,
> I do not see how can we test the commit bcb7ee79029dca reliably.
>
> OTOH. If the threads do not exit after they get signal, then _in theory_
> nothing can guarantee that this test-case will ever complete even with
> that commit. It is possible that one of the threads will "never" have a
> chance to run cpu_timer_fire().
>
> In short, I leave this to you and Thomas. I have no idea how to write a
> "good" test for that commit.
>
> Well... perhaps the main thread should just sleep in pause(), and
> distribution_handler() should check that gettid() != getpid() ?
> Something like this maybe... We need to ensure that the main thread
> enters pause before timer_settime().
>
> Oleg.
>

2024-04-06 15:11:57

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Thomas says:

The signal distribution test has a tendency to hang for a long
time as the signal delivery is not really evenly distributed. In
fact it might never be distributed across all threads ever in
the way it is written.

To me even the

This primarily tests that the kernel does not favour any one.

comment doesn't look right. The kernel does favour a thread which hits
the timer interrupt when CLOCK_PROCESS_CPUTIME_ID expires.

The new version simply checks that the group leader sleeping in join()
never receives SIGALRM, cpu_timer_fire() should always send the signal
to the thread which burns cpu.

Without the commit bcb7ee79029d ("posix-timers: Prefer delivery of signals
to the current thread") the test-case fails immediately, the very 1st tick
wakes the leader up. Otherwise it quickly succeeds after 100 ticks.

Signed-off-by: Oleg Nesterov <[email protected]>
---
tools/testing/selftests/timers/posix_timers.c | 102 ++++++++----------
1 file changed, 46 insertions(+), 56 deletions(-)

diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index d49dd3ffd0d9..2586a6552737 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -184,80 +184,70 @@ static int check_timer_create(int which)
return 0;
}

-int remain;
-__thread int got_signal;
+static pthread_t ctd_thread;
+static volatile int ctd_count, ctd_failed;

-static void *distribution_thread(void *arg)
+static void ctd_sighandler(int sig)
{
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
- return NULL;
+ if (pthread_self() != ctd_thread)
+ ctd_failed = 1;
+ ctd_count--;
}

-static void distribution_handler(int nr)
+static void *ctd_thread_func(void *arg)
{
- if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
- __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
-}
-
-/*
- * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
- * timer signals. This primarily tests that the kernel does not favour any one.
- */
-static int check_timer_distribution(void)
-{
- int err, i;
- timer_t id;
- const int nthreads = 10;
- pthread_t threads[nthreads];
struct itimerspec val = {
.it_value.tv_sec = 0,
.it_value.tv_nsec = 1000 * 1000,
.it_interval.tv_sec = 0,
.it_interval.tv_nsec = 1000 * 1000,
};
+ timer_t id;

- remain = nthreads + 1; /* worker threads + this thread */
- signal(SIGALRM, distribution_handler);
- err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
- if (err < 0) {
- ksft_perror("Can't create timer");
- return -1;
- }
- err = timer_settime(id, 0, &val, NULL);
- if (err < 0) {
- ksft_perror("Can't set timer");
- return -1;
- }
+ /* 1/10 seconds to ensure the leader sleeps */
+ usleep(10000);

- for (i = 0; i < nthreads; i++) {
- err = pthread_create(&threads[i], NULL, distribution_thread,
- NULL);
- if (err) {
- ksft_print_msg("Can't create thread: %s (%d)\n",
- strerror(errno), errno);
- return -1;
- }
- }
+ ctd_count = 100;
+ if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
+ return "Can't create timer";
+ if (timer_settime(id, 0, &val, NULL))
+ return "Can't set timer";

- /* Wait for all threads to receive the signal. */
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+ while (ctd_count > 0 && !ctd_failed)
+ ;

- for (i = 0; i < nthreads; i++) {
- err = pthread_join(threads[i], NULL);
- if (err) {
- ksft_print_msg("Can't join thread: %s (%d)\n",
- strerror(errno), errno);
- return -1;
- }
- }
+ if (timer_delete(id))
+ return "Can't delete timer";

- if (timer_delete(id)) {
- ksft_perror("Can't delete timer");
- return -1;
- }
+ return NULL;
+}
+
+/*
+ * Test that only the running thread receives the timer signal.
+ */
+static int check_timer_distribution(void)
+{
+ const char *errmsg;
+
+ signal(SIGALRM, ctd_sighandler);
+
+ errmsg = "Can't create thread";
+ if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL))
+ goto err;
+
+ errmsg = "Can't join thread";
+ if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg)
+ goto err;
+
+ if (ctd_failed)
+ ksft_test_result_skip("No signal distribution. Assuming old kernel\n");
+ else
+ ksft_test_result_pass("check signal distribution\n");

- ksft_test_result_pass("check_timer_distribution\n");
return 0;
+err:
+ ksft_print_msg(errmsg);
+ return -1;
}

int main(int argc, char **argv)
--
2.25.1.362.g51ebf55



2024-04-06 15:13:03

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Dmitry, Thomas,

To simplify the review I've attached the code with this patch applied below.

Yes, this changes the "semantics" of check_timer_distribution(), perhaps it
should be renamed.

But I do not see a better approach, and in fact I think that

Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID

is the wrong goal.

Do you agree?

Oleg.
-------------------------------------------------------------------------------

static pthread_t ctd_thread;
static volatile int ctd_count, ctd_failed;

static void ctd_sighandler(int sig)
{
if (pthread_self() != ctd_thread)
ctd_failed = 1;
ctd_count--;
}

static void *ctd_thread_func(void *arg)
{
struct itimerspec val = {
.it_value.tv_sec = 0,
.it_value.tv_nsec = 1000 * 1000,
.it_interval.tv_sec = 0,
.it_interval.tv_nsec = 1000 * 1000,
};
timer_t id;

/* 1/10 seconds to ensure the leader sleeps */
usleep(10000);

ctd_count = 100;
if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
return "Can't create timer";
if (timer_settime(id, 0, &val, NULL))
return "Can't set timer";

while (ctd_count > 0 && !ctd_failed)
;

if (timer_delete(id))
return "Can't delete timer";

return NULL;
}

/*
* Test that only the running thread receives the timer signal.
*/
static int check_timer_distribution(void)
{
const char *errmsg;

signal(SIGALRM, ctd_sighandler);

errmsg = "Can't create thread";
if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL))
goto err;

errmsg = "Can't join thread";
if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg)
goto err;

if (ctd_failed)
ksft_test_result_skip("No signal distribution. Assuming old kernel\n");
else
ksft_test_result_pass("check signal distribution\n");

return 0;
err:
ksft_print_msg(errmsg);
return -1;
}


2024-04-06 20:53:25

by Muhammad Usama Anjum

[permalink] [raw]
Subject: Re: [PATCH v6 2/2] selftests/timers/posix_timers: Test delivery of signals across threads

On 3/16/23 5:30 PM, Marco Elver wrote:
> From: Dmitry Vyukov <[email protected]>
>
> Test that POSIX timers using CLOCK_PROCESS_CPUTIME_ID eventually deliver
> a signal to all running threads. This effectively tests that the kernel
> doesn't prefer any one thread (or subset of threads) for signal delivery.
>
> Signed-off-by: Dmitry Vyukov <[email protected]>
> Signed-off-by: Marco Elver <[email protected]>
> ---
> v6:
> - Update wording on what the test aims to test.
> - Fix formatting per checkpatch.pl.
> ---
> tools/testing/selftests/timers/posix_timers.c | 77 +++++++++++++++++++
> 1 file changed, 77 insertions(+)
>
> diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
> index 0ba500056e63..8a17c0e8d82b 100644
> --- a/tools/testing/selftests/timers/posix_timers.c
> +++ b/tools/testing/selftests/timers/posix_timers.c
> @@ -188,6 +188,80 @@ static int check_timer_create(int which)
> return 0;
> }
>
> +int remain;
> +__thread int got_signal;
> +
> +static void *distribution_thread(void *arg)
> +{
> + while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
> + return NULL;
> +}
> +
> +static void distribution_handler(int nr)
> +{
> + if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
> + __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
> +}
> +
> +/*
> + * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
> + * timer signals. This primarily tests that the kernel does not favour any one.
> + */
> +static int check_timer_distribution(void)
> +{
> + int err, i;
> + timer_t id;
> + const int nthreads = 10;
> + pthread_t threads[nthreads];
> + struct itimerspec val = {
> + .it_value.tv_sec = 0,
> + .it_value.tv_nsec = 1000 * 1000,
> + .it_interval.tv_sec = 0,
> + .it_interval.tv_nsec = 1000 * 1000,
> + };
> +
> + printf("Check timer_create() per process signal distribution... ");
Use APIs from kselftest.h. Use ksft_print_msg() here.

> + fflush(stdout);
> +
> + remain = nthreads + 1; /* worker threads + this thread */
> + signal(SIGALRM, distribution_handler);
> + err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
> + if (err < 0) {
> + perror("Can't create timer\n");
ksft_perror() here

> + return -1;
> + }
> + err = timer_settime(id, 0, &val, NULL);
> + if (err < 0) {
> + perror("Can't set timer\n");
> + return -1;
> + }
> +
> + for (i = 0; i < nthreads; i++) {
> + if (pthread_create(&threads[i], NULL, distribution_thread, NULL)) {
> + perror("Can't create thread\n");
> + return -1;
> + }
> + }
> +
> + /* Wait for all threads to receive the signal. */
> + while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
> +
> + for (i = 0; i < nthreads; i++) {
> + if (pthread_join(threads[i], NULL)) {
> + perror("Can't join thread\n");
> + return -1;
> + }
> + }
> +
> + if (timer_delete(id)) {
> + perror("Can't delete timer\n");
> + return -1;
> + }
> +
> + printf("[OK]\n");
ksft_test_result or _pass variant as needed?

> + return 0;
> +}
> +
> int main(int argc, char **argv)
> {
> printf("Testing posix timers. False negative may happen on CPU execution \n");
> @@ -217,5 +291,8 @@ int main(int argc, char **argv)
> if (check_timer_create(CLOCK_PROCESS_CPUTIME_ID) < 0)
> return ksft_exit_fail();
>
> + if (check_timer_distribution() < 0)
> + return ksft_exit_fail();
> +
> return ksft_exit_pass();
> }

--
BR,
Muhammad Usama Anjum

2024-04-06 21:15:07

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v6 2/2] selftests/timers/posix_timers: Test delivery of signals across threads

Muhammad,

I am sorry, but... are you aware that this patch was applied over a year ago,
and then this code was updated to use the ksft_API?

Oleg.

On 04/07, Muhammad Usama Anjum wrote:
>
> On 3/16/23 5:30 PM, Marco Elver wrote:
> > From: Dmitry Vyukov <[email protected]>
> >
> > Test that POSIX timers using CLOCK_PROCESS_CPUTIME_ID eventually deliver
> > a signal to all running threads. This effectively tests that the kernel
> > doesn't prefer any one thread (or subset of threads) for signal delivery.
> >
> > Signed-off-by: Dmitry Vyukov <[email protected]>
> > Signed-off-by: Marco Elver <[email protected]>
> > ---
> > v6:
> > - Update wording on what the test aims to test.
> > - Fix formatting per checkpatch.pl.
> > ---
> > tools/testing/selftests/timers/posix_timers.c | 77 +++++++++++++++++++
> > 1 file changed, 77 insertions(+)
> >
> > diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
> > index 0ba500056e63..8a17c0e8d82b 100644
> > --- a/tools/testing/selftests/timers/posix_timers.c
> > +++ b/tools/testing/selftests/timers/posix_timers.c
> > @@ -188,6 +188,80 @@ static int check_timer_create(int which)
> > return 0;
> > }
> >
> > +int remain;
> > +__thread int got_signal;
> > +
> > +static void *distribution_thread(void *arg)
> > +{
> > + while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
> > + return NULL;
> > +}
> > +
> > +static void distribution_handler(int nr)
> > +{
> > + if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
> > + __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
> > +}
> > +
> > +/*
> > + * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
> > + * timer signals. This primarily tests that the kernel does not favour any one.
> > + */
> > +static int check_timer_distribution(void)
> > +{
> > + int err, i;
> > + timer_t id;
> > + const int nthreads = 10;
> > + pthread_t threads[nthreads];
> > + struct itimerspec val = {
> > + .it_value.tv_sec = 0,
> > + .it_value.tv_nsec = 1000 * 1000,
> > + .it_interval.tv_sec = 0,
> > + .it_interval.tv_nsec = 1000 * 1000,
> > + };
> > +
> > + printf("Check timer_create() per process signal distribution... ");
> Use APIs from kselftest.h. Use ksft_print_msg() here.
>
> > + fflush(stdout);
> > +
> > + remain = nthreads + 1; /* worker threads + this thread */
> > + signal(SIGALRM, distribution_handler);
> > + err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
> > + if (err < 0) {
> > + perror("Can't create timer\n");
> ksft_perror() here
>
> > + return -1;
> > + }
> > + err = timer_settime(id, 0, &val, NULL);
> > + if (err < 0) {
> > + perror("Can't set timer\n");
> > + return -1;
> > + }
> > +
> > + for (i = 0; i < nthreads; i++) {
> > + if (pthread_create(&threads[i], NULL, distribution_thread, NULL)) {
> > + perror("Can't create thread\n");
> > + return -1;
> > + }
> > + }
> > +
> > + /* Wait for all threads to receive the signal. */
> > + while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
> > +
> > + for (i = 0; i < nthreads; i++) {
> > + if (pthread_join(threads[i], NULL)) {
> > + perror("Can't join thread\n");
> > + return -1;
> > + }
> > + }
> > +
> > + if (timer_delete(id)) {
> > + perror("Can't delete timer\n");
> > + return -1;
> > + }
> > +
> > + printf("[OK]\n");
> ksft_test_result or _pass variant as needed?
>
> > + return 0;
> > +}
> > +
> > int main(int argc, char **argv)
> > {
> > printf("Testing posix timers. False negative may happen on CPU execution \n");
> > @@ -217,5 +291,8 @@ int main(int argc, char **argv)
> > if (check_timer_create(CLOCK_PROCESS_CPUTIME_ID) < 0)
> > return ksft_exit_fail();
> >
> > + if (check_timer_distribution() < 0)
> > + return ksft_exit_fail();
> > +
> > return ksft_exit_pass();
> > }
>
> --
> BR,
> Muhammad Usama Anjum
>


2024-04-06 21:32:00

by Muhammad Usama Anjum

[permalink] [raw]
Subject: Re: [PATCH v6 2/2] selftests/timers/posix_timers: Test delivery of signals across threads

On 4/7/24 2:13 AM, Oleg Nesterov wrote:
> Muhammad,
>
> I am sorry, but... are you aware that this patch was applied over a year ago,
> and then this code was updated to use the ksft_API?
Sorry, didn't realized this is already applied. So this patch is already
applied and it has already been made compliant.

Thanks

>
> Oleg.
>
> On 04/07, Muhammad Usama Anjum wrote:
>>
>> On 3/16/23 5:30 PM, Marco Elver wrote:
>>> From: Dmitry Vyukov <[email protected]>
>>>
>>> Test that POSIX timers using CLOCK_PROCESS_CPUTIME_ID eventually deliver
>>> a signal to all running threads. This effectively tests that the kernel
>>> doesn't prefer any one thread (or subset of threads) for signal delivery.
>>>
>>> Signed-off-by: Dmitry Vyukov <[email protected]>
>>> Signed-off-by: Marco Elver <[email protected]>
>>> ---
>>> v6:
>>> - Update wording on what the test aims to test.
>>> - Fix formatting per checkpatch.pl.
>>> ---
>>> tools/testing/selftests/timers/posix_timers.c | 77 +++++++++++++++++++
>>> 1 file changed, 77 insertions(+)
>>>
>>> diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
>>> index 0ba500056e63..8a17c0e8d82b 100644
>>> --- a/tools/testing/selftests/timers/posix_timers.c
>>> +++ b/tools/testing/selftests/timers/posix_timers.c
>>> @@ -188,6 +188,80 @@ static int check_timer_create(int which)
>>> return 0;
>>> }
>>>
>>> +int remain;
>>> +__thread int got_signal;
>>> +
>>> +static void *distribution_thread(void *arg)
>>> +{
>>> + while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
>>> + return NULL;
>>> +}
>>> +
>>> +static void distribution_handler(int nr)
>>> +{
>>> + if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
>>> + __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
>>> +}
>>> +
>>> +/*
>>> + * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
>>> + * timer signals. This primarily tests that the kernel does not favour any one.
>>> + */
>>> +static int check_timer_distribution(void)
>>> +{
>>> + int err, i;
>>> + timer_t id;
>>> + const int nthreads = 10;
>>> + pthread_t threads[nthreads];
>>> + struct itimerspec val = {
>>> + .it_value.tv_sec = 0,
>>> + .it_value.tv_nsec = 1000 * 1000,
>>> + .it_interval.tv_sec = 0,
>>> + .it_interval.tv_nsec = 1000 * 1000,
>>> + };
>>> +
>>> + printf("Check timer_create() per process signal distribution... ");
>> Use APIs from kselftest.h. Use ksft_print_msg() here.
>>
>>> + fflush(stdout);
>>> +
>>> + remain = nthreads + 1; /* worker threads + this thread */
>>> + signal(SIGALRM, distribution_handler);
>>> + err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
>>> + if (err < 0) {
>>> + perror("Can't create timer\n");
>> ksft_perror() here
>>
>>> + return -1;
>>> + }
>>> + err = timer_settime(id, 0, &val, NULL);
>>> + if (err < 0) {
>>> + perror("Can't set timer\n");
>>> + return -1;
>>> + }
>>> +
>>> + for (i = 0; i < nthreads; i++) {
>>> + if (pthread_create(&threads[i], NULL, distribution_thread, NULL)) {
>>> + perror("Can't create thread\n");
>>> + return -1;
>>> + }
>>> + }
>>> +
>>> + /* Wait for all threads to receive the signal. */
>>> + while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
>>> +
>>> + for (i = 0; i < nthreads; i++) {
>>> + if (pthread_join(threads[i], NULL)) {
>>> + perror("Can't join thread\n");
>>> + return -1;
>>> + }
>>> + }
>>> +
>>> + if (timer_delete(id)) {
>>> + perror("Can't delete timer\n");
>>> + return -1;
>>> + }
>>> +
>>> + printf("[OK]\n");
>> ksft_test_result or _pass variant as needed?
>>
>>> + return 0;
>>> +}
>>> +
>>> int main(int argc, char **argv)
>>> {
>>> printf("Testing posix timers. False negative may happen on CPU execution \n");
>>> @@ -217,5 +291,8 @@ int main(int argc, char **argv)
>>> if (check_timer_create(CLOCK_PROCESS_CPUTIME_ID) < 0)
>>> return ksft_exit_fail();
>>>
>>> + if (check_timer_distribution() < 0)
>>> + return ksft_exit_fail();
>>> +
>>> return ksft_exit_pass();
>>> }
>>
>> --
>> BR,
>> Muhammad Usama Anjum
>>
>

--
BR,
Muhammad Usama Anjum

2024-04-06 22:01:15

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Sat, Apr 06 2024 at 17:10, Oleg Nesterov wrote:
> Yes, this changes the "semantics" of check_timer_distribution(), perhaps it
> should be renamed.

Definitely.

> But I do not see a better approach, and in fact I think that
>
> Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
>
> is the wrong goal.
>
> Do you agree?

No argument from my side. All we can test is that the leader is not
woken up.

Thanks,

tglx

2024-04-08 08:39:33

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Sat, 6 Apr 2024 at 17:12, Oleg Nesterov <[email protected]> wrote:
>
> Dmitry, Thomas,
>
> To simplify the review I've attached the code with this patch applied below.
>
> Yes, this changes the "semantics" of check_timer_distribution(), perhaps it
> should be renamed.
>
> But I do not see a better approach, and in fact I think that
>
> Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
>
> is the wrong goal.
>
> Do you agree?
>
> Oleg.
> -------------------------------------------------------------------------------
>
> static pthread_t ctd_thread;
> static volatile int ctd_count, ctd_failed;
>
> static void ctd_sighandler(int sig)
> {
> if (pthread_self() != ctd_thread)
> ctd_failed = 1;
> ctd_count--;
> }
>
> static void *ctd_thread_func(void *arg)
> {
> struct itimerspec val = {
> .it_value.tv_sec = 0,
> .it_value.tv_nsec = 1000 * 1000,
> .it_interval.tv_sec = 0,
> .it_interval.tv_nsec = 1000 * 1000,
> };
> timer_t id;
>
> /* 1/10 seconds to ensure the leader sleeps */
> usleep(10000);
>
> ctd_count = 100;
> if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
> return "Can't create timer";
> if (timer_settime(id, 0, &val, NULL))
> return "Can't set timer";
>
> while (ctd_count > 0 && !ctd_failed)
> ;
>
> if (timer_delete(id))
> return "Can't delete timer";
>
> return NULL;
> }
>
> /*
> * Test that only the running thread receives the timer signal.
> */
> static int check_timer_distribution(void)
> {
> const char *errmsg;
>
> signal(SIGALRM, ctd_sighandler);
>
> errmsg = "Can't create thread";
> if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL))
> goto err;
>
> errmsg = "Can't join thread";
> if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg)
> goto err;
>
> if (ctd_failed)
> ksft_test_result_skip("No signal distribution. Assuming old kernel\n");

Shouldn't the test fail here? The goal of a test is to fail when
things don't work.
I don't see any other ksft_test_result_fail() calls, and it does not
look that the test will hang on incorrect distribution.


> else
> ksft_test_result_pass("check signal distribution\n");
>
> return 0;
> err:
> ksft_print_msg(errmsg);
> return -1;
> }
>

2024-04-08 10:01:18

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Mon, Apr 08 2024 at 10:30, Dmitry Vyukov wrote:
> On Sat, 6 Apr 2024 at 17:12, Oleg Nesterov <[email protected]> wrote:
>> if (ctd_failed)
>> ksft_test_result_skip("No signal distribution. Assuming old kernel\n");
>
> Shouldn't the test fail here? The goal of a test is to fail when
> things don't work.
> I don't see any other ksft_test_result_fail() calls, and it does not
> look that the test will hang on incorrect distribution.

I have a fixup for older kernels. I'll get to Olegs patch and the fixup
later today.

Thanks,

tglx

2024-04-08 10:28:27

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On 04/08, Dmitry Vyukov wrote:
>
> >
> > if (ctd_failed)
> > ksft_test_result_skip("No signal distribution. Assuming old kernel\n");
>
> Shouldn't the test fail here? The goal of a test is to fail when
> things don't work.

I've copied this from the previous patch from Thomas, I am fine
either way.

> I don't see any other ksft_test_result_fail() calls, and it does not
> look that the test will hang on incorrect distribution.

Yes, it should never hang.

Thanks,

Oleg.


2024-04-08 19:19:06

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On 04/08, Oleg Nesterov wrote:
>
> On 04/08, Dmitry Vyukov wrote:
> >
> > >
> > > if (ctd_failed)
> > > ksft_test_result_skip("No signal distribution. Assuming old kernel\n");
> >
> > Shouldn't the test fail here? The goal of a test is to fail when
> > things don't work.
>
> I've copied this from the previous patch from Thomas, I am fine
> either way.
>
> > I don't see any other ksft_test_result_fail() calls, and it does not
> > look that the test will hang on incorrect distribution.
>
> Yes, it should never hang.

Forgot to say...

To me this test should simply do

ksft_test_result(!ctd_failed, "check signal distribution\n");
return 0;

but I am not familiar with tools/testing/selftests/ and I am not sure
I understand the last email from Thomas.

I agree with whatever you and Thomas decide.

Oleg.


2024-04-08 22:17:27

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Mon, Apr 08 2024 at 20:49, Oleg Nesterov wrote:
> To me this test should simply do
>
> ksft_test_result(!ctd_failed, "check signal distribution\n");
> return 0;

Right.

> but I am not familiar with tools/testing/selftests/ and I am not sure
> I understand the last email from Thomas.

The discussion started about running new tests on older kernels. As this
is a feature and not a bug fix that obviously fails on older kernels.

So something like the uncompiled below should work.

Thanks,

tglx
---
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -184,80 +184,83 @@ static int check_timer_create(int which)
return 0;
}

-int remain;
-__thread int got_signal;
+static pthread_t ctd_thread;
+static volatile int ctd_count, ctd_failed;

-static void *distribution_thread(void *arg)
+static void ctd_sighandler(int sig)
{
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
- return NULL;
-}
-
-static void distribution_handler(int nr)
-{
- if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
- __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
+ if (pthread_self() != ctd_thread)
+ ctd_failed = 1;
+ ctd_count--;
}

-/*
- * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
- * timer signals. This primarily tests that the kernel does not favour any one.
- */
-static int check_timer_distribution(void)
+static void *ctd_thread_func(void *arg)
{
- int err, i;
- timer_t id;
- const int nthreads = 10;
- pthread_t threads[nthreads];
struct itimerspec val = {
.it_value.tv_sec = 0,
.it_value.tv_nsec = 1000 * 1000,
.it_interval.tv_sec = 0,
.it_interval.tv_nsec = 1000 * 1000,
};
+ timer_t id;

- remain = nthreads + 1; /* worker threads + this thread */
- signal(SIGALRM, distribution_handler);
- err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
- if (err < 0) {
- ksft_perror("Can't create timer");
- return -1;
- }
- err = timer_settime(id, 0, &val, NULL);
- if (err < 0) {
- ksft_perror("Can't set timer");
- return -1;
- }
+ /* 1/10 seconds to ensure the leader sleeps */
+ usleep(10000);

- for (i = 0; i < nthreads; i++) {
- err = pthread_create(&threads[i], NULL, distribution_thread,
- NULL);
- if (err) {
- ksft_print_msg("Can't create thread: %s (%d)\n",
- strerror(errno), errno);
- return -1;
- }
- }
+ ctd_count = 100;
+ if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
+ return "Can't create timer";
+ if (timer_settime(id, 0, &val, NULL))
+ return "Can't set timer";

- /* Wait for all threads to receive the signal. */
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+ while (ctd_count > 0 && !ctd_failed)
+ ;

- for (i = 0; i < nthreads; i++) {
- err = pthread_join(threads[i], NULL);
- if (err) {
- ksft_print_msg("Can't join thread: %s (%d)\n",
- strerror(errno), errno);
- return -1;
- }
- }
+ if (timer_delete(id))
+ return "Can't delete timer";

- if (timer_delete(id)) {
- ksft_perror("Can't delete timer");
+ return NULL;
+}
+
+static bool check_kernel_version(unsigned int min_major, unsigned int min_minor)
+{
+ unsigned int major, minor;
+ struct utsname info;
+
+ uname(&info);
+ if (sscanf(info.release, "%u.%u.", &major, &minor) != 2)
+ ksft_exit_fail();
+ return major > min_major || (major == min_major && minor >= min_minor);
+}
+
+/*
+ * Test that only the running thread receives the timer signal.
+ */
+static int check_timer_distribution(void)
+{
+ const char *errmsg;
+
+ if (!check_kernel_version(6, 3)) {
+ ksft_test_result_skip("check signal distribution (old kernel)\n");
return 0;
}

- ksft_test_result_pass("check_timer_distribution\n");
+ signal(SIGALRM, ctd_sighandler);
+
+ errmsg = "Can't create thread";
+ if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL))
+ goto err;
+
+ errmsg = "Can't join thread";
+ if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg)
+ goto err;
+
+ ksft_test_result(!ctd_failed, "check signal distribution\n");
return 0;
+
+err:
+ ksft_print_msg(errmsg);
+ return -1;
}

int main(int argc, char **argv)

2024-04-09 11:13:09

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On 04/09, Thomas Gleixner wrote:
>
> The discussion started about running new tests on older kernels. As this
> is a feature and not a bug fix that obviously fails on older kernels.

OK, I see... please see below.

> So something like the uncompiled below should work.

Hmm... this patch doesn't apply to Linus's tree...

It seems that this is because in your tree check_timer_distribution() does

if (timer_delete(id)) {
ksft_perror("Can't delete timer");
return 0;
}

while in Linus's tree it returns -1 if timer_delete() fails. Nevermind.

Thomas, I am almost shy to continue this discussion and waste your time ;)
But ...

> +static bool check_kernel_version(unsigned int min_major, unsigned int min_minor)
> +{
> + unsigned int major, minor;
> + struct utsname info;
> +
> + uname(&info);
> + if (sscanf(info.release, "%u.%u.", &major, &minor) != 2)
> + ksft_exit_fail();
> + return major > min_major || (major == min_major && minor >= min_minor);
> +}

this looks useful regardless. Perhaps it should be moved into
tools/testing/selftests/kselftest.h as ksft_ck_kernel_version() ?

> +static int check_timer_distribution(void)
> +{
> + const char *errmsg;
> +
> + if (!check_kernel_version(6, 3)) {
> + ksft_test_result_skip("check signal distribution (old kernel)\n");
> return 0;

..

> + ksft_test_result(!ctd_failed, "check signal distribution\n");

Perhaps

if (!ctd_failed)
ksft_test_result_pass("check signal distribution\n");
else if (check_kernel_version(6, 3))
ksft_test_result_fail("check signal distribution\n");
else
ksft_test_result_skip("check signal distribution (old kernel)\n");

makes more sense?

This way it can be used on the older kernels with bcb7ee79029d backported.

Oleg.


2024-04-09 11:45:50

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Tue, 9 Apr 2024 at 13:12, Oleg Nesterov <[email protected]> wrote:
>
> On 04/09, Thomas Gleixner wrote:
> >
> > The discussion started about running new tests on older kernels. As this
> > is a feature and not a bug fix that obviously fails on older kernels.
>
> OK, I see... please see below.
>
> > So something like the uncompiled below should work.
>
> Hmm... this patch doesn't apply to Linus's tree...
>
> It seems that this is because in your tree check_timer_distribution() does
>
> if (timer_delete(id)) {
> ksft_perror("Can't delete timer");
> return 0;
> }
>
> while in Linus's tree it returns -1 if timer_delete() fails. Nevermind.
>
> Thomas, I am almost shy to continue this discussion and waste your time ;)
> But ...
>
> > +static bool check_kernel_version(unsigned int min_major, unsigned int min_minor)
> > +{
> > + unsigned int major, minor;
> > + struct utsname info;
> > +
> > + uname(&info);
> > + if (sscanf(info.release, "%u.%u.", &major, &minor) != 2)
> > + ksft_exit_fail();
> > + return major > min_major || (major == min_major && minor >= min_minor);
> > +}
>
> this looks useful regardless. Perhaps it should be moved into
> tools/testing/selftests/kselftest.h as ksft_ck_kernel_version() ?
>
> > +static int check_timer_distribution(void)
> > +{
> > + const char *errmsg;
> > +
> > + if (!check_kernel_version(6, 3)) {
> > + ksft_test_result_skip("check signal distribution (old kernel)\n");
> > return 0;
>
> ...
>
> > + ksft_test_result(!ctd_failed, "check signal distribution\n");
>
> Perhaps
>
> if (!ctd_failed)
> ksft_test_result_pass("check signal distribution\n");
> else if (check_kernel_version(6, 3))
> ksft_test_result_fail("check signal distribution\n");
> else
> ksft_test_result_skip("check signal distribution (old kernel)\n");
>
> makes more sense?

This looks even better!

> This way it can be used on the older kernels with bcb7ee79029d backported.
>
> Oleg.
>

2024-04-09 12:03:38

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Tue, Apr 09 2024 at 13:10, Oleg Nesterov wrote:
> On 04/09, Thomas Gleixner wrote:
> It seems that this is because in your tree check_timer_distribution() does
>
> if (timer_delete(id)) {
> ksft_perror("Can't delete timer");
> return 0;
> }
>
> while in Linus's tree it returns -1 if timer_delete()
> fails. Nevermind.

Ooops.

>> +static bool check_kernel_version(unsigned int min_major, unsigned int min_minor)
>> +{
>> + unsigned int major, minor;
>> + struct utsname info;
>> +
>> + uname(&info);
>> + if (sscanf(info.release, "%u.%u.", &major, &minor) != 2)
>> + ksft_exit_fail();
>> + return major > min_major || (major == min_major && minor >= min_minor);
>> +}
>
> this looks useful regardless. Perhaps it should be moved into
> tools/testing/selftests/kselftest.h as ksft_ck_kernel_version() ?

Makes sense.

>> +static int check_timer_distribution(void)
>> +{
>> + const char *errmsg;
>> +
>> + if (!check_kernel_version(6, 3)) {
>> + ksft_test_result_skip("check signal distribution (old kernel)\n");
>> return 0;
>
> ..
>
>> + ksft_test_result(!ctd_failed, "check signal distribution\n");
>
> Perhaps
>
> if (!ctd_failed)
> ksft_test_result_pass("check signal distribution\n");
> else if (check_kernel_version(6, 3))
> ksft_test_result_fail("check signal distribution\n");
> else
> ksft_test_result_skip("check signal distribution (old kernel)\n");
>
> makes more sense?
>
> This way it can be used on the older kernels with bcb7ee79029d backported.

Indeed.

2024-04-09 13:43:56

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2] selftests/timers/posix_timers: reimplement check_timer_distribution()

Thomas says:

The signal distribution test has a tendency to hang for a long
time as the signal delivery is not really evenly distributed. In
fact it might never be distributed across all threads ever in
the way it is written.

To me even the

This primarily tests that the kernel does not favour any one.

comment doesn't look right. The kernel does favour a thread which hits
the timer interrupt when CLOCK_PROCESS_CPUTIME_ID expires.

The new version simply checks that the group leader sleeping in join()
never receives SIGALRM, cpu_timer_fire() should always send the signal
to the thread which burns cpu.

Without the commit bcb7ee79029d ("posix-timers: Prefer delivery of signals
to the current thread") the test-case fails immediately, the very 1st tick
wakes the leader up. Otherwise it quickly succeeds after 100 ticks.

As Thomas suggested, the new version doesn't report the failure on the
pre v6.3 kernels that do not have the commit bcb7ee79029d; this is a
feature that obviously fails on the older kernels. So the patch adds the
new simple ksft_ck_kernel_version() helper and uses ksft_test_result_skip()
if check_timer_distribution() fails on the older kernel.

Signed-off-by: Oleg Nesterov <[email protected]>
---
tools/testing/selftests/kselftest.h | 14 +++
tools/testing/selftests/timers/posix_timers.c | 103 ++++++++----------
2 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 541bf192e30e..6aab3309c6a3 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -51,6 +51,7 @@
#include <stdarg.h>
#include <string.h>
#include <stdio.h>
+#include <sys/utsname.h>
#endif

#ifndef ARRAY_SIZE
@@ -388,4 +389,17 @@ static inline __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
exit(KSFT_SKIP);
}

+static inline int ksft_ck_kernel_version(unsigned int min_major,
+ unsigned int min_minor)
+{
+ struct utsname info;
+ unsigned int major, minor;
+
+ uname(&info);
+ if (sscanf(info.release, "%u.%u.", &major, &minor) != 2)
+ ksft_exit_fail();
+
+ return major > min_major || (major == min_major && minor >= min_minor);
+}
+
#endif /* __KSELFTEST_H */
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index d49dd3ffd0d9..64c41463b704 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -184,80 +184,71 @@ static int check_timer_create(int which)
return 0;
}

-int remain;
-__thread int got_signal;
+static pthread_t ctd_thread;
+static volatile int ctd_count, ctd_failed;

-static void *distribution_thread(void *arg)
+static void ctd_sighandler(int sig)
{
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
- return NULL;
+ if (pthread_self() != ctd_thread)
+ ctd_failed = 1;
+ ctd_count--;
}

-static void distribution_handler(int nr)
+static void *ctd_thread_func(void *arg)
{
- if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
- __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
-}
-
-/*
- * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
- * timer signals. This primarily tests that the kernel does not favour any one.
- */
-static int check_timer_distribution(void)
-{
- int err, i;
- timer_t id;
- const int nthreads = 10;
- pthread_t threads[nthreads];
struct itimerspec val = {
.it_value.tv_sec = 0,
.it_value.tv_nsec = 1000 * 1000,
.it_interval.tv_sec = 0,
.it_interval.tv_nsec = 1000 * 1000,
};
+ timer_t id;

- remain = nthreads + 1; /* worker threads + this thread */
- signal(SIGALRM, distribution_handler);
- err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
- if (err < 0) {
- ksft_perror("Can't create timer");
- return -1;
- }
- err = timer_settime(id, 0, &val, NULL);
- if (err < 0) {
- ksft_perror("Can't set timer");
- return -1;
- }
+ /* 1/10 seconds to ensure the leader sleeps */
+ usleep(10000);

- for (i = 0; i < nthreads; i++) {
- err = pthread_create(&threads[i], NULL, distribution_thread,
- NULL);
- if (err) {
- ksft_print_msg("Can't create thread: %s (%d)\n",
- strerror(errno), errno);
- return -1;
- }
- }
+ ctd_count = 100;
+ if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
+ return "Can't create timer";
+ if (timer_settime(id, 0, &val, NULL))
+ return "Can't set timer";

- /* Wait for all threads to receive the signal. */
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+ while (ctd_count > 0 && !ctd_failed)
+ ;

- for (i = 0; i < nthreads; i++) {
- err = pthread_join(threads[i], NULL);
- if (err) {
- ksft_print_msg("Can't join thread: %s (%d)\n",
- strerror(errno), errno);
- return -1;
- }
- }
+ if (timer_delete(id))
+ return "Can't delete timer";

- if (timer_delete(id)) {
- ksft_perror("Can't delete timer");
- return -1;
- }
+ return NULL;
+}
+
+/*
+ * Test that only the running thread receives the timer signal.
+ */
+static int check_timer_distribution(void)
+{
+ const char *errmsg;

- ksft_test_result_pass("check_timer_distribution\n");
+ signal(SIGALRM, ctd_sighandler);
+
+ errmsg = "Can't create thread";
+ if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL))
+ goto err;
+
+ errmsg = "Can't join thread";
+ if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg)
+ goto err;
+
+ if (!ctd_failed)
+ ksft_test_result_pass("check signal distribution\n");
+ else if (ksft_ck_kernel_version(6, 3))
+ ksft_test_result_fail("check signal distribution\n");
+ else
+ ksft_test_result_skip("check signal distribution (old kernel)\n");
return 0;
+err:
+ ksft_print_msg(errmsg);
+ return -1;
}

int main(int argc, char **argv)
--
2.25.1.362.g51ebf55



Subject: [tip: timers/urgent] selftests/timers/posix_timers: Reimplement check_timer_distribution()

The following commit has been merged into the timers/urgent branch of tip:

Commit-ID: 6d029c25b71f2de2838a6f093ce0fa0e69336154
Gitweb: https://git.kernel.org/tip/6d029c25b71f2de2838a6f093ce0fa0e69336154
Author: Oleg Nesterov <[email protected]>
AuthorDate: Tue, 09 Apr 2024 15:38:03 +02:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Tue, 09 Apr 2024 17:48:19 +02:00

selftests/timers/posix_timers: Reimplement check_timer_distribution()

check_timer_distribution() runs ten threads in a busy loop and tries to
test that the kernel distributes a process posix CPU timer signal to every
thread over time.

There is not guarantee that this is true even after commit bcb7ee79029d
("posix-timers: Prefer delivery of signals to the current thread") because
that commit only avoids waking up the sleeping process leader thread, but
that has nothing to do with the actual signal delivery.

As the signal is process wide the first thread which observes sigpending
and wins the race to lock sighand will deliver the signal. Testing shows
that this hangs on a regular base because some threads never win the race.

The comment "This primarily tests that the kernel does not favour any one."
is wrong. The kernel does favour a thread which hits the timer interrupt
when CLOCK_PROCESS_CPUTIME_ID expires.

Rewrite the test so it only checks that the group leader sleeping in join()
never receives SIGALRM and the thread which burns CPU cycles receives all
signals.

In older kernels which do not have commit bcb7ee79029d ("posix-timers:
Prefer delivery of signals to the current thread") the test-case fails
immediately, the very 1st tick wakes the leader up. Otherwise it quickly
succeeds after 100 ticks.

CI testing wants to use newer selftest versions on stable kernels. In this
case the test is guaranteed to fail.

So check in the failure case whether the kernel version is less than v6.3
and skip the test result in that case.

[ tglx: Massaged change log, renamed the version check helper ]

Fixes: e797203fb3ba ("selftests/timers/posix_timers: Test delivery of signals across threads")
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
---
tools/testing/selftests/kselftest.h | 13 ++-
tools/testing/selftests/timers/posix_timers.c | 103 +++++++----------
2 files changed, 60 insertions(+), 56 deletions(-)

diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 541bf19..973b18e 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -51,6 +51,7 @@
#include <stdarg.h>
#include <string.h>
#include <stdio.h>
+#include <sys/utsname.h>
#endif

#ifndef ARRAY_SIZE
@@ -388,4 +389,16 @@ static inline __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
exit(KSFT_SKIP);
}

+static inline int ksft_min_kernel_version(unsigned int min_major,
+ unsigned int min_minor)
+{
+ unsigned int major, minor;
+ struct utsname info;
+
+ if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
+ ksft_exit_fail_msg("Can't parse kernel version\n");
+
+ return major > min_major || (major == min_major && minor >= min_minor);
+}
+
#endif /* __KSELFTEST_H */
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index d49dd3f..d86a0e0 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -184,80 +184,71 @@ static int check_timer_create(int which)
return 0;
}

-int remain;
-__thread int got_signal;
+static pthread_t ctd_thread;
+static volatile int ctd_count, ctd_failed;

-static void *distribution_thread(void *arg)
+static void ctd_sighandler(int sig)
{
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
- return NULL;
+ if (pthread_self() != ctd_thread)
+ ctd_failed = 1;
+ ctd_count--;
}

-static void distribution_handler(int nr)
+static void *ctd_thread_func(void *arg)
{
- if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
- __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
-}
-
-/*
- * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
- * timer signals. This primarily tests that the kernel does not favour any one.
- */
-static int check_timer_distribution(void)
-{
- int err, i;
- timer_t id;
- const int nthreads = 10;
- pthread_t threads[nthreads];
struct itimerspec val = {
.it_value.tv_sec = 0,
.it_value.tv_nsec = 1000 * 1000,
.it_interval.tv_sec = 0,
.it_interval.tv_nsec = 1000 * 1000,
};
+ timer_t id;

- remain = nthreads + 1; /* worker threads + this thread */
- signal(SIGALRM, distribution_handler);
- err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
- if (err < 0) {
- ksft_perror("Can't create timer");
- return -1;
- }
- err = timer_settime(id, 0, &val, NULL);
- if (err < 0) {
- ksft_perror("Can't set timer");
- return -1;
- }
+ /* 1/10 seconds to ensure the leader sleeps */
+ usleep(10000);

- for (i = 0; i < nthreads; i++) {
- err = pthread_create(&threads[i], NULL, distribution_thread,
- NULL);
- if (err) {
- ksft_print_msg("Can't create thread: %s (%d)\n",
- strerror(errno), errno);
- return -1;
- }
- }
+ ctd_count = 100;
+ if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
+ return "Can't create timer\n";
+ if (timer_settime(id, 0, &val, NULL))
+ return "Can't set timer\n";

- /* Wait for all threads to receive the signal. */
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+ while (ctd_count > 0 && !ctd_failed)
+ ;

- for (i = 0; i < nthreads; i++) {
- err = pthread_join(threads[i], NULL);
- if (err) {
- ksft_print_msg("Can't join thread: %s (%d)\n",
- strerror(errno), errno);
- return -1;
- }
- }
+ if (timer_delete(id))
+ return "Can't delete timer\n";

- if (timer_delete(id)) {
- ksft_perror("Can't delete timer");
- return -1;
- }
+ return NULL;
+}
+
+/*
+ * Test that only the running thread receives the timer signal.
+ */
+static int check_timer_distribution(void)
+{
+ const char *errmsg;

- ksft_test_result_pass("check_timer_distribution\n");
+ signal(SIGALRM, ctd_sighandler);
+
+ errmsg = "Can't create thread\n";
+ if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL))
+ goto err;
+
+ errmsg = "Can't join thread\n";
+ if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg)
+ goto err;
+
+ if (!ctd_failed)
+ ksft_test_result_pass("check signal distribution\n");
+ else if (ksft_min_kernel_version(6, 3))
+ ksft_test_result_fail("check signal distribution\n");
+ else
+ ksft_test_result_skip("check signal distribution (old kernel)\n");
return 0;
+err:
+ ksft_print_msg(errmsg);
+ return -1;
}

int main(int argc, char **argv)

2024-04-10 22:25:51

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH v2] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Tue, Apr 9, 2024 at 6:39 AM Oleg Nesterov <[email protected]> wrote:
>
> Thomas says:
>
> The signal distribution test has a tendency to hang for a long
> time as the signal delivery is not really evenly distributed. In
> fact it might never be distributed across all threads ever in
> the way it is written.
>
> To me even the
>
> This primarily tests that the kernel does not favour any one.
>
> comment doesn't look right. The kernel does favour a thread which hits
> the timer interrupt when CLOCK_PROCESS_CPUTIME_ID expires.
>
> The new version simply checks that the group leader sleeping in join()
> never receives SIGALRM, cpu_timer_fire() should always send the signal
> to the thread which burns cpu.
>
> Without the commit bcb7ee79029d ("posix-timers: Prefer delivery of signals
> to the current thread") the test-case fails immediately, the very 1st tick
> wakes the leader up. Otherwise it quickly succeeds after 100 ticks.
>
> As Thomas suggested, the new version doesn't report the failure on the
> pre v6.3 kernels that do not have the commit bcb7ee79029d; this is a
> feature that obviously fails on the older kernels. So the patch adds the
> new simple ksft_ck_kernel_version() helper and uses ksft_test_result_skip()
> if check_timer_distribution() fails on the older kernel.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

This is working great here (on both 6.6 and the older 6.1)! Thanks so
much for fixing this!
One nit below, but otherwise:
Tested-by: John Stultz <[email protected]>

> +err:
> + ksft_print_msg(errmsg);

This bit is causing the following warning:
posix_timers.c:250:2: warning: format not a string literal and no
format arguments [-Wformat-security]
250 | ksft_print_msg(errmsg);
| ^~~~~~~~~~~~~~

A simple fix is just to switch it to:
ksft_print_msg("%s", errmsg);

thanks
-john

2024-04-10 22:32:03

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v2] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Wed, Apr 10 2024 at 15:21, John Stultz wrote:
> On Tue, Apr 9, 2024 at 6:39 AM Oleg Nesterov <[email protected]> wrote:
> This is working great here (on both 6.6 and the older 6.1)! Thanks so
> much for fixing this!
> One nit below, but otherwise:
> Tested-by: John Stultz <[email protected]>
>
>> +err:
>> + ksft_print_msg(errmsg);
>
> This bit is causing the following warning:
> posix_timers.c:250:2: warning: format not a string literal and no
> format arguments [-Wformat-security]
> 250 | ksft_print_msg(errmsg);
> | ^~~~~~~~~~~~~~
>
> A simple fix is just to switch it to:
> ksft_print_msg("%s", errmsg);

Can you please send a delta patch against tip timers/urgent?

2024-04-10 22:33:38

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH v2] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Wed, Apr 10, 2024 at 3:31 PM Thomas Gleixner <[email protected]> wrote:
>
> On Wed, Apr 10 2024 at 15:21, John Stultz wrote:
> > On Tue, Apr 9, 2024 at 6:39 AM Oleg Nesterov <[email protected]> wrote:
> > This is working great here (on both 6.6 and the older 6.1)! Thanks so
> > much for fixing this!
> > One nit below, but otherwise:
> > Tested-by: John Stultz <[email protected]>
> >
> >> +err:
> >> + ksft_print_msg(errmsg);
> >
> > This bit is causing the following warning:
> > posix_timers.c:250:2: warning: format not a string literal and no
> > format arguments [-Wformat-security]
> > 250 | ksft_print_msg(errmsg);
> > | ^~~~~~~~~~~~~~
> >
> > A simple fix is just to switch it to:
> > ksft_print_msg("%s", errmsg);
>
> Can you please send a delta patch against tip timers/urgent?

Will do! Apologies for not getting to test and reply earlier.
-john

2024-04-11 12:42:14

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Sat, Apr 06, 2024 at 05:09:51PM +0200, Oleg Nesterov wrote:
> Thomas says:
>
> The signal distribution test has a tendency to hang for a long
> time as the signal delivery is not really evenly distributed. In
> fact it might never be distributed across all threads ever in
> the way it is written.
>
> To me even the
>
> This primarily tests that the kernel does not favour any one.
>
> comment doesn't look right. The kernel does favour a thread which hits
> the timer interrupt when CLOCK_PROCESS_CPUTIME_ID expires.
>
> The new version simply checks that the group leader sleeping in join()
> never receives SIGALRM, cpu_timer_fire() should always send the signal
> to the thread which burns cpu.
>
> Without the commit bcb7ee79029d ("posix-timers: Prefer delivery of signals
> to the current thread") the test-case fails immediately, the very 1st tick
> wakes the leader up. Otherwise it quickly succeeds after 100 ticks.

This has landed in -next and is causing warning spam throughout
kselftest when built with clang:

/home/broonie/git/bisect/tools/testing/selftests/kselftest.h:435:6: warning: variable 'major' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
^~~~~~~~~~~~
/home/broonie/git/bisect/tools/testing/selftests/kselftest.h:438:9: note: uninitialized use occurs here
return major > min_major || (major == min_major && minor >= min_minor);
^~~~~
/home/broonie/git/bisect/tools/testing/selftests/kselftest.h:435:6: note: remove the '||' if its condition is always false
if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
^~~~~~~~~~~~~~~
/home/broonie/git/bisect/tools/testing/selftests/kselftest.h:432:20: note: initialize the variable 'major' to silence this warning
unsigned int major, minor;
^
= 0


Attachments:
(No filename) (1.97 kB)
signature.asc (499.00 B)
Download all attachments

2024-04-11 12:47:58

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Sat, Apr 06, 2024 at 05:09:51PM +0200, Oleg Nesterov wrote:
> Thomas says:
>
> The signal distribution test has a tendency to hang for a long
> time as the signal delivery is not really evenly distributed. In
> fact it might never be distributed across all threads ever in
> the way it is written.
>
> To me even the
>
> This primarily tests that the kernel does not favour any one.

Further to my previous mail it's also broken the arm64 selftest builds,
they use kselftest.h with nolibc in order to test low level
functionality mainly used by libc implementations and nolibc doesn't
implement uname():

In file included from za-fork.c:12:
../../kselftest.h:433:17: error: variable has incomplete type 'struct utsname'
struct utsname info;
^
../../kselftest.h:433:9: note: forward declaration of 'struct utsname'
struct utsname info;
^
../../kselftest.h:435:6: error: call to undeclared function 'uname'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
^
../../kselftest.h:435:22: error: call to undeclared function 'sscanf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
^
1 warning and 3 errors generated.


Attachments:
(No filename) (1.47 kB)
signature.asc (499.00 B)
Download all attachments

2024-04-11 14:19:12

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Thu, Apr 11 2024 at 13:44, Mark Brown wrote:
> On Sat, Apr 06, 2024 at 05:09:51PM +0200, Oleg Nesterov wrote:
>> Thomas says:
>>
>> The signal distribution test has a tendency to hang for a long
>> time as the signal delivery is not really evenly distributed. In
>> fact it might never be distributed across all threads ever in
>> the way it is written.
>>
>> To me even the
>>
>> This primarily tests that the kernel does not favour any one.
>
> Further to my previous mail it's also broken the arm64 selftest builds,
> they use kselftest.h with nolibc in order to test low level
> functionality mainly used by libc implementations and nolibc doesn't
> implement uname():
>
> In file included from za-fork.c:12:
> ../../kselftest.h:433:17: error: variable has incomplete type 'struct utsname'
> struct utsname info;
> ^
> ../../kselftest.h:433:9: note: forward declaration of 'struct utsname'
> struct utsname info;
> ^
> ../../kselftest.h:435:6: error: call to undeclared function 'uname'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
> if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
> ^
> ../../kselftest.h:435:22: error: call to undeclared function 'sscanf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
> if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)

Grrr. Let me stare at this.


2024-04-11 15:53:47

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Thu, Apr 11, 2024 at 5:42 AM Mark Brown <[email protected]> wrote:
> On Sat, Apr 06, 2024 at 05:09:51PM +0200, Oleg Nesterov wrote:
> > Without the commit bcb7ee79029d ("posix-timers: Prefer delivery of signals
> > to the current thread") the test-case fails immediately, the very 1st tick
> > wakes the leader up. Otherwise it quickly succeeds after 100 ticks.
>
> This has landed in -next and is causing warning spam throughout
> kselftest when built with clang:
>
> /home/broonie/git/bisect/tools/testing/selftests/kselftest.h:435:6: warning: variable 'major' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
> if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
> ^~~~~~~~~~~~
> /home/broonie/git/bisect/tools/testing/selftests/kselftest.h:438:9: note: uninitialized use occurs here
> return major > min_major || (major == min_major && minor >= min_minor);
> ^~~~~
> /home/broonie/git/bisect/tools/testing/selftests/kselftest.h:435:6: note: remove the '||' if its condition is always false
> if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
> ^~~~~~~~~~~~~~~
> /home/broonie/git/bisect/tools/testing/selftests/kselftest.h:432:20: note: initialize the variable 'major' to silence this warning
> unsigned int major, minor;
> ^
> = 0

I hit this one too yesterday and included a fix for it here:
https://lore.kernel.org/lkml/[email protected]/

thanks
-john

2024-04-11 17:26:38

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On 04/11, Thomas Gleixner wrote:
>
> On Thu, Apr 11 2024 at 13:44, Mark Brown wrote:
> >
> > Further to my previous mail it's also broken the arm64 selftest builds,
> > they use kselftest.h with nolibc in order to test low level
> > functionality mainly used by libc implementations and nolibc doesn't
> > implement uname():
> >
> > In file included from za-fork.c:12:
> > ../../kselftest.h:433:17: error: variable has incomplete type 'struct utsname'
> > struct utsname info;
> > ^
> > ../../kselftest.h:433:9: note: forward declaration of 'struct utsname'
> > struct utsname info;
> > ^
> > ../../kselftest.h:435:6: error: call to undeclared function 'uname'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
> > if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
> > ^
> > ../../kselftest.h:435:22: error: call to undeclared function 'sscanf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
> > if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
>
> Grrr. Let me stare at this.

Damn ;)

Can't we just turn ksft_min_kernel_version() into

static inline int ksft_min_kernel_version(unsigned int min_major,
unsigned int min_minor)
{
#ifdef NOLIBC
return -1;
#else
unsigned int major, minor;
struct utsname info;

if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
ksft_exit_fail_msg("Can't parse kernel version\n");

return major > min_major || (major == min_major && minor >= min_minor);
#endif
}

?

Not sure what should check_timer_distribution() do in this case, to me
ksft_test_result_fail() is fine.

Oleg.


2024-04-11 17:32:49

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

On Thu, Apr 11, 2024 at 05:50:53PM +0200, Oleg Nesterov wrote:
> On 04/11, Thomas Gleixner wrote:

> > Grrr. Let me stare at this.

> Damn ;)

> Can't we just turn ksft_min_kernel_version() into

> static inline int ksft_min_kernel_version(unsigned int min_major,
> unsigned int min_minor)
> {
> #ifdef NOLIBC
> return -1;
> #else

That'd probably work well enough here. I think it's reasonable for
someone who wants to build a test that uses ksft_min_kernel_version()
with nolibc to figure out how to implement it, right now it's not
actually getting used with nolibc and just happens to be seen due to
being in the same header.

> Not sure what should check_timer_distribution() do in this case, to me
> ksft_test_result_fail() is fine.

I'd go with skip but yeah.


Attachments:
(No filename) (808.00 B)
signature.asc (499.00 B)
Download all attachments

2024-04-12 12:39:38

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] selftests: fix build failure with NOLIBC

As Mark explains ksft_min_kernel_version() can't be compiled with nolibc,
it doesn't implement uname().

Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
Reported-by: Mark Brown <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Oleg Nesterov <[email protected]>
---
tools/testing/selftests/kselftest.h | 6 ++++++
tools/testing/selftests/timers/posix_timers.c | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 973b18e156b2..0d9ed3255f5e 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -392,6 +392,11 @@ static inline __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
static inline int ksft_min_kernel_version(unsigned int min_major,
unsigned int min_minor)
{
+#ifdef NOLIBC
+ ksft_print_msg("NOLIBC: Can't check kernel version: "
+ "Function not implemented\n");
+ return -1;
+#else
unsigned int major, minor;
struct utsname info;

@@ -399,6 +404,7 @@ static inline int ksft_min_kernel_version(unsigned int min_major,
ksft_exit_fail_msg("Can't parse kernel version\n");

return major > min_major || (major == min_major && minor >= min_minor);
+#endif
}

#endif /* __KSELFTEST_H */
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index d86a0e00711e..878496d2a656 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -241,7 +241,7 @@ static int check_timer_distribution(void)

if (!ctd_failed)
ksft_test_result_pass("check signal distribution\n");
- else if (ksft_min_kernel_version(6, 3))
+ else if (ksft_min_kernel_version(6, 3) > 0)
ksft_test_result_fail("check signal distribution\n");
else
ksft_test_result_skip("check signal distribution (old kernel)\n");
--
2.25.1.362.g51ebf55



Subject: [tip: timers/urgent] selftests: kselftest: Fix build failure with NOLIBC

The following commit has been merged into the timers/urgent branch of tip:

Commit-ID: 16767502aa990cca2cb7d1372b31d328c4c85b40
Gitweb: https://git.kernel.org/tip/16767502aa990cca2cb7d1372b31d328c4c85b40
Author: Oleg Nesterov <[email protected]>
AuthorDate: Fri, 12 Apr 2024 14:35:36 +02:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Fri, 12 Apr 2024 16:55:00 +02:00

selftests: kselftest: Fix build failure with NOLIBC

As Mark explains ksft_min_kernel_version() can't be compiled with nolibc,
it doesn't implement uname().

Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
Reported-by: Mark Brown <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Closes: https://lore.kernel.org/all/[email protected]/
---
tools/testing/selftests/kselftest.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 0591974..7b89362 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -395,6 +395,10 @@ static inline __noreturn __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
static inline int ksft_min_kernel_version(unsigned int min_major,
unsigned int min_minor)
{
+#ifdef NOLIBC
+ ksft_print_msg("NOLIBC: Can't check kernel version: Function not implemented\n");
+ return 0;
+#else
unsigned int major, minor;
struct utsname info;

@@ -402,6 +406,7 @@ static inline int ksft_min_kernel_version(unsigned int min_major,
ksft_exit_fail_msg("Can't parse kernel version\n");

return major > min_major || (major == min_major && minor >= min_minor);
+#endif
}

#endif /* __KSELFTEST_H */

2024-04-14 07:42:51

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH] selftests: fix build failure with NOLIBC

On Fri, Apr 12, 2024 at 02:35:36PM +0200, Oleg Nesterov wrote:
> As Mark explains ksft_min_kernel_version() can't be compiled with nolibc,
> it doesn't implement uname().
>
> Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
> Reported-by: Mark Brown <[email protected]>
> Closes: https://lore.kernel.org/all/[email protected]/
> Signed-off-by: Oleg Nesterov <[email protected]>

Makes sense to me given that there's not likely to be any immediate
users.

Reviewed-by: Mark Brown <[email protected]>


Attachments:
(No filename) (591.00 B)
signature.asc (499.00 B)
Download all attachments