2022-02-28 15:11:37

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: [PATCH] tracing/osnoise: Force quiescent states while tracing

At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
kernel might have the side effect of extending grace periods too much.
This will eventually entice RCU to schedule a task on the isolated CPU
to end the overly extended grace period, adding unwarranted noise to the
CPU being traced in the process.

So, check if we're the only ones running on this isolated CPU and that
we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
between measurements.

Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
loop's cond_resched() will go though a quiescent state for them.

Note that this same exact problem is what extended quiescent states were
created for. But adapting them to this specific use-case isn't trivial
as it'll imply reworking entry/exit and dynticks/context tracking code.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
kernel/trace/trace_osnoise.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 870a08da5b48..4928358f6e88 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -21,7 +21,9 @@
#include <linux/uaccess.h>
#include <linux/cpumask.h>
#include <linux/delay.h>
+#include <linux/tick.h>
#include <linux/sched/clock.h>
+#include <linux/sched/isolation.h>
#include <uapi/linux/sched/types.h>
#include <linux/sched.h>
#include "trace.h"
@@ -1295,6 +1297,7 @@ static int run_osnoise(void)
struct osnoise_sample s;
unsigned int threshold;
u64 runtime, stop_in;
+ unsigned long flags;
u64 sum_noise = 0;
int hw_count = 0;
int ret = -1;
@@ -1386,6 +1389,22 @@ static int run_osnoise(void)
osnoise_stop_tracing();
}

+ /*
+ * Check if we're the only ones running on this nohz_full CPU
+ * and that we're on a PREEMPT_RCU setup. If so, let's fake a
+ * QS since there is no way for RCU to know we're not making
+ * use of it.
+ *
+ * Otherwise it'll be done through cond_resched().
+ */
+ if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
+ !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
+ tick_nohz_tick_stopped()) {
+ local_irq_save(flags);
+ rcu_momentary_dyntick_idle();
+ local_irq_restore(flags);
+ }
+
/*
* For the non-preemptive kernel config: let threads runs, if
* they so wish.
--
2.35.1


2022-02-28 17:11:07

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Mon, 28 Feb 2022 15:14:23 +0100
Nicolas Saenz Julienne <[email protected]> wrote:

> @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
> osnoise_stop_tracing();
> }
>
> + /*
> + * Check if we're the only ones running on this nohz_full CPU
> + * and that we're on a PREEMPT_RCU setup. If so, let's fake a
> + * QS since there is no way for RCU to know we're not making
> + * use of it.
> + *
> + * Otherwise it'll be done through cond_resched().
> + */
> + if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
> + !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
> + tick_nohz_tick_stopped()) {
> + local_irq_save(flags);
> + rcu_momentary_dyntick_idle();
> + local_irq_restore(flags);
> + }
> +

This looks very specific and a corner case. Something that depends on how
RCU works. This really should be in the RCU code such that if something
changes, RCU maintainers are aware of this and can update this too.

I wonder if this is similar to what we have in trace_benchmark(). Would
using: cond_resched_tasks_rcu_qs() work for you?

-- Steve


> /*

Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On 2/28/22 15:14, Nicolas Saenz Julienne wrote:
> At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
> kernel might have the side effect of extending grace periods too much.
> This will eventually entice RCU to schedule a task on the isolated CPU
> to end the overly extended grace period, adding unwarranted noise to the
> CPU being traced in the process.
>
> So, check if we're the only ones running on this isolated CPU and that
> we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
> between measurements.
>
> Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
> loop's cond_resched() will go though a quiescent state for them.
>
> Note that this same exact problem is what extended quiescent states were
> created for. But adapting them to this specific use-case isn't trivial
> as it'll imply reworking entry/exit and dynticks/context tracking code.
>
> Signed-off-by: Nicolas Saenz Julienne <[email protected]>
> ---
> kernel/trace/trace_osnoise.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> index 870a08da5b48..4928358f6e88 100644
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -21,7 +21,9 @@
> #include <linux/uaccess.h>
> #include <linux/cpumask.h>
> #include <linux/delay.h>
> +#include <linux/tick.h>
> #include <linux/sched/clock.h>
> +#include <linux/sched/isolation.h>
> #include <uapi/linux/sched/types.h>
> #include <linux/sched.h>
> #include "trace.h"
> @@ -1295,6 +1297,7 @@ static int run_osnoise(void)
> struct osnoise_sample s;
> unsigned int threshold;
> u64 runtime, stop_in;
> + unsigned long flags;
> u64 sum_noise = 0;
> int hw_count = 0;
> int ret = -1;
> @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
> osnoise_stop_tracing();
> }
>
> + /*
> + * Check if we're the only ones running on this nohz_full CPU
> + * and that we're on a PREEMPT_RCU setup. If so, let's fake a
> + * QS since there is no way for RCU to know we're not making
> + * use of it.
> + *
> + * Otherwise it'll be done through cond_resched().
> + */
> + if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
> + !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&

Does this restrict to only isolcpus cpus? what if this CPU was isolated via
other methods?

> + tick_nohz_tick_stopped()) {
> + local_irq_save(flags);

This code is always with interrupts enabled, so local_irq_disable()/enable()
should be enough (and faster).

> + rcu_momentary_dyntick_idle();
> + local_irq_restore(flags);
> + }

Question, if we set this once, we could avoid setting it on every loop unless we
have a preemption from another thread, right?

> /*
> * For the non-preemptive kernel config: let threads runs, if
> * they so wish.

/me keeps playing with this patch.

-- Daniel

2022-02-28 23:44:03

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Mon, Feb 28, 2022 at 03:14:23PM +0100, Nicolas Saenz Julienne wrote:
> At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
> kernel might have the side effect of extending grace periods too much.
> This will eventually entice RCU to schedule a task on the isolated CPU
> to end the overly extended grace period, adding unwarranted noise to the
> CPU being traced in the process.
>
> So, check if we're the only ones running on this isolated CPU and that
> we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
> between measurements.
>
> Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
> loop's cond_resched() will go though a quiescent state for them.
>
> Note that this same exact problem is what extended quiescent states were
> created for. But adapting them to this specific use-case isn't trivial
> as it'll imply reworking entry/exit and dynticks/context tracking code.
>
> Signed-off-by: Nicolas Saenz Julienne <[email protected]>
> ---
> kernel/trace/trace_osnoise.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> index 870a08da5b48..4928358f6e88 100644
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -21,7 +21,9 @@
> #include <linux/uaccess.h>
> #include <linux/cpumask.h>
> #include <linux/delay.h>
> +#include <linux/tick.h>
> #include <linux/sched/clock.h>
> +#include <linux/sched/isolation.h>
> #include <uapi/linux/sched/types.h>
> #include <linux/sched.h>
> #include "trace.h"
> @@ -1295,6 +1297,7 @@ static int run_osnoise(void)
> struct osnoise_sample s;
> unsigned int threshold;
> u64 runtime, stop_in;
> + unsigned long flags;
> u64 sum_noise = 0;
> int hw_count = 0;
> int ret = -1;
> @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
> osnoise_stop_tracing();
> }
>
> + /*
> + * Check if we're the only ones running on this nohz_full CPU
> + * and that we're on a PREEMPT_RCU setup. If so, let's fake a
> + * QS since there is no way for RCU to know we're not making
> + * use of it.
> + *
> + * Otherwise it'll be done through cond_resched().
> + */
> + if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
> + !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
> + tick_nohz_tick_stopped()) {
> + local_irq_save(flags);
> + rcu_momentary_dyntick_idle();
> + local_irq_restore(flags);

What is supposed to happen in this case is that RCU figures out that
there is a nohz_full CPU running for an extended period of time in the
kernel and takes matters into its own hands. This goes as follows on
a HZ=1000 kernel with default RCU settings:

o At about 20 milliseconds into the grace period, RCU makes
cond_resched() report quiescent states, among other things.
As you say, this does not help for CONFIG_PREEMPT=n kernels.

o At about 30 milliseconds into the grace period, RCU forces an
explicit context switch on the wayward CPU. This should get
the CPU's attention even in CONFIG_PREEMPT=y kernels.

So what is happening for you instead?

Thanx, Paul

> + }
> +
> /*
> * For the non-preemptive kernel config: let threads runs, if
> * they so wish.
> --
> 2.35.1
>

2022-03-01 11:17:06

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Mon, 2022-02-28 at 21:00 +0100, Daniel Bristot de Oliveira wrote:
> On 2/28/22 15:14, Nicolas Saenz Julienne wrote:
> > At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
> > kernel might have the side effect of extending grace periods too much.
> > This will eventually entice RCU to schedule a task on the isolated CPU
> > to end the overly extended grace period, adding unwarranted noise to the
> > CPU being traced in the process.
> >
> > So, check if we're the only ones running on this isolated CPU and that
> > we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
> > between measurements.
> >
> > Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
> > loop's cond_resched() will go though a quiescent state for them.
> >
> > Note that this same exact problem is what extended quiescent states were
> > created for. But adapting them to this specific use-case isn't trivial
> > as it'll imply reworking entry/exit and dynticks/context tracking code.
> >
> > Signed-off-by: Nicolas Saenz Julienne <[email protected]>
> > ---
> > kernel/trace/trace_osnoise.c | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
> >
> > diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> > index 870a08da5b48..4928358f6e88 100644
> > --- a/kernel/trace/trace_osnoise.c
> > +++ b/kernel/trace/trace_osnoise.c
> > @@ -21,7 +21,9 @@
> > #include <linux/uaccess.h>
> > #include <linux/cpumask.h>
> > #include <linux/delay.h>
> > +#include <linux/tick.h>
> > #include <linux/sched/clock.h>
> > +#include <linux/sched/isolation.h>
> > #include <uapi/linux/sched/types.h>
> > #include <linux/sched.h>
> > #include "trace.h"
> > @@ -1295,6 +1297,7 @@ static int run_osnoise(void)
> > struct osnoise_sample s;
> > unsigned int threshold;
> > u64 runtime, stop_in;
> > + unsigned long flags;
> > u64 sum_noise = 0;
> > int hw_count = 0;
> > int ret = -1;
> > @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
> > osnoise_stop_tracing();
> > }
> >
> > + /*
> > + * Check if we're the only ones running on this nohz_full CPU
> > + * and that we're on a PREEMPT_RCU setup. If so, let's fake a
> > + * QS since there is no way for RCU to know we're not making
> > + * use of it.
> > + *
> > + * Otherwise it'll be done through cond_resched().
> > + */
> > + if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
> > + !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
>
> Does this restrict to only isolcpus cpus?

nohz_full CPUs actually, IIUC HK_FLAG_MISC isn't set if isolcpus is used, which
is deprecated anyway.

> what if this CPU was isolated via other methods?

osnoise with an uncontested FIFO priority for example? I believe in that case
RCU will start throwing "rcu_preempt detected stalls" style warnings. As it
won't be able to preempt the osnoise CPU to force the grace period ending.

I see your point though, this would also help in that situation. We could maybe
relax the entry barrier to rcu_momentary_dyntick_idle(). I think it's safe to
call it regardless of nohz_full/tick state for most cases, I just wanted to
avoid the overhead. The only thing that worries me is PREEMPT_RT and its
rt_spinlocks, which can be preempted.

> > + tick_nohz_tick_stopped()) {
> > + local_irq_save(flags);
>
> This code is always with interrupts enabled, so local_irq_disable()/enable()
> should be enough (and faster).

Noted.

> > + rcu_momentary_dyntick_idle();
> > + local_irq_restore(flags);
> > + }
>
> Question, if we set this once, we could avoid setting it on every loop unless we
> have a preemption from another thread, right?

This tells RCU the CPU went through a quiescent state, which removes it from
the current grace period accounting. It's different from an extended quiescent
state, which fully disables the CPU from RCU's perspective.

We don't need to do it on every iteration, but as Paul explained in the mail
thread it has to happen at least every ~20-30ms.

Thanks!

--
Nicolás Sáenz

2022-03-01 11:32:59

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Mon, 2022-02-28 at 10:45 -0500, Steven Rostedt wrote:
> On Mon, 28 Feb 2022 15:14:23 +0100
> Nicolas Saenz Julienne <[email protected]> wrote:
>
> > @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
> > osnoise_stop_tracing();
> > }
> >
> > + /*
> > + * Check if we're the only ones running on this nohz_full CPU
> > + * and that we're on a PREEMPT_RCU setup. If so, let's fake a
> > + * QS since there is no way for RCU to know we're not making
> > + * use of it.
> > + *
> > + * Otherwise it'll be done through cond_resched().
> > + */
> > + if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
> > + !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
> > + tick_nohz_tick_stopped()) {
> > + local_irq_save(flags);
> > + rcu_momentary_dyntick_idle();
> > + local_irq_restore(flags);
> > + }
> > +
>
> This looks very specific and a corner case. Something that depends on how
> RCU works. This really should be in the RCU code such that if something
> changes, RCU maintainers are aware of this and can update this too.

The if statement could be relaxed to some extent (I'm looking into it ATM). I
made it more constraining to avoid the overhead for other users.

As for rcu_momentary_dyntick_idle(), I'm not the first one to use it for such
means. See multi_cpu_stop() in kernel/stop_machine.c. So I'm not sure it counts
as abusing RCU internals.

> I wonder if this is similar to what we have in trace_benchmark(). Would
> using: cond_resched_tasks_rcu_qs() work for you?

IIUC this only affects tasks_rcu, and doesn't help with regular RCU. We already
call cond_resched_tasks_rcu_qs() in osnoise when necesary, actually it was
introduced by you. :)

Thanks!

--
Nicolás Sáenz

2022-03-01 16:16:08

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Mon, 2022-02-28 at 14:11 -0800, Paul E. McKenney wrote:
> On Mon, Feb 28, 2022 at 03:14:23PM +0100, Nicolas Saenz Julienne wrote:
> > At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
> > kernel might have the side effect of extending grace periods too much.
> > This will eventually entice RCU to schedule a task on the isolated CPU
> > to end the overly extended grace period, adding unwarranted noise to the
> > CPU being traced in the process.
> >
> > So, check if we're the only ones running on this isolated CPU and that
> > we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
> > between measurements.
> >
> > Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
> > loop's cond_resched() will go though a quiescent state for them.
> >
> > Note that this same exact problem is what extended quiescent states were
> > created for. But adapting them to this specific use-case isn't trivial
> > as it'll imply reworking entry/exit and dynticks/context tracking code.
> >
> > Signed-off-by: Nicolas Saenz Julienne <[email protected]>
> > ---
> > kernel/trace/trace_osnoise.c | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
> >
> > diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> > index 870a08da5b48..4928358f6e88 100644
> > --- a/kernel/trace/trace_osnoise.c
> > +++ b/kernel/trace/trace_osnoise.c
> > @@ -21,7 +21,9 @@
> > #include <linux/uaccess.h>
> > #include <linux/cpumask.h>
> > #include <linux/delay.h>
> > +#include <linux/tick.h>
> > #include <linux/sched/clock.h>
> > +#include <linux/sched/isolation.h>
> > #include <uapi/linux/sched/types.h>
> > #include <linux/sched.h>
> > #include "trace.h"
> > @@ -1295,6 +1297,7 @@ static int run_osnoise(void)
> > struct osnoise_sample s;
> > unsigned int threshold;
> > u64 runtime, stop_in;
> > + unsigned long flags;
> > u64 sum_noise = 0;
> > int hw_count = 0;
> > int ret = -1;
> > @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
> > osnoise_stop_tracing();
> > }
> >
> > + /*
> > + * Check if we're the only ones running on this nohz_full CPU
> > + * and that we're on a PREEMPT_RCU setup. If so, let's fake a
> > + * QS since there is no way for RCU to know we're not making
> > + * use of it.
> > + *
> > + * Otherwise it'll be done through cond_resched().
> > + */
> > + if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
> > + !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
> > + tick_nohz_tick_stopped()) {
> > + local_irq_save(flags);
> > + rcu_momentary_dyntick_idle();
> > + local_irq_restore(flags);
>
> What is supposed to happen in this case is that RCU figures out that
> there is a nohz_full CPU running for an extended period of time in the
> kernel and takes matters into its own hands. This goes as follows on
> a HZ=1000 kernel with default RCU settings:
>
> o At about 20 milliseconds into the grace period, RCU makes
> cond_resched() report quiescent states, among other things.
> As you say, this does not help for CONFIG_PREEMPT=n kernels.
>
> o At about 30 milliseconds into the grace period, RCU forces an
> explicit context switch on the wayward CPU. This should get
> the CPU's attention even in CONFIG_PREEMPT=y kernels.
>
> So what is happening for you instead?

Well, that's exactly what I'm seeing, but it doesn't play well with osnoise.

Here's a simplified view of what the tracer does:

time1 = get_time();
while(1) {
time2 = get_time();
if (time2 - time1 > threshold)
trace_noise();
cond_resched();
time1 = time2;
}

This is pinned to a specific CPU, and in the most extreme cases is expected to
take 100% of CPU time. Eventually, some SMI, NMI/interrupt, or process
execution will trigger the threshold, and osnoise will provide some nice traces
explaining what happened.

RCU forcing a context switch on the wayward CPU is introducing unwarranted
noise as it's triggered by the fact we're measuring and wouldn't happen
otherwise.

If this were user-space, we'd be in an EQS, which would make this problem go
away. An option would be mimicking this behaviour (assuming irq entry/exit code
did the right thing):

rcu_eqs_enter(); <--
time1 = get_time();
while(1) {
time2 = get_time();
if (time2 - time1 > threshold)
trace_noise();
rcu_eqs_exit(); <--
cond_resched();
rcu_eqs_enter(); <--
time1 = time2;
}

But given the tight loop this isn't much different than what I'm proposing at
the moment, isn't it? rcu_momentary_dyntick_idle() just emulates a really fast
EQS entry/exit.

Thanks!

--
Nicolás Sáenz

Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On 3/1/22 11:00, Nicolas Saenz Julienne wrote:
> On Mon, 2022-02-28 at 14:11 -0800, Paul E. McKenney wrote:
>> On Mon, Feb 28, 2022 at 03:14:23PM +0100, Nicolas Saenz Julienne wrote:
>>> At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
>>> kernel might have the side effect of extending grace periods too much.
>>> This will eventually entice RCU to schedule a task on the isolated CPU
>>> to end the overly extended grace period, adding unwarranted noise to the
>>> CPU being traced in the process.
>>>
>>> So, check if we're the only ones running on this isolated CPU and that
>>> we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
>>> between measurements.
>>>
>>> Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
>>> loop's cond_resched() will go though a quiescent state for them.
>>>
>>> Note that this same exact problem is what extended quiescent states were
>>> created for. But adapting them to this specific use-case isn't trivial
>>> as it'll imply reworking entry/exit and dynticks/context tracking code.
>>>
>>> Signed-off-by: Nicolas Saenz Julienne <[email protected]>
>>> ---
>>> kernel/trace/trace_osnoise.c | 19 +++++++++++++++++++
>>> 1 file changed, 19 insertions(+)
>>>
>>> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
>>> index 870a08da5b48..4928358f6e88 100644
>>> --- a/kernel/trace/trace_osnoise.c
>>> +++ b/kernel/trace/trace_osnoise.c
>>> @@ -21,7 +21,9 @@
>>> #include <linux/uaccess.h>
>>> #include <linux/cpumask.h>
>>> #include <linux/delay.h>
>>> +#include <linux/tick.h>
>>> #include <linux/sched/clock.h>
>>> +#include <linux/sched/isolation.h>
>>> #include <uapi/linux/sched/types.h>
>>> #include <linux/sched.h>
>>> #include "trace.h"
>>> @@ -1295,6 +1297,7 @@ static int run_osnoise(void)
>>> struct osnoise_sample s;
>>> unsigned int threshold;
>>> u64 runtime, stop_in;
>>> + unsigned long flags;
>>> u64 sum_noise = 0;
>>> int hw_count = 0;
>>> int ret = -1;
>>> @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
>>> osnoise_stop_tracing();
>>> }
>>>
>>> + /*
>>> + * Check if we're the only ones running on this nohz_full CPU
>>> + * and that we're on a PREEMPT_RCU setup. If so, let's fake a
>>> + * QS since there is no way for RCU to know we're not making
>>> + * use of it.
>>> + *
>>> + * Otherwise it'll be done through cond_resched().
>>> + */
>>> + if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
>>> + !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
>>> + tick_nohz_tick_stopped()) {
>>> + local_irq_save(flags);
>>> + rcu_momentary_dyntick_idle();
>>> + local_irq_restore(flags);
>>
>> What is supposed to happen in this case is that RCU figures out that
>> there is a nohz_full CPU running for an extended period of time in the
>> kernel and takes matters into its own hands. This goes as follows on
>> a HZ=1000 kernel with default RCU settings:
>>
>> o At about 20 milliseconds into the grace period, RCU makes
>> cond_resched() report quiescent states, among other things.
>> As you say, this does not help for CONFIG_PREEMPT=n kernels.
>>
>> o At about 30 milliseconds into the grace period, RCU forces an
>> explicit context switch on the wayward CPU. This should get
>> the CPU's attention even in CONFIG_PREEMPT=y kernels.
>>
>> So what is happening for you instead?
>
> Well, that's exactly what I'm seeing, but it doesn't play well with osnoise.
>
> Here's a simplified view of what the tracer does:
>
> time1 = get_time();
> while(1) {
> time2 = get_time();
> if (time2 - time1 > threshold)
> trace_noise();
> cond_resched();
> time1 = time2;
> }
>
> This is pinned to a specific CPU, and in the most extreme cases is expected to
> take 100% of CPU time. Eventually, some SMI, NMI/interrupt, or process
> execution will trigger the threshold, and osnoise will provide some nice traces
> explaining what happened.
>
> RCU forcing a context switch on the wayward CPU is introducing unwarranted
> noise as it's triggered by the fact we're measuring and wouldn't happen
> otherwise.

I confirm this, it would be nice to have on osnoise a behavior similar to the
user-space (in EQS). I was about to send an RFC with a solution for the same
problem... (continues..)

>
> If this were user-space, we'd be in an EQS, which would make this problem go
> away. An option would be mimicking this behaviour (assuming irq entry/exit code
> did the right thing):
>
> rcu_eqs_enter(); <--
> time1 = get_time();
> while(1) {
> time2 = get_time();
> if (time2 - time1 > threshold)
> trace_noise();
> rcu_eqs_exit(); <--
> cond_resched();
> rcu_eqs_enter(); <--
> time1 = time2;
> }

... my patch was something in these lines. I was actually calling using
rcu_user_enter/exit (or context_tracking_enter/exit), pretending to be in
user-space. I left the user-mode if any thing from kernel would have to be
called, like tracepoints or re-sched. I had to keep preemption disabled to avoid
getting into the scheduler, enabling it in case of need resched. It worked (tm),
and it was equivalent to having osnoise preemptive.

BUT, it was more intrusive. I would prefer a solution in the lines of the one
Nicolas are proposing in his patch.

-- Danie

2022-03-01 19:49:18

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Tue, Mar 01, 2022 at 11:00:08AM +0100, Nicolas Saenz Julienne wrote:
> On Mon, 2022-02-28 at 14:11 -0800, Paul E. McKenney wrote:
> > On Mon, Feb 28, 2022 at 03:14:23PM +0100, Nicolas Saenz Julienne wrote:
> > > At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
> > > kernel might have the side effect of extending grace periods too much.
> > > This will eventually entice RCU to schedule a task on the isolated CPU
> > > to end the overly extended grace period, adding unwarranted noise to the
> > > CPU being traced in the process.

Ah, I misread the above paragraph. Apologies!

Nevertheless, could you please add something explicit to the effect that
RCU is completing grace periods as required?

> > > So, check if we're the only ones running on this isolated CPU and that
> > > we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
> > > between measurements.

And yes, if you don't want RCU to try to forcibly extract a quiescent
state from you, you must supply a quiescent state to RCU. ;-)

> > > Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
> > > loop's cond_resched() will go though a quiescent state for them.
> > >
> > > Note that this same exact problem is what extended quiescent states were
> > > created for. But adapting them to this specific use-case isn't trivial
> > > as it'll imply reworking entry/exit and dynticks/context tracking code.
> > >
> > > Signed-off-by: Nicolas Saenz Julienne <[email protected]>
> > > ---
> > > kernel/trace/trace_osnoise.c | 19 +++++++++++++++++++
> > > 1 file changed, 19 insertions(+)
> > >
> > > diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> > > index 870a08da5b48..4928358f6e88 100644
> > > --- a/kernel/trace/trace_osnoise.c
> > > +++ b/kernel/trace/trace_osnoise.c
> > > @@ -21,7 +21,9 @@
> > > #include <linux/uaccess.h>
> > > #include <linux/cpumask.h>
> > > #include <linux/delay.h>
> > > +#include <linux/tick.h>
> > > #include <linux/sched/clock.h>
> > > +#include <linux/sched/isolation.h>
> > > #include <uapi/linux/sched/types.h>
> > > #include <linux/sched.h>
> > > #include "trace.h"
> > > @@ -1295,6 +1297,7 @@ static int run_osnoise(void)
> > > struct osnoise_sample s;
> > > unsigned int threshold;
> > > u64 runtime, stop_in;
> > > + unsigned long flags;
> > > u64 sum_noise = 0;
> > > int hw_count = 0;
> > > int ret = -1;
> > > @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
> > > osnoise_stop_tracing();
> > > }
> > >
> > > + /*
> > > + * Check if we're the only ones running on this nohz_full CPU
> > > + * and that we're on a PREEMPT_RCU setup. If so, let's fake a
> > > + * QS since there is no way for RCU to know we're not making
> > > + * use of it.
> > > + *
> > > + * Otherwise it'll be done through cond_resched().
> > > + */
> > > + if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
> > > + !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
> > > + tick_nohz_tick_stopped()) {
> > > + local_irq_save(flags);
> > > + rcu_momentary_dyntick_idle();

And yes, rcu_momentary_dyntick_idle() is a good way to supply a quiescent
state to RCU. This won't help Tasks RCU or Tasks Rude RCU, but you can
avoid those by avoiding changing tracing state while running osnoise.

> > > + local_irq_restore(flags);
> >
> > What is supposed to happen in this case is that RCU figures out that
> > there is a nohz_full CPU running for an extended period of time in the
> > kernel and takes matters into its own hands. This goes as follows on
> > a HZ=1000 kernel with default RCU settings:
> >
> > o At about 20 milliseconds into the grace period, RCU makes
> > cond_resched() report quiescent states, among other things.
> > As you say, this does not help for CONFIG_PREEMPT=n kernels.
> >
> > o At about 30 milliseconds into the grace period, RCU forces an
> > explicit context switch on the wayward CPU. This should get
> > the CPU's attention even in CONFIG_PREEMPT=y kernels.
> >
> > So what is happening for you instead?
>
> Well, that's exactly what I'm seeing, but it doesn't play well with osnoise.

Whew!!! ;-)

> Here's a simplified view of what the tracer does:
>
> time1 = get_time();
> while(1) {
> time2 = get_time();
> if (time2 - time1 > threshold)
> trace_noise();
> cond_resched();
> time1 = time2;
> }
>
> This is pinned to a specific CPU, and in the most extreme cases is expected to
> take 100% of CPU time. Eventually, some SMI, NMI/interrupt, or process
> execution will trigger the threshold, and osnoise will provide some nice traces
> explaining what happened.
>
> RCU forcing a context switch on the wayward CPU is introducing unwarranted
> noise as it's triggered by the fact we're measuring and wouldn't happen
> otherwise.
>
> If this were user-space, we'd be in an EQS, which would make this problem go
> away. An option would be mimicking this behaviour (assuming irq entry/exit code
> did the right thing):
>
> rcu_eqs_enter(); <--
> time1 = get_time();
> while(1) {
> time2 = get_time();
> if (time2 - time1 > threshold)
> trace_noise();
> rcu_eqs_exit(); <--
> cond_resched();
> rcu_eqs_enter(); <--
> time1 = time2;
> }
>
> But given the tight loop this isn't much different than what I'm proposing at
> the moment, isn't it? rcu_momentary_dyntick_idle() just emulates a really fast
> EQS entry/exit.

And that is in fact exactly what rcu_momentary_dyntick_idle() was
intended for:

Acked-by: Paul E. McKenney <[email protected]>

2022-03-01 20:25:20

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Tue, Mar 01, 2022 at 08:29:23PM +0100, Daniel Bristot de Oliveira wrote:
> On 3/1/22 19:58, Paul E. McKenney wrote:
> > On Tue, Mar 01, 2022 at 07:44:38PM +0100, Daniel Bristot de Oliveira wrote:
> >> On 3/1/22 19:05, Paul E. McKenney wrote:
> >>>> I see, as long as it costs < 1 us, I am ok. If it gets > 1us in a reasonably
> >>>> fast machine, we start see HW noise where it does not exist, and that would
> >>>> reduce the resolution of osnoise. AFAICS, it is not causing that problem, but we
> >>>> need to make it as lightweight as possible.
> >>> In the common case, it is atomically incrementing a local per-CPU counter
> >>> and doing a store. This should be quite cheap.
> >>>
> >>> The uncommon case is when the osnoise process was preempted or otherwise
> >>> interfered with during a recent RCU read-side critical section and
> >>> preemption was disabled around that critical section's outermost
> >>> rcu_read_unlock(). This can be quite expensive. But I would expect
> >>> you to just not do this. ;-)
> >>
> >> Getting the expensive call after a preemption is not a problem, it is a side
> >> effect of the most costly preemption.
> >>
> >> It this case, we should "ping rcu" before reading the time to account the
> >> overhead for the previous preemption which caused it.
> >>
> >> like (using the current code as example):
> >>
> >> ------------------------- %< -------------------------------
> >> static u64
> >> set_int_safe_time(struct osnoise_variables *osn_var, u64 *time)
> >> {
> >> u64 int_counter;
> >>
> >> do {
> >> int_counter = local_read(&osn_var->int_counter);
> >>
> >> ------------> HERE <-------------------------------------
> >>
> >> /* synchronize with interrupts */
> >> barrier();
> >>
> >> *time = time_get();
> >>
> >> /* synchronize with interrupts */
> >> barrier();
> >> } while (int_counter != local_read(&osn_var->int_counter));
> >>
> >> return int_counter;
> >> }
> >> ------------------------- >% -------------------------------
> >>
> >> In this way anything that happens before this *time is accounted before it is
> >> get. If anything happens while this loop is running, it will run again, so it is
> >> safe to point to the previous case.
> >>
> >> We would have to make a copy of this function, and only use the copy for the
> >> run_osnoise() case. A good name would be something in the lines of
> >> set_int_safe_time_rcu().
> >>
> >> (Unless the expensive is < than 1us.)
> >
> > The outermost rcu_read_unlock() could involve a call into the scheduler
> > to do an RCU priority deboost, which I would imagine could be a bit
> > expensive. But I would expect you to configure the test in such a way
> > that there was no need for RCU priority boosting. For example, by making
> > sure that the osnoise process's RCU readers were never preempted.
>
> So, the noise will not be seeing in the call that Nicolas is doing. but in the
> rcu_read_unlock() inside osnoise processes?
>
> If that is the case, then the "noise" would already be accounted to the
> previously preempted thread... and we should be fine then.

It could be either at the rcu_read_unlock() itself, or, if preemption
was disabled across that rcu_read_unlock(), at a subsequent point where
preemption is enabled. Which might amount to the same thing given that
there won't be any preemption until preemption is enabled?

> > Just out of curiosity, why is this running in the kernel rather than in
> > userspace? To focus only on kernel-centric noise sources? Or are there
> > people implementing real-time applications within the kernel?
>
> It is in kernel because it allows me to sync the workload and the trace, getting
> more (and more precise) information.
>
> For example, I can read the "noise in time" and how many interrupts happened in
> between two reads of the time, so I can look back in the trace to figure out
> which sources of noise were the cause of the noise I am seeing - without false
> positives. If no "interference" happened, I can safely say that it was a
> hardware noise (this saves us time in the debug, no need to run hwlat - I run
> two tools in one).
>
> This all with a more cheap access to the data. I also use such information to
> parse trace in kernel in a cheaper way, printing less info to the trace buffer.

Fair enough!

> But the idea is to see the noise for an user-space application as much as
> possible (and no, I am not doing application in kernel... but I know people
> doing it using a unikernel, but that is another story... a longer one... :-)).

There have been people writing their applications in Linux kernel modules,
or at least attempting to do so! ;-)

Thanx, Paul

Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On 3/1/22 19:05, Paul E. McKenney wrote:
>> I see, as long as it costs < 1 us, I am ok. If it gets > 1us in a reasonably
>> fast machine, we start see HW noise where it does not exist, and that would
>> reduce the resolution of osnoise. AFAICS, it is not causing that problem, but we
>> need to make it as lightweight as possible.
> In the common case, it is atomically incrementing a local per-CPU counter
> and doing a store. This should be quite cheap.
>
> The uncommon case is when the osnoise process was preempted or otherwise
> interfered with during a recent RCU read-side critical section and
> preemption was disabled around that critical section's outermost
> rcu_read_unlock(). This can be quite expensive. But I would expect
> you to just not do this. ;-)

Getting the expensive call after a preemption is not a problem, it is a side
effect of the most costly preemption.

It this case, we should "ping rcu" before reading the time to account the
overhead for the previous preemption which caused it.

like (using the current code as example):

------------------------- %< -------------------------------
static u64
set_int_safe_time(struct osnoise_variables *osn_var, u64 *time)
{
u64 int_counter;

do {
int_counter = local_read(&osn_var->int_counter);

------------> HERE <-------------------------------------

/* synchronize with interrupts */
barrier();

*time = time_get();

/* synchronize with interrupts */
barrier();
} while (int_counter != local_read(&osn_var->int_counter));

return int_counter;
}
------------------------- >% -------------------------------

In this way anything that happens before this *time is accounted before it is
get. If anything happens while this loop is running, it will run again, so it is
safe to point to the previous case.

We would have to make a copy of this function, and only use the copy for the
run_osnoise() case. A good name would be something in the lines of
set_int_safe_time_rcu().

(Unless the expensive is < than 1us.)

-- Daniel

Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On 3/1/22 19:58, Paul E. McKenney wrote:
> On Tue, Mar 01, 2022 at 07:44:38PM +0100, Daniel Bristot de Oliveira wrote:
>> On 3/1/22 19:05, Paul E. McKenney wrote:
>>>> I see, as long as it costs < 1 us, I am ok. If it gets > 1us in a reasonably
>>>> fast machine, we start see HW noise where it does not exist, and that would
>>>> reduce the resolution of osnoise. AFAICS, it is not causing that problem, but we
>>>> need to make it as lightweight as possible.
>>> In the common case, it is atomically incrementing a local per-CPU counter
>>> and doing a store. This should be quite cheap.
>>>
>>> The uncommon case is when the osnoise process was preempted or otherwise
>>> interfered with during a recent RCU read-side critical section and
>>> preemption was disabled around that critical section's outermost
>>> rcu_read_unlock(). This can be quite expensive. But I would expect
>>> you to just not do this. ;-)
>>
>> Getting the expensive call after a preemption is not a problem, it is a side
>> effect of the most costly preemption.
>>
>> It this case, we should "ping rcu" before reading the time to account the
>> overhead for the previous preemption which caused it.
>>
>> like (using the current code as example):
>>
>> ------------------------- %< -------------------------------
>> static u64
>> set_int_safe_time(struct osnoise_variables *osn_var, u64 *time)
>> {
>> u64 int_counter;
>>
>> do {
>> int_counter = local_read(&osn_var->int_counter);
>>
>> ------------> HERE <-------------------------------------
>>
>> /* synchronize with interrupts */
>> barrier();
>>
>> *time = time_get();
>>
>> /* synchronize with interrupts */
>> barrier();
>> } while (int_counter != local_read(&osn_var->int_counter));
>>
>> return int_counter;
>> }
>> ------------------------- >% -------------------------------
>>
>> In this way anything that happens before this *time is accounted before it is
>> get. If anything happens while this loop is running, it will run again, so it is
>> safe to point to the previous case.
>>
>> We would have to make a copy of this function, and only use the copy for the
>> run_osnoise() case. A good name would be something in the lines of
>> set_int_safe_time_rcu().
>>
>> (Unless the expensive is < than 1us.)
>
> The outermost rcu_read_unlock() could involve a call into the scheduler
> to do an RCU priority deboost, which I would imagine could be a bit
> expensive. But I would expect you to configure the test in such a way
> that there was no need for RCU priority boosting. For example, by making
> sure that the osnoise process's RCU readers were never preempted.

So, the noise will not be seeing in the call that Nicolas is doing. but in the
rcu_read_unlock() inside osnoise processes?

If that is the case, then the "noise" would already be accounted to the
previously preempted thread... and we should be fine then.

>
> Just out of curiosity, why is this running in the kernel rather than in
> userspace? To focus only on kernel-centric noise sources? Or are there
> people implementing real-time applications within the kernel?

It is in kernel because it allows me to sync the workload and the trace, getting
more (and more precise) information.

For example, I can read the "noise in time" and how many interrupts happened in
between two reads of the time, so I can look back in the trace to figure out
which sources of noise were the cause of the noise I am seeing - without false
positives. If no "interference" happened, I can safely say that it was a
hardware noise (this saves us time in the debug, no need to run hwlat - I run
two tools in one).

This all with a more cheap access to the data. I also use such information to
parse trace in kernel in a cheaper way, printing less info to the trace buffer.

But the idea is to see the noise for an user-space application as much as
possible (and no, I am not doing application in kernel... but I know people
doing it using a unikernel, but that is another story... a longer one... :-)).

-- Daniel

>
> Thanx, Paul

2022-03-01 23:11:12

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Tue, Mar 01, 2022 at 07:44:38PM +0100, Daniel Bristot de Oliveira wrote:
> On 3/1/22 19:05, Paul E. McKenney wrote:
> >> I see, as long as it costs < 1 us, I am ok. If it gets > 1us in a reasonably
> >> fast machine, we start see HW noise where it does not exist, and that would
> >> reduce the resolution of osnoise. AFAICS, it is not causing that problem, but we
> >> need to make it as lightweight as possible.
> > In the common case, it is atomically incrementing a local per-CPU counter
> > and doing a store. This should be quite cheap.
> >
> > The uncommon case is when the osnoise process was preempted or otherwise
> > interfered with during a recent RCU read-side critical section and
> > preemption was disabled around that critical section's outermost
> > rcu_read_unlock(). This can be quite expensive. But I would expect
> > you to just not do this. ;-)
>
> Getting the expensive call after a preemption is not a problem, it is a side
> effect of the most costly preemption.
>
> It this case, we should "ping rcu" before reading the time to account the
> overhead for the previous preemption which caused it.
>
> like (using the current code as example):
>
> ------------------------- %< -------------------------------
> static u64
> set_int_safe_time(struct osnoise_variables *osn_var, u64 *time)
> {
> u64 int_counter;
>
> do {
> int_counter = local_read(&osn_var->int_counter);
>
> ------------> HERE <-------------------------------------
>
> /* synchronize with interrupts */
> barrier();
>
> *time = time_get();
>
> /* synchronize with interrupts */
> barrier();
> } while (int_counter != local_read(&osn_var->int_counter));
>
> return int_counter;
> }
> ------------------------- >% -------------------------------
>
> In this way anything that happens before this *time is accounted before it is
> get. If anything happens while this loop is running, it will run again, so it is
> safe to point to the previous case.
>
> We would have to make a copy of this function, and only use the copy for the
> run_osnoise() case. A good name would be something in the lines of
> set_int_safe_time_rcu().
>
> (Unless the expensive is < than 1us.)

The outermost rcu_read_unlock() could involve a call into the scheduler
to do an RCU priority deboost, which I would imagine could be a bit
expensive. But I would expect you to configure the test in such a way
that there was no need for RCU priority boosting. For example, by making
sure that the osnoise process's RCU readers were never preempted.

Just out of curiosity, why is this running in the kernel rather than in
userspace? To focus only on kernel-centric noise sources? Or are there
people implementing real-time applications within the kernel?

Thanx, Paul

Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On 3/1/22 11:52, Nicolas Saenz Julienne wrote:
> On Mon, 2022-02-28 at 21:00 +0100, Daniel Bristot de Oliveira wrote:
>> On 2/28/22 15:14, Nicolas Saenz Julienne wrote:
>>> At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
>>> kernel might have the side effect of extending grace periods too much.
>>> This will eventually entice RCU to schedule a task on the isolated CPU
>>> to end the overly extended grace period, adding unwarranted noise to the
>>> CPU being traced in the process.
>>>
>>> So, check if we're the only ones running on this isolated CPU and that
>>> we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
>>> between measurements.
>>>
>>> Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
>>> loop's cond_resched() will go though a quiescent state for them.
>>>
>>> Note that this same exact problem is what extended quiescent states were
>>> created for. But adapting them to this specific use-case isn't trivial
>>> as it'll imply reworking entry/exit and dynticks/context tracking code.
>>>
>>> Signed-off-by: Nicolas Saenz Julienne <[email protected]>
>>> ---
>>> kernel/trace/trace_osnoise.c | 19 +++++++++++++++++++
>>> 1 file changed, 19 insertions(+)
>>>
>>> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
>>> index 870a08da5b48..4928358f6e88 100644
>>> --- a/kernel/trace/trace_osnoise.c
>>> +++ b/kernel/trace/trace_osnoise.c
>>> @@ -21,7 +21,9 @@
>>> #include <linux/uaccess.h>
>>> #include <linux/cpumask.h>
>>> #include <linux/delay.h>
>>> +#include <linux/tick.h>
>>> #include <linux/sched/clock.h>
>>> +#include <linux/sched/isolation.h>
>>> #include <uapi/linux/sched/types.h>
>>> #include <linux/sched.h>
>>> #include "trace.h"
>>> @@ -1295,6 +1297,7 @@ static int run_osnoise(void)
>>> struct osnoise_sample s;
>>> unsigned int threshold;
>>> u64 runtime, stop_in;
>>> + unsigned long flags;
>>> u64 sum_noise = 0;
>>> int hw_count = 0;
>>> int ret = -1;
>>> @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
>>> osnoise_stop_tracing();
>>> }
>>>
>>> + /*
>>> + * Check if we're the only ones running on this nohz_full CPU
>>> + * and that we're on a PREEMPT_RCU setup. If so, let's fake a
>>> + * QS since there is no way for RCU to know we're not making
>>> + * use of it.
>>> + *
>>> + * Otherwise it'll be done through cond_resched().
>>> + */
>>> + if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
>>> + !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
>>
>> Does this restrict to only isolcpus cpus?
>
> nohz_full CPUs actually, IIUC HK_FLAG_MISC isn't set if isolcpus is used, which
> is deprecated anyway.

Perfecto!

>
>> what if this CPU was isolated via other methods?
>
> osnoise with an uncontested FIFO priority for example?

No, I was mentioning something like tuna/tasket/systemd/cgroup, anything other
than isolcpus... as it is doing (I miss interpreted the HK_FLAG_MISC).

I do not agree on using busy-loop with FIFO.

I believe in that case
> RCU will start throwing "rcu_preempt detected stalls" style warnings. As it
> won't be able to preempt the osnoise CPU to force the grace period ending.
>
> I see your point though, this would also help in that situation. We could maybe
> relax the entry barrier to rcu_momentary_dyntick_idle(). I think it's safe to
> call it regardless of nohz_full/tick state for most cases, I just wanted to
> avoid the overhead. The only thing that worries me is PREEMPT_RT and its
> rt_spinlocks, which can be preempted.

no, that was not my point.

>
>>> + tick_nohz_tick_stopped()) {
>>> + local_irq_save(flags);
>>
>> This code is always with interrupts enabled, so local_irq_disable()/enable()
>> should be enough (and faster).
>
> Noted.
>
>>> + rcu_momentary_dyntick_idle();
>>> + local_irq_restore(flags);
>>> + }
>>
>> Question, if we set this once, we could avoid setting it on every loop unless we
>> have a preemption from another thread, right?
>
> This tells RCU the CPU went through a quiescent state, which removes it from
> the current grace period accounting. It's different from an extended quiescent
> state, which fully disables the CPU from RCU's perspective.

Got it!

> We don't need to do it on every iteration, but as Paul explained in the mail
> thread it has to happen at least every ~20-30ms.

I see, as long as it costs < 1 us, I am ok. If it gets > 1us in a reasonably
fast machine, we start see HW noise where it does not exist, and that would
reduce the resolution of osnoise. AFAICS, it is not causing that problem, but we
need to make it as lightweight as possible.

-- Daniel

> Thanks!
>

2022-03-02 02:20:41

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Tue, Mar 01, 2022 at 06:55:23PM +0100, Daniel Bristot de Oliveira wrote:
> On 3/1/22 11:52, Nicolas Saenz Julienne wrote:
> > On Mon, 2022-02-28 at 21:00 +0100, Daniel Bristot de Oliveira wrote:
> >> On 2/28/22 15:14, Nicolas Saenz Julienne wrote:
> >>> At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
> >>> kernel might have the side effect of extending grace periods too much.
> >>> This will eventually entice RCU to schedule a task on the isolated CPU
> >>> to end the overly extended grace period, adding unwarranted noise to the
> >>> CPU being traced in the process.
> >>>
> >>> So, check if we're the only ones running on this isolated CPU and that
> >>> we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
> >>> between measurements.
> >>>
> >>> Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
> >>> loop's cond_resched() will go though a quiescent state for them.
> >>>
> >>> Note that this same exact problem is what extended quiescent states were
> >>> created for. But adapting them to this specific use-case isn't trivial
> >>> as it'll imply reworking entry/exit and dynticks/context tracking code.
> >>>
> >>> Signed-off-by: Nicolas Saenz Julienne <[email protected]>
> >>> ---
> >>> kernel/trace/trace_osnoise.c | 19 +++++++++++++++++++
> >>> 1 file changed, 19 insertions(+)
> >>>
> >>> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> >>> index 870a08da5b48..4928358f6e88 100644
> >>> --- a/kernel/trace/trace_osnoise.c
> >>> +++ b/kernel/trace/trace_osnoise.c
> >>> @@ -21,7 +21,9 @@
> >>> #include <linux/uaccess.h>
> >>> #include <linux/cpumask.h>
> >>> #include <linux/delay.h>
> >>> +#include <linux/tick.h>
> >>> #include <linux/sched/clock.h>
> >>> +#include <linux/sched/isolation.h>
> >>> #include <uapi/linux/sched/types.h>
> >>> #include <linux/sched.h>
> >>> #include "trace.h"
> >>> @@ -1295,6 +1297,7 @@ static int run_osnoise(void)
> >>> struct osnoise_sample s;
> >>> unsigned int threshold;
> >>> u64 runtime, stop_in;
> >>> + unsigned long flags;
> >>> u64 sum_noise = 0;
> >>> int hw_count = 0;
> >>> int ret = -1;
> >>> @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
> >>> osnoise_stop_tracing();
> >>> }
> >>>
> >>> + /*
> >>> + * Check if we're the only ones running on this nohz_full CPU
> >>> + * and that we're on a PREEMPT_RCU setup. If so, let's fake a
> >>> + * QS since there is no way for RCU to know we're not making
> >>> + * use of it.
> >>> + *
> >>> + * Otherwise it'll be done through cond_resched().
> >>> + */
> >>> + if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
> >>> + !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
> >>
> >> Does this restrict to only isolcpus cpus?
> >
> > nohz_full CPUs actually, IIUC HK_FLAG_MISC isn't set if isolcpus is used, which
> > is deprecated anyway.
>
> Perfecto!
>
> >
> >> what if this CPU was isolated via other methods?
> >
> > osnoise with an uncontested FIFO priority for example?
>
> No, I was mentioning something like tuna/tasket/systemd/cgroup, anything other
> than isolcpus... as it is doing (I miss interpreted the HK_FLAG_MISC).
>
> I do not agree on using busy-loop with FIFO.
>
> I believe in that case
> > RCU will start throwing "rcu_preempt detected stalls" style warnings. As it
> > won't be able to preempt the osnoise CPU to force the grace period ending.
> >
> > I see your point though, this would also help in that situation. We could maybe
> > relax the entry barrier to rcu_momentary_dyntick_idle(). I think it's safe to
> > call it regardless of nohz_full/tick state for most cases, I just wanted to
> > avoid the overhead. The only thing that worries me is PREEMPT_RT and its
> > rt_spinlocks, which can be preempted.
>
> no, that was not my point.
>
> >
> >>> + tick_nohz_tick_stopped()) {
> >>> + local_irq_save(flags);
> >>
> >> This code is always with interrupts enabled, so local_irq_disable()/enable()
> >> should be enough (and faster).
> >
> > Noted.
> >
> >>> + rcu_momentary_dyntick_idle();
> >>> + local_irq_restore(flags);
> >>> + }
> >>
> >> Question, if we set this once, we could avoid setting it on every loop unless we
> >> have a preemption from another thread, right?
> >
> > This tells RCU the CPU went through a quiescent state, which removes it from
> > the current grace period accounting. It's different from an extended quiescent
> > state, which fully disables the CPU from RCU's perspective.
>
> Got it!
>
> > We don't need to do it on every iteration, but as Paul explained in the mail
> > thread it has to happen at least every ~20-30ms.
>
> I see, as long as it costs < 1 us, I am ok. If it gets > 1us in a reasonably
> fast machine, we start see HW noise where it does not exist, and that would
> reduce the resolution of osnoise. AFAICS, it is not causing that problem, but we
> need to make it as lightweight as possible.

In the common case, it is atomically incrementing a local per-CPU counter
and doing a store. This should be quite cheap.

The uncommon case is when the osnoise process was preempted or otherwise
interfered with during a recent RCU read-side critical section and
preemption was disabled around that critical section's outermost
rcu_read_unlock(). This can be quite expensive. But I would expect
you to just not do this. ;-)

Thanx, Paul

Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On 3/1/22 11:59, Nicolas Saenz Julienne wrote:
>> I wonder if this is similar to what we have in trace_benchmark(). Would
>> using: cond_resched_tasks_rcu_qs() work for you?
> IIUC this only affects tasks_rcu, and doesn't help with regular RCU. We already
> call cond_resched_tasks_rcu_qs() in osnoise when necesary, actually it was
> introduced by you. :)

I tried using cond_resched_tasks_rcu_qs() to solve the same problem before, it
did not work.

-- Daniel

Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On 3/1/22 20:46, Paul E. McKenney wrote:
>> But the idea is to see the noise for an user-space application as much as
>> possible (and no, I am not doing application in kernel... but I know people
>> doing it using a unikernel, but that is another story... a longer one... :-)).
> There have been people writing their applications in Linux kernel modules,
> or at least attempting to do so! ;-)

I have to admit, I once made a kernel application to forward RTP packets for
VoIP communication, between FPGA/DSPs and network with zero-copy. Times were
others, 200MHz of CPU, a single AMBA bus to everybody... fun times :-) But that
is not the case anymore (phew).

-- Daniel

2022-03-02 14:07:35

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Tue, 2022-03-01 at 09:56 -0800, Paul E. McKenney wrote:
> On Tue, Mar 01, 2022 at 11:00:08AM +0100, Nicolas Saenz Julienne wrote:
> > On Mon, 2022-02-28 at 14:11 -0800, Paul E. McKenney wrote:
> > > On Mon, Feb 28, 2022 at 03:14:23PM +0100, Nicolas Saenz Julienne wrote:
> > > > At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
> > > > kernel might have the side effect of extending grace periods too much.
> > > > This will eventually entice RCU to schedule a task on the isolated CPU
> > > > to end the overly extended grace period, adding unwarranted noise to the
> > > > CPU being traced in the process.
>
> Ah, I misread the above paragraph. Apologies!
>
> Nevertheless, could you please add something explicit to the effect that
> RCU is completing grace periods as required?

Yes, of course.

[...]
> > > o At about 30 milliseconds into the grace period, RCU forces an
> > > explicit context switch on the wayward CPU. This should get
> > > the CPU's attention even in CONFIG_PREEMPT=y kernels.
> > >
> > > So what is happening for you instead?
> >
> > Well, that's exactly what I'm seeing, but it doesn't play well with osnoise.
>
> Whew!!! ;-)
>
> > Here's a simplified view of what the tracer does:
> >
> > time1 = get_time();
> > while(1) {
> > time2 = get_time();
> > if (time2 - time1 > threshold)
> > trace_noise();
> > cond_resched();
> > time1 = time2;
> > }
> >
> > This is pinned to a specific CPU, and in the most extreme cases is expected to
> > take 100% of CPU time. Eventually, some SMI, NMI/interrupt, or process
> > execution will trigger the threshold, and osnoise will provide some nice traces
> > explaining what happened.
> >
> > RCU forcing a context switch on the wayward CPU is introducing unwarranted
> > noise as it's triggered by the fact we're measuring and wouldn't happen
> > otherwise.
> >
> > If this were user-space, we'd be in an EQS, which would make this problem go
> > away. An option would be mimicking this behaviour (assuming irq entry/exit code
> > did the right thing):
> >
> > rcu_eqs_enter(); <--
> > time1 = get_time();
> > while(1) {
> > time2 = get_time();
> > if (time2 - time1 > threshold)
> > trace_noise();
> > rcu_eqs_exit(); <--
> > cond_resched();
> > rcu_eqs_enter(); <--
> > time1 = time2;
> > }
> >
> > But given the tight loop this isn't much different than what I'm proposing at
> > the moment, isn't it? rcu_momentary_dyntick_idle() just emulates a really fast
> > EQS entry/exit.
>
> And that is in fact exactly what rcu_momentary_dyntick_idle() was
> intended for:
>
> Acked-by: Paul E. McKenney <[email protected]>

Thanks!

--
Nicolás Sáenz

Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On 3/4/22 16:28, Nicolas Saenz Julienne wrote:
> Some comments:
> - You're not exiting/entering EQS on IRQ/NMI entry/exit. See
> irqentry_{enter,exit}() and irqentry_nmi_{enter,exit}().

hummm, right!

> - See this series[1], if we ever pursue this approach, it's important we got
> through context tracking, instead of poking at RCU directly.

I had a test patch with context_tracking as well... entering and leaving using
it. Lemme find it.... but it basically works in the same way as for RCU (or
pretend to work).

>
> [1] https://lore.kernel.org/lkml/[email protected]/T/#t

Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On 3/4/22 15:36, Nicolas Saenz Julienne wrote:
>> Hey Nicolas,
>>
>> While testing this patch with rtla osnoise on the 5.17.0-rc6-rt10+, when I hit
>> ^c on osnoise top, the system freezes :-/.
>>
>> Could you try that on your system?
> Yes of course, I'll get a build going.
>
> Were you using nohz_full?

Yep!

mce=off tsc=nowatchdog nohz=on intel_pstate=disable nosoftlockup isolcpus=11-23
nohz_full=11-23 rcu_nocbs=11-23

(I know what people thing about isolcpus, I am just lazy).

-- Daniel

2022-03-04 18:51:04

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Fri, 2022-03-04 at 16:37 +0100, Daniel Bristot de Oliveira wrote:
> On 3/4/22 16:28, Nicolas Saenz Julienne wrote:
> > Some comments:
> > - You're not exiting/entering EQS on IRQ/NMI entry/exit. See
> > irqentry_{enter,exit}() and irqentry_nmi_{enter,exit}().
>
> hummm, right!
>
> > - See this series[1], if we ever pursue this approach, it's important we got
> > through context tracking, instead of poking at RCU directly.
>
> I had a test patch with context_tracking as well... entering and leaving using
> it. Lemme find it.... but it basically works in the same way as for RCU (or
> pretend to work).

Yes, agree, it's fundamentally the same.

--
Nicolás Sáenz

2022-03-04 19:13:10

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Fri, Mar 04, 2022 at 04:55:06PM +0100, Nicolas Saenz Julienne wrote:
> On Fri, 2022-03-04 at 16:37 +0100, Daniel Bristot de Oliveira wrote:
> > On 3/4/22 16:28, Nicolas Saenz Julienne wrote:
> > > Some comments:
> > > - You're not exiting/entering EQS on IRQ/NMI entry/exit. See
> > > irqentry_{enter,exit}() and irqentry_nmi_{enter,exit}().
> >
> > hummm, right!
> >
> > > - See this series[1], if we ever pursue this approach, it's important we got
> > > through context tracking, instead of poking at RCU directly.
> >
> > I had a test patch with context_tracking as well... entering and leaving using
> > it. Lemme find it.... but it basically works in the same way as for RCU (or
> > pretend to work).
>
> Yes, agree, it's fundamentally the same.

And there is work in flight to make it even more the same. ;-)

https://lore.kernel.org/all/[email protected]/

Thanx, Paul

2022-03-04 19:28:40

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Fri, 2022-03-04 at 15:28 +0100, Daniel Bristot de Oliveira wrote:
> On 2/28/22 15:14, Nicolas Saenz Julienne wrote:
> > At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
> > kernel might have the side effect of extending grace periods too much.
> > This will eventually entice RCU to schedule a task on the isolated CPU
> > to end the overly extended grace period, adding unwarranted noise to the
> > CPU being traced in the process.
> >
> > So, check if we're the only ones running on this isolated CPU and that
> > we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
> > between measurements.
> >
> > Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
> > loop's cond_resched() will go though a quiescent state for them.
> >
> > Note that this same exact problem is what extended quiescent states were
> > created for. But adapting them to this specific use-case isn't trivial
> > as it'll imply reworking entry/exit and dynticks/context tracking code.
>
> Hey Nicolas,
>
> While testing this patch with rtla osnoise on the 5.17.0-rc6-rt10+, when I hit
> ^c on osnoise top, the system freezes :-/.
>
> Could you try that on your system?

Yes of course, I'll get a build going.

Were you using nohz_full?

--
Nicolás Sáenz

Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

>> Hey Nicolas,
>>
>> While testing this patch with rtla osnoise on the 5.17.0-rc6-rt10+, when I hit
>> ^c on osnoise top, the system freezes :-/.
>>
>> Could you try that on your system?
> Yes of course, I'll get a build going.


also, could you try this?

it is an RFC I was thinking to send, as I mentioned before (and on IRC).

It works fine, I see nohz_full and rcu behaving like if osnoise was a
user-space tool. It is more invasive on osnoise, but the behavior does not
change - like, run cyclictest on top of osnoise and you will see that
the system is still preemptive with low latency even with osnoise with
"preempt_disabled."

do you mind having a look to see if it behaves as expected in your scenario?

[ note, there are things to cleanup in this patch, like adding a static key ]
[ in is_osnoise_cur(), it was a real RFC. ]

-- Daniel

tracing/osnoise: Pretend to be in user-space for RCU

To simulate an user-space workload, osnoise informs RCU that it
is going to user-space by calling rcu_user_enter(). However,
osnoise never actually goes to user-space. It keeps running
in an intermediate stage.

This stage runs with preemption disabled, like the idle thread
does. Likewise idle, osnoise will continuously check for need
resched, allowing its preemption, simulating a fully preemptive
mode.

Anytime a kernel function needs to be called, the rcu_user_enter()
needs to be called.

Any change on rcu_user_enter/exit needs to be tested with
CONFIG_RCU_EQS_DEBUG=y.

Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
---
include/linux/trace.h | 9 +++
kernel/rcu/tree.c | 7 +-
kernel/trace/trace_osnoise.c | 129 +++++++++++++++++++++++++++++++++--
3 files changed, 136 insertions(+), 9 deletions(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index bf169612ffe1..970d66f79cee 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -48,6 +48,15 @@ void osnoise_arch_unregister(void);
void osnoise_trace_irq_entry(int id);
void osnoise_trace_irq_exit(int id, const char *desc);

+#ifdef CONFIG_OSNOISE_TRACER
+extern bool is_osnoise_curr(void);
+#else
+static __always_inline bool is_osnoise_curr(void)
+{
+ return false;
+}
+#endif /* CONFIG_OSNOISE_TRACER */
+
#endif /* CONFIG_TRACING */

#endif /* _LINUX_TRACE_H */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a4c25a6283b0..ede0c468e75f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -62,6 +62,7 @@
#include <linux/vmalloc.h>
#include <linux/mm.h>
#include <linux/kasan.h>
+#include <linux/trace.h>
#include "../time/tick-internal.h"

#include "tree.h"
@@ -442,9 +443,11 @@ static int rcu_is_cpu_rrupt_from_idle(void)
return false;

/*
- * If we're not in an interrupt, we must be in the idle task!
+ * If we're not in an interrupt, we must be in the idle task or osnoise.
+ * The osnoise thread is an special case. It is a kernel thread that
+ * pretends to be in user-space. See kernel/trace/trace_osnoise.c.
*/
- WARN_ON_ONCE(!nesting && !is_idle_task(current));
+ WARN_ON_ONCE(!nesting && !(is_idle_task(current) || is_osnoise_curr()));

/* Does CPU appear to be idle from an RCU standpoint? */
return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index cfddb30e65ab..d52ef290c884 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -448,6 +448,31 @@ static void print_osnoise_headers(struct seq_file *s)
}
#endif /* CONFIG_PREEMPT_RT */

+static inline void osnoise_user_enter(void)
+{
+ local_irq_disable();
+ rcu_user_enter();
+ local_irq_enable();
+}
+
+static inline void osnoise_user_exit(void)
+{
+ local_irq_disable();
+ rcu_user_exit();
+ local_irq_enable();
+}
+
+static inline void osnoise_cond_resched(void)
+{
+ if (need_resched()) {
+ osnoise_user_exit();
+ preempt_enable();
+ cond_resched();
+ preempt_disable();
+ osnoise_user_enter();
+ }
+}
+
/*
* osnoise_taint - report an osnoise error.
*/
@@ -464,6 +489,14 @@ static void print_osnoise_headers(struct seq_file *s)
osnoise_data.tainted = true; \
})

+#define osnoise_taint_user(msg) ({ \
+ local_irq_disable(); \
+ rcu_user_exit(); \
+ osnoise_taint(msg); \
+ rcu_user_enter(); \
+ local_irq_enable(); \
+})
+
/*
* Record an osnoise_sample into the tracer buffer.
*/
@@ -819,6 +852,43 @@ set_int_safe_time(struct osnoise_variables *osn_var, u64 *time)
return int_counter;
}

+/*
+ * set_int_safe_time_thread - Save the current time on *time, aware of interference
+ *
+ * Get the time, taking into consideration a possible interference from
+ * higher priority interrupts and threads.
+ *
+ * See get_int_safe_duration() for an explanation.
+ */
+static u64
+set_int_safe_time_thread(struct osnoise_variables *osn_var, u64 *time)
+{
+ u64 int_counter;
+
+ do {
+ int_counter = local_read(&osn_var->int_counter);
+
+ /* let any thread awakened by an interrupt to run */
+ osnoise_cond_resched();
+
+ /* synchronize with interrupts */
+ barrier();
+
+ *time = time_get();
+
+ /* synchronize with interrupts */
+ barrier();
+
+ } while (int_counter != local_read(&osn_var->int_counter));
+
+ /*
+ * At this point, the time accounts all the interference from to
+ * consecutive get time that did not get interfered.
+ */
+ return int_counter;
+}
+
+
#ifdef CONFIG_TIMERLAT_TRACER
/*
* copy_int_safe_time - Copy *src into *desc aware of interference
@@ -1337,11 +1407,31 @@ static int run_osnoise(void)
*/
last_int_count = set_int_safe_time(osn_var, &last_sample);

+ /*
+ * to simulate an user-space workload, osnoise informs RCU that it
+ * is going to user-space by calling rcu_user_enter(). However,
+ * osnoise never actually goes to user-space. It keeps running
+ * in an intermediate stage.
+ *
+ * This stage runs with preemption disabled, like the idle thread
+ * does. Likewise idle, osnoise will continuously check for need
+ * resched, allowing its preemption, simulating a fully preemptive
+ * mode.
+ *
+ * Anytime a kernel function needs to be called, the rcu_user_enter()
+ * needs to be called.
+ *
+ * Any change on rcu_user_enter/exit needs to be tested with
+ * CONFIG_RCU_EQS_DEBUG=y.
+ */
+ preempt_disable();
+ osnoise_user_enter();
+
do {
/*
* Get sample!
*/
- int_count = set_int_safe_time(osn_var, &sample);
+ int_count = set_int_safe_time_thread(osn_var, &sample);

noise = time_sub(sample, last_sample);

@@ -1349,7 +1439,7 @@ static int run_osnoise(void)
* This shouldn't happen.
*/
if (noise < 0) {
- osnoise_taint("negative noise!");
+ osnoise_taint_user("negative noise!");
goto out;
}

@@ -1362,7 +1452,7 @@ static int run_osnoise(void)
* Check for possible overflows.
*/
if (total < last_total) {
- osnoise_taint("total overflow!");
+ osnoise_taint_user("total overflow!");
break;
}

@@ -1379,24 +1469,41 @@ static int run_osnoise(void)

sum_noise += noise;

+ /*
+ * osnoise is in fake user-space. Leave this mode to call
+ * the tracepoint. Interrupts are kept disabled to avoid
+ * having the overhead of enabling/disabling around
+ * rcu_user_exit/enter again. This does not change the
+ * behavior of osnoise.
+ */
+ local_irq_disable();
+ rcu_user_exit();
trace_sample_threshold(last_sample, noise, interference);
+ rcu_user_enter();
+ local_irq_enable();

if (osnoise_data.stop_tracing)
- if (noise > stop_in)
+ if (noise > stop_in) {
+ osnoise_user_exit();
osnoise_stop_tracing();
+ osnoise_user_enter();
+ }
}

+
/*
- * For the non-preemptive kernel config: let threads runs, if
- * they so wish.
+ * Let threads to interfere with osnoise.
*/
- cond_resched();
+ osnoise_cond_resched();

last_sample = sample;
last_int_count = int_count;

} while (total < runtime && !kthread_should_stop());

+ osnoise_user_exit();
+ preempt_enable();
+
/*
* Finish the above in the view for interrupts.
*/
@@ -2387,3 +2494,11 @@ __init static int init_osnoise_tracer(void)
return 0;
}
late_initcall(init_osnoise_tracer);
+
+bool is_osnoise_curr(void)
+{
+ struct osnoise_variables *osn_var = this_cpu_osn_var();
+ if (osn_var->kthread == current)
+ return 1;
+ return 0;
+}
--
2.34.1

2022-03-04 20:24:02

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

Hi Daniel,

On Fri, 2022-03-04 at 15:51 +0100, Daniel Bristot de Oliveira wrote:
> > > Hey Nicolas,
> > >
> > > While testing this patch with rtla osnoise on the 5.17.0-rc6-rt10+, when I hit
> > > ^c on osnoise top, the system freezes :-/.
> > >
> > > Could you try that on your system?
> > Yes of course, I'll get a build going.
>
>
> also, could you try this?

Yes, of course.

> it is an RFC I was thinking to send, as I mentioned before (and on IRC).
>
> It works fine, I see nohz_full and rcu behaving like if osnoise was a
> user-space tool. It is more invasive on osnoise, but the behavior does not
> change - like, run cyclictest on top of osnoise and you will see that
> the system is still preemptive with low latency even with osnoise with
> "preempt_disabled."
>
> do you mind having a look to see if it behaves as expected in your scenario?
>
> [ note, there are things to cleanup in this patch, like adding a static key ]
> [ in is_osnoise_cur(), it was a real RFC. ]
>
> -- Daniel
>
> tracing/osnoise: Pretend to be in user-space for RCU
>
> To simulate an user-space workload, osnoise informs RCU that it
> is going to user-space by calling rcu_user_enter(). However,
> osnoise never actually goes to user-space. It keeps running
> in an intermediate stage.
>
> This stage runs with preemption disabled, like the idle thread
> does. Likewise idle, osnoise will continuously check for need
> resched, allowing its preemption, simulating a fully preemptive
> mode.
>
> Anytime a kernel function needs to be called, the rcu_user_enter()
> needs to be called.
>
> Any change on rcu_user_enter/exit needs to be tested with
> CONFIG_RCU_EQS_DEBUG=y.
>
> Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
> ---

Some comments:
- You're not exiting/entering EQS on IRQ/NMI entry/exit. See
irqentry_{enter,exit}() and irqentry_nmi_{enter,exit}().

- See this series[1], if we ever pursue this approach, it's important we got
through context tracking, instead of poking at RCU directly.

[1] https://lore.kernel.org/lkml/[email protected]/T/#t

Regards,

--
Nicolás Sáenz

Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On 2/28/22 15:14, Nicolas Saenz Julienne wrote:
> At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
> kernel might have the side effect of extending grace periods too much.
> This will eventually entice RCU to schedule a task on the isolated CPU
> to end the overly extended grace period, adding unwarranted noise to the
> CPU being traced in the process.
>
> So, check if we're the only ones running on this isolated CPU and that
> we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
> between measurements.
>
> Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
> loop's cond_resched() will go though a quiescent state for them.
>
> Note that this same exact problem is what extended quiescent states were
> created for. But adapting them to this specific use-case isn't trivial
> as it'll imply reworking entry/exit and dynticks/context tracking code.

Hey Nicolas,

While testing this patch with rtla osnoise on the 5.17.0-rc6-rt10+, when I hit
^c on osnoise top, the system freezes :-/.

Could you try that on your system?

-- Daniel