Hello, Peter!
TL;DR: If a normal !PF_NO_SETAFFINITY kthread invokes sched_setaffinity(),
and sched_setaffinity() returns 0, is it expected behavior for that
kthread to be running on some CPU other than one of the ones specified by
the in_mask argument? All CPUs are online, and there is no CPU-hotplug
activity taking place.
Thanx, Paul
Details:
I have long shown the following "toy" synchronize_rcu() implementation:
void synchronize_rcu(void)
{
	int cpu;

	for_each_online_cpu(cpu)
		run_on(cpu);
}
I decided that if I was going to show it, I should test it. And it
occurred to me that run_on() can be a call to sched_setaffinity():
void synchronize_rcu(void)
{
	int cpu;

	for_each_online_cpu(cpu)
		sched_setaffinity(current->pid, cpumask_of(cpu));
}
This actually passes rcutorture. But, as Andrea noted, not klitmus.
After some investigation, it turned out that klitmus was creating kthreads
with PF_NO_SETAFFINITY, hence the failures. But that prompted me to
put checks into my code: After all, rcutorture can be fooled.
void synchronize_rcu(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		sched_setaffinity(current->pid, cpumask_of(cpu));
		WARN_ON_ONCE(raw_smp_processor_id() != cpu);
	}
}
This triggers fairly quickly, usually in less than a minute of rcutorture
testing. And further investigation showed that sched_setaffinity()
always returned 0. So I tried this hack:
void synchronize_rcu(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		while (raw_smp_processor_id() != cpu)
			sched_setaffinity(current->pid, cpumask_of(cpu));
		WARN_ON_ONCE(raw_smp_processor_id() != cpu);
	}
}
This never triggers, and rcutorture's grace-period rate is not significantly
affected.
Is this expected behavior? Is there some configuration or setup that I
might be missing?
On Sat, Apr 27, 2019 at 11:02:46AM -0700, Paul E. McKenney wrote:
> This actually passes rcutorture. But, as Andrea noted, not klitmus.
> After some investigation, it turned out that klitmus was creating kthreads
> with PF_NO_SETAFFINITY, hence the failures. But that prompted me to
> put checks into my code: After all, rcutorture can be fooled.
>
> void synchronize_rcu(void)
> {
> int cpu;
>
> for_each_online_cpu(cpu) {
> sched_setaffinity(current->pid, cpumask_of(cpu));
> WARN_ON_ONCE(raw_smp_processor_id() != cpu);
> }
> }
>
> This triggers fairly quickly, usually in less than a minute of rcutorture
> testing.
>
> And further investigation shows that sched_setaffinity()
> always returned 0.
> Is this expected behavior? Is there some configuration or setup that I
> might be missing?
ISTR there is hotplug involved in RCU torture? In that case, it can be
that sched_setaffinity() succeeds in placing us on a CPU, which CPU hotplug
then takes away. So when we run the WARN thingy, we'll be running on a
different CPU than expected.
If OTOH, your loop is written like (as it really should be):
void synchronize_rcu(void)
{
	int cpu;

	cpus_read_lock();
	for_each_online_cpu(cpu) {
		sched_setaffinity(current->pid, cpumask_of(cpu));
		WARN_ON_ONCE(raw_smp_processor_id() != cpu);
	}
	cpus_read_unlock();
}
Then I'm not entirely sure how we can return 0 and not run on the
expected CPU. If we look at __set_cpus_allowed_ptr(), the only paths out
to 0 are:
- if the mask didn't change
- if we already run inside the new mask
- if we migrated ourself with the stop-task
- if we're not in fact running
That last case should never trigger in your circumstances, since @p ==
current and current is obviously running. But for completeness, the
wakeup of @p would do the task placement in that case.
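For reference, the shape of that code is roughly as follows. This is a
paraphrased sketch, not the literal kernel/sched/core.c source; the helper
names are the real ones, but locking and error handling are elided, so it
will not build stand-alone:

/* Sketch only: condensed view of __set_cpus_allowed_ptr()'s return-0 paths. */
static int __set_cpus_allowed_ptr_sketch(struct task_struct *p,
					 const struct cpumask *new_mask)
{
	struct rq *rq = task_rq(p);	/* really taken via task_rq_lock() */
	unsigned int dest_cpu;

	if (cpumask_equal(&p->cpus_allowed, new_mask))
		return 0;		/* 1) the mask didn't change */

	do_set_cpus_allowed(p, new_mask);

	if (cpumask_test_cpu(task_cpu(p), new_mask))
		return 0;		/* 2) already running inside the new mask */

	dest_cpu = cpumask_any_and(cpu_online_mask, new_mask);
	if (task_running(rq, p) || p->state == TASK_WAKING) {
		struct migration_arg arg = { p, dest_cpu };

		/* 3) drop the locks and migrate ourself via the stop-task */
		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
		return 0;
	}

	/*
	 * 4) not running: a queued task is moved directly, a sleeping task
	 *    is placed by its next wakeup; either way the call returns 0.
	 */
	return 0;
}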
On Tue, Apr 30, 2019 at 12:03:18PM +0200, Peter Zijlstra wrote:
> On Sat, Apr 27, 2019 at 11:02:46AM -0700, Paul E. McKenney wrote:
>
> > This actually passes rcutorture. But, as Andrea noted, not klitmus.
> > After some investigation, it turned out that klitmus was creating kthreads
> > with PF_NO_SETAFFINITY, hence the failures. But that prompted me to
> > put checks into my code: After all, rcutorture can be fooled.
> >
> > void synchronize_rcu(void)
> > {
> > int cpu;
> >
> > for_each_online_cpu(cpu) {
> > sched_setaffinity(current->pid, cpumask_of(cpu));
> > WARN_ON_ONCE(raw_smp_processor_id() != cpu);
> > }
> > }
> >
> > This triggers fairly quickly, usually in less than a minute of rcutorture
> > testing.
> >
> > And further investigation shows that sched_setaffinity()
> > always returned 0.
>
> > Is this expected behavior? Is there some configuration or setup that I
> > might be missing?
>
> ISTR there is hotplug involved in RCU torture? In that case, it can be
> that sched_setaffinity() succeeds in placing us on a CPU, which CPU hotplug
> then takes away. So when we run the WARN thingy, we'll be running on a
> different CPU than expected.
There can be CPU hotplug involved in rcutorture, but it was disabled
during this run.
> If OTOH, your loop is written like (as it really should be):
>
> void synchronize_rcu(void)
> {
> int cpu;
>
> cpus_read_lock();
> for_each_online_cpu(cpu) {
> sched_setaffinity(current->pid, cpumask_of(cpu));
> WARN_ON_ONCE(raw_smp_processor_id() != cpu);
> }
> cpus_read_unlock();
> }
>
> Then I'm not entirely sure how we can return 0 and not run on the
> expected CPU. If we look at __set_cpus_allowed_ptr(), the only paths out
> to 0 are:
>
> - if the mask didn't change
> - if we already run inside the new mask
> - if we migrated ourself with the stop-task
> - if we're not in fact running
>
> That last case should never trigger in your circumstances, since @p ==
> current and current is obviously running. But for completeness, the
> wakeup of @p would do the task placement in that case.
Are there some diagnostics I could add that would help track this down,
be it my bug or yours?
Thanx, Paul
On Tue, Apr 30, 2019 at 03:51:30AM -0700, Paul E. McKenney wrote:
> > Then I'm not entirely sure how we can return 0 and not run on the
> > expected CPU. If we look at __set_cpus_allowed_ptr(), the only paths out
> > to 0 are:
> >
> > - if the mask didn't change
> > - if we already run inside the new mask
> > - if we migrated ourself with the stop-task
> > - if we're not in fact running
> >
> > That last case should never trigger in your circumstances, since @p ==
> > current and current is obviously running. But for completeness, the
> > wakeup of @p would do the task placement in that case.
>
> Are there some diagnostics I could add that would help track this down,
> be it my bug or yours?
Maybe limited function trace combined with the scheduling tracepoints
would give a clue.
Trouble is, I forever forget how to set that up properly :/ Maybe
something along these lines:
$ trace-cmd record -p function_graph -g sched_setaffinity -g migration_cpu_stop -e sched_migrate_task -e sched_switch -e sched_wakeup
Also useful would be:
echo 1 > /proc/sys/kernel/traceoff_on_warning
which ensures the trace stops the moment we find fail.
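If trace-cmd is not available inside the guest, roughly the same setup can
be done by hand through tracefs; something along these lines should work
(assuming debugfs is mounted at /sys/kernel/debug):

# echo sched_setaffinity > /sys/kernel/debug/tracing/set_graph_function
# echo migration_cpu_stop >> /sys/kernel/debug/tracing/set_graph_function
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable
# echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
# echo 1 > /proc/sys/kernel/traceoff_on_warning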
On Tue, Apr 30, 2019 at 01:55:51PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 30, 2019 at 03:51:30AM -0700, Paul E. McKenney wrote:
> > > Then I'm not entirely sure how we can return 0 and not run on the
> > > expected CPU. If we look at __set_cpus_allowed_ptr(), the only paths out
> > > to 0 are:
> > >
> > > - if the mask didn't change
> > > - if we already run inside the new mask
> > > - if we migrated ourself with the stop-task
> > > - if we're not in fact running
> > >
> > > That last case should never trigger in your circumstances, since @p ==
> > > current and current is obviously running. But for completeness, the
> > > wakeup of @p would do the task placement in that case.
> >
> > Are there some diagnostics I could add that would help track this down,
> > be it my bug or yours?
>
> Maybe limited function trace combined with the scheduling tracepoints
> would give a clue.
>
> Trouble is, I forever forget how to set that up properly :/ Maybe
> something along these lines:
>
> $ trace-cmd record -p function_graph -g sched_setaffinity -g migration_cpu_stop -e sched_migrate_task -e sched_switch -e sched_wakeup
>
> Also useful would be:
>
> echo 1 > /proc/sys/kernel/traceoff_on_warning
>
> which ensures the trace stops the moment we find fail.
OK, what I did was to apply the patch at the end of this email to -rcu
branch dev, then run rcutorture as follows:
nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop"
This resulted in the console output that I placed here:
http://www2.rdrop.com/~paulmck/submission/console.log.gz
But I don't see calls to sched_setaffinity() or migration_cpu_stop().
Steve, is something else needed on the kernel command line in addition to
the following?
ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop
Thanx, Paul
------------------------------------------------------------------------
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index da04b5073dc3..ceae80522d64 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -680,12 +680,18 @@ static struct rcu_torture_ops tasks_ops = {
 static void synchronize_rcu_trivial(void)
 {
 	int cpu;
+	static int dont_trace;
 
 	for_each_online_cpu(cpu) {
-		while (raw_smp_processor_id() != cpu)
-			rcutorture_sched_setaffinity(current->pid,
-						     cpumask_of(cpu));
-		WARN_ON_ONCE(raw_smp_processor_id() != cpu);
+		if (!READ_ONCE(dont_trace))
+			tracing_on();
+		rcutorture_sched_setaffinity(current->pid, cpumask_of(cpu));
+		tracing_off();
+		if (raw_smp_processor_id() != cpu) {
+			WRITE_ONCE(dont_trace, 1);
+			WARN_ON_ONCE(1);
+			ftrace_dump(DUMP_ALL);
+		}
 	}
 }
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index caffee644932..edaf0ca22ff7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3495,6 +3495,7 @@ void __init rcu_init(void)
 	rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
 	WARN_ON(!rcu_par_gp_wq);
 	srcu_init();
+	tracing_off();
 }
 
 #include "tree_stall.h"
On Wed, 1 May 2019 12:12:13 -0700
"Paul E. McKenney" <[email protected]> wrote:
> OK, what I did was to apply the patch at the end of this email to -rcu
> branch dev, then run rcutorture as follows:
>
> nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop"
>
> This resulted in the console output that I placed here:
>
> http://www2.rdrop.com/~paulmck/submission/console.log.gz
>
> But I don't see calls to sched_setaffinity() or migration_cpu_stop().
> Steve, is something else needed on the kernel command line in addition to
> the following?
>
> ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop
Do you have function graph enabled in the config?
[ 2.098303] ftrace bootup tracer 'function_graph' not registered.
-- Steve
On Wed, May 01, 2019 at 03:16:55PM -0400, Steven Rostedt wrote:
> On Wed, 1 May 2019 12:12:13 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
>
> > OK, what I did was to apply the patch at the end of this email to -rcu
> > branch dev, then run rcutorture as follows:
> >
> > nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop"
> >
> > This resulted in the console output that I placed here:
> >
> > http://www2.rdrop.com/~paulmck/submission/console.log.gz
> >
> > But I don't see calls to sched_setaffinity() or migration_cpu_stop().
> > Steve, is something else needed on the kernel command line in addition to
> > the following?
> >
> > ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop
>
> Do you have function graph enabled in the config?
>
> [ 2.098303] ftrace bootup tracer 'function_graph' not registered.
I guess I don't! Thank you, will fix.
Let's see...
My .config has CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y. It looks like I
need CONFIG_FUNCTION_GRAPH_TRACER=y, which I don't have. And it looks
like that needs CONFIG_FUNCTION_TRACER=y, which I also don't have.
But I do have CONFIG_HAVE_FUNCTION_TRACER=y. So I should add this
to my rcutorture command line:
--kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y".
I fired this up. Here is hoping! ;-)
And it does have sched_setaffinity(), woo-hoo!!! I overwrote the old file:
http://www2.rdrop.com/~paulmck/submission/console.log.gz
Thanx, Paul
On Wed, May 01, 2019 at 01:27:13PM -0700, Paul E. McKenney wrote:
> On Wed, May 01, 2019 at 03:16:55PM -0400, Steven Rostedt wrote:
> > On Wed, 1 May 2019 12:12:13 -0700
> > "Paul E. McKenney" <[email protected]> wrote:
> >
> >
> > > OK, what I did was to apply the patch at the end of this email to -rcu
> > > branch dev, then run rcutorture as follows:
> > >
> > > nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop"
> > >
> > > This resulted in the console output that I placed here:
> > >
> > > http://www2.rdrop.com/~paulmck/submission/console.log.gz
> > >
> > > But I don't see calls to sched_setaffinity() or migration_cpu_stop().
> > > Steve, is something else needed on the kernel command line in addition to
> > > the following?
> > >
> > > ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop
> >
> > Do you have function graph enabled in the config?
> >
> > [ 2.098303] ftrace bootup tracer 'function_graph' not registered.
>
> I guess I don't! Thank you, will fix.
>
> Let's see...
>
> My .config has CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y. It looks like I
> need CONFIG_FUNCTION_GRAPH_TRACER=y, which I don't have. And it looks
> like that needs CONFIG_FUNCTION_TRACER=y, which I also don't have.
> But I do have CONFIG_HAVE_FUNCTION_TRACER=y. So I should add this
> to my rcutorture command line:
>
> --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y".
>
> I fired this up. Here is hoping! ;-)
>
> And it does have sched_setaffinity(), woo-hoo!!! I overwrote the old file:
>
> http://www2.rdrop.com/~paulmck/submission/console.log.gz
And I reran after adding a trace_printk(), which shows up as follows:
[ 211.409565] 6) | /* On unexpected CPU 6, expected 4!!! */
I placed the console log here:
http://www2.rdrop.com/~paulmck/submission/console.tpk.log.gz
Just in case the earlier log proves useful.
And it is acting as if the destination CPU proved to be offline. Except
that this rcutorture scenario doesn't offline anything, and I don't see
any CPU-hotplug removal messages. So I added another trace_printk() to
print out cpu_online_mask. This gets me the following:
[ 31.565605] 0) | /* On unexpected CPU 0, expected 1!!! */
[ 31.565605] 0) | /* Online CPUs: 0-7 */
So we didn't make it to CPU 1 despite its being online. I placed the
console log here:
http://www2.rdrop.com/~paulmck/submission/console.tpkol.log.gz
Thoughts?
Updated patch against -rcu below in case it is useful.
Thanx, Paul
------------------------------------------------------------------------
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index da04b5073dc3..23ec8ec7eddd 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -680,12 +680,24 @@ static struct rcu_torture_ops tasks_ops = {
 static void synchronize_rcu_trivial(void)
 {
 	int cpu;
+	int destcpu;
+	static int dont_trace;
 
 	for_each_online_cpu(cpu) {
-		while (raw_smp_processor_id() != cpu)
-			rcutorture_sched_setaffinity(current->pid,
-						     cpumask_of(cpu));
-		WARN_ON_ONCE(raw_smp_processor_id() != cpu);
+		if (!READ_ONCE(dont_trace))
+			tracing_on();
+		rcutorture_sched_setaffinity(current->pid, cpumask_of(cpu));
+		destcpu = raw_smp_processor_id();
+		if (destcpu == cpu) {
+			tracing_off();
+		} else {
+			trace_printk("On unexpected CPU %d, expected %d!!!\n", destcpu, cpu);
+			trace_printk("Online CPUs: %*pbl\n", cpumask_pr_args(cpu_online_mask));
+			tracing_stop();
+			WRITE_ONCE(dont_trace, 1);
+			WARN_ON_ONCE(1);
+			rcu_ftrace_dump(DUMP_ALL);
+		}
 	}
 }
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index caffee644932..edaf0ca22ff7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3495,6 +3495,7 @@ void __init rcu_init(void)
 	rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
 	WARN_ON(!rcu_par_gp_wq);
 	srcu_init();
+	tracing_off();
 }
 
 #include "tree_stall.h"
On Tue, May 07, 2019 at 03:16:13PM -0700, Paul E. McKenney wrote:
> On Wed, May 01, 2019 at 01:27:13PM -0700, Paul E. McKenney wrote:
> > On Wed, May 01, 2019 at 03:16:55PM -0400, Steven Rostedt wrote:
> > > On Wed, 1 May 2019 12:12:13 -0700
> > > "Paul E. McKenney" <[email protected]> wrote:
> > >
> > >
> > > > OK, what I did was to apply the patch at the end of this email to -rcu
> > > > branch dev, then run rcutorture as follows:
> > > >
> > > > nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop"
> > > >
> > > > This resulted in the console output that I placed here:
> > > >
> > > > http://www2.rdrop.com/~paulmck/submission/console.log.gz
> > > >
> > > > But I don't see calls to sched_setaffinity() or migration_cpu_stop().
> > > > Steve, is something else needed on the kernel command line in addition to
> > > > the following?
> > > >
> > > > ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop
> > >
> > > Do you have function graph enabled in the config?
> > >
> > > [ 2.098303] ftrace bootup tracer 'function_graph' not registered.
> >
> > I guess I don't! Thank you, will fix.
> >
> > Let's see...
> >
> > My .config has CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y. It looks like I
> > need CONFIG_FUNCTION_GRAPH_TRACER=y, which I don't have. And it looks
> > like that needs CONFIG_FUNCTION_TRACER=y, which I also don't have.
> > But I do have CONFIG_HAVE_FUNCTION_TRACER=y. So I should add this
> > to my rcutorture command line:
> >
> > --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y".
> >
> > I fired this up. Here is hoping! ;-)
> >
> > And it does have sched_setaffinity(), woo-hoo!!! I overwrote the old file:
> >
> > http://www2.rdrop.com/~paulmck/submission/console.log.gz
>
> And I reran after adding a trace_printk(), which shows up as follows:
>
> [ 211.409565] 6) | /* On unexpected CPU 6, expected 4!!! */
>
> I placed the console log here:
>
> http://www2.rdrop.com/~paulmck/submission/console.tpk.log.gz
>
> Just in case the earlier log proves useful.
>
> And it is acting as if the destination CPU proved to be offline. Except
> that this rcutorture scenario doesn't offline anything, and I don't see
> any CPU-hotplug removal messages. So I added another trace_printk() to
> print out cpu_online_mask. This gets me the following:
>
> [ 31.565605] 0) | /* On unexpected CPU 0, expected 1!!! */
> [ 31.565605] 0) | /* Online CPUs: 0-7 */
>
> So we didn't make it to CPU 1 despite its being online. I placed the
> console log here:
>
> http://www2.rdrop.com/~paulmck/submission/console.tpkol.log.gz
>
> Thoughts?
>
> Updated patch against -rcu below in case it is useful.
I added more debug and got this:
[ 215.097318] 4) | /* On unexpected CPU 4, expected 5!!! */
[ 215.098285] 4) | /* Online CPUs: 0-7 Active CPUs: 0-7 */
[ 215.099283] 4) | /* ret = 0, ->cpus_allowed 0-4,6-7 */
http://www.rdrop.com/~paulmck/submission/console.tpkm.log.gz
The task's ->cpus_allowed got set to the bitwise not of the requested
mask. I took a quick scan through the code but don't see how this
happens.
Very strange.
Any suggestions for further debugging? Left to myself, I would copy
the requested cpumask to a per-task location and insert checks in the
sched_setaffinity() code path, crude and ugly though that might seem.
(Ugly enough that I will wait until tomorrow to try it out.)
Thanx, Paul
------------------------------------------------------------------------
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index da04b5073dc3..51db19543012 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -680,12 +680,31 @@ static struct rcu_torture_ops tasks_ops = {
 static void synchronize_rcu_trivial(void)
 {
 	int cpu;
+	int destcpu;
+	static int dont_trace;
+	static cpumask_t my_cpumask;
+	static DEFINE_SPINLOCK(my_cpumask_lock);
+	int ret;
 
 	for_each_online_cpu(cpu) {
-		while (raw_smp_processor_id() != cpu)
-			rcutorture_sched_setaffinity(current->pid,
-						     cpumask_of(cpu));
-		WARN_ON_ONCE(raw_smp_processor_id() != cpu);
+		if (!READ_ONCE(dont_trace))
+			tracing_on();
+		rcutorture_sched_setaffinity(current->pid, cpumask_of(cpu));
+		destcpu = raw_smp_processor_id();
+		if (destcpu == cpu) {
+			tracing_off();
+		} else {
+			trace_printk("On unexpected CPU %d, expected %d!!!\n", destcpu, cpu);
+			trace_printk("Online CPUs: %*pbl Active CPUs: %*pbl\n", cpumask_pr_args(cpu_online_mask), cpumask_pr_args(cpu_active_mask));
+			spin_lock(&my_cpumask_lock);
+			ret = sched_getaffinity(current->pid, &my_cpumask);
+			trace_printk("ret = %d, ->cpus_allowed %*pbl\n", ret, cpumask_pr_args(&my_cpumask));
+			spin_unlock(&my_cpumask_lock);
+			tracing_stop();
+			WRITE_ONCE(dont_trace, 1);
+			WARN_ON_ONCE(1);
+			rcu_ftrace_dump(DUMP_ALL);
+		}
 	}
 }
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index caffee644932..edaf0ca22ff7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3495,6 +3495,7 @@ void __init rcu_init(void)
 	rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
 	WARN_ON(!rcu_par_gp_wq);
 	srcu_init();
+	tracing_off();
 }
 
 #include "tree_stall.h"
On Thu, May 09, 2019 at 10:36:54AM -0700, Paul E. McKenney wrote:
> On Tue, May 07, 2019 at 03:16:13PM -0700, Paul E. McKenney wrote:
> > On Wed, May 01, 2019 at 01:27:13PM -0700, Paul E. McKenney wrote:
> > > On Wed, May 01, 2019 at 03:16:55PM -0400, Steven Rostedt wrote:
> > > > On Wed, 1 May 2019 12:12:13 -0700
> > > > "Paul E. McKenney" <[email protected]> wrote:
> > > >
> > > >
> > > > > OK, what I did was to apply the patch at the end of this email to -rcu
> > > > > branch dev, then run rcutorture as follows:
> > > > >
> > > > > nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop"
> > > > >
> > > > > This resulted in the console output that I placed here:
> > > > >
> > > > > http://www2.rdrop.com/~paulmck/submission/console.log.gz
> > > > >
> > > > > But I don't see calls to sched_setaffinity() or migration_cpu_stop().
> > > > > Steve, is something else needed on the kernel command line in addition to
> > > > > the following?
> > > > >
> > > > > ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop
> > > >
> > > > Do you have function graph enabled in the config?
> > > >
> > > > [ 2.098303] ftrace bootup tracer 'function_graph' not registered.
> > >
> > > I guess I don't! Thank you, will fix.
> > >
> > > Let's see...
> > >
> > > My .config has CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y. It looks like I
> > > need CONFIG_FUNCTION_GRAPH_TRACER=y, which I don't have. And it looks
> > > like that needs CONFIG_FUNCTION_TRACER=y, which I also don't have.
> > > But I do have CONFIG_HAVE_FUNCTION_TRACER=y. So I should add this
> > > to my rcutorture command line:
> > >
> > > --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y".
> > >
> > > I fired this up. Here is hoping! ;-)
> > >
> > > And it does have sched_setaffinity(), woo-hoo!!! I overwrote the old file:
> > >
> > > http://www2.rdrop.com/~paulmck/submission/console.log.gz
> >
> > And I reran after adding a trace_printk(), which shows up as follows:
> >
> > [ 211.409565] 6) | /* On unexpected CPU 6, expected 4!!! */
> >
> > I placed the console log here:
> >
> > http://www2.rdrop.com/~paulmck/submission/console.tpk.log.gz
> >
> > Just in case the earlier log proves useful.
> >
> > And it is acting as if the destination CPU proved to be offline. Except
> > that this rcutorture scenario doesn't offline anything, and I don't see
> > any CPU-hotplug removal messages. So I added another trace_printk() to
> > print out cpu_online_mask. This gets me the following:
> >
> > [ 31.565605] 0) | /* On unexpected CPU 0, expected 1!!! */
> > [ 31.565605] 0) | /* Online CPUs: 0-7 */
> >
> > So we didn't make it to CPU 1 despite its being online. I placed the
> > console log here:
> >
> > http://www2.rdrop.com/~paulmck/submission/console.tpkol.log.gz
> >
> > Thoughts?
> >
> > Updated patch against -rcu below in case it is useful.
>
> I added more debug and got this:
>
> [ 215.097318] 4) | /* On unexpected CPU 4, expected 5!!! */
> [ 215.098285] 4) | /* Online CPUs: 0-7 Active CPUs: 0-7 */
> [ 215.099283] 4) | /* ret = 0, ->cpus_allowed 0-4,6-7 */
>
> http://www.rdrop.com/~paulmck/submission/console.tpkm.log.gz
>
> The task's ->cpus_allowed got set to the bitwise not of the requested
> mask. I took a quick scan through the code but don't see how this
> happens.
>
> Very strange.
>
> Any suggestions for further debugging? Left to myself, I would copy
> the requested cpumask to a per-task location and insert checks in the
> sched_setaffinity() code path, crude and ugly though that might seem.
> (Ugly enough that I will wait until tomorrow to try it out.)
And of course the nicer approach is to simply drop a few trace_printk()
calls into sched_setaffinity(), which got me this (with other CPU
tracing removed):
[ 207.440153] 0) | /* sched_setaffinity: new_mask 5 */
[ 207.445567] 0) | /* On unexpected CPU 0, expected 5!!! */
[ 207.450017] 0) | /* Online CPUs: 0-7 Active CPUs: 0-7 */
[ 207.454943] 0) | /* ret = 0, ->cpus_allowed 0-4,6-7 */
http://www.rdrop.com/~paulmck/submission/console.tkpms.log.gz
(Search for "->cpus_allowed".)
I forward-ported the relevant patches from -rcu and placed them on -rcu
branch peterz.2019.05.09a, and this is what produced the output above.
Any other debugging thoughts?
Or, if you wish, you can reproduce by running the following:
nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop" --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y"
This gets me the following summary output:
--- Thu May 9 12:08:31 PDT 2019 Test summary:
Results directory: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2019.05.09-12:08:31
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs TRIVIAL --bootargs trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop --kconfig CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y
TRIVIAL ------- 2177 GPs (18.1417/s) [trivial: g0 f0x0 ]
:CONFIG_HOTPLUG_CPU: improperly set
WARNING: BAD SEQ 2176:2176 last:2177 version 4
/home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2019.05.09-12:08:31/TRIVIAL/console.log
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2019.05.09-12:08:31/TRIVIAL/console.log
WARNING: Summary: Warnings: 1 Bugs: 1 Call Traces: 5 Stalls: 8
Thanx, Paul
On Thu, May 09, 2019 at 10:36:54AM -0700, Paul E. McKenney wrote:
> On Tue, May 07, 2019 at 03:16:13PM -0700, Paul E. McKenney wrote:
> > On Wed, May 01, 2019 at 01:27:13PM -0700, Paul E. McKenney wrote:
> > > On Wed, May 01, 2019 at 03:16:55PM -0400, Steven Rostedt wrote:
> > > > On Wed, 1 May 2019 12:12:13 -0700
> > > > "Paul E. McKenney" <[email protected]> wrote:
> > > >
> > > >
> > > > > OK, what I did was to apply the patch at the end of this email to -rcu
> > > > > branch dev, then run rcutorture as follows:
> > > > >
> > > > > nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop"
> > > > >
> > > > > This resulted in the console output that I placed here:
> > > > >
> > > > > http://www2.rdrop.com/~paulmck/submission/console.log.gz
> > > > >
> > > > > But I don't see calls to sched_setaffinity() or migration_cpu_stop().
> > > > > Steve, is something else needed on the kernel command line in addition to
> > > > > the following?
> > > > >
> > > > > ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop
> > > >
> > > > Do you have function graph enabled in the config?
> > > >
> > > > [ 2.098303] ftrace bootup tracer 'function_graph' not registered.
> > >
> > > I guess I don't! Thank you, will fix.
> > >
> > > Let's see...
> > >
> > > My .config has CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y. It looks like I
> > > need CONFIG_FUNCTION_GRAPH_TRACER=y, which I don't have. And it looks
> > > like that needs CONFIG_FUNCTION_TRACER=y, which I also don't have.
> > > But I do have CONFIG_HAVE_FUNCTION_TRACER=y. So I should add this
> > > to my rcutorture command line:
> > >
> > > --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y".
> > >
> > > I fired this up. Here is hoping! ;-)
> > >
> > > And it does have sched_setaffinity(), woo-hoo!!! I overwrote the old file:
> > >
> > > http://www2.rdrop.com/~paulmck/submission/console.log.gz
> >
> > And I reran after adding a trace_printk(), which shows up as follows:
> >
> > [ 211.409565] 6) | /* On unexpected CPU 6, expected 4!!! */
> >
> > I placed the console log here:
> >
> > http://www2.rdrop.com/~paulmck/submission/console.tpk.log.gz
> >
> > Just in case the earlier log proves useful.
> >
> > And it is acting as if the destination CPU proved to be offline. Except
> > that this rcutorture scenario doesn't offline anything, and I don't see
> > any CPU-hotplug removal messages. So I added another trace_printk() to
> > print out cpu_online_mask. This gets me the following:
> >
> > [ 31.565605] 0) | /* On unexpected CPU 0, expected 1!!! */
> > [ 31.565605] 0) | /* Online CPUs: 0-7 */
> >
> > So we didn't make it to CPU 1 despite its being online. I placed the
> > console log here:
> >
> > http://www2.rdrop.com/~paulmck/submission/console.tpkol.log.gz
> >
> > Thoughts?
And I can finally see/reproduce it!!
Frankly, no idea how this is happening (I have been staring at these
traces/functions for hours/days now... ;-/ )
Adding some "sched" folks in Cc: hopefully, they can shed some light
about this.
And yes, my only suggestion/approach would be to keep bisecting this
code with printk*..., 'fun' ;-/
Andrea
On Thu, May 09, 2019 at 11:40:25PM +0200, Andrea Parri wrote:
> On Thu, May 09, 2019 at 10:36:54AM -0700, Paul E. McKenney wrote:
> > On Tue, May 07, 2019 at 03:16:13PM -0700, Paul E. McKenney wrote:
> > > On Wed, May 01, 2019 at 01:27:13PM -0700, Paul E. McKenney wrote:
> > > > On Wed, May 01, 2019 at 03:16:55PM -0400, Steven Rostedt wrote:
> > > > > On Wed, 1 May 2019 12:12:13 -0700
> > > > > "Paul E. McKenney" <[email protected]> wrote:
> > > > >
> > > > >
> > > > > > OK, what I did was to apply the patch at the end of this email to -rcu
> > > > > > branch dev, then run rcutorture as follows:
> > > > > >
> > > > > > nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop"
> > > > > >
> > > > > > This resulted in the console output that I placed here:
> > > > > >
> > > > > > http://www2.rdrop.com/~paulmck/submission/console.log.gz
> > > > > >
> > > > > > But I don't see calls to sched_setaffinity() or migration_cpu_stop().
> > > > > > Steve, is something else needed on the kernel command line in addition to
> > > > > > the following?
> > > > > >
> > > > > > ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop
> > > > >
> > > > > Do you have function graph enabled in the config?
> > > > >
> > > > > [ 2.098303] ftrace bootup tracer 'function_graph' not registered.
> > > >
> > > > I guess I don't! Thank you, will fix.
> > > >
> > > > Let's see...
> > > >
> > > > My .config has CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y. It looks like I
> > > > need CONFIG_FUNCTION_GRAPH_TRACER=y, which I don't have. And it looks
> > > > like that needs CONFIG_FUNCTION_TRACER=y, which I also don't have.
> > > > But I do have CONFIG_HAVE_FUNCTION_TRACER=y. So I should add this
> > > > to my rcutorture command line:
> > > >
> > > > --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y".
> > > >
> > > > I fired this up. Here is hoping! ;-)
> > > >
> > > > And it does have sched_setaffinity(), woo-hoo!!! I overwrote the old file:
> > > >
> > > > http://www2.rdrop.com/~paulmck/submission/console.log.gz
> > >
> > > And I reran after adding a trace_printk(), which shows up as follows:
> > >
> > > [ 211.409565] 6) | /* On unexpected CPU 6, expected 4!!! */
> > >
> > > I placed the console log here:
> > >
> > > http://www2.rdrop.com/~paulmck/submission/console.tpk.log.gz
> > >
> > > Just in case the earlier log proves useful.
> > >
> > > And it is acting as if the destination CPU proved to be offline. Except
> > > that this rcutorture scenario doesn't offline anything, and I don't see
> > > any CPU-hotplug removal messages. So I added another trace_printk() to
> > > print out cpu_online_mask. This gets me the following:
> > >
> > > [ 31.565605] 0) | /* On unexpected CPU 0, expected 1!!! */
> > > [ 31.565605] 0) | /* Online CPUs: 0-7 */
> > >
> > > So we didn't make it to CPU 1 despite its being online. I placed the
> > > console log here:
> > >
> > > http://www2.rdrop.com/~paulmck/submission/console.tpkol.log.gz
> > >
> > > Thoughts?
>
> And I can finally see/reproduce it!!
>
> Frankly, no idea how this is happening (I have been staring at these
> traces/functions for hours/days now... ;-/ )
>
> Adding some "sched" folks in Cc: hopefully, they can shed some light
> about this.
+Thomas, +Sebastian
Thread starts here:
http://lkml.kernel.org/r/[email protected]
Andrea
>
> And yes, my only suggestion/approach would be to keep bisecting this
> code with printk*..., 'fun' ;-/
>
> Andrea
On Thu, May 09, 2019 at 11:56:35PM +0200, Andrea Parri wrote:
> On Thu, May 09, 2019 at 11:40:25PM +0200, Andrea Parri wrote:
> > On Thu, May 09, 2019 at 10:36:54AM -0700, Paul E. McKenney wrote:
> > > On Tue, May 07, 2019 at 03:16:13PM -0700, Paul E. McKenney wrote:
> > > > On Wed, May 01, 2019 at 01:27:13PM -0700, Paul E. McKenney wrote:
> > > > > On Wed, May 01, 2019 at 03:16:55PM -0400, Steven Rostedt wrote:
> > > > > > On Wed, 1 May 2019 12:12:13 -0700
> > > > > > "Paul E. McKenney" <[email protected]> wrote:
> > > > > >
> > > > > >
> > > > > > > OK, what I did was to apply the patch at the end of this email to -rcu
> > > > > > > branch dev, then run rcutorture as follows:
> > > > > > >
> > > > > > > nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop"
> > > > > > >
> > > > > > > This resulted in the console output that I placed here:
> > > > > > >
> > > > > > > http://www2.rdrop.com/~paulmck/submission/console.log.gz
> > > > > > >
> > > > > > > But I don't see calls to sched_setaffinity() or migration_cpu_stop().
> > > > > > > Steve, is something else needed on the kernel command line in addition to
> > > > > > > the following?
> > > > > > >
> > > > > > > ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop
> > > > > >
> > > > > > Do you have function graph enabled in the config?
> > > > > >
> > > > > > [ 2.098303] ftrace bootup tracer 'function_graph' not registered.
> > > > >
> > > > > I guess I don't! Thank you, will fix.
> > > > >
> > > > > Let's see...
> > > > >
> > > > > My .config has CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y. It looks like I
> > > > > need CONFIG_FUNCTION_GRAPH_TRACER=y, which I don't have. And it looks
> > > > > like that needs CONFIG_FUNCTION_TRACER=y, which I also don't have.
> > > > > But I do have CONFIG_HAVE_FUNCTION_TRACER=y. So I should add this
> > > > > to my rcutorture command line:
> > > > >
> > > > > --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y".
> > > > >
> > > > > I fired this up. Here is hoping! ;-)
> > > > >
> > > > > And it does have sched_setaffinity(), woo-hoo!!! I overwrote the old file:
> > > > >
> > > > > http://www2.rdrop.com/~paulmck/submission/console.log.gz
> > > >
> > > > And I reran after adding a trace_printk(), which shows up as follows:
> > > >
> > > > [ 211.409565] 6) | /* On unexpected CPU 6, expected 4!!! */
> > > >
> > > > I placed the console log here:
> > > >
> > > > http://www2.rdrop.com/~paulmck/submission/console.tpk.log.gz
> > > >
> > > > Just in case the earlier log proves useful.
> > > >
> > > > And it is acting as if the destination CPU proved to be offline. Except
> > > > that this rcutorture scenario doesn't offline anything, and I don't see
> > > > any CPU-hotplug removal messages. So I added another trace_printk() to
> > > > print out cpu_online_mask. This gets me the following:
> > > >
> > > > [ 31.565605] 0) | /* On unexpected CPU 0, expected 1!!! */
> > > > [ 31.565605] 0) | /* Online CPUs: 0-7 */
> > > >
> > > > So we didn't make it to CPU 1 despite its being online. I placed the
> > > > console log here:
> > > >
> > > > http://www2.rdrop.com/~paulmck/submission/console.tpkol.log.gz
> > > >
> > > > Thoughts?
> >
> > And I can finally see/reproduce it!!
> >
> > Frankly, no idea how this is happening (I have been staring at these
> > traces/functions for hours/days now... ;-/ )
> >
> > Adding some "sched" folks in Cc: hopefully, they can shed some light
> > about this.
>
> +Thomas, +Sebastian
>
> Thread starts here:
>
> http://lkml.kernel.org/r/[email protected]
Peter Zijlstra kindly volunteered over IRC to look at this more closely
tomorrow (well, today, his time). It is quite strange, though! ;-)
Thanx, Paul
> Andrea
>
>
> >
> > And yes, my only suggestion/approach would be to keep bisecting this
> > code with printk*..., 'fun' ;-/
> >
> > Andrea
>
> > > Adding some "sched" folks in Cc: hopefully, they can shed some light
> > > about this.
> >
> > +Thomas, +Sebastian
> >
> > Thread starts here:
> >
> > http://lkml.kernel.org/r/[email protected]
>
> Peter Zijlstra kindly volunteered over IRC to look at this more closely
> tomorrow (well, today, his time). It is quite strange, though! ;-)
Sounds good! ;-)
Thanx,
Andrea
On Thu, May 09, 2019 at 12:36:25PM -0700, Paul E. McKenney wrote:
> I forward-ported the relevant patches from -rcu and placed them on -rcu
> branch peterz.2019.05.09a, and this is what produced the output above.
>
> Any other debugging thoughts?
>
> Or, if you wish, you can reproduce by running the following:
>
> nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop" --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y"
>
> This gets me the following summary output:
>
> --- Thu May 9 12:08:31 PDT 2019 Test summary:
> Results directory: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2019.05.09-12:08:31
> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs TRIVIAL --bootargs trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop --kconfig CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y
> TRIVIAL ------- 2177 GPs (18.1417/s) [trivial: g0 f0x0 ]
> :CONFIG_HOTPLUG_CPU: improperly set
> WARNING: BAD SEQ 2176:2176 last:2177 version 4
> /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2019.05.09-12:08:31/TRIVIAL/console.log
> WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2019.05.09-12:08:31/TRIVIAL/console.log
> WARNING: Summary: Warnings: 1 Bugs: 1 Call Traces: 5 Stalls: 8
So I could reproduce...
I must first complain about your scripts; they do "make mrproper" on the
source tree every time you run them, which is not appreciated. For one,
it deletes my 'tags' file.
Getting it to not rebuild the whole kernel every time wasn't easy
either.
Aside from that it seems to 'work'.
The trace below explains the issue. Some Paul person did it, see below.
It's broken per construction :-)
$ rm -rf tools/testing/selftests/rcutorture/res/ ; tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup,sched:sched_migrate_task ftrace=function_graph ftrace_graph_filter=set_cpus_allowed_ptr,migration_cpu_stop" --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y" --kmake-arg "-j8" --datestamp "ponies"
[ 210.222389] 4) | /* __set_cpus_allowed_ptr: (A) current: 4 new_mask: 5 valid: 0-7 */
[ 210.223205] 4) | do_set_cpus_allowed() {
[ 210.224203] 4) | dequeue_task_fair() {
[ 210.224678] 4) | update_curr() {
[ 210.226504] 4) 0.119 us | __calc_delta();
[ 210.227999] 4) 0.138 us | update_min_vruntime();
[ 210.229617] 4) 0.122 us | cpuacct_charge();
[ 210.230435] 4) 0.833 us | }
[ 210.230822] 4) 0.131 us | __update_load_avg_se();
[ 210.231892] 4) 0.124 us | __update_load_avg_cfs_rq();
[ 210.233000] 4) 0.119 us | clear_buddies();
[ 210.233498] 4) 0.133 us | account_entity_dequeue();
[ 210.237359] 4) 0.113 us | update_cfs_group();
[ 210.238355] 4) 2.260 us | }
[ 210.239055] 4) | put_prev_task_fair() {
[ 210.239521] 4) 0.119 us | put_prev_entity();
[ 210.240847] 4) 0.336 us | }
[ 210.241668] 4) 0.115 us | set_cpus_allowed_common();
[ 210.243608] 4) | enqueue_task_fair() {
[ 210.244564] 4) 0.125 us | update_curr();
[ 210.247119] 4) 0.112 us | __update_load_avg_se();
[ 210.248608] 4) 0.119 us | __update_load_avg_cfs_rq();
[ 210.249789] 4) 0.111 us | update_cfs_group();
[ 210.251499] 4) 0.127 us | account_entity_enqueue();
[ 210.252704] 4) 0.116 us | __enqueue_entity();
[ 210.254331] 4) 1.542 us | }
[ 210.255703] 4) | set_curr_task_fair() {
[ 210.256275] 4) | set_next_entity() {
[ 210.257310] 4) 0.112 us | __update_load_avg_se();
[ 210.258681] 4) 0.112 us | __update_load_avg_cfs_rq();
[ 210.260781] 4) 0.565 us | }
[ 210.262182] 4) 0.874 us | }
[ 210.263485] 4) 5.843 us | }
[ 210.266251] 4) | /* __set_cpus_allowed_ptr: (B) current: 5 new_mask: 5 valid: 0-7 */
[ 210.267390] 4) | /* __set_cpus_allowed_ptr: (C) 5 */
[ 210.269895] 4) 0.121 us | _raw_spin_unlock_irqrestore();
[ 210.271482] 4) | stop_one_cpu() {
[ 210.272518] 4) | cpu_stop_init_done() {
[ 210.273052] 4) 0.110 us | __init_waitqueue_head();
[ 210.274140] 4) 0.328 us | }
[ 210.274936] 4) | cpu_stop_queue_work() {
[ 210.275953] 4) 0.112 us | _raw_spin_lock_irqsave();
[ 210.276970] 4) 0.116 us | wake_q_add();
[ 210.278301] 4) 0.114 us | _raw_spin_unlock_irqrestore();
[ 210.279405] 4) | wake_up_q() {
[ 210.280276] 4) | try_to_wake_up() {
[ 210.280751] 4) 0.195 us | _raw_spin_lock_irqsave();
[ 210.282486] 4) 0.112 us | _raw_spin_lock();
[ 210.283454] 4) 0.118 us | update_rq_clock.part.81();
[ 210.285104] 4) | ttwu_do_activate.isra.85() {
[ 210.286408] 4) | activate_task() {
[ 210.288275] 4) 0.123 us | enqueue_task_stop();
[ 210.290128] 4) 0.348 us | }
[ 210.290567] 4) | ttwu_do_wakeup.isra.84() {
[ 210.291220] 4) | check_preempt_curr() {
[ 210.292790] 4) 0.113 us | resched_curr();
[ 210.293903] 4) 0.351 us | }
[ 210.295273] 4) | /* sched_wakeup: comm=migration/4 pid=29 prio=0 target_cpu=004 */
[ 210.297661] 4) 0.731 us | }
[ 210.299030] 4) 1.392 us | }
[ 210.300365] 4) 0.141 us | _raw_spin_unlock_irqrestore();
[ 210.302745] 4) 2.625 us | }
[ 210.304702] 4) 2.867 us | }
[ 210.306291] 4) 3.753 us | }
[ 210.306914] 4) | _cond_resched() {
[ 210.307508] 4) | rcu_note_context_switch() {
[ 210.309684] 4) 0.111 us | rcu_qs();
[ 210.311011] 4) 0.328 us | }
[ 210.312819] 4) 0.113 us | _raw_spin_lock();
[ 210.314061] 4) | pick_next_task_stop() {
[ 210.315791] 4) | put_prev_task_fair() {
[ 210.316399] 4) | put_prev_entity() {
[ 210.318532] 4) | update_curr() {
[ 210.319375] 4) 0.116 us | __calc_delta();
[ 210.320421] 4) 0.117 us | update_min_vruntime();
[ 210.321319] 4) 0.113 us | cpuacct_charge();
[ 210.322131] 4) 0.782 us | }
[ 210.324157] 4) 0.124 us | __enqueue_entity();
[ 210.325997] 4) 0.117 us | __update_load_avg_se();
[ 210.326996] 4) 0.116 us | __update_load_avg_cfs_rq();
[ 210.329194] 4) 1.732 us | }
[ 210.329655] 4) 1.947 us | }
[ 210.330669] 4) 2.175 us | }
[ 210.331114] 4) | /* sched_switch: prev_comm=rcu_torture_fak prev_pid=126 prev_prio=139 prev_state=R+ ==> next_comm=migration/4 next_pid=29 next_prio=0 */
[ 210.333252] 4) 0.111 us | enter_lazy_tlb();
[ 210.335410] 4) rcu_tor-126 => migrati-29
[ 210.335410] 4) rcu_tor-126 => migrati-29
[ 210.335410] 4) | migration_cpu_stop() {
[ 210.337537] 4) 0.114 us | sched_ttwu_pending();
[ 210.338411] 4) | _raw_spin_lock() {
[ 210.339466] 4) + 28.614 us | queued_spin_lock_slowpath();
[ 209.701653] 0) rcu_tor-132 => torture-136
[ 210.330101] 0) | set_cpus_allowed_ptr() {
[ 210.332653] 0) | __set_cpus_allowed_ptr() {
[ 210.333800] 0) | task_rq_lock() {
[ 210.334293] 0) 0.198 us | _raw_spin_lock_irqsave();
[ 210.334899] 0) 0.199 us | _raw_spin_lock();
[ 210.338041] 0) 0.904 us | }
[ 210.338908] 0) 0.236 us | update_rq_clock.part.81();
[ 210.340061] 0) | /* __set_cpus_allowed_ptr: (A) current: 5 new_mask: 0-3,5-7 valid: 0-7 */
[ 210.341300] 0) | do_set_cpus_allowed() {
[ 210.342067] 0) | dequeue_task_fair() {
[ 210.342992] 0) 0.217 us | update_curr();
[ 210.343637] 0) 0.203 us | __update_load_avg_se();
[ 210.344315] 0) 0.203 us | __update_load_avg_cfs_rq();
[ 210.345165] 0) 0.176 us | clear_buddies();
[ 210.345777] 0) 0.173 us | account_entity_dequeue();
[ 210.346627] 0) 0.182 us | update_cfs_group();
[ 210.347306] 0) 2.468 us | }
[ 210.347874] 0) 0.189 us | set_cpus_allowed_common();
[ 210.348561] 0) | enqueue_task_fair() {
[ 210.349148] 0) 0.189 us | update_curr();
[ 210.349635] 0) 0.174 us | __update_load_avg_se();
[ 210.350185] 0) 0.182 us | __update_load_avg_cfs_rq();
[ 210.352091] 0) 0.178 us | update_cfs_group();
[ 210.353542] 0) 0.178 us | account_entity_enqueue();
[ 210.354159] 0) 0.194 us | __enqueue_entity();
[ 210.355814] 0) 2.360 us | }
[ 210.356572] 0) 5.746 us | }
[ 210.357591] 0) | /* __set_cpus_allowed_ptr: (B) current: 0-3,5-7 new_mask: 0-3,5-7 valid: 0-7 */
[ 210.358523] 0) | /* __set_cpus_allowed_ptr: (C) 0 */
[ 210.359130] 0) | move_queued_task.isra.86() {
[ 210.359674] 0) | dequeue_task_fair() {
[ 210.360172] 0) 0.192 us | update_curr();
[ 210.360627] 0) 0.180 us | __update_load_avg_se();
[ 210.361174] 0) 0.194 us | __update_load_avg_cfs_rq();
[ 210.363072] 0) 0.178 us | clear_buddies();
[ 210.364838] 0) 0.179 us | account_entity_dequeue();
[ 210.367303] 0) 0.183 us | update_cfs_group();
[ 210.368270] 0) 0.194 us | update_min_vruntime();
[ 210.369911] 0) 2.791 us | }
[ 210.370890] 0) | set_task_cpu() {
[ 210.371802] 0) | /* sched_migrate_task: comm=rcu_torture_fak pid=126 prio=139 orig_cpu=4 dest_cpu=0 */
[ 210.374814] 0) | migrate_task_rq_fair() {
[ 210.375862] 0) | detach_entity_cfs_rq() {
[ 210.376380] 0) 0.208 us | __update_load_avg_se();
[ 210.377900] 0) 0.180 us | __update_load_avg_cfs_rq();
[ 210.382398] 0) 4.007 us | }
[ 210.383005] 0) 4.360 us | }
[ 210.384378] 0) 0.177 us | set_task_rq_fair();
[ 210.385721] 0) 5.872 us | }
[ 210.386970] 0) 0.182 us | _raw_spin_lock();
[ 210.388990] 0) 0.189 us | update_rq_clock.part.81();
[ 210.390491] 0) | enqueue_task_fair() {
[ 210.391030] 0) | update_curr() {
[ 210.391861] 0) 0.305 us | update_min_vruntime();
[ 210.399184] 0) 0.190 us | cpuacct_charge();
[ 210.400219] 0) 6.009 us | }
[ 210.401182] 0) 0.182 us | __update_load_avg_cfs_rq();
[ 210.402201] 0) 0.194 us | attach_entity_load_avg();
[ 210.403180] 0) 0.169 us | update_cfs_group();
[ 210.404647] 0) 0.173 us | account_entity_enqueue();
[ 210.406043] 0) 0.261 us | __enqueue_entity();
[ 210.407171] 0) 8.265 us | }
[ 210.408609] 0) | check_preempt_curr() {
[ 210.409712] 0) | check_preempt_wakeup() {
[ 210.410576] 0) 0.179 us | update_curr();
[ 210.411104] 0) 0.185 us | wakeup_preempt_entity.isra.93();
[ 210.412454] 0) 0.890 us | }
[ 210.412845] 0) 1.270 us | }
[ 210.413888] 0) + 19.818 us | }
[ 210.414968] 0) 0.182 us | _raw_spin_unlock_irqrestore();
[ 210.417043] 0) + 30.113 us | }
[ 210.418607] 0) + 30.451 us | }
This looks like:
kernel/torture.c:torture_shuffle_tasks()
Written by some Paul McKenney person, I'd have to ask him :-)
[ 210.416420] 4) + 28.901 us | }
[ 210.418122] 4) 0.153 us | _raw_spin_lock();
[ 210.420277] 4) | /* migration_cpu_stop: this: 4 p: 0 dest: 5 on_rq: 1 */
[ 210.422796] 4) + 29.893 us | }
[ 210.430554] 4) | /* sched_switch: prev_comm=migration/4 prev_pid=29 prev_prio=0 prev_state=S ==> next_comm=swapper/4 next_pid=0 next_prio=120 */
[ 210.857115] 0) migrati-11 => rcu_tor-126
[ 210.857115] 0) 0.184 us | finish_task_switch();
[ 210.863058] 0) # 2749.563 us | } /* _cond_resched */
[ 210.863616] 0) | wait_for_completion() {
[ 210.864157] 0) 0.171 us | _raw_spin_lock_irq();
[ 210.866866] 0) 0.556 us | }
[ 210.867759] 0) # 2755.063 us | } /* stop_one_cpu */
[ 210.868774] 0) # 2763.608 us | } /* __set_cpus_allowed_ptr */
[ 210.869873] 0) # 2763.873 us | } /* set_cpus_allowed_ptr */
[ 210.871830] 0) | /* On unexpected CPU 0, expected 5!!! */
[ 210.873981] 0) | /* Online CPUs: 0-7 Active CPUs: 0-7 */
[ 210.876291] 0) | /* ret = 0, ->cpus_allowed 0-3,5-7 */
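To spell it out: the trivial variant on CPU 4 had just set its affinity to
CPU 5 and was waiting for the stop-task to migrate it, when the torture
shuffler on CPU 0 did its own set_cpus_allowed_ptr() on that very same task
with mask 0-3,5-7 and moved it to CPU 0. That is exactly the ->cpus_allowed
and the "unexpected CPU 0" the warning then reports. From memory, the
shuffler does something along these lines -- a paraphrase of
kernel/torture.c:torture_shuffle_tasks(), not a verbatim copy:

/*
 * Paraphrased sketch: every registered torture kthread is allowed on
 * every CPU except the current "idle" CPU, which rotates over time.
 */
static void torture_shuffle_tasks_sketch(void)
{
	struct shuffle_task *stp;

	cpumask_setall(shuffle_tmp_mask);
	get_online_cpus();

	/* Advance the idle CPU; upon overflow, don't idle any CPU. */
	shuffle_idle_cpu = cpumask_next(shuffle_idle_cpu, shuffle_tmp_mask);
	if (shuffle_idle_cpu >= nr_cpu_ids)
		shuffle_idle_cpu = -1;
	else
		cpumask_clear_cpu(shuffle_idle_cpu, shuffle_tmp_mask);

	mutex_lock(&shuffle_task_mutex);
	list_for_each_entry(stp, &shuffle_task_list, st_l)
		set_cpus_allowed_ptr(stp->st_t, shuffle_tmp_mask);
	mutex_unlock(&shuffle_task_mutex);

	put_online_cpus();
}

So an all-but-one-CPU mask like 0-3,5-7 is precisely what the shuffler hands
out, and it happily overwrites whatever synchronize_rcu_trivial() had just
requested for itself.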
---
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 4b58c907b4b7..f06f1cad68c5 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -451,6 +451,7 @@ void rcu_request_urgent_qs_task(struct task_struct *t);
 enum rcutorture_type {
 	RCU_FLAVOR,
 	RCU_TASKS_FLAVOR,
+	RCU_TRIVIAL_FLAVOR,
 	SRCU_FLAVOR,
 	INVALID_RCU_FLAVOR
 };
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index efaa5b3f4d3f..0952efa55247 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -670,6 +670,66 @@ static struct rcu_torture_ops tasks_ops = {
 	.name		= "tasks"
 };
 
+/*
+ * Definitions for trivial CONFIG_PREEMPT=n-only torture testing.
+ */
+
+static void synchronize_rcu_trivial(void)
+{
+	static int dont_trace;
+	int destcpu, cpu, ret;
+
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		if (!READ_ONCE(dont_trace))
+			tracing_on();
+
+		trace_printk("switching to CPU: %d\n", cpu);
+		ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
+		WARN_ONCE(ret, "%s: sched_setaffinity() returned %d\n", __func__, ret);
+
+		destcpu = raw_smp_processor_id();
+		if (destcpu == cpu) {
+			tracing_off();
+		} else {
+			trace_printk("On unexpected CPU %d, expected %d!!!\n", destcpu, cpu);
+			trace_printk("Online CPUs: %*pbl Active CPUs: %*pbl\n", cpumask_pr_args(cpu_online_mask), cpumask_pr_args(cpu_active_mask));
+			trace_printk("ret = %d, ->cpus_allowed %*pbl\n", ret, cpumask_pr_args(&current->cpus_allowed));
+			tracing_stop();
+			WRITE_ONCE(dont_trace, 1);
+			WARN_ON_ONCE(1);
+			rcu_ftrace_dump(DUMP_ALL);
+		}
+	}
+	cpus_read_unlock();
+}
+
+static int rcu_torture_read_lock_trivial(void) __acquires(RCU)
+{
+	preempt_disable();
+	return 0;
+}
+
+static void rcu_torture_read_unlock_trivial(int idx) __releases(RCU)
+{
+	preempt_enable();
+}
+
+static struct rcu_torture_ops trivial_ops = {
+	.ttype		= RCU_TRIVIAL_FLAVOR,
+	.init		= rcu_sync_torture_init,
+	.readlock	= rcu_torture_read_lock_trivial,
+	.read_delay	= rcu_read_delay,  /* just reuse rcu's version. */
+	.readunlock	= rcu_torture_read_unlock_trivial,
+	.get_gp_seq	= rcu_no_completed,
+	.sync		= synchronize_rcu_trivial,
+	.exp_sync	= synchronize_rcu_trivial,
+	.fqs		= NULL,
+	.stats		= NULL,
+	.irq_capable	= 1,
+	.name		= "trivial"
+};
+
 static unsigned long rcutorture_seq_diff(unsigned long new, unsigned long old)
 {
 	if (!cur_ops->gp_diff)
@@ -1765,6 +1825,8 @@ static void rcu_torture_fwd_prog_cr(void)
 
 	if (READ_ONCE(rcu_fwd_emergency_stop))
 		return; /* Get out of the way quickly, no GP wait! */
+	if (!cur_ops->call)
+		return; /* Can't do call_rcu() fwd prog without ->call. */
 
 	/* Loop continuously posting RCU callbacks. */
 	WRITE_ONCE(rcu_fwd_cb_nodelay, true);
@@ -2240,7 +2302,7 @@ rcu_torture_init(void)
 	int firsterr = 0;
 	static struct rcu_torture_ops *torture_ops[] = {
 		&rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops,
-		&busted_srcud_ops, &tasks_ops,
+		&busted_srcud_ops, &tasks_ops, &trivial_ops,
 	};
 
 	if (!torture_init_begin(torture_type, verbose))
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b4d88a594785..4b53f0ed4e45 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3375,6 +3375,7 @@ void __init rcu_init(void)
 	rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
 	WARN_ON(!rcu_par_gp_wq);
 	srcu_init();
+	tracing_off();
 }
 
 #include "tree_stall.h"
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 102dfcf0a29a..5dd84ce34528 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -996,8 +996,10 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
 				 struct task_struct *p, int dest_cpu)
 {
 	/* Affinity changed (again). */
-	if (!is_cpu_allowed(p, dest_cpu))
+	if (!is_cpu_allowed(p, dest_cpu)) {
+		trace_printk("%s: !allowed\n", __func__);
 		return rq;
+	}
 
 	update_rq_clock(rq);
 	rq = move_queued_task(rq, rf, p, dest_cpu);
@@ -1031,6 +1033,10 @@ static int migration_cpu_stop(void *data)
 
 	raw_spin_lock(&p->pi_lock);
 	rq_lock(rq, &rf);
+
+	trace_printk("%s: this: %d p: %d dest: %d on_rq: %x\n", __func__,
+		     rq->cpu, task_cpu(p), arg->dest_cpu, p->on_rq);
+
 	/*
 	 * If task_rq(p) != rq, it cannot be migrated here, because we're
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
@@ -1116,6 +1122,11 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 		cpu_valid_mask = cpu_online_mask;
 	}
 
+	trace_printk("%s: (A) current: %*pbl new_mask: %*pbl valid: %*pbl\n", __func__,
+		     cpumask_pr_args(&p->cpus_allowed),
+		     cpumask_pr_args(new_mask),
+		     cpumask_pr_args(cpu_valid_mask));
+
 	/*
 	 * Must re-check here, to close a race against __kthread_bind(),
 	 * sched_setaffinity() is not guaranteed to observe the flag.
@@ -1145,11 +1156,19 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 			p->nr_cpus_allowed != 1);
 	}
 
+	trace_printk("%s: (B) current: %*pbl new_mask: %*pbl valid: %*pbl\n", __func__,
+		     cpumask_pr_args(&p->cpus_allowed),
+		     cpumask_pr_args(new_mask),
+		     cpumask_pr_args(cpu_valid_mask));
+
 	/* Can the task run on the task's current CPU? If so, we're done */
 	if (cpumask_test_cpu(task_cpu(p), new_mask))
 		goto out;
 
 	dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
+
+	trace_printk("%s: (C) %d\n", __func__, dest_cpu);
+
 	if (task_running(rq, p) || p->state == TASK_WAKING) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
@@ -1591,8 +1610,13 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
 	 * [ this allows ->select_task() to simply return task_cpu(p) and
 	 *   not worry about this generic constraint ]
 	 */
-	if (unlikely(!is_cpu_allowed(p, cpu)))
+	if (unlikely(!is_cpu_allowed(p, cpu))) {
+		trace_printk("%s: (A) cpus_allowed: %*pbl nr_cpus_allowed: %d cpu: %d\n", __func__,
+			     cpumask_pr_args(&p->cpus_allowed), p->nr_cpus_allowed, cpu);
 		cpu = select_fallback_rq(task_cpu(p), p);
+		trace_printk("%s: (B) cpus_allowed: %*pbl nr_cpus_allowed: %d cpu: %d\n", __func__,
+			     cpumask_pr_args(&p->cpus_allowed), p->nr_cpus_allowed, cpu);
+	}
 
 	return cpu;
 }
diff --git a/tools/testing/selftests/rcutorture/bin/configinit.sh b/tools/testing/selftests/rcutorture/bin/configinit.sh
index 40359486b3a8..d53d7bf52c1d 100755
--- a/tools/testing/selftests/rcutorture/bin/configinit.sh
+++ b/tools/testing/selftests/rcutorture/bin/configinit.sh
@@ -45,8 +45,8 @@ sed -e 's/^\(CONFIG[0-9A-Z_]*\)=.*$/grep -v "^# \1" |/' < $c > $T/u.sh
sed -e 's/^\(CONFIG[0-9A-Z_]*=\).*$/grep -v \1 |/' < $c >> $T/u.sh
grep '^grep' < $T/u.sh > $T/upd.sh
echo "cat - $c" >> $T/upd.sh
-make mrproper
-make $buildloc distclean > $resdir/Make.distclean 2>&1
+#make mrproper
+#make $buildloc distclean > $resdir/Make.distclean 2>&1
make $buildloc $TORTURE_DEFCONFIG > $resdir/Make.defconfig.out 2>&1
mv $builddir/.config $builddir/.config.sav
sh $T/upd.sh < $builddir/.config.sav > $builddir/.config
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TRIVIAL b/tools/testing/selftests/rcutorture/configs/rcu/TRIVIAL
new file mode 100644
index 000000000000..4d8eb5bfb6f6
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TRIVIAL
@@ -0,0 +1,14 @@
+CONFIG_SMP=y
+CONFIG_NR_CPUS=8
+CONFIG_PREEMPT_NONE=y
+CONFIG_PREEMPT_VOLUNTARY=n
+CONFIG_PREEMPT=n
+CONFIG_HZ_PERIODIC=n
+CONFIG_NO_HZ_IDLE=y
+CONFIG_NO_HZ_FULL=n
+CONFIG_HOTPLUG_CPU=n
+CONFIG_SUSPEND=n
+CONFIG_HIBERNATION=n
+CONFIG_DEBUG_LOCK_ALLOC=n
+CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
+CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TRIVIAL.boot b/tools/testing/selftests/rcutorture/configs/rcu/TRIVIAL.boot
new file mode 100644
index 000000000000..cab2dc4a708c
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TRIVIAL.boot
@@ -0,0 +1,2 @@
+rcutorture.torture_type=trivial
+rcutorture.onoff_interval=0
On Fri, May 10, 2019 at 02:08:19PM +0200, Peter Zijlstra wrote:
> On Thu, May 09, 2019 at 12:36:25PM -0700, Paul E. McKenney wrote:
> > I forward-ported the relevant patches from -rcu and placed them on -rcu
> > branch peterz.2019.05.09a, and this is what produced the output above.
> >
> > Any other debugging thoughts?
> >
> > Or, if you wish, you can reproduce by running the following:
> >
> > nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop" --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y"
> >
> > This gets me the following summary output:
> >
> > --- Thu May 9 12:08:31 PDT 2019 Test summary:
> > Results directory: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2019.05.09-12:08:31
> > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs TRIVIAL --bootargs trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop --kconfig CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y
> > TRIVIAL ------- 2177 GPs (18.1417/s) [trivial: g0 f0x0 ]
> > :CONFIG_HOTPLUG_CPU: improperly set
> > WARNING: BAD SEQ 2176:2176 last:2177 version 4
> > /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2019.05.09-12:08:31/TRIVIAL/console.log
> > WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2019.05.09-12:08:31/TRIVIAL/console.log
> > WARNING: Summary: Warnings: 1 Bugs: 1 Call Traces: 5 Stalls: 8
>
> So I could reproduce...
>
> I must first complain about your scripts: they do "make mrproper" on the
> source tree every time you run them, which is not appreciated. For one,
> that deletes my 'tags' file.
This is because it builds in a different directory, and "make O=/path"
complains if you don't have the source directory pristine.
But there really is no longer any reason to build in a different
directory, I suppose. This is a largish change, but working on it.
> Getting it to not rebuild the whole kernel every time wasn't easy
> either.
You trust "make" far more than I do! I am thinking of adding a
"--trust-make" argument that suppresses the "make clean". Maybe if
I grow to trust "make" in the fullness of time, I can remove the "make
clean" entirely. But given ccache, and given the duration of the typical
rcutorture run, and given that there are multiple rcutorture scenarios
each with a different .config, this hasn't been a priority. The build
step is already omitted for repeated runs.
> Aside from that it seems to 'work'.
>
> The below trace explains the issue. Some Paul person did it, see below.
> It's broken per construction :-)
*facepalm* Hence the very strange ->cpus_allowed mask. I really
should have figured that one out.
The fix is straightforward. I just added "rcutorture.shuffle_interval=0"
to the TRIVIAL.boot file, which stops rcutorture from shuffling its
kthreads around.
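For concreteness, with that addition TRIVIAL.boot ends up containing the
following three lines (the first two are from the patch above; ordering
is immaterial):

rcutorture.torture_type=trivial
rcutorture.onoff_interval=0
rcutorture.shuffle_interval=0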
Please accept my apologies for the hassle, and thank you for tracking
this down!!!
Thanx, Paul
> > The below trace explains the issue. Some Paul person did it, see below.
> > It's broken per construction :-)
>
> *facepalm* Hence the very strange ->cpus_allowed mask. I really
> should have figured that one out.
>
> The fix is straightforward. I just added "rcutorture.shuffle_interval=0"
> to the TRIVIAL.boot file, which stops rcutorture from shuffling its
> kthreads around.
I added the option to the file and I didn't reproduce the issue.
> Please accept my apologies for the hassle, and thank you for tracking
> this down!!!
Peter (echoing Paul): Thank you for pointing that shuffler out!
Andrea
On Sat, May 11, 2019 at 11:45:20PM +0200, Andrea Parri wrote:
> > > The below trace explains the issue. Some Paul person did it, see below.
> > > It's broken per construction :-)
> >
> > *facepalm* Hence the very strange ->cpus_allowed mask. I really
> > should have figured that one out.
> >
> > The fix is straightforward. I just added "rcutorture.shuffle_interval=0"
> > to the TRIVIAL.boot file, which stops rcutorture from shuffling its
> > kthreads around.
>
> I added the option to the file and I didn't reproduce the issue.
Thank you! May I add your Tested-by?
Thanx, Paul
> > Please accept my apologies for the hassle, and thank you for tracking
> > this down!!!
>
> Peter (echoing Paul): Thank you for pointing that shuffler out!
>
> Andrea
>
> > > The fix is straightforward. I just added "rcutorture.shuffle_interval=0"
> > > to the TRIVIAL.boot file, which stops rcutorture from shuffling its
> > > kthreads around.
> >
> > I added the option to the file and I didn't reproduce the issue.
>
> Thank you! May I add your Tested-by?
Please feel free to do so. But it may be worth squashing "the commits"
(and adjust the changelogs accordingly). And you might want to remove
some of those debug checks/prints?
Andrea
On Fri, May 10, 2019 at 04:07:42PM -0700, Paul E. McKenney wrote:
> > The below trace explains the issue.
> > It's broken per construction :-)
>
> *facepalm* Hence the very strange ->cpus_allowed mask. I really
> should have figured that one out.
I guess it's called a torture framework for a reason ;-)
On Sun, May 12, 2019 at 03:05:39AM +0200, Andrea Parri wrote:
> > > > The fix is straightforward. I just added "rcutorture.shuffle_interval=0"
> > > > to the TRIVIAL.boot file, which stops rcutorture from shuffling its
> > > > kthreads around.
> > >
> > > I added the option to the file and I didn't reproduce the issue.
> >
> > Thank you! May I add your Tested-by?
>
> Please feel free to do so. But it may be worth squashing "the commits"
> (and adjust the changelogs accordingly). And you might want to remove
> some of those debug checks/prints?
Revert/remove a number of the commits, but yes. ;-)
And remove the extra loop, but leave the single WARN_ON() complaining
about being on the wrong CPU.
Thanx, Paul
On Mon, May 13, 2019 at 10:10:52AM +0200, Peter Zijlstra wrote:
> On Fri, May 10, 2019 at 04:07:42PM -0700, Paul E. McKenney wrote:
> > > The below trace explains the issue.
> > > It's broken per construction :-)
> >
> > *facepalm* Hence the very strange ->cpus_allowed mask. I really
> > should have figured that one out.
>
> I guess it's called a torture framework for a reason ;-)
Fair enough. And I doubt that this is the first time that it has
tortured other people rather than torturing RCU. ;-)
Thanx, Paul
On Mon, May 13, 2019 at 11:37:14AM -0400, Joel Fernandes wrote:
> On Mon, May 13, 2019 at 05:20:43AM -0700, Paul E. McKenney wrote:
> > On Sun, May 12, 2019 at 03:05:39AM +0200, Andrea Parri wrote:
> > > > > > The fix is straightforward. I just added "rcutorture.shuffle_interval=0"
> > > > > > to the TRIVIAL.boot file, which stops rcutorture from shuffling its
> > > > > > kthreads around.
> > > > >
> > > > > I added the option to the file and I didn't reproduce the issue.
> > > >
> > > > Thank you! May I add your Tested-by?
> > >
> > > Please feel free to do so. But it may be worth squashing "the commits"
> > > (and adjust the changelogs accordingly). And you might want to remove
> > > some of those debug checks/prints?
> >
> > Revert/remove a number of the commits, but yes. ;-)
> >
> > And remove the extra loop, but leave the single WARN_ON() complaining
> > about being on the wrong CPU.
>
> The other "toy" implementation I noticed is based on reader/writer locking.
>
> Would you see value in having that as an additional rcu torture type?
Interesting question!
My kneejerk reaction is "no" because the fact that reader-writer locking
primitives pass locktorture implies that they have the needed semantics
to be a toy RCU implementation. (Things like NMI handlers prevent them
from operating correctly within the Linux kernel, and even things like
interrupt handlers would require disabling interrupts for Linux-kernel
use, but from a toy/textbook perspective, they qualify.)
We do have a large number of toy RCU implementations in perfbook, though,
and I believe reader-writer locking is one of them.
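For reference, here is a minimal sketch of what such a reader-writer-locking
toy might look like (the toy_* names are purely illustrative, not from
perfbook or any tree): readers simply take the lock for reading, and the
grace-period wait is a write-side acquire/release, which cannot return
until all pre-existing readers have left their critical sections.

static DEFINE_RWLOCK(toy_rcu_lock);

static void toy_rcu_read_lock(void)
{
	read_lock(&toy_rcu_lock);	/* Readers exclude only the updater. */
}

static void toy_rcu_read_unlock(void)
{
	read_unlock(&toy_rcu_lock);
}

static void toy_synchronize_rcu(void)
{
	/*
	 * Acquiring the lock for writing excludes all readers, so once
	 * write_lock() returns, every pre-existing read-side critical
	 * section has completed.
	 */
	write_lock(&toy_rcu_lock);
	write_unlock(&toy_rcu_lock);
}

The limitations are exactly the ones called out above: NMI handlers, and
interrupt handlers absent interrupt disabling.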
But the current "trivial" version would actually work in the Linux
kernel as it is, give or take more esoteric things like CPU hotplug
and respecting user-level uses of sched_setaffinity(). Which could be
"fixed", but at the expense of making it quite a bit less trivial.
(See early-2000s LKML traffic for some proposals along these lines.)
Thanx, Paul
On Mon, May 13, 2019 at 05:20:43AM -0700, Paul E. McKenney wrote:
> On Sun, May 12, 2019 at 03:05:39AM +0200, Andrea Parri wrote:
> > > > > The fix is straightforward. I just added "rcutorture.shuffle_interval=0"
> > > > > to the TRIVIAL.boot file, which stops rcutorture from shuffling its
> > > > > kthreads around.
> > > >
> > > > I added the option to the file and I didn't reproduce the issue.
> > >
> > > Thank you! May I add your Tested-by?
> >
> > Please feel free to do so. But it may be worth squashing "the commits"
> > (and adjust the changelogs accordingly). And you might want to remove
> > some of those debug checks/prints?
>
> Revert/remove a number of the commits, but yes. ;-)
>
> And remove the extra loop, but leave the single WARN_ON() complaining
> about being on the wrong CPU.
The other "toy" implementation I noticed is based on reader/writer locking.
Would you see value in having that as an additional rcu torture type?
thanks,
- Joel