2015-11-03 10:06:49

by Michael Ellerman

Subject: Re: [PATCH] powerpc: tracing: don't trace hcalls on offline CPUs

On Thu, 2015-10-29 at 22:10 +0300, Denis Kirjanov wrote:

> ./drmgr -c cpu -a -r gives the following warning:
>
> [ 2327.035563] RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1
> [ 2327.035564] no locks held by swapper/12/0.
> [ 2327.035565] stack backtrace:
> [ 2327.035567] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G S 4.3.0-rc3-00060-g353169a #5
> [ 2327.035568] Call Trace:
> [ 2327.035573] [c0000001d62578e0] [c0000000008977fc] .dump_stack+0x98/0xd4 (unreliable)
> [ 2327.035577] [c0000001d6257960] [c000000000109bd8] .lockdep_rcu_suspicious+0x108/0x170
> [ 2327.035580] [c0000001d62579f0] [c00000000006a1d0] .__trace_hcall_exit+0x2b0/0x2c0
> [ 2327.035584] [c0000001d6257ab0] [c00000000006a2e8] plpar_hcall_norets_trace+0x70/0x8c
> [ 2327.035588] [c0000001d6257b20] [c000000000067a14] .icp_hv_set_cpu_priority+0x54/0xc0
> [ 2327.035592] [c0000001d6257ba0] [c000000000066c5c] .xics_teardown_cpu+0x5c/0xa0
> [ 2327.035595] [c0000001d6257c20] [c0000000000747ac] .pseries_mach_cpu_die+0x6c/0x320
> [ 2327.035598] [c0000001d6257cd0] [c0000000000439cc] .cpu_die+0x3c/0x60
> [ 2327.035602] [c0000001d6257d40] [c0000000000183d8] .arch_cpu_idle_dead+0x28/0x40
> [ 2327.035606] [c0000001d6257db0] [c0000000000ff1dc] .cpu_startup_entry+0x4fc/0x560
> [ 2327.035610] [c0000001d6257ed0] [c000000000043728] .start_secondary+0x328/0x360
> [ 2327.035614] [c0000001d6257f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
> [ 2327.035620] cpu 12 (hwid 12) Ready to die...
> [ 2327.144463] cpu 13 (hwid 13) Ready to die...
> [ 2327.294180] cpu 14 (hwid 14) Ready to die...
> [ 2327.403599] cpu 15 (hwid 15) Ready to die...
>
> Make the hypervisor tracepoints conditional by introducing
> TRACE_EVENT_FN_COND similar to TRACE_EVENT_FN

We've fixed other cases like this with RCU_NONIDLE(), but I assume that
doesn't work here because we're actually offline?

cheers


2015-11-03 12:58:51

by Denis Kirjanov

Subject: Re: [PATCH] powerpc: tracing: don't trace hcalls on offline CPUs

On 11/3/15, Michael Ellerman wrote:
> On Thu, 2015-10-29 at 22:10 +0300, Denis Kirjanov wrote:
>
>> ./drmgr -c cpu -a -r gives the following warning:
>>
>> [ 2327.035563] RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1
>> [ 2327.035564] no locks held by swapper/12/0.
>> [ 2327.035565] stack backtrace:
>> [ 2327.035567] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G S 4.3.0-rc3-00060-g353169a #5
>> [ 2327.035568] Call Trace:
>> [ 2327.035573] [c0000001d62578e0] [c0000000008977fc] .dump_stack+0x98/0xd4 (unreliable)
>> [ 2327.035577] [c0000001d6257960] [c000000000109bd8] .lockdep_rcu_suspicious+0x108/0x170
>> [ 2327.035580] [c0000001d62579f0] [c00000000006a1d0] .__trace_hcall_exit+0x2b0/0x2c0
>> [ 2327.035584] [c0000001d6257ab0] [c00000000006a2e8] plpar_hcall_norets_trace+0x70/0x8c
>> [ 2327.035588] [c0000001d6257b20] [c000000000067a14] .icp_hv_set_cpu_priority+0x54/0xc0
>> [ 2327.035592] [c0000001d6257ba0] [c000000000066c5c] .xics_teardown_cpu+0x5c/0xa0
>> [ 2327.035595] [c0000001d6257c20] [c0000000000747ac] .pseries_mach_cpu_die+0x6c/0x320
>> [ 2327.035598] [c0000001d6257cd0] [c0000000000439cc] .cpu_die+0x3c/0x60
>> [ 2327.035602] [c0000001d6257d40] [c0000000000183d8] .arch_cpu_idle_dead+0x28/0x40
>> [ 2327.035606] [c0000001d6257db0] [c0000000000ff1dc] .cpu_startup_entry+0x4fc/0x560
>> [ 2327.035610] [c0000001d6257ed0] [c000000000043728] .start_secondary+0x328/0x360
>> [ 2327.035614] [c0000001d6257f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
>> [ 2327.035620] cpu 12 (hwid 12) Ready to die...
>> [ 2327.144463] cpu 13 (hwid 13) Ready to die...
>> [ 2327.294180] cpu 14 (hwid 14) Ready to die...
>> [ 2327.403599] cpu 15 (hwid 15) Ready to die...
>>
>> Make the hypervisor tracepoints conditional by introducing
>> TRACE_EVENT_FN_COND similar to TRACE_EVENT_FN
>
> We've fixed other cases like this with RCU_NONIDLE(), but I assume that
> doesn't work here because we're actually offline?

Yes, in this case we're moving the complete core offline through dlpar...

>
> cheers
>
>

2015-11-03 13:25:40

by Paul E. McKenney

Subject: Re: [PATCH] powerpc: tracing: don't trace hcalls on offline CPUs

On Tue, Nov 03, 2015 at 09:06:42PM +1100, Michael Ellerman wrote:
> On Thu, 2015-10-29 at 22:10 +0300, Denis Kirjanov wrote:
>
> > ./drmgr -c cpu -a -r gives the following warning:
> >
> > [ 2327.035563] RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1
> > [ 2327.035564] no locks held by swapper/12/0.
> > [ 2327.035565] stack backtrace:
> > [ 2327.035567] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G S 4.3.0-rc3-00060-g353169a #5
> > [ 2327.035568] Call Trace:
> > [ 2327.035573] [c0000001d62578e0] [c0000000008977fc] .dump_stack+0x98/0xd4 (unreliable)
> > [ 2327.035577] [c0000001d6257960] [c000000000109bd8] .lockdep_rcu_suspicious+0x108/0x170
> > [ 2327.035580] [c0000001d62579f0] [c00000000006a1d0] .__trace_hcall_exit+0x2b0/0x2c0
> > [ 2327.035584] [c0000001d6257ab0] [c00000000006a2e8] plpar_hcall_norets_trace+0x70/0x8c
> > [ 2327.035588] [c0000001d6257b20] [c000000000067a14] .icp_hv_set_cpu_priority+0x54/0xc0
> > [ 2327.035592] [c0000001d6257ba0] [c000000000066c5c] .xics_teardown_cpu+0x5c/0xa0
> > [ 2327.035595] [c0000001d6257c20] [c0000000000747ac] .pseries_mach_cpu_die+0x6c/0x320
> > [ 2327.035598] [c0000001d6257cd0] [c0000000000439cc] .cpu_die+0x3c/0x60
> > [ 2327.035602] [c0000001d6257d40] [c0000000000183d8] .arch_cpu_idle_dead+0x28/0x40
> > [ 2327.035606] [c0000001d6257db0] [c0000000000ff1dc] .cpu_startup_entry+0x4fc/0x560
> > [ 2327.035610] [c0000001d6257ed0] [c000000000043728] .start_secondary+0x328/0x360
> > [ 2327.035614] [c0000001d6257f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
> > [ 2327.035620] cpu 12 (hwid 12) Ready to die...
> > [ 2327.144463] cpu 13 (hwid 13) Ready to die...
> > [ 2327.294180] cpu 14 (hwid 14) Ready to die...
> > [ 2327.403599] cpu 15 (hwid 15) Ready to die...
> >
> > Make the hypervisor tracepoints conditional by introducing
> > TRACE_EVENT_FN_COND similar to TRACE_EVENT_FN
>
> We've fixed other cases like this with RCU_NONIDLE(), but I assume that
> doesn't work here because we're actually offline?

Yes, RCU_NONIDLE() only works for idle CPUs. (For tracing, you can also
use the _rcuidle() event-tracing suffix.) The only way to safely have
RCU readers on offline CPUs is to bring them online. (SRCU being the
only exception.)

Thanx, Paul
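
The distinction drawn above (an idle CPU can be made safe for tracing, an offline CPU cannot) together with the conditional-tracepoint idea from the patch description can be modelled in user space. What follows is only a rough sketch under stated assumptions, not the patch itself: `cpu_state`, `cpu_states`, `record_hcall_exit()` and `trace_hcall_exit_cond()` are hypothetical names, `NR_CPUS` is an arbitrary size for the model, and the "temporarily mark the CPU as watched" step merely imitates the spirit of RCU_NONIDLE() rather than reproducing what the kernel does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 16	/* arbitrary size for this user-space model */

/* Hypothetical per-CPU state; the kernel tracks this very differently. */
enum cpu_state { CPU_ONLINE, CPU_IDLE, CPU_OFFLINE };

static enum cpu_state cpu_states[NR_CPUS];	/* zero-initialised: CPU_ONLINE */
static unsigned int events_recorded;

/* Stand-in for the probe body, which would rely on RCU read-side protection. */
static void record_hcall_exit(unsigned int cpu, unsigned long opcode, long retval)
{
	/* The probe must only ever run on a CPU that is being tracked. */
	assert(cpu_states[cpu] == CPU_ONLINE);
	events_recorded++;
	printf("cpu %u: hcall 0x%lx returned %ld\n", cpu, opcode, retval);
}

/*
 * Conditional tracepoint: the probe is called only when the condition
 * "this CPU is not offline" holds. Returns true if the event was recorded.
 */
static bool trace_hcall_exit_cond(unsigned int cpu, unsigned long opcode, long retval)
{
	enum cpu_state saved;

	assert(cpu < NR_CPUS);
	saved = cpu_states[cpu];

	/* Offline: there is no safe way to run the probe, so drop the event. */
	if (saved == CPU_OFFLINE)
		return false;

	/* Idle: mark the CPU as watched around the probe, then restore it. */
	cpu_states[cpu] = CPU_ONLINE;
	record_hcall_exit(cpu, opcode, retval);
	cpu_states[cpu] = saved;
	return true;
}
```

In this model, calling `trace_hcall_exit_cond(12, 0x4, 0)` after setting `cpu_states[12] = CPU_OFFLINE` records nothing (0x4 is an arbitrary opcode chosen for illustration), which mirrors the intent of skipping hcall tracing on a CPU that is on its way out.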

2015-11-20 12:22:50

by Denis Kirjanov

Subject: Re: [PATCH] powerpc: tracing: don't trace hcalls on offline CPUs

On 11/3/15, Denis Kirjanov wrote:
> On 11/3/15, Michael Ellerman wrote:
>> On Thu, 2015-10-29 at 22:10 +0300, Denis Kirjanov wrote:
>>
>>> ./drmgr -c cpu -a -r gives the following warning:
>>>
>>> [ 2327.035563] RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1
>>> [ 2327.035564] no locks held by swapper/12/0.
>>> [ 2327.035565] stack backtrace:
>>> [ 2327.035567] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G S 4.3.0-rc3-00060-g353169a #5
>>> [ 2327.035568] Call Trace:
>>> [ 2327.035573] [c0000001d62578e0] [c0000000008977fc] .dump_stack+0x98/0xd4 (unreliable)
>>> [ 2327.035577] [c0000001d6257960] [c000000000109bd8] .lockdep_rcu_suspicious+0x108/0x170
>>> [ 2327.035580] [c0000001d62579f0] [c00000000006a1d0] .__trace_hcall_exit+0x2b0/0x2c0
>>> [ 2327.035584] [c0000001d6257ab0] [c00000000006a2e8] plpar_hcall_norets_trace+0x70/0x8c
>>> [ 2327.035588] [c0000001d6257b20] [c000000000067a14] .icp_hv_set_cpu_priority+0x54/0xc0
>>> [ 2327.035592] [c0000001d6257ba0] [c000000000066c5c] .xics_teardown_cpu+0x5c/0xa0
>>> [ 2327.035595] [c0000001d6257c20] [c0000000000747ac] .pseries_mach_cpu_die+0x6c/0x320
>>> [ 2327.035598] [c0000001d6257cd0] [c0000000000439cc] .cpu_die+0x3c/0x60
>>> [ 2327.035602] [c0000001d6257d40] [c0000000000183d8] .arch_cpu_idle_dead+0x28/0x40
>>> [ 2327.035606] [c0000001d6257db0] [c0000000000ff1dc] .cpu_startup_entry+0x4fc/0x560
>>> [ 2327.035610] [c0000001d6257ed0] [c000000000043728] .start_secondary+0x328/0x360
>>> [ 2327.035614] [c0000001d6257f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
>>> [ 2327.035620] cpu 12 (hwid 12) Ready to die...
>>> [ 2327.144463] cpu 13 (hwid 13) Ready to die...
>>> [ 2327.294180] cpu 14 (hwid 14) Ready to die...
>>> [ 2327.403599] cpu 15 (hwid 15) Ready to die...
>>>
>>> Make the hypervisor tracepoints conditional by introducing
>>> TRACE_EVENT_FN_COND similar to TRACE_EVENT_FN
>>
>> We've fixed other cases like this with RCU_NONIDLE(), but I assume that
>> doesn't work here because we're actually offline?
>
> Yes, in this case we're moving the complete core offline through dlpar...
>

Hi Michael,
Could you pick this patch?

Thanks!

>>
>> cheers
>>
>>
>

2015-11-22 23:36:04

by Michael Ellerman

Subject: Re: [PATCH] powerpc: tracing: don't trace hcalls on offline CPUs

On Fri, 2015-11-20 at 15:22 +0300, Denis Kirjanov wrote:
> On 11/3/15, Denis Kirjanov wrote:
> > On 11/3/15, Michael Ellerman wrote:
> > > On Thu, 2015-10-29 at 22:10 +0300, Denis Kirjanov wrote:
> > > > ./drmgr -c cpu -a -r gives the following warning:
> > > >
> > > > [ 2327.035563] RCU used illegally from offline CPU!
> > > >
> > > > Make the hypervisor tracepoints conditional by introducing
> > > > TRACE_EVENT_FN_COND similar to TRACE_EVENT_FN
> > >
> > > We've fixed other cases like this with RCU_NONIDLE(), but I assume that
> > > doesn't work here because we're actually offline?
> >
> > Yes, in this case we're moving the complete core offline through dlpar...
> >
>
> Hi Michael,
> Could you pick this patch?

It's mostly a tracing patch, so I'd need an ACK from Steve at least.

It would probably be best if you split it into a "tracing: .. " patch which
adds the new macros and then a powerpc patch which uses them.

cheers

2015-11-23 00:52:55

by Steven Rostedt

Subject: Re: [PATCH] powerpc: tracing: don't trace hcalls on offline CPUs

On Mon, 23 Nov 2015 10:35:59 +1100
Michael Ellerman wrote:

> It's mostly a tracing patch, so I'd need an ACK from Steve at least.
>
> It would probably be best if you split it into a "tracing: .. " patch which
> adds the new macros and then a powerpc patch which uses them.

Yes please do that. I would like to throw the tracing part through my
test suite.

-- Steve

2015-11-23 09:31:10

by Denis Kirjanov

Subject: Re: [PATCH] powerpc: tracing: don't trace hcalls on offline CPUs

On 11/23/15, Michael Ellerman wrote:
> On Fri, 2015-11-20 at 15:22 +0300, Denis Kirjanov wrote:
>> On 11/3/15, Denis Kirjanov wrote:
>> > On 11/3/15, Michael Ellerman wrote:
>> > > On Thu, 2015-10-29 at 22:10 +0300, Denis Kirjanov wrote:
>> > > > ./drmgr -c cpu -a -r gives the following warning:
>> > > >
>> > > > [ 2327.035563] RCU used illegally from offline CPU!
>> > > >
>> > > > Make the hypervisor tracepoints conditional by introducing
>> > > > TRACE_EVENT_FN_COND similar to TRACE_EVENT_FN
>> > >
>> > > We've fixed other cases like this with RCU_NONIDLE(), but I assume that
>> > > doesn't work here because we're actually offline?
>> >
>> > Yes, in this case we're moving the complete core offline through dlpar...
>> >
>>
>> Hi Michael,
>> Could you pick this patch?
>
> It's mostly a tracing patch, so I'd need an ACK from Steve at least.
>
> It would probably be best if you split it into a "tracing: .. " patch which
> adds the new macros and then a powerpc patch which uses them.

Ok, sounds reasonable.

Thanks!
>
> cheers
>
>