Since tracepoints use RCU for protection, they must not be called on
offline cpus. Running with LOCKDEP enabled discovered few scenarios
where trace points were called on offline cpus. These patches fix
this by converting these trace events into conditonal trace events
where condition is cpu_online(smp_processor_id())
Previous posting of the patchset can be found here:
https://lkml.org/lkml/2015/4/28/512
This resend CCs people who added/edited the trace events in question.
Shreyas B. Prabhu (3):
tracing/mm: Don't trace kmem_cache_free on offline cpus
tracing/mm: Don't trace mm_page_free on offline cpus
tracing/mm: Don't trace mm_page_pcpu_drain on offline cpus
include/trace/events/kmem.h | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
--
1.9.3
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_kmem_cache_free can be called on an offline cpu in
this scenario caught by LOCKDEP:
===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc1+ #9 Not tainted
-------------------------------
include/trace/events/kmem.h:148 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
Call Trace:
[c000001fed2f78f0] [c0000000009dee8c] .dump_stack+0x98/0xd4 (unreliable)
[c000001fed2f7970] [c000000000128d88] .lockdep_rcu_suspicious+0x108/0x170
[c000001fed2f7a00] [c00000000026f924] .kmem_cache_free+0x344/0x4b0
[c000001fed2f7ab0] [c0000000000bd1cc] .__mmdrop+0x4c/0x160
[c000001fed2f7b40] [c0000000001068e0] .idle_task_exit+0xf0/0x100
[c000001fed2f7bc0] [c000000000066948] .pnv_smp_cpu_kill_self+0x58/0x2c0
[c000001fed2f7ca0] [c00000000003ce34] .cpu_die+0x34/0x50
[c000001fed2f7d10] [c0000000000176d0] .arch_cpu_idle_dead+0x20/0x40
[c000001fed2f7d80] [c00000000011f9a8] .cpu_startup_entry+0x708/0x7a0
[c000001fed2f7ec0] [c00000000003cb6c] .start_secondary+0x36c/0x3a0
[c000001fed2f7f90] [c000000000008b6c] start_secondary_prolog+0x10/0x14
Fix this by converting kmem_cache_free trace point into TRACE_EVENT_CONDITION
where condition is cpu_online(smp_processor_id())
Signed-off-by: Shreyas B. Prabhu <[email protected]>
Reported-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Preeti U Murthy <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
---
include/trace/events/kmem.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 81ea598..dd9e612 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -140,11 +140,13 @@ DEFINE_EVENT(kmem_free, kfree,
TP_ARGS(call_site, ptr)
);
-DEFINE_EVENT(kmem_free, kmem_cache_free,
+DEFINE_EVENT_CONDITION(kmem_free, kmem_cache_free,
TP_PROTO(unsigned long call_site, const void *ptr),
- TP_ARGS(call_site, ptr)
+ TP_ARGS(call_site, ptr),
+
+ TP_CONDITION(cpu_online(smp_processor_id()))
);
TRACE_EVENT(mm_page_free,
--
1.9.3
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_mm_page_free can be called on an offline cpu in
this scenario caught by LOCKDEP:
===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc1+ #9 Not tainted
-------------------------------
include/trace/events/kmem.h:170 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
Call Trace:
[c000001fed2f7790] [c0000000009dee8c] .dump_stack+0x98/0xd4 (unreliable)
[c000001fed2f7810] [c000000000128d88] .lockdep_rcu_suspicious+0x108/0x170
[c000001fed2f78a0] [c000000000203bc4] .free_pages_prepare+0x494/0x680
[c000001fed2f7980] [c000000000207fd0] .free_hot_cold_page+0x50/0x280
[c000001fed2f7a30] [c00000000004d000] .destroy_context+0x90/0xd0
[c000001fed2f7ab0] [c0000000000bd1d8] .__mmdrop+0x58/0x160
[c000001fed2f7b40] [c0000000001068e0] .idle_task_exit+0xf0/0x100
[c000001fed2f7bc0] [c000000000066948] .pnv_smp_cpu_kill_self+0x58/0x2c0
[c000001fed2f7ca0] [c00000000003ce34] .cpu_die+0x34/0x50
[c000001fed2f7d10] [c0000000000176d0] .arch_cpu_idle_dead+0x20/0x40
[c000001fed2f7d80] [c00000000011f9a8] .cpu_startup_entry+0x708/0x7a0
[c000001fed2f7ec0] [c00000000003cb6c] .start_secondary+0x36c/0x3a0
[c000001fed2f7f90] [c000000000008b6c] start_secondary_prolog+0x10/0x14
Fix this by converting mm_page_free trace point into TRACE_EVENT_CONDITION
where condition is cpu_online(smp_processor_id())
Signed-off-by: Shreyas B. Prabhu <[email protected]>
Reviewed-by: Preeti U Murthy <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
---
include/trace/events/kmem.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index dd9e612..4abda92 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -149,12 +149,14 @@ DEFINE_EVENT_CONDITION(kmem_free, kmem_cache_free,
TP_CONDITION(cpu_online(smp_processor_id()))
);
-TRACE_EVENT(mm_page_free,
+TRACE_EVENT_CONDITION(mm_page_free,
TP_PROTO(struct page *page, unsigned int order),
TP_ARGS(page, order),
+ TP_CONDITION(cpu_online(smp_processor_id())),
+
TP_STRUCT__entry(
__field( unsigned long, pfn )
__field( unsigned int, order )
--
1.9.3
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_mm_page_pcpu_drain can be called on an offline cpu
in this scenario caught by LOCKDEP:
===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc1+ #9 Not tainted
-------------------------------
include/trace/events/kmem.h:265 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by swapper/5/0:
#0: (&(&zone->lock)->rlock){..-...}, at: [<c0000000002073b0>] .free_pcppages_bulk+0x70/0x920
stack backtrace:
CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.1.0-rc1+ #9
Call Trace:
[c000001fed2e7720] [c0000000009dee8c] .dump_stack+0x98/0xd4 (unreliable)
[c000001fed2e77a0] [c000000000128d88] .lockdep_rcu_suspicious+0x108/0x170
[c000001fed2e7830] [c00000000020794c] .free_pcppages_bulk+0x60c/0x920
[c000001fed2e7980] [c000000000208188] .free_hot_cold_page+0x208/0x280
[c000001fed2e7a30] [c00000000004d000] .destroy_context+0x90/0xd0
[c000001fed2e7ab0] [c0000000000bd1d8] .__mmdrop+0x58/0x160
[c000001fed2e7b40] [c0000000001068e0] .idle_task_exit+0xf0/0x100
[c000001fed2e7bc0] [c000000000066948] .pnv_smp_cpu_kill_self+0x58/0x2c0
[c000001fed2e7ca0] [c00000000003ce34] .cpu_die+0x34/0x50
[c000001fed2e7d10] [c0000000000176d0] .arch_cpu_idle_dead+0x20/0x40
[c000001fed2e7d80] [c00000000011f9a8] .cpu_startup_entry+0x708/0x7a0
[c000001fed2e7ec0] [c00000000003cb6c] .start_secondary+0x36c/0x3a0
[c000001fed2e7f90] [c000000000008b6c] start_secondary_prolog+0x10/0x14
Fix this by converting mm_page_pcpu_drain trace point into TRACE_EVENT_CONDITION
where condition is cpu_online(smp_processor_id())
Signed-off-by: Shreyas B. Prabhu <[email protected]>
Reviewed-by: Preeti U Murthy <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
---
include/trace/events/kmem.h | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 4abda92..6cd975f 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -257,12 +257,26 @@ DEFINE_EVENT(mm_page, mm_page_alloc_zone_locked,
TP_ARGS(page, order, migratetype)
);
-DEFINE_EVENT_PRINT(mm_page, mm_page_pcpu_drain,
+TRACE_EVENT_CONDITION(mm_page_pcpu_drain,
TP_PROTO(struct page *page, unsigned int order, int migratetype),
TP_ARGS(page, order, migratetype),
+ TP_CONDITION(cpu_online(smp_processor_id())),
+
+ TP_STRUCT__entry(
+ __field( unsigned long, pfn )
+ __field( unsigned int, order )
+ __field( int, migratetype )
+ ),
+
+ TP_fast_assign(
+ __entry->pfn = page ? page_to_pfn(page) : -1UL;
+ __entry->order = order;
+ __entry->migratetype = migratetype;
+ ),
+
TP_printk("page=%p pfn=%lu order=%d migratetype=%d",
pfn_to_page(__entry->pfn), __entry->pfn,
__entry->order, __entry->migratetype)
--
1.9.3
Hi Steven,
On closer look, there is no particular maintainer who picks changes to
this file. Can you please pick these up?
On Friday 08 May 2015 10:50 PM, Shreyas B. Prabhu wrote:
> Since tracepoints use RCU for protection, they must not be called on
> offline cpus. Running with LOCKDEP enabled discovered few scenarios
> where trace points were called on offline cpus. These patches fix
> this by converting these trace events into conditonal trace events
> where condition is cpu_online(smp_processor_id())
>
> Previous posting of the patchset can be found here:
> https://lkml.org/lkml/2015/4/28/512
>
> This resend CCs people who added/edited the trace events in question.
>
> Shreyas B. Prabhu (3):
> tracing/mm: Don't trace kmem_cache_free on offline cpus
> tracing/mm: Don't trace mm_page_free on offline cpus
> tracing/mm: Don't trace mm_page_pcpu_drain on offline cpus
>
> include/trace/events/kmem.h | 26 ++++++++++++++++++++++----
> 1 file changed, 22 insertions(+), 4 deletions(-)
>
On Tue, 12 May 2015 12:59:26 +0530
Shreyas B Prabhu <[email protected]> wrote:
> Hi Steven,
> On closer look, there is no particular maintainer who picks changes to
> this file. Can you please pick these up?
Perhaps Andrew Morton?
No problem, I can take these too.
-- Steve
>
> On Friday 08 May 2015 10:50 PM, Shreyas B. Prabhu wrote:
> > Since tracepoints use RCU for protection, they must not be called on
> > offline cpus. Running with LOCKDEP enabled discovered few scenarios
> > where trace points were called on offline cpus. These patches fix
> > this by converting these trace events into conditonal trace events
> > where condition is cpu_online(smp_processor_id())
> >
> > Previous posting of the patchset can be found here:
> > https://lkml.org/lkml/2015/4/28/512
> >
> > This resend CCs people who added/edited the trace events in question.
> >
> > Shreyas B. Prabhu (3):
> > tracing/mm: Don't trace kmem_cache_free on offline cpus
> > tracing/mm: Don't trace mm_page_free on offline cpus
> > tracing/mm: Don't trace mm_page_pcpu_drain on offline cpus
> >
> > include/trace/events/kmem.h | 26 ++++++++++++++++++++++----
> > 1 file changed, 22 insertions(+), 4 deletions(-)
> >
On Tue, 12 May 2015 16:03:51 -0400 Steven Rostedt <[email protected]> wrote:
> On Tue, 12 May 2015 12:59:26 +0530
> Shreyas B Prabhu <[email protected]> wrote:
>
> > Hi Steven,
> > On closer look, there is no particular maintainer who picks changes to
> > this file. Can you please pick these up?
>
> Perhaps Andrew Morton?
>
> No problem, I can take these too.
>
I grabbed them, thanks.
> + TP_CONDITION(cpu_online(smp_processor_id())),
Are we sure these can't generate check_preemption_disabled() warnings?
Is there some reason why all these calls always occur with preemption
disabled?
On Tue, 12 May 2015 13:36:01 -0700
Andrew Morton <[email protected]> wrote:
> On Tue, 12 May 2015 16:03:51 -0400 Steven Rostedt <[email protected]> wrote:
>
> > On Tue, 12 May 2015 12:59:26 +0530
> > Shreyas B Prabhu <[email protected]> wrote:
> >
> > > Hi Steven,
> > > On closer look, there is no particular maintainer who picks changes to
> > > this file. Can you please pick these up?
> >
> > Perhaps Andrew Morton?
> >
> > No problem, I can take these too.
> >
>
> I grabbed them, thanks.
>
> > + TP_CONDITION(cpu_online(smp_processor_id())),
>
> Are we sure these can't generate check_preemption_disabled() warnings?
> Is there some reason why all these calls always occur with preemption
> disabled?
Good catch. I don't think the code does.
Now, I'm not sure if we should just add a raw_smp_processor_id(). The
idea is just to make sure that the CPU we are running on is online,
because it is possible to call theses trace points when the CPU is
going offline. If that happens, then there's a race with RCU.
Since a task can not be migrated to an offline CPU, we don't need to
worry about the cpu_online(smp_processor_id()) returning a false
positive. A false negative would just skip a tracepoint, but I'm not
sure that is possible either.
In any case, comments should also be added to why the condition is
there.
-- Steve
On Wednesday 13 May 2015 02:24 AM, Steven Rostedt wrote:
> On Tue, 12 May 2015 13:36:01 -0700
> Andrew Morton <[email protected]> wrote:
>
>> On Tue, 12 May 2015 16:03:51 -0400 Steven Rostedt <[email protected]> wrote:
>>
>>> On Tue, 12 May 2015 12:59:26 +0530
>>> Shreyas B Prabhu <[email protected]> wrote:
>>>
>>>> Hi Steven,
>>>> On closer look, there is no particular maintainer who picks changes to
>>>> this file. Can you please pick these up?
>>>
>>> Perhaps Andrew Morton?
>>>
>>> No problem, I can take these too.
>>>
>>
>> I grabbed them, thanks.
>>
Thanks Andrew.
>>> + TP_CONDITION(cpu_online(smp_processor_id())),
>>
>> Are we sure these can't generate check_preemption_disabled() warnings?
>> Is there some reason why all these calls always occur with preemption
>> disabled?
>
> Good catch. I don't think the code does.
>
> Now, I'm not sure if we should just add a raw_smp_processor_id(). The
> idea is just to make sure that the CPU we are running on is online,
> because it is possible to call theses trace points when the CPU is
> going offline. If that happens, then there's a race with RCU.
>
> Since a task can not be migrated to an offline CPU, we don't need to
> worry about the cpu_online(smp_processor_id()) returning a false
> positive. A false negative would just skip a tracepoint, but I'm not
> sure that is possible either.
>
> In any case, comments should also be added to why the condition is
> there.
>
I'll send a patch adding the comments.
Thanks,
Shreyas
On Wed, 13 May 2015 13:17:06 +0530 Shreyas B Prabhu <[email protected]> wrote:
> >>> + TP_CONDITION(cpu_online(smp_processor_id())),
> >>
> >> Are we sure these can't generate check_preemption_disabled() warnings?
> >> Is there some reason why all these calls always occur with preemption
> >> disabled?
> >
> > Good catch. I don't think the code does.
> >
> > Now, I'm not sure if we should just add a raw_smp_processor_id(). The
> > idea is just to make sure that the CPU we are running on is online,
> > because it is possible to call theses trace points when the CPU is
> > going offline. If that happens, then there's a race with RCU.
> >
> > Since a task can not be migrated to an offline CPU, we don't need to
> > worry about the cpu_online(smp_processor_id()) returning a false
> > positive. A false negative would just skip a tracepoint, but I'm not
> > sure that is possible either.
> >
> > In any case, comments should also be added to why the condition is
> > there.
> >
> I'll send a patch adding the comments.
Please also confirm that these tracepoints cannot be called
from preemptible contexts. Or switch to raw_smp_processor_id().
On Thu, 14 May 2015 13:08:23 -0700
Andrew Morton <[email protected]> wrote:
>
> Please also confirm that these tracepoints cannot be called
> from preemptible contexts. Or switch to raw_smp_processor_id().
I checked. They can be called from preemptible context, and suggested
that the next patch uses raw_smp_processor_id().
-- Steve