2013-05-30 19:59:51

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

[ Peter and Frederic, can you give me ACKs on this? Thanks ]

Dave Jones hit the following bug report:

===============================
[ INFO: suspicious RCU usage. ]
3.10.0-rc2+ #1 Not tainted
-------------------------------
include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
other info that might help us debug this:
RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
2 locks held by cc1/63645:
#0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
#1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0

CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
Call Trace:
[<ffffffff816ae383>] dump_stack+0x19/0x1b
[<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
[<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
[<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
[<ffffffff8108dffc>] update_curr+0xec/0x240
[<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
[<ffffffff816b3a71>] __schedule+0x161/0x9b0
[<ffffffff816b4721>] preempt_schedule+0x51/0x80
[<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
[<ffffffff816b6824>] ? retint_careful+0x12/0x2e
[<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
[<ffffffff816be280>] ftrace_call+0x5/0x2f
[<ffffffff816b681d>] ? retint_careful+0xb/0x2e
[<ffffffff816b4805>] ? schedule_user+0x5/0x70
[<ffffffff816b4805>] ? schedule_user+0x5/0x70
[<ffffffff816b6824>] ? retint_careful+0x12/0x2e
------------[ cut here ]------------

What happened was that the function tracer traced the schedule_user() code
that tells RCU that the system is coming back from userspace, and to
add the CPU back to the RCU monitoring.

Because the function tracer does a preempt_disable/enable_notrace() calls
the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
then preempt_schedule() is called. But this is called before the user_exit()
function can inform the kernel that the CPU is no longer in user mode and
needs to be accounted for by RCU.

The fix is to create a new preempt_schedule_context() that checks if
the kernel is still in user mode and if so to switch it to kernel mode
before calling schedule. It also switches back to user mode coming back
from schedule in need be.

The only user of this currently is the preempt_enable_notrace(), which is
only used by the tracing subsystem.

Link: http://lkml.kernel.org/r/[email protected]

Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/preempt.h | 18 +++++++++++++++++-
kernel/context_tracking.c | 40 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 57 insertions(+), 1 deletions(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 87a03c7..f5d4723 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -33,9 +33,25 @@ do { \
preempt_schedule(); \
} while (0)

+#ifdef CONFIG_CONTEXT_TRACKING
+
+void preempt_schedule_context(void);
+
+#define preempt_check_resched_context() \
+do { \
+ if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+ preempt_schedule_context(); \
+} while (0)
+#else
+
+#define preempt_check_resched_context() preempt_check_resched()
+
+#endif /* CONFIG_CONTEXT_TRACKING */
+
#else /* !CONFIG_PREEMPT */

#define preempt_check_resched() do { } while (0)
+#define preempt_check_resched_context() do { } while (0)

#endif /* CONFIG_PREEMPT */

@@ -88,7 +104,7 @@ do { \
do { \
preempt_enable_no_resched_notrace(); \
barrier(); \
- preempt_check_resched(); \
+ preempt_check_resched_context(); \
} while (0)

#else /* !CONFIG_PREEMPT_COUNT */
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 65349f0..ac3a312 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -71,6 +71,46 @@ void user_enter(void)
local_irq_restore(flags);
}

+/**
+ * preempt_schedule_context - preempt_schedule called by tracing
+ *
+ * The tracing infrastructure uses preempt_enable_notrace to prevent
+ * recursion and tracing preempt enabling caused by the tracing
+ * infrastructure itself. But as tracing can happen in areas coming
+ * from userspace or just about to enter userspace, a preempt enable
+ * can occur before user_exit() is called. This will cause the scheduler
+ * to be called when the system is still in usermode.
+ *
+ * To prevent this, the preempt_enable_notrace will use this function
+ * instead of preempt_schedule() to exit user context if needed before
+ * calling the scheduler.
+ */
+void __sched notrace preempt_schedule_context(void)
+{
+ struct thread_info *ti = current_thread_info();
+ enum ctx_state prev_ctx;
+
+ if (likely(ti->preempt_count || irqs_disabled()))
+ return;
+
+ /*
+ * Need to disable preemption in case user_exit() is traced
+ * and the tracer calls preempt_enable_notrace() causing
+ * an infinite recursion.
+ */
+ preempt_disable_notrace();
+ prev_ctx = this_cpu_read(context_tracking.state);
+ user_exit();
+ preempt_enable_no_resched_notrace();
+
+ preempt_schedule();
+
+ preempt_disable_notrace();
+ if (prev_ctx == IN_USER)
+ user_enter();
+ preempt_enable_notrace();
+}
+EXPORT_SYMBOL(preempt_schedule_context);

/**
* user_exit - Inform the context tracking that the CPU is
--
1.7.3.4



2013-05-31 13:43:32

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Thu, May 30, 2013 at 03:59:41PM -0400, Steven Rostedt wrote:
> [ Peter and Frederic, can you give me ACKs on this? Thanks ]
>
> Dave Jones hit the following bug report:
>
> ===============================
> [ INFO: suspicious RCU usage. ]
> 3.10.0-rc2+ #1 Not tainted
> -------------------------------
> include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
> other info that might help us debug this:
> RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
> RCU used illegally from extended quiescent state!
> 2 locks held by cc1/63645:
> #0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
> #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0
>
> CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
> Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
> 0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
> ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
> 0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
> Call Trace:
> [<ffffffff816ae383>] dump_stack+0x19/0x1b
> [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
> [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
> [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
> [<ffffffff8108dffc>] update_curr+0xec/0x240
> [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
> [<ffffffff816b3a71>] __schedule+0x161/0x9b0
> [<ffffffff816b4721>] preempt_schedule+0x51/0x80
> [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
> [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
> [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
> [<ffffffff816be280>] ftrace_call+0x5/0x2f
> [<ffffffff816b681d>] ? retint_careful+0xb/0x2e
> [<ffffffff816b4805>] ? schedule_user+0x5/0x70
> [<ffffffff816b4805>] ? schedule_user+0x5/0x70
> [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
> ------------[ cut here ]------------
>
> What happened was that the function tracer traced the schedule_user() code
> that tells RCU that the system is coming back from userspace, and to
> add the CPU back to the RCU monitoring.
>
> Because the function tracer does a preempt_disable/enable_notrace() calls
> the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
> then preempt_schedule() is called. But this is called before the user_exit()
> function can inform the kernel that the CPU is no longer in user mode and
> needs to be accounted for by RCU.
>
> The fix is to create a new preempt_schedule_context() that checks if
> the kernel is still in user mode and if so to switch it to kernel mode
> before calling schedule. It also switches back to user mode coming back
> from schedule in need be.
>
> The only user of this currently is the preempt_enable_notrace(), which is
> only used by the tracing subsystem.
>
> Link: http://lkml.kernel.org/r/[email protected]
>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> include/linux/preempt.h | 18 +++++++++++++++++-
> kernel/context_tracking.c | 40 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 57 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index 87a03c7..f5d4723 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -33,9 +33,25 @@ do { \
> preempt_schedule(); \
> } while (0)
>
> +#ifdef CONFIG_CONTEXT_TRACKING
> +
> +void preempt_schedule_context(void);
> +
> +#define preempt_check_resched_context() \
> +do { \
> + if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
> + preempt_schedule_context(); \
> +} while (0)
> +#else
> +
> +#define preempt_check_resched_context() preempt_check_resched()
> +
> +#endif /* CONFIG_CONTEXT_TRACKING */
> +
> #else /* !CONFIG_PREEMPT */
>
> #define preempt_check_resched() do { } while (0)
> +#define preempt_check_resched_context() do { } while (0)
>
> #endif /* CONFIG_PREEMPT */
>
> @@ -88,7 +104,7 @@ do { \
> do { \
> preempt_enable_no_resched_notrace(); \
> barrier(); \
> - preempt_check_resched(); \
> + preempt_check_resched_context(); \
> } while (0)
>
> #else /* !CONFIG_PREEMPT_COUNT */
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index 65349f0..ac3a312 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -71,6 +71,46 @@ void user_enter(void)
> local_irq_restore(flags);
> }
>
> +/**
> + * preempt_schedule_context - preempt_schedule called by tracing
> + *
> + * The tracing infrastructure uses preempt_enable_notrace to prevent
> + * recursion and tracing preempt enabling caused by the tracing
> + * infrastructure itself. But as tracing can happen in areas coming
> + * from userspace or just about to enter userspace, a preempt enable
> + * can occur before user_exit() is called. This will cause the scheduler
> + * to be called when the system is still in usermode.
> + *
> + * To prevent this, the preempt_enable_notrace will use this function
> + * instead of preempt_schedule() to exit user context if needed before
> + * calling the scheduler.
> + */
> +void __sched notrace preempt_schedule_context(void)
> +{
> + struct thread_info *ti = current_thread_info();
> + enum ctx_state prev_ctx;
> +
> + if (likely(ti->preempt_count || irqs_disabled()))
> + return;
> +
> + /*
> + * Need to disable preemption in case user_exit() is traced
> + * and the tracer calls preempt_enable_notrace() causing
> + * an infinite recursion.
> + */
> + preempt_disable_notrace();
> + prev_ctx = this_cpu_read(context_tracking.state);
> + user_exit();

You can reuse exception_enter()

> + preempt_enable_no_resched_notrace();
> +
> + preempt_schedule();
> +
> + preempt_disable_notrace();
> + if (prev_ctx == IN_USER)
> + user_enter();

And then exception_exit() here.

I guess this replaces your fix with schedule_preempt_user(). I liked
it because it seems that:

if (need_resched()) {
user_exit();
local_irq_enable();
schedule();
local_irq_enable();
user_enter();
}

is a common pattern of arch user resume preemption that we can consolidate.

But your new patch probably makes it more widely safe for the function tracer
for any function that can be called and traced in IN_USER mode. Not only user preemption.
Think about do_notify_resume() for example if it is called after syscall_trace_leave().

Independantly, schedule_preempt_user() is still interesting for consolidation.

Thanks.


> + preempt_enable_notrace();
> +}
> +EXPORT_SYMBOL(preempt_schedule_context);
>
> /**
> * user_exit - Inform the context tracking that the CPU is
> --
> 1.7.3.4
>
>
>

2013-05-31 15:18:56

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Fri, 2013-05-31 at 15:43 +0200, Frederic Weisbecker wrote:

> > +void __sched notrace preempt_schedule_context(void)
> > +{
> > + struct thread_info *ti = current_thread_info();
> > + enum ctx_state prev_ctx;
> > +
> > + if (likely(ti->preempt_count || irqs_disabled()))
> > + return;
> > +
> > + /*
> > + * Need to disable preemption in case user_exit() is traced
> > + * and the tracer calls preempt_enable_notrace() causing
> > + * an infinite recursion.
> > + */
> > + preempt_disable_notrace();
> > + prev_ctx = this_cpu_read(context_tracking.state);
> > + user_exit();
>
> You can reuse exception_enter()

I originally did use that, but then noticed that everything else in
context_tracking.c used context_tracking.state directly. I have no
problems doing it this way again.

>
> > + preempt_enable_no_resched_notrace();
> > +
> > + preempt_schedule();
> > +
> > + preempt_disable_notrace();
> > + if (prev_ctx == IN_USER)
> > + user_enter();
>
> And then exception_exit() here.
>
> I guess this replaces your fix with schedule_preempt_user(). I liked
> it because it seems that:
>
> if (need_resched()) {
> user_exit();
> local_irq_enable();
> schedule();
> local_irq_enable();
> user_enter();
> }
>
> is a common pattern of arch user resume preemption that we can consolidate.
>
> But your new patch probably makes it more widely safe for the function tracer
> for any function that can be called and traced in IN_USER mode. Not only user preemption.
> Think about do_notify_resume() for example if it is called after syscall_trace_leave().
>
> Independantly, schedule_preempt_user() is still interesting for consolidation.

And I think that patch is still valid from just a clean up point of
view. It just didn't cover all the cases needed for tracing.

I'll rewrite the patch and send it out for another review.

Thanks!

-- Steve

2013-05-31 15:56:55

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Fri, May 31, 2013 at 11:18:48AM -0400, Steven Rostedt wrote:
> On Fri, 2013-05-31 at 15:43 +0200, Frederic Weisbecker wrote:
>
> > > +void __sched notrace preempt_schedule_context(void)
> > > +{
> > > + struct thread_info *ti = current_thread_info();
> > > + enum ctx_state prev_ctx;
> > > +
> > > + if (likely(ti->preempt_count || irqs_disabled()))
> > > + return;
> > > +
> > > + /*
> > > + * Need to disable preemption in case user_exit() is traced
> > > + * and the tracer calls preempt_enable_notrace() causing
> > > + * an infinite recursion.
> > > + */
> > > + preempt_disable_notrace();
> > > + prev_ctx = this_cpu_read(context_tracking.state);
> > > + user_exit();
> >
> > You can reuse exception_enter()
>
> I originally did use that, but then noticed that everything else in
> context_tracking.c used context_tracking.state directly. I have no
> problems doing it this way again.

It's more about the fact that exception_*() APIs already implement
part of what you're doing. And yeah as a bonus it's also better to keep
context_tracking internals in context_tracking.c

>
> >
> > > + preempt_enable_no_resched_notrace();
> > > +
> > > + preempt_schedule();
> > > +
> > > + preempt_disable_notrace();
> > > + if (prev_ctx == IN_USER)
> > > + user_enter();
> >
> > And then exception_exit() here.
> >
> > I guess this replaces your fix with schedule_preempt_user(). I liked
> > it because it seems that:
> >
> > if (need_resched()) {
> > user_exit();
> > local_irq_enable();
> > schedule();
> > local_irq_enable();
> > user_enter();
> > }
> >
> > is a common pattern of arch user resume preemption that we can consolidate.
> >
> > But your new patch probably makes it more widely safe for the function tracer
> > for any function that can be called and traced in IN_USER mode. Not only user preemption.
> > Think about do_notify_resume() for example if it is called after syscall_trace_leave().
> >
> > Independantly, schedule_preempt_user() is still interesting for consolidation.
>
> And I think that patch is still valid from just a clean up point of
> view. It just didn't cover all the cases needed for tracing.

Right.

Thanks.

2013-05-31 16:01:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Fri, May 31, 2013 at 11:18:48AM -0400, Steven Rostedt wrote:
> I'll rewrite the patch and send it out for another review.

This is the only part I read in detail and marked the thread read!

Awaiting v2 :-)

2013-05-31 16:11:58

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Fri, 2013-05-31 at 18:01 +0200, Peter Zijlstra wrote:
> On Fri, May 31, 2013 at 11:18:48AM -0400, Steven Rostedt wrote:
> > I'll rewrite the patch and send it out for another review.
>
> This is the only part I read in detail and marked the thread read!
>
> Awaiting v2 :-)

Yeah, it would be a quick send, but I'm running a test that seems to be
hitting every bug currently in the kernel :-p

-- Steve

2013-06-01 01:30:29

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH v2][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

Dave Jones hit the following bug report:

===============================
[ INFO: suspicious RCU usage. ]
3.10.0-rc2+ #1 Not tainted
-------------------------------
include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
other info that might help us debug this:
RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
2 locks held by cc1/63645:
#0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
#1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0

CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
Call Trace:
[<ffffffff816ae383>] dump_stack+0x19/0x1b
[<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
[<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
[<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
[<ffffffff8108dffc>] update_curr+0xec/0x240
[<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
[<ffffffff816b3a71>] __schedule+0x161/0x9b0
[<ffffffff816b4721>] preempt_schedule+0x51/0x80
[<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
[<ffffffff816b6824>] ? retint_careful+0x12/0x2e
[<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
[<ffffffff816be280>] ftrace_call+0x5/0x2f
[<ffffffff816b681d>] ? retint_careful+0xb/0x2e
[<ffffffff816b4805>] ? schedule_user+0x5/0x70
[<ffffffff816b4805>] ? schedule_user+0x5/0x70
[<ffffffff816b6824>] ? retint_careful+0x12/0x2e
------------[ cut here ]------------

What happened was that the function tracer traced the schedule_user() code
that tells RCU that the system is coming back from userspace, and to
add the CPU back to the RCU monitoring.

Because the function tracer does a preempt_disable/enable_notrace() calls
the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
then preempt_schedule() is called. But this is called before the user_exit()
function can inform the kernel that the CPU is no longer in user mode and
needs to be accounted for by RCU.

The fix is to create a new preempt_schedule_context() that checks if
the kernel is still in user mode and if so to switch it to kernel mode
before calling schedule. It also switches back to user mode coming back
from schedule in need be.

The only user of this currently is the preempt_enable_notrace(), which is
only used by the tracing subsystem.

Link: http://lkml.kernel.org/r/[email protected]

Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/preempt.h | 18 +++++++++++++++++-
kernel/context_tracking.c | 38 ++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+), 1 deletions(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 87a03c7..f5d4723 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -33,9 +33,25 @@ do { \
preempt_schedule(); \
} while (0)

+#ifdef CONFIG_CONTEXT_TRACKING
+
+void preempt_schedule_context(void);
+
+#define preempt_check_resched_context() \
+do { \
+ if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+ preempt_schedule_context(); \
+} while (0)
+#else
+
+#define preempt_check_resched_context() preempt_check_resched()
+
+#endif /* CONFIG_CONTEXT_TRACKING */
+
#else /* !CONFIG_PREEMPT */

#define preempt_check_resched() do { } while (0)
+#define preempt_check_resched_context() do { } while (0)

#endif /* CONFIG_PREEMPT */

@@ -88,7 +104,7 @@ do { \
do { \
preempt_enable_no_resched_notrace(); \
barrier(); \
- preempt_check_resched(); \
+ preempt_check_resched_context(); \
} while (0)

#else /* !CONFIG_PREEMPT_COUNT */
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 65349f0..15c9f2e 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -71,6 +71,44 @@ void user_enter(void)
local_irq_restore(flags);
}

+/**
+ * preempt_schedule_context - preempt_schedule called by tracing
+ *
+ * The tracing infrastructure uses preempt_enable_notrace to prevent
+ * recursion and tracing preempt enabling caused by the tracing
+ * infrastructure itself. But as tracing can happen in areas coming
+ * from userspace or just about to enter userspace, a preempt enable
+ * can occur before user_exit() is called. This will cause the scheduler
+ * to be called when the system is still in usermode.
+ *
+ * To prevent this, the preempt_enable_notrace will use this function
+ * instead of preempt_schedule() to exit user context if needed before
+ * calling the scheduler.
+ */
+void __sched notrace preempt_schedule_context(void)
+{
+ struct thread_info *ti = current_thread_info();
+ enum ctx_state prev_ctx;
+
+ if (likely(ti->preempt_count || irqs_disabled()))
+ return;
+
+ /*
+ * Need to disable preemption in case user_exit() is traced
+ * and the tracer calls preempt_enable_notrace() causing
+ * an infinite recursion.
+ */
+ preempt_disable_notrace();
+ prev_ctx = exception_enter();
+ preempt_enable_no_resched_notrace();
+
+ preempt_schedule();
+
+ preempt_disable_notrace();
+ exception_exit(prev_ctx);
+ preempt_enable_notrace();
+}
+EXPORT_SYMBOL(preempt_schedule_context);

/**
* user_exit - Inform the context tracking that the CPU is
--
1.7.3.4


2013-06-04 12:09:58

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v2][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Fri, May 31, 2013 at 09:30:18PM -0400, Steven Rostedt wrote:
> Dave Jones hit the following bug report:
>
> ===============================
> [ INFO: suspicious RCU usage. ]
> 3.10.0-rc2+ #1 Not tainted
> -------------------------------
> include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
> other info that might help us debug this:
> RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
> RCU used illegally from extended quiescent state!
> 2 locks held by cc1/63645:
> #0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
> #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0
>
> CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
> Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
> 0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
> ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
> 0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
> Call Trace:
> [<ffffffff816ae383>] dump_stack+0x19/0x1b
> [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
> [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
> [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
> [<ffffffff8108dffc>] update_curr+0xec/0x240
> [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
> [<ffffffff816b3a71>] __schedule+0x161/0x9b0
> [<ffffffff816b4721>] preempt_schedule+0x51/0x80
> [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
> [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
> [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
> [<ffffffff816be280>] ftrace_call+0x5/0x2f
> [<ffffffff816b681d>] ? retint_careful+0xb/0x2e
> [<ffffffff816b4805>] ? schedule_user+0x5/0x70
> [<ffffffff816b4805>] ? schedule_user+0x5/0x70
> [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
> ------------[ cut here ]------------
>
> What happened was that the function tracer traced the schedule_user() code
> that tells RCU that the system is coming back from userspace, and to
> add the CPU back to the RCU monitoring.
>
> Because the function tracer does a preempt_disable/enable_notrace() calls
> the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
> then preempt_schedule() is called. But this is called before the user_exit()
> function can inform the kernel that the CPU is no longer in user mode and
> needs to be accounted for by RCU.
>
> The fix is to create a new preempt_schedule_context() that checks if
> the kernel is still in user mode and if so to switch it to kernel mode
> before calling schedule. It also switches back to user mode coming back
> from schedule in need be.
>
> The only user of this currently is the preempt_enable_notrace(), which is
> only used by the tracing subsystem.
>
> Link: http://lkml.kernel.org/r/[email protected]
>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> include/linux/preempt.h | 18 +++++++++++++++++-
> kernel/context_tracking.c | 38 ++++++++++++++++++++++++++++++++++++++
> 2 files changed, 55 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index 87a03c7..f5d4723 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -33,9 +33,25 @@ do { \
> preempt_schedule(); \
> } while (0)
>
> +#ifdef CONFIG_CONTEXT_TRACKING
> +
> +void preempt_schedule_context(void);
> +
> +#define preempt_check_resched_context() \
> +do { \
> + if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
> + preempt_schedule_context(); \
> +} while (0)
> +#else
> +
> +#define preempt_check_resched_context() preempt_check_resched()
> +
> +#endif /* CONFIG_CONTEXT_TRACKING */
> +
> #else /* !CONFIG_PREEMPT */
>
> #define preempt_check_resched() do { } while (0)
> +#define preempt_check_resched_context() do { } while (0)
>
> #endif /* CONFIG_PREEMPT */
>
> @@ -88,7 +104,7 @@ do { \
> do { \
> preempt_enable_no_resched_notrace(); \
> barrier(); \
> - preempt_check_resched(); \
> + preempt_check_resched_context(); \
> } while (0)
>
> #else /* !CONFIG_PREEMPT_COUNT */
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index 65349f0..15c9f2e 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -71,6 +71,44 @@ void user_enter(void)
> local_irq_restore(flags);
> }
>
> +/**
> + * preempt_schedule_context - preempt_schedule called by tracing
> + *
> + * The tracing infrastructure uses preempt_enable_notrace to prevent
> + * recursion and tracing preempt enabling caused by the tracing
> + * infrastructure itself. But as tracing can happen in areas coming
> + * from userspace or just about to enter userspace, a preempt enable
> + * can occur before user_exit() is called. This will cause the scheduler
> + * to be called when the system is still in usermode.
> + *
> + * To prevent this, the preempt_enable_notrace will use this function
> + * instead of preempt_schedule() to exit user context if needed before
> + * calling the scheduler.
> + */
> +void __sched notrace preempt_schedule_context(void)
> +{
> + struct thread_info *ti = current_thread_info();
> + enum ctx_state prev_ctx;
> +
> + if (likely(ti->preempt_count || irqs_disabled()))
> + return;

or:
if (!preemptible())
return;

> +
> + /*
> + * Need to disable preemption in case user_exit() is traced
> + * and the tracer calls preempt_enable_notrace() causing
> + * an infinite recursion.
> + */
> + preempt_disable_notrace();
> + prev_ctx = exception_enter();
> + preempt_enable_no_resched_notrace();
> +
> + preempt_schedule();
> +
> + preempt_disable_notrace();
> + exception_exit(prev_ctx);
> + preempt_enable_notrace();
> +}
> +EXPORT_SYMBOL(preempt_schedule_context);

That's quite a tricky change but I can't think of anything better.

Acked-by: Frederic Weisbecker <[email protected]>


> /**
> * user_exit - Inform the context tracking that the CPU is
> --
> 1.7.3.4
>
>
>

2013-06-04 12:16:33

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Tue, 2013-06-04 at 14:09 +0200, Frederic Weisbecker wrote:
>
> > +/**
> > + * preempt_schedule_context - preempt_schedule called by tracing
> > + *
> > + * The tracing infrastructure uses preempt_enable_notrace to prevent
> > + * recursion and tracing preempt enabling caused by the tracing
> > + * infrastructure itself. But as tracing can happen in areas coming
> > + * from userspace or just about to enter userspace, a preempt enable
> > + * can occur before user_exit() is called. This will cause the scheduler
> > + * to be called when the system is still in usermode.
> > + *
> > + * To prevent this, the preempt_enable_notrace will use this function
> > + * instead of preempt_schedule() to exit user context if needed before
> > + * calling the scheduler.
> > + */
> > +void __sched notrace preempt_schedule_context(void)
> > +{
> > + struct thread_info *ti = current_thread_info();
> > + enum ctx_state prev_ctx;
> > +
> > + if (likely(ti->preempt_count || irqs_disabled()))
> > + return;
>
> or:
> if (!preemptible())
> return;

Sure, but this is a cut and paste from preempt_schedule(). If I make
that change here, I would want to make that there too.

Peter, should I use !preemptible() or keep it like preempt_schedule()
and make a clean up change for 3.11?

>
> > +
> > + /*
> > + * Need to disable preemption in case user_exit() is traced
> > + * and the tracer calls preempt_enable_notrace() causing
> > + * an infinite recursion.
> > + */
> > + preempt_disable_notrace();
> > + prev_ctx = exception_enter();
> > + preempt_enable_no_resched_notrace();
> > +
> > + preempt_schedule();
> > +
> > + preempt_disable_notrace();
> > + exception_exit(prev_ctx);
> > + preempt_enable_notrace();
> > +}
> > +EXPORT_SYMBOL(preempt_schedule_context);
>
> That's quite a tricky change but I can't think of anything better.

Yeah, I had a few other solutions that were even worse than this one.
Then I finally decided this was the best of the worse.

>
> Acked-by: Frederic Weisbecker <[email protected]>

Thanks,

-- Steve

>
>
> > /**
> > * user_exit - Inform the context tracking that the CPU is
> > --
> > 1.7.3.4
> >
> >
> >

2013-06-04 12:27:19

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v2][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Fri, May 31, 2013 at 09:30:18PM -0400, Steven Rostedt wrote:
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index 65349f0..15c9f2e 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -71,6 +71,44 @@ void user_enter(void)
> local_irq_restore(flags);
> }
>
> +/**
> + * preempt_schedule_context - preempt_schedule called by tracing
> + *
> + * The tracing infrastructure uses preempt_enable_notrace to prevent
> + * recursion and tracing preempt enabling caused by the tracing
> + * infrastructure itself. But as tracing can happen in areas coming
> + * from userspace or just about to enter userspace, a preempt enable
> + * can occur before user_exit() is called. This will cause the scheduler
> + * to be called when the system is still in usermode.
> + *
> + * To prevent this, the preempt_enable_notrace will use this function
> + * instead of preempt_schedule() to exit user context if needed before
> + * calling the scheduler.
> + */
> +void __sched notrace preempt_schedule_context(void)

Should it be under CONFIG_PREEMPT?

> +{
> + struct thread_info *ti = current_thread_info();
> + enum ctx_state prev_ctx;
> +
> + if (likely(ti->preempt_count || irqs_disabled()))
> + return;
> +
> + /*
> + * Need to disable preemption in case user_exit() is traced
> + * and the tracer calls preempt_enable_notrace() causing
> + * an infinite recursion.
> + */
> + preempt_disable_notrace();
> + prev_ctx = exception_enter();
> + preempt_enable_no_resched_notrace();
> +
> + preempt_schedule();
> +
> + preempt_disable_notrace();
> + exception_exit(prev_ctx);
> + preempt_enable_notrace();
> +}
> +EXPORT_SYMBOL(preempt_schedule_context);
>
> /**
> * user_exit - Inform the context tracking that the CPU is
> --
> 1.7.3.4
>
>
>

2013-06-04 12:29:26

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v2][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Tue, Jun 04, 2013 at 08:16:29AM -0400, Steven Rostedt wrote:
> On Tue, 2013-06-04 at 14:09 +0200, Frederic Weisbecker wrote:
> >
> > > +/**
> > > + * preempt_schedule_context - preempt_schedule called by tracing
> > > + *
> > > + * The tracing infrastructure uses preempt_enable_notrace to prevent
> > > + * recursion and tracing preempt enabling caused by the tracing
> > > + * infrastructure itself. But as tracing can happen in areas coming
> > > + * from userspace or just about to enter userspace, a preempt enable
> > > + * can occur before user_exit() is called. This will cause the scheduler
> > > + * to be called when the system is still in usermode.
> > > + *
> > > + * To prevent this, the preempt_enable_notrace will use this function
> > > + * instead of preempt_schedule() to exit user context if needed before
> > > + * calling the scheduler.
> > > + */
> > > +void __sched notrace preempt_schedule_context(void)
> > > +{
> > > + struct thread_info *ti = current_thread_info();
> > > + enum ctx_state prev_ctx;
> > > +
> > > + if (likely(ti->preempt_count || irqs_disabled()))
> > > + return;
> >
> > or:
> > if (!preemptible())
> > return;
>
> Sure, but this is a cut and paste from preempt_schedule(). If I make
> that change here, I would want to make that there too.
>
> Peter, should I use !preemptible() or keep it like preempt_schedule()
> and make a clean up change for 3.11?

That seem to indeed make sense to use it in preempt_schedule() as well.

2013-06-04 14:17:07

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Tue, 2013-06-04 at 14:27 +0200, Frederic Weisbecker wrote:
> On Fri, May 31, 2013 at 09:30:18PM -0400, Steven Rostedt wrote:
> > diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> > index 65349f0..15c9f2e 100644
> > --- a/kernel/context_tracking.c
> > +++ b/kernel/context_tracking.c
> > @@ -71,6 +71,44 @@ void user_enter(void)
> > local_irq_restore(flags);
> > }
> >
> > +/**
> > + * preempt_schedule_context - preempt_schedule called by tracing
> > + *
> > + * The tracing infrastructure uses preempt_enable_notrace to prevent
> > + * recursion and tracing preempt enabling caused by the tracing
> > + * infrastructure itself. But as tracing can happen in areas coming
> > + * from userspace or just about to enter userspace, a preempt enable
> > + * can occur before user_exit() is called. This will cause the scheduler
> > + * to be called when the system is still in usermode.
> > + *
> > + * To prevent this, the preempt_enable_notrace will use this function
> > + * instead of preempt_schedule() to exit user context if needed before
> > + * calling the scheduler.
> > + */
> > +void __sched notrace preempt_schedule_context(void)
>
> Should it be under CONFIG_PREEMPT?

Yes, and one of my tests triggered a failed build, and I fixed it last
night (patch below).

>
> > +{
> > + struct thread_info *ti = current_thread_info();
> > + enum ctx_state prev_ctx;
> > +
> > + if (likely(ti->preempt_count || irqs_disabled()))
> > + return;
> > +
> > + /*
> > + * Need to disable preemption in case user_exit() is traced
> > + * and the tracer calls preempt_enable_notrace() causing
> > + * an infinite recursion.
> > + */
> > + preempt_disable_notrace();
> > + prev_ctx = exception_enter();
> > + preempt_enable_no_resched_notrace();
> > +
> > + preempt_schedule();
> > +
> > + preempt_disable_notrace();
> > + exception_exit(prev_ctx);
> > + preempt_enable_notrace();
> > +}
> > +EXPORT_SYMBOL(preempt_schedule_context);
> >
> > /**
> > * user_exit - Inform the context tracking that the CPU is
> > --
> > 1.7.3.4
> >
> >


>From 07695eacd3d7f948671270cd9f8043c13e1337c7 Mon Sep 17 00:00:00 2001
From: Steven Rostedt <[email protected]>
Date: Fri, 24 May 2013 15:23:40 -0400
Subject: [PATCH] tracing/context-tracking: Add preempt_schedule_context() for
tracing

Dave Jones hit the following bug report:

===============================
[ INFO: suspicious RCU usage. ]
3.10.0-rc2+ #1 Not tainted
-------------------------------
include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
other info that might help us debug this:
RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
2 locks held by cc1/63645:
#0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
#1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0

CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
Call Trace:
[<ffffffff816ae383>] dump_stack+0x19/0x1b
[<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
[<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
[<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
[<ffffffff8108dffc>] update_curr+0xec/0x240
[<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
[<ffffffff816b3a71>] __schedule+0x161/0x9b0
[<ffffffff816b4721>] preempt_schedule+0x51/0x80
[<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
[<ffffffff816b6824>] ? retint_careful+0x12/0x2e
[<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
[<ffffffff816be280>] ftrace_call+0x5/0x2f
[<ffffffff816b681d>] ? retint_careful+0xb/0x2e
[<ffffffff816b4805>] ? schedule_user+0x5/0x70
[<ffffffff816b4805>] ? schedule_user+0x5/0x70
[<ffffffff816b6824>] ? retint_careful+0x12/0x2e
------------[ cut here ]------------

What happened was that the function tracer traced the schedule_user() code
that tells RCU that the system is coming back from userspace, and to
add the CPU back to the RCU monitoring.

Because the function tracer does a preempt_disable/enable_notrace() calls
the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
then preempt_schedule() is called. But this is called before the user_exit()
function can inform the kernel that the CPU is no longer in user mode and
needs to be accounted for by RCU.

The fix is to create a new preempt_schedule_context() that checks if
the kernel is still in user mode and if so to switch it to kernel mode
before calling schedule. It also switches back to user mode coming back
from schedule in need be.

The only user of this currently is the preempt_enable_notrace(), which is
only used by the tracing subsystem.

Link: http://lkml.kernel.org/r/[email protected]

Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/preempt.h | 18 +++++++++++++++++-
kernel/context_tracking.c | 40 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 87a03c7..f5d4723 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -33,9 +33,25 @@ do { \
preempt_schedule(); \
} while (0)

+#ifdef CONFIG_CONTEXT_TRACKING
+
+void preempt_schedule_context(void);
+
+#define preempt_check_resched_context() \
+do { \
+ if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+ preempt_schedule_context(); \
+} while (0)
+#else
+
+#define preempt_check_resched_context() preempt_check_resched()
+
+#endif /* CONFIG_CONTEXT_TRACKING */
+
#else /* !CONFIG_PREEMPT */

#define preempt_check_resched() do { } while (0)
+#define preempt_check_resched_context() do { } while (0)

#endif /* CONFIG_PREEMPT */

@@ -88,7 +104,7 @@ do { \
do { \
preempt_enable_no_resched_notrace(); \
barrier(); \
- preempt_check_resched(); \
+ preempt_check_resched_context(); \
} while (0)

#else /* !CONFIG_PREEMPT_COUNT */
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 65349f0..73b89d9 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -71,6 +71,46 @@ void user_enter(void)
local_irq_restore(flags);
}

+#ifdef CONFIG_PREEMPT
+/**
+ * preempt_schedule_context - preempt_schedule called by tracing
+ *
+ * The tracing infrastructure uses preempt_enable_notrace to prevent
+ * recursion and tracing preempt enabling caused by the tracing
+ * infrastructure itself. But as tracing can happen in areas coming
+ * from userspace or just about to enter userspace, a preempt enable
+ * can occur before user_exit() is called. This will cause the scheduler
+ * to be called when the system is still in usermode.
+ *
+ * To prevent this, the preempt_enable_notrace will use this function
+ * instead of preempt_schedule() to exit user context if needed before
+ * calling the scheduler.
+ */
+void __sched notrace preempt_schedule_context(void)
+{
+ struct thread_info *ti = current_thread_info();
+ enum ctx_state prev_ctx;
+
+ if (likely(ti->preempt_count || irqs_disabled()))
+ return;
+
+ /*
+ * Need to disable preemption in case user_exit() is traced
+ * and the tracer calls preempt_enable_notrace() causing
+ * an infinite recursion.
+ */
+ preempt_disable_notrace();
+ prev_ctx = exception_enter();
+ preempt_enable_no_resched_notrace();
+
+ preempt_schedule();
+
+ preempt_disable_notrace();
+ exception_exit(prev_ctx);
+ preempt_enable_notrace();
+}
+EXPORT_SYMBOL(preempt_schedule_context);
+#endif /* CONFIG_PREEMPT */

/**
* user_exit - Inform the context tracking that the CPU is
--
1.7.10.4


2013-06-05 11:46:12

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Tue, Jun 04, 2013 at 10:16:56AM -0400, Steven Rostedt wrote:
> index 65349f0..73b89d9 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -71,6 +71,46 @@ void user_enter(void)
> local_irq_restore(flags);
> }
>
> +#ifdef CONFIG_PREEMPT
> +/**
> + * preempt_schedule_context - preempt_schedule called by tracing
> + *
> + * The tracing infrastructure uses preempt_enable_notrace to prevent
> + * recursion and tracing preempt enabling caused by the tracing
> + * infrastructure itself. But as tracing can happen in areas coming
> + * from userspace or just about to enter userspace, a preempt enable
> + * can occur before user_exit() is called. This will cause the scheduler
> + * to be called when the system is still in usermode.
> + *
> + * To prevent this, the preempt_enable_notrace will use this function
> + * instead of preempt_schedule() to exit user context if needed before
> + * calling the scheduler.
> + */

If preempt_enable_notrace() is the only user, why does this live in
kernel/context_tracking.c and not in kernel/sched/core.c?

> +void __sched notrace preempt_schedule_context(void)
> +{
> + struct thread_info *ti = current_thread_info();
> + enum ctx_state prev_ctx;
> +
> + if (likely(ti->preempt_count || irqs_disabled()))
> + return;
> +
> + /*
> + * Need to disable preemption in case user_exit() is traced
> + * and the tracer calls preempt_enable_notrace() causing
> + * an infinite recursion.
> + */
> + preempt_disable_notrace();
> + prev_ctx = exception_enter();
> + preempt_enable_no_resched_notrace();
> +
> + preempt_schedule();
> +
> + preempt_disable_notrace();
> + exception_exit(prev_ctx);
> + preempt_enable_notrace();
> +}
> +EXPORT_SYMBOL(preempt_schedule_context);

Do we really need to export this?

> +#endif /* CONFIG_PREEMPT */
>
> /**
> * user_exit - Inform the context tracking that the CPU is
> --
> 1.7.10.4
>
>
>

2013-06-05 13:41:52

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Wed, 2013-06-05 at 13:45 +0200, Peter Zijlstra wrote:
> On Tue, Jun 04, 2013 at 10:16:56AM -0400, Steven Rostedt wrote:
> > index 65349f0..73b89d9 100644
> > --- a/kernel/context_tracking.c
> > +++ b/kernel/context_tracking.c
> > @@ -71,6 +71,46 @@ void user_enter(void)
> > local_irq_restore(flags);
> > }
> >
> > +#ifdef CONFIG_PREEMPT
> > +/**
> > + * preempt_schedule_context - preempt_schedule called by tracing
> > + *
> > + * The tracing infrastructure uses preempt_enable_notrace to prevent
> > + * recursion and tracing preempt enabling caused by the tracing
> > + * infrastructure itself. But as tracing can happen in areas coming
> > + * from userspace or just about to enter userspace, a preempt enable
> > + * can occur before user_exit() is called. This will cause the scheduler
> > + * to be called when the system is still in usermode.
> > + *
> > + * To prevent this, the preempt_enable_notrace will use this function
> > + * instead of preempt_schedule() to exit user context if needed before
> > + * calling the scheduler.
> > + */
>
> If preempt_enable_notrace() is the only user, why does this live in
> kernel/context_tracking.c and not in kernel/sched/core.c?

Then we would need to add #ifdef CONFIG_CONTEXT_TRACKING around it too.
As we have in preempt.h:

#ifdef CONFIG_CONTEXT_TRACKING

void preempt_schedule_context(void);

#define preempt_check_resched_context() \
do { \
if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
preempt_schedule_context(); \
} while(0)
#else

#define preempt_check_resched_context() preempt_check_resched()

#endif

Seemed like it was more appropriate to be in context_tracking.c than
sched/core.c, as it depends on context tracking.

>
> > +void __sched notrace preempt_schedule_context(void)
> > +{
> > + struct thread_info *ti = current_thread_info();
> > + enum ctx_state prev_ctx;
> > +
> > + if (likely(ti->preempt_count || irqs_disabled()))
> > + return;
> > +
> > + /*
> > + * Need to disable preemption in case user_exit() is traced
> > + * and the tracer calls preempt_enable_notrace() causing
> > + * an infinite recursion.
> > + */
> > + preempt_disable_notrace();
> > + prev_ctx = exception_enter();
> > + preempt_enable_no_resched_notrace();
> > +
> > + preempt_schedule();
> > +
> > + preempt_disable_notrace();
> > + exception_exit(prev_ctx);
> > + preempt_enable_notrace();
> > +}
> > +EXPORT_SYMBOL(preempt_schedule_context);
>
> Do we really need to export this?

As preempt_enable_notrace() is also used by tracepoints, yes. Because
tracepoints are used by modules. Hmm, now it may be exported via a GPL
export though.

-- Steve


>
> > +#endif /* CONFIG_PREEMPT */
> >
> > /**
> > * user_exit - Inform the context tracking that the CPU is
> > --
> > 1.7.10.4
> >
> >
> >

2013-06-06 02:49:53

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Wed, 2013-06-05 at 09:41 -0400, Steven Rostedt wrote:
> >
> > If preempt_enable_notrace() is the only user, why does this live in
> > kernel/context_tracking.c and not in kernel/sched/core.c?
>
> Then we would need to add #ifdef CONFIG_CONTEXT_TRACKING around it too.
> As we have in preempt.h:
>
> #ifdef CONFIG_CONTEXT_TRACKING
>
> void preempt_schedule_context(void);
>
> #define preempt_check_resched_context() \
> do { \
> if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
> preempt_schedule_context(); \
> } while(0)
> #else
>
> #define preempt_check_resched_context() preempt_check_resched()
>
> #endif
>
> Seemed like it was more appropriate to be in context_tracking.c than
> sched/core.c, as it depends on context tracking.
>

Peter,

Are you fine with this answer, or do you still believe I should put it
into sched/core.c?


> > > +EXPORT_SYMBOL(preempt_schedule_context);
> >
> > Do we really need to export this?
>
> As preempt_enable_notrace() is also used by tracepoints, yes. Because
> tracepoints are used by modules. Hmm, now it may be exported via a GPL
> export though.
>

I switched it to EXPORT_SYMBOL_GPL, so that only GPL modules may use
preempt_enable_notrace().

Are you OK with this?

Here's the new patch:

>From fa5be043fb62472eeeeb0ca599d24b22d0f52b83 Mon Sep 17 00:00:00 2001
From: Steven Rostedt <[email protected]>
Date: Fri, 24 May 2013 15:23:40 -0400
Subject: [PATCH] tracing/context-tracking: Add preempt_schedule_context() for
tracing

Dave Jones hit the following bug report:

===============================
[ INFO: suspicious RCU usage. ]
3.10.0-rc2+ #1 Not tainted
-------------------------------
include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
other info that might help us debug this:
RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
2 locks held by cc1/63645:
#0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
#1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0

CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
Call Trace:
[<ffffffff816ae383>] dump_stack+0x19/0x1b
[<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
[<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
[<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
[<ffffffff8108dffc>] update_curr+0xec/0x240
[<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
[<ffffffff816b3a71>] __schedule+0x161/0x9b0
[<ffffffff816b4721>] preempt_schedule+0x51/0x80
[<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
[<ffffffff816b6824>] ? retint_careful+0x12/0x2e
[<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
[<ffffffff816be280>] ftrace_call+0x5/0x2f
[<ffffffff816b681d>] ? retint_careful+0xb/0x2e
[<ffffffff816b4805>] ? schedule_user+0x5/0x70
[<ffffffff816b4805>] ? schedule_user+0x5/0x70
[<ffffffff816b6824>] ? retint_careful+0x12/0x2e
------------[ cut here ]------------

What happened was that the function tracer traced the schedule_user() code
that tells RCU that the system is coming back from userspace, and to
add the CPU back to the RCU monitoring.

Because the function tracer does a preempt_disable/enable_notrace() calls
the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
then preempt_schedule() is called. But this is called before the user_exit()
function can inform the kernel that the CPU is no longer in user mode and
needs to be accounted for by RCU.

The fix is to create a new preempt_schedule_context() that checks if
the kernel is still in user mode and if so to switch it to kernel mode
before calling schedule. It also switches back to user mode coming back
from schedule in need be.

The only user of this currently is the preempt_enable_notrace(), which is
only used by the tracing subsystem.

Link: http://lkml.kernel.org/r/[email protected]

Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/preempt.h | 18 +++++++++++++++++-
kernel/context_tracking.c | 40 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 87a03c7..f5d4723 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -33,9 +33,25 @@ do { \
preempt_schedule(); \
} while (0)

+#ifdef CONFIG_CONTEXT_TRACKING
+
+void preempt_schedule_context(void);
+
+#define preempt_check_resched_context() \
+do { \
+ if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+ preempt_schedule_context(); \
+} while (0)
+#else
+
+#define preempt_check_resched_context() preempt_check_resched()
+
+#endif /* CONFIG_CONTEXT_TRACKING */
+
#else /* !CONFIG_PREEMPT */

#define preempt_check_resched() do { } while (0)
+#define preempt_check_resched_context() do { } while (0)

#endif /* CONFIG_PREEMPT */

@@ -88,7 +104,7 @@ do { \
do { \
preempt_enable_no_resched_notrace(); \
barrier(); \
- preempt_check_resched(); \
+ preempt_check_resched_context(); \
} while (0)

#else /* !CONFIG_PREEMPT_COUNT */
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 65349f0..6667700 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -71,6 +71,46 @@ void user_enter(void)
local_irq_restore(flags);
}

+#ifdef CONFIG_PREEMPT
+/**
+ * preempt_schedule_context - preempt_schedule called by tracing
+ *
+ * The tracing infrastructure uses preempt_enable_notrace to prevent
+ * recursion and tracing preempt enabling caused by the tracing
+ * infrastructure itself. But as tracing can happen in areas coming
+ * from userspace or just about to enter userspace, a preempt enable
+ * can occur before user_exit() is called. This will cause the scheduler
+ * to be called when the system is still in usermode.
+ *
+ * To prevent this, the preempt_enable_notrace will use this function
+ * instead of preempt_schedule() to exit user context if needed before
+ * calling the scheduler.
+ */
+void __sched notrace preempt_schedule_context(void)
+{
+ struct thread_info *ti = current_thread_info();
+ enum ctx_state prev_ctx;
+
+ if (likely(ti->preempt_count || irqs_disabled()))
+ return;
+
+ /*
+ * Need to disable preemption in case user_exit() is traced
+ * and the tracer calls preempt_enable_notrace() causing
+ * an infinite recursion.
+ */
+ preempt_disable_notrace();
+ prev_ctx = exception_enter();
+ preempt_enable_no_resched_notrace();
+
+ preempt_schedule();
+
+ preempt_disable_notrace();
+ exception_exit(prev_ctx);
+ preempt_enable_notrace();
+}
+EXPORT_SYMBOL_GPL(preempt_schedule_context);
+#endif /* CONFIG_PREEMPT */

/**
* user_exit - Inform the context tracking that the CPU is
--
1.7.10.4


2013-06-06 10:08:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Wed, Jun 05, 2013 at 10:49:48PM -0400, Steven Rostedt wrote:
> Peter,
>
> Are you fine with this answer, or do you still believe I should put it
> into sched/core.c?

Its ok I suppose. I just had to wrap my head around this
context_tracking muck; first time I ever looked at it.

> I switched it to EXPORT_SYMBOL_GPL, so that only GPL modules may use
> preempt_enable_notrace().
>
> Are you OK with this?

Yep, applied.

2013-06-06 13:50:36

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing

On Thu, 2013-06-06 at 12:07 +0200, Peter Zijlstra wrote:

> > Are you OK with this?
>
> Yep, applied.

Ah, so you'll take it. Great! I'll drop it from my queue.

Thanks,

-- Steve