2020-04-15 14:53:48

by John Stultz

Subject: On trace_*_rcuidle functions in modules

Hey folks,
So recently I was looking at converting some drivers to be loadable
modules instead of built-in only, and one of my patches just landed in
-next and started getting build error reports.

It turns out that, recently in the merge window, the driver I was
converting to a module switched a trace_*() call to trace_*_rcuidle() to
fix a bug. Now, when building as a module with tracing configured on, the
build can't find the trace_*_rcuidle() symbol.

This is because, as you are aware, we don't declare trace_*_rcuidle
functions in modules - and haven't for quite some time:
https://lore.kernel.org/lkml/20120905062306.GA14756@leaf/

I wanted to better understand the background rationale for that patch:
was not exporting the rcu_idle_exit and rcu_idle_enter calls simply
because they weren't used, or was it a more intentional decision to
avoid letting modules use them?

Would it be reasonable to revisit that patch? Or is there some
recommended alternative solution?

thanks
-john


2020-04-15 14:57:45

by Paul E. McKenney

Subject: Re: On trace_*_rcuidle functions in modules

On Tue, Apr 14, 2020 at 07:20:01PM -0700, John Stultz wrote:
> Hey folks,
> So recently I was looking at converting some drivers to be loadable
> modules instead of built-in only, and one of my patches just landed in
> -next and started getting build error reports.
>
> It ends up, recently in the merge window, the driver I was converting
> to module switched a trace_*() function to trace_*_rcuidle() to fix a
> bug. Now when building as a module, if tracing is configured on, it
> can't seem to find the trace_*_rcuidle() symbol.
>
> This is because, as you are aware, we don't declare trace_*_rcuidle
> functions in modules - and haven't for quite some time:
> https://lore.kernel.org/lkml/20120905062306.GA14756@leaf/
>
> I wanted to better understand the background rationale for that patch,
> to understand if not exporting the rcu_idle_exit and rcu_idle_enter,
> calls was because they weren't used or if it was a more intentional
> decision to avoid allowing modules to use them.
>
> Would it be reasonable to revisit that patch? Or is there some
> recommended alternative solution?

I will defer to Steven and Josh on the rationale. (Cowardly of me,
I know!)

What I do is to maintain a wrapper for tracepoints within a built-in
portion of RCU, export the wrapper, and invoke the wrapper from the
rcutorture module. Maybe you can do something similar?

But why would a module be invoked from the idle loop? Is the module
supplying an idle driver or some such?

Thanx, Paul

2020-04-15 15:01:03

by John Stultz

Subject: Re: On trace_*_rcuidle functions in modules

On Tue, Apr 14, 2020 at 7:57 PM Paul E. McKenney <[email protected]> wrote:
>
> On Tue, Apr 14, 2020 at 07:20:01PM -0700, John Stultz wrote:
> > Hey folks,
> > So recently I was looking at converting some drivers to be loadable
> > modules instead of built-in only, and one of my patches just landed in
> > -next and started getting build error reports.
> >
> > It ends up, recently in the merge window, the driver I was converting
> > to module switched a trace_*() function to trace_*_rcuidle() to fix a
> > bug. Now when building as a module, if tracing is configured on, it
> > can't seem to find the trace_*_rcuidle() symbol.
> >
> > This is because, as you are aware, we don't declare trace_*_rcuidle
> > functions in modules - and haven't for quite some time:
> > https://lore.kernel.org/lkml/20120905062306.GA14756@leaf/
> >
> > I wanted to better understand the background rationale for that patch,
> > to understand if not exporting the rcu_idle_exit and rcu_idle_enter,
> > calls was because they weren't used or if it was a more intentional
> > decision to avoid allowing modules to use them.
> >
> > Would it be reasonable to revisit that patch? Or is there some
> > recommended alternative solution?
>
> I will defer to Steven and Josh on the rationale. (Cowardly of me,
> I know!)
>
> What I do is to maintain a wrapper for tracepoints within a built-in
> portion of RCU, export the wrapper, and invoke the wrapper from the
> rcutorture module. Maybe you can do something similar?

That feels a little hackish, but I guess if there isn't a better option...

> But why would a module be invoked from the idle loop? Is the module
> supplying an idle driver or some such?

The driver (qcom rpmh driver) registers a cpu_pm notifier callback,
which gets called when entering idle.

thanks
-john

2020-04-15 23:56:40

by Steven Rostedt

Subject: Re: On trace_*_rcuidle functions in modules

[ +Peter +Thomas ]

On Tue, 14 Apr 2020 19:20:01 -0700
John Stultz <[email protected]> wrote:

> Hey folks,
> So recently I was looking at converting some drivers to be loadable
> modules instead of built-in only, and one of my patches just landed in
> -next and started getting build error reports.
>
> It ends up, recently in the merge window, the driver I was converting
> to module switched a trace_*() function to trace_*_rcuidle() to fix a
> bug. Now when building as a module, if tracing is configured on, it
> can't seem to find the trace_*_rcuidle() symbol.

Which modules need this?

Currently, Thomas and Peter are working on removing trace events from
places that don't have RCU enabled, or at least cleaning up the context
switches from user to kernel to interrupt.

-- Steve


>
> This is because, as you are aware, we don't declare trace_*_rcuidle
> functions in modules - and haven't for quite some time:
> https://lore.kernel.org/lkml/20120905062306.GA14756@leaf/
>
> I wanted to better understand the background rationale for that patch,
> to understand if not exporting the rcu_idle_exit and rcu_idle_enter,
> calls was because they weren't used or if it was a more intentional
> decision to avoid allowing modules to use them.
>
> Would it be reasonable to revisit that patch? Or is there some
> recommended alternative solution?
>
> thanks
> -john

2020-04-16 00:02:17

by Paul E. McKenney

Subject: Re: On trace_*_rcuidle functions in modules

On Tue, Apr 14, 2020 at 08:47:18PM -0700, John Stultz wrote:
> On Tue, Apr 14, 2020 at 7:57 PM Paul E. McKenney <[email protected]> wrote:
> >
> > On Tue, Apr 14, 2020 at 07:20:01PM -0700, John Stultz wrote:
> > > Hey folks,
> > > So recently I was looking at converting some drivers to be loadable
> > > modules instead of built-in only, and one of my patches just landed in
> > > -next and started getting build error reports.
> > >
> > > It ends up, recently in the merge window, the driver I was converting
> > > to module switched a trace_*() function to trace_*_rcuidle() to fix a
> > > bug. Now when building as a module, if tracing is configured on, it
> > > can't seem to find the trace_*_rcuidle() symbol.
> > >
> > > This is because, as you are aware, we don't declare trace_*_rcuidle
> > > functions in modules - and haven't for quite some time:
> > > https://lore.kernel.org/lkml/20120905062306.GA14756@leaf/
> > >
> > > I wanted to better understand the background rationale for that patch,
> > > to understand if not exporting the rcu_idle_exit and rcu_idle_enter,
> > > calls was because they weren't used or if it was a more intentional
> > > decision to avoid allowing modules to use them.
> > >
> > > Would it be reasonable to revisit that patch? Or is there some
> > > recommended alternative solution?
> >
> > I will defer to Steven and Josh on the rationale. (Cowardly of me,
> > I know!)
> >
> > What I do is to maintain a wrapper for tracepoints within a built-in
> > portion of RCU, export the wrapper, and invoke the wrapper from the
> > rcutorture module. Maybe you can do something similar?
>
> That feels a little hackish, but I guess if there isn't a better option...

It is just rcutorture, so I am not going to claim that I put a huge
amount of energy into researching better options. ;-)

> > But why would a module be invoked from the idle loop? Is the module
> > supplying an idle driver or some such?
>
> The driver (qcom rpmh driver) registers a cpu_pm notifier callback,
> which gets called when entering idle.

That would do it! And yes, the idle-loop power-management code is in
an interesting situation in that RCU is not watching it by default, but
it does need to be debugged. One straightforward approach would be for
the PM code to tell RCU to watch on entry and exit, but I don't have a
feel for how this would affect energy efficiency.

Thanx, Paul

2020-04-16 01:04:47

by Steven Rostedt

Subject: Re: On trace_*_rcuidle functions in modules

On Wed, 15 Apr 2020 12:56:52 -0700
John Stultz <[email protected]> wrote:

> I'm trying to enable the qcom rpmh driver
> (drivers/soc/qcom/rpmh-rsc.c) to be a module. As I mentioned to Paul,
> it registers a cpu_pm notifier callback, which calls its
> __tcs_buffer_write() function. The trace in the __tcs_buffer_write()
> function was just converted to using rcuidle to address bugs seen when
> it was being called from idle.
>
> > Currently, Thomas and Peter are working on removing trace events from
> > places that don't have RCU enabled, or at least cleaning up the context
> > switches from user to kernel to interrupt.
>
> So does that mean folks would most likely lean to trying to remove the
> tracepoint rather than reevaluating allowing the rcuidle call to be
> made from the module?
>

No. The cleanup is to try to make the switch from each context small, fast,
and safe. But what you are describing is the switch to idle, which is a
different story and something there's some talk about cleaning up, though
not at the same level. Especially if there's more complex code that is
happening with RCU watching.

Looking at the commit that keeps trace_*_rcuidle() code out:

7ece55a4a3a04a ("trace: Don't declare trace_*_rcuidle functions in modules")

It was added because the rcuidle variant called RCU code that was not
exported either. The same issue exists now: rcu_irq_exit_irqson(), which
would be needed, is also not exported.

Hmm, isn't module code itself synchronized via RCU? Then having module code
called without RCU "watching" could be dangerous?

-- Steve

2020-04-16 01:04:48

by John Stultz

Subject: Re: On trace_*_rcuidle functions in modules

On Wed, Apr 15, 2020 at 1:14 PM Steven Rostedt <[email protected]> wrote:
>
> On Wed, 15 Apr 2020 12:56:52 -0700
> John Stultz <[email protected]> wrote:
>
> > I'm trying to enable the qcom rpmh driver
> > (drivers/soc/qcom/rpmh-rsc.c) to be a module. As I mentioned to Paul,
> > it registers a cpu_pm notifier callback, which calls its
> > __tcs_buffer_write() function. The trace in the __tcs_buffer_write()
> > function was just converted to using rcuidle to address bugs seen when
> > it was being called from idle.
> >
> > > Currently, Thomas and Peter are working on removing trace events from
> > > places that don't have RCU enabled, or at least cleaning up the context
> > > switches from user to kernel to interrupt.
> >
> > So does that mean folks would most likely lean to trying to remove the
> > tracepoint rather than reevaluating allowing the rcuidle call to be
> > made from the module?
> >
>
> No. The clean up is to try to make the switch from each context small, fast
> and safe. But what you are describing is the switch to idle, which is a
> different story and something that there's some talk about cleaning up, but
> not at the same level. Especially if there's more complex code that is
> happening with RCU watching.
>
> Looking at the commit that keeps trace_*_rcuidle() code out:
>
> 7ece55a4a3a04a ("trace: Don't declare trace_*_rcuidle functions in modules")
>
> Which was added because the rcuidle variant called RCU code that was not
> exported either. Which would have the same issue now as
> rcu_irq_exit_irqson() is also not exported. Which would be needed.

Right, reverting that change and adding the exports seems like the
most direct solution, but I suspect that wasn't done back in the day
for a good reason. So I'm just trying to better understand that
reason.

> Hmm, isn't module code itself synchronized via RCU. Then having module code
> being called without RCU "watching" could be dangerous?

I'm not sure I'm following you here. Could you explain more?

thanks
-john

2020-04-16 01:06:01

by John Stultz

Subject: Re: On trace_*_rcuidle functions in modules

On Wed, Apr 15, 2020 at 5:53 AM Steven Rostedt <[email protected]> wrote:
>
> [ +Peter +Thomas ]
>
> On Tue, 14 Apr 2020 19:20:01 -0700
> John Stultz <[email protected]> wrote:
>
> > Hey folks,
> > So recently I was looking at converting some drivers to be loadable
> > modules instead of built-in only, and one of my patches just landed in
> > -next and started getting build error reports.
> >
> > It ends up, recently in the merge window, the driver I was converting
> > to module switched a trace_*() function to trace_*_rcuidle() to fix a
> > bug. Now when building as a module, if tracing is configured on, it
> > can't seem to find the trace_*_rcuidle() symbol.
>
> Which modules need this.

Hey Steven!

I'm trying to enable the qcom rpmh driver
(drivers/soc/qcom/rpmh-rsc.c) to be a module. As I mentioned to Paul,
it registers a cpu_pm notifier callback, which calls its
__tcs_buffer_write() function. The trace in the __tcs_buffer_write()
function was just converted to using rcuidle to address bugs seen when
it was being called from idle.

> Currently, Thomas and Peter are working on removing trace events from
> places that don't have RCU enabled, or at least cleaning up the context
> switches from user to kernel to interrupt.

So does that mean folks would most likely lean to trying to remove the
tracepoint rather than reevaluating allowing the rcuidle call to be
made from the module?

thanks
-john

2020-04-16 01:08:41

by Steven Rostedt

Subject: Re: On trace_*_rcuidle functions in modules

On Wed, 15 Apr 2020 13:17:53 -0700
John Stultz <[email protected]> wrote:

> > Hmm, isn't module code itself synchronized via RCU. Then having module code
> > being called without RCU "watching" could be dangerous?
>
> I'm not sure I'm following you here. Could you explain more?

So how does this code get registered to be called as a module? And if it is
registered, I'm guessing it needs to be unregistered too. How would that be
synchronized? Usually, calling synchronize_rcu() is done after
unregistering, but if that code is called without RCU watching, it is
possible synchronize_rcu() can finish before that code is released.

-- Steve

2020-04-16 01:10:13

by John Stultz

Subject: Re: On trace_*_rcuidle functions in modules

On Wed, Apr 15, 2020 at 1:41 PM Steven Rostedt <[email protected]> wrote:
>
> On Wed, 15 Apr 2020 13:17:53 -0700
> John Stultz <[email protected]> wrote:
>
> > > Hmm, isn't module code itself synchronized via RCU. Then having module code
> > > being called without RCU "watching" could be dangerous?
> >
> > I'm not sure I'm following you here. Could you explain more?
>
> So how does this code get registered to be called as a module?

The driver is registered via standard platform_driver_register()
called via module_initcall. The callback is then registered via
cpu_pm_register_notifier() in the driver's probe function.

> And if it is
> registered, I'm guessing it needs to be unregistered too. How would that be
> synchronized? Usually, calling synchronize_rcu() is done after
> unregistering, but if that code is called without RCU watching, it is
> possible synchronize_rcu() can finish before that code is released.

So I'm actually trying to enable the driver to be loaded as a
permanent module, so there's no remove hook (so much depends on the
driver that you couldn't remove it and have anything work - we just
want it to be modularly loaded so all devices don't have to pay the
cost of including the driver).

So in my case your concerns may not be a problem, but I guess
generally it might. Though I'd hope the callback would be unregistered
(and whatever waiting for the grace period to complete be done) before
the module removal is complete. But maybe I'm still missing your
point?

thanks
-john

2020-04-16 01:12:00

by John Stultz

Subject: Re: On trace_*_rcuidle functions in modules

On Wed, Apr 15, 2020 at 2:49 PM Steven Rostedt <[email protected]> wrote:
>
> On Wed, 15 Apr 2020 14:02:04 -0700
> John Stultz <[email protected]> wrote:
>
> >
> > So in my case your concerns may not be a problem, but I guess
> > generally it might. Though I'd hope the callback would be unregistered
> > (and whatever waiting for the grace period to complete be done) before
> > the module removal is complete. But maybe I'm still missing your
> > point?
>
> Hmm, you may have just brought up a problem here...
>
> You're saying that cpu_pm_register_notifier() callers are called from non
> RCU watching context? If that's the case, we have this:

Wait? what? I don't think I'm saying that. :) I'm just trying to fix
a build issue that prevents a driver from being a module since it has
a trace_*_rcuidle call in it.

Honestly, I really don't know much about how the cpu_pm notifiers
interact with RCU. It's just that this recent change, which seems
necessary, now prevents the driver from being built as a
module:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=efde2659b0fe835732047357b2902cca14f054d9

Maybe there's a better solution than using trace_*_rcuidle?

thanks
-john

2020-04-16 01:12:46

by Paul E. McKenney

Subject: Re: On trace_*_rcuidle functions in modules

On Wed, Apr 15, 2020 at 05:49:18PM -0400, Steven Rostedt wrote:
> On Wed, 15 Apr 2020 14:02:04 -0700
> John Stultz <[email protected]> wrote:
>
> >
> > So in my case your concerns may not be a problem, but I guess
> > generally it might. Though I'd hope the callback would be unregistered
> > (and whatever waiting for the grace period to complete be done) before
> > the module removal is complete. But maybe I'm still missing your
> > point?
>
> Hmm, you may have just brought up a problem here...
>
> You're saying that cpu_pm_register_notifier() callers are called from non
> RCU watching context? If that's the case, we have this:
>
> int cpu_pm_unregister_notifier(struct notifier_block *nb)
> {
>         return atomic_notifier_chain_unregister(&cpu_pm_notifier_chain, nb);
> }
>
> And this:
>
> int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh,
>                                      struct notifier_block *n)
> {
>         unsigned long flags;
>         int ret;
>
>         spin_lock_irqsave(&nh->lock, flags);
>         ret = notifier_chain_unregister(&nh->head, n);
>         spin_unlock_irqrestore(&nh->lock, flags);
>         synchronize_rcu();
>         return ret;
> }
>
> Which means that if something registered a cpu_pm notifier, then
> unregistered it, and freed whatever the notifier accesses, then there's a
> chance that the synchronize_rcu() can return before the called notifier
> finishes, and anything that notifier accesses could have been freed.
>
> I believe that module code should not be able to be run in RCU non watching
> context, and neither should notifiers. I think we just stumbled on a bug.
>
> Paul?

Or we say that such modules cannot be unloaded. Or that such modules'
exit handlers, after disentangling themselves from the idle loop, must
invoke synchronize_rcu_rude() or similar, just as modules that use
call_rcu() are currently required to invoke rcu_barrier().

Or is it possible to upgrade the protection that modules use?

My guess is that invoking rcu_irq_enter() and rcu_irq_exit() around every
potential call into module code out of the PM code is a non-starter,
but I cannot prove that either way.

Thanx, Paul

2020-04-16 01:12:58

by Peter Zijlstra

Subject: Re: On trace_*_rcuidle functions in modules

On Wed, Apr 15, 2020 at 04:14:24PM -0400, Steven Rostedt wrote:
> Which was added because the rcuidle variant called RCU code that was not
> exported either. Which would have the same issue now as
> rcu_irq_exit_irqson() is also not exported. Which would be needed.

Keeping all RCU_NONIDLE code in the core kernel allows us to eventually
clean it all up. Allowing it to escape into modules makes that ever so
much harder :/

2020-04-16 01:13:22

by Paul E. McKenney

Subject: Re: On trace_*_rcuidle functions in modules

On Thu, Apr 16, 2020 at 12:42:14AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 15, 2020 at 03:04:59PM -0700, Paul E. McKenney wrote:
> >
> > My guess is that invoking rcu_irq_enter() and rcu_irq_exit() around every
> > potential call into module code out of the PM code is a non-starter,
> > but I cannot prove that either way.
>
> Isn't that exactly what cpu_pm_notify() is doing?

Right you are! Problem solved, then?

Thanx, Paul

2020-04-16 01:13:26

by Steven Rostedt

Subject: Re: On trace_*_rcuidle functions in modules

On Wed, 15 Apr 2020 14:02:04 -0700
John Stultz <[email protected]> wrote:

>
> So in my case your concerns may not be a problem, but I guess
> generally it might. Though I'd hope the callback would be unregistered
> (and whatever waiting for the grace period to complete be done) before
> the module removal is complete. But maybe I'm still missing your
> point?

Hmm, you may have just brought up a problem here...

You're saying that cpu_pm_register_notifier() callbacks are called from a
non-RCU-watching context? If that's the case, we have this:

int cpu_pm_unregister_notifier(struct notifier_block *nb)
{
        return atomic_notifier_chain_unregister(&cpu_pm_notifier_chain, nb);
}

And this:

int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh,
                                     struct notifier_block *n)
{
        unsigned long flags;
        int ret;

        spin_lock_irqsave(&nh->lock, flags);
        ret = notifier_chain_unregister(&nh->head, n);
        spin_unlock_irqrestore(&nh->lock, flags);
        synchronize_rcu();
        return ret;
}

Which means that if something registered a cpu_pm notifier, then
unregistered it, and freed whatever the notifier accesses, there's a
chance that the synchronize_rcu() can return before the called notifier
finishes, and anything that notifier accesses could have been freed.

I believe that module code should not be able to run in a non-RCU-watching
context, and neither should notifiers. I think we just stumbled on a bug.

Paul?


-- Steve

2020-04-16 01:13:26

by Paul E. McKenney

Subject: Re: On trace_*_rcuidle functions in modules

On Wed, Apr 15, 2020 at 06:51:21PM -0400, Steven Rostedt wrote:
> On Wed, 15 Apr 2020 15:04:59 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
> > On Wed, Apr 15, 2020 at 05:49:18PM -0400, Steven Rostedt wrote:
> > > On Wed, 15 Apr 2020 14:02:04 -0700
> > > John Stultz <[email protected]> wrote:
> > >
> > > >
> > > > So in my case your concerns may not be a problem, but I guess
> > > > generally it might. Though I'd hope the callback would be unregistered
> > > > (and whatever waiting for the grace period to complete be done) before
> > > > the module removal is complete. But maybe I'm still missing your
> > > > point?
> > >
> > > Hmm, you may have just brought up a problem here...
> > >
> > > You're saying that cpu_pm_register_notifier() callers are called from non
> > > RCU watching context? If that's the case, we have this:
> > >
> > > int cpu_pm_unregister_notifier(struct notifier_block *nb)
> > > {
> > >         return atomic_notifier_chain_unregister(&cpu_pm_notifier_chain, nb);
> > > }
> > >
> > > And this:
> > >
> > > int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh,
> > >                                      struct notifier_block *n)
> > > {
> > >         unsigned long flags;
> > >         int ret;
> > >
> > >         spin_lock_irqsave(&nh->lock, flags);
> > >         ret = notifier_chain_unregister(&nh->head, n);
> > >         spin_unlock_irqrestore(&nh->lock, flags);
> > >         synchronize_rcu();
> > >         return ret;
> > > }
> > >
> > > Which means that if something registered a cpu_pm notifier, then
> > > unregistered it, and freed whatever the notifier accesses, then there's a
> > > chance that the synchronize_rcu() can return before the called notifier
> > > finishes, and anything that notifier accesses could have been freed.
> > >
> > > I believe that module code should not be able to be run in RCU non watching
> > > context, and neither should notifiers. I think we just stumbled on a bug.
> > >
> > > Paul?
> >
> > Or we say that such modules cannot be unloaded. Or that such modules'
> > exit handlers, after disentangling themselves from the idle loop, must
> > invoke synchronize_rcu_rude() or similar, just as modules that use
> > call_rcu() are currently required to invoke rcu_barrier().
> >
> > Or is it possible to upgrade the protection that modules use?
> >
> > My guess is that invoking rcu_irq_enter() and rcu_irq_exit() around every
> > potential call into module code out of the PM code is a non-starter,
> > but I cannot prove that either way.
> >
>
> No this has nothing to do with modules. This is a bug right now with the
> cpu_pm notifier (after looking at the code, it's not a bug right now, see
> below).
>
> Say you have something that allocates some data and registers a
> callback to the cpu_pm notifier that access that data. Then for some
> reason, you want to remove that notifier and free the data. Usually you
> would do:
>
>         cpu_pm_unregister_notifier(my_notifier);
>         kfree(my_data);
>
> But the problem is that the callback of that my_notifier could be executing
> in a RCU non-watching space, and the cpu_pm_unregister_notifier() can
> return before the my_notifier is done, and the my_data is freed. Then the
> callback for the my_notifier could still be accessing the my_data.
>
>
> /me goes and reads the code and sees this is not an issue, and you can
> ignore the above concern.
>
> I was about to suggest a patch, but that has already been written...
>
> 313c8c16ee62b ("PM / CPU: replace raw_notifier with atomic_notifier")
>
> Which surrounds the notifier callbacks with rcu_irq_enter_irqson()
>
> Which means that if John moves the code to use the notifier, then he could
> also remove the _rcuidle(), because RCU will be watching.

Whew!!! ;-)

Thanx, Paul

2020-04-16 01:15:01

by Peter Zijlstra

Subject: Re: On trace_*_rcuidle functions in modules

On Wed, Apr 15, 2020 at 03:04:59PM -0700, Paul E. McKenney wrote:
>
> My guess is that invoking rcu_irq_enter() and rcu_irq_exit() around every
> potential call into module code out of the PM code is a non-starter,
> but I cannot prove that either way.

Isn't that exactly what cpu_pm_notify() is doing?

2020-04-16 01:15:15

by Steven Rostedt

Subject: Re: On trace_*_rcuidle functions in modules

On Wed, 15 Apr 2020 15:04:59 -0700
"Paul E. McKenney" <[email protected]> wrote:

> On Wed, Apr 15, 2020 at 05:49:18PM -0400, Steven Rostedt wrote:
> > On Wed, 15 Apr 2020 14:02:04 -0700
> > John Stultz <[email protected]> wrote:
> >
> > >
> > > So in my case your concerns may not be a problem, but I guess
> > > generally it might. Though I'd hope the callback would be unregistered
> > > (and whatever waiting for the grace period to complete be done) before
> > > the module removal is complete. But maybe I'm still missing your
> > > point?
> >
> > Hmm, you may have just brought up a problem here...
> >
> > You're saying that cpu_pm_register_notifier() callers are called from non
> > RCU watching context? If that's the case, we have this:
> >
> > int cpu_pm_unregister_notifier(struct notifier_block *nb)
> > {
> >         return atomic_notifier_chain_unregister(&cpu_pm_notifier_chain, nb);
> > }
> >
> > And this:
> >
> > int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh,
> >                                      struct notifier_block *n)
> > {
> >         unsigned long flags;
> >         int ret;
> >
> >         spin_lock_irqsave(&nh->lock, flags);
> >         ret = notifier_chain_unregister(&nh->head, n);
> >         spin_unlock_irqrestore(&nh->lock, flags);
> >         synchronize_rcu();
> >         return ret;
> > }
> >
> > Which means that if something registered a cpu_pm notifier, then
> > unregistered it, and freed whatever the notifier accesses, then there's a
> > chance that the synchronize_rcu() can return before the called notifier
> > finishes, and anything that notifier accesses could have been freed.
> >
> > I believe that module code should not be able to be run in RCU non watching
> > context, and neither should notifiers. I think we just stumbled on a bug.
> >
> > Paul?
>
> Or we say that such modules cannot be unloaded. Or that such modules'
> exit handlers, after disentangling themselves from the idle loop, must
> invoke synchronize_rcu_rude() or similar, just as modules that use
> call_rcu() are currently required to invoke rcu_barrier().
>
> Or is it possible to upgrade the protection that modules use?
>
> My guess is that invoking rcu_irq_enter() and rcu_irq_exit() around every
> potential call into module code out of the PM code is a non-starter,
> but I cannot prove that either way.
>

No, this has nothing to do with modules. This is a bug right now with the
cpu_pm notifier (though after looking at the code, it turns out not to be a
bug; see below).

Say you have something that allocates some data and registers a
callback with the cpu_pm notifier that accesses that data. Then, for some
reason, you want to remove that notifier and free the data. Usually you
would do:

        cpu_pm_unregister_notifier(my_notifier);
        kfree(my_data);

But the problem is that the callback of that my_notifier could be executing
in an RCU-non-watching space, and cpu_pm_unregister_notifier() can
return before my_notifier is done, and my_data is freed. Then the
callback for my_notifier could still be accessing my_data.


/me goes and reads the code and sees this is not an issue, and you can
ignore the above concern.

I was about to suggest a patch, but that has already been written...

313c8c16ee62b ("PM / CPU: replace raw_notifier with atomic_notifier")

Which surrounds the notifier callbacks with rcu_irq_enter_irqson()

Which means that if John moves the code to use the notifier, then he could
also remove the _rcuidle(), because RCU will be watching.

-- Steve

2020-04-16 01:15:18

by Steven Rostedt

[permalink] [raw]
Subject: Re: On trace_*_rcuidle functions in modules

On Thu, 16 Apr 2020 00:42:14 +0200
Peter Zijlstra <[email protected]> wrote:

> On Wed, Apr 15, 2020 at 03:04:59PM -0700, Paul E. McKenney wrote:
> >
> > My guess is that invoking rcu_irq_enter() and rcu_irq_exit() around every
> > potential call into module code out of the PM code is a non-starter,
> > but I cannot prove that either way.
>
> Isn't that exactly what cpu_pm_notify() is doing?

That was originally my concern, but I didn't look at cpu_pm_notify() until
I was about to add that to it ;-) Then I noticed it was already there
(making my last email rather confusing, as I wrote half of it before seeing
this and then continued it after the fact).

-- Steve

2020-04-16 01:16:38

by Steven Rostedt

Subject: Re: On trace_*_rcuidle functions in modules

On Wed, 15 Apr 2020 17:06:24 -0700
John Stultz <[email protected]> wrote:

> So you're saying the recent change to move to using trace_*_rcuidle()
> was unnecessary?
>
> Or is there a different notifier than cpu_pm_register_notifier() that
> the driver should be using (that one seems to be using
> atomic_notifier_chain_register())?

From looking at the trace event in __tcs_buffer_write() in
drivers/soc/qcom/rpmh-rsc.c, the _rcuidle() was added by:

efde2659b0fe8 ("drivers: qcom: rpmh-rsc: Use rcuidle tracepoints for rpmh")

Which shows a backtrace dump of:

Call trace:
dump_backtrace+0x0/0x174
show_stack+0x20/0x2c
dump_stack+0xc8/0x124
lockdep_rcu_suspicious+0xe4/0x104
__tcs_buffer_write+0x230/0x2d0
rpmh_rsc_write_ctrl_data+0x210/0x270
rpmh_flush+0x84/0x24c
rpmh_domain_power_off+0x78/0x98
_genpd_power_off+0x40/0xc0
genpd_power_off+0x168/0x208
genpd_power_off+0x1e0/0x208
genpd_power_off+0x1e0/0x208
genpd_runtime_suspend+0x1ac/0x220
__rpm_callback+0x70/0xfc
rpm_callback+0x34/0x8c
rpm_suspend+0x218/0x4a4
__pm_runtime_suspend+0x88/0xac
psci_enter_domain_idle_state+0x3c/0xb4
cpuidle_enter_state+0xb8/0x284
cpuidle_enter+0x38/0x4c
call_cpuidle+0x3c/0x68
do_idle+0x194/0x260
cpu_startup_entry+0x24/0x28
secondary_start_kernel+0x150/0x15c


There's no notifier that calls this. This is called by the rpm_callback
logic. Perhaps that callback will require a call to rcu_irq_enter()
before calling the callback.

In any case, I think it is wrong that these callbacks are called
without RCU watching. The _rcuidle() on that tracepoint should be
removed, and we should fix the code that gets there to ensure that RCU
is watching. I agree with Peter that no module code should be executed
without RCU watching.

-- Steve

2020-04-16 01:17:21

by John Stultz

[permalink] [raw]
Subject: Re: On trace_*_rcuidle functions in modules

On Wed, Apr 15, 2020 at 3:51 PM Steven Rostedt <[email protected]> wrote:
> I was about to suggest a patch, but that has already been written...
>
> 313c8c16ee62b ("PM / CPU: replace raw_notifier with atomic_notifier")
>
> Which surrounds the notifier callbacks with rcu_irq_enter_irqson()
>
> Which means that if John moves the code to use the notifier, then he could
> also remove the _rcuidle(), because RCU will be watching.

So you're saying the recent change to move to using trace_*_rcuidle()
was unnecessary?

Or is there a different notifier than cpu_pm_register_notifier() that
the driver should be using (that one seems to be using
atomic_notifier_chain_register())?

thanks
-john

2020-04-16 01:19:19

by Bjorn Andersson

[permalink] [raw]
Subject: Re: On trace_*_rcuidle functions in modules

On Wed 15 Apr 17:48 PDT 2020, Steven Rostedt wrote:

> On Wed, 15 Apr 2020 17:06:24 -0700
> John Stultz <[email protected]> wrote:
>
> > So you're saying the recent change to move to using trace_*_rcuidle()
> > was unnecessary?
> >
> > Or is there a different notifier than cpu_pm_register_notifier() that
> > the driver should be using (that one seems to be using
> > atomic_notifier_chain_register())?
>
> From looking at the trace event in __tcs_buffer_write() in
> drivers/soc/qcom/rpmh-rsc.c, the _rcuidle() was added by:
>
> efde2659b0fe8 ("drivers: qcom: rpmh-rsc: Use rcuidle tracepoints for rpmh")
>
> Which shows a backtrace dump of:
>
> Call trace:
> dump_backtrace+0x0/0x174
> show_stack+0x20/0x2c
> dump_stack+0xc8/0x124
> lockdep_rcu_suspicious+0xe4/0x104
> __tcs_buffer_write+0x230/0x2d0
> rpmh_rsc_write_ctrl_data+0x210/0x270
> rpmh_flush+0x84/0x24c
> rpmh_domain_power_off+0x78/0x98
> _genpd_power_off+0x40/0xc0
> genpd_power_off+0x168/0x208
> genpd_power_off+0x1e0/0x208
> genpd_power_off+0x1e0/0x208
> genpd_runtime_suspend+0x1ac/0x220
> __rpm_callback+0x70/0xfc
> rpm_callback+0x34/0x8c
> rpm_suspend+0x218/0x4a4
> __pm_runtime_suspend+0x88/0xac
> psci_enter_domain_idle_state+0x3c/0xb4
> cpuidle_enter_state+0xb8/0x284
> cpuidle_enter+0x38/0x4c
> call_cpuidle+0x3c/0x68
> do_idle+0x194/0x260
> cpu_startup_entry+0x24/0x28
> secondary_start_kernel+0x150/0x15c
>
>
> There's no notifier that calls this. This is called by the rpm_callback
> logic. Perhaps that callback will require a call to rcu_irq_enter()
> before calling the callback.
>
> In any case, I think it is wrong that these callbacks are called
> without RCU watching. The _rcuidle() on that tracepoint should be
> removed, and we fix the code that gets there to ensure that RCU is
> enabled. I agree with Peter, that no module code should be executed
> without RCU watching.
>

Forgive me, but how is this problem related to the fact that the code is
dynamically loaded, i.e. encapsulated in a module?

Per the example earlier in this thread, the thing we're worried about is
a use after free in the following scenario, right?

cpu_pm_unregister_notifier(my_notifier);
kfree(my_data);

But a driver implementing this snippet might do this regardless of
being builtin or a module, and afaict exiting probe() unsuccessfully or
unbinding the device would risk triggering this issue?

Regards,
Bjorn

2020-04-16 01:27:22

by Steven Rostedt

[permalink] [raw]
Subject: Re: On trace_*_rcuidle functions in modules

On Wed, 15 Apr 2020 18:02:58 -0700
Bjorn Andersson <[email protected]> wrote:

> Forgive me, but how is this problem related to the fact that the code is
> dynamically loaded, i.e. encapsulated in a module?

It's not.

>
> Per the example earlier in this thread, the thing we're worried about is
> a use after free in the following scenario, right?
>
> cpu_pm_unregister_notifier(my_notifier);
> kfree(my_data);
>
> But a driver implementing this snippet might do this regardless of
> being builtin or a module, and afaict exiting probe() unsuccessfully or
> unbinding the device would risk triggering this issue?

I know my email was confusing. I was talking about a bug that does not
exist. (There is no bug!)

The reason is that RCU is watching during the call to the notifiers.
The above assumes that the my_data usage in the notifier callback is
surrounded by rcu_read_lock() (otherwise it's broken regardless of this
code). The above unregister will call synchronize_rcu() after it
removes the notifier, which guarantees that any in-flight
rcu_read_lock() critical sections have completed. Then kfree(my_data)
can free my_data with no possible users.

-- Steve

2020-04-16 02:21:38

by John Stultz

[permalink] [raw]
Subject: Re: On trace_*_rcuidle functions in modules

On Wed, Apr 15, 2020 at 5:48 PM Steven Rostedt <[email protected]> wrote:
>
> On Wed, 15 Apr 2020 17:06:24 -0700
> John Stultz <[email protected]> wrote:
>
> > So you're saying the recent change to move to using trace_*_rcuidle()
> > was unnecessary?
> >
> > Or is there a different notifier than cpu_pm_register_notifier() that
> > the driver should be using (that one seems to be using
> > atomic_notifier_chain_register())?
>
> From looking at the trace event in __tcs_buffer_write() in
> drivers/soc/qcom/rpmh-rsc.c, the _rcuidle() was added by:
>
> efde2659b0fe8 ("drivers: qcom: rpmh-rsc: Use rcuidle tracepoints for rpmh")
>
> Which shows a backtrace dump of:
>
> Call trace:
> dump_backtrace+0x0/0x174
> show_stack+0x20/0x2c
> dump_stack+0xc8/0x124
> lockdep_rcu_suspicious+0xe4/0x104
> __tcs_buffer_write+0x230/0x2d0
> rpmh_rsc_write_ctrl_data+0x210/0x270
> rpmh_flush+0x84/0x24c
> rpmh_domain_power_off+0x78/0x98
> _genpd_power_off+0x40/0xc0
> genpd_power_off+0x168/0x208
> genpd_power_off+0x1e0/0x208
> genpd_power_off+0x1e0/0x208
> genpd_runtime_suspend+0x1ac/0x220
> __rpm_callback+0x70/0xfc
> rpm_callback+0x34/0x8c
> rpm_suspend+0x218/0x4a4
> __pm_runtime_suspend+0x88/0xac
> psci_enter_domain_idle_state+0x3c/0xb4
> cpuidle_enter_state+0xb8/0x284
> cpuidle_enter+0x38/0x4c
> call_cpuidle+0x3c/0x68
> do_idle+0x194/0x260
> cpu_startup_entry+0x24/0x28
> secondary_start_kernel+0x150/0x15c
>
>
> There's no notifier that calls this. This is called by the rpm_callback
> logic. Perhaps that callback will require a call to rcu_irq_enter()
> before calling the callback.

Yeah. Sorry, it's extra confusing as the call stack there includes
patches whose equivalents are only now in -next (I myself managed to
mix up what was upstream vs. what was in -next in this thread and
suddenly couldn't find the code I had described - apologies).

See:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/soc/qcom/rpmh-rsc.c?h=next-20200415#n795

> In any case, I think it is wrong that these callbacks are called
> without RCU watching. The _rcuidle() on that tracepoint should be
> removed, and we fix the code that gets there to ensure that RCU is
> enabled. I agree with Peter, that no module code should be executed
> without RCU watching.

For sanity's sake, it seems like the rule should be that we avoid
driver code executing without RCU watching. Whether it's a loadable
module or not is a super subtle distinction, and more folks are likely
to trip over it.

But ok. I'll follow up to understand whether commit efde2659b0fe8
("drivers: qcom: rpmh-rsc: Use rcuidle tracepoints for rpmh") is
actually necessary and see what would be needed to revert it.

thanks
-john