Hello Linus,
Once the merge window opens, please pull the latest RCU git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git tags/rcu-next-v6.7
# HEAD: d97ae6474ca0411bb8c2696e5764ec946dba43d0:
Merge branches 'rcu/torture', 'rcu/fixes', 'rcu/docs', 'rcu/refscale', 'rcu/tasks' and 'rcu/stall' into rcu/next (2023-10-23 15:24:11 +0200)
----------------------------------------------------------------
RCU pull request for v6.7
This pull request contains the following branches:
rcu/torture: RCU torture, locktorture and generic torture infrastructure
updates that include various fixes, cleanups and consolidations.
Among the user visible things, ftrace dumps can now be found into
their own file, and module parameters get better documented and
reported on dumps.
rcu/fixes: Generic and misc fixes all over the place. Some highlights:
* Hotplug handling has seen some light cleanups and comments.
* An RCU barrier can now be triggered through sysfs to serialize
memory stress testing and avoid OOM.
* Object information is now dumped in case of invalid callback
invocation.
* Also various SRCU issues, too hard to trigger to deserve urgent
pull requests, have been fixed.
rcu/docs: RCU documentation updates
rcu/refscale: RCU reference scalability test minor fixes and doc
improvements.
rcu/tasks: RCU tasks minor fixes
rcu/stall: Stall detection updates. Introduce RCU CPU Stall notifiers
that allows a subsystem to provide informations to help debugging.
Also cure some false positive stalls.
----------------------------------------------------------------
Arnd Bergmann (1):
rcu: Include torture_sched_setaffinity() declaration
Catalin Marinas (1):
rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects
Dan Carpenter (1):
locktorture: Check the correct variable for allocation failure
Denis Arefev (1):
srcu: Fix srcu_struct node grpmask overflow on 64-bit systems
Frederic Weisbecker (10):
rcu: Use rcu_segcblist_segempty() instead of open coding it
rcu: Assume IRQS disabled from rcu_report_dead()
rcu: Assume rcu_report_dead() is always called locally
rcu: Remove references to rcu_migrate_callbacks() from diagrams
rcu: Conditionally build CPU-hotplug teardown callbacks
rcu: Standardize explicit CPU-hotplug calls
rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP
srcu: Fix callbacks acceleration mishandling
srcu: Only accelerate on enqueue time
Merge branches 'rcu/torture', 'rcu/fixes', 'rcu/docs', 'rcu/refscale', 'rcu/tasks' and 'rcu/stall' into rcu/next
Jiapeng Chong (1):
rcu-tasks: Make rcu_tasks_lazy_ms static
Joel Fernandes (Google) (7):
rcu/tree: Defer setting of jiffies during stall reset
Revert "checkpatch: Error out if deprecated RCU API used"
srcu: Fix error handling in init_srcu_struct_fields()
rcu/tree: Remove superfluous return from void call_rcu* functions
rcutorture: Fix stuttering races and other issues
rcutorture: Copy out ftrace into its own console file
rcutorture: Replace schedule_timeout*() 1-jiffy waits with HZ/20
Matthew Wilcox (Oracle) (1):
rcu: Describe listRCU read-side guarantees
Paul E. McKenney (26):
rcu: Add RCU CPU stall notifier
rcutorture: Add test of RCU CPU stall notifiers
rcu-tasks: Add printk()s to localize boot-time self-test hang
rcu-tasks: Pull sampling of ->percpu_dequeue_lim out of loop
refscale: Fix misplaced data re-read
refscale: Print out additional module parameters
doc: Add refscale.lookup_instances to kernel-parameters.txt
rcu: Add sysfs to provide throttled access to rcu_barrier()
rcu: Eliminate rcu_gp_slow_unregister() false positive
torture: Share torture_random_state with torture_shuffle_tasks()
torture: Make kvm-recheck.sh use mktemp
torture: Make torture_hrtimeout_ns() take an hrtimer mode parameter
torture: Move rcutorture_sched_setaffinity() out of rcutorture
locktorture: Add readers_bind and writers_bind module parameters
rcutorture: Add CONFIG_DEBUG_OBJECTS to RCU Tasks testing
locktorture: Alphabetize torture_param() entries
locktorture: Consolidate "if" statements in lock_torture_writer()
locktorture: Add acq_writer_lim to complain about long acquistion times
torture: Print out torture module parameters
torture: Make torture.sh refscale testing qualify verbose_batched
locktorture: Add new module parameters to lock_torture_print_module_parms()
locktorture: Add call_rcu_chains module parameter
doc: Catch-up update for locktorture module parameters
locktorture: Rename readers_bind/writers_bind to bind_readers/bind_writers
torture: Add kvm.sh --debug-info argument
torture: Convert parse-console.sh to mktemp
Wei Zhang (1):
Documentation: RCU: Fix section numbers after adding Section 7 in whatisRCU.rst
Yue Haibing (1):
rcu: Remove unused function declaration rcu_eqs_special_set()
Zhen Lei (5):
rcu: Delete a redundant check in rcu_check_gp_kthread_starvation()
rcu: Don't redump the stalled CPU where RCU GP kthread last ran
rcu: Eliminate check_cpu_stall() duplicate code
mm: Remove kmem_valid_obj()
rcu: Dump memory object info if callback function is invalid
Zqiang (1):
rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle()
.../Expedited-Grace-Periods.rst | 2 +-
.../Memory-Ordering/TreeRCU-callback-registry.svg | 9 -
.../RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg | 4 +-
.../RCU/Design/Memory-Ordering/TreeRCU-gp.svg | 13 +-
.../RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg | 4 +-
.../RCU/Design/Requirements/Requirements.rst | 4 +-
Documentation/RCU/listRCU.rst | 9 +
Documentation/RCU/whatisRCU.rst | 4 +-
Documentation/admin-guide/kernel-parameters.txt | 70 +++++-
arch/arm64/kernel/smp.c | 4 +-
arch/powerpc/kernel/smp.c | 2 +-
arch/s390/kernel/smp.c | 2 +-
arch/x86/kernel/smpboot.c | 2 +-
include/linux/interrupt.h | 2 +-
include/linux/rcu_notifier.h | 32 +++
include/linux/rcupdate.h | 2 -
include/linux/rcutiny.h | 2 +-
include/linux/rcutree.h | 17 +-
include/linux/slab.h | 5 +-
include/linux/torture.h | 8 +-
kernel/cpu.c | 13 +-
kernel/locking/locktorture.c | 214 +++++++++++++-----
kernel/rcu/rcu.h | 17 +-
kernel/rcu/rcu_segcblist.c | 4 +-
kernel/rcu/rcutorture.c | 37 +++-
kernel/rcu/refscale.c | 6 +-
kernel/rcu/srcutiny.c | 1 +
kernel/rcu/srcutree.c | 74 +++++--
kernel/rcu/tasks.h | 11 +-
kernel/rcu/tiny.c | 1 +
kernel/rcu/tree.c | 242 +++++++++++++++------
kernel/rcu/tree.h | 4 +
kernel/rcu/tree_exp.h | 6 +-
kernel/rcu/tree_stall.h | 135 ++++++++----
kernel/rcu/update.c | 9 +-
kernel/torture.c | 75 +++----
mm/slab_common.c | 41 +---
mm/util.c | 4 +-
scripts/checkpatch.pl | 9 -
.../testing/selftests/rcutorture/bin/functions.sh | 29 +++
.../selftests/rcutorture/bin/kvm-recheck.sh | 2 +-
tools/testing/selftests/rcutorture/bin/kvm.sh | 17 +-
.../selftests/rcutorture/bin/parse-console.sh | 9 +-
tools/testing/selftests/rcutorture/bin/torture.sh | 2 +-
.../selftests/rcutorture/configs/rcu/TRACE02 | 1 +
45 files changed, 809 insertions(+), 351 deletions(-)
create mode 100644 include/linux/rcu_notifier.h
mode change 100644 => 100755 tools/testing/selftests/rcutorture/bin/functions.sh
On Fri, 27 Oct 2023 at 01:33, Frederic Weisbecker <[email protected]> wrote:
>
> rcu/stall: Stall detection updates. Introduce RCU CPU Stall notifiers
> that allows a subsystem to provide informations to help debugging.
> Also cure some false positive stalls.
I absolutely detest this stall notifier thing.
Putting the stall notifier before the stall message does not "help
debugging". Quite the reverse. It ends up being a lovely way to make
sure that the debug message is never printed, because there's some
entirely untested - and thus buggy - notifier on the chain before the
printout from the actual stall code.
I've pulled this, but I really want to voice my objection against
these kinds of "debugging aids". I have personally spent way too many
hours debugging a dead machine because some "debug aid" ended up being
untested garbage.
If you absolutely think that this is a worthy and useful thing to do,
then at the very least make sure that these "debug aids" will always
come *after* the core output, and can't make things horrendously
worse.
But in general, think twice before adding "maybe somebody else wants
to print debug info". Because unless you have a really really REALLY
good reason for it, it's more likely to hurt than to help.
Right now I see no users of this except for the rcu torture code, and
it certainly doesn't seem hugely important there. And so I'm wondering
what the actual real use-case would be.
Linus
The pull request you sent on Fri, 27 Oct 2023 13:33:15 +0200:
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git tags/rcu-next-v6.7
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/2656821f1f202d58224551b71eff41aafd1edf8b
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
On Mon, Oct 30, 2023 at 06:12:51PM -1000, Linus Torvalds wrote:
> On Fri, 27 Oct 2023 at 01:33, Frederic Weisbecker <[email protected]> wrote:
> >
> > rcu/stall: Stall detection updates. Introduce RCU CPU Stall notifiers
> > that allows a subsystem to provide informations to help debugging.
> > Also cure some false positive stalls.
>
> I absolutely detest this stall notifier thing.
>
> Putting the stall notifier before the stall message does not "help
> debugging". Quite the reverse. It ends up being a lovely way to make
> sure that the debug message is never printed, because there's some
> entirely untested - and thus buggy - notifier on the chain before the
> printout from the actual stall code.
>
> I've pulled this, but I really want to voice my objection against
> these kinds of "debugging aids". I have personally spent way too many
> hours debugging a dead machine because some "debug aid" ended up being
> untested garbage.
>
> If you absolutely think that this is a worthy and useful thing to do,
> then at the very least make sure that these "debug aids" will always
> come *after* the core output, and can't make things horrendously
> worse.
>
> But in general, think twice before adding "maybe somebody else wants
> to print debug info". Because unless you have a really really REALLY
> good reason for it, it's more likely to hurt than to help.
>
> Right now I see no users of this except for the rcu torture code, and
> it certainly doesn't seem hugely important there. And so I'm wondering
> what the actual real use-case would be.
I see, one possibility is to revert this and switch to normal calls
for any future debug information to add from another subsystem. I'll
wait for Paul's opinion...
Thanks.
On Tue, Oct 31, 2023 at 11:19:01AM +0100, Frederic Weisbecker wrote:
> On Mon, Oct 30, 2023 at 06:12:51PM -1000, Linus Torvalds wrote:
> > On Fri, 27 Oct 2023 at 01:33, Frederic Weisbecker <[email protected]> wrote:
> > >
> > > rcu/stall: Stall detection updates. Introduce RCU CPU Stall notifiers
> > > that allows a subsystem to provide informations to help debugging.
> > > Also cure some false positive stalls.
> >
> > I absolutely detest this stall notifier thing.
> >
> > Putting the stall notifier before the stall message does not "help
> > debugging". Quite the reverse. It ends up being a lovely way to make
> > sure that the debug message is never printed, because there's some
> > entirely untested - and thus buggy - notifier on the chain before the
> > printout from the actual stall code.
> >
> > I've pulled this, but I really want to voice my objection against
> > these kinds of "debugging aids". I have personally spent way too many
> > hours debugging a dead machine because some "debug aid" ended up being
> > untested garbage.
> >
> > If you absolutely think that this is a worthy and useful thing to do,
> > then at the very least make sure that these "debug aids" will always
> > come *after* the core output, and can't make things horrendously
> > worse.
> >
> > But in general, think twice before adding "maybe somebody else wants
> > to print debug info". Because unless you have a really really REALLY
> > good reason for it, it's more likely to hurt than to help.
> >
> > Right now I see no users of this except for the rcu torture code, and
> > it certainly doesn't seem hugely important there. And so I'm wondering
> > what the actual real use-case would be.
>
> I see, one possibility is to revert this and switch to normal calls
> for any future debug information to add from another subsystem. I'll
> wait for Paul's opinion...
The use case thus far is where the RCU CPU stall warning is due to
locks being spun for or held for excessive periods of times, and then
the called code prints out the relevant debug information. In this
particular case, the RCU CPU stall warning message is just added noise.
And if we were to print the RCU CPU stall warning first, we would
likely disturb the locking state, thus rendering the corresponding
debug information useless.
But I completely agree that a poorly planned use of this facility would
have all the problems that Linus has seen in the past.
Would it help if we make rcu_stall_chain_notifier_register() print a
suitably obnoxious message saying that future RCU CPU stall warnings
might be unreliable?
Thanx, Paul
On Tue, 31 Oct 2023 at 03:57, Paul E. McKenney <[email protected]> wrote:
>
> Would it help if we make rcu_stall_chain_notifier_register() print a
> suitably obnoxious message saying that future RCU CPU stall warnings
> might be unreliable?
It's not the future stall warnings I worry about.
It's literally things like somebody thinking they are being clever,
registering a rcu stall notifier that prints out extra information in
order to be helpful, and in the process takes a spinlock or something
without thinking about it.
And that spinlock might be the *reason* for the RCU stall in the first place.
So now the RCU stall code prints out NOTHING AT ALL, because now the
stall notifier itself has deadlocked.
This is *exactly* what has happened before with these kinds of
"helpful" exception case notifiers. Because they never trigger in
normal loads, they get basically zero testing - and then when bad
things happen, it turns out that the "helpful" debug code actually
just makes things worse.
Or, if they get testing, they get tested in artificial bad cases (eg
"let's just write a busy loop that hangs for 30 seconds to trigger a
RCU stall"), which doesn't show any of the issues, because they aren't
real bugs with real existing deadlocks.
See what I'm saying? Having notifiers for "sh*t happened" is
fundmanetally questionable to begin with, because they get no testing.
And then calling said notifiers *before* you even have the core
printout for "Look, things are going down hill quickly", now you've
turned a bad situation even worse.
I really think that we should *never* have any kind of notifiers for
kernel bugs. They cause problems. The *one* exception is an actual
honest-to-goodness kernel debugger, and then it should literally
*only* be the debugger that can register a notifier, so that you are
*never* in the situation that a kernel without a debugger will just
hang because of some bogus debug notifier.
Linus
On Tue, Oct 31, 2023 at 01:06:44PM -1000, Linus Torvalds wrote:
> On Tue, 31 Oct 2023 at 03:57, Paul E. McKenney <[email protected]> wrote:
> >
> > Would it help if we make rcu_stall_chain_notifier_register() print a
> > suitably obnoxious message saying that future RCU CPU stall warnings
> > might be unreliable?
>
> It's not the future stall warnings I worry about.
>
> It's literally things like somebody thinking they are being clever,
> registering a rcu stall notifier that prints out extra information in
> order to be helpful, and in the process takes a spinlock or something
> without thinking about it.
>
> And that spinlock might be the *reason* for the RCU stall in the first place.
>
> So now the RCU stall code prints out NOTHING AT ALL, because now the
> stall notifier itself has deadlocked.
>
> This is *exactly* what has happened before with these kinds of
> "helpful" exception case notifiers. Because they never trigger in
> normal loads, they get basically zero testing - and then when bad
> things happen, it turns out that the "helpful" debug code actually
> just makes things worse.
>
> Or, if they get testing, they get tested in artificial bad cases (eg
> "let's just write a busy loop that hangs for 30 seconds to trigger a
> RCU stall"), which doesn't show any of the issues, because they aren't
> real bugs with real existing deadlocks.
>
> See what I'm saying? Having notifiers for "sh*t happened" is
> fundmanetally questionable to begin with, because they get no testing.
>
> And then calling said notifiers *before* you even have the core
> printout for "Look, things are going down hill quickly", now you've
> turned a bad situation even worse.
>
> I really think that we should *never* have any kind of notifiers for
> kernel bugs. They cause problems. The *one* exception is an actual
> honest-to-goodness kernel debugger, and then it should literally
> *only* be the debugger that can register a notifier, so that you are
> *never* in the situation that a kernel without a debugger will just
> hang because of some bogus debug notifier.
All fair points.
Here are the ways forward I can see:
1. Status quo. This has all the issues that you call out.
People will hurt themselves with it and consume time and effort.
So let's not do this.
2. I send you a pure revert. Those of us who need this keep the
patches around and apply them when we need them. This avoids
the problems you point out, but makes it harder to use this
where it is needed and useful.
3. Add a default-n Kconfig option that depends on RCU_EXPERT
and KEBUG_KERNEL, so that these problems can only arise in
specially built kernels.
4. Same as #3, but use a kernel boot parameter instead of a
Kconfig option.
5. One of the above other than #2, but complaining (maybe a WARN_ON()
or maybe just a printk() at rcu_stall_chain_notifier_register()
time, but before the call to atomic_notifier_chain_register().
This would mean that the complaint ("hey, you are asking for
something that might be dangerous") appears before any RCU CPU
stall warning that could possibly trigger a notifier.
Are there any other ways forward? Either way, which would you prefer?
Thanx, Paul
On Tue, 31 Oct 2023 at 15:08, Paul E. McKenney <[email protected]> wrote:
>
> Here are the ways forward I can see:
>
> 1. Status quo. This has all the issues that you call out.
> People will hurt themselves with it and consume time and effort.
> So let's not do this.
Well, at a *minimum*, I really want that notifier chain call to be
done *after* the core printk's.
That way, if it deadlocks or does something else stupid, at least the
core printouts make it out.
IOW, I think the notifier should be done perhaps just before the
"panic_on_rcu_stall()" call, not at the top before you've even
reported any stall conditions at all.
And yes, I think the trace_rcu_stall_warning() might be better off
later too, but at least trace events are things that get regular
testing in nasty conditions (including NMI etc), so I'm *much* less
worried about those than about "random developers who think they know
what they do and add a notifier".
And yes, I do think the notifier should be narrowed down a lot, if you
actually want to keep it.
I did not actually hear you say that there is a good use-case for it.
I only saw you say "Those of us who need this", without showing *any*
kind of indication of why anybody would use it in reality.
Why the secrecy? There is certainly no current user, nor any
description of what a user would be and what makes that notifier
useful.
The commit message also just says "It is sometimes helpful" and some
strange reference to "the subsystem causing the stall to dump its
state". It all sounds very fishy. Why would anybody ever have a known
subsystem causing RCU stalls? Except, of course, for the rcutorture
testing.
Anyway, that all absolutely SCREAMS to me "this is not something
useful in any normal kernel", and so yes:
> 3. Add a default-n Kconfig option that depends on RCU_EXPERT
> and KEBUG_KERNEL, so that these problems can only arise in
> specially built kernels.
>
> 4. Same as #3, but use a kernel boot parameter instead of a
> Kconfig option.
let's make it clear that this is *not* something that any upstream
kernel would ever do, and the *only* possible use for it is some kind
of external temporary debug patch.
See why I so hate things like this? Let's head off any crazy use long
*long* before somebody decides that "Oh, I want to use this".
Linus
On Tue, Oct 31, 2023 at 06:07:57PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 31, 2023 at 01:06:44PM -1000, Linus Torvalds wrote:
[ . . . ]
> > I really think that we should *never* have any kind of notifiers for
> > kernel bugs. They cause problems. The *one* exception is an actual
> > honest-to-goodness kernel debugger, and then it should literally
> > *only* be the debugger that can register a notifier, so that you are
> > *never* in the situation that a kernel without a debugger will just
> > hang because of some bogus debug notifier.
Here you might have been suggesting that I use gdb and just set a
breakpoint in check_cpu_stall(), and then use gdb commands to read out
the state. And yes, this work well in some situations. In fact, there
is a --gdb parameter to the rcutorture scripting for just this purpose.
Except that I normally run a few hundred rcutorture guest OSes spread
across 20 systems, and sometimes more than a thousand guest OSes across
50 systems for hard-to-reproduce bugs. In my experience, managing that
many remote gdb sessions is cranky and unreliable, which is not helpful
when debugging. Writing a few tens of lines of C code in the kernel is
much simpler and more reliable.
Assuming of course that I avoid the traps you point out. Which I have
done thus far. (Famous last words...)
Thanx, Paul
On Wed, Nov 01, 2023 at 07:11:54AM -1000, Linus Torvalds wrote:
> On Tue, 31 Oct 2023 at 15:08, Paul E. McKenney <[email protected]> wrote:
> >
> > Here are the ways forward I can see:
> >
> > 1. Status quo. This has all the issues that you call out.
> > People will hurt themselves with it and consume time and effort.
> > So let's not do this.
>
> Well, at a *minimum*, I really want that notifier chain call to be
> done *after* the core printk's.
>
> That way, if it deadlocks or does something else stupid, at least the
> core printouts make it out.
>
> IOW, I think the notifier should be done perhaps just before the
> "panic_on_rcu_stall()" call, not at the top before you've even
> reported any stall conditions at all.
Understood. But my problem is that the core printk()s destroy the state
that the notifier is trying to output.
> And yes, I think the trace_rcu_stall_warning() might be better off
> later too, but at least trace events are things that get regular
> testing in nasty conditions (including NMI etc), so I'm *much* less
> worried about those than about "random developers who think they know
> what they do and add a notifier".
Agreed, this is a special debug facility, not something that anyone
should use in production. And also not something that should be used
where gdb would do the job.
> And yes, I do think the notifier should be narrowed down a lot, if you
> actually want to keep it.
Understood, thus a new default-disabled Kconfig option that depends on
RCU_EXPERT and DEBUG_KERNEL, along with a default-disabled kernel
boot parameter, both of which have to be selected to make anything
happen.
> I did not actually hear you say that there is a good use-case for it.
> I only saw you say "Those of us who need this", without showing *any*
> kind of indication of why anybody would use it in reality.
>
> Why the secrecy? There is certainly no current user, nor any
> description of what a user would be and what makes that notifier
> useful.
>
> The commit message also just says "It is sometimes helpful" and some
> strange reference to "the subsystem causing the stall to dump its
> state". It all sounds very fishy. Why would anybody ever have a known
> subsystem causing RCU stalls? Except, of course, for the rcutorture
> testing.
One use case is dumping out the qspinlock state for an extremely
rare lockup. If you even look at the system cross-eyed, the lockup
goes away. And yes, I should have mentioned this in the commit
log, and I apologize for having failed to do so. I do not expect
that the state-dump code would ever be appropriate for mainline.
> Anyway, that all absolutely SCREAMS to me "this is not something
> useful in any normal kernel", and so yes:
Agreed, definitely not for any normal kernel!
> > 3. Add a default-n Kconfig option that depends on RCU_EXPERT
> > and KEBUG_KERNEL, so that these problems can only arise in
> > specially built kernels.
> >
> > 4. Same as #3, but use a kernel boot parameter instead of a
> > Kconfig option.
>
> let's make it clear that this is *not* something that any upstream
> kernel would ever do, and the *only* possible use for it is some kind
> of external temporary debug patch.
>
> See why I so hate things like this? Let's head off any crazy use long
> *long* before somebody decides that "Oh, I want to use this".
You are absolutely right, a debug tool with this many sharp edges should
definitely not be default-enabled. And needs some scary words in the
Kconfig help text. And a boot-time splat to make people think twice
before using it.
Apologies for not having thought this through!
I will send a fixup patch before the end of today.
Thanx, Paul
On Wed, Nov 01, 2023 at 10:40:14AM -0700, Paul E. McKenney wrote:
> On Wed, Nov 01, 2023 at 07:11:54AM -1000, Linus Torvalds wrote:
> > On Tue, 31 Oct 2023 at 15:08, Paul E. McKenney <[email protected]> wrote:
> > >
> > > Here are the ways forward I can see:
> > >
> > > 1. Status quo. This has all the issues that you call out.
> > > People will hurt themselves with it and consume time and effort.
> > > So let's not do this.
> >
> > Well, at a *minimum*, I really want that notifier chain call to be
> > done *after* the core printk's.
> >
> > That way, if it deadlocks or does something else stupid, at least the
> > core printouts make it out.
> >
> > IOW, I think the notifier should be done perhaps just before the
> > "panic_on_rcu_stall()" call, not at the top before you've even
> > reported any stall conditions at all.
>
> Understood. But my problem is that the core printk()s destroy the state
> that the notifier is trying to output.
>
> > And yes, I think the trace_rcu_stall_warning() might be better off
> > later too, but at least trace events are things that get regular
> > testing in nasty conditions (including NMI etc), so I'm *much* less
> > worried about those than about "random developers who think they know
> > what they do and add a notifier".
>
> Agreed, this is a special debug facility, not something that anyone
> should use in production. And also not something that should be used
> where gdb would do the job.
>
> > And yes, I do think the notifier should be narrowed down a lot, if you
> > actually want to keep it.
>
> Understood, thus a new default-disabled Kconfig option that depends on
> RCU_EXPERT and DEBUG_KERNEL, along with a default-disabled kernel
> boot parameter, both of which have to be selected to make anything
> happen.
>
> > I did not actually hear you say that there is a good use-case for it.
> > I only saw you say "Those of us who need this", without showing *any*
> > kind of indication of why anybody would use it in reality.
> >
> > Why the secrecy? There is certainly no current user, nor any
> > description of what a user would be and what makes that notifier
> > useful.
> >
> > The commit message also just says "It is sometimes helpful" and some
> > strange reference to "the subsystem causing the stall to dump its
> > state". It all sounds very fishy. Why would anybody ever have a known
> > subsystem causing RCU stalls? Except, of course, for the rcutorture
> > testing.
>
> One use case is dumping out the qspinlock state for an extremely
> rare lockup. If you even look at the system cross-eyed, the lockup
> goes away. And yes, I should have mentioned this in the commit
> log, and I apologize for having failed to do so. I do not expect
> that the state-dump code would ever be appropriate for mainline.
>
> > Anyway, that all absolutely SCREAMS to me "this is not something
> > useful in any normal kernel", and so yes:
>
> Agreed, definitely not for any normal kernel!
>
> > > 3. Add a default-n Kconfig option that depends on RCU_EXPERT
> > > and KEBUG_KERNEL, so that these problems can only arise in
> > > specially built kernels.
> > >
> > > 4. Same as #3, but use a kernel boot parameter instead of a
> > > Kconfig option.
> >
> > let's make it clear that this is *not* something that any upstream
> > kernel would ever do, and the *only* possible use for it is some kind
> > of external temporary debug patch.
> >
> > See why I so hate things like this? Let's head off any crazy use long
> > *long* before somebody decides that "Oh, I want to use this".
>
> You are absolutely right, a debug tool with this many sharp edges should
> definitely not be default-enabled. And needs some scary words in the
> Kconfig help text. And a boot-time splat to make people think twice
> before using it.
>
> Apologies for not having thought this through!
>
> I will send a fixup patch before the end of today.
And here is a lightly tested first cut.
This still invokes the notifier before the stall warning prints.
If you are absolutely against that approach, then we should just revert.
In that case, there would be no useful intersection between "acceptable in
mainline" and "useful in testing involving persistent transient states".
Hey, worse things could happen. ;-)
With that, please see below.
Oh, and the rcutorture changes are needed to avoid an unsolicited
splat during the usual rcutorture testing.
Thoughts?
Thanx, Paul
------------------------------------------------------------------------
commit fd4a4cfd0c085c70c76f9b667a001d05462983d7
Author: Paul E. McKenney <[email protected]>
Date: Wed Nov 1 18:28:38 2023 -0700
rcu: Restrict access to RCU CPU stall notifiers
Although the RCU CPU stall notifiers can be useful for dumping state when
tracking down delicate forward-progress bugs where NUMA effects cause
cache lines to be delivered to a given CPU regularly, but always in a
state that prevents that CPU from making forward progress. These bugs can
be detected by the RCU CPU stall-warning mechanism, but in some cases,
the stall-warnings printk()s disrupt the forward-progress bug before
any useful state can be obtained.
Unfortunately, the notifier mechanism added by 5b404fdabacf ("rcu: Add RCU
CPU stall notifier") can make matters worse if used at all carelessly.
For example, if the stall warning was caused by a lock not being
released, then any attempt to acquire that lock in the notifier will hang.
This will prevent not only the notifier from producing any useful output,
but it will also prevent the stall-warning message from ever appearing.
This commit therefore hides this new RCU CPU stall notifier
mechanism under a new RCU_CPU_STALL_NOTIFIER Kconfig option that
depends on both DEBUG_KERNEL and RCU_EXPERT. In addition, the
rcupdate.rcu_cpu_stall_notifiers=1 kernel boot parameter must also
be specified. The RCU_CPU_STALL_NOTIFIER Kconfig option's help text
contains a warning and explains the dangers of careless use, recommending
lockless notifier code. In addition, a WARN() is triggered each time
that an attempt is made to register a stall-warning notifier in kernels
built with CONFIG_RCU_CPU_STALL_NOTIFIER=y.
This combination of measures will keep use of this mechanism confined to
debug kernels and away from routine deployments.
Fixes: 5b404fdabacf ("rcu: Add RCU CPU stall notifier")
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index dca212b6adfbb..fbf249ca7b229 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5289,6 +5289,12 @@
Dump ftrace buffer after reporting RCU CPU
stall warning.
+ rcupdate.rcu_cpu_stall_notifiers= [KNL]
+ Provide RCU CPU stall notifiers, but see the
+ warnings in the RCU_CPU_STALL_NOTIFIER Kconfig
+ option's help text. TL;DR: You almost certainly
+ do not want rcupdate.rcu_cpu_stall_notifiers.
+
rcupdate.rcu_cpu_stall_suppress= [KNL]
Suppress RCU CPU stall warning messages.
diff --git a/include/linux/rcu_notifier.h b/include/linux/rcu_notifier.h
index ebf371364581d..5640f024773b3 100644
--- a/include/linux/rcu_notifier.h
+++ b/include/linux/rcu_notifier.h
@@ -13,7 +13,7 @@
#define RCU_STALL_NOTIFY_NORM 1
#define RCU_STALL_NOTIFY_EXP 2
-#ifdef CONFIG_RCU_STALL_COMMON
+#if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)
#include <linux/notifier.h>
#include <linux/types.h>
@@ -21,12 +21,12 @@
int rcu_stall_chain_notifier_register(struct notifier_block *n);
int rcu_stall_chain_notifier_unregister(struct notifier_block *n);
-#else // #ifdef CONFIG_RCU_STALL_COMMON
+#else // #if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)
// No RCU CPU stall warnings in Tiny RCU.
static inline int rcu_stall_chain_notifier_register(struct notifier_block *n) { return -EEXIST; }
static inline int rcu_stall_chain_notifier_unregister(struct notifier_block *n) { return -ENOENT; }
-#endif // #else // #ifdef CONFIG_RCU_STALL_COMMON
+#endif // #else // #if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)
#endif /* __LINUX_RCU_NOTIFIER_H */
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 2984de629f749..9b0b52e1836fa 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -105,6 +105,31 @@ config RCU_CPU_STALL_CPUTIME
The boot option rcupdate.rcu_cpu_stall_cputime has the same function
as this one, but will override this if it exists.
+config RCU_CPU_STALL_NOTIFIER
+ bool "Provide RCU CPU-stall notifiers"
+ depends on RCU_STALL_COMMON
+ depends on DEBUG_KERNEL
+ depends on RCU_EXPERT
+ default n
+ help
+ WARNING: You almost certainly do not want this!!!
+
+ Enable RCU CPU-stall notifiers, which are invoked just before
+ printing the RCU CPU stall warning. As such, bugs in notifier
+ callbacks can prevent stall warnings from being printed.
+ And the whole reason that a stall warning is being printed is
+ that something is hung up somewhere. Therefore, the notifier
+ callbacks must be written extremely carefully, preferably
+ containing only lockless code. After all, it is quite possible
+ that the whole reason that the RCU CPU stall is happening in
+ the first place is that someone forgot to release whatever lock
+ that you are thinking of acquiring. In which case, having your
+ notifier callback acquire that lock will hang, preventing the
+ RCU CPU stall warning from appearing.
+
+ Say Y here if you want RCU CPU stall notifiers (you don't want them)
+ Say N if you are unsure.
+
config RCU_TRACE
bool "Enable tracing for RCU"
depends on DEBUG_KERNEL
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index b531c33e9545b..3a2f8c5b60318 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -265,6 +265,7 @@ static inline bool rcu_stall_is_suppressed_at_boot(void)
#ifdef CONFIG_RCU_STALL_COMMON
extern int rcu_cpu_stall_ftrace_dump;
+extern int rcu_cpu_stall_notifiers;
extern int rcu_cpu_stall_suppress;
extern int rcu_cpu_stall_timeout;
extern int rcu_exp_cpu_stall_timeout;
@@ -659,10 +660,10 @@ static inline bool rcu_cpu_beenfullyonline(int cpu) { return true; }
bool rcu_cpu_beenfullyonline(int cpu);
#endif
-#ifdef CONFIG_RCU_STALL_COMMON
+#if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)
int rcu_stall_notifier_call_chain(unsigned long val, void *v);
-#else // #ifdef CONFIG_RCU_STALL_COMMON
+#else // #if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)
static inline int rcu_stall_notifier_call_chain(unsigned long val, void *v) { return NOTIFY_DONE; }
-#endif // #else // #ifdef CONFIG_RCU_STALL_COMMON
+#endif // #else // #if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)
#endif /* __LINUX_RCU_H */
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 30fc9d34e3297..f486ce7a26017 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2450,10 +2450,12 @@ static int rcu_torture_stall(void *args)
unsigned long stop_at;
VERBOSE_TOROUT_STRING("rcu_torture_stall task started");
- ret = rcu_stall_chain_notifier_register(&rcu_torture_stall_block);
- if (ret)
- pr_info("%s: rcu_stall_chain_notifier_register() returned %d, %sexpected.\n",
- __func__, ret, !IS_ENABLED(CONFIG_RCU_STALL_COMMON) ? "un" : "");
+ if (rcu_cpu_stall_notifiers) {
+ ret = rcu_stall_chain_notifier_register(&rcu_torture_stall_block);
+ if (ret)
+ pr_info("%s: rcu_stall_chain_notifier_register() returned %d, %sexpected.\n",
+ __func__, ret, !IS_ENABLED(CONFIG_RCU_STALL_COMMON) ? "un" : "");
+ }
if (stall_cpu_holdoff > 0) {
VERBOSE_TOROUT_STRING("rcu_torture_stall begin holdoff");
schedule_timeout_interruptible(stall_cpu_holdoff * HZ);
@@ -2498,9 +2500,11 @@ static int rcu_torture_stall(void *args)
}
pr_alert("%s end.\n", __func__);
if (!ret) {
- ret = rcu_stall_chain_notifier_unregister(&rcu_torture_stall_block);
- if (ret)
- pr_info("%s: rcu_stall_chain_notifier_unregister() returned %d.\n", __func__, ret);
+ if (rcu_cpu_stall_notifiers) {
+ ret = rcu_stall_chain_notifier_unregister(&rcu_torture_stall_block);
+ if (ret)
+ pr_info("%s: rcu_stall_chain_notifier_unregister() returned %d.\n", __func__, ret);
+ }
}
torture_shutdown_absorb("rcu_torture_stall");
while (!kthread_should_stop())
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index ac8e86babe449..5d666428546b0 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -1061,6 +1061,7 @@ static int __init rcu_sysrq_init(void)
}
early_initcall(rcu_sysrq_init);
+#ifdef CONFIG_RCU_CPU_STALL_NOTIFIER
//////////////////////////////////////////////////////////////////////////////
//
@@ -1081,7 +1082,13 @@ static ATOMIC_NOTIFIER_HEAD(rcu_cpu_stall_notifier_list);
*/
int rcu_stall_chain_notifier_register(struct notifier_block *n)
{
- return atomic_notifier_chain_register(&rcu_cpu_stall_notifier_list, n);
+ int rcsn = rcu_cpu_stall_notifiers;
+
+ WARN(1, "Adding %pS() to RCU stall notifier list (%s).\n", n->notifier_call,
+ rcsn ? "possibly suppressing RCU CPU stall warnings" : "failed, so all is well");
+ if (rcsn)
+ return atomic_notifier_chain_register(&rcu_cpu_stall_notifier_list, n);
+ return -EEXIST;
}
EXPORT_SYMBOL_GPL(rcu_stall_chain_notifier_register);
@@ -1115,3 +1122,5 @@ int rcu_stall_notifier_call_chain(unsigned long val, void *v)
{
return atomic_notifier_call_chain(&rcu_cpu_stall_notifier_list, val, v);
}
+
+#endif // #ifdef CONFIG_RCU_CPU_STALL_NOTIFIER
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index c534d6806d3d5..95d6738d70f63 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -541,6 +541,11 @@ EXPORT_SYMBOL_GPL(torture_sched_setaffinity);
#ifdef CONFIG_RCU_STALL_COMMON
int rcu_cpu_stall_ftrace_dump __read_mostly;
module_param(rcu_cpu_stall_ftrace_dump, int, 0644);
+int rcu_cpu_stall_notifiers __read_mostly; // !0 = provide stall notifiers (rarely useful)
+EXPORT_SYMBOL_GPL(rcu_cpu_stall_notifiers);
+#ifdef CONFIG_RCU_CPU_STALL_NOTIFIER
+module_param(rcu_cpu_stall_notifiers, int, 0444);
+#endif // #ifdef CONFIG_RCU_CPU_STALL_NOTIFIER
int rcu_cpu_stall_suppress __read_mostly; // !0 = suppress stall warnings.
EXPORT_SYMBOL_GPL(rcu_cpu_stall_suppress);
module_param(rcu_cpu_stall_suppress, int, 0644);