2022-10-20 00:42:04

by Steven Rostedt

Subject: Re: [PATCH] tracing: Add trace_trigger kernel command line option

On Wed, 19 Oct 2022 20:01:37 -0400
Steven Rostedt <[email protected]> wrote:

> @@ -90,6 +90,10 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
> #ifdef CONFIG_TRACEPOINTS
> static inline void tracepoint_synchronize_unregister(void)
> {
> + /* Early updates do not need synchronization */
> + if (early_boot_irqs_disabled)
> + return;
> +
> synchronize_srcu(&tracepoint_srcu);
> synchronize_rcu();

I wonder if this check should be just moved to the RCU synchronization
code? That is, if early_boot_irqs_disabled is set, do nothing, as there
should be nothing to synchronize against.
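
Roughly, the shape I have in mind is a single early-boot bailout in the common
synchronization path instead of a guard at every caller. A minimal sketch only
(hypothetical function name, and early_boot_irqs_disabled is just a stand-in
for whatever condition the RCU code would really want to test):

/* Hypothetical sketch, not a patch: bail out of grace-period waits once,
 * in the synchronization path, rather than guarding each tracing call site.
 * The condition below is a stand-in; the RCU side may well prefer something
 * like its scheduler-active state instead. */
void synchronize_rcu_sketch(void)
{
	if (early_boot_irqs_disabled)
		return;		/* nothing to synchronize against this early */

	/* ... the normal grace-period wait would follow here ... */
}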

It took me a bit to figure out why I was constantly getting:

[ 1.072013] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 1.078888] ------------[ cut here ]------------
[ 1.083422] Interrupts were enabled early
[ 1.087399] WARNING: CPU: 0 PID: 0 at init/main.c:1061 start_kernel+0x36f/0x4e8
[ 1.094656] Modules linked in:
[ 1.097683] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc1-test+ #461
[ 1.104509] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
[ 1.113411] RIP: 0010:start_kernel+0x36f/0x4e8
[ 1.117817] Code: 48 27 5f e8 fd 7b 03 00 e8 55 72 e5 fe e8 6b 05 03 00 ff 15 2a f9 89 ff 0f ba e0 09 73 0e 48 c7 c7 98 26 31 a0 e8 01 5b df fe <0f> 0b c6 05 c7 2c af ff 00 e8 da 36 47 fe ff 15 14 f9 89 ff e8 56
[ 1.136485] RSP: 0000:ffffffffa0603f30 EFLAGS: 00010286
[ 1.141670] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1.148754] RDX: 0000000000000002 RSI: 00000000ffffdfff RDI: 0000000000000001
[ 1.155840] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffa0603dd8
[ 1.162926] R10: 0000000000000003 R11: ffffffffa0734168 R12: 000000000000180a
[ 1.170011] R13: 00000000010fecf0 R14: 0000000000000000 R15: 00000000d44e3018
[ 1.177097] FS: 0000000000000000(0000) GS:ffff91909aa00000(0000) knlGS:0000000000000000
[ 1.185131] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.190835] CR2: ffff91909edff000 CR3: 000000007760a001 CR4: 00000000000606f0
[ 1.197924] Call Trace:
[ 1.200345] <TASK>


It was a bit of whack-a-mole finding all the places that were calling
synchronize_rcu().
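
For reference, the ad-hoc guard that had to be repeated at each such call site
was the pattern below (the same one visible in the trace_buffered_event_disable()
hunk later in this thread); moving the check into RCU itself would make all of
these go away:

	/* per-call-site early-boot guard, repeated at every offender */
	if (!early_boot_irqs_disabled)
		synchronize_rcu();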

-- Steve


2022-10-20 16:06:01

by Paul E. McKenney

Subject: Re: [PATCH] tracing: Add trace_trigger kernel command line option

On Wed, Oct 19, 2022 at 08:07:45PM -0400, Steven Rostedt wrote:
> On Wed, 19 Oct 2022 20:01:37 -0400
> Steven Rostedt <[email protected]> wrote:
>
> > @@ -90,6 +90,10 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
> > #ifdef CONFIG_TRACEPOINTS
> > static inline void tracepoint_synchronize_unregister(void)
> > {
> > + /* Early updates do not need synchronization */
> > + if (early_boot_irqs_disabled)
> > + return;
> > +
> > synchronize_srcu(&tracepoint_srcu);
> > synchronize_rcu();
>
> I wonder if this check should be just moved to the RCU synchronization
> code? That is, if early_boot_irqs_disabled is set, do nothing, as there
> should be nothing to synchronize against.

There already is a similar check, but it follows the lockdep check.

Does the following patch help?

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index ca4b5dcec675b..356ef70d5442c 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1267,14 +1267,15 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
 {
 	struct rcu_synchronize rcu;
 
+	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
+		return;
+
 	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
 			 lock_is_held(&rcu_bh_lock_map) ||
 			 lock_is_held(&rcu_lock_map) ||
 			 lock_is_held(&rcu_sched_lock_map),
 			 "Illegal synchronize_srcu() in same-type SRCU (or in RCU) read-side critical section");
 
-	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
-		return;
 	might_sleep();
 	check_init_srcu_struct(ssp);
 	init_completion(&rcu.completion);

2022-10-20 16:36:50

by Steven Rostedt

Subject: Re: [PATCH] tracing: Add trace_trigger kernel command line option

On Thu, 20 Oct 2022 08:59:00 -0700
"Paul E. McKenney" <[email protected]> wrote:

> On Wed, Oct 19, 2022 at 08:07:45PM -0400, Steven Rostedt wrote:
> > On Wed, 19 Oct 2022 20:01:37 -0400
> > Steven Rostedt <[email protected]> wrote:
> >
> > > @@ -90,6 +90,10 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
> > > #ifdef CONFIG_TRACEPOINTS
> > > static inline void tracepoint_synchronize_unregister(void)
> > > {
> > > + /* Early updates do not need synchronization */
> > > + if (early_boot_irqs_disabled)
> > > + return;
> > > +
> > > synchronize_srcu(&tracepoint_srcu);
> > > synchronize_rcu();
> >
> > I wonder if this check should be just moved to the RCU synchronization
> > code? That is, if early_boot_irqs_disabled is set, do nothing, as there
> > should be nothing to synchronize against.
>
> There already is a similar check, but it follows the lockdep check.
>
> Does the following patch help?

Not sure if it would.

I added this:


diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c03fd7037add..79ac31a6a87b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2748,9 +2748,12 @@ void trace_buffered_event_disable(void)
 			 disable_trace_buffered_event, NULL, 1);
 	preempt_enable();
 
+	if (!irqs_disabled())
+		printk("IRQS DISABLED!! before %s:%d\n", __func__, __LINE__);
 	/* Wait for all current users to finish */
-	if (!early_boot_irqs_disabled)
-		synchronize_rcu();
+	synchronize_rcu();
+	if (!irqs_disabled())
+		printk("IRQS DISABLED!! after %s:%d\n", __func__, __LINE__);
 
 	for_each_tracing_cpu(cpu) {
 		free_page((unsigned long)per_cpu(trace_buffered_event, cpu));

Which produced this:


[ 0.972867] ftrace: allocating 47021 entries in 184 pages
[ 0.978285] ftrace section at ffffffffacef74c0 sorted properly
[ 0.991153] ftrace: allocated 184 pages with 4 groups
[ 0.996163] Dynamic Preempt: full
[ 0.999354] rcu: Preemptible hierarchical RCU implementation.
[ 1.005020] rcu: RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=8.
[ 1.011680] Tasks-RCU CPU stall warnings timeout set to 120000 (rcu_task_stall_timeout).
[ 1.019813] Trampoline variant of Tasks RCU enabled.
[ 1.024831] Rude variant of Tasks RCU enabled.
[ 1.029330] Tracing variant of Tasks RCU enabled.
[ 1.034089] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
[ 1.041699] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[ 1.052607] IRQS DISABLED!! after trace_buffered_event_disable:2756
[ 1.058712] NR_IRQS: 8448, nr_irqs: 488, preallocated irqs: 16
[ 1.064670] NO_HZ: Full dynticks CPUs: 3.
[ 1.068501] rcu: Offload RCU callbacks from CPUs: 3.
[ 1.073519] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 1.080400] ------------[ cut here ]------------
[ 1.084938] Interrupts were enabled early
[ 1.088919] WARNING: CPU: 0 PID: 0 at init/main.c:1061 start_kernel+0x36f/0x4e8
[ 1.096184] Modules linked in:
[ 1.099213] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc1-test+ #468
[ 1.106047] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
[ 1.114960] RIP: 0010:start_kernel+0x36f/0x4e8
[ 1.119373] Code: 48 27 53 e8 fd 7b 03 00 e8 55 72 e5 fe e8 6b 05 03 00 ff 15 2a f9 89 ff 0f ba e0 09 73 0e 48 c7 c7 98 25 31 ac e8 01 5b df fe <0f> 0b c6 05 c7 2c af ff 00 e8 ba 36 47 fe ff 15 14 f9 89 ff e8 56
[ 1.138057] RSP: 0000:ffffffffac603f30 EFLAGS: 00010286
[ 1.143248] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1.150343] RDX: 0000000000000002 RSI: 00000000ffffdfff RDI: 0000000000000001
[ 1.157438] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffac603dd8
[ 1.164537] R10: 0000000000000003 R11: ffffffffac734168 R12: 000000000000180a
[ 1.171633] R13: 00000000010fecf0 R14: 0000000000000000 R15: 00000000d44e3018
[ 1.178730] FS: 0000000000000000(0000) GS:ffff93ee1aa00000(0000) knlGS:0000000000000000
[ 1.186778] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.192489] CR2: ffff93ee1edff000 CR3: 000000002e60a001 CR4: 00000000000606f0
[ 1.199584] Call Trace:
[ 1.202009] <TASK>


So I'm not sure how just updating srcu will do anything to prevent this.

-- Steve

>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index ca4b5dcec675b..356ef70d5442c 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -1267,14 +1267,15 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> {
> struct rcu_synchronize rcu;
>
> + if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> + return;
> +
> RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> lock_is_held(&rcu_bh_lock_map) ||
> lock_is_held(&rcu_lock_map) ||
> lock_is_held(&rcu_sched_lock_map),
> "Illegal synchronize_srcu() in same-type SRCU (or in RCU) read-side critical section");
>
> - if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> - return;
> might_sleep();
> check_init_srcu_struct(ssp);
> init_completion(&rcu.completion);

2022-10-20 17:13:59

by Joel Fernandes

Subject: Re: [PATCH] tracing: Add trace_trigger kernel command line option



> On Oct 20, 2022, at 12:33 PM, Steven Rostedt <[email protected]> wrote:
>
> On Thu, 20 Oct 2022 08:59:00 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
>>> On Wed, Oct 19, 2022 at 08:07:45PM -0400, Steven Rostedt wrote:
>>> On Wed, 19 Oct 2022 20:01:37 -0400
>>> Steven Rostedt <[email protected]> wrote:
>>>
>>>> @@ -90,6 +90,10 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
>>>> #ifdef CONFIG_TRACEPOINTS
>>>> static inline void tracepoint_synchronize_unregister(void)
>>>> {
>>>> + /* Early updates do not need synchronization */
>>>> + if (early_boot_irqs_disabled)
>>>> + return;
>>>> +
>>>> synchronize_srcu(&tracepoint_srcu);
>>>> synchronize_rcu();
>>>
>>> I wonder if this check should be just moved to the RCU synchronization
>>> code? That is, if early_boot_irqs_disabled is set, do nothing, as there
>>> should be nothing to synchronize against.
>>
>> There already is a similar check, but it follows the lockdep check.
>>
>> Does the following patch help?
>
> Not sure if it would.
>
> I added this:

I’m not working today, so I can’t look very closely. But could you remove
the might_sleep() from rcu_gp_is_blocking() and see if it fixes your issue?

I think that function call might be incompatible with early boot. Just a hunch.

- Joel


> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index c03fd7037add..79ac31a6a87b 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2748,9 +2748,12 @@ void trace_buffered_event_disable(void)
> disable_trace_buffered_event, NULL, 1);
> preempt_enable();
>
> + if (!irqs_disabled())
> + printk("IRQS DISABLED!! before %s:%d\n", __func__, __LINE__);
> /* Wait for all current users to finish */
> - if (!early_boot_irqs_disabled)
> - synchronize_rcu();
> + synchronize_rcu();
> + if (!irqs_disabled())
> + printk("IRQS DISABLED!! after %s:%d\n", __func__, __LINE__);
>
> for_each_tracing_cpu(cpu) {
> free_page((unsigned long)per_cpu(trace_buffered_event, cpu));
>
> Which produced this:
>
>
> [ 0.972867] ftrace: allocating 47021 entries in 184 pages
> [ 0.978285] ftrace section at ffffffffacef74c0 sorted properly
> [ 0.991153] ftrace: allocated 184 pages with 4 groups
> [ 0.996163] Dynamic Preempt: full
> [ 0.999354] rcu: Preemptible hierarchical RCU implementation.
> [ 1.005020] rcu: RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=8.
> [ 1.011680] Tasks-RCU CPU stall warnings timeout set to 120000 (rcu_task_stall_timeout).
> [ 1.019813] Trampoline variant of Tasks RCU enabled.
> [ 1.024831] Rude variant of Tasks RCU enabled.
> [ 1.029330] Tracing variant of Tasks RCU enabled.
> [ 1.034089] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
> [ 1.041699] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
> [ 1.052607] IRQS DISABLED!! after trace_buffered_event_disable:2756
> [ 1.058712] NR_IRQS: 8448, nr_irqs: 488, preallocated irqs: 16
> [ 1.064670] NO_HZ: Full dynticks CPUs: 3.
> [ 1.068501] rcu: Offload RCU callbacks from CPUs: 3.
> [ 1.073519] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> [ 1.080400] ------------[ cut here ]------------
> [ 1.084938] Interrupts were enabled early
> [ 1.088919] WARNING: CPU: 0 PID: 0 at init/main.c:1061 start_kernel+0x36f/0x4e8
> [ 1.096184] Modules linked in:
> [ 1.099213] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc1-test+ #468
> [ 1.106047] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
> [ 1.114960] RIP: 0010:start_kernel+0x36f/0x4e8
> [ 1.119373] Code: 48 27 53 e8 fd 7b 03 00 e8 55 72 e5 fe e8 6b 05 03 00 ff 15 2a f9 89 ff 0f ba e0 09 73 0e 48 c7 c7 98 25 31 ac e8 01 5b df fe <0f> 0b c6 05 c7 2c af ff 00 e8 ba 36 47 fe ff 15 14 f9 89 ff e8 56
> [ 1.138057] RSP: 0000:ffffffffac603f30 EFLAGS: 00010286
> [ 1.143248] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 1.150343] RDX: 0000000000000002 RSI: 00000000ffffdfff RDI: 0000000000000001
> [ 1.157438] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffac603dd8
> [ 1.164537] R10: 0000000000000003 R11: ffffffffac734168 R12: 000000000000180a
> [ 1.171633] R13: 00000000010fecf0 R14: 0000000000000000 R15: 00000000d44e3018
> [ 1.178730] FS: 0000000000000000(0000) GS:ffff93ee1aa00000(0000) knlGS:0000000000000000
> [ 1.186778] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1.192489] CR2: ffff93ee1edff000 CR3: 000000002e60a001 CR4: 00000000000606f0
> [ 1.199584] Call Trace:
> [ 1.202009] <TASK>
>
>
> So I'm not sure how just updating srcu will do anything to prevent this.
>
> -- Steve
>
>>
>> Thanx, Paul
>>
>> ------------------------------------------------------------------------
>>
>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
>> index ca4b5dcec675b..356ef70d5442c 100644
>> --- a/kernel/rcu/srcutree.c
>> +++ b/kernel/rcu/srcutree.c
>> @@ -1267,14 +1267,15 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
>> {
>> struct rcu_synchronize rcu;
>>
>> + if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>> + return;
>> +
>> RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
>> lock_is_held(&rcu_bh_lock_map) ||
>> lock_is_held(&rcu_lock_map) ||
>> lock_is_held(&rcu_sched_lock_map),
>> "Illegal synchronize_srcu() in same-type SRCU (or in RCU) read-side critical section");
>>
>> - if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>> - return;
>> might_sleep();
>> check_init_srcu_struct(ssp);
>> init_completion(&rcu.completion);
>

2022-10-20 17:36:43

by Steven Rostedt

Subject: Re: [PATCH] tracing: Add trace_trigger kernel command line option

On Thu, 20 Oct 2022 12:33:57 -0400
Steven Rostedt <[email protected]> wrote:

> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index c03fd7037add..79ac31a6a87b 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2748,9 +2748,12 @@ void trace_buffered_event_disable(void)
> disable_trace_buffered_event, NULL, 1);
> preempt_enable();
>
> + if (!irqs_disabled())
> + printk("IRQS DISABLED!! before %s:%d\n", __func__, __LINE__);
> /* Wait for all current users to finish */
> - if (!early_boot_irqs_disabled)
> - synchronize_rcu();
> + synchronize_rcu();
> + if (!irqs_disabled())
> + printk("IRQS DISABLED!! after %s:%d\n", __func__, __LINE__);
>
> for_each_tracing_cpu(cpu) {
> free_page((unsigned long)per_cpu(trace_buffered_event, cpu));


Continuing the above printk()s I found the culprit.

synchronize_rcu() {
	rcu_poll_gp_seq_start_unlocked() {
		struct rcu_node *rnp = rcu_get_root();

		if (rcu_init_invoked()) {
			lockdep_assert_irqs_enabled();
			raw_spin_lock_irq_rcu_node(rnp);
		}
		rcu_poll_gp_seq_start(snap);
		if (rcu_init_invoked())
			raw_spin_unlock_irq_rcu_node(rnp);	<<-- Interrupts enabled here
	}
}
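
The underlying problem is the locking flavor: the _irq lock/unlock pair
unconditionally disables and then re-enables interrupts, while the
irqsave/irqrestore pair puts the caller's interrupt state back. A simplified
sketch of the difference (not the actual raw_spin_*_rcu_node() wrappers,
which also carry lockdep annotations):

	/* _irq flavor: assumes irqs were enabled on entry */
	local_irq_disable();
	raw_spin_lock(&rnp->lock);
	/* ... */
	raw_spin_unlock(&rnp->lock);
	local_irq_enable();		/* forces irqs on, even in early boot */

	/* irqsave flavor: preserves whatever state the caller had */
	local_irq_save(flags);
	raw_spin_lock(&rnp->lock);
	/* ... */
	raw_spin_unlock(&rnp->lock);
	local_irq_restore(flags);	/* leaves irqs off if they were off */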



-- Steve

2022-10-20 18:23:10

by Paul E. McKenney

Subject: Re: [PATCH] tracing: Add trace_trigger kernel command line option

On Thu, Oct 20, 2022 at 01:05:28PM -0400, Steven Rostedt wrote:
> On Thu, 20 Oct 2022 12:33:57 -0400
> Steven Rostedt <[email protected]> wrote:
>
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index c03fd7037add..79ac31a6a87b 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -2748,9 +2748,12 @@ void trace_buffered_event_disable(void)
> > disable_trace_buffered_event, NULL, 1);
> > preempt_enable();
> >
> > + if (!irqs_disabled())
> > + printk("IRQS DISABLED!! before %s:%d\n", __func__, __LINE__);
> > /* Wait for all current users to finish */
> > - if (!early_boot_irqs_disabled)
> > - synchronize_rcu();
> > + synchronize_rcu();
> > + if (!irqs_disabled())
> > + printk("IRQS DISABLED!! after %s:%d\n", __func__, __LINE__);
> >
> > for_each_tracing_cpu(cpu) {
> > free_page((unsigned long)per_cpu(trace_buffered_event, cpu));
>
>
> Continuing the above printk()s I found the culprit.
>
> synchronize_rcu() {
> rcu_poll_gp_seq_start_unlocked() {
> struct rcu_node *rnp = rcu_get_root();
>
> if (rcu_init_invoked()) {
> lockdep_assert_irqs_enabled();
> raw_spin_lock_irq_rcu_node(rnp);
> }
> rcu_poll_gp_seq_start(snap);
> if (rcu_init_invoked())
> raw_spin_unlock_irq_rcu_node(rnp); <<-- Interrupts enabled here
> }

Thank you for digging into this!

Does the following help?

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index abc615808b6e8..45ad9ef00d647 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1425,30 +1425,32 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 // where caller does not hold the root rcu_node structure's lock.
 static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 {
+	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
 		lockdep_assert_irqs_enabled();
-		raw_spin_lock_irq_rcu_node(rnp);
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
 	if (rcu_init_invoked())
-		raw_spin_unlock_irq_rcu_node(rnp);
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
 // Make the polled API aware of the end of a grace period, but where
 // caller does not hold the root rcu_node structure's lock.
 static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 {
+	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
 		lockdep_assert_irqs_enabled();
-		raw_spin_lock_irq_rcu_node(rnp);
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);
 	if (rcu_init_invoked())
-		raw_spin_unlock_irq_rcu_node(rnp);
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
 /*

2022-10-20 18:51:39

by Paul E. McKenney

Subject: Re: [PATCH] tracing: Add trace_trigger kernel command line option

On Thu, Oct 20, 2022 at 12:33:57PM -0400, Steven Rostedt wrote:
> On Thu, 20 Oct 2022 08:59:00 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
> > On Wed, Oct 19, 2022 at 08:07:45PM -0400, Steven Rostedt wrote:
> > > On Wed, 19 Oct 2022 20:01:37 -0400
> > > Steven Rostedt <[email protected]> wrote:
> > >
> > > > @@ -90,6 +90,10 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
> > > > #ifdef CONFIG_TRACEPOINTS
> > > > static inline void tracepoint_synchronize_unregister(void)
> > > > {
> > > > + /* Early updates do not need synchronization */
> > > > + if (early_boot_irqs_disabled)
> > > > + return;
> > > > +
> > > > synchronize_srcu(&tracepoint_srcu);
> > > > synchronize_rcu();
> > >
> > > I wonder if this check should be just moved to the RCU synchronization
> > > code? That is, if early_boot_irqs_disabled is set, do nothing, as there
> > > should be nothing to synchronize against.
> >
> > There already is a similar check, but it follows the lockdep check.
> >
> > Does the following patch help?
>
> Not sure if it would.
>
> I added this:
>
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index c03fd7037add..79ac31a6a87b 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2748,9 +2748,12 @@ void trace_buffered_event_disable(void)
> disable_trace_buffered_event, NULL, 1);
> preempt_enable();
>
> + if (!irqs_disabled())
> + printk("IRQS DISABLED!! before %s:%d\n", __func__, __LINE__);
> /* Wait for all current users to finish */
> - if (!early_boot_irqs_disabled)
> - synchronize_rcu();
> + synchronize_rcu();
> + if (!irqs_disabled())
> + printk("IRQS DISABLED!! after %s:%d\n", __func__, __LINE__);
>
> for_each_tracing_cpu(cpu) {
> free_page((unsigned long)per_cpu(trace_buffered_event, cpu));
>
> Which produced this:
>
>
> [ 0.972867] ftrace: allocating 47021 entries in 184 pages
> [ 0.978285] ftrace section at ffffffffacef74c0 sorted properly
> [ 0.991153] ftrace: allocated 184 pages with 4 groups
> [ 0.996163] Dynamic Preempt: full
> [ 0.999354] rcu: Preemptible hierarchical RCU implementation.
> [ 1.005020] rcu: RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=8.
> [ 1.011680] Tasks-RCU CPU stall warnings timeout set to 120000 (rcu_task_stall_timeout).
> [ 1.019813] Trampoline variant of Tasks RCU enabled.
> [ 1.024831] Rude variant of Tasks RCU enabled.
> [ 1.029330] Tracing variant of Tasks RCU enabled.
> [ 1.034089] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
> [ 1.041699] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
> [ 1.052607] IRQS DISABLED!! after trace_buffered_event_disable:2756
> [ 1.058712] NR_IRQS: 8448, nr_irqs: 488, preallocated irqs: 16
> [ 1.064670] NO_HZ: Full dynticks CPUs: 3.
> [ 1.068501] rcu: Offload RCU callbacks from CPUs: 3.
> [ 1.073519] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> [ 1.080400] ------------[ cut here ]------------
> [ 1.084938] Interrupts were enabled early
> [ 1.088919] WARNING: CPU: 0 PID: 0 at init/main.c:1061 start_kernel+0x36f/0x4e8
> [ 1.096184] Modules linked in:
> [ 1.099213] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc1-test+ #468
> [ 1.106047] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
> [ 1.114960] RIP: 0010:start_kernel+0x36f/0x4e8
> [ 1.119373] Code: 48 27 53 e8 fd 7b 03 00 e8 55 72 e5 fe e8 6b 05 03 00 ff 15 2a f9 89 ff 0f ba e0 09 73 0e 48 c7 c7 98 25 31 ac e8 01 5b df fe <0f> 0b c6 05 c7 2c af ff 00 e8 ba 36 47 fe ff 15 14 f9 89 ff e8 56
> [ 1.138057] RSP: 0000:ffffffffac603f30 EFLAGS: 00010286
> [ 1.143248] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 1.150343] RDX: 0000000000000002 RSI: 00000000ffffdfff RDI: 0000000000000001
> [ 1.157438] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffac603dd8
> [ 1.164537] R10: 0000000000000003 R11: ffffffffac734168 R12: 000000000000180a
> [ 1.171633] R13: 00000000010fecf0 R14: 0000000000000000 R15: 00000000d44e3018
> [ 1.178730] FS: 0000000000000000(0000) GS:ffff93ee1aa00000(0000) knlGS:0000000000000000
> [ 1.186778] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1.192489] CR2: ffff93ee1edff000 CR3: 000000002e60a001 CR4: 00000000000606f0
> [ 1.199584] Call Trace:
> [ 1.202009] <TASK>
>
>
> So I'm not sure how just updating srcu will do anything to prevent this.

Apologies, SRCU on the brain due to a different email thread... :-/

Thanx, Paul

> -- Steve
>
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index ca4b5dcec675b..356ef70d5442c 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -1267,14 +1267,15 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> > {
> > struct rcu_synchronize rcu;
> >
> > + if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> > + return;
> > +
> > RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> > lock_is_held(&rcu_bh_lock_map) ||
> > lock_is_held(&rcu_lock_map) ||
> > lock_is_held(&rcu_sched_lock_map),
> > "Illegal synchronize_srcu() in same-type SRCU (or in RCU) read-side critical section");
> >
> > - if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> > - return;
> > might_sleep();
> > check_init_srcu_struct(ssp);
> > init_completion(&rcu.completion);
>

2022-10-20 20:18:23

by Steven Rostedt

Subject: Re: [PATCH] tracing: Add trace_trigger kernel command line option

On Thu, 20 Oct 2022 10:55:12 -0700
"Paul E. McKenney" <[email protected]> wrote:

> > synchronize_rcu() {
> > rcu_poll_gp_seq_start_unlocked() {
> > struct rcu_node *rnp = rcu_get_root();
> >
> > if (rcu_init_invoked()) {
> > lockdep_assert_irqs_enabled();
> > raw_spin_lock_irq_rcu_node(rnp);
> > }
> > rcu_poll_gp_seq_start(snap);
> > if (rcu_init_invoked())
> > raw_spin_unlock_irq_rcu_node(rnp); <<-- Interrupts enabled here
> > }
>
> Thank you for digging into this!
>
> Does the following help?

It does indeed!

I can remove the early_boot_irqs_disabled checks if you can add this to
this rc cycle and mark it stable.

-- Steve

2022-10-20 23:13:03

by Paul E. McKenney

Subject: Re: [PATCH] tracing: Add trace_trigger kernel command line option

On Thu, Oct 20, 2022 at 02:53:40PM -0400, Steven Rostedt wrote:
> On Thu, 20 Oct 2022 10:55:12 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
> > > synchronize_rcu() {
> > > rcu_poll_gp_seq_start_unlocked() {
> > > struct rcu_node *rnp = rcu_get_root();
> > >
> > > if (rcu_init_invoked()) {
> > > lockdep_assert_irqs_enabled();
> > > raw_spin_lock_irq_rcu_node(rnp);
> > > }
> > > rcu_poll_gp_seq_start(snap);
> > > if (rcu_init_invoked())
> > > raw_spin_unlock_irq_rcu_node(rnp); <<-- Interrupts enabled here
> > > }
> >
> > Thank you for digging into this!
> >
> > Does the following help?
>
> It does indeed!
>
> I can remove the early_boot_irqs_disabled checks if you can add this to
> this rc cycle and mark it stable.

My plan is to push it in after a bit of testing. The Fixes tag should
make the right -stable stuff happen.
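
For anyone following along, the mechanism here is the Fixes: trailer in the
commit below: it names the commit that introduced the bug, which lets the
stable maintainers and their scripts work out which released kernels need the
backport. An explicit Cc: stable trailer is the other usual way to request a
backport; roughly:

	Fixes: bf95b2bc3e42 ("rcu: Switch polled grace-period APIs to ->gp_seq_polled")
	Cc: [email protected]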

Thanx, Paul

------------------------------------------------------------------------

commit 952643575669332565c77066fb42d67987680f12
Author: Paul E. McKenney <[email protected]>
Date: Thu Oct 20 10:58:14 2022 -0700

rcu: Keep synchronize_rcu() from enabling irqs in early boot

Making polled RCU grace periods account for expedited grace periods
required acquiring the leaf rcu_node structure's lock during early boot,
but after rcu_init() was called. This lock is irq-disabled, but the
code incorrectly assumes that irqs are always enabled when invoking
synchronize_rcu(). The exception is early boot before the scheduler has
started, which means that upon return from synchronize_rcu(), irqs will
be incorrectly enabled.

This commit fixes this bug by using irqsave/irqrestore locking primitives.

Fixes: bf95b2bc3e42 ("rcu: Switch polled grace-period APIs to ->gp_seq_polled")

Reported-by: Steven Rostedt <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index abc615808b6e8..45ad9ef00d647 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1425,30 +1425,32 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 // where caller does not hold the root rcu_node structure's lock.
 static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 {
+	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
 		lockdep_assert_irqs_enabled();
-		raw_spin_lock_irq_rcu_node(rnp);
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
 	if (rcu_init_invoked())
-		raw_spin_unlock_irq_rcu_node(rnp);
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
 // Make the polled API aware of the end of a grace period, but where
 // caller does not hold the root rcu_node structure's lock.
 static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 {
+	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
 		lockdep_assert_irqs_enabled();
-		raw_spin_lock_irq_rcu_node(rnp);
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);
 	if (rcu_init_invoked())
-		raw_spin_unlock_irq_rcu_node(rnp);
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
 /*