On Mon, Jul 03, 2023 at 10:07:28AM -0700, Linus Torvalds wrote:
> On Mon, 3 Jul 2023 at 10:00, Conor Dooley <[email protected]> wrote:
> >
> > I'm not entirely sure if it is related, as stuff in the guts of mm like
> > this is beyond me, but I've been seeing similar warnings on RISC-V.
>
> No, that RISC-V warning is also about bad RCU usage, but that's a
> different thing.
>
> > RCU used illegally from offline CPU!
> > rcu_scheduler_active = 1, debug_locks = 1
> > 1 lock held by swapper/1/0:
> > #0: ffffffff8169ceb0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x32
> >
> > stack backtrace:
> > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.4.0-10173-ga901a3568fd2 #1
> > Hardware name: riscv-virtio,qemu (DT)
> > Call Trace:
> > [<ffffffff80006a20>] show_stack+0x2c/0x38
> > [<ffffffff80af3ee0>] dump_stack_lvl+0x5e/0x80
> > [<ffffffff80af3f16>] dump_stack+0x14/0x1c
> > [<ffffffff80083ff0>] lockdep_rcu_suspicious+0x19e/0x232
> > [<ffffffff80ad4802>] mtree_load+0x18a/0x3b6
> > [<ffffffff80091632>] __irq_get_desc_lock+0x2c/0x82
> > [<ffffffff80094722>] enable_percpu_irq+0x36/0x9e
> > [<ffffffff800087d4>] riscv_ipi_enable+0x32/0x4e
> > [<ffffffff80008692>] smp_callin+0x24/0x66
>
> This is also triggering on the maple tree sanity checks, but it's a
> different maple tree, and a different code sequence.
>
> And a different case of suspicious RCU usage - not a lack of locking,
> but simply using RCU before marking the CPU online.

Ah, I probably should've known from the
	RCU used illegally from offline CPU!
that it was different.

> I suspect the riscv_ipi_enable() in the RISC-V version of smp_callin()
> needs to be moved down to below the
>
> set_cpu_online(curr_cpuid, 1);
>
> or was there some reason why it needed to be done quite _that_ early
> in commit 832f15f42646 ("RISC-V: Treat IPIs as normal Linux IRQs")?
>
> Added guilty parties to the cc.

Taking the rationale & potential problems out of the equation, that
code movement does suppress the complaints from rcu/maple tree,
thanks.

Cheers,
Conor.
On Mon, 03 Jul 2023 18:20:52 +0100,
Conor Dooley <[email protected]> wrote:
>
> On Mon, Jul 03, 2023 at 10:07:28AM -0700, Linus Torvalds wrote:
> > On Mon, 3 Jul 2023 at 10:00, Conor Dooley <[email protected]> wrote:
> > >
> > > I'm not entirely sure if it is related, as stuff in the guts of mm like
> > > this is beyond me, but I've been seeing similar warnings on RISC-V.
> >
> > No, that RISC-V warning is also about bad RCU usage, but that's a
> > different thing.
> >
> > > RCU used illegally from offline CPU!
> > > rcu_scheduler_active = 1, debug_locks = 1
> > > 1 lock held by swapper/1/0:
> > > #0: ffffffff8169ceb0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x32
> > >
> > > stack backtrace:
> > > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.4.0-10173-ga901a3568fd2 #1
> > > Hardware name: riscv-virtio,qemu (DT)
> > > Call Trace:
> > > [<ffffffff80006a20>] show_stack+0x2c/0x38
> > > [<ffffffff80af3ee0>] dump_stack_lvl+0x5e/0x80
> > > [<ffffffff80af3f16>] dump_stack+0x14/0x1c
> > > [<ffffffff80083ff0>] lockdep_rcu_suspicious+0x19e/0x232
> > > [<ffffffff80ad4802>] mtree_load+0x18a/0x3b6
> > > [<ffffffff80091632>] __irq_get_desc_lock+0x2c/0x82
> > > [<ffffffff80094722>] enable_percpu_irq+0x36/0x9e
> > > [<ffffffff800087d4>] riscv_ipi_enable+0x32/0x4e
> > > [<ffffffff80008692>] smp_callin+0x24/0x66
> >
> > This is also triggering on the maple tree sanity checks, but it's a
> > different maple tree, and a different code sequence.
> >
> > And a different case of suspicious RCU usage - not a lack of locking,
> > but simply using RCU before marking the CPU online.
>
> Ah, I probably should've known from the
> RCU used illegally from offline CPU!
> that it was different.
>
> > I suspect the riscv_ipi_enable() in the RISC-V version of smp_callin()
> > needs to be moved down to below the
> >
> > set_cpu_online(curr_cpuid, 1);
> >
> > or was there some reason why it needed to be done quite _that_ early
> > in commit 832f15f42646 ("RISC-V: Treat IPIs as normal Linux IRQs")?
> >
> > Added guilty parties to the cc.
>
> Taking the rationale & potential problems out of the equation, that
> code movement does suppress the complaints from rcu/maple tree,
> thanks.
Comparing with what we do on arm64, a less radical change would be to
move the IPI init after notify_cpu_starting(), which explicitly
enables RCU usage.
Something like:
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index bb0b76e1a6d4..f4d6acb38dd0 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -238,10 +238,11 @@ asmlinkage __visible void smp_callin(void)
 	mmgrab(mm);
 	current->active_mm = mm;
 
-	riscv_ipi_enable();
-
 	store_cpu_topology(curr_cpuid);
 	notify_cpu_starting(curr_cpuid);
+
+	riscv_ipi_enable();
+
 	numa_add_cpu(curr_cpuid);
 	set_cpu_online(curr_cpuid, 1);
 	probe_vendor_features(curr_cpuid);
which I obviously haven't tested at all.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
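
For readers following the ordering argument above, here is a minimal
userspace sketch of it in C. It is only a model, not kernel code: every
name prefixed with model_ is hypothetical, the per-CPU flags are invented
for illustration, and the only things taken from the thread are the call
order in smp_callin() and the statement that notify_cpu_starting() is the
point after which RCU usage is allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

/* Hypothetical per-CPU state; the real kernel tracks this differently. */
struct cpu_model {
	bool rcu_allowed;	/* set by the notify_cpu_starting() stand-in */
	bool online;		/* set by the set_cpu_online() stand-in */
	int rcu_violations;	/* RCU read-side uses seen before rcu_allowed */
};

static struct cpu_model cpus[NR_CPUS];

/* Stand-in for an RCU read-side access such as the mtree_load() in the trace. */
static void model_rcu_read(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!cpus[cpu].rcu_allowed) {
		cpus[cpu].rcu_violations++;
		fprintf(stderr, "model: RCU used illegally from offline CPU %d\n", cpu);
	}
}

/* Stand-in for riscv_ipi_enable(): in the trace it ends up doing an RCU read. */
static void model_ipi_enable(int cpu)
{
	model_rcu_read(cpu);
}

static void model_notify_cpu_starting(int cpu)
{
	cpus[cpu].rcu_allowed = true;
}

static void model_set_cpu_online(int cpu)
{
	cpus[cpu].online = true;
}

enum ipi_order { IPI_BEFORE_NOTIFY, IPI_AFTER_NOTIFY };

/* Replays the relevant part of smp_callin() and returns the violation count. */
int model_smp_callin(int cpu, enum ipi_order order)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpus[cpu] = (struct cpu_model){ 0 };

	if (order == IPI_BEFORE_NOTIFY)
		model_ipi_enable(cpu);	/* ordering that produced the splat */

	model_notify_cpu_starting(cpu);

	if (order == IPI_AFTER_NOTIFY)
		model_ipi_enable(cpu);	/* ordering proposed in the diff */

	model_set_cpu_online(cpu);
	return cpus[cpu].rcu_violations;
}
```

In this model, model_smp_callin(1, IPI_BEFORE_NOTIFY) counts one
violation and model_smp_callin(1, IPI_AFTER_NOTIFY) counts none. The
second ordering corresponds to the diff above, where the IPI enable still
happens before set_cpu_online().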
On Mon, 3 Jul 2023 at 10:34, Marc Zyngier <[email protected]> wrote:
>
> Comparing with what we do on arm64, a less radical change would be to
> move the IPI init after notify_cpu_starting(), which explicitly
> enables RCU usage.
Ack, that looks right to me.
Linus
On Mon, 03 Jul 2023 10:46:55 PDT (-0700), Linus Torvalds wrote:
> On Mon, 3 Jul 2023 at 10:34, Marc Zyngier <[email protected]> wrote:
>>
>> Comparing with what we do on arm64, a less radical change would be to
>> move the IPI init after notify_cpu_starting(), which explicitly
>> enables RCU usage.
>
> Ack, that looks right to me.

I don't see anything wrong with it and it's passing my tests, but we've
got a handful of ways to boot so it's all a bit messy and I might be
missing something.

I'm still catching up a bit as I took both days off this weekend.
Hopefully just an oversight in a bigger refactoring, but I'll give it
another look.

Marc: are you going to send that as a patch?
On Mon, 03 Jul 2023 19:11:27 +0100,
Palmer Dabbelt <[email protected]> wrote:
>
> On Mon, 03 Jul 2023 10:46:55 PDT (-0700), Linus Torvalds wrote:
> > On Mon, 3 Jul 2023 at 10:34, Marc Zyngier <[email protected]> wrote:
> >>
> >> Comparing with what we do on arm64, a less radical change would be to
> >> move the IPI init after notify_cpu_starting(), which explicitly
> >> enables RCU usage.
> >
> > Ack, that looks right to me.
>
> I don't see anything wrong with it and it's passing my tests, but
> we've got a handful of ways to boot so it's all a bit messy and I
> might be missing something.
>
> I'm still catching up a bit as I took both days off this weekend.
> Hopefully just an oversight in a bigger refactoring, but I'll give it
> another look.
>
> Marc: are you going to send that as a patch?
Just did [1].
Thanks,
M.
[1] https://lore.kernel.org/r/[email protected]
--
Without deviation from the norm, progress is not possible.