2021-08-10 14:22:32

by Valentin Schneider

Subject: [SPLAT 2/3] irqchip/gic-v3-its: Sleeping spinlocks down gic_reserve_range()

[ 0.134518] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:35
[ 0.134520] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
[ 0.134522] 1 lock held by swapper/1/0:
[ 0.134523] #0: ffff008f3624f728 ((lock).lock){+.+.}-{2:2}, at: get_page_from_freelist (mm/page_alloc.c:3673 mm/page_alloc.c:3704 mm/page_alloc.c:4166)
[ 0.134533] irq event stamp: 0
[ 0.134534] hardirqs last enabled at (0): 0x0
[ 0.134538] hardirqs last disabled at (0): copy_process (./include/linux/lockdep.h:195 ./include/linux/lockdep.h:202 ./include/linux/lockdep.h:208 ./include/linux/seqlock.h:78 kernel/fork.c:2084)
[ 0.134542] softirqs last enabled at (0): copy_process (./include/linux/lockdep.h:195 ./include/linux/lockdep.h:202 ./include/linux/lockdep.h:208 ./include/linux/seqlock.h:78 kernel/fork.c:2084)
[ 0.134545] softirqs last disabled at (0): 0x0
[ 0.134547] Preemption disabled at:
[ 0.134547] rt_mutex_slowunlock (kernel/locking/rtmutex.c:1223)
[ 0.134552] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.14.0-rc4-rt6-torture+ #56
[ 0.134555] Call trace:
[ 0.134556] dump_backtrace (arch/arm64/kernel/stacktrace.c:151)
[ 0.134558] show_stack (arch/arm64/kernel/stacktrace.c:217)
[ 0.134559] dump_stack_lvl (lib/dump_stack.c:106)
[ 0.134563] dump_stack (lib/dump_stack.c:113)
[ 0.134565] ___might_sleep (kernel/sched/core.c:9306)
[ 0.134567] rt_spin_lock (kernel/locking/rtmutex.c:1641 (discriminator 4) kernel/locking/spinlock_rt.c:30 (discriminator 4) kernel/locking/spinlock_rt.c:36 (discriminator 4) kernel/locking/spinlock_rt.c:44 (discriminator 4))
[ 0.134569] get_page_from_freelist (mm/page_alloc.c:3673 mm/page_alloc.c:3704 mm/page_alloc.c:4166)
[ 0.134571] __alloc_pages (mm/page_alloc.c:5391)
[ 0.134573] alloc_page_interleave (mm/mempolicy.c:2119)
[ 0.134576] alloc_pages (mm/mempolicy.c:2249)
[ 0.134577] new_slab (mm/slub.c:1740 mm/slub.c:1877 mm/slub.c:1940)
[ 0.134580] ___slab_alloc (mm/slub.c:2951)
[ 0.134582] __slab_alloc.isra.0 (mm/slub.c:3038)
[ 0.134584] kmem_cache_alloc_trace (mm/slub.c:3129 mm/slub.c:3171 mm/slub.c:3188)
[ 0.134587] efi_mem_reserve_iomem (drivers/firmware/efi/efi.c:905)
[ 0.134590] efi_mem_reserve_persistent (drivers/firmware/efi/efi.c:952)
[ 0.134593] its_cpu_init (drivers/irqchip/irq-gic-v3-its.c:3074 drivers/irqchip/irq-gic-v3-its.c:5196)
[ 0.134596] gic_starting_cpu (drivers/irqchip/irq-gic.c:798)
[ 0.134599] cpuhp_invoke_callback (kernel/cpu.c:180)
[ 0.134601] cpuhp_invoke_callback_range (kernel/cpu.c:656)
[ 0.134603] notify_cpu_starting (kernel/cpu.c:1270)
[ 0.134605] secondary_start_kernel (arch/arm64/kernel/smp.c:243)
[ 0.134608] __secondary_switched (arch/arm64/kernel/head.S:661)


2021-08-11 09:18:49

by Marc Zyngier

Subject: Re: [SPLAT 2/3] irqchip/gic-v3-its: Sleeping spinlocks down gic_reserve_range()

[+ Ard]

On Tue, 10 Aug 2021 14:41:26 +0100,
Valentin Schneider <[email protected]> wrote:
>
> [ 0.134518] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:35
> [ 0.134520] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
> [ 0.134522] 1 lock held by swapper/1/0:
> [ 0.134523] #0: ffff008f3624f728 ((lock).lock){+.+.}-{2:2}, at: get_page_from_freelist (mm/page_alloc.c:3673 mm/page_alloc.c:3704 mm/page_alloc.c:4166)
> [ 0.134533] irq event stamp: 0
> [ 0.134534] hardirqs last enabled at (0): 0x0
> [ 0.134538] hardirqs last disabled at (0): copy_process (./include/linux/lockdep.h:195 ./include/linux/lockdep.h:202 ./include/linux/lockdep.h:208 ./include/linux/seqlock.h:78 kernel/fork.c:2084)
> [ 0.134542] softirqs last enabled at (0): copy_process (./include/linux/lockdep.h:195 ./include/linux/lockdep.h:202 ./include/linux/lockdep.h:208 ./include/linux/seqlock.h:78 kernel/fork.c:2084)
> [ 0.134545] softirqs last disabled at (0): 0x0
> [ 0.134547] Preemption disabled at:
> [ 0.134547] rt_mutex_slowunlock (kernel/locking/rtmutex.c:1223)
> [ 0.134552] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.14.0-rc4-rt6-torture+ #56
> [ 0.134555] Call trace:
> [ 0.134556] dump_backtrace (arch/arm64/kernel/stacktrace.c:151)
> [ 0.134558] show_stack (arch/arm64/kernel/stacktrace.c:217)
> [ 0.134559] dump_stack_lvl (lib/dump_stack.c:106)
> [ 0.134563] dump_stack (lib/dump_stack.c:113)
> [ 0.134565] ___might_sleep (kernel/sched/core.c:9306)
> [ 0.134567] rt_spin_lock (kernel/locking/rtmutex.c:1641 (discriminator 4) kernel/locking/spinlock_rt.c:30 (discriminator 4) kernel/locking/spinlock_rt.c:36 (discriminator 4) kernel/locking/spinlock_rt.c:44 (discriminator 4))
> [ 0.134569] get_page_from_freelist (mm/page_alloc.c:3673 mm/page_alloc.c:3704 mm/page_alloc.c:4166)
> [ 0.134571] __alloc_pages (mm/page_alloc.c:5391)
> [ 0.134573] alloc_page_interleave (mm/mempolicy.c:2119)
> [ 0.134576] alloc_pages (mm/mempolicy.c:2249)
> [ 0.134577] new_slab (mm/slub.c:1740 mm/slub.c:1877 mm/slub.c:1940)
> [ 0.134580] ___slab_alloc (mm/slub.c:2951)
> [ 0.134582] __slab_alloc.isra.0 (mm/slub.c:3038)
> [ 0.134584] kmem_cache_alloc_trace (mm/slub.c:3129 mm/slub.c:3171 mm/slub.c:3188)
> [ 0.134587] efi_mem_reserve_iomem (drivers/firmware/efi/efi.c:905)
> [ 0.134590] efi_mem_reserve_persistent (drivers/firmware/efi/efi.c:952)
> [ 0.134593] its_cpu_init (drivers/irqchip/irq-gic-v3-its.c:3074 drivers/irqchip/irq-gic-v3-its.c:5196)
> [ 0.134596] gic_starting_cpu (drivers/irqchip/irq-gic.c:798)
> [ 0.134599] cpuhp_invoke_callback (kernel/cpu.c:180)
> [ 0.134601] cpuhp_invoke_callback_range (kernel/cpu.c:656)
> [ 0.134603] notify_cpu_starting (kernel/cpu.c:1270)
> [ 0.134605] secondary_start_kernel (arch/arm64/kernel/smp.c:243)
> [ 0.134608] __secondary_switched (arch/arm64/kernel/head.S:661)

The issue is that although the redistributor tables have been
allocated ahead of time (outside of any cpuhp callback), they cannot
be programmed into the RDs until the corresponding CPUs have been
brought up (the registers may not be accessible).

For the same reason, we don't know whether we can free them (because
there is already a table programmed there) or have to reserve them
with an efi_mem_reserve_persistent() call. efi_mem_reserve_iomem()
uses GFP_ATOMIC for its allocation, but this is not sufficient for RT
anymore.

We could postpone the reservation of the memory to a later point (it
is only useful for kexec), but it isn't clear where that point is. The
CPU is not quite up yet, and we can't easily IPI the boot CPU to do
the reserve call.

M.

--
Without deviation from the norm, progress is not possible.

2021-08-11 12:35:35

by Thomas Gleixner

Subject: Re: [SPLAT 2/3] irqchip/gic-v3-its: Sleeping spinlocks down gic_reserve_range()

On Wed, Aug 11 2021 at 09:50, Marc Zyngier wrote:
> On Tue, 10 Aug 2021 14:41:26 +0100,
> Valentin Schneider <[email protected]> wrote:
> The issue is that although the redistributor tables have been
> allocated ahead of time (outside of any cpuhp callback), they cannot
> be programmed into the RDs until the corresponding CPUs have been
> brought up (the registers may not be accessible).
>
> For the same reason, we don't know whether we can free them (because
> there is already a table programmed there) or have to reserve them
> with an efi_mem_reserve_persistent() call. efi_mem_reserve_iomem()
> uses GFP_ATOMIC for its allocation, but this is not sufficient for RT
> anymore.
>
> We could postpone the reservation of the memory to a later point (it
> is only useful for kexec), but it isn't clear where that point is. The
> CPU is not quite up yet, and we can't easily IPI the boot CPU to do
> the reserve call.

Right, but don't you know about the need for reservation _before_
bringing the CPU up?

Thanks,

tglx

2021-08-11 15:18:03

by Marc Zyngier

Subject: Re: [SPLAT 2/3] irqchip/gic-v3-its: Sleeping spinlocks down gic_reserve_range()

On Wed, 11 Aug 2021 13:28:21 +0100,
Thomas Gleixner <[email protected]> wrote:
>
> On Wed, Aug 11 2021 at 09:50, Marc Zyngier wrote:
> > On Tue, 10 Aug 2021 14:41:26 +0100,
> > Valentin Schneider <[email protected]> wrote:
> > The issue is that although the redistributor tables have been
> > allocated ahead of time (outside of any cpuhp callback), they cannot
> > be programmed into the RDs until the corresponding CPUs have been
> > brought up (the registers may not be accessible).
> >
> > For the same reason, we don't know whether we can free them (because
> > there is already a table programmed there) or have to reserve them
> > with an efi_mem_reserve_persistent() call. efi_mem_reserve_iomem()
> > uses GFP_ATOMIC for its allocation, but this is not sufficient for RT
> > anymore.
> >
> > We could postpone the reservation of the memory to a later point (it
> > is only useful for kexec), but it isn't clear where that point is. The
> > CPU is not quite up yet, and we can't easily IPI the boot CPU to do
> > the reserve call.
>
> Right, but don't you know about the need for reservation _before_
> bringing the CPU up?

Unfortunately not. To find out, you need to access a pair of per-CPU
registers which are not guaranteed to be powered-on until the
corresponding CPU has made it into the kernel (the firmware will power
things on as part of bringing the CPU up).

Which is why we always allocate the memory upfront for all the CPUs,
and each CPU either frees the memory if it already had something in
its redistributor, or points the redistributor to the memory and
reserves it.
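
To make the flow above concrete, here is a minimal userspace model of it.
This is only a sketch, not the driver code: struct fake_rdist, struct
cpu_state, prealloc_all() and cpu_starting() are hypothetical names, and
calloc()/free() stand in for the kernel allocators. The "reserved" flag
marks the point where the real code would have to call
efi_mem_reserve_persistent().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS    4
#define TABLE_SIZE 4096

/* Hypothetical stand-in for one redistributor's table base register. */
struct fake_rdist {
	void *programmed_table;	/* non-NULL if a table is already programmed */
};

/* Hypothetical per-CPU bookkeeping. */
struct cpu_state {
	void *prealloc;		/* table allocated up front, before bringup */
	bool reserved;		/* true once the table has to be reserved */
};

static struct fake_rdist rdists[NR_CPUS];
static struct cpu_state cpus[NR_CPUS];

/* Boot CPU, sleepable context: allocate a table for every possible CPU. */
int prealloc_all(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		cpus[cpu].prealloc = calloc(1, TABLE_SIZE);
		if (!cpus[cpu].prealloc)
			return -1;
		cpus[cpu].reserved = false;
	}
	return 0;
}

/*
 * Secondary CPU, early bringup: only now can the redistributor be read.
 * Either drop the preallocated table (one is already programmed) or
 * program ours and mark it as needing a reservation.
 */
void cpu_starting(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (rdists[cpu].programmed_table) {
		free(cpus[cpu].prealloc);
		cpus[cpu].prealloc = NULL;
		return;
	}

	rdists[cpu].programmed_table = cpus[cpu].prealloc;
	cpus[cpu].reserved = true;
}
```

The decision can only be taken inside cpu_starting(), which is why the
reservation ends up being requested from the bringup path in the splat.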

This is probably the most epic fail of the GICv3 architecture...

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2021-08-17 15:20:10

by Ard Biesheuvel

Subject: Re: [SPLAT 2/3] irqchip/gic-v3-its: Sleeping spinlocks down gic_reserve_range()

On Wed, 11 Aug 2021 at 10:50, Marc Zyngier <[email protected]> wrote:
>
> [+ Ard]
>
> On Tue, 10 Aug 2021 14:41:26 +0100,
> Valentin Schneider <[email protected]> wrote:
> >
> > [ 0.134518] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:35
> > [ 0.134520] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
> > [ 0.134522] 1 lock held by swapper/1/0:
> > [ 0.134523] #0: ffff008f3624f728 ((lock).lock){+.+.}-{2:2}, at: get_page_from_freelist (mm/page_alloc.c:3673 mm/page_alloc.c:3704 mm/page_alloc.c:4166)
> > [ 0.134533] irq event stamp: 0
> > [ 0.134534] hardirqs last enabled at (0): 0x0
> > [ 0.134538] hardirqs last disabled at (0): copy_process (./include/linux/lockdep.h:195 ./include/linux/lockdep.h:202 ./include/linux/lockdep.h:208 ./include/linux/seqlock.h:78 kernel/fork.c:2084)
> > [ 0.134542] softirqs last enabled at (0): copy_process (./include/linux/lockdep.h:195 ./include/linux/lockdep.h:202 ./include/linux/lockdep.h:208 ./include/linux/seqlock.h:78 kernel/fork.c:2084)
> > [ 0.134545] softirqs last disabled at (0): 0x0
> > [ 0.134547] Preemption disabled at:
> > [ 0.134547] rt_mutex_slowunlock (kernel/locking/rtmutex.c:1223)
> > [ 0.134552] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.14.0-rc4-rt6-torture+ #56
> > [ 0.134555] Call trace:
> > [ 0.134556] dump_backtrace (arch/arm64/kernel/stacktrace.c:151)
> > [ 0.134558] show_stack (arch/arm64/kernel/stacktrace.c:217)
> > [ 0.134559] dump_stack_lvl (lib/dump_stack.c:106)
> > [ 0.134563] dump_stack (lib/dump_stack.c:113)
> > [ 0.134565] ___might_sleep (kernel/sched/core.c:9306)
> > [ 0.134567] rt_spin_lock (kernel/locking/rtmutex.c:1641 (discriminator 4) kernel/locking/spinlock_rt.c:30 (discriminator 4) kernel/locking/spinlock_rt.c:36 (discriminator 4) kernel/locking/spinlock_rt.c:44 (discriminator 4))
> > [ 0.134569] get_page_from_freelist (mm/page_alloc.c:3673 mm/page_alloc.c:3704 mm/page_alloc.c:4166)
> > [ 0.134571] __alloc_pages (mm/page_alloc.c:5391)
> > [ 0.134573] alloc_page_interleave (mm/mempolicy.c:2119)
> > [ 0.134576] alloc_pages (mm/mempolicy.c:2249)
> > [ 0.134577] new_slab (mm/slub.c:1740 mm/slub.c:1877 mm/slub.c:1940)
> > [ 0.134580] ___slab_alloc (mm/slub.c:2951)
> > [ 0.134582] __slab_alloc.isra.0 (mm/slub.c:3038)
> > [ 0.134584] kmem_cache_alloc_trace (mm/slub.c:3129 mm/slub.c:3171 mm/slub.c:3188)
> > [ 0.134587] efi_mem_reserve_iomem (drivers/firmware/efi/efi.c:905)
> > [ 0.134590] efi_mem_reserve_persistent (drivers/firmware/efi/efi.c:952)
> > [ 0.134593] its_cpu_init (drivers/irqchip/irq-gic-v3-its.c:3074 drivers/irqchip/irq-gic-v3-its.c:5196)
> > [ 0.134596] gic_starting_cpu (drivers/irqchip/irq-gic.c:798)
> > [ 0.134599] cpuhp_invoke_callback (kernel/cpu.c:180)
> > [ 0.134601] cpuhp_invoke_callback_range (kernel/cpu.c:656)
> > [ 0.134603] notify_cpu_starting (kernel/cpu.c:1270)
> > [ 0.134605] secondary_start_kernel (arch/arm64/kernel/smp.c:243)
> > [ 0.134608] __secondary_switched (arch/arm64/kernel/head.S:661)
>
> The issue is that although the redistributor tables have been
> allocated ahead of time (outside of any cpuhp callback), they cannot
> be programmed into the RDs until the corresponding CPUs have been
> brought up (the registers may not be accessible).
>
> For the same reason, we don't know whether we can free them (because
> there is already a table programmed there) or have to reserve them
> with an efi_mem_reserve_persistent() call. efi_mem_reserve_iomem()
> uses GFP_ATOMIC for its allocation, but this is not sufficient for RT
> anymore.
>
> We could postpone the reservation of the memory to a later point (it
> is only useful for kexec), but it isn't clear where that point is. The
> CPU is not quite up yet, and we can't easily IPI the boot CPU to do
> the reserve call.
>

The kzalloc() call in question is used to allocate the struct resource
which is inserted into the iomem resource tree. This could definitely
be postponed, given that the kernel itself does not care about these
entries, only user space (IIUC).
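
One way such a postponement could look, as a rough userspace sketch and
not a proposed patch: the atomic path only records the range in a
preallocated slot, and a later sleepable path allocates the bookkeeping
node. Everything here is hypothetical (struct fake_resource,
queue_reservation(), flush_reservations(), the MAX_PENDING limit), and
the model is single-threaded, so it ignores the locking a real version
would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_PENDING 8

struct range {
	uint64_t addr;
	uint64_t size;
};

/* Hypothetical stand-in for the struct resource inserted in the tree. */
struct fake_resource {
	struct range r;
	struct fake_resource *next;
};

static struct range pending[MAX_PENDING];	/* preallocated slots */
static size_t nr_pending;
static struct fake_resource *resource_list;	/* models the iomem tree */

/* Atomic context: record the range only, no allocation at all. */
bool queue_reservation(uint64_t addr, uint64_t size)
{
	if (nr_pending == MAX_PENDING)
		return false;

	pending[nr_pending].addr = addr;
	pending[nr_pending].size = size;
	nr_pending++;
	return true;
}

/*
 * Sleepable context: allocate and publish one node per queued range.
 * Returns the number of ranges published; anything left over after an
 * allocation failure stays queued for a later call.
 */
size_t flush_reservations(void)
{
	size_t done = 0;

	while (done < nr_pending) {
		struct fake_resource *res = calloc(1, sizeof(*res));

		if (!res)
			break;
		res->r = pending[done];
		res->next = resource_list;
		resource_list = res;
		done++;
	}

	/* Compact the queue so unprocessed entries move to the front. */
	for (size_t i = done; i < nr_pending; i++)
		pending[i - done] = pending[i];
	nr_pending -= done;

	return done;
}
```

In this model the bringup path would call queue_reservation() and
something running later in a sleepable context would call
flush_reservations(); where that later point sits is the open question
raised earlier in the thread.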