On Wed, Dec 02, 2020 at 11:56:49AM +0100, Heiko Carstens wrote:
> From 7bd86fb3eb039a4163281472ca79b9158e726526 Mon Sep 17 00:00:00 2001
> From: Heiko Carstens <[email protected]>
> Date: Wed, 2 Dec 2020 11:46:01 +0100
> Subject: [PATCH] s390: fix irq state tracing
>
> With commit 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs
> tracing") common code calls arch_cpu_idle() with a lockdep state that
> says irqs are on.
>
> This doesn't work very well for s390: psw_idle() will enable interrupts
> to wait for an interrupt. As soon as an interrupt occurs the interrupt
> handler will verify if the old context was psw_idle(). If that is the
> case the interrupt enablement bits in the old program status word will
> be cleared.
>
> A subsequent test in both the external and the io interrupt handler
> checks whether interrupts were enabled in the old context. Due to the
> above patching of the old program status word, it is assumed that the
> old context had interrupts disabled, and therefore the call to
> TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. This in
> turn makes lockdep incorrectly "think" that interrupts are enabled
> within the interrupt handler.
>
> Fix this by unconditionally calling TRACE_IRQS_OFF when entering
> interrupt handlers. Also unconditionally call TRACE_IRQS_ON when
> leaving interrupt handlers.
>
> This leaves the special psw_idle() case, which now returns with
> interrupts disabled, but has an "irqs on" lockdep state. So callers of
> psw_idle() must adjust the state on their own, if required. This is
> currently only __udelay_disabled().
>
> Fixes: 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing")
> Signed-off-by: Heiko Carstens <[email protected]>

FWIW, this makes sense to me from what I had to chase on the arm64 side,
and this seems happy atop v5.10-rc6 with all the lockdep and RCU debug
options enabled when booting to userspace under QEMU.

Thanks,
Mark.

> ---
> arch/s390/kernel/entry.S | 15 ---------------
> arch/s390/lib/delay.c | 5 ++---
> 2 files changed, 2 insertions(+), 18 deletions(-)
>
> diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
> index 26bb0603c5a1..92beb1444644 100644
> --- a/arch/s390/kernel/entry.S
> +++ b/arch/s390/kernel/entry.S
> @@ -763,12 +763,7 @@ ENTRY(io_int_handler)
> xc __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
> TSTMSK __LC_CPU_FLAGS,_CIF_IGNORE_IRQ
> jo .Lio_restore
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> - tmhh %r8,0x300
> - jz 1f
> TRACE_IRQS_OFF
> -1:
> -#endif
> xc __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
> .Lio_loop:
> lgr %r2,%r11 # pass pointer to pt_regs
> @@ -791,12 +786,7 @@ ENTRY(io_int_handler)
> TSTMSK __LC_CPU_FLAGS,_CIF_WORK
> jnz .Lio_work
> .Lio_restore:
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> - tm __PT_PSW(%r11),3
> - jno 0f
> TRACE_IRQS_ON
> -0:
> -#endif
> mvc __LC_RETURN_PSW(16),__PT_PSW(%r11)
> tm __PT_PSW+1(%r11),0x01 # returning to user ?
> jno .Lio_exit_kernel
> @@ -976,12 +966,7 @@ ENTRY(ext_int_handler)
> xc __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
> TSTMSK __LC_CPU_FLAGS,_CIF_IGNORE_IRQ
> jo .Lio_restore
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> - tmhh %r8,0x300
> - jz 1f
> TRACE_IRQS_OFF
> -1:
> -#endif
> xc __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
> lgr %r2,%r11 # pass pointer to pt_regs
> lghi %r3,EXT_INTERRUPT
> diff --git a/arch/s390/lib/delay.c b/arch/s390/lib/delay.c
> index daca7bad66de..8c0c68e7770e 100644
> --- a/arch/s390/lib/delay.c
> +++ b/arch/s390/lib/delay.c
> @@ -33,7 +33,7 @@ EXPORT_SYMBOL(__delay);
>
> static void __udelay_disabled(unsigned long long usecs)
> {
> - unsigned long cr0, cr0_new, psw_mask, flags;
> + unsigned long cr0, cr0_new, psw_mask;
> struct s390_idle_data idle;
> u64 end;
>
> @@ -45,9 +45,8 @@ static void __udelay_disabled(unsigned long long usecs)
> psw_mask = __extract_psw() | PSW_MASK_EXT | PSW_MASK_WAIT;
> set_clock_comparator(end);
> set_cpu_flag(CIF_IGNORE_IRQ);
> - local_irq_save(flags);
> psw_idle(&idle, psw_mask);
> - local_irq_restore(flags);
> + trace_hardirqs_off();
> clear_cpu_flag(CIF_IGNORE_IRQ);
> set_clock_comparator(S390_lowcore.clock_comparator);
> __ctl_load(cr0, 0, 0);
> --
> 2.17.1
>
On Wed, Dec 02, 2020 at 11:16:05AM +0000, Mark Rutland wrote:
> FWIW, this makes sense to me from what I had to chase on the arm64 side,
> and this seems happy atop v5.10-rc6 with all the lockdep and RCU debug
> options enabled when booting to userspace under QEMU.
>
> Thanks,
> Mark.

Thanks a lot for having a look and testing this!