2021-06-15 12:37:03

by Rasmus Villemoes

[permalink] [raw]
Subject: commit 3d5bfbd97163 versus -rt

Hi

Booting 5.10.41-rt42 with CONFIG_PREEMPT_RT=y gives this splat:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 29 at kernel/irq/handle.c:159
__handle_irq_event_percpu+0x1ec/0x27c
irq 66 handler irq_default_primary_handler+0x0/0x1c enabled interrupts
Modules linked in:
CPU: 0 PID: 29 Comm: irq/45-gpio-cas Not tainted
5.10.41-rt42-00001-g3ce830134091 #495
Hardware name: Freescale LS1021A
Backtrace:
[<806ad420>] (dump_backtrace) from [<806ad7c4>] (show_stack+0x20/0x24)
r7:80abad04 r6:60070013 r5:00000000 r4:80abad04
[<806ad7a4>] (show_stack) from [<806b0a94>] (dump_stack+0x88/0xa4)
[<806b0a0c>] (dump_stack) from [<8011d424>] (__warn+0xd4/0x100)
r7:80e3fe14 r6:00000000 r5:00000009 r4:80170c48
[<8011d350>] (__warn) from [<806ade3c>] (warn_slowpath_fmt+0x88/0xcc)
r9:ffffe000 r8:807e5943 r7:00000009 r6:80170c48 r5:0000009f r4:807e596a
[<806addb8>] (warn_slowpath_fmt) from [<80170c48>]
(__handle_irq_event_percpu+0x1ec/0x27c)
r8:80e3fe80 r7:80fff800 r6:00000042 r5:00000002 r4:80ea2000
[<80170a5c>] (__handle_irq_event_percpu) from [<80170d38>]
(handle_irq_event_percpu+0x60/0xc0)
r10:80ae2508 r9:80171c48 r8:80e3e000 r7:80e282e4 r6:00000000 r5:80fff800
r4:00000000
[<80170cd8>] (handle_irq_event_percpu) from [<80170e2c>]
(handle_irq_event+0x94/0xb8)
r7:80e282e4 r6:80fff878 r5:80fff818 r4:80fff800
[<80170d98>] (handle_irq_event) from [<801761a4>]
(handle_edge_irq+0xec/0x10c)
r7:80e282e4 r6:80fff878 r5:80fff818 r4:80fff800
[<801760b8>] (handle_edge_irq) from [<8016fbb8>]
(generic_handle_irq+0x38/0x48)
r7:80e282e4 r6:ffffe000 r5:80e3a640 r4:0000000c
[<8016fb80>] (generic_handle_irq) from [<8043e98c>]
(mpc8xxx_gpio_irq_cascade+0xac/0xd0)
[<8043e8e0>] (mpc8xxx_gpio_irq_cascade) from [<80171c80>]
(irq_forced_thread_fn+0x38/0x8c)
r5:80e282c0 r4:80deda00
[<80171c48>] (irq_forced_thread_fn) from [<80171eb4>]
(irq_thread+0x11c/0x238)
r7:80e282e4 r6:ffffe000 r5:80e282c0 r4:80deda00
[<80171d98>] (irq_thread) from [<8013d9bc>] (kthread+0x18c/0x19c)
r10:80e121c0 r9:80e282c0 r8:80171d98 r7:80d15bf4 r6:80e28300 r5:80e3e000
r4:80e28340
[<8013d830>] (kthread) from [<80100130>] (ret_from_fork+0x14/0x24)
Exception stack(0x80e3ffb0 to 0x80e3fff8)
ffa0: 00000000 00000000 00000000
00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:8013d830
r4:80e28300 r3:80e121c0
---[ end trace 0000000000000002 ]---

Reverting commit 3d5bfbd9716318b1ca5c38488aa69f64d38a9aa5 (gpio:
mpc8xxx: change the gpio interrupt flags.) makes it go away, as does
disabling CONFIG_PREEMPT_RT or simply booting a vanilla v5.10.42 (where
that option exists but cannot be selected).

This seems to be the kind of thing where an -rt expert can immediately
see what's wrong and how to fix it. Ideas anyone?

Thanks,
Rasmus


2021-06-15 15:35:18

by Steven Rostedt

[permalink] [raw]
Subject: Re: commit 3d5bfbd97163 versus -rt

On Tue, 15 Jun 2021 14:35:27 +0200
Rasmus Villemoes <[email protected]> wrote:

> Reverting commit 3d5bfbd9716318b1ca5c38488aa69f64d38a9aa5 (gpio:
> mpc8xxx: change the gpio interrupt flags.) makes it go away, as does
> disabling CONFIG_PREEMPT_RT or simply booting a vanilla v5.10.42 (where
> that option exists but cannot be selected).

I'm curious if it will also trigger on vanilla v5.10.42 but add to the
kernel command line: threadirqs

Make sure you have CONFIG_IRQ_FORCED_THREADING set too.

Because it appears to be an issue with that being called by the generic
threaded irq infrastructure, which PREEMPT_RT enables automatically.


-- Steve

2021-06-15 16:00:09

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: commit 3d5bfbd97163 versus -rt

On 15/06/2021 17.33, Steven Rostedt wrote:
> On Tue, 15 Jun 2021 14:35:27 +0200
> Rasmus Villemoes <[email protected]> wrote:
>
>> Reverting commit 3d5bfbd9716318b1ca5c38488aa69f64d38a9aa5 (gpio:
>> mpc8xxx: change the gpio interrupt flags.) makes it go away, as does
>> disabling CONFIG_PREEMPT_RT or simply booting a vanilla v5.10.42 (where
>> that option exists but cannot be selected).
>
> I'm curious if it will also trigger on vanilla v5.10.42 but add to the
> kernel command line: threadirqs
>
> Make sure you have CONFIG_IRQ_FORCED_THREADING set too.
>
> Because it appears to be an issue with that being called by the generic
> threaded irq infrastructure, which PREEMPT_RT enables automatically.

It doesn't:

~ # uname -r
5.10.42-00001-g10216cf63a12
~ # grep -ow threadirqs /proc/cmdline
threadirqs
~ # zcat /proc/config.gz | grep FORCED_THREADING
CONFIG_IRQ_FORCED_THREADING=y
~ # dmesg | grep WARNING
~ #

(the one patch on top of 5.10.42 is a fixup of ls1021a.dtsi that I
should get upstream some day, but not something that should affect this
issue in any way).

Thanks for the suggestion.

Rasmus

2021-06-15 16:27:32

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: commit 3d5bfbd97163 versus -rt

On 15/06/2021 17.57, Rasmus Villemoes wrote:
> On 15/06/2021 17.33, Steven Rostedt wrote:
>> On Tue, 15 Jun 2021 14:35:27 +0200
>> Rasmus Villemoes <[email protected]> wrote:
>>
>>> Reverting commit 3d5bfbd9716318b1ca5c38488aa69f64d38a9aa5 (gpio:
>>> mpc8xxx: change the gpio interrupt flags.) makes it go away, as does
>>> disabling CONFIG_PREEMPT_RT or simply booting a vanilla v5.10.42 (where
>>> that option exists but cannot be selected).
>>
>> I'm curious if it will also trigger on vanilla v5.10.42 but add to the
>> kernel command line: threadirqs
>>
>> Make sure you have CONFIG_IRQ_FORCED_THREADING set too.
>>
>> Because it appears to be an issue with that being called by the generic
>> threaded irq infrastructure, which PREEMPT_RT enables automatically.
>
> It doesn't:
>
> ~ # uname -r
> 5.10.42-00001-g10216cf63a12
> ~ # grep -ow threadirqs /proc/cmdline
> threadirqs
> ~ # zcat /proc/config.gz | grep FORCED_THREADING
> CONFIG_IRQ_FORCED_THREADING=y
> ~ # dmesg | grep WARNING
> ~ #

And as an extra data point, it also doesn't trigger on 5.10.41-rt42
configured without PREEMPT_RT but with threadirqs on the command line.

Rasmus

2021-06-15 17:12:33

by Steven Rostedt

[permalink] [raw]
Subject: Re: commit 3d5bfbd97163 versus -rt

On Tue, 15 Jun 2021 18:24:20 +0200
Rasmus Villemoes <[email protected]> wrote:

> > ~ # uname -r
> > 5.10.42-00001-g10216cf63a12
> > ~ # grep -ow threadirqs /proc/cmdline
> > threadirqs
> > ~ # zcat /proc/config.gz | grep FORCED_THREADING
> > CONFIG_IRQ_FORCED_THREADING=y
> > ~ # dmesg | grep WARNING
> > ~ #
>
> And as an extra data point, it also doesn't trigger on 5.10.41-rt42
> configured without PREEMPT_RT but with threadirqs on the command line.

Sounds to me that there's a "spin_lock_irq*" somewhere in the path, because
from what I can see, there's not much difference with the IRQ code between
5.10.41 and 5.10.41-rt42. But if you are seeing it only with PREEMPT_RT
set, that tells me that without PREEMPT_RT, interrupts are disabled at that
point, but not with PREEMPT_RT. The only thing I can think of that would do
that is a spin_lock_irq*() taken (not a raw_spin_lock_irq*()).

-- Steve

2021-06-16 08:02:25

by Oleksandr Natalenko

[permalink] [raw]
Subject: Re: commit 3d5bfbd97163 versus -rt

Hello.

On Tue, Jun 15, 2021 at 01:09:59PM -0400, Steven Rostedt wrote:
> On Tue, 15 Jun 2021 18:24:20 +0200
> Rasmus Villemoes <[email protected]> wrote:
>
> > > ~ # uname -r
> > > 5.10.42-00001-g10216cf63a12
> > > ~ # grep -ow threadirqs /proc/cmdline
> > > threadirqs
> > > ~ # zcat /proc/config.gz | grep FORCED_THREADING
> > > CONFIG_IRQ_FORCED_THREADING=y
> > > ~ # dmesg | grep WARNING
> > > ~ #
> >
> > And as an extra data point, it also doesn't trigger on 5.10.41-rt42
> > configured without PREEMPT_RT but with threadirqs on the command line.
>
> Sounds to me that there's a "spin_lock_irq*" somewhere in the path, because
> from what I can see, there's not much difference with the IRQ code between
> 5.10.41 and 5.10.41-rt42. But if you are seeing it only with PREEMPT_RT
> set, that tells me that without PREEMPT_RT, interrupts are disabled at that
> point, but not with PREEMPT_RT. The only thing I can think of that would do
> that is a spin_lock_irq*() taken (not a raw_spin_lock_irq*()).

This reminds me [1] and [2].

I'm carrying forward [3] in my domestic kernel build to cope with that.

/cc'ing people involved back then.

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://bugzilla.kernel.org/show_bug.cgi?id=202453
[3] https://gitlab.com/post-factum/pf-kernel/-/commit/f7c99d74cca99d71179d63e827811f0df51bd8fc

--
Oleksandr Natalenko (post-factum)

2021-06-19 02:25:41

by Thomas Gleixner

[permalink] [raw]
Subject: Re: commit 3d5bfbd97163 versus -rt

On Tue, Jun 15 2021 at 14:35, Rasmus Villemoes wrote:
> [<8016fb80>] (generic_handle_irq) from [<8043e98c>]
> (mpc8xxx_gpio_irq_cascade+0xac/0xd0)
> [<8043e8e0>] (mpc8xxx_gpio_irq_cascade) from [<80171c80>]
> (irq_forced_thread_fn+0x38/0x8c)
> r5:80e282c0 r4:80deda00
> [<80171c48>] (irq_forced_thread_fn) from [<80171eb4>]
> (irq_thread+0x11c/0x238)
.
> Reverting commit 3d5bfbd9716318b1ca5c38488aa69f64d38a9aa5 (gpio:
> mpc8xxx: change the gpio interrupt flags.) makes it go away, as does
> disabling CONFIG_PREEMPT_RT or simply booting a vanilla v5.10.42 (where
> that option exists but cannot be selected).
>
> This seems to be the kind of thing where an -rt expert can immediately
> see what's wrong and how to fix it. Ideas anyone?

Let me explain. The force threaded interrupt code in mainline made the
wrong assumption that disabling softirqs is sufficient to provide the
semantics of non-threaded handler invocation. This turned out to be
wrong and caused people to do fancy workarounds.

See 81e2073c175b ("genirq: Disable interrupts for force threaded
handlers") for details.

So people started to remove IRQF_NO_THREAD or local_irq_save/restore
pairs in drivers.

But 3d5bfbd97163 ("gpio: mpc8xxx: change the gpio interrupt flags.") has
nothing to do with that:

"Delete the interrupt IRQF_NO_THREAD flags in order to gpio interrupts
can be threaded to allow high-priority processes to preempt."

This changelog is blatantly wrong. In mainline forced irq threads have
always been invoked with softirqs disabled, which obviously makes them
non-preemptible. And on RT this would have exploded exactly in the way
you observed.

The commit predates 81e2073c175b and therefore was obviously never
tested neither on mainline nor on RT.

Bartosz, please revert this ASAP.

I'll add a lockdep assert into generic_handle_irq() to make it obvious
where the problem is. That won't help with completely untested garbage
of course.

81e2073c175b is obviously a problem for demultiplexing drivers which
make weird assumptions when RT comes into play and I'm not surprised
that there is some fallout which we need to look at. Especially when
people start to zap IRQF_NO_THREAD or irq_save/restore pairs blindly.

Thanks,

tglx

2021-06-21 07:56:08

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: commit 3d5bfbd97163 versus -rt

On 18/06/2021 22.04, Thomas Gleixner wrote:
> On Tue, Jun 15 2021 at 14:35, Rasmus Villemoes wrote:
>> [<8016fb80>] (generic_handle_irq) from [<8043e98c>]
>> (mpc8xxx_gpio_irq_cascade+0xac/0xd0)
>> [<8043e8e0>] (mpc8xxx_gpio_irq_cascade) from [<80171c80>]
>> (irq_forced_thread_fn+0x38/0x8c)
>> r5:80e282c0 r4:80deda00
>> [<80171c48>] (irq_forced_thread_fn) from [<80171eb4>]
>> (irq_thread+0x11c/0x238)
> .
>> Reverting commit 3d5bfbd9716318b1ca5c38488aa69f64d38a9aa5 (gpio:
>> mpc8xxx: change the gpio interrupt flags.) makes it go away, as does
>> disabling CONFIG_PREEMPT_RT or simply booting a vanilla v5.10.42 (where
>> that option exists but cannot be selected).
>>
>> This seems to be the kind of thing where an -rt expert can immediately
>> see what's wrong and how to fix it. Ideas anyone?
>
> Let me explain. The force threaded interrupt code in mainline made the
> wrong assumption that disabling softirqs is sufficient to provide the
> semantics of non-threaded handler invocation. This turned out to be
> wrong and caused people to do fancy workarounds.
>
> See 81e2073c175b ("genirq: Disable interrupts for force threaded
> handlers") for details.
>
> So people started to remove IRQF_NO_THREAD or local_irq_save/restore
> pairs in drivers.
>
> But 3d5bfbd97163 ("gpio: mpc8xxx: change the gpio interrupt flags.") has
> nothing to do with that:
>
> "Delete the interrupt IRQF_NO_THREAD flags in order to gpio interrupts
> can be threaded to allow high-priority processes to preempt."
>
> This changelog is blatantly wrong. In mainline forced irq threads have
> always been invoked with softirqs disabled, which obviously makes them
> non-preemptible. And on RT this would have exploded exactly in the way
> you observed.
>
> The commit predates 81e2073c175b and therefore was obviously never
> tested neither on mainline nor on RT.

Thanks. Commit 81e2073c175b was included in 5.10.y per 5.10.26, and I
can confirm that booting a vanilla 5.10.25 with threadirqs does indeed
trigger the same splat I see on 5.10.41-rt42.

> Bartosz, please revert this ASAP.

Thanks for confirming that reverting 3d5bfbd97163 is the right thing to do.

Rasmus