2022-05-02 23:54:48

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3] irq/core: synchronize irq_thread startup

On Mon, May 02 2022 at 13:28, Thomas Pfaff wrote:
> While running
> "while /bin/true; do setserial /dev/ttyS0 uart none;
> setserial /dev/ttyS0 uart 16550A; done"
> on a kernel with threaded irqs, setserial is hung after some calls.
>
> setserial opens the device, this will install an irq handler if the uart is
> not none, followed by TIOCGSERIAL and TIOCSSERIAL ioctls.
> Then the device is closed. On close, synchronize_irq() is called by
> serial_core.

This comment made me look deeper because I expected that free_irq()
would hang.

But free_irq() stopped issuing synchronize_irq() with commit
519cc8652b3a ("genirq: Synchronize only with single thread on
free_irq()"). And that turns out to be the root cause of the problem.
I should have caught that back then, but in hindsight ....

While the proposed patch works, I think the real solution is to ensure
that both the hardware interrupt _and_ the interrupt threads which are
associated to the removed action are in quiescent state. This should
catch the case you observed.

Something like the untested below.

Thanks,

tglx
---
Subject: genirq: Quiesce interrupt threads in free_irq()
From: Thomas Gleixner <[email protected]>
Date: Mon, 02 May 2022 15:40:25 +0200

Fill void...

Fixes: 519cc8652b3a ("genirq: Synchronize only with single thread on free_irq()")
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/irq/manage.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)

--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1914,6 +1914,22 @@ static struct irqaction *__free_irq(stru
*/
__synchronize_hardirq(desc, true);

+ /*
+ * Wait for associated interrupt threads to complete. This cannot
+ * use synchronize_irq() due to interrupt sharing in the PCIe
+ * layer. See 519cc8652b3a ("genirq: Synchronize only with single
+ * thread on free_irq()") for further explanation.
+ */
+ if (action->thread) {
+ unsigned int thread_mask = action->thread_mask;
+
+ if (action->secondary)
+ thread_mask |= action->secondary->thread_mask;
+
+ wait_event(desc->wait_for_threads,
+ !(atomic_read(&desc->threads_active) & thread_mask));
+ }
+
#ifdef CONFIG_DEBUG_SHIRQ
/*
* It's a shared IRQ -- the driver ought to be prepared for an IRQ
@@ -1931,10 +1947,11 @@ static struct irqaction *__free_irq(stru
#endif

/*
- * The action has already been removed above, but the thread writes
- * its oneshot mask bit when it completes. Though request_mutex is
- * held across this which prevents __setup_irq() from handing out
- * the same bit to a newly requested action.
+ * The action has already been removed above and both the hardware
+ * interrupt and the associated threads have been synchronized,
+ * which means they are in quiescent state. request_mutex is still
+ * held which prevents __setup_irq() from handing out action's
+ * thread_mask to a newly requested action.
*/
if (action->thread) {
kthread_stop(action->thread);


2022-05-10 11:08:10

by Thomas Pfaff

[permalink] [raw]
Subject: Re: [PATCH v3] irq/core: synchronize irq_thread startup



On Mon, 2 May 2022, Thomas Gleixner wrote:

> On Mon, May 02 2022 at 13:28, Thomas Pfaff wrote:
> > While running
> > "while /bin/true; do setserial /dev/ttyS0 uart none;
> > setserial /dev/ttyS0 uart 16550A; done"
> > on a kernel with threaded irqs, setserial is hung after some calls.
> >
> > setserial opens the device, this will install an irq handler if the uart is
> > not none, followed by TIOCGSERIAL and TIOCSSERIAL ioctls.
> > Then the device is closed. On close, synchronize_irq() is called by
> > serial_core.
>
> This comment made me look deeper because I expected that free_irq()
> would hang.
>
> But free_irq() stopped issuing synchronize_irq() with commit
> 519cc8652b3a ("genirq: Synchronize only with single thread on
> free_irq()"). And that turns out to be the root cause of the problem.
> I should have caught that back then, but in hindsight ....
>

Sorry for coming back to this again late, but this makes me believe that
the real problem for the freeze in setserial is that uart_port_shutdown()
is calling synchronize_irq() after free_irq(), which is illegal in my
opinion.

It can be done only before the interrupt thread is stopped, and free_irq()
itself is already taking care about synchronizing, no matter if its done by
__synchronize_hardirq() or by synchronize_irq(), like it was before commit
519cc8652b3a.
If it is called after free_irq(), the context is already lost.

I am not sure about all the other drivers, but at least serial_core should
be fixed if you agree.

Thanks,
Thomas



2022-05-10 12:47:58

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3] irq/core: synchronize irq_thread startup

On Tue, May 10 2022 at 10:43, Thomas Pfaff wrote:
> On Mon, 2 May 2022, Thomas Gleixner wrote:
>> On Mon, May 02 2022 at 13:28, Thomas Pfaff wrote:
>> This comment made me look deeper because I expected that free_irq()
>> would hang.
>>
>> But free_irq() stopped issuing synchronize_irq() with commit
>> 519cc8652b3a ("genirq: Synchronize only with single thread on
>> free_irq()"). And that turns out to be the root cause of the problem.
>> I should have caught that back then, but in hindsight ....
>>
>
> Sorry for coming back to this again late, but this makes me believe that
> the real problem for the freeze in setserial is that uart_port_shutdown()
> is calling synchronize_irq() after free_irq(), which is illegal in my
> opinion.

Well, I'd say pointless.

But it's not the real problem, it's the messenger which unearthed the
underlying issue. Even if you remove that call, the underlying problem
persists because the interrupt descriptor is in inconsistent state.

> It can be done only before the interrupt thread is stopped, and free_irq()
> itself is already taking care about synchronizing, no matter if its done by
> __synchronize_hardirq() or by synchronize_irq(), like it was before commit
> 519cc8652b3a.

No, it does not really take care about it. It can return with
irq_desc::threads_active > 0 due to the interrupt thread being stopped
before reaching the thread function. Think about shared interrupts.

> If it is called after free_irq(), the context is already lost.

That's correct.

> I am not sure about all the other drivers, but at least serial_core should
> be fixed if you agree.

Yes, that call is pointless.

Thanks,

tglx

2022-05-10 13:45:28

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3] irq/core: synchronize irq_thread startup

On Tue, May 10 2022 at 13:34, Thomas Gleixner wrote:
> On Tue, May 10 2022 at 10:43, Thomas Pfaff wrote:
>> It can be done only before the interrupt thread is stopped, and free_irq()
>> itself is already taking care about synchronizing, no matter if its done by
>> __synchronize_hardirq() or by synchronize_irq(), like it was before commit
>> 519cc8652b3a.
>
> No, it does not really take care about it. It can return with
> irq_desc::threads_active > 0 due to the interrupt thread being stopped
> before reaching the thread function. Think about shared interrupts.

Duh. Hit send too fast.

It does matter whether the synchronization is done via
__synchronize_hardirq() or via synchronize_irq(). The latter ensured
that the thread reached the thread function and handled the pending
wakeup _before_ kthread_stop() become true.

So the fix is required to undo the damage created by 519cc8652b3a.

The synchronize_irq() after free_irq() is a completely different
problem.

Thanks,

tglx

2022-05-10 15:55:23

by Thomas Pfaff

[permalink] [raw]
Subject: Re: [PATCH v3] irq/core: synchronize irq_thread startup



On Tue, 10 May 2022, Thomas Gleixner wrote:

> It does matter whether the synchronization is done via
> __synchronize_hardirq() or via synchronize_irq(). The latter ensured
> that the thread reached the thread function and handled the pending
> wakeup _before_ kthread_stop() become true.
>
> So the fix is required to undo the damage created by 519cc8652b3a.
>
> The synchronize_irq() after free_irq() is a completely different
> problem.

Thank you for the clarification.
I was unsure if I missed something.

Thomas