2022-12-09 13:32:35

by Sverdlin, Alexander

[permalink] [raw]
Subject: Invalid locking pattern in Documentation/kernel-hacking/locking.rst?

Dear documentation maintainers,

the latest version of locking.rst contains the following (since 2005):

"Manfred Spraul points out that you can still do this, even if the data
is very occasionally accessed in user context or softirqs/tasklets. The
irq handler doesn't use a lock, and all other accesses are done as so::

spin_lock(&lock);
disable_irq(irq);
"

Isn't it "sleeping in atomic" actually because of the sleeping
disable_irq()?

--
Alexander Sverdlin
Siemens AG
http://www.siemens.com


2022-12-09 19:39:36

by Manfred Spraul

[permalink] [raw]
Subject: Re: Invalid locking pattern in Documentation/kernel-hacking/locking.rst?

Hi Alexander,

On 12/9/22 13:23, Sverdlin, Alexander wrote:
> Dear documentation maintainers,
>
> the latest version of locking.rst contains the following (since 2005):
>
> "Manfred Spraul points out that you can still do this, even if the data
> is very occasionally accessed in user context or softirqs/tasklets. The
> irq handler doesn't use a lock, and all other accesses are done as so::
>
> spin_lock(&lock);
> disable_irq(irq);
> "
>
> Isn't it "sleeping in atomic" actually because of the sleeping
> disable_irq()?

Good catch!

The documentation of disable_irq() claims that the function is safe to
be called from IRQ context (for careful callers)

But it calls synchronize_irq(). And synchronize_irq() claims that the
function can be called only from preemptible code.

The change was in 2009:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.1-rc8&id=3aa551c9b4c40018f0e261a178e3d25478dc04a9


@Thomas/@Ingo: What do we want?

Declare disable_irq()&synchronize_irq() as safe from irq context only if
no threaded irq handlers are used?

Or declare both function as preemptible context only?


The update for locking.rst would be to switch from spin_lock() to
mutex_lock().


https://elixir.bootlin.com/linux/latest/source/kernel/irq/manage.c#L126

> /**
> * synchronize_irq - wait for pending IRQ handlers (on other CPUs)
> * @irq: interrupt number to wait for
> *
> * This function waits for any pending IRQ handlers for this interrupt
> * to complete before returning. If you use this function while
> * holding a resource the IRQ handler may need you will deadlock.
> *
> * Can only be called from preemptible code as it might sleep when
> * an interrupt thread is associated to @irq.
> *
> * It optionally makes sure (when the irq chip supports that method)
> * that the interrupt is not pending in any CPU and waiting for
> * service.
> */
> void synchronize_irq
> <https://elixir.bootlin.com/linux/latest/C/ident/synchronize_irq>(unsigned int irq)
> {
> struct irq_desc <https://elixir.bootlin.com/linux/latest/C/ident/irq_desc> *desc = irq_to_desc
> <https://elixir.bootlin.com/linux/latest/C/ident/irq_to_desc>(irq);
>
> if (desc) {
> __synchronize_hardirq
> <https://elixir.bootlin.com/linux/latest/C/ident/__synchronize_hardirq>(desc, true <https://elixir.bootlin.com/linux/latest/C/ident/true>);
> /*
> * We made sure that no hardirq handler is
> * running. Now verify that no threaded handlers are
> * active.
> */
> wait_event
> <https://elixir.bootlin.com/linux/latest/C/ident/wait_event>(desc->wait_for_threads
> <https://elixir.bootlin.com/linux/latest/C/ident/wait_for_threads>,
> !atomic_read
> <https://elixir.bootlin.com/linux/latest/C/ident/atomic_read>(&desc->threads_active
> <https://elixir.bootlin.com/linux/latest/C/ident/threads_active>));
> }
> }
> EXPORT_SYMBOL
> <https://elixir.bootlin.com/linux/latest/C/ident/EXPORT_SYMBOL>(synchronize_irq
> <https://elixir.bootlin.com/linux/latest/C/ident/synchronize_irq>);

https://elixir.bootlin.com/linux/latest/source/kernel/irq/manage.c#L716

> /**
> * disable_irq - disable an irq and wait for completion
> * @irq: Interrupt to disable
> *
> * Disable the selected interrupt line. Enables and Disables are
> * nested.
> * This function waits for any pending IRQ handlers for this interrupt
> * to complete before returning. If you use this function while
> * holding a resource the IRQ handler may need you will deadlock.
> *
> * This function may be called - with care - from IRQ context.
> */
> void disable_irq
> <https://elixir.bootlin.com/linux/latest/C/ident/disable_irq>(unsigned int irq)
> {
> if (!__disable_irq_nosync
> <https://elixir.bootlin.com/linux/latest/C/ident/__disable_irq_nosync>(irq))
> synchronize_irq
> <https://elixir.bootlin.com/linux/latest/C/ident/synchronize_irq>(irq);
> }
>
--

    Manfred

2022-12-12 15:21:09

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Invalid locking pattern in Documentation/kernel-hacking/locking.rst?

On Fri, Dec 09 2022 at 19:47, Manfred Spraul wrote:
> On 12/9/22 13:23, Sverdlin, Alexander wrote:
>> the latest version of locking.rst contains the following (since 2005):
>>
>> "Manfred Spraul points out that you can still do this, even if the data
>> is very occasionally accessed in user context or softirqs/tasklets. The
>> irq handler doesn't use a lock, and all other accesses are done as so::
>>
>> spin_lock(&lock);
>> disable_irq(irq);
>> "
>>
>> Isn't it "sleeping in atomic" actually because of the sleeping
>> disable_irq()?
>
> Good catch!
>
> The documentation of disable_irq() claims that the function is safe to
> be called from IRQ context (for careful callers)
>
> But it calls synchronize_irq(). And synchronize_irq() claims that the
> function can be called only from preemptible code.
>
> The change was in 2009:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.1-rc8&id=3aa551c9b4c40018f0e261a178e3d25478dc04a9
>
>
> @Thomas/@Ingo: What do we want?
>
> Declare disable_irq()&synchronize_irq() as safe from irq context only if
> no threaded irq handlers are used?
>
> Or declare both function as preemptible context only?

The latter.

The comment for disable_irq() needs to go away too:

"This function may be called - with care - from IRQ context."

Obviously it can't be called from the interrupt context which it
tries to disable as it will live-lock on the "in progress" flag.

So that leaves the option to call it from some unrelated interrupt
context which does not make much sense. In fact, back in the days when
this comment was added it was still allowed to reenable interrupts in
the interrupt handler, which obviously opens the window for some other
interrupt to come in which then tries to disable the one it just
interrupted. Not an issue anymore, but the synchronize_irq() change to
handle threaded interrupts made it more or less impossible.

Thanks,

tglx