2024-05-09 14:41:11

by Muni Sekhar

Subject: Seeking Assistance with Spin Lock Usage and Resolving Hard LOCKUP Error

Dear Linux Kernel Community,

I am reaching out to seek assistance regarding the usage of spin locks
in the Linux kernel and to address a recurring issue related to hard
LOCKUP errors that I have encountered during testing.

Recently, I developed a small kernel module that involves ISR handling
and utilizes the spinlock_t primitive. In my module, I have employed
spin locks both in process context using spin_lock() and spin_unlock()
APIs, as well as in ISR context using spin_lock_irqsave() and
spin_unlock_irqrestore() APIs.

Here is a brief overview of how I have implemented spin locks in my module:

spinlock_t my_spinlock; // Declare a spin lock

// In ISR context (interrupt handler):
spin_lock_irqsave(&my_spinlock, flags);
// ... Critical section ...
spin_unlock_irqrestore(&my_spinlock, flags);


// In process context: (struct file_operations.read)
spin_lock(&my_spinlock);
// ... Critical section ...
spin_unlock(&my_spinlock);


However, during testing, I have encountered a scenario where a hard
LOCKUP (NMI watchdog: Watchdog detected hard LOCKUP on cpu 2) error
occurs, specifically when a process context code execution triggers
the spin_lock() function and is preempted by an interrupt that enters
the ISR context and encounters the spin_lock_irqsave() function. This
situation leads to the CPU being stuck indefinitely.

My primary concern is to understand the appropriate usage of spin
locks in both process and ISR contexts to avoid such hard LOCKUP
errors. I am seeking clarification on the following points:

1. Is it safe to use spin_lock_irqsave() and spin_unlock_irqrestore()
   APIs in ISR context and spin_lock() and spin_unlock() APIs in process
   context simultaneously?
2. In scenarios where a process context code execution is preempted
   by an interrupt and enters ISR context, how should spin locks be used
   to prevent hard LOCKUP errors?
3. Are there any specific guidelines or best practices for using spin
   locks in scenarios involving both process and ISR contexts?

I would greatly appreciate any insights, guidance, or suggestions from
the experienced members of the Linux kernel community to help address
this issue and ensure the correct and efficient usage of spin locks in
my kernel module.

Thank you very much for your time and assistance.

--
Thanks,
Sekhar


2024-05-09 15:28:33

by Billie Alsup (balsup)

Subject: Re: Seeking Assistance with Spin Lock Usage and Resolving Hard LOCKUP Error

>From: Muni Sekhar <[email protected]>

>Here is a brief overview of how I have implemented spin locks in my module:
>
>spinlock_t my_spinlock; // Declare a spin lock
>
>// In ISR context (interrupt handler):
>spin_lock_irqsave(&my_spinlock, flags);
>// ... Critical section ...
>spin_unlock_irqrestore(&my_spinlock, flags);
>
>
>// In process context: (struct file_operations.read)
>spin_lock(&my_spinlock);
>// ... Critical section ...
>spin_unlock(&my_spinlock);

From my understanding, you have the usage backwards. It is the irqsave/irqrestore versions that should be used within process context to prevent the interrupt from being handled on the same CPU while executing in your critical section.

The use of irqsave/irqrestore within the ISR itself is ok, although perhaps unnecessary. It depends on whether the interrupt can occur again while you are servicing the interrupt (whether on this CPU or another). Usually (?) the same interrupt does not nest, unless you have explicitly coded to allow it (for example, by acknowledging and re-enabling the interrupt early in your ISR). Certainly the spinlock is necessary to protect the critical section from running in an ISR on one CPU and process space on another CPU.

From a lockup perspective, not doing the irqsave/irqrestore from process context could explain your problem. Also look for code (anywhere!) that blindly enables interrupts, rather than doing irqrestore from a prior irqsave.
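
The change suggested above could look roughly like the sketch below. So that it compiles outside a kernel tree, `spinlock_t`, `spin_lock_irqsave()` and `spin_unlock_irqrestore()` are simplified single-CPU userspace stand-ins (an assumption for illustration, not the kernel implementations), and `my_isr()`, `my_read()` and `shared_counter` are hypothetical names. In a real module you would include <linux/spinlock.h> and drop the stand-in section.

```c
#include <assert.h>
#include <stdbool.h>

/* ---- Userspace stand-ins, NOT the kernel implementation ---- */
typedef struct { bool locked; } spinlock_t;

static bool irqs_enabled = true;  /* models the local CPU's interrupt flag */

/* Save the interrupt state, disable interrupts, then take the lock. */
#define spin_lock_irqsave(lock, flags)                 \
    do {                                               \
        (flags) = irqs_enabled ? 1UL : 0UL;            \
        irqs_enabled = false;                          \
        assert(!(lock)->locked);                       \
        (lock)->locked = true;                         \
    } while (0)

/* Release the lock, then restore the saved interrupt state. */
#define spin_unlock_irqrestore(lock, flags)            \
    do {                                               \
        (lock)->locked = false;                        \
        irqs_enabled = ((flags) != 0UL);               \
    } while (0)
/* ---- End of stand-ins ---- */

static spinlock_t my_spinlock;
static int shared_counter;  /* hypothetical data shared with the ISR */

/* Interrupt context (hypothetical handler body). */
static void my_isr(void)
{
    unsigned long flags;

    spin_lock_irqsave(&my_spinlock, flags);
    shared_counter++;                      /* critical section */
    spin_unlock_irqrestore(&my_spinlock, flags);
}

/* Process context, e.g. called from file_operations.read. */
static int my_read(void)
{
    unsigned long flags;
    int snapshot;

    /* irqsave here keeps my_isr() off this CPU while the lock is held. */
    spin_lock_irqsave(&my_spinlock, flags);
    snapshot = shared_counter;             /* critical section */
    spin_unlock_irqrestore(&my_spinlock, flags);

    return snapshot;
}
```

The only difference from the code in the original post is that the process-context path now uses the irqsave/irqrestore pair instead of plain spin_lock()/spin_unlock().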

2024-05-09 17:27:26

by Muni Sekhar

Subject: Re: Seeking Assistance with Spin Lock Usage and Resolving Hard LOCKUP Error

On Thu, May 9, 2024 at 8:56 PM Billie Alsup (balsup) <[email protected]> wrote:
>
> >From: Muni Sekhar <[email protected]>
>
> >Here is a brief overview of how I have implemented spin locks in my module:
> >
> >spinlock_t my_spinlock; // Declare a spin lock
> >
> >// In ISR context (interrupt handler):
> >spin_lock_irqsave(&my_spinlock, flags);
> >// ... Critical section ...
> >spin_unlock_irqrestore(&my_spinlock, flags);
> >
> >
> >// In process context: (struct file_operations.read)
> >spin_lock(&my_spinlock);
> >// ... Critical section ...
> >spin_unlock(&my_spinlock);
>
> From my understanding, you have the usage backwards. It is the irqsave/irqrestore versions that should be used within process context to prevent the interrupt from being handled on the same CPU while executing in your critical section.
>
> The use of irqsave/irqrestore within the ISR itself is ok, although perhaps unnecessary. It depends on whether the interrupt can occur again while you are servicing the interrupt (whether on this CPU or another). Usually (?) the same interrupt does not nest, unless you have explicitly coded to allow it (for example, by acknowledging and re-enabling the interrupt early in your ISR). Certainly the spinlock is necessary to protect the critical section from running in an ISR on one CPU and process space on another CPU.
>
In the scenario where an interrupt occurs while we are servicing the
interrupt, and in the scenario where it doesn't occur while we are
servicing the interrupt, when should we use the
spin_lock_irqsave/spin_unlock_irqrestore APIs?

> From a lockup perspective, not doing the irqsave/irqrestore from process context could explain your problem. Also look for code (anywhere!) that blindly enables interrupts, rather than doing irqrestore from a prior irqsave.



--
Thanks,
Sekhar

2024-05-09 17:59:08

by Billie Alsup (balsup)

Subject: Re: Seeking Assistance with Spin Lock Usage and Resolving Hard LOCKUP Error

>From: Muni Sekhar <[email protected]>
>In the scenario where an interrupt occurs while we are servicing the
>interrupt, and in the scenario where it doesn't occur while we are
>servicing the interrupt, when should we use the
>spin_lock_irqsave/spin_unlock_irqrestore APIs?

In my experience, the interrupts are masked by the infrastructure before invoking the
interrupt service routine. So unless you explicitly re-enable them, there shouldn't be
a nested interrupt for the same interrupt number.

It is the code run at process context that must be protected using the irqsave/irqrestore
versions. You want to not only enter the critical section, but also prevent
the interrupt from occurring (on the same cpu at least). If you enter the critical section in
process context, but then take an interrupt and attempt to again enter the
critical section, then your interrupt routine will deadlock. The interrupt routine will never
be able to acquire the lock, and the process context code that was interrupted will never be
able to complete to release the lock. So the process context code requires the
irqsave/irqrestore variant to not only take the lock, but also prevent a competing interrupt
routine from being triggered while you hold the lock.

Bottom line is that if a critical section can be entered via both process context
and interrupt context, then the process context invocation should use the irqsave/irqrestore
variants to disable the interrupt before taking the lock. If it is common code shared between
process context and interrupt context, then there is no harm in calling the irqsave/irqrestore
version from both contexts.

Otherwise, the standard spin_lock/spin_unlock variants (without irqsave/irqrestore) would be
used for a critical section shared by multiple threads (different cpus), or when your code has
already (separately) handled disabling interrupts as needed before invoking spin_lock.
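
The same-CPU deadlock described above can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions: a single CPU, one lock, and one interrupt that fires while the process-context code holds the lock. `struct cpu_model`, `maybe_take_interrupt()` and `process_context_deadlocks()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of one CPU sharing one lock between process and IRQ context. */
struct cpu_model {
    bool irqs_enabled;  /* local interrupt flag */
    bool lock_held;     /* lock currently held by code on this CPU */
    bool irq_pending;   /* device has raised the interrupt */
    bool deadlocked;    /* ISR started spinning on a lock that cannot be released */
    int  isr_runs;      /* number of times the ISR completed */
};

/* Deliver a pending interrupt if the CPU currently accepts interrupts. */
static void maybe_take_interrupt(struct cpu_model *cpu)
{
    if (!cpu->irq_pending || !cpu->irqs_enabled)
        return;

    cpu->irq_pending = false;

    /*
     * The ISR tries to take the lock. If the code it interrupted on this
     * same CPU already holds it, the ISR spins forever, because the lock
     * holder cannot run again until the ISR returns.
     */
    if (cpu->lock_held) {
        cpu->deadlocked = true;
        return;
    }
    cpu->isr_runs++;
}

/* Returns true if the process-context critical section ends in a deadlock. */
static bool process_context_deadlocks(bool use_irqsave)
{
    struct cpu_model cpu = { .irqs_enabled = true };
    bool saved = cpu.irqs_enabled;

    /* Lock: the irqsave variant disables local interrupts first. */
    if (use_irqsave)
        cpu.irqs_enabled = false;
    cpu.lock_held = true;

    /* The interrupt fires in the middle of the critical section. */
    cpu.irq_pending = true;
    maybe_take_interrupt(&cpu);
    if (cpu.deadlocked)
        return true;  /* the unlock below is never reached */

    /* Unlock: the irqrestore variant restores the saved interrupt state. */
    cpu.lock_held = false;
    if (use_irqsave)
        cpu.irqs_enabled = saved;

    /* The deferred interrupt is handled now that the lock is free. */
    maybe_take_interrupt(&cpu);
    assert(cpu.isr_runs == 1);
    return false;
}
```

In this model, `process_context_deadlocks(false)` corresponds to plain spin_lock() in process context and reports a deadlock, while `process_context_deadlocks(true)` corresponds to spin_lock_irqsave() and lets the interrupt run after the lock is released.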



2024-05-10 07:13:38

by Muni Sekhar

Subject: Re: Seeking Assistance with Spin Lock Usage and Resolving Hard LOCKUP Error

On Thu, May 9, 2024 at 11:27 PM Billie Alsup (balsup) <[email protected]> wrote:
>
> >From: Muni Sekhar <[email protected]>
> >In the scenario where an interrupt occurs while we are servicing the
> >interrupt, and in the scenario where it doesn't occur while we are
> >servicing the interrupt, when should we use the
> >spin_lock_irqsave/spin_unlock_irqrestore APIs?
>
> In my experience, the interrupts are masked by the infrastructure before invoking the
> interrupt service routine. So unless you explicitly re-enable them, there shouldn't be
> a nested interrupt for the same interrupt number.
>
> It is the code run at process context that must be protected using the irqsave/irqrestore
> versions. You want to not only enter the critical section, but also prevent
> the interrupt from occurring (on the same cpu at least). If you enter the critical section in
> process context, but then take an interrupt and attempt to again enter the
> critical section, then your interrupt routine will deadlock. The interrupt routine will never
> be able to acquire the lock, and the process context code that was interrupted will never be
> able to complete to release the lock. So the process context code requires the
> irqsave/irqrestore variant to not only take the lock, but also prevent a competing interrupt
> routine from being triggered while you hold the lock.
>
> Bottom line is that if a critical section can be entered via both process context
> and interrupt context, then the process context invocation should use the irqsave/irqrestore
> variants to disable the interrupt before taking the lock. If it is common code shared between
> process context and interrupt context, then there is no harm in calling the irqsave/irqrestore
> version from both contexts.
Thanks a lot for the detailed clarification.
>
> Otherwise, the standard spin_lock/spin_unlock variants (without irqsave/irqrestore) would be
> used for a critical section shared by multiple threads (different cpus), or when your code has
> already (separately) handled disabling interrupts as needed before invoking spin_lock.
>
>


--
Thanks,
Sekhar

2024-05-17 20:44:56

by Jim Cromie

Subject: Re: Seeking Assistance with Spin Lock Usage and Resolving Hard LOCKUP Error

On Thu, May 9, 2024 at 8:39 AM Muni Sekhar <[email protected]> wrote:
>
> Dear Linux Kernel Community,
>
> I am reaching out to seek assistance regarding the usage of spin locks
> in the Linux kernel and to address a recurring issue related to hard
> LOCKUP errors that I have encountered during testing.
>

Have you built your kernel with all the LOCKDEP options enabled?
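
For reference, a config fragment along the following lines turns on the lock-debugging options usually meant by "LOCKDEP". These options exist under "Kernel hacking" / "Lock Debugging" in mainline kernels; which of them suit a particular kernel version and test setup is an assumption here, and they add noticeable runtime overhead.

```
# Lock validator: checks lock ordering and IRQ-safety of lock usage
CONFIG_PROVE_LOCKING=y
# Catches uninitialized or corrupted spinlocks
CONFIG_DEBUG_SPINLOCK=y
# Detects live locks being freed or reinitialized
CONFIG_DEBUG_LOCK_ALLOC=y
# Warns about sleeping inside atomic sections (e.g. under a spinlock)
CONFIG_DEBUG_ATOMIC_SLEEP=y
```

With the lock validator enabled, a lock that is taken in hard-IRQ context and also taken with interrupts enabled elsewhere is typically reported as an "inconsistent lock state" warning, often before the actual lockup is ever hit.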


> Recently, I developed a small kernel module that involves ISR handling
> and utilizes the spinlock_t primitive. In my module, I have employed
> spin locks both in process context using spin_lock() and spin_unlock()
> APIs, as well as in ISR context using spin_lock_irqsave() and
> spin_unlock_irqrestore() APIs.
>
> Here is a brief overview of how I have implemented spin locks in my module:
>

I certainly don't know whether the above and below are legal.
I'd be comparing my usage to working examples from the source code.

And you didn't say anything about your module or what it does.
(FWIW, you'd get more help if it were "our" module, i.e. GPL'd.)



> However, during testing, I have encountered a scenario where a hard
> LOCKUP (NMI watchdog: Watchdog detected hard LOCKUP on cpu 2) error
> occurs, specifically when a process context code execution triggers
> the spin_lock() function and is preempted by an interrupt that enters
> the ISR context and encounters the spin_lock_irqsave() function. This
> situation leads to the CPU being stuck indefinitely.
>

I'd build without the watchdog, to see what else goes wrong.
Two different errors might help find a common cause.



> My primary concern is to understand the appropriate usage of spin
> locks in both process and ISR contexts to avoid such hard LOCKUP
> errors. I am seeking clarification on the following points:
>

Documentation/locking/hwspinlock.rst

> 1. Is it safe to use spin_lock_irqsave() and spin_unlock_irqrestore()
>    APIs in ISR context and spin_lock() and spin_unlock() APIs in process
>    context simultaneously?
> 2. In scenarios where a process context code execution is preempted
>    by an interrupt and enters ISR context, how should spin locks be used
>    to prevent hard LOCKUP errors?
> 3. Are there any specific guidelines or best practices for using spin
>    locks in scenarios involving both process and ISR contexts?
>
> I would greatly appreciate any insights, guidance, or suggestions from
> the experienced members of the Linux kernel community to help address
> this issue and ensure the correct and efficient usage of spin locks in
> my kernel module.
>
> Thank you very much for your time and assistance.
>
> --
> Thanks,
> Sekhar
>
> _______________________________________________
> Kernelnewbies mailing list
> [email protected]
> https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies