2023-06-27 15:41:46

by Chengfeng Ye

[permalink] [raw]
Subject: [PATCH] ipmi: fix potential deadlock on &kcs_bmc->lock

As kcs_bmc_handle_event() is executed inside both a timer and a hardirq,
it should disable irq before lock acquisition otherwise deadlock could
happen if the timmer is preemtped by the irq.

Possible deadlock scenario:
aspeed_kcs_check_obe() (timer)
-> kcs_bmc_handle_event()
-> spin_lock(&kcs_bmc->lock)
<irq interruption>
-> aspeed_kcs_irq()
-> kcs_bmc_handle_event()
-> spin_lock(&kcs_bmc->lock) (deadlock here)

This flaw was found using an experimental static analysis tool we are
developing for irq-related deadlock.

The tentative patch fix the potential deadlock by spin_lock_irqsave()

Signed-off-by: Chengfeng Ye <[email protected]>
---
drivers/char/ipmi/kcs_bmc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/char/ipmi/kcs_bmc.c b/drivers/char/ipmi/kcs_bmc.c
index 03d02a848f3a..8b1161d5194a 100644
--- a/drivers/char/ipmi/kcs_bmc.c
+++ b/drivers/char/ipmi/kcs_bmc.c
@@ -56,12 +56,13 @@ irqreturn_t kcs_bmc_handle_event(struct kcs_bmc_device *kcs_bmc)
{
struct kcs_bmc_client *client;
irqreturn_t rc = IRQ_NONE;
+ unsigned long flags;

- spin_lock(&kcs_bmc->lock);
+ spin_lock_irqsave(&kcs_bmc->lock, flags);
client = kcs_bmc->client;
if (client)
rc = client->ops->event(client);
- spin_unlock(&kcs_bmc->lock);
+ spin_unlock_irqrestore(&kcs_bmc->lock, flags);

return rc;
}
--
2.17.1



2023-06-28 12:21:34

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH] ipmi: fix potential deadlock on &kcs_bmc->lock

Indeed, this looks like an issue.

Andrew, any opinions on this? The attached patch will work, the other
option would be to disable interrupts when calling
kcs_bmc_handle_event() in the timer handler. But then you have to worry
about RT.

-corey

On Tue, Jun 27, 2023 at 03:24:49PM +0000, Chengfeng Ye wrote:
> As kcs_bmc_handle_event() is executed inside both a timer and a hardirq,
> it should disable irq before lock acquisition otherwise deadlock could
> happen if the timmer is preemtped by the irq.
>
> Possible deadlock scenario:
> aspeed_kcs_check_obe() (timer)
> -> kcs_bmc_handle_event()
> -> spin_lock(&kcs_bmc->lock)
> <irq interruption>
> -> aspeed_kcs_irq()
> -> kcs_bmc_handle_event()
> -> spin_lock(&kcs_bmc->lock) (deadlock here)
>
> This flaw was found using an experimental static analysis tool we are
> developing for irq-related deadlock.
>
> The tentative patch fix the potential deadlock by spin_lock_irqsave()
>
> Signed-off-by: Chengfeng Ye <[email protected]>
> ---
> drivers/char/ipmi/kcs_bmc.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/char/ipmi/kcs_bmc.c b/drivers/char/ipmi/kcs_bmc.c
> index 03d02a848f3a..8b1161d5194a 100644
> --- a/drivers/char/ipmi/kcs_bmc.c
> +++ b/drivers/char/ipmi/kcs_bmc.c
> @@ -56,12 +56,13 @@ irqreturn_t kcs_bmc_handle_event(struct kcs_bmc_device *kcs_bmc)
> {
> struct kcs_bmc_client *client;
> irqreturn_t rc = IRQ_NONE;
> + unsigned long flags;
>
> - spin_lock(&kcs_bmc->lock);
> + spin_lock_irqsave(&kcs_bmc->lock, flags);
> client = kcs_bmc->client;
> if (client)
> rc = client->ops->event(client);
> - spin_unlock(&kcs_bmc->lock);
> + spin_unlock_irqrestore(&kcs_bmc->lock, flags);
>
> return rc;
> }
> --
> 2.17.1
>

2023-06-30 01:43:38

by Andrew Jeffery

[permalink] [raw]
Subject: Re: [PATCH] ipmi: fix potential deadlock on &kcs_bmc->lock

Hi Corey, Chengfeng,

On Wed, 28 Jun 2023, at 21:17, Corey Minyard wrote:
> Indeed, this looks like an issue.
>
> Andrew, any opinions on this? The attached patch will work, the other
> option would be to disable interrupts when calling
> kcs_bmc_handle_event() in the timer handler. But then you have to worry
> about RT.

Right, I think we'd do best to not over-complicate things.
spin_lock_irq{save,restore}() are the intuitive choice to me.

I'll follow up with R-b/T-b tags once I've booted the patch
and done some testing.

Andrew

>
> -corey
>
> On Tue, Jun 27, 2023 at 03:24:49PM +0000, Chengfeng Ye wrote:
>> As kcs_bmc_handle_event() is executed inside both a timer and a hardirq,
>> it should disable irq before lock acquisition otherwise deadlock could
>> happen if the timmer is preemtped by the irq.
>>
>> Possible deadlock scenario:
>> aspeed_kcs_check_obe() (timer)
>> -> kcs_bmc_handle_event()
>> -> spin_lock(&kcs_bmc->lock)
>> <irq interruption>
>> -> aspeed_kcs_irq()
>> -> kcs_bmc_handle_event()
>> -> spin_lock(&kcs_bmc->lock) (deadlock here)
>>
>> This flaw was found using an experimental static analysis tool we are
>> developing for irq-related deadlock.
>>
>> The tentative patch fix the potential deadlock by spin_lock_irqsave()
>>
>> Signed-off-by: Chengfeng Ye <[email protected]>
>> ---
>> drivers/char/ipmi/kcs_bmc.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/char/ipmi/kcs_bmc.c b/drivers/char/ipmi/kcs_bmc.c
>> index 03d02a848f3a..8b1161d5194a 100644
>> --- a/drivers/char/ipmi/kcs_bmc.c
>> +++ b/drivers/char/ipmi/kcs_bmc.c
>> @@ -56,12 +56,13 @@ irqreturn_t kcs_bmc_handle_event(struct kcs_bmc_device *kcs_bmc)
>> {
>> struct kcs_bmc_client *client;
>> irqreturn_t rc = IRQ_NONE;
>> + unsigned long flags;
>>
>> - spin_lock(&kcs_bmc->lock);
>> + spin_lock_irqsave(&kcs_bmc->lock, flags);
>> client = kcs_bmc->client;
>> if (client)
>> rc = client->ops->event(client);
>> - spin_unlock(&kcs_bmc->lock);
>> + spin_unlock_irqrestore(&kcs_bmc->lock, flags);
>>
>> return rc;
>> }
>> --
>> 2.17.1
>>

2023-07-04 14:43:03

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH] ipmi: fix potential deadlock on &kcs_bmc->lock

On Fri, Jun 30, 2023 at 10:31:02AM +0930, Andrew Jeffery wrote:
> Hi Corey, Chengfeng,
>
> On Wed, 28 Jun 2023, at 21:17, Corey Minyard wrote:
> > Indeed, this looks like an issue.
> >
> > Andrew, any opinions on this? The attached patch will work, the other
> > option would be to disable interrupts when calling
> > kcs_bmc_handle_event() in the timer handler. But then you have to worry
> > about RT.
>
> Right, I think we'd do best to not over-complicate things.
> spin_lock_irq{save,restore}() are the intuitive choice to me.
>
> I'll follow up with R-b/T-b tags once I've booted the patch
> and done some testing.

Thanks. This is in my for-next tree, I'd like to get this in the merge
window, which I believe ends this Sunday.

> >> This flaw was found using an experimental static analysis tool we are
> >> developing for irq-related deadlock.

Will this tool be available for general use? It's obviously quite
handy.

-corey

2023-07-04 16:40:24

by Chengfeng Ye

[permalink] [raw]
Subject: Re: [PATCH] ipmi: fix potential deadlock on &kcs_bmc->lock

Thanks so much for the reply!

> Will this tool be available for general use? It's obviously quite
> handy.

And also thanks for your interest! I am still optimizing the tool, so far
it still can report some false positives in some cases and require a certain
effort of manual checking. Would let you know once I finish my work and
open-source it later!

Best Regards,
Chengfeng

2023-07-05 12:21:38

by Andrew Jeffery

[permalink] [raw]
Subject: Re: [PATCH] ipmi: fix potential deadlock on &kcs_bmc->lock



On Tue, 4 Jul 2023, at 23:57, Corey Minyard wrote:
> On Fri, Jun 30, 2023 at 10:31:02AM +0930, Andrew Jeffery wrote:
>> Hi Corey, Chengfeng,
>>
>> On Wed, 28 Jun 2023, at 21:17, Corey Minyard wrote:
>> > Indeed, this looks like an issue.
>> >
>> > Andrew, any opinions on this? The attached patch will work, the other
>> > option would be to disable interrupts when calling
>> > kcs_bmc_handle_event() in the timer handler. But then you have to worry
>> > about RT.
>>
>> Right, I think we'd do best to not over-complicate things.
>> spin_lock_irq{save,restore}() are the intuitive choice to me.
>>
>> I'll follow up with R-b/T-b tags once I've booted the patch
>> and done some testing.
>
> Thanks. This is in my for-next tree, I'd like to get this in the merge
> window, which I believe ends this Sunday.
>

I've successfully booted an IBM Rainier machine with this patch
applied. Rainier is a Power10-based platform managed by an AST2600 BMC.
The boot process makes heavy use of one of the KCS devices as part of
an MCTP transport binding implementation between the BMC and host
firmware.

Reviewed-by: Andrew Jeffery <[email protected]>
Tested-by: Andrew Jeffery <[email protected]>

Thanks,

Andrew

2023-07-19 19:32:55

by Chengfeng Ye

[permalink] [raw]
Subject: Re: [PATCH] ipmi: fix potential deadlock on &kcs_bmc->lock

> Reviewed-by: Andrew Jeffery <[email protected]>
> Tested-by: Andrew Jeffery <[email protected]>

Thanks much for your time and effort in reviewing/testing the patch.

Best Regards,
Chengfeng