2022-01-31 22:51:30

by Jia-Ju Bai

Subject: [BUG] bus: mhi: possible deadlock in mhi_pm_disable_transition() and mhi_async_power_up()

Hello,

My static analysis tool reports a possible deadlock in the mhi driver in
Linux 5.10:

mhi_async_power_up()
  mutex_lock(&mhi_cntrl->pm_mutex); --> Line 933 (Lock A)
  wait_event_timeout(mhi_cntrl->state_event, ...) --> Line 985 (Wait X)
  mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 1040 (Unlock A)

mhi_pm_disable_transition()
  mutex_lock(&mhi_cntrl->pm_mutex); --> Line 463 (Lock A)
  wake_up_all(&mhi_cntrl->state_event); --> Line 474 (Wake X)
  mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 524 (Unlock A)
  wake_up_all(&mhi_cntrl->state_event); --> Line 526 (Wake X)

When mhi_async_power_up() runs, it performs "Wait X" while holding
"Lock A". If mhi_pm_disable_transition() runs concurrently at this
time, it cannot perform "Wake X" to wake up "Wait X" in
mhi_async_power_up(), because "Lock A" is already held by
mhi_async_power_up(), causing a possible deadlock.
I see that "Wait X" is performed with a timeout, which relieves the
possible deadlock, but I think relying on this timeout can make
execution inefficient.

I am not quite sure whether this possible problem is real and how to fix
it if it is real.
Any feedback would be appreciated, thanks :)


Best wishes,
Jia-Ju Bai




2022-02-02 08:03:34

by Daniel Thompson

Subject: Re: [BUG] bus: mhi: possible deadlock in mhi_pm_disable_transition() and mhi_async_power_up()

On Sat, Jan 29, 2022 at 10:56:30AM +0800, Jia-Ju Bai wrote:
> Hello,
>
> My static analysis tool reports a possible deadlock in the mhi driver in
> Linux 5.10:
>
> mhi_async_power_up()
>   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 933 (Lock A)
>   wait_event_timeout(mhi_cntrl->state_event, ...) --> Line 985 (Wait X)
>   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 1040 (Unlock A)
>
> mhi_pm_disable_transition()
>   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 463 (Lock A)
>   wake_up_all(&mhi_cntrl->state_event); --> Line 474 (Wake X)
>   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 524 (Unlock A)
>   wake_up_all(&mhi_cntrl->state_event); --> Line 526 (Wake X)
>
> When mhi_async_power_up() is executed, "Wait X" is performed by holding
> "Lock A". If mhi_pm_disable_transition() is concurrently executed at this
> time, "Wake X" cannot be performed to wake up "Wait X" in
> mhi_async_power_up(), because "Lock A" is already held by
> mhi_async_power_up(), causing a possible deadlock.
> I find that "Wait X" is performed with a timeout, to relieve the possible
> deadlock; but I think this timeout can cause inefficient execution.
>
> I am not quite sure whether this possible problem is real and how to fix it
> if it is real.
> Any feedback would be appreciated, thanks :)

Interesting find but I think it would be better to run your tool
against more recent kernels to confirm any problem reports. In this
case the code you mention looks like it was removed in v5.17-rc1
(and should eventually make its way to the stable kernels too).


Daniel.

2022-02-07 14:25:10

by Jia-Ju Bai

Subject: Re: [BUG] bus: mhi: possible deadlock in mhi_pm_disable_transition() and mhi_async_power_up()



On 2022/2/2 1:15, Daniel Thompson wrote:
> On Sat, Jan 29, 2022 at 10:56:30AM +0800, Jia-Ju Bai wrote:
>> Hello,
>>
>> My static analysis tool reports a possible deadlock in the mhi driver in
>> Linux 5.10:
>>
>> mhi_async_power_up()
>>   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 933 (Lock A)
>>   wait_event_timeout(mhi_cntrl->state_event, ...) --> Line 985 (Wait X)
>>   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 1040 (Unlock A)
>>
>> mhi_pm_disable_transition()
>>   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 463 (Lock A)
>>   wake_up_all(&mhi_cntrl->state_event); --> Line 474 (Wake X)
>>   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 524 (Unlock A)
>>   wake_up_all(&mhi_cntrl->state_event); --> Line 526 (Wake X)
>>
>> When mhi_async_power_up() is executed, "Wait X" is performed by holding
>> "Lock A". If mhi_pm_disable_transition() is concurrently executed at this
>> time, "Wake X" cannot be performed to wake up "Wait X" in
>> mhi_async_power_up(), because "Lock A" is already held by
>> mhi_async_power_up(), causing a possible deadlock.
>> I find that "Wait X" is performed with a timeout, to relieve the possible
>> deadlock; but I think this timeout can cause inefficient execution.
>>
>> I am not quite sure whether this possible problem is real and how to fix it
>> if it is real.
>> Any feedback would be appreciated, thanks :)
> Interesting find but I think it would be better to run your tool
> against more recent kernels to confirm any problem reports. In this
> case the code you mention looks like it was removed in v5.17-rc1
> (and should eventually make its way to the stable kernels too).

Hi Daniel,

Thanks for your reply :)
I checked the Linux v5.17-rc1 code and found that this possible deadlock
no longer exists, due to the changes in commit d651ce8e917f.

However, my tool also reports several other possible deadlocks, which
are caused by waiting while holding mhi_cntrl->pm_mutex.
Here are two examples from Linux v5.17-rc1:

#BUG 1
mhi_pm_sys_error_transition()
  mutex_lock(&mhi_cntrl->pm_mutex); --> Line 572 (Lock A)
  wait_event_timeout(mhi_cntrl->state_event, ...); --> Line 600 (Wait X)
  mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 630 (Unlock A)

mhi_pm_disable_transition()
  mutex_lock(&mhi_cntrl->pm_mutex); --> Line 464 (Lock A)
  mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 496 (Unlock A)
  wake_up_all(&mhi_cntrl->state_event); --> Line 498 (Wake X)

#BUG 2
mhi_pm_sys_error_transition()
  mutex_lock(&mhi_cntrl->pm_mutex); --> Line 572 (Lock A)
  wait_event_timeout(mhi_cntrl->state_event, ...); --> Line 600 (Wait X)
  mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 630 (Unlock A)

mhi_power_down()
  mutex_lock(&mhi_cntrl->pm_mutex); --> Line 1139 (Lock A)
  wake_up_all(&mhi_cntrl->state_event); --> Line 1165 (Wake X)
  mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 1168 (Unlock A)

I am not quite sure whether these possible problems are real.
Any feedback would be appreciated, thanks :)


Best wishes,
Jia-Ju Bai


2022-02-08 23:22:54

by Manivannan Sadhasivam

Subject: Re: [BUG] bus: mhi: possible deadlock in mhi_pm_disable_transition() and mhi_async_power_up()

Hi,

On Sun, Feb 06, 2022 at 11:34:02PM +0800, Jia-Ju Bai wrote:
>
>
> On 2022/2/2 1:15, Daniel Thompson wrote:
> > On Sat, Jan 29, 2022 at 10:56:30AM +0800, Jia-Ju Bai wrote:
> > > Hello,
> > >
> > > My static analysis tool reports a possible deadlock in the mhi driver in
> > > Linux 5.10:
> > >
> > > mhi_async_power_up()
> > >   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 933 (Lock A)
> > >   wait_event_timeout(mhi_cntrl->state_event, ...) --> Line 985 (Wait X)
> > >   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 1040 (Unlock A)
> > >
> > > mhi_pm_disable_transition()
> > >   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 463 (Lock A)
> > >   wake_up_all(&mhi_cntrl->state_event); --> Line 474 (Wake X)
> > >   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 524 (Unlock A)
> > >   wake_up_all(&mhi_cntrl->state_event); --> Line 526 (Wake X)
> > >
> > > When mhi_async_power_up() is executed, "Wait X" is performed by holding
> > > "Lock A". If mhi_pm_disable_transition() is concurrently executed at this
> > > time, "Wake X" cannot be performed to wake up "Wait X" in
> > > mhi_async_power_up(), because "Lock A" is already held by
> > > mhi_async_power_up(), causing a possible deadlock.
> > > I find that "Wait X" is performed with a timeout, to relieve the possible
> > > deadlock; but I think this timeout can cause inefficient execution.
> > >
> > > I am not quite sure whether this possible problem is real and how to fix it
> > > if it is real.
> > > Any feedback would be appreciated, thanks :)
> > Interesting find but I think it would be better to run your tool
> > against more recent kernels to confirm any problem reports. In this
> > case the code you mention looks like it was removed in v5.17-rc1
> > (and should eventually make its way to the stable kernels too).
>
> Hi Daniel,
>
> Thanks for your reply :)
> I check Linux v5.17-rc1 code, and find that this possible deadlock does not
> exist, due to the changes in commit d651ce8e917f.
>
> However, my tool also reports several other possible deadlocks, which are
> caused by waiting with holding mhi_cntrl->pm_mutex.
> There are two examples in Linux v5.17-rc1:
>
> #BUG 1
> mhi_pm_sys_error_transition()
>   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 572 (Lock A)
>   wait_event_timeout(mhi_cntrl->state_event, ...); --> Line 600 (Wait X)
>   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 630 (Unlock A)
>
> mhi_pm_disable_transition()
>   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 464 (Lock A)
>   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 496 (Unlock A)
>   wake_up_all(&mhi_cntrl->state_event); --> Line 498 (Wake X)
>

The wait_event_timeout() in mhi_pm_sys_error_transition() waits for the state
change event from the endpoint device after triggering MHI reset. The device
sends the state change event through the BHI IRQ vector, and the handler for
that vector wakes up mhi_pm_sys_error_transition() without taking pm_mutex.

Refer to drivers/bus/mhi/host/main.c:

irqreturn_t mhi_intvec_handler(int irq_number, void *dev)
{
	struct mhi_controller *mhi_cntrl = dev;

	/* Wake up events waiting for state change */
	wake_up_all(&mhi_cntrl->state_event);

	return IRQ_WAKE_THREAD;
}

This way, there is no deadlock. The same applies to your second case.

Thanks for reporting!

Regards,
Mani

> #BUG 2
> mhi_pm_sys_error_transition()
>   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 572 (Lock A)
>   wait_event_timeout(mhi_cntrl->state_event, ...); --> Line 600 (Wait X)
>   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 630 (Unlock A)
>
> mhi_power_down()
>   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 1139 (Lock A)
>   wake_up_all(&mhi_cntrl->state_event); --> Line 1165 (Wake X)
>   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 1168 (Unlock A)
>
> I am not quite sure whether these possible problems are real.
> Any feedback would be appreciated, thanks :)
>
>
> Best wishes,
> Jia-Ju Bai
>

2022-02-09 04:56:39

by Manivannan Sadhasivam

Subject: Re: [BUG] bus: mhi: possible deadlock in mhi_pm_disable_transition() and mhi_async_power_up()

Hi,

Thanks for the report!

On Sat, Jan 29, 2022 at 10:56:30AM +0800, Jia-Ju Bai wrote:
> Hello,
>
> My static analysis tool reports a possible deadlock in the mhi driver in
> Linux 5.10:
>
> mhi_async_power_up()
>   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 933 (Lock A)
>   wait_event_timeout(mhi_cntrl->state_event, ...) --> Line 985 (Wait X)
>   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 1040 (Unlock A)
>
> mhi_pm_disable_transition()
>   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 463 (Lock A)
>   wake_up_all(&mhi_cntrl->state_event); --> Line 474 (Wake X)
>   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 524 (Unlock A)
>   wake_up_all(&mhi_cntrl->state_event); --> Line 526 (Wake X)
>
> When mhi_async_power_up() is executed, "Wait X" is performed by holding
> "Lock A". If mhi_pm_disable_transition() is concurrently executed at this
> time, "Wake X" cannot be performed to wake up "Wait X" in
> mhi_async_power_up(), because "Lock A" is already held by
> mhi_async_power_up(), causing a possible deadlock.
> I find that "Wait X" is performed with a timeout, to relieve the possible
> deadlock; but I think this timeout can cause inefficient execution.
>

As per the MHI design, we can be sure that mhi_pm_disable_transition() won't be
called until wait_event_timeout() completes in mhi_async_power_up(). So this
deadlock is not possible in practice.

Thanks,
Mani

> I am not quite sure whether this possible problem is real and how to fix it
> if it is real.
> Any feedback would be appreciated, thanks :)
>
>
> Best wishes,
> Jia-Ju Bai
>
>
>

2022-02-09 08:54:47

by Manivannan Sadhasivam

Subject: Re: [BUG] bus: mhi: possible deadlock in mhi_pm_disable_transition() and mhi_async_power_up()

On Tue, Feb 01, 2022 at 05:15:40PM +0000, Daniel Thompson wrote:
> On Sat, Jan 29, 2022 at 10:56:30AM +0800, Jia-Ju Bai wrote:
> > Hello,
> >
> > My static analysis tool reports a possible deadlock in the mhi driver in
> > Linux 5.10:
> >
> > mhi_async_power_up()
> >   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 933 (Lock A)
> >   wait_event_timeout(mhi_cntrl->state_event, ...) --> Line 985 (Wait X)
> >   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 1040 (Unlock A)
> >
> > mhi_pm_disable_transition()
> >   mutex_lock(&mhi_cntrl->pm_mutex); --> Line 463 (Lock A)
> >   wake_up_all(&mhi_cntrl->state_event); --> Line 474 (Wake X)
> >   mutex_unlock(&mhi_cntrl->pm_mutex); --> Line 524 (Unlock A)
> >   wake_up_all(&mhi_cntrl->state_event); --> Line 526 (Wake X)
> >
> > When mhi_async_power_up() is executed, "Wait X" is performed by holding
> > "Lock A". If mhi_pm_disable_transition() is concurrently executed at this
> > time, "Wake X" cannot be performed to wake up "Wait X" in
> > mhi_async_power_up(), because "Lock A" is already held by
> > mhi_async_power_up(), causing a possible deadlock.
> > I find that "Wait X" is performed with a timeout, to relieve the possible
> > deadlock; but I think this timeout can cause inefficient execution.
> >
> > I am not quite sure whether this possible problem is real and how to fix it
> > if it is real.
> > Any feedback would be appreciated, thanks :)
>
> Interesting find but I think it would be better to run your tool
> against more recent kernels to confirm any problem reports. In this
> case the code you mention looks like it was removed in v5.17-rc1
> (and should eventually make its way to the stable kernels too).
>

Hmm, looks like the commit didn't apply cleanly to 5.10:
https://www.spinics.net/lists/stable/msg526754.html

Let me send the fixed-up version.

Thanks,
Mani

>
> Daniel.