2023-09-13 21:28:08

by Stephen Boyd

[permalink] [raw]
Subject: [PATCH v4 2/4] platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt()

It's possible for the completion in ipc_wait_for_interrupt() to timeout,
simply because the interrupt was delayed in being processed. A timeout
in itself is not an error. This driver should check the status register
upon a timeout to ensure that scheduling or interrupt processing delays
don't affect the outcome of the IPC return value.

CPU0 SCU
---- ---
ipc_wait_for_interrupt()
wait_for_completion_timeout(&scu->cmd_complete)
[TIMEOUT] status[IPC_STATUS_BUSY]=0

Fix this problem by reading the status bit in all cases, regardless of
the timeout. If the completion times out, we'll assume the problem was
that the IPC_STATUS_BUSY bit was still set, but if the status bit is
cleared in the meantime we know that we hit some scheduling delay and we
should just check the error bit.

Cc: Prashant Malani <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Reviewed-by: Mika Westerberg <[email protected]>
Fixes: ed12f295bfd5 ("ipc: Added support for IPC interrupt mode")
Signed-off-by: Stephen Boyd <[email protected]>
---
drivers/platform/x86/intel_scu_ipc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/intel_scu_ipc.c b/drivers/platform/x86/intel_scu_ipc.c
index 4c774ee8bb1b..299c15312acb 100644
--- a/drivers/platform/x86/intel_scu_ipc.c
+++ b/drivers/platform/x86/intel_scu_ipc.c
@@ -248,10 +248,12 @@ static inline int ipc_wait_for_interrupt(struct intel_scu_ipc_dev *scu)
{
int status;

- if (!wait_for_completion_timeout(&scu->cmd_complete, IPC_TIMEOUT))
- return -ETIMEDOUT;
+ wait_for_completion_timeout(&scu->cmd_complete, IPC_TIMEOUT);

status = ipc_read_status(scu);
+ if (status & IPC_STATUS_BUSY)
+ return -ETIMEDOUT;
+
if (status & IPC_STATUS_ERR)
return -EIO;

--
https://chromeos.dev


2023-09-15 23:39:07

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH v4 2/4] platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt()

On Wed, 13 Sep 2023, Stephen Boyd wrote:

> It's possible for the completion in ipc_wait_for_interrupt() to timeout,
> simply because the interrupt was delayed in being processed. A timeout
> in itself is not an error. This driver should check the status register
> upon a timeout to ensure that scheduling or interrupt processing delays
> don't affect the outcome of the IPC return value.
>
> CPU0 SCU
> ---- ---
> ipc_wait_for_interrupt()
> wait_for_completion_timeout(&scu->cmd_complete)
> [TIMEOUT] status[IPC_STATUS_BUSY]=0
>
> Fix this problem by reading the status bit in all cases, regardless of
> the timeout. If the completion times out, we'll assume the problem was
> that the IPC_STATUS_BUSY bit was still set, but if the status bit is
> cleared in the meantime we know that we hit some scheduling delay and we
> should just check the error bit.

Hi,

I don't understand the intent here. What prevents IPC_STATUS_BUSY from
changing right after you've read it in ipc_read_status(scu)? Doesn't that
end you exactly into the same situation where the returned value is stale
so I cannot see how this fixes anything, at best it just plays around the
race window that seems to still be there after this fix?

--
i.


> Cc: Prashant Malani <[email protected]>
> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
> Reviewed-by: Andy Shevchenko <[email protected]>
> Reviewed-by: Mika Westerberg <[email protected]>
> Fixes: ed12f295bfd5 ("ipc: Added support for IPC interrupt mode")
> Signed-off-by: Stephen Boyd <[email protected]>
> ---
> drivers/platform/x86/intel_scu_ipc.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/platform/x86/intel_scu_ipc.c b/drivers/platform/x86/intel_scu_ipc.c
> index 4c774ee8bb1b..299c15312acb 100644
> --- a/drivers/platform/x86/intel_scu_ipc.c
> +++ b/drivers/platform/x86/intel_scu_ipc.c
> @@ -248,10 +248,12 @@ static inline int ipc_wait_for_interrupt(struct intel_scu_ipc_dev *scu)
> {
> int status;
>
> - if (!wait_for_completion_timeout(&scu->cmd_complete, IPC_TIMEOUT))
> - return -ETIMEDOUT;
> + wait_for_completion_timeout(&scu->cmd_complete, IPC_TIMEOUT);
>
> status = ipc_read_status(scu);
> + if (status & IPC_STATUS_BUSY)
> + return -ETIMEDOUT;
> +
> if (status & IPC_STATUS_ERR)
> return -EIO;
>
>

2023-09-18 16:06:01

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH v4 2/4] platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt()

On Mon, 18 Sep 2023, Hans de Goede wrote:
> On 9/15/23 15:49, Ilpo J?rvinen wrote:
> > On Wed, 13 Sep 2023, Stephen Boyd wrote:
> >
> >> It's possible for the completion in ipc_wait_for_interrupt() to timeout,
> >> simply because the interrupt was delayed in being processed. A timeout
> >> in itself is not an error. This driver should check the status register
> >> upon a timeout to ensure that scheduling or interrupt processing delays
> >> don't affect the outcome of the IPC return value.
> >>
> >> CPU0 SCU
> >> ---- ---
> >> ipc_wait_for_interrupt()
> >> wait_for_completion_timeout(&scu->cmd_complete)
> >> [TIMEOUT] status[IPC_STATUS_BUSY]=0
> >>
> >> Fix this problem by reading the status bit in all cases, regardless of
> >> the timeout. If the completion times out, we'll assume the problem was
> >> that the IPC_STATUS_BUSY bit was still set, but if the status bit is
> >> cleared in the meantime we know that we hit some scheduling delay and we
> >> should just check the error bit.
> >
> > Hi,
> >
> > I don't understand the intent here. What prevents IPC_STATUS_BUSY from
> > changing right after you've read it in ipc_read_status(scu)? Doesn't that
> > end you exactly into the same situation where the returned value is stale
> > so I cannot see how this fixes anything, at best it just plays around the
> > race window that seems to still be there after this fix?
>
> As I understand it the problem before was that the function would
> return -ETIMEDOUT; purely based on wait_for_completion_timeout()
> without ever actually checking the BUSY bit:
>
> Old code:
>
> if (!wait_for_completion_timeout(&scu->cmd_complete, IPC_TIMEOUT))
> return -ETIMEDOUT;
>
> This allows for a scenario where when the IRQ processing got delayed
> (on say another core) causing the timeout to trigger,
> ipc_wait_for_interrupt() would return -ETIMEDOUT even though
> the BUSY flag was already cleared by the SCU.
>
> This patch adds an explicit check for the BUSY flag after
> the wait_for_completion(), rather then relying on the
> wait_for_completion() return value which implies things
> are still busy.

Oh, I see, it's because the code is waiting for the completion rather than
the actual condition.

> As for "What prevents IPC_STATUS_BUSY from
> changing right after you've read it in ipc_read_status(scu)?"
>
> AFAICT in this code path the bit is only ever supposed to go
> from being set (busy) to unset (not busy), not the other
> way around since no new commands can be submitted until
> this function has completed. So that scenario cannot happen.

This is not what I meant.

I meant that if the code has decided to return -ETIMEDOUT, the status bit
still change at that point which makes the return value to not match. This
race is still there and given the changelog was a bit sparse on what race
it was fixing I ended up noticing this detail.


--
i.

2023-09-18 22:36:26

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v4 2/4] platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt()

Hi Ilpo,

On 9/15/23 15:49, Ilpo Järvinen wrote:
> On Wed, 13 Sep 2023, Stephen Boyd wrote:
>
>> It's possible for the completion in ipc_wait_for_interrupt() to timeout,
>> simply because the interrupt was delayed in being processed. A timeout
>> in itself is not an error. This driver should check the status register
>> upon a timeout to ensure that scheduling or interrupt processing delays
>> don't affect the outcome of the IPC return value.
>>
>> CPU0 SCU
>> ---- ---
>> ipc_wait_for_interrupt()
>> wait_for_completion_timeout(&scu->cmd_complete)
>> [TIMEOUT] status[IPC_STATUS_BUSY]=0
>>
>> Fix this problem by reading the status bit in all cases, regardless of
>> the timeout. If the completion times out, we'll assume the problem was
>> that the IPC_STATUS_BUSY bit was still set, but if the status bit is
>> cleared in the meantime we know that we hit some scheduling delay and we
>> should just check the error bit.
>
> Hi,
>
> I don't understand the intent here. What prevents IPC_STATUS_BUSY from
> changing right after you've read it in ipc_read_status(scu)? Doesn't that
> end you exactly into the same situation where the returned value is stale
> so I cannot see how this fixes anything, at best it just plays around the
> race window that seems to still be there after this fix?

As I understand it the problem before was that the function would
return -ETIMEDOUT; purely based on wait_for_completion_timeout()
without ever actually checking the BUSY bit:

Old code:

if (!wait_for_completion_timeout(&scu->cmd_complete, IPC_TIMEOUT))
return -ETIMEDOUT;

This allows for a scenario where when the IRQ processing got delayed
(on say another core) causing the timeout to trigger,
ipc_wait_for_interrupt() would return -ETIMEDOUT even though
the BUSY flag was already cleared by the SCU.

This patch adds an explicit check for the BUSY flag after
the wait_for_completion(), rather then relying on the
wait_for_completion() return value which implies things
are still busy.

As for "What prevents IPC_STATUS_BUSY from
changing right after you've read it in ipc_read_status(scu)?"

AFAICT in this code path the bit is only ever supposed to go
from being set (busy) to unset (not busy), not the other
way around since no new commands can be submitted until
this function has completed. So that scenario cannot happen.

Regards,

Hans