2023-04-10 12:31:01

by Peijie Shao

[permalink] [raw]
Subject: [PATCH] tpm_tis_spi: fix:release chip select when flow control fails

From: Peijie Shao <[email protected]>

The TPM's chip select will leave active after spi_bus_unlock when
flow control timeout, and may interfere other chips sharing the same
spi bus, or may damage them dule to level conflict on MISO pin.

So the patch deactives the chip select by sending an empty message
with cs_change=0 if flow control fails.

The reason why flow control timeout for me is unfortunately I got a
damaged TPM chip. It always pull MISO low during cs active(this can
be easily emulated by wire MISO to the ground), not responding anything,
and dmesg shows below:
...
[ 311.150725] tpm_tis_spi: probe of spi0.0 failed with error -110
...

Signed-off-by: Peijie Shao <[email protected]>
---
drivers/char/tpm/tpm_tis_spi_main.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/char/tpm/tpm_tis_spi_main.c
b/drivers/char/tpm/tpm_tis_spi_main.c
index a0963a3e92bd..5c8ff343761f 100644
--- a/drivers/char/tpm/tpm_tis_spi_main.c
+++ b/drivers/char/tpm/tpm_tis_spi_main.c
@@ -105,8 +105,19 @@ int tpm_tis_spi_transfer(struct tpm_tis_data
*data, u32 addr, u16 len,
/* Flow control transfers are receive only */
spi_xfer.tx_buf = NULL;
ret = phy->flow_control(phy, &spi_xfer);
- if (ret < 0)
+ if (ret < 0) {
+ /*
+ * Release cs pin if the device is not responding, regardless of the reason.
+ * Notice cs may alreadly been released if the failure was caused inside
+ * spi_sync_locked called by flow_control, in this situation, a pluse may be
+ * generated on cs.
+ */
+ memset(&spi_xfer, 0, sizeof(spi_xfer));
+ spi_message_init(&m);
+ spi_message_add_tail(&spi_xfer, &m);
+ spi_sync_locked(phy->spi_device, &m);
goto exit;
+ }
spi_xfer.cs_change = 0;
spi_xfer.len = transfer_len;
--
2.39.1


2023-04-21 21:48:04

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis_spi: fix:release chip select when flow control fails

On Fri, Mar 31, 2023 at 02:56:25PM +0800, [email protected] wrote:
> From: Peijie Shao <[email protected]>
>
> The TPM's chip select will leave active after spi_bus_unlock when
> flow control timeout, and may interfere other chips sharing the same
> spi bus, or may damage them dule to level conflict on MISO pin.
>
> So the patch deactives the chip select by sending an empty message
> with cs_change=0 if flow control fails.
>
> The reason why flow control timeout for me is unfortunately I got a
> damaged TPM chip. It always pull MISO low during cs active(this can
> be easily emulated by wire MISO to the ground), not responding anything,
> and dmesg shows below:
> ...
> [ 311.150725] tpm_tis_spi: probe of spi0.0 failed with error -110
> ...

We don't really cease to support damaged hardware but it is true
that the *software* failure paths should probably be robust enough
to deativate chip select.

I would rewrite this as

"The failure paths in tpm_tis_spi_transfer() do not deactivate
chip select. Send an empty message (cs_select == 0) to overcome
this."

That's all there needs to be. We do not care about broken hardware.

>
> Signed-off-by: Peijie Shao <[email protected]>
> ---
> drivers/char/tpm/tpm_tis_spi_main.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/char/tpm/tpm_tis_spi_main.c b/drivers/char/tpm/tpm_tis_spi_main.c
> index a0963a3e92bd..5c8ff343761f 100644
> --- a/drivers/char/tpm/tpm_tis_spi_main.c
> +++ b/drivers/char/tpm/tpm_tis_spi_main.c
> @@ -105,8 +105,19 @@ int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
> /* Flow control transfers are receive only */
> spi_xfer.tx_buf = NULL;
> ret = phy->flow_control(phy, &spi_xfer);
> - if (ret < 0)
> + if (ret < 0) {
> + /*
> + * Release cs pin if the device is not responding, regardless of the reason.
> + * Notice cs may alreadly been released if the failure was caused inside
> + * spi_sync_locked called by flow_control, in this situation, a pluse may be
> + * generated on cs.
> + */

Please replace above comment with:

/* Deactivate chip select: */

> + memset(&spi_xfer, 0, sizeof(spi_xfer));
> + spi_message_init(&m);
> + spi_message_add_tail(&spi_xfer, &m);
> + spi_sync_locked(phy->spi_device, &m);
> goto exit;
> + }
>
> spi_xfer.cs_change = 0;
> spi_xfer.len = transfer_len;
> --
> 2.39.1
>
>
>

There's three call sites, why are you taking care of only one
of them?

I'd consider instead:

return 0;

exit:
memset(&spi_xfer, 0, sizeof(spi_xfer));
spi_message_init(&m);
spi_message_add_tail(&spi_xfer, &m);
spi_sync_locked(phy->spi_device, &m);
spi_bus_unlock(phy->spi_device->master);
return ret;
}

The the rollback would apply to all call sites.

BR, Jarkko

2023-04-24 06:46:07

by shaopeijie

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis_spi: fix:release chip select when flow control fails

On Sat, 22 Apr 2023 00:46:40 +0300, [email protected] wrote:
> On Fri, Mar 31, 2023 at 02:56:25PM +0800, [email protected] wrote:
>> From: Peijie Shao <[email protected]>
>>
>> The TPM's chip select will leave active after spi_bus_unlock when
>> flow control timeout, and may interfere other chips sharing the same
>> spi bus, or may damage them dule to level conflict on MISO pin.
>>
>> So the patch deactives the chip select by sending an empty message
>> with cs_change=0 if flow control fails.
>>
>> The reason why flow control timeout for me is unfortunately I got a
>> damaged TPM chip. It always pull MISO low during cs active(this can
>> be easily emulated by wire MISO to the ground), not responding anything,
>> and dmesg shows below:
>> ...
>> [ 311.150725] tpm_tis_spi: probe of spi0.0 failed with error -110
>> ...
>
> We don't really cease to support damaged hardware but it is true
> that the *software* failure paths should probably be robust enough
> to deativate chip select.
>
> I would rewrite this as
>
> "The failure paths in tpm_tis_spi_transfer() do not deactivate
> chip select. Send an empty message (cs_select == 0) to overcome
> this."
>
> That's all there needs to be. We do not care about broken hardware.
>

I agree. The patch is to robust *software* path, but not support
broken hardware.

>>
>> Signed-off-by: Peijie Shao <[email protected]>
>> ---
>> drivers/char/tpm/tpm_tis_spi_main.c | 13 ++++++++++++-
>> 1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/char/tpm/tpm_tis_spi_main.c b/drivers/char/tpm/tpm_tis_spi_main.c
>> index a0963a3e92bd..5c8ff343761f 100644
>> --- a/drivers/char/tpm/tpm_tis_spi_main.c
>> +++ b/drivers/char/tpm/tpm_tis_spi_main.c
>> @@ -105,8 +105,19 @@ int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
>> /* Flow control transfers are receive only */
>> spi_xfer.tx_buf = NULL;
>> ret = phy->flow_control(phy, &spi_xfer);
>> - if (ret < 0)
>> + if (ret < 0) {
>> + /*
>> + * Release cs pin if the device is not responding, regardless of the reason.
>> + * Notice cs may alreadly been released if the failure was caused inside
>> + * spi_sync_locked called by flow_control, in this situation, a pluse may be
>> + * generated on cs.
>> + */
>
> Please replace above comment with:
>
> /* Deactivate chip select: */
>

I agree.

>> + memset(&spi_xfer, 0, sizeof(spi_xfer));
>> + spi_message_init(&m);
>> + spi_message_add_tail(&spi_xfer, &m);
>> + spi_sync_locked(phy->spi_device, &m);
>> goto exit;
>> + }
>>
>> spi_xfer.cs_change = 0;
>> spi_xfer.len = transfer_len;
>> --
>> 2.39.1
>>
>>
>>
>
> There's three call sites, why are you taking care of only one
> of them?
>
> I'd consider instead:
>
> return 0;
>
> exit:
> memset(&spi_xfer, 0, sizeof(spi_xfer));
> spi_message_init(&m);
> spi_message_add_tail(&spi_xfer, &m);
> spi_sync_locked(phy->spi_device, &m);
> spi_bus_unlock(phy->spi_device->master);
> return ret;
> }

I found that spi_sync_locked() will deactivate cs if any failure
is generated inside it.

spi_sync_locked()
...
spi_transfer_one_message()
...
if (ret != 0 || !keep_cs)
spi_set_cs(msg->spi, false, false);

spi_transfer_one_message() is the default transfer method, I think
other spi controllers who implements .transfer_one_message should
have the same behaviour.

Sending an empty message when cs is already deactivated may have a small
side effect: cs will go activate and deactivate in a very short time,
means a pulse will be generated on cs pin. This may also happen when
failure is generated by spi_sync_locked() which called inside ->flow_control().

So to reduce this, I prefer deactivating cs only when phy->flow_control()
fails. If the side effect is totaly acceptable, your advice maybe better.

Waiting for your suggestions.

>
> The the rollback would apply to all call sites.
>
> BR, Jarkko

Thanks!

Peijie, Shao