WEC TPMs (in 1.2 mode) and NTC TPMs (in 2.0 mode) have been observed to
frequently, but intermittently, fail to probe with:
tpm_tis: probe of 00:09 failed with error -1
Added debugging output showed that the request_locality in
tpm_tis_core_init succeeds, but then the tpm_chip_start fails when its
call to tpm_request_locality -> request_locality fails.
The access register in check_locality would show:
0x80 TPM_ACCESS_VALID
0x82 TPM_ACCESS_VALID | TPM_ACCESS_REQUEST_USE
0x80 TPM_ACCESS_VALID
continuing until it times out. TPM_ACCESS_ACTIVE_LOCALITY (0x20) never
gets set, which would end the wait.
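For reference, the TPM_ACCESS bits involved and the test check_locality
performs can be sketched like this (a simplified stand-alone model of the
driver logic, not the kernel code itself):

```c
#include <stdint.h>
#include <stdbool.h>

/* TPM_ACCESS register bits from the TIS specification. */
#define TPM_ACCESS_VALID            0x80
#define TPM_ACCESS_ACTIVE_LOCALITY  0x20
#define TPM_ACCESS_REQUEST_USE      0x02

/*
 * check_locality succeeds only when both VALID and ACTIVE_LOCALITY are
 * set.  The failing trace toggles between 0x80 and 0x82: REQUEST_USE is
 * latched but ACTIVE_LOCALITY never appears, so this predicate stays
 * false until the request times out.
 */
bool locality_active(uint8_t access)
{
	uint8_t mask = TPM_ACCESS_VALID | TPM_ACCESS_ACTIVE_LOCALITY;

	return (access & mask) == mask;
}
```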
My best guess is that something racy is going on between
release_locality's write and request_locality's write. There is no wait
in release_locality to ensure that the locality is actually released, so
the immediately following request_locality may confuse the TPM.
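That guess can be captured in a toy simulation (entirely hypothetical --
an assumption about the hardware's behavior, not something the TIS spec
states -- but it reproduces the observed 0x82 register value):

```c
#include <stdint.h>
#include <stdbool.h>

#define TPM_ACCESS_VALID            0x80
#define TPM_ACCESS_ACTIVE_LOCALITY  0x20
#define TPM_ACCESS_REQUEST_USE      0x02

/*
 * Toy model of the suspected failure mode: if REQUEST_USE arrives while
 * the TPM is still internally processing the previous release, the
 * request bit is latched but the locality is never granted.
 */
struct tpm_model {
	uint8_t access;
	bool release_pending;	/* release write still being processed */
};

void model_release(struct tpm_model *t)
{
	t->access &= ~TPM_ACCESS_ACTIVE_LOCALITY;
	t->release_pending = true;	/* driver does not wait here */
}

void model_request(struct tpm_model *t)
{
	t->access |= TPM_ACCESS_REQUEST_USE;
	if (!t->release_pending)	/* granted only once release settled */
		t->access |= TPM_ACCESS_ACTIVE_LOCALITY;
}
```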
tpm_chip_start grabs locality 0, and updates chip->locality. Call that
before the TPM_INT_ENABLE write, and drop the explicit request/release
calls. tpm_chip_stop performs the release. With this, we switch to
using chip->locality instead of priv->locality. The probe failure is
not seen after this.
commit 0ef333f5ba7f ("tpm: add request_locality before write
TPM_INT_ENABLE") added a request_locality/release_locality pair around
the tpm_tis_write32 of TPM_INT_ENABLE, but there is also a read of
TPM_INT_ENABLE for the intmask which should have the locality held.
tpm_chip_start is moved before that read so the locality is held while
it executes.
Fixes: 0ef333f5ba7f ("tpm: add request_locality before write TPM_INT_ENABLE")
CC: [email protected]
Signed-off-by: Jason Andryuk <[email protected]>
---
The probe failure was seen on 5.4, 5.15 and 5.17.
commit e42acf104d6e ("tpm_tis: Clean up locality release") removed the
release wait. I haven't tried it, but re-introducing that wait would
probably fix this issue. It's hard to know a priori when a synchronous
wait is needed, and typically they don't seem to be needed.
Re-introducing it would, however, add the wait back on every release.
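What a restored synchronous release might look like, sketched
stand-alone (hypothetical helper names; the pre-e42acf104d6e code used
the driver's own register-read and timeout infrastructure):

```c
#include <stdint.h>
#include <stdbool.h>

#define TPM_ACCESS_VALID            0x80
#define TPM_ACCESS_ACTIVE_LOCALITY  0x20

/*
 * Hypothetical synchronous release: after writing the release, poll the
 * access register until ACTIVE_LOCALITY clears or the retry budget runs
 * out.  read_access() stands in for the driver's register read and
 * 'retries' for a jiffies-based timeout.
 */
bool wait_for_locality_release(uint8_t (*read_access)(void), int retries)
{
	while (retries-- > 0) {
		uint8_t access = read_access();

		if ((access & (TPM_ACCESS_VALID | TPM_ACCESS_ACTIVE_LOCALITY))
		    == TPM_ACCESS_VALID)
			return true;	/* locality released */
	}
	return false;	/* timed out; TPM never dropped the locality */
}

/* Fake register for exercising the helper: the locality drops on the
 * second poll. */
static int fake_polls;
static uint8_t fake_read_access(void)
{
	return ++fake_polls >= 2
		? TPM_ACCESS_VALID
		: (TPM_ACCESS_VALID | TPM_ACCESS_ACTIVE_LOCALITY);
}
```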
Surrounding the read of TPM_INT_ENABLE with a locality grab may not be
necessary? It looks like the code only grabs a locality for writing, but
that asymmetry is surprising to me.
tpm_chip and tpm_tis_data track the locality separately. Should the
tpm_tis_data one be removed so they don't get out of sync?
---
drivers/char/tpm/tpm_tis_core.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index dc56b976d816..529c241800c0 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -986,8 +986,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
goto out_err;
}
+ /* Grabs locality 0. */
+ rc = tpm_chip_start(chip);
+ if (rc)
+ goto out_err;
+
/* Take control of the TPM's interrupt hardware and shut it off */
- rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
+ rc = tpm_tis_read32(priv, TPM_INT_ENABLE(chip->locality), &intmask);
if (rc < 0)
goto out_err;
@@ -995,19 +1000,10 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
TPM_INTF_DATA_AVAIL_INT | TPM_INTF_STS_VALID_INT;
intmask &= ~TPM_GLOBAL_INT_ENABLE;
- rc = request_locality(chip, 0);
- if (rc < 0) {
- rc = -ENODEV;
- goto out_err;
- }
-
- tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
- release_locality(chip, 0);
+ tpm_tis_write32(priv, TPM_INT_ENABLE(chip->locality), intmask);
- rc = tpm_chip_start(chip);
- if (rc)
- goto out_err;
rc = tpm2_probe(chip);
+ /* Releases locality 0. */
tpm_chip_stop(chip);
if (rc)
goto out_err;
--
2.36.1
On Thu, Jul 7, 2022 at 3:48 AM chenjun (AM) <[email protected]> wrote:
>
> On 2022/7/7 0:41, Jason Andryuk wrote:
> > [...]
> > Fixes: 0ef333f5ba7f ("tpm: add request_locality before write TPM_INT_ENABLE")
>
> 0ef333f5ba7f is probably not the commit that introduced the problem?
> As you said, the problem was seen on 5.4, but the commit was only
> merged in 5.16.
I was imprecise with my versions. 0ef333f5ba7f was backported to
stable-5.4.y as 13af3a9b1ba6 in 5.4.174. I was running 5.4.163 on
some systems without problem, but the probe failures started after
jumping to 5.4.200.
Other systems showing the probe failures were 5.15.29 (which has
0ef333f5ba7f backported as ea1fd8364c9f in 5.15.17) and 5.17.y (5.17.5,
I think) from a Fedora 36 USB drive.
Other machines run fine with 0ef333f5ba7f, which is part of why I
didn't notice it earlier between 5.4.163 and 5.4.200.
At the top I wrote: "frequently, but intermittently, fail to probe".
To expand on that: basically, on the affected machines, the probe during
boot fails. In one instance, I repeatedly ran `echo 00:05 >
/sys/bus/pnp/drivers/tpm_tis/bind`, and it successfully probed on the
7th try. So something racy seems to be going on.
Regards,
Jason
On Wed, Jul 06, 2022 at 12:40:43PM -0400, Jason Andryuk wrote:
> [...]
>
Can you test against
https://lore.kernel.org/linux-integrity/[email protected]/T/#t
BR, Jarkko
On Sun, Jul 10, 2022 at 10:58 PM Jarkko Sakkinen <[email protected]> wrote:
>
> On Wed, Jul 06, 2022 at 12:40:43PM -0400, Jason Andryuk wrote:
> > [...]
>
> Can you test against
>
> https://lore.kernel.org/linux-integrity/[email protected]/T/#t
I applied on top of 5.15.53, and the probe on boot still fails.
Manually probing works intermittently.
Regards,
Jason
On Mon, Jul 11, 2022 at 3:37 PM Jason Andryuk <[email protected]> wrote:
>
> On Sun, Jul 10, 2022 at 10:58 PM Jarkko Sakkinen <[email protected]> wrote:
> >
> > Can you test against
> >
> > https://lore.kernel.org/linux-integrity/[email protected]/T/#t
>
> I applied on top of 5.15.53, and the probe on boot still fails.
> Manually probing works intermittently.
On top of Lino Sanfilippo's patch queue, I added an additional
tpm_tis_request_locality and tpm_tis_release_locality to hold the
locality open via refcount in tpm_tis_chip_init. Similar to my patch
in this thread, it acquires the locality before the TPM_INT_ENABLE
write and holds it until after the TPM_RID read. That fixes the probe
issue on the box where I tested.
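The refcount approach can be illustrated stand-alone (hypothetical names
and structure; Lino Sanfilippo's actual series differs in detail):

```c
/*
 * Refcounted locality hold: only the 0 -> 1 request and the 1 -> 0
 * release touch the hardware, so an inner release/request pair (like
 * the one around the TPM_INT_ENABLE write) never generates the
 * back-to-back release/request writes suspected of confusing the TPM.
 */
struct loc_ref {
	int count;		/* software holders of the locality */
	int hw_requests;	/* hardware request transitions issued */
	int hw_releases;	/* hardware release transitions issued */
};

void loc_request(struct loc_ref *l)
{
	if (l->count++ == 0)
		l->hw_requests++;	/* real request_locality() here */
}

void loc_release(struct loc_ref *l)
{
	if (--l->count == 0)
		l->hw_releases++;	/* real release_locality() here */
}
```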
While tpm_tis_core_init is definitely a problem, I wonder if there are
other code paths that could trigger this acquire -> release ->
acquire issue. In that light, restoring a wait by reverting commit
e42acf104d6e ("tpm_tis: Clean up locality release") seems safer since
it would cover any code path. However, I just tested reverting it, and
the probe still fails on boot and intermittently when retried. That
surprised me, since I expected the revert to solve the issue, but my
original debugging showed TPM_ACCESS_ACTIVE_LOCALITY cleared (and never
re-set), so the driver isn't actually waiting.
All this locality acquiring and releasing seems rather redundant. If
locality 0 is all that is ever used, why not just hold it open? I
guess Trench Boot/Secure Launch wants to use locality 2
(https://lore.kernel.org/lkml/[email protected]/),
but in either case doesn't the system just stay in the assigned
locality? I haven't read the spec, so maybe that is disallowed.
There is something nice about cleaning up and releasing the locality
when not in use, but it's also causing a problem here.
Regards,
Jason