2020-10-15 22:27:18

by Jerry Snitselaar

[permalink] [raw]
Subject: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

There is a misconfiguration in the bios of the gpio pin used for the
interrupt in the T490s. When interrupts are enabled in the tpm_tis
driver code this results in an interrupt storm. This was initially
reported when we attempted to enable the interrupt code in the tpm_tis
driver, which previously wasn't setting a flag to enable it. Due to
the reports of the interrupt storm that code was reverted and we went back
to polling instead of using interrupts. Now that we know the T490s problem
is a firmware issue, add code to check if the system is a T490s and
disable interrupts if that is the case. This will allow us to enable
interrupts for everyone else. If the user has a fixed bios they can
force the enabling of interrupts with tpm_tis.interrupts=1 on the
kernel command line.

Cc: [email protected]
Cc: Peter Huewe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Hans de Goede <[email protected]>
Reviewed-by: James Bottomley <[email protected]>
Signed-off-by: Jerry Snitselaar <[email protected]>
---
drivers/char/tpm/tpm_tis.c | 29 +++++++++++++++++++++++++++--
1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index 0b214963539d..4ed6e660273a 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ b/drivers/char/tpm/tpm_tis.c
@@ -27,6 +27,7 @@
#include <linux/of.h>
#include <linux/of_device.h>
#include <linux/kernel.h>
+#include <linux/dmi.h>
#include "tpm.h"
#include "tpm_tis_core.h"

@@ -49,8 +50,8 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da
return container_of(data, struct tpm_tis_tcg_phy, priv);
}

-static bool interrupts = true;
-module_param(interrupts, bool, 0444);
+static int interrupts = -1;
+module_param(interrupts, int, 0444);
MODULE_PARM_DESC(interrupts, "Enable interrupts");

static bool itpm;
@@ -63,6 +64,28 @@ module_param(force, bool, 0444);
MODULE_PARM_DESC(force, "Force device probe rather than using ACPI entry");
#endif

+static int tpm_tis_disable_irq(const struct dmi_system_id *d)
+{
+ if (interrupts == -1) {
+ pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d->ident);
+ interrupts = 0;
+ }
+
+ return 0;
+}
+
+static const struct dmi_system_id tpm_tis_dmi_table[] = {
+ {
+ .callback = tpm_tis_disable_irq,
+ .ident = "ThinkPad T490s",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"),
+ },
+ },
+ {}
+};
+
#if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
static int has_hid(struct acpi_device *dev, const char *hid)
{
@@ -192,6 +215,8 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info)
int irq = -1;
int rc;

+ dmi_check_system(tpm_tis_dmi_table);
+
rc = check_acpi_tpm2(dev);
if (rc)
return rc;
--
2.28.0


2020-10-15 22:28:21

by Jerry Snitselaar

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s


James should this get tacked on the end of your patchset?

Regards,
Jerry

2020-10-15 22:42:25

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
>
> There is a misconfiguration in the bios of the gpio pin used for the
> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> driver code this results in an interrupt storm. This was initially
> reported when we attempted to enable the interrupt code in the tpm_tis
> driver, which previously wasn't setting a flag to enable it. Due to
> the reports of the interrupt storm that code was reverted and we went back
> to polling instead of using interrupts. Now that we know the T490s problem
> is a firmware issue, add code to check if the system is a T490s and
> disable interrupts if that is the case. This will allow us to enable
> interrupts for everyone else. If the user has a fixed bios they can
> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> kernel command line.

I think an implication of this is that systems haven't been
well-tested with interrupts enabled. In general when we've found a
firmware issue in one place it ends up happening elsewhere as well, so
it wouldn't surprise me if there are other machines that will also be
unhappy with interrupts enabled. Would it be possible to automatically
detect this case (eg, if we get more than a certain number of
interrupts in a certain timeframe immediately after enabling the
interrupt) and automatically fall back to polling in that case? It
would also mean that users with fixed firmware wouldn't need to pass a
parameter.

2020-10-16 06:21:17

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

Hi,

On 10/15/20 11:44 PM, Jerry Snitselaar wrote:
> There is a misconfiguration in the bios of the gpio pin used for the
> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> driver code this results in an interrupt storm. This was initially
> reported when we attempted to enable the interrupt code in the tpm_tis
> driver, which previously wasn't setting a flag to enable it. Due to
> the reports of the interrupt storm that code was reverted and we went back
> to polling instead of using interrupts. Now that we know the T490s problem
> is a firmware issue, add code to check if the system is a T490s and
> disable interrupts if that is the case. This will allow us to enable
> interrupts for everyone else. If the user has a fixed bios they can
> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> kernel command line.
>
> Cc: [email protected]
> Cc: Peter Huewe <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Hans de Goede <[email protected]>
> Reviewed-by: James Bottomley <[email protected]>
> Signed-off-by: Jerry Snitselaar <[email protected]>

Patch looks good to me:

Reviewed-by: Hans de Goede <[email protected]>

Regards,

Hans


> ---
> drivers/char/tpm/tpm_tis.c | 29 +++++++++++++++++++++++++++--
> 1 file changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
> index 0b214963539d..4ed6e660273a 100644
> --- a/drivers/char/tpm/tpm_tis.c
> +++ b/drivers/char/tpm/tpm_tis.c
> @@ -27,6 +27,7 @@
> #include <linux/of.h>
> #include <linux/of_device.h>
> #include <linux/kernel.h>
> +#include <linux/dmi.h>
> #include "tpm.h"
> #include "tpm_tis_core.h"
>
> @@ -49,8 +50,8 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da
> return container_of(data, struct tpm_tis_tcg_phy, priv);
> }
>
> -static bool interrupts = true;
> -module_param(interrupts, bool, 0444);
> +static int interrupts = -1;
> +module_param(interrupts, int, 0444);
> MODULE_PARM_DESC(interrupts, "Enable interrupts");
>
> static bool itpm;
> @@ -63,6 +64,28 @@ module_param(force, bool, 0444);
> MODULE_PARM_DESC(force, "Force device probe rather than using ACPI entry");
> #endif
>
> +static int tpm_tis_disable_irq(const struct dmi_system_id *d)
> +{
> + if (interrupts == -1) {
> + pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d->ident);
> + interrupts = 0;
> + }
> +
> + return 0;
> +}
> +
> +static const struct dmi_system_id tpm_tis_dmi_table[] = {
> + {
> + .callback = tpm_tis_disable_irq,
> + .ident = "ThinkPad T490s",
> + .matches = {
> + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
> + DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"),
> + },
> + },
> + {}
> +};
> +
> #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
> static int has_hid(struct acpi_device *dev, const char *hid)
> {
> @@ -192,6 +215,8 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info)
> int irq = -1;
> int rc;
>
> + dmi_check_system(tpm_tis_dmi_table);
> +
> rc = check_acpi_tpm2(dev);
> if (rc)
> return rc;
>

2020-10-16 08:24:23

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

Hi,

On 10/16/20 12:39 AM, Matthew Garrett wrote:
> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
>>
>> There is a misconfiguration in the bios of the gpio pin used for the
>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>> driver code this results in an interrupt storm. This was initially
>> reported when we attempted to enable the interrupt code in the tpm_tis
>> driver, which previously wasn't setting a flag to enable it. Due to
>> the reports of the interrupt storm that code was reverted and we went back
>> to polling instead of using interrupts. Now that we know the T490s problem
>> is a firmware issue, add code to check if the system is a T490s and
>> disable interrupts if that is the case. This will allow us to enable
>> interrupts for everyone else. If the user has a fixed bios they can
>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>> kernel command line.
>
> I think an implication of this is that systems haven't been
> well-tested with interrupts enabled. In general when we've found a
> firmware issue in one place it ends up happening elsewhere as well, so
> it wouldn't surprise me if there are other machines that will also be
> unhappy with interrupts enabled. Would it be possible to automatically
> detect this case (eg, if we get more than a certain number of
> interrupts in a certain timeframe immediately after enabling the
> interrupt) and automatically fall back to polling in that case? It
> would also mean that users with fixed firmware wouldn't need to pass a
> parameter.

IIRC then at least on the T490 the irq storm caused systems to not boot
in some cases. I guess if we detect the storm and disable the irq we might
fix that... OTOH this problem seems to only hit a certain generation of
Thinkpads so with some luck the denylist should not be too big and the denylist
approach should work.

Regards,

Hans

2020-10-19 09:38:41

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

On Thu, Oct 15, 2020 at 02:44:30PM -0700, Jerry Snitselaar wrote:
> There is a misconfiguration in the bios of the gpio pin used for the
> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> driver code this results in an interrupt storm. This was initially
> reported when we attempted to enable the interrupt code in the tpm_tis
> driver, which previously wasn't setting a flag to enable it. Due to
> the reports of the interrupt storm that code was reverted and we went back
> to polling instead of using interrupts. Now that we know the T490s problem
> is a firmware issue, add code to check if the system is a T490s and
> disable interrupts if that is the case. This will allow us to enable
> interrupts for everyone else. If the user has a fixed bios they can
> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> kernel command line.
>
> Cc: [email protected]
> Cc: Peter Huewe <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Hans de Goede <[email protected]>
> Reviewed-by: James Bottomley <[email protected]>
> Signed-off-by: Jerry Snitselaar <[email protected]>

Reviewed-by: Jarkko Sakkinen <[email protected]>

I'll apply this and make it available in linux-next.

/Jarkko

> ---
> drivers/char/tpm/tpm_tis.c | 29 +++++++++++++++++++++++++++--
> 1 file changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
> index 0b214963539d..4ed6e660273a 100644
> --- a/drivers/char/tpm/tpm_tis.c
> +++ b/drivers/char/tpm/tpm_tis.c
> @@ -27,6 +27,7 @@
> #include <linux/of.h>
> #include <linux/of_device.h>
> #include <linux/kernel.h>
> +#include <linux/dmi.h>
> #include "tpm.h"
> #include "tpm_tis_core.h"
>
> @@ -49,8 +50,8 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da
> return container_of(data, struct tpm_tis_tcg_phy, priv);
> }
>
> -static bool interrupts = true;
> -module_param(interrupts, bool, 0444);
> +static int interrupts = -1;
> +module_param(interrupts, int, 0444);
> MODULE_PARM_DESC(interrupts, "Enable interrupts");
>
> static bool itpm;
> @@ -63,6 +64,28 @@ module_param(force, bool, 0444);
> MODULE_PARM_DESC(force, "Force device probe rather than using ACPI entry");
> #endif
>
> +static int tpm_tis_disable_irq(const struct dmi_system_id *d)
> +{
> + if (interrupts == -1) {
> + pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d->ident);
> + interrupts = 0;
> + }
> +
> + return 0;
> +}
> +
> +static const struct dmi_system_id tpm_tis_dmi_table[] = {
> + {
> + .callback = tpm_tis_disable_irq,
> + .ident = "ThinkPad T490s",
> + .matches = {
> + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
> + DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"),
> + },
> + },
> + {}
> +};
> +
> #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
> static int has_hid(struct acpi_device *dev, const char *hid)
> {
> @@ -192,6 +215,8 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info)
> int irq = -1;
> int rc;
>
> + dmi_check_system(tpm_tis_dmi_table);
> +
> rc = check_acpi_tpm2(dev);
> if (rc)
> return rc;
> --
> 2.28.0
>
>

2020-10-19 09:39:03

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

On Mon, Oct 19, 2020 at 12:11:44AM +0300, Jarkko Sakkinen wrote:
> On Thu, Oct 15, 2020 at 02:44:30PM -0700, Jerry Snitselaar wrote:
> > There is a misconfiguration in the bios of the gpio pin used for the
> > interrupt in the T490s. When interrupts are enabled in the tpm_tis
> > driver code this results in an interrupt storm. This was initially
> > reported when we attempted to enable the interrupt code in the tpm_tis
> > driver, which previously wasn't setting a flag to enable it. Due to
> > the reports of the interrupt storm that code was reverted and we went back
> > to polling instead of using interrupts. Now that we know the T490s problem
> > is a firmware issue, add code to check if the system is a T490s and
> > disable interrupts if that is the case. This will allow us to enable
> > interrupts for everyone else. If the user has a fixed bios they can
> > force the enabling of interrupts with tpm_tis.interrupts=1 on the
> > kernel command line.
> >
> > Cc: [email protected]
> > Cc: Peter Huewe <[email protected]>
> > Cc: Jason Gunthorpe <[email protected]>
> > Cc: Hans de Goede <[email protected]>
> > Reviewed-by: James Bottomley <[email protected]>
> > Signed-off-by: Jerry Snitselaar <[email protected]>
>
> Reviewed-by: Jarkko Sakkinen <[email protected]>
>
> I'll apply this and make it available in linux-next.

Applied.

Thank you.

/Jarkko

2020-11-19 06:40:37

by Jerry Snitselaar

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s


Matthew Garrett @ 2020-10-15 15:39 MST:

> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
>>
>> There is a misconfiguration in the bios of the gpio pin used for the
>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>> driver code this results in an interrupt storm. This was initially
>> reported when we attempted to enable the interrupt code in the tpm_tis
>> driver, which previously wasn't setting a flag to enable it. Due to
>> the reports of the interrupt storm that code was reverted and we went back
>> to polling instead of using interrupts. Now that we know the T490s problem
>> is a firmware issue, add code to check if the system is a T490s and
>> disable interrupts if that is the case. This will allow us to enable
>> interrupts for everyone else. If the user has a fixed bios they can
>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>> kernel command line.
>
> I think an implication of this is that systems haven't been
> well-tested with interrupts enabled. In general when we've found a
> firmware issue in one place it ends up happening elsewhere as well, so
> it wouldn't surprise me if there are other machines that will also be
> unhappy with interrupts enabled. Would it be possible to automatically
> detect this case (eg, if we get more than a certain number of
> interrupts in a certain timeframe immediately after enabling the
> interrupt) and automatically fall back to polling in that case? It
> would also mean that users with fixed firmware wouldn't need to pass a
> parameter.

I believe Matthew is correct here. I found another system today
with completely different vendor for both the system and the tpm chip.
In addition another Lenovo model, the L490, has the issue.

This initial attempt at a solution like Matthew suggested works on
the system I found today, but I imagine it is all sorts of wrong.
In the 2 systems where I've seen it, there are about 100000 interrupts
in around 1.5 seconds, and then the irq code shuts down the interrupt
because they aren't being handled.


diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 49ae09ac604f..478e9d02a3fa 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -27,6 +27,11 @@
#include "tpm.h"
#include "tpm_tis_core.h"

+static unsigned int time_start = 0;
+static bool storm_check = true;
+static bool storm_killed = false;
+static u32 irqs_fired = 0;
+
static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);

static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
@@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
return rc;
}

-static void disable_interrupts(struct tpm_chip *chip)
+static void __disable_interrupts(struct tpm_chip *chip)
{
struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
u32 intmask;
int rc;

- if (priv->irq == 0)
- return;
-
rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
if (rc < 0)
intmask = 0;

intmask &= ~TPM_GLOBAL_INT_ENABLE;
rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
+ chip->flags &= ~TPM_CHIP_FLAG_IRQ;
+}
+
+static void disable_interrupts(struct tpm_chip *chip)
+{
+ struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);

+ if (priv->irq == 0)
+ return;
+
+ __disable_interrupts(chip);
devm_free_irq(chip->dev.parent, priv->irq, chip);
priv->irq = 0;
- chip->flags &= ~TPM_CHIP_FLAG_IRQ;
}

/*
@@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
int rc, irq;
struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);

+ if (unlikely(storm_killed)) {
+ devm_free_irq(chip->dev.parent, priv->irq, chip);
+ priv->irq = 0;
+ storm_killed = false;
+ }
+
if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested)
return tpm_tis_send_main(chip, buf, len);

@@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id)
u32 interrupt;
int i, rc;

+ if (storm_check) {
+ irqs_fired++;
+
+ if (!time_start) {
+ time_start = jiffies_to_msecs(jiffies);
+ } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) {
+ __disable_interrupts(chip);
+ storm_check = false;
+ storm_killed = true;
+ return IRQ_HANDLED;
+ } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) {
+ storm_check = false;
+ }
+ }
+
rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt);
if (rc < 0)
return IRQ_NONE;

2020-11-19 14:46:19

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

Hi,

On 11/19/20 7:36 AM, Jerry Snitselaar wrote:
>
> Matthew Garrett @ 2020-10-15 15:39 MST:
>
>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
>>>
>>> There is a misconfiguration in the bios of the gpio pin used for the
>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>>> driver code this results in an interrupt storm. This was initially
>>> reported when we attempted to enable the interrupt code in the tpm_tis
>>> driver, which previously wasn't setting a flag to enable it. Due to
>>> the reports of the interrupt storm that code was reverted and we went back
>>> to polling instead of using interrupts. Now that we know the T490s problem
>>> is a firmware issue, add code to check if the system is a T490s and
>>> disable interrupts if that is the case. This will allow us to enable
>>> interrupts for everyone else. If the user has a fixed bios they can
>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>>> kernel command line.
>>
>> I think an implication of this is that systems haven't been
>> well-tested with interrupts enabled. In general when we've found a
>> firmware issue in one place it ends up happening elsewhere as well, so
>> it wouldn't surprise me if there are other machines that will also be
>> unhappy with interrupts enabled. Would it be possible to automatically
>> detect this case (eg, if we get more than a certain number of
>> interrupts in a certain timeframe immediately after enabling the
>> interrupt) and automatically fall back to polling in that case? It
>> would also mean that users with fixed firmware wouldn't need to pass a
>> parameter.
>
> I believe Matthew is correct here. I found another system today
> with completely different vendor for both the system and the tpm chip.
> In addition another Lenovo model, the L490, has the issue.
>
> This initial attempt at a solution like Matthew suggested works on
> the system I found today, but I imagine it is all sorts of wrong.
> In the 2 systems where I've seen it, there are about 100000 interrupts
> in around 1.5 seconds, and then the irq code shuts down the interrupt
> because they aren't being handled.

Is that with your patch? The IRQ should be silenced as soon as
devm_free_irq(chip->dev.parent, priv->irq, chip); is called.

Depending on if we can get your storm-detection to work or not,
we might also choose to just never try to use the IRQ (at least on
x86 systems). AFAIK the TPM is never used for high-throughput stuff
so the polling overhead should not be a big deal (and I'm getting the feeling
that Windows always polls).

Regards,

Hans



> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 49ae09ac604f..478e9d02a3fa 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -27,6 +27,11 @@
> #include "tpm.h"
> #include "tpm_tis_core.h"
>
> +static unsigned int time_start = 0;
> +static bool storm_check = true;
> +static bool storm_killed = false;
> +static u32 irqs_fired = 0;
> +
> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
>
> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
> return rc;
> }
>
> -static void disable_interrupts(struct tpm_chip *chip)
> +static void __disable_interrupts(struct tpm_chip *chip)
> {
> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> u32 intmask;
> int rc;
>
> - if (priv->irq == 0)
> - return;
> -
> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
> if (rc < 0)
> intmask = 0;
>
> intmask &= ~TPM_GLOBAL_INT_ENABLE;
> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
> + chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> +}
> +
> +static void disable_interrupts(struct tpm_chip *chip)
> +{
> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>
> + if (priv->irq == 0)
> + return;
> +
> + __disable_interrupts(chip);
> devm_free_irq(chip->dev.parent, priv->irq, chip);
> priv->irq = 0;
> - chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> }
>
> /*
> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
> int rc, irq;
> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>
> + if (unlikely(storm_killed)) {
> + devm_free_irq(chip->dev.parent, priv->irq, chip);
> + priv->irq = 0;
> + storm_killed = false;
> + }
> +
> if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested)
> return tpm_tis_send_main(chip, buf, len);
>
> @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id)
> u32 interrupt;
> int i, rc;
>
> + if (storm_check) {
> + irqs_fired++;
> +
> + if (!time_start) {
> + time_start = jiffies_to_msecs(jiffies);
> + } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) {
> + __disable_interrupts(chip);
> + storm_check = false;
> + storm_killed = true;
> + return IRQ_HANDLED;
> + } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) {
> + storm_check = false;
> + }
> + }
> +
> rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt);
> if (rc < 0)
> return IRQ_NONE;
>

2020-11-19 17:08:32

by Jerry Snitselaar

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s


Hans de Goede @ 2020-11-19 07:42 MST:

> Hi,
>
> On 11/19/20 7:36 AM, Jerry Snitselaar wrote:
>>
>> Matthew Garrett @ 2020-10-15 15:39 MST:
>>
>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
>>>>
>>>> There is a misconfiguration in the bios of the gpio pin used for the
>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>>>> driver code this results in an interrupt storm. This was initially
>>>> reported when we attempted to enable the interrupt code in the tpm_tis
>>>> driver, which previously wasn't setting a flag to enable it. Due to
>>>> the reports of the interrupt storm that code was reverted and we went back
>>>> to polling instead of using interrupts. Now that we know the T490s problem
>>>> is a firmware issue, add code to check if the system is a T490s and
>>>> disable interrupts if that is the case. This will allow us to enable
>>>> interrupts for everyone else. If the user has a fixed bios they can
>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>>>> kernel command line.
>>>
>>> I think an implication of this is that systems haven't been
>>> well-tested with interrupts enabled. In general when we've found a
>>> firmware issue in one place it ends up happening elsewhere as well, so
>>> it wouldn't surprise me if there are other machines that will also be
>>> unhappy with interrupts enabled. Would it be possible to automatically
>>> detect this case (eg, if we get more than a certain number of
>>> interrupts in a certain timeframe immediately after enabling the
>>> interrupt) and automatically fall back to polling in that case? It
>>> would also mean that users with fixed firmware wouldn't need to pass a
>>> parameter.
>>
>> I believe Matthew is correct here. I found another system today
>> with completely different vendor for both the system and the tpm chip.
>> In addition another Lenovo model, the L490, has the issue.
>>
>> This initial attempt at a solution like Matthew suggested works on
>> the system I found today, but I imagine it is all sorts of wrong.
>> In the 2 systems where I've seen it, there are about 100000 interrupts
>> in around 1.5 seconds, and then the irq code shuts down the interrupt
>> because they aren't being handled.
>
> Is that with your patch? The IRQ should be silenced as soon as
> devm_free_irq(chip->dev.parent, priv->irq, chip); is called.
>

No that is just with James' patchset that enables interrupts for
tpm_tis. It looks like the irq is firing, but the tpm's int_status
register is clear, so the handler immediately returns IRQ_NONE. After
that happens 100000 times the core irq code shuts down the irq, but it
isn't released so I could still see the stats in /proc/interrupts. With
my attempt below on top of that patchset it releases the irq. I had to
stick the check prior to it checking the int_status register because it
is cleared and the handler returns, and I couldn't do the devm_free_irq
directly in tis_int_handler, so I tried sticking it in tpm_tis_send
where the other odd irq testing code was already located. I'm not sure
if there is another place that would work better for calling the
devm_free_irq.

> Depending on if we can get your storm-detection to work or not,
> we might also choose to just never try to use the IRQ (at least on
> x86 systems). AFAIK the TPM is never used for high-throughput stuff
> so the polling overhead should not be a big deal (and I'm getting the feeling
> that Windows always polls).
>

I was wondering about Windows as well. In addition to the Lenovo systems
which the T490s had Nuvoton tpm, the system I found yesterday was a development
system we have from a partner with an Infineon tpm. Dan Williams has
seen it internally at Intel as well on some system.

> Regards,
>
> Hans
>
>
>
>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>> index 49ae09ac604f..478e9d02a3fa 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -27,6 +27,11 @@
>> #include "tpm.h"
>> #include "tpm_tis_core.h"
>>
>> +static unsigned int time_start = 0;
>> +static bool storm_check = true;
>> +static bool storm_killed = false;
>> +static u32 irqs_fired = 0;
>> +
>> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
>>
>> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
>> return rc;
>> }
>>
>> -static void disable_interrupts(struct tpm_chip *chip)
>> +static void __disable_interrupts(struct tpm_chip *chip)
>> {
>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>> u32 intmask;
>> int rc;
>>
>> - if (priv->irq == 0)
>> - return;
>> -
>> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>> if (rc < 0)
>> intmask = 0;
>>
>> intmask &= ~TPM_GLOBAL_INT_ENABLE;
>> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
>> + chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>> +}
>> +
>> +static void disable_interrupts(struct tpm_chip *chip)
>> +{
>> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>
>> + if (priv->irq == 0)
>> + return;
>> +
>> + __disable_interrupts(chip);
>> devm_free_irq(chip->dev.parent, priv->irq, chip);
>> priv->irq = 0;
>> - chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>> }
>>
>> /*
>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
>> int rc, irq;
>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>
>> + if (unlikely(storm_killed)) {
>> + devm_free_irq(chip->dev.parent, priv->irq, chip);
>> + priv->irq = 0;
>> + storm_killed = false;
>> + }
>> +
>> if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested)
>> return tpm_tis_send_main(chip, buf, len);
>>
>> @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id)
>> u32 interrupt;
>> int i, rc;
>>
>> + if (storm_check) {
>> + irqs_fired++;
>> +
>> + if (!time_start) {
>> + time_start = jiffies_to_msecs(jiffies);
>> + } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) {
>> + __disable_interrupts(chip);
>> + storm_check = false;
>> + storm_killed = true;
>> + return IRQ_HANDLED;
>> + } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) {
>> + storm_check = false;
>> + }
>> + }
>> +
>> rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt);
>> if (rc < 0)
>> return IRQ_NONE;
>>

2020-11-23 12:23:41

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

Hi,

On 11/19/20 6:05 PM, Jerry Snitselaar wrote:
>
> Hans de Goede @ 2020-11-19 07:42 MST:
>
>> Hi,
>>
>> On 11/19/20 7:36 AM, Jerry Snitselaar wrote:
>>>
>>> Matthew Garrett @ 2020-10-15 15:39 MST:
>>>
>>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
>>>>>
>>>>> There is a misconfiguration in the bios of the gpio pin used for the
>>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>>>>> driver code this results in an interrupt storm. This was initially
>>>>> reported when we attempted to enable the interrupt code in the tpm_tis
>>>>> driver, which previously wasn't setting a flag to enable it. Due to
>>>>> the reports of the interrupt storm that code was reverted and we went back
>>>>> to polling instead of using interrupts. Now that we know the T490s problem
>>>>> is a firmware issue, add code to check if the system is a T490s and
>>>>> disable interrupts if that is the case. This will allow us to enable
>>>>> interrupts for everyone else. If the user has a fixed bios they can
>>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>>>>> kernel command line.
>>>>
>>>> I think an implication of this is that systems haven't been
>>>> well-tested with interrupts enabled. In general when we've found a
>>>> firmware issue in one place it ends up happening elsewhere as well, so
>>>> it wouldn't surprise me if there are other machines that will also be
>>>> unhappy with interrupts enabled. Would it be possible to automatically
>>>> detect this case (eg, if we get more than a certain number of
>>>> interrupts in a certain timeframe immediately after enabling the
>>>> interrupt) and automatically fall back to polling in that case? It
>>>> would also mean that users with fixed firmware wouldn't need to pass a
>>>> parameter.
>>>
>>> I believe Matthew is correct here. I found another system today
>>> with completely different vendor for both the system and the tpm chip.
>>> In addition another Lenovo model, the L490, has the issue.
>>>
>>> This initial attempt at a solution like Matthew suggested works on
>>> the system I found today, but I imagine it is all sorts of wrong.
>>> In the 2 systems where I've seen it, there are about 100000 interrupts
>>> in around 1.5 seconds, and then the irq code shuts down the interrupt
>>> because they aren't being handled.
>>
>> Is that with your patch? The IRQ should be silenced as soon as
>> devm_free_irq(chip->dev.parent, priv->irq, chip); is called.
>>
>
> No that is just with James' patchset that enables interrupts for
> tpm_tis. It looks like the irq is firing, but the tpm's int_status
> register is clear, so the handler immediately returns IRQ_NONE. After
> that happens 100000 times the core irq code shuts down the irq, but it
> isn't released so I could still see the stats in /proc/interrupts.

I see, yes I have seen this behavior on the X1C8 with a pre-production BIOS.

> With
> my attempt below on top of that patchset it releases the irq. I had to
> stick the check prior to it checking the int_status register because it
> is cleared and the handler returns,

Ack.

> and I couldn't do the devm_free_irq
> directly in tis_int_handler, so I tried sticking it in tpm_tis_send
> where the other odd irq testing code was already located. I'm not sure
> if there is another place that would work better for calling the
> devm_free_irq.

Adding it together with the other IRQ testing related code seems fine
to me.

>> Depending on if we can get your storm-detection to work or not,
>> we might also choose to just never try to use the IRQ (at least on
>> x86 systems). AFAIK the TPM is never used for high-throughput stuff
>> so the polling overhead should not be a big deal (and I'm getting the feeling
>> that Windows always polls).
>>
>
> I was wondering about Windows as well. In addition to the Lenovo systems
> which the T490s had Nuvoton tpm, the system I found yesterday was a development
> system we have from a partner with an Infineon tpm. Dan Williams has
> seen it internally at Intel as well on some system.

Sounds to me like Windows never uses the IRQ, so we get the fun of finding
all the untested firmware bugs.

So if we cannot get this detection code to work reliable, maybe we should
just not use the IRQ at all, at least on X86 + ACPI systems?

Anyways lets try this storm-detection thing first, but I have the feeling
we should not spend too much time on this. Just outright disabling IRQ
support might be better.

REgards,

Hans

2020-11-24 03:47:28

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
>
> Matthew Garrett @ 2020-10-15 15:39 MST:
>
> > On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
> >>
> >> There is a misconfiguration in the bios of the gpio pin used for the
> >> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> >> driver code this results in an interrupt storm. This was initially
> >> reported when we attempted to enable the interrupt code in the tpm_tis
> >> driver, which previously wasn't setting a flag to enable it. Due to
> >> the reports of the interrupt storm that code was reverted and we went back
> >> to polling instead of using interrupts. Now that we know the T490s problem
> >> is a firmware issue, add code to check if the system is a T490s and
> >> disable interrupts if that is the case. This will allow us to enable
> >> interrupts for everyone else. If the user has a fixed bios they can
> >> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> >> kernel command line.
> >
> > I think an implication of this is that systems haven't been
> > well-tested with interrupts enabled. In general when we've found a
> > firmware issue in one place it ends up happening elsewhere as well, so
> > it wouldn't surprise me if there are other machines that will also be
> > unhappy with interrupts enabled. Would it be possible to automatically
> > detect this case (eg, if we get more than a certain number of
> > interrupts in a certain timeframe immediately after enabling the
> > interrupt) and automatically fall back to polling in that case? It
> > would also mean that users with fixed firmware wouldn't need to pass a
> > parameter.
>
> I believe Matthew is correct here. I found another system today
> with completely different vendor for both the system and the tpm chip.
> In addition another Lenovo model, the L490, has the issue.
>
> This initial attempt at a solution like Matthew suggested works on
> the system I found today, but I imagine it is all sorts of wrong.
> In the 2 systems where I've seen it, there are about 100000 interrupts
> in around 1.5 seconds, and then the irq code shuts down the interrupt
> because they aren't being handled.
>
>
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 49ae09ac604f..478e9d02a3fa 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -27,6 +27,11 @@
> #include "tpm.h"
> #include "tpm_tis_core.h"
>
> +static unsigned int time_start = 0;
> +static bool storm_check = true;
> +static bool storm_killed = false;
> +static u32 irqs_fired = 0;

Maybe kstat_irqs() would be a better idea than ad hoc stats.

> +
> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
>
> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
> return rc;
> }
>
> -static void disable_interrupts(struct tpm_chip *chip)
> +static void __disable_interrupts(struct tpm_chip *chip)
> {
> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> u32 intmask;
> int rc;
>
> - if (priv->irq == 0)
> - return;
> -
> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
> if (rc < 0)
> intmask = 0;
>
> intmask &= ~TPM_GLOBAL_INT_ENABLE;
> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
> + chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> +}
> +
> +static void disable_interrupts(struct tpm_chip *chip)
> +{
> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>
> + if (priv->irq == 0)
> + return;
> +
> + __disable_interrupts(chip);
> devm_free_irq(chip->dev.parent, priv->irq, chip);
> priv->irq = 0;
> - chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> }
>
> /*
> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
> int rc, irq;
> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>
> + if (unlikely(storm_killed)) {
> + devm_free_irq(chip->dev.parent, priv->irq, chip);
> + priv->irq = 0;
> + storm_killed = false;
> + }

OK this kind of bad solution because if tpm_tis_send() is not called,
then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
lock, i.e. you could render out both storm_check and storm_killed.

> +
> if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested)
> return tpm_tis_send_main(chip, buf, len);
>
> @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id)
> u32 interrupt;
> int i, rc;
>
> + if (storm_check) {
> + irqs_fired++;
> +
> + if (!time_start) {
> + time_start = jiffies_to_msecs(jiffies);
> + } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) {
> + __disable_interrupts(chip);
> + storm_check = false;
> + storm_killed = true;
> + return IRQ_HANDLED;
> + } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) {
> + storm_check = false;
> + }
> + }
> +
> rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt);
> if (rc < 0)
> return IRQ_NONE;
>
>

/Jarkko

2020-11-24 03:48:55

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

On Tue, Nov 24, 2020 at 05:27:30AM +0200, Jarkko Sakkinen wrote:
> On Thu, Nov 19, 2020 at 03:42:35PM +0100, Hans de Goede wrote:
> > Hi,
> >
> > On 11/19/20 7:36 AM, Jerry Snitselaar wrote:
> > >
> > > Matthew Garrett @ 2020-10-15 15:39 MST:
> > >
> > >> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
> > >>>
> > >>> There is a misconfiguration in the bios of the gpio pin used for the
> > >>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> > >>> driver code this results in an interrupt storm. This was initially
> > >>> reported when we attempted to enable the interrupt code in the tpm_tis
> > >>> driver, which previously wasn't setting a flag to enable it. Due to
> > >>> the reports of the interrupt storm that code was reverted and we went back
> > >>> to polling instead of using interrupts. Now that we know the T490s problem
> > >>> is a firmware issue, add code to check if the system is a T490s and
> > >>> disable interrupts if that is the case. This will allow us to enable
> > >>> interrupts for everyone else. If the user has a fixed bios they can
> > >>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> > >>> kernel command line.
> > >>
> > >> I think an implication of this is that systems haven't been
> > >> well-tested with interrupts enabled. In general when we've found a
> > >> firmware issue in one place it ends up happening elsewhere as well, so
> > >> it wouldn't surprise me if there are other machines that will also be
> > >> unhappy with interrupts enabled. Would it be possible to automatically
> > >> detect this case (eg, if we get more than a certain number of
> > >> interrupts in a certain timeframe immediately after enabling the
> > >> interrupt) and automatically fall back to polling in that case? It
> > >> would also mean that users with fixed firmware wouldn't need to pass a
> > >> parameter.
> > >
> > > I believe Matthew is correct here. I found another system today
> > > with completely different vendor for both the system and the tpm chip.
> > > In addition another Lenovo model, the L490, has the issue.
> > >
> > > This initial attempt at a solution like Matthew suggested works on
> > > the system I found today, but I imagine it is all sorts of wrong.
> > > In the 2 systems where I've seen it, there are about 100000 interrupts
> > > in around 1.5 seconds, and then the irq code shuts down the interrupt
> > > because they aren't being handled.
> >
> > Is that with your patch? The IRQ should be silenced as soon as
> > devm_free_irq(chip->dev.parent, priv->irq, chip); is called.
> >
> > Depending on if we can get your storm-detection to work or not,
> > we might also choose to just never try to use the IRQ (at least on
> > x86 systems). AFAIK the TPM is never used for high-throughput stuff
> > so the polling overhead should not be a big deal (and I'm getting the feeling
> > that Windows always polls).
> >
> > Regards,
> >
> > Hans
>
> Yeah, this is what I've been wondering for a while. Why could not we
> just strip off IRQ code? Why does it matter?

And we DO NOT use interrupts in tpm_crb and nobody has ever complained.

/Jarkko

2020-11-24 17:56:29

by Jerry Snitselaar

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s


Jarkko Sakkinen @ 2020-11-23 20:26 MST:

> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
>>
>> Matthew Garrett @ 2020-10-15 15:39 MST:
>>
>> > On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
>> >>
>> >> There is a misconfiguration in the bios of the gpio pin used for the
>> >> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>> >> driver code this results in an interrupt storm. This was initially
>> >> reported when we attempted to enable the interrupt code in the tpm_tis
>> >> driver, which previously wasn't setting a flag to enable it. Due to
>> >> the reports of the interrupt storm that code was reverted and we went back
>> >> to polling instead of using interrupts. Now that we know the T490s problem
>> >> is a firmware issue, add code to check if the system is a T490s and
>> >> disable interrupts if that is the case. This will allow us to enable
>> >> interrupts for everyone else. If the user has a fixed bios they can
>> >> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>> >> kernel command line.
>> >
>> > I think an implication of this is that systems haven't been
>> > well-tested with interrupts enabled. In general when we've found a
>> > firmware issue in one place it ends up happening elsewhere as well, so
>> > it wouldn't surprise me if there are other machines that will also be
>> > unhappy with interrupts enabled. Would it be possible to automatically
>> > detect this case (eg, if we get more than a certain number of
>> > interrupts in a certain timeframe immediately after enabling the
>> > interrupt) and automatically fall back to polling in that case? It
>> > would also mean that users with fixed firmware wouldn't need to pass a
>> > parameter.
>>
>> I believe Matthew is correct here. I found another system today
>> with completely different vendor for both the system and the tpm chip.
>> In addition another Lenovo model, the L490, has the issue.
>>
>> This initial attempt at a solution like Matthew suggested works on
>> the system I found today, but I imagine it is all sorts of wrong.
>> In the 2 systems where I've seen it, there are about 100000 interrupts
>> in around 1.5 seconds, and then the irq code shuts down the interrupt
>> because they aren't being handled.
>>
>>
>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>> index 49ae09ac604f..478e9d02a3fa 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -27,6 +27,11 @@
>> #include "tpm.h"
>> #include "tpm_tis_core.h"
>>
>> +static unsigned int time_start = 0;
>> +static bool storm_check = true;
>> +static bool storm_killed = false;
>> +static u32 irqs_fired = 0;
>
> Maybe kstat_irqs() would be a better idea than ad hoc stats.
>

Thanks, yes that would be better.

>> +
>> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
>>
>> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
>> return rc;
>> }
>>
>> -static void disable_interrupts(struct tpm_chip *chip)
>> +static void __disable_interrupts(struct tpm_chip *chip)
>> {
>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>> u32 intmask;
>> int rc;
>>
>> - if (priv->irq == 0)
>> - return;
>> -
>> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>> if (rc < 0)
>> intmask = 0;
>>
>> intmask &= ~TPM_GLOBAL_INT_ENABLE;
>> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
>> + chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>> +}
>> +
>> +static void disable_interrupts(struct tpm_chip *chip)
>> +{
>> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>
>> + if (priv->irq == 0)
>> + return;
>> +
>> + __disable_interrupts(chip);
>> devm_free_irq(chip->dev.parent, priv->irq, chip);
>> priv->irq = 0;
>> - chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>> }
>>
>> /*
>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
>> int rc, irq;
>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>
>> + if (unlikely(storm_killed)) {
>> + devm_free_irq(chip->dev.parent, priv->irq, chip);
>> + priv->irq = 0;
>> + storm_killed = false;
>> + }
>
> OK this kind of bad solution because if tpm_tis_send() is not called,
> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
> lock, i.e. you could render out both storm_check and storm_killed.
>

Is there a way to flag it for freeing later while in an interrupt
context? I'm not sure where to clean it up since devm_free_irq can't be
called in tis_int_handler.

Before diving further into that though, does anyone else have an opinion
on ripping out the irq code, and just using polling? We've been only
polling since 2015 anyways.

>> +
>> if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested)
>> return tpm_tis_send_main(chip, buf, len);
>>
>> @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id)
>> u32 interrupt;
>> int i, rc;
>>
>> + if (storm_check) {
>> + irqs_fired++;
>> +
>> + if (!time_start) {
>> + time_start = jiffies_to_msecs(jiffies);
>> + } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) {
>> + __disable_interrupts(chip);
>> + storm_check = false;
>> + storm_killed = true;
>> + return IRQ_HANDLED;
>> + } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) {
>> + storm_check = false;
>> + }
>> + }
>> +
>> rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt);
>> if (rc < 0)
>> return IRQ_NONE;
>>
>>
>
> /Jarkko

2020-11-24 18:13:09

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

On Tue, 2020-11-24 at 10:52 -0700, Jerry Snitselaar wrote:
> Before diving further into that though, does anyone else have an
> opinion on ripping out the irq code, and just using polling? We've
> been only polling since 2015 anyways.

Well only a biased one, obviously: polling causes large amounts of busy
waiting, which is a waste of CPU resources and does increase the time
it takes us to do TPM operations ... not a concern if you're doing long
computation ones, like signatures, but it is a problem for short
operations like bulk updates of PCRs. The other potential issue, as we
saw with atmel is that if you prod the chip too often (which you have
to do with polling) you risk upsetting it. We've spent ages trying to
tune the polling parameters to balance reduction of busy wait with chip
upset and still, apparently, not quite got it right. If the TPM has a
functioning IRQ then it gets us out of the whole polling mess entirely.
The big question is how many chips that report an IRQ actually have a
malfunctioning one?

James



2020-11-24 19:46:20

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

On Thu, Nov 19, 2020 at 03:42:35PM +0100, Hans de Goede wrote:
> Hi,
>
> On 11/19/20 7:36 AM, Jerry Snitselaar wrote:
> >
> > Matthew Garrett @ 2020-10-15 15:39 MST:
> >
> >> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
> >>>
> >>> There is a misconfiguration in the bios of the gpio pin used for the
> >>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> >>> driver code this results in an interrupt storm. This was initially
> >>> reported when we attempted to enable the interrupt code in the tpm_tis
> >>> driver, which previously wasn't setting a flag to enable it. Due to
> >>> the reports of the interrupt storm that code was reverted and we went back
> >>> to polling instead of using interrupts. Now that we know the T490s problem
> >>> is a firmware issue, add code to check if the system is a T490s and
> >>> disable interrupts if that is the case. This will allow us to enable
> >>> interrupts for everyone else. If the user has a fixed bios they can
> >>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> >>> kernel command line.
> >>
> >> I think an implication of this is that systems haven't been
> >> well-tested with interrupts enabled. In general when we've found a
> >> firmware issue in one place it ends up happening elsewhere as well, so
> >> it wouldn't surprise me if there are other machines that will also be
> >> unhappy with interrupts enabled. Would it be possible to automatically
> >> detect this case (eg, if we get more than a certain number of
> >> interrupts in a certain timeframe immediately after enabling the
> >> interrupt) and automatically fall back to polling in that case? It
> >> would also mean that users with fixed firmware wouldn't need to pass a
> >> parameter.
> >
> > I believe Matthew is correct here. I found another system today
> > with completely different vendor for both the system and the tpm chip.
> > In addition another Lenovo model, the L490, has the issue.
> >
> > This initial attempt at a solution like Matthew suggested works on
> > the system I found today, but I imagine it is all sorts of wrong.
> > In the 2 systems where I've seen it, there are about 100000 interrupts
> > in around 1.5 seconds, and then the irq code shuts down the interrupt
> > because they aren't being handled.
>
> Is that with your patch? The IRQ should be silenced as soon as
> devm_free_irq(chip->dev.parent, priv->irq, chip); is called.
>
> Depending on if we can get your storm-detection to work or not,
> we might also choose to just never try to use the IRQ (at least on
> x86 systems). AFAIK the TPM is never used for high-throughput stuff
> so the polling overhead should not be a big deal (and I'm getting the feeling
> that Windows always polls).
>
> Regards,
>
> Hans

Yeah, this is what I've been wondering for a while. Why could not we
just strip off IRQ code? Why does it matter?

/Jarkko

2020-11-25 00:01:44

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

Hi,

On 11/24/20 6:52 PM, Jerry Snitselaar wrote:
>
> Jarkko Sakkinen @ 2020-11-23 20:26 MST:
>
>> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
>>>
>>> Matthew Garrett @ 2020-10-15 15:39 MST:
>>>
>>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
>>>>>
>>>>> There is a misconfiguration in the bios of the gpio pin used for the
>>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>>>>> driver code this results in an interrupt storm. This was initially
>>>>> reported when we attempted to enable the interrupt code in the tpm_tis
>>>>> driver, which previously wasn't setting a flag to enable it. Due to
>>>>> the reports of the interrupt storm that code was reverted and we went back
>>>>> to polling instead of using interrupts. Now that we know the T490s problem
>>>>> is a firmware issue, add code to check if the system is a T490s and
>>>>> disable interrupts if that is the case. This will allow us to enable
>>>>> interrupts for everyone else. If the user has a fixed bios they can
>>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>>>>> kernel command line.
>>>>
>>>> I think an implication of this is that systems haven't been
>>>> well-tested with interrupts enabled. In general when we've found a
>>>> firmware issue in one place it ends up happening elsewhere as well, so
>>>> it wouldn't surprise me if there are other machines that will also be
>>>> unhappy with interrupts enabled. Would it be possible to automatically
>>>> detect this case (eg, if we get more than a certain number of
>>>> interrupts in a certain timeframe immediately after enabling the
>>>> interrupt) and automatically fall back to polling in that case? It
>>>> would also mean that users with fixed firmware wouldn't need to pass a
>>>> parameter.
>>>
>>> I believe Matthew is correct here. I found another system today
>>> with completely different vendor for both the system and the tpm chip.
>>> In addition another Lenovo model, the L490, has the issue.
>>>
>>> This initial attempt at a solution like Matthew suggested works on
>>> the system I found today, but I imagine it is all sorts of wrong.
>>> In the 2 systems where I've seen it, there are about 100000 interrupts
>>> in around 1.5 seconds, and then the irq code shuts down the interrupt
>>> because they aren't being handled.
>>>
>>>
>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>> index 49ae09ac604f..478e9d02a3fa 100644
>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>> @@ -27,6 +27,11 @@
>>> #include "tpm.h"
>>> #include "tpm_tis_core.h"
>>>
>>> +static unsigned int time_start = 0;
>>> +static bool storm_check = true;
>>> +static bool storm_killed = false;
>>> +static u32 irqs_fired = 0;
>>
>> Maybe kstat_irqs() would be a better idea than ad hoc stats.
>>
>
> Thanks, yes that would be better.
>
>>> +
>>> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
>>>
>>> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
>>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
>>> return rc;
>>> }
>>>
>>> -static void disable_interrupts(struct tpm_chip *chip)
>>> +static void __disable_interrupts(struct tpm_chip *chip)
>>> {
>>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>> u32 intmask;
>>> int rc;
>>>
>>> - if (priv->irq == 0)
>>> - return;
>>> -
>>> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>>> if (rc < 0)
>>> intmask = 0;
>>>
>>> intmask &= ~TPM_GLOBAL_INT_ENABLE;
>>> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
>>> + chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>>> +}
>>> +
>>> +static void disable_interrupts(struct tpm_chip *chip)
>>> +{
>>> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>>
>>> + if (priv->irq == 0)
>>> + return;
>>> +
>>> + __disable_interrupts(chip);
>>> devm_free_irq(chip->dev.parent, priv->irq, chip);
>>> priv->irq = 0;
>>> - chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>>> }
>>>
>>> /*
>>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
>>> int rc, irq;
>>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>>
>>> + if (unlikely(storm_killed)) {
>>> + devm_free_irq(chip->dev.parent, priv->irq, chip);
>>> + priv->irq = 0;
>>> + storm_killed = false;
>>> + }
>>
>> OK this kind of bad solution because if tpm_tis_send() is not called,
>> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
>> lock, i.e. you could render out both storm_check and storm_killed.
>>
>
> Is there a way to flag it for freeing later while in an interrupt
> context? I'm not sure where to clean it up since devm_free_irq can't be
> called in tis_int_handler.

You could add a workqueue work-struct just for this and queue that up
to do the free when you detect the storm. That will then run pretty much
immediately, avoiding the storm going on for (much) longer.

> Before diving further into that though, does anyone else have an opinion
> on ripping out the irq code, and just using polling? We've been only
> polling since 2015 anyways.

Given James Bottomley's reply I guess it would be worthwhile to get the
storm detection to work.

Regards,

Hans


>
>>> +
>>> if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested)
>>> return tpm_tis_send_main(chip, buf, len);
>>>
>>> @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id)
>>> u32 interrupt;
>>> int i, rc;
>>>
>>> + if (storm_check) {
>>> + irqs_fired++;
>>> +
>>> + if (!time_start) {
>>> + time_start = jiffies_to_msecs(jiffies);
>>> + } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) {
>>> + __disable_interrupts(chip);
>>> + storm_check = false;
>>> + storm_killed = true;
>>> + return IRQ_HANDLED;
>>> + } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) {
>>> + storm_check = false;
>>> + }
>>> + }
>>> +
>>> rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt);
>>> if (rc < 0)
>>> return IRQ_NONE;
>>>
>>>
>>
>> /Jarkko
>

2020-11-29 03:26:47

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

On Tue, Nov 24, 2020 at 10:52:56AM -0700, Jerry Snitselaar wrote:
>
> Jarkko Sakkinen @ 2020-11-23 20:26 MST:
>
> > On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
> >>
> >> Matthew Garrett @ 2020-10-15 15:39 MST:
> >>
> >> > On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
> >> >>
> >> >> There is a misconfiguration in the bios of the gpio pin used for the
> >> >> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> >> >> driver code this results in an interrupt storm. This was initially
> >> >> reported when we attempted to enable the interrupt code in the tpm_tis
> >> >> driver, which previously wasn't setting a flag to enable it. Due to
> >> >> the reports of the interrupt storm that code was reverted and we went back
> >> >> to polling instead of using interrupts. Now that we know the T490s problem
> >> >> is a firmware issue, add code to check if the system is a T490s and
> >> >> disable interrupts if that is the case. This will allow us to enable
> >> >> interrupts for everyone else. If the user has a fixed bios they can
> >> >> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> >> >> kernel command line.
> >> >
> >> > I think an implication of this is that systems haven't been
> >> > well-tested with interrupts enabled. In general when we've found a
> >> > firmware issue in one place it ends up happening elsewhere as well, so
> >> > it wouldn't surprise me if there are other machines that will also be
> >> > unhappy with interrupts enabled. Would it be possible to automatically
> >> > detect this case (eg, if we get more than a certain number of
> >> > interrupts in a certain timeframe immediately after enabling the
> >> > interrupt) and automatically fall back to polling in that case? It
> >> > would also mean that users with fixed firmware wouldn't need to pass a
> >> > parameter.
> >>
> >> I believe Matthew is correct here. I found another system today
> >> with completely different vendor for both the system and the tpm chip.
> >> In addition another Lenovo model, the L490, has the issue.
> >>
> >> This initial attempt at a solution like Matthew suggested works on
> >> the system I found today, but I imagine it is all sorts of wrong.
> >> In the 2 systems where I've seen it, there are about 100000 interrupts
> >> in around 1.5 seconds, and then the irq code shuts down the interrupt
> >> because they aren't being handled.
> >>
> >>
> >> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> >> index 49ae09ac604f..478e9d02a3fa 100644
> >> --- a/drivers/char/tpm/tpm_tis_core.c
> >> +++ b/drivers/char/tpm/tpm_tis_core.c
> >> @@ -27,6 +27,11 @@
> >> #include "tpm.h"
> >> #include "tpm_tis_core.h"
> >>
> >> +static unsigned int time_start = 0;
> >> +static bool storm_check = true;
> >> +static bool storm_killed = false;
> >> +static u32 irqs_fired = 0;
> >
> > Maybe kstat_irqs() would be a better idea than ad hoc stats.
> >
>
> Thanks, yes that would be better.
>
> >> +
> >> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
> >>
> >> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
> >> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
> >> return rc;
> >> }
> >>
> >> -static void disable_interrupts(struct tpm_chip *chip)
> >> +static void __disable_interrupts(struct tpm_chip *chip)
> >> {
> >> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >> u32 intmask;
> >> int rc;
> >>
> >> - if (priv->irq == 0)
> >> - return;
> >> -
> >> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
> >> if (rc < 0)
> >> intmask = 0;
> >>
> >> intmask &= ~TPM_GLOBAL_INT_ENABLE;
> >> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
> >> + chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >> +}
> >> +
> >> +static void disable_interrupts(struct tpm_chip *chip)
> >> +{
> >> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>
> >> + if (priv->irq == 0)
> >> + return;
> >> +
> >> + __disable_interrupts(chip);
> >> devm_free_irq(chip->dev.parent, priv->irq, chip);
> >> priv->irq = 0;
> >> - chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >> }
> >>
> >> /*
> >> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
> >> int rc, irq;
> >> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>
> >> + if (unlikely(storm_killed)) {
> >> + devm_free_irq(chip->dev.parent, priv->irq, chip);
> >> + priv->irq = 0;
> >> + storm_killed = false;
> >> + }
> >
> > OK this kind of bad solution because if tpm_tis_send() is not called,
> > then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
> > lock, i.e. you could render out both storm_check and storm_killed.
> >
>
> Is there a way to flag it for freeing later while in an interrupt
> context? I'm not sure where to clean it up since devm_free_irq can't be
> called in tis_int_handler.
>
> Before diving further into that though, does anyone else have an opinion
> on ripping out the irq code, and just using polling? We've been only
> polling since 2015 anyways.

Given these all these issues, it's quite obvious that Windows side is
just polling. I'll ack a patch that removes the IRQ code for sure.

/Jarkko

2020-11-29 03:29:15

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

On Tue, Nov 24, 2020 at 10:10:21AM -0800, James Bottomley wrote:
> On Tue, 2020-11-24 at 10:52 -0700, Jerry Snitselaar wrote:
> > Before diving further into that though, does anyone else have an
> > opinion on ripping out the irq code, and just using polling? We've
> > been only polling since 2015 anyways.
>
> Well only a biased one, obviously: polling causes large amounts of busy
> waiting, which is a waste of CPU resources and does increase the time
> it takes us to do TPM operations ... not a concern if you're doing long
> computation ones, like signatures, but it is a problem for short
> operations like bulk updates of PCRs. The other potential issue, as we
> saw with atmel is that if you prod the chip too often (which you have
> to do with polling) you risk upsetting it. We've spent ages trying to
> tune the polling parameters to balance reduction of busy wait with chip
> upset and still, apparently, not quite got it right. If the TPM has a
> functioning IRQ then it gets us out of the whole polling mess entirely.
> The big question is how many chips that report an IRQ actually have a
> malfunctioning one?
>
> James

Do we have a way to know is Windows TPM code using IRQ's?

/Jarkko

2020-11-29 03:30:10

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

On Tue, Nov 24, 2020 at 10:45:01PM +0100, Hans de Goede wrote:
> Hi,
>
> On 11/24/20 6:52 PM, Jerry Snitselaar wrote:
> >
> > Jarkko Sakkinen @ 2020-11-23 20:26 MST:
> >
> >> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
> >>>
> >>> Matthew Garrett @ 2020-10-15 15:39 MST:
> >>>
> >>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
> >>>>>
> >>>>> There is a misconfiguration in the bios of the gpio pin used for the
> >>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> >>>>> driver code this results in an interrupt storm. This was initially
> >>>>> reported when we attempted to enable the interrupt code in the tpm_tis
> >>>>> driver, which previously wasn't setting a flag to enable it. Due to
> >>>>> the reports of the interrupt storm that code was reverted and we went back
> >>>>> to polling instead of using interrupts. Now that we know the T490s problem
> >>>>> is a firmware issue, add code to check if the system is a T490s and
> >>>>> disable interrupts if that is the case. This will allow us to enable
> >>>>> interrupts for everyone else. If the user has a fixed bios they can
> >>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> >>>>> kernel command line.
> >>>>
> >>>> I think an implication of this is that systems haven't been
> >>>> well-tested with interrupts enabled. In general when we've found a
> >>>> firmware issue in one place it ends up happening elsewhere as well, so
> >>>> it wouldn't surprise me if there are other machines that will also be
> >>>> unhappy with interrupts enabled. Would it be possible to automatically
> >>>> detect this case (eg, if we get more than a certain number of
> >>>> interrupts in a certain timeframe immediately after enabling the
> >>>> interrupt) and automatically fall back to polling in that case? It
> >>>> would also mean that users with fixed firmware wouldn't need to pass a
> >>>> parameter.
> >>>
> >>> I believe Matthew is correct here. I found another system today
> >>> with completely different vendor for both the system and the tpm chip.
> >>> In addition another Lenovo model, the L490, has the issue.
> >>>
> >>> This initial attempt at a solution like Matthew suggested works on
> >>> the system I found today, but I imagine it is all sorts of wrong.
> >>> In the 2 systems where I've seen it, there are about 100000 interrupts
> >>> in around 1.5 seconds, and then the irq code shuts down the interrupt
> >>> because they aren't being handled.
> >>>
> >>>
> >>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> >>> index 49ae09ac604f..478e9d02a3fa 100644
> >>> --- a/drivers/char/tpm/tpm_tis_core.c
> >>> +++ b/drivers/char/tpm/tpm_tis_core.c
> >>> @@ -27,6 +27,11 @@
> >>> #include "tpm.h"
> >>> #include "tpm_tis_core.h"
> >>>
> >>> +static unsigned int time_start = 0;
> >>> +static bool storm_check = true;
> >>> +static bool storm_killed = false;
> >>> +static u32 irqs_fired = 0;
> >>
> >> Maybe kstat_irqs() would be a better idea than ad hoc stats.
> >>
> >
> > Thanks, yes that would be better.
> >
> >>> +
> >>> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
> >>>
> >>> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
> >>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
> >>> return rc;
> >>> }
> >>>
> >>> -static void disable_interrupts(struct tpm_chip *chip)
> >>> +static void __disable_interrupts(struct tpm_chip *chip)
> >>> {
> >>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>> u32 intmask;
> >>> int rc;
> >>>
> >>> - if (priv->irq == 0)
> >>> - return;
> >>> -
> >>> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
> >>> if (rc < 0)
> >>> intmask = 0;
> >>>
> >>> intmask &= ~TPM_GLOBAL_INT_ENABLE;
> >>> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
> >>> + chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >>> +}
> >>> +
> >>> +static void disable_interrupts(struct tpm_chip *chip)
> >>> +{
> >>> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>>
> >>> + if (priv->irq == 0)
> >>> + return;
> >>> +
> >>> + __disable_interrupts(chip);
> >>> devm_free_irq(chip->dev.parent, priv->irq, chip);
> >>> priv->irq = 0;
> >>> - chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >>> }
> >>>
> >>> /*
> >>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
> >>> int rc, irq;
> >>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>>
> >>> + if (unlikely(storm_killed)) {
> >>> + devm_free_irq(chip->dev.parent, priv->irq, chip);
> >>> + priv->irq = 0;
> >>> + storm_killed = false;
> >>> + }
> >>
> >> OK this kind of bad solution because if tpm_tis_send() is not called,
> >> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
> >> lock, i.e. you could render out both storm_check and storm_killed.
> >>
> >
> > Is there a way to flag it for freeing later while in an interrupt
> > context? I'm not sure where to clean it up since devm_free_irq can't be
> > called in tis_int_handler.
>
> You could add a workqueue work-struct just for this and queue that up
> to do the free when you detect the storm. That will then run pretty much
> immediately, avoiding the storm going on for (much) longer.

That's sounds feasible.

> > Before diving further into that though, does anyone else have an opinion
> > on ripping out the irq code, and just using polling? We've been only
> > polling since 2015 anyways.
>
> Given James Bottomley's reply I guess it would be worthwhile to get the
> storm detection to work.

OK, agreed. I take my words back from a response few minutes ago :-)

> Regards,
>
> Hans

/Jarkko

2020-11-29 11:38:05

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

Hi All,

On 11/29/20 4:23 AM, Jarkko Sakkinen wrote:
> On Tue, Nov 24, 2020 at 10:45:01PM +0100, Hans de Goede wrote:
>> Hi,
>>
>> On 11/24/20 6:52 PM, Jerry Snitselaar wrote:
>>>
>>> Jarkko Sakkinen @ 2020-11-23 20:26 MST:
>>>
>>>> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
>>>>>
>>>>> Matthew Garrett @ 2020-10-15 15:39 MST:
>>>>>
>>>>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
>>>>>>>
>>>>>>> There is a misconfiguration in the bios of the gpio pin used for the
>>>>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>>>>>>> driver code this results in an interrupt storm. This was initially
>>>>>>> reported when we attempted to enable the interrupt code in the tpm_tis
>>>>>>> driver, which previously wasn't setting a flag to enable it. Due to
>>>>>>> the reports of the interrupt storm that code was reverted and we went back
>>>>>>> to polling instead of using interrupts. Now that we know the T490s problem
>>>>>>> is a firmware issue, add code to check if the system is a T490s and
>>>>>>> disable interrupts if that is the case. This will allow us to enable
>>>>>>> interrupts for everyone else. If the user has a fixed bios they can
>>>>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>>>>>>> kernel command line.
>>>>>>
>>>>>> I think an implication of this is that systems haven't been
>>>>>> well-tested with interrupts enabled. In general when we've found a
>>>>>> firmware issue in one place it ends up happening elsewhere as well, so
>>>>>> it wouldn't surprise me if there are other machines that will also be
>>>>>> unhappy with interrupts enabled. Would it be possible to automatically
>>>>>> detect this case (eg, if we get more than a certain number of
>>>>>> interrupts in a certain timeframe immediately after enabling the
>>>>>> interrupt) and automatically fall back to polling in that case? It
>>>>>> would also mean that users with fixed firmware wouldn't need to pass a
>>>>>> parameter.
>>>>>
>>>>> I believe Matthew is correct here. I found another system today
>>>>> with completely different vendor for both the system and the tpm chip.
>>>>> In addition another Lenovo model, the L490, has the issue.
>>>>>
>>>>> This initial attempt at a solution like Matthew suggested works on
>>>>> the system I found today, but I imagine it is all sorts of wrong.
>>>>> In the 2 systems where I've seen it, there are about 100000 interrupts
>>>>> in around 1.5 seconds, and then the irq code shuts down the interrupt
>>>>> because they aren't being handled.
>>>>>
>>>>>
>>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>>>> index 49ae09ac604f..478e9d02a3fa 100644
>>>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>>>> @@ -27,6 +27,11 @@
>>>>> #include "tpm.h"
>>>>> #include "tpm_tis_core.h"
>>>>>
>>>>> +static unsigned int time_start = 0;
>>>>> +static bool storm_check = true;
>>>>> +static bool storm_killed = false;
>>>>> +static u32 irqs_fired = 0;
>>>>
>>>> Maybe kstat_irqs() would be a better idea than ad hoc stats.
>>>>
>>>
>>> Thanks, yes that would be better.
>>>
>>>>> +
>>>>> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
>>>>>
>>>>> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
>>>>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
>>>>> return rc;
>>>>> }
>>>>>
>>>>> -static void disable_interrupts(struct tpm_chip *chip)
>>>>> +static void __disable_interrupts(struct tpm_chip *chip)
>>>>> {
>>>>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>>>> u32 intmask;
>>>>> int rc;
>>>>>
>>>>> - if (priv->irq == 0)
>>>>> - return;
>>>>> -
>>>>> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>>>>> if (rc < 0)
>>>>> intmask = 0;
>>>>>
>>>>> intmask &= ~TPM_GLOBAL_INT_ENABLE;
>>>>> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
>>>>> + chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>>>>> +}
>>>>> +
>>>>> +static void disable_interrupts(struct tpm_chip *chip)
>>>>> +{
>>>>> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>>>>
>>>>> + if (priv->irq == 0)
>>>>> + return;
>>>>> +
>>>>> + __disable_interrupts(chip);
>>>>> devm_free_irq(chip->dev.parent, priv->irq, chip);
>>>>> priv->irq = 0;
>>>>> - chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>>>>> }
>>>>>
>>>>> /*
>>>>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
>>>>> int rc, irq;
>>>>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>>>>
>>>>> + if (unlikely(storm_killed)) {
>>>>> + devm_free_irq(chip->dev.parent, priv->irq, chip);
>>>>> + priv->irq = 0;
>>>>> + storm_killed = false;
>>>>> + }
>>>>
>>>> OK this kind of bad solution because if tpm_tis_send() is not called,
>>>> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
>>>> lock, i.e. you could render out both storm_check and storm_killed.
>>>>
>>>
>>> Is there a way to flag it for freeing later while in an interrupt
>>> context? I'm not sure where to clean it up since devm_free_irq can't be
>>> called in tis_int_handler.
>>
>> You could add a workqueue work-struct just for this and queue that up
>> to do the free when you detect the storm. That will then run pretty much
>> immediately, avoiding the storm going on for (much) longer.
>
> That's sounds feasible.
>
>>> Before diving further into that though, does anyone else have an opinion
>>> on ripping out the irq code, and just using polling? We've been only
>>> polling since 2015 anyways.
>>
>> Given James Bottomley's reply I guess it would be worthwhile to get the
>> storm detection to work.
>
> OK, agreed. I take my words back from a response few minutes ago :-)

:)

To be clear, I think we should give the storm detection a go. Especially
given the problems which James has seen with polling on some TPMs.

But if that turns out to not be feasible I agree we should just either
disable IRQs by default on standard x86 platforms, or just remove the
IRQ support all together.

Regards,

Hans

2020-12-02 16:12:15

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

On Sun, Nov 29, 2020 at 12:34:34PM +0100, Hans de Goede wrote:
> Hi All,
>
> On 11/29/20 4:23 AM, Jarkko Sakkinen wrote:
> > On Tue, Nov 24, 2020 at 10:45:01PM +0100, Hans de Goede wrote:
> >> Hi,
> >>
> >> On 11/24/20 6:52 PM, Jerry Snitselaar wrote:
> >>>
> >>> Jarkko Sakkinen @ 2020-11-23 20:26 MST:
> >>>
> >>>> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
> >>>>>
> >>>>> Matthew Garrett @ 2020-10-15 15:39 MST:
> >>>>>
> >>>>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <[email protected]> wrote:
> >>>>>>>
> >>>>>>> There is a misconfiguration in the bios of the gpio pin used for the
> >>>>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> >>>>>>> driver code this results in an interrupt storm. This was initially
> >>>>>>> reported when we attempted to enable the interrupt code in the tpm_tis
> >>>>>>> driver, which previously wasn't setting a flag to enable it. Due to
> >>>>>>> the reports of the interrupt storm that code was reverted and we went back
> >>>>>>> to polling instead of using interrupts. Now that we know the T490s problem
> >>>>>>> is a firmware issue, add code to check if the system is a T490s and
> >>>>>>> disable interrupts if that is the case. This will allow us to enable
> >>>>>>> interrupts for everyone else. If the user has a fixed bios they can
> >>>>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> >>>>>>> kernel command line.
> >>>>>>
> >>>>>> I think an implication of this is that systems haven't been
> >>>>>> well-tested with interrupts enabled. In general when we've found a
> >>>>>> firmware issue in one place it ends up happening elsewhere as well, so
> >>>>>> it wouldn't surprise me if there are other machines that will also be
> >>>>>> unhappy with interrupts enabled. Would it be possible to automatically
> >>>>>> detect this case (eg, if we get more than a certain number of
> >>>>>> interrupts in a certain timeframe immediately after enabling the
> >>>>>> interrupt) and automatically fall back to polling in that case? It
> >>>>>> would also mean that users with fixed firmware wouldn't need to pass a
> >>>>>> parameter.
> >>>>>
> >>>>> I believe Matthew is correct here. I found another system today
> >>>>> with completely different vendor for both the system and the tpm chip.
> >>>>> In addition another Lenovo model, the L490, has the issue.
> >>>>>
> >>>>> This initial attempt at a solution like Matthew suggested works on
> >>>>> the system I found today, but I imagine it is all sorts of wrong.
> >>>>> In the 2 systems where I've seen it, there are about 100000 interrupts
> >>>>> in around 1.5 seconds, and then the irq code shuts down the interrupt
> >>>>> because they aren't being handled.
> >>>>>
> >>>>>
> >>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> >>>>> index 49ae09ac604f..478e9d02a3fa 100644
> >>>>> --- a/drivers/char/tpm/tpm_tis_core.c
> >>>>> +++ b/drivers/char/tpm/tpm_tis_core.c
> >>>>> @@ -27,6 +27,11 @@
> >>>>> #include "tpm.h"
> >>>>> #include "tpm_tis_core.h"
> >>>>>
> >>>>> +static unsigned int time_start = 0;
> >>>>> +static bool storm_check = true;
> >>>>> +static bool storm_killed = false;
> >>>>> +static u32 irqs_fired = 0;
> >>>>
> >>>> Maybe kstat_irqs() would be a better idea than ad hoc stats.
> >>>>
> >>>
> >>> Thanks, yes that would be better.
> >>>
> >>>>> +
> >>>>> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
> >>>>>
> >>>>> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
> >>>>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
> >>>>> return rc;
> >>>>> }
> >>>>>
> >>>>> -static void disable_interrupts(struct tpm_chip *chip)
> >>>>> +static void __disable_interrupts(struct tpm_chip *chip)
> >>>>> {
> >>>>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>>>> u32 intmask;
> >>>>> int rc;
> >>>>>
> >>>>> - if (priv->irq == 0)
> >>>>> - return;
> >>>>> -
> >>>>> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
> >>>>> if (rc < 0)
> >>>>> intmask = 0;
> >>>>>
> >>>>> intmask &= ~TPM_GLOBAL_INT_ENABLE;
> >>>>> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
> >>>>> + chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >>>>> +}
> >>>>> +
> >>>>> +static void disable_interrupts(struct tpm_chip *chip)
> >>>>> +{
> >>>>> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>>>>
> >>>>> + if (priv->irq == 0)
> >>>>> + return;
> >>>>> +
> >>>>> + __disable_interrupts(chip);
> >>>>> devm_free_irq(chip->dev.parent, priv->irq, chip);
> >>>>> priv->irq = 0;
> >>>>> - chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >>>>> }
> >>>>>
> >>>>> /*
> >>>>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
> >>>>> int rc, irq;
> >>>>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>>>>
> >>>>> + if (unlikely(storm_killed)) {
> >>>>> + devm_free_irq(chip->dev.parent, priv->irq, chip);
> >>>>> + priv->irq = 0;
> >>>>> + storm_killed = false;
> >>>>> + }
> >>>>
> >>>> OK this kind of bad solution because if tpm_tis_send() is not called,
> >>>> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
> >>>> lock, i.e. you could render out both storm_check and storm_killed.
> >>>>
> >>>
> >>> Is there a way to flag it for freeing later while in an interrupt
> >>> context? I'm not sure where to clean it up since devm_free_irq can't be
> >>> called in tis_int_handler.
> >>
> >> You could add a workqueue work-struct just for this and queue that up
> >> to do the free when you detect the storm. That will then run pretty much
> >> immediately, avoiding the storm going on for (much) longer.
> >
> > That's sounds feasible.
> >
> >>> Before diving further into that though, does anyone else have an opinion
> >>> on ripping out the irq code, and just using polling? We've been only
> >>> polling since 2015 anyways.
> >>
> >> Given James Bottomley's reply I guess it would be worthwhile to get the
> >> storm detection to work.
> >
> > OK, agreed. I take my words back from a response few minutes ago :-)
>
> :)
>
> To be clear, I think we should give the storm detection a go. Especially
> given the problems which James has seen with polling on some TPMs.
>
> But if that turns out to not be feasible I agree we should just either
> disable IRQs by default on standard x86 platforms, or just remove the
> IRQ support all together.

Just for completeness: one option is also to whitelist IRQ's.

> Regards,
>
> Hans

/Jarkko