2017-08-04 21:56:57

by Haris Okanovic

Subject: [PATCH] [RFC] tpm_tis: tpm_tcg_flush() after iowrite*()s

I have a latency issue using an SPI-based TPM chip with the tpm_tis driver
from a non-RT usermode application, which induces ~400 us latency spikes
in cyclictest (Intel Atom E3940 system, PREEMPT_RT_FULL kernel).

The spikes are caused by a stalling ioread8() operation following a
sequence of 30+ iowrite8()s to the same address. I believe this happens
because the writes are cached (in the cpu or somewhere along the bus)
and get flushed on the first LOAD instruction (ioread*()) that follows.

The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation.

I believe this works because it amortizes the cost of flushing data to
the chip across multiple instructions. However, I don't have any direct
evidence to support this theory.

Does this seem like a reasonable theory?

Any feedback on the change (a better way to do it, perhaps)?

Thanks,
Haris Okanovic

https://github.com/harisokanovic/linux/tree/dev/hokanovi/tpm-latency-spike-fix-rfc
---
drivers/char/tpm/tpm_tis.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index c7e1384f1b08..5cdbfec0ad67 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ b/drivers/char/tpm/tpm_tis.c
@@ -89,6 +89,19 @@ static inline int is_itpm(struct acpi_device *dev)
}
#endif

+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * Flushes previous iowrite*() operations to chip so that a subsequent
+ * ioread*() won't stall a cpu.
+ */
+static void tpm_tcg_flush(struct tpm_tis_tcg_phy *phy)
+{
+ ioread8(phy->iobase + TPM_ACCESS(0));
+}
+#else
+#define tpm_tcg_flush(phy) do { } while (0)
+#endif
+
static int tpm_tcg_read_bytes(struct tpm_tis_data *data, u32 addr, u16 len,
u8 *result)
{
@@ -104,8 +117,10 @@ static int tpm_tcg_write_bytes(struct tpm_tis_data *data, u32 addr, u16 len,
{
struct tpm_tis_tcg_phy *phy = to_tpm_tis_tcg_phy(data);

- while (len--)
+ while (len--) {
iowrite8(*value++, phy->iobase + addr);
+ tpm_tcg_flush(phy);
+ }
return 0;
}

@@ -130,6 +145,7 @@ static int tpm_tcg_write32(struct tpm_tis_data *data, u32 addr, u32 value)
struct tpm_tis_tcg_phy *phy = to_tpm_tis_tcg_phy(data);

iowrite32(value, phy->iobase + addr);
+ tpm_tcg_flush(phy);
return 0;
}

--
2.13.2


2017-08-07 14:59:39

by Julia Cartwright

Subject: Re: [PATCH] [RFC] tpm_tis: tpm_tcg_flush() after iowrite*()s

On Fri, Aug 04, 2017 at 04:56:51PM -0500, Haris Okanovic wrote:
> I have a latency issue using a SPI-based TPM chip with tpm_tis driver
> from non-rt usermode application, which induces ~400 us latency spikes
> in cyclictest (Intel Atom E3940 system, PREEMPT_RT_FULL kernel).
>
> The spikes are caused by a stalling ioread8() operation, following a
> sequence of 30+ iowrite8()s to the same address. I believe this happens
> because the writes are cached (in cpu or somewhere along the bus), which
> gets flushed on the first LOAD instruction (ioread*()) that follows.

To use the ARM parlance, these accesses aren't "cached" (which would
imply that a result could be returned to the load from any intermediate
node in the interconnect), but instead are "bufferable".

It is really unfortunate that we continue to run into this class of
problem across various CPU vendors and various underlying bus
technologies; it's the continuing curse of running PREEMPT_RT on
commodity hardware. RT is not easy :)

> The enclosed change appears to fix this issue: read the TPM chip's
> access register (status code) after every iowrite*() operation.

Are we engaged in a game of whack-a-mole with all of the drivers which
use this same access pattern (of which I imagine there are quite a
few!)?

I'm wondering if we should explore the idea of adding a load in the
iowriteN()/writeX() macros (marking those accesses in which reads cause
side effects explicitly, redirecting to a _raw() variant or something).

Obviously that would be expensive for non-RT use cases, but for helping
constrain latency, it may be worth it for RT.
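
A rough sketch of that idea (sketch only; rt_safe_iowrite8() and
rt_raw_iowrite8() are illustrative names, not an existing kernel API):

#include <linux/io.h>

/*
 * Sketch: on RT, the ordinary accessor reads back the address it just
 * wrote so the posted write is drained immediately; registers where a
 * read has side effects would be converted to the _raw variant instead.
 */
static inline void rt_safe_iowrite8(u8 val, void __iomem *addr)
{
	iowrite8(val, addr);
#ifdef CONFIG_PREEMPT_RT_FULL
	(void)ioread8(addr);	/* drain the posted write before returning */
#endif
}

static inline void rt_raw_iowrite8(u8 val, void __iomem *addr)
{
	iowrite8(val, addr);	/* no read-back: reading this register has side effects */
}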

Julia



2017-08-08 21:58:57

by Jarkko Sakkinen

Subject: Re: [PATCH] [RFC] tpm_tis: tpm_tcg_flush() after iowrite*()s

On Mon, Aug 07, 2017 at 09:59:35AM -0500, Julia Cartwright wrote:
> On Fri, Aug 04, 2017 at 04:56:51PM -0500, Haris Okanovic wrote:
> > I have a latency issue using a SPI-based TPM chip with tpm_tis driver
> > from non-rt usermode application, which induces ~400 us latency spikes
> > in cyclictest (Intel Atom E3940 system, PREEMPT_RT_FULL kernel).
> >
> > The spikes are caused by a stalling ioread8() operation, following a
> > sequence of 30+ iowrite8()s to the same address. I believe this happens
> > because the writes are cached (in cpu or somewhere along the bus), which
> > gets flushed on the first LOAD instruction (ioread*()) that follows.
>
> To use the ARM parlance, these accesses aren't "cached" (which would
> imply that a result could be returned to the load from any intermediate
> node in the interconnect), but instead are "bufferable".
>
> It is really unfortunate that we continue to run into this class of
> problem across various CPU vendors and various underlying bus
> technologies; it's the continuing curse of running an PREEMPT_RT on
> commodity hardware. RT is not easy :)
>
> > The enclosed change appears to fix this issue: read the TPM chip's
> > access register (status code) after every iowrite*() operation.
>
> Are we engaged in a game of wack-a-mole with all of the drivers which
> use this same access pattern (of which I imagine there are quite a
> few!)?
>
> I'm wondering if we should explore the idea of adding a load in the
> iowriteN()/writeX() macros (marking those accesses in which reads cause
> side effects explicitly, redirecting to a _raw() variant or something).
>
> Obviously that would be expensive for non-RT use cases, but for helping
> constrain latency, it may be worth it for RT.
>
> Julia

What if, as a quick resort, we add tpm_tis_iowrite8() to the TPM driver?
It would be easy to move back to iowrite8() if the problem is sorted out
there later on.

/Jarkko

2017-08-14 22:52:49

by Haris Okanovic

Subject: Re: [PATCH] [RFC] tpm_tis: tpm_tcg_flush() after iowrite*()s



On 08/07/2017 09:59 AM, Julia Cartwright wrote:
> On Fri, Aug 04, 2017 at 04:56:51PM -0500, Haris Okanovic wrote:
>> I have a latency issue using a SPI-based TPM chip with tpm_tis driver
>> from non-rt usermode application, which induces ~400 us latency spikes
>> in cyclictest (Intel Atom E3940 system, PREEMPT_RT_FULL kernel).
>>
>> The spikes are caused by a stalling ioread8() operation, following a
>> sequence of 30+ iowrite8()s to the same address. I believe this happens
>> because the writes are cached (in cpu or somewhere along the bus), which
>> gets flushed on the first LOAD instruction (ioread*()) that follows.
>
> To use the ARM parlance, these accesses aren't "cached" (which would
> imply that a result could be returned to the load from any intermediate
> node in the interconnect), but instead are "bufferable".
>

Fixed wording

> It is really unfortunate that we continue to run into this class of
> problem across various CPU vendors and various underlying bus
> technologies; it's the continuing curse of running an PREEMPT_RT on
> commodity hardware. RT is not easy :)
>
>> The enclosed change appears to fix this issue: read the TPM chip's
>> access register (status code) after every iowrite*() operation.
>
> Are we engaged in a game of wack-a-mole with all of the drivers which
> use this same access pattern (of which I imagine there are quite a
> few!)?
>
> I'm wondering if we should explore the idea of adding a load in the
> iowriteN()/writeX() macros (marking those accesses in which reads cause
> side effects explicitly, redirecting to a _raw() variant or something).
>
> Obviously that would be expensive for non-RT use cases, but for helping
> constrain latency, it may be worth it for RT.
>
> Julia
>

2017-08-14 22:54:16

by Haris Okanovic

Subject: [PATCH] tpm_tis: fix stall after iowrite*()s

ioread8() operations to TPM MMIO addresses can stall the cpu when
immediately following a sequence of iowrite*()s to the same region.

For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT_FULL kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in the cpu or somewhere along the bus) and gets flushed on the
first LOAD instruction (ioread*()) that follows.

The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to the chip across multiple instructions.

Signed-off-by: Haris Okanovic <[email protected]>
---
https://patchwork.kernel.org/patch/9882119/
https://github.com/harisokanovic/linux/tree/dev/hokanovi/tpm-latency-spike-fix
---
drivers/char/tpm/tpm_tis.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index c7e1384f1b08..3be2755d0514 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ b/drivers/char/tpm/tpm_tis.c
@@ -52,6 +52,22 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da
return container_of(data, struct tpm_tis_tcg_phy, priv);
}

+static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
+{
+ iowrite8(b, iobase + addr);
+#ifdef CONFIG_PREEMPT_RT_FULL
+ ioread8(iobase + TPM_ACCESS(0));
+#endif
+}
+
+static inline void tpm_tis_iowrite32(u32 b, void __iomem *iobase, u32 addr)
+{
+ iowrite32(b, iobase + addr);
+#ifdef CONFIG_PREEMPT_RT_FULL
+ ioread8(iobase + TPM_ACCESS(0));
+#endif
+}
+
static bool interrupts = true;
module_param(interrupts, bool, 0444);
MODULE_PARM_DESC(interrupts, "Enable interrupts");
@@ -105,7 +121,7 @@ static int tpm_tcg_write_bytes(struct tpm_tis_data *data, u32 addr, u16 len,
struct tpm_tis_tcg_phy *phy = to_tpm_tis_tcg_phy(data);

while (len--)
- iowrite8(*value++, phy->iobase + addr);
+ tpm_tis_iowrite8(*value++, phy->iobase, addr);
return 0;
}

@@ -129,7 +145,7 @@ static int tpm_tcg_write32(struct tpm_tis_data *data, u32 addr, u32 value)
{
struct tpm_tis_tcg_phy *phy = to_tpm_tis_tcg_phy(data);

- iowrite32(value, phy->iobase + addr);
+ tpm_tis_iowrite32(value, phy->iobase, addr);
return 0;
}

--
2.13.2

2017-08-14 23:08:07

by Haris Okanovic

Subject: Re: [PATCH] [RFC] tpm_tis: tpm_tcg_flush() after iowrite*()s



On 08/08/2017 04:58 PM, Jarkko Sakkinen wrote:
> On Mon, Aug 07, 2017 at 09:59:35AM -0500, Julia Cartwright wrote:
>> On Fri, Aug 04, 2017 at 04:56:51PM -0500, Haris Okanovic wrote:
>>> I have a latency issue using a SPI-based TPM chip with tpm_tis driver
>>> from non-rt usermode application, which induces ~400 us latency spikes
>>> in cyclictest (Intel Atom E3940 system, PREEMPT_RT_FULL kernel).
>>>
>>> The spikes are caused by a stalling ioread8() operation, following a
>>> sequence of 30+ iowrite8()s to the same address. I believe this happens
>>> because the writes are cached (in cpu or somewhere along the bus), which
>>> gets flushed on the first LOAD instruction (ioread*()) that follows.
>>
>> To use the ARM parlance, these accesses aren't "cached" (which would
>> imply that a result could be returned to the load from any intermediate
>> node in the interconnect), but instead are "bufferable".
>>
>> It is really unfortunate that we continue to run into this class of
>> problem across various CPU vendors and various underlying bus
>> technologies; it's the continuing curse of running an PREEMPT_RT on
>> commodity hardware. RT is not easy :)
>>
>>> The enclosed change appears to fix this issue: read the TPM chip's
>>> access register (status code) after every iowrite*() operation.
>>
>> Are we engaged in a game of wack-a-mole with all of the drivers which
>> use this same access pattern (of which I imagine there are quite a
>> few!)?
>>
>> I'm wondering if we should explore the idea of adding a load in the
>> iowriteN()/writeX() macros (marking those accesses in which reads cause
>> side effects explicitly, redirecting to a _raw() variant or something).
>>
>> Obviously that would be expensive for non-RT use cases, but for helping
>> constrain latency, it may be worth it for RT.
>>
>> Julia
>
> What if we as quick resort we add tpm_tis_iowrite8() to the TPM driver.
> Would be easy to move to iowrite8() if the problem is sorted out there
> later on.
>

Sounds OK to me. I will submit a PATCH shortly.

> /Jarkko
>

2017-08-15 06:11:14

by Alexander Stein

Subject: Re: [PATCH] tpm_tis: fix stall after iowrite*()s

On Monday 14 August 2017 17:53:47, Haris Okanovic wrote:
> --- a/drivers/char/tpm/tpm_tis.c
> +++ b/drivers/char/tpm/tpm_tis.c
> @@ -52,6 +52,22 @@ static inline struct tpm_tis_tcg_phy
> *to_tpm_tis_tcg_phy(struct tpm_tis_data *da return container_of(data,
> struct tpm_tis_tcg_phy, priv);
> }
>
> +static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
> +{
> + iowrite8(b, iobase + addr);
> +#ifdef CONFIG_PREEMPT_RT_FULL
> + ioread8(iobase + TPM_ACCESS(0));
> +#endif
> +}

Maybe add some comment explaining why an ioread8 is actually required after each write on
RT. Currently it is rather obvious why this additional read is necessary. But
is this still the case in a year?

> +static inline void tpm_tis_iowrite32(u32 b, void __iomem *iobase, u32 addr)
> +{
> + iowrite32(b, iobase + addr);
> +#ifdef CONFIG_PREEMPT_RT_FULL
> + ioread8(iobase + TPM_ACCESS(0));
> +#endif
> +}

Same applies here. Or add a comment above both functions describing their
purpose.

Just my 2 cents

Best regards,
Alexander

2017-08-15 20:11:38

by Haris Okanovic

Subject: Re: [PATCH] tpm_tis: fix stall after iowrite*()s



On 08/15/2017 01:11 AM, Alexander Stein wrote:
> On Monday 14 August 2017 17:53:47, Haris Okanovic wrote:
>> --- a/drivers/char/tpm/tpm_tis.c
>> +++ b/drivers/char/tpm/tpm_tis.c
>> @@ -52,6 +52,22 @@ static inline struct tpm_tis_tcg_phy
>> *to_tpm_tis_tcg_phy(struct tpm_tis_data *da return container_of(data,
>> struct tpm_tis_tcg_phy, priv);
>> }
>>
>> +static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
>> +{
>> + iowrite8(b, iobase + addr);
>> +#ifdef CONFIG_PREEMPT_RT_FULL
>> + ioread8(iobase + TPM_ACCESS(0));
>> +#endif
>> +}
>
> Maybe add some comment why an iorad8 is actually requried after each write on
> RT. Currently it is rather obvious why this additional read is necessary. But
> is this still the case in a year?
>

Sure. I re-added the tpm_tis_flush() function with comments to explain
what's going on. It is called from tpm_tis_iowrite8() and tpm_tis_iowrite32().

Will post a PATCH v2 shortly.

>> +static inline void tpm_tis_iowrite32(u32 b, void __iomem *iobase, u32 addr)
>> +{
>> + iowrite32(b, iobase + addr);
>> +#ifdef CONFIG_PREEMPT_RT_FULL
>> + ioread8(iobase + TPM_ACCESS(0));
>> +#endif
>> +}
>
> Same applies here. Or add a comment above both functions describing their
> purpose.
>
> Just my 2 cents
>
> Best regards,
> Alexander
>

2017-08-15 20:13:36

by Haris Okanovic

Subject: [PATCH v2] tpm_tis: fix stall after iowrite*()s

ioread8() operations to TPM MMIO addresses can stall the cpu when
immediately following a sequence of iowrite*()s to the same region.

For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT_FULL kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in the cpu or somewhere along the bus) and gets flushed on the
first LOAD instruction (ioread*()) that follows.

The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to the chip across multiple instructions.

Signed-off-by: Haris Okanovic <[email protected]>
---
[PATCH v2] Add tpm_tis_flush() function with comment explaining
the ioread8() after write, per Alexander.

https://patchwork.kernel.org/patch/9882119/
https://github.com/harisokanovic/linux/tree/dev/hokanovi/tpm-latency-spike-fix-v2
---
drivers/char/tpm/tpm_tis.c | 29 +++++++++++++++++++++++++++--
1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index c7e1384f1b08..3fdec13eba8d 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ b/drivers/char/tpm/tpm_tis.c
@@ -52,6 +52,31 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da
return container_of(data, struct tpm_tis_tcg_phy, priv);
}

+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * Flushes previous write operations to chip so that subsequent
+ * ioread*()s won't stall a cpu.
+ */
+static inline void tpm_tis_flush(void __iomem *iobase)
+{
+ ioread8(iobase + TPM_ACCESS(0));
+}
+#else
+#define tpm_tis_flush(iobase) do { } while (0)
+#endif
+
+static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
+{
+ iowrite8(b, iobase + addr);
+ tpm_tis_flush(iobase);
+}
+
+static inline void tpm_tis_iowrite32(u32 b, void __iomem *iobase, u32 addr)
+{
+ iowrite32(b, iobase + addr);
+ tpm_tis_flush(iobase);
+}
+
static bool interrupts = true;
module_param(interrupts, bool, 0444);
MODULE_PARM_DESC(interrupts, "Enable interrupts");
@@ -105,7 +130,7 @@ static int tpm_tcg_write_bytes(struct tpm_tis_data *data, u32 addr, u16 len,
struct tpm_tis_tcg_phy *phy = to_tpm_tis_tcg_phy(data);

while (len--)
- iowrite8(*value++, phy->iobase + addr);
+ tpm_tis_iowrite8(*value++, phy->iobase, addr);
return 0;
}

@@ -129,7 +154,7 @@ static int tpm_tcg_write32(struct tpm_tis_data *data, u32 addr, u32 value)
{
struct tpm_tis_tcg_phy *phy = to_tpm_tis_tcg_phy(data);

- iowrite32(value, phy->iobase + addr);
+ tpm_tis_iowrite32(value, phy->iobase, addr);
return 0;
}

--
2.13.2

2017-08-16 21:15:17

by Ken Goldman

Subject: Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s

On 8/15/2017 4:13 PM, Haris Okanovic wrote:
> ioread8() operations to TPM MMIO addresses can stall the cpu when
> immediately following a sequence of iowrite*()'s to the same region.
>
> For example, cyclitest measures ~400us latency spikes when a non-RT
> usermode application communicates with an SPI-based TPM chip (Intel Atom
> E3940 system, PREEMPT_RT_FULL kernel). The spikes are caused by a
> stalling ioread8() operation following a sequence of 30+ iowrite8()s to
> the same address. I believe this happens because the write sequence is
> buffered (in cpu or somewhere along the bus), and gets flushed on the
> first LOAD instruction (ioread*()) that follows.
>
> The enclosed change appears to fix this issue: read the TPM chip's
> access register (status code) after every iowrite*() operation to
> amortize the cost of flushing data to chip across multiple instructions.

I worry a bit about "appears to fix". It seems odd that the TPM device
driver would be the first code to uncover this. Can anyone confirm that
the chipset does indeed have this bug?

I'd also like an indication of the performance penalty. We're doing a
lot of work to improve the performance and I worry that "do a read after
every write" will have a performance impact.

2017-08-17 05:57:17

by Alexander Stein

Subject: Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s

On Wednesday 16 August 2017 17:15:55, Ken Goldman wrote:
> On 8/15/2017 4:13 PM, Haris Okanovic wrote:
> > ioread8() operations to TPM MMIO addresses can stall the cpu when
> > immediately following a sequence of iowrite*()'s to the same region.
> >
> > For example, cyclitest measures ~400us latency spikes when a non-RT
> > usermode application communicates with an SPI-based TPM chip (Intel Atom
> > E3940 system, PREEMPT_RT_FULL kernel). The spikes are caused by a
> > stalling ioread8() operation following a sequence of 30+ iowrite8()s to
> > the same address. I believe this happens because the write sequence is
> > buffered (in cpu or somewhere along the bus), and gets flushed on the
> > first LOAD instruction (ioread*()) that follows.
> >
> > The enclosed change appears to fix this issue: read the TPM chip's
> > access register (status code) after every iowrite*() operation to
> > amortize the cost of flushing data to chip across multiple instructions.
>
> I worry a bit about "appears to fix". It seems odd that the TPM device
> driver would be the first code to uncover this. Can anyone confirm that
> the chipset does indeed have this bug?

No, there was already a similar problem in e1000e where a PCIe read stalled
the CPU, so no interrupts could be serviced. See
https://www.spinics.net/lists/linux-rt-users/msg14077.html
AFAIK there was no outcome though.

> I'd also like an indication of the performance penalty. We're doing a
> lot of work to improve the performance and I worry that "do a read after
> every write" will have a performance impact.

Realtime will always affect performance, but IMHO the former is much more
important.

Best regards,
Alexander


by Sebastian Andrzej Siewior

Subject: Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s

On 2017-08-16 17:15:55 [-0400], Ken Goldman wrote:
> On 8/15/2017 4:13 PM, Haris Okanovic wrote:
> > ioread8() operations to TPM MMIO addresses can stall the cpu when
> > immediately following a sequence of iowrite*()'s to the same region.
> >
> > For example, cyclitest measures ~400us latency spikes when a non-RT
> > usermode application communicates with an SPI-based TPM chip (Intel Atom
> > E3940 system, PREEMPT_RT_FULL kernel). The spikes are caused by a
> > stalling ioread8() operation following a sequence of 30+ iowrite8()s to
> > the same address. I believe this happens because the write sequence is
> > buffered (in cpu or somewhere along the bus), and gets flushed on the
> > first LOAD instruction (ioread*()) that follows.
> >
> > The enclosed change appears to fix this issue: read the TPM chip's
> > access register (status code) after every iowrite*() operation to
> > amortize the cost of flushing data to chip across multiple instructions.

Haris, could you try a wmb() instead of the read?
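
Concretely, the experiment would be replacing the read in the v2 patch's
tpm_tis_flush() with a barrier, roughly (sketch only):

static inline void tpm_tis_flush(void __iomem *iobase)
{
	wmb();	/* write barrier instead of ioread8(iobase + TPM_ACCESS(0)) */
}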

> I worry a bit about "appears to fix". It seems odd that the TPM device
> driver would be the first code to uncover this. Can anyone confirm that the
> chipset does indeed have this bug?

What Haris says makes sense. It is just that not all architectures
accumulate/batch writes to HW.

> I'd also like an indication of the performance penalty. We're doing a lot
> of work to improve the performance and I worry that "do a read after every
> write" will have a performance impact.
So powerpc (for instance) has a sync operation after each write to HW. I
am wondering if we might need something like that on x86.

Sebastian

2017-08-17 17:17:46

by Jason Gunthorpe

Subject: Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s

On Thu, Aug 17, 2017 at 12:38:07PM +0200, Sebastian Andrzej Siewior wrote:

> > I worry a bit about "appears to fix". It seems odd that the TPM device
> > driver would be the first code to uncover this. Can anyone confirm that the
> > chipset does indeed have this bug?
>
> What Haris says makes sense. It is just not all architectures
> accumulate/ batch writes to HW.

It doesn't seem that odd to me. In modern Intel chipsets the physical
LPC bus is used for very little. Maybe some flash and possibly a
winbond super IO at worst? Plus the TPM.

I can't confirm what Intel has done, but if writes are posted, then it
is not a 'bug', but expected operation for a PCI/LPC bridge device to
have an ordered queue of posted writes, and thus higher latency when
processing reads due to ordering requirements.

Other drivers may not see it because most LPC usages would not be
write heavy, or might use IO instructions which are not posted..

I can confirm that my ARM systems with a custom PCI-LPC bridge will
have exactly the same problem, and that the readl is the only
solution.

This is because writes to LPC are posted over PCI and will be buffered
in the root complex, device end port and internally in the LPC
bridge. Since they are posted, there is no way for the CPU to know when
they complete and when it would be 'low latency' to issue a read.

> So powerpc (for instance) has a sync operation after each write to HW. I
> am wondering if we could need something like that on x86.

Even on something like PPC, 'sync' is not defined to globally flush
posted writes, and will not help. wmb() is probably similar.

Jason

2017-08-17 20:12:47

by Haris Okanovic

Subject: Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s

Neither wmb() nor mb() has any effect when substituted for
ioread8(iobase + TPM_ACCESS(0)) in tpm_tis_flush(). I still see 300-400 us
spikes in cyclictest when invoking my TPM chip's RNG.

-- Haris


On 08/17/2017 12:17 PM, Jason Gunthorpe wrote:
> On Thu, Aug 17, 2017 at 12:38:07PM +0200, Sebastian Andrzej Siewior wrote:
>
>>> I worry a bit about "appears to fix". It seems odd that the TPM device
>>> driver would be the first code to uncover this. Can anyone confirm that the
>>> chipset does indeed have this bug?
>>
>> What Haris says makes sense. It is just not all architectures
>> accumulate/ batch writes to HW.
>
> It doesn't seem that odd to me.. In modern Intel chipsets the physical
> LPC bus is used for very little. Maybe some flash and possibly a
> winbond super IO at worst? Plus the TPM.
>
> I can't confirm what Intel has done, but if writes are posted, then it
> is not a 'bug', but expected operation for a PCI/LPC bridge device to
> have an ordered queue of posted writes, and thus higher latency when
> processing reads due to ordering requirments.
>
> Other drivers may not see it because most LPC usages would not be
> write heavy, or might use IO instructions which are not posted..
>
> I can confirm that my ARM systems with a custom PCI-LPC bridge will
> have exactly the same problem, and that the readl is the only
> solution.
>
> This is becuase writes to LPC are posted over PCI and will be buffered
> in the root complex, device end port and internally in the LPC
> bridge. Since they are posted there is no way for the CPU to know when
> the complete and when it would be 'low latency' to issue a read.
>
>> So powerpc (for instance) has a sync operation after each write to HW. I
>> am wondering if we could need something like that on x86.
>
> Even on something like PPC 'sync' is not defined to globally flush
> posted writes, and wil not help. WMB is probably similar.
>
> Jason
>

2017-08-19 17:03:55

by Jarkko Sakkinen

Subject: Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s

On Thu, Aug 17, 2017 at 11:17:32AM -0600, Jason Gunthorpe wrote:
> On Thu, Aug 17, 2017 at 12:38:07PM +0200, Sebastian Andrzej Siewior wrote:
>
> > > I worry a bit about "appears to fix". It seems odd that the TPM device
> > > driver would be the first code to uncover this. Can anyone confirm that the
> > > chipset does indeed have this bug?
> >
> > What Haris says makes sense. It is just not all architectures
> > accumulate/ batch writes to HW.
>
> It doesn't seem that odd to me.. In modern Intel chipsets the physical
> LPC bus is used for very little. Maybe some flash and possibly a
> winbond super IO at worst? Plus the TPM.
>
> I can't confirm what Intel has done, but if writes are posted, then it
> is not a 'bug', but expected operation for a PCI/LPC bridge device to
> have an ordered queue of posted writes, and thus higher latency when
> processing reads due to ordering requirments.
>
> Other drivers may not see it because most LPC usages would not be
> write heavy, or might use IO instructions which are not posted..
>
> I can confirm that my ARM systems with a custom PCI-LPC bridge will
> have exactly the same problem, and that the readl is the only
> solution.
>
> This is becuase writes to LPC are posted over PCI and will be buffered
> in the root complex, device end port and internally in the LPC
> bridge. Since they are posted there is no way for the CPU to know when
> the complete and when it would be 'low latency' to issue a read.

What you say makes sense to me, but wasn't the patch tried with an SPI-based
TPM, not LPC? Anyway, what you're saying probably still applies.

/Jarkko