2018-05-07 16:10:21

by Nayna Jain

Subject: [PATCH v3 0/2] tpm: improving granularity in poll sleep times

The existing TPM polling code sleeps in each loop iteration for between
1 msec and 5 msecs. However, many of the TPM commands complete much
faster, resulting in unnecessary delays.

This set of patches identifies such iterations and optimizes the sleep
time. The first patch replaces TPM_POLL_SLEEP with TPM_TIMEOUT_POLL and
moves it from tpm_tis_core.c to tpm.h as an enum value of 1 msec. The
second patch further reduces the TPM poll sleep time in get_burstcount()
and wait_for_tpm_stat() in tpm_tis_core.c by calling usleep_range()
directly.

Only the polling interval changes; the maximum timeout is unchanged.
Thus, the patches should not affect the overall existing behavior.

Changelog:

v3:

tpm: reduce poll sleep time in tpm_transmit()
* added testing platform information
* updated patch description for more clarity on reasoning

tpm: reduce polling time to usecs for even finer granularity
* added testing platform information
* added Jarkko's and Mimi's Reviewed-by

v2:

tpm: reduce poll sleep time in tpm_transmit()
* merged the two previously separate patches into this one
* updated patch description as per Jarkko's feedback

tpm: reduce polling time to usecs for even finer granularity
* directly use usleep_range with finer granularity less than 1msec

Nayna Jain (2):
tpm: reduce poll sleep time in tpm_transmit()
tpm: reduce polling time to usecs for even finer granularity

drivers/char/tpm/tpm-interface.c | 2 +-
drivers/char/tpm/tpm.h | 5 ++++-
drivers/char/tpm/tpm_tis_core.c | 11 +++--------
3 files changed, 8 insertions(+), 10 deletions(-)

--
2.13.3
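
The claim that only the polling interval changes while the maximum timeout stays the same can be illustrated with a small userspace model. This is only a sketch and not driver code: `poll_latency_us()` and its parameters are hypothetical names, and it assumes a sleep-then-check loop shaped like the one in wait_for_tpm_stat().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of a sleep-then-check polling loop.
 *
 * ready_at_us: time at which the device becomes ready
 * interval_us: sleep between two status checks
 * timeout_us:  maximum time the loop is allowed to run
 *
 * Returns the time at which readiness is observed, or -1 on timeout.
 */
int64_t poll_latency_us(int64_t ready_at_us, int64_t interval_us,
			int64_t timeout_us)
{
	int64_t now = 0;

	assert(interval_us > 0);	/* a zero interval would never advance */

	do {
		now += interval_us;	/* sleep first ... */
		if (now >= ready_at_us)	/* ... then check the status */
			return now;
	} while (now < timeout_us);

	return -1;			/* timed out */
}
```

In this model, a device that becomes ready after 1200 usecs is noticed at 5000 usecs with a 5 msec interval and at 2000 usecs with a 1 msec interval. A device that never becomes ready is given up on after the same `timeout_us` in both cases (plus at most one interval of overshoot).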



2018-05-07 16:08:24

by Nayna Jain

Subject: [PATCH v3 1/2] tpm: reduce poll sleep time in tpm_transmit()

tpm_try_transmit currently checks TPM status every 5 msecs between
send and recv. It does so in a loop for the maximum timeout as defined
in the TPM Interface Specification. However, the TPM may return before
5 msecs. Thus the polling interval for each iteration can be reduced,
which improves overall performance. This patch changes the polling sleep
time from 5 msecs to 1 msec.

Additionally, this patch renames TPM_POLL_SLEEP to TPM_TIMEOUT_POLL and
moves it to tpm.h as an enum value.

After this change, performance on a system[1] with a TPM 1.2 with an 8 byte
burstcount for 1000 extends improved from ~14 sec to ~10.7 sec.

[1] All tests are performed on an x86 based, locked down, single purpose
closed system. It has Infineon TPM 1.2 using LPC Bus.

Signed-off-by: Nayna Jain <[email protected]>
---
drivers/char/tpm/tpm-interface.c | 2 +-
drivers/char/tpm/tpm.h | 3 ++-
drivers/char/tpm/tpm_tis_core.c | 10 ++--------
3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 6201aab374e6..e32f6e85dc6d 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -489,7 +489,7 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip,
goto out;
}

- tpm_msleep(TPM_TIMEOUT);
+ tpm_msleep(TPM_TIMEOUT_POLL);
rmb();
} while (time_before(jiffies, stop));

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index eedcd9cf30bc..ca05828b6981 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -53,7 +53,8 @@ enum tpm_const {
enum tpm_timeout {
TPM_TIMEOUT = 5, /* msecs */
TPM_TIMEOUT_RETRY = 100, /* msecs */
- TPM_TIMEOUT_RANGE_US = 300 /* usecs */
+ TPM_TIMEOUT_RANGE_US = 300, /* usecs */
+ TPM_TIMEOUT_POLL = 1 /* msecs */
};

/* TPM addresses */
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 5a1f47b43947..493401f5fd39 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -31,12 +31,6 @@
#include "tpm.h"
#include "tpm_tis_core.h"

-/* This is a polling delay to check for status and burstcount.
- * As per ddwg input, expectation is that status check and burstcount
- * check should return within few usecs.
- */
-#define TPM_POLL_SLEEP 1 /* msec */
-
static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);

static bool wait_for_tpm_stat_cond(struct tpm_chip *chip, u8 mask,
@@ -90,7 +84,7 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
}
} else {
do {
- tpm_msleep(TPM_POLL_SLEEP);
+ tpm_msleep(TPM_TIMEOUT_POLL);
status = chip->ops->status(chip);
if ((status & mask) == mask)
return 0;
@@ -234,7 +228,7 @@ static int get_burstcount(struct tpm_chip *chip)
burstcnt = (value >> 8) & 0xFFFF;
if (burstcnt)
return burstcnt;
- tpm_msleep(TPM_POLL_SLEEP);
+ tpm_msleep(TPM_TIMEOUT_POLL);
} while (time_before(jiffies, stop));
return -EBUSY;
}
--
2.13.3
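
The figures in the patch description can be broken down per operation. The sketch below only does that arithmetic; `avg_us_per_op()` is a hypothetical helper, and the inputs are the approximate totals quoted above (~14 sec and ~10.7 sec for 1000 extends), so the results are approximate as well.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average time per operation, in usecs. */
uint64_t avg_us_per_op(uint64_t total_us, uint64_t ops)
{
	assert(ops > 0);	/* avoid dividing by zero */
	return total_us / ops;
}
```

With these inputs, `avg_us_per_op(14000000, 1000)` gives 14000 usecs and `avg_us_per_op(10700000, 1000)` gives 10700 usecs, i.e. roughly 3.3 msecs saved per extend on the test system.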


2018-05-07 16:08:29

by Nayna Jain

Subject: [PATCH v3 2/2] tpm: reduce polling time to usecs for even finer granularity

The TPM burstcount and status commands are supposed to return very
quickly [2][3]. This patch further reduces the TPM poll sleep time to usecs
in get_burstcount() and wait_for_tpm_stat() by calling usleep_range()
directly.

After this change, performance on a system[1] with a TPM 1.2 with an 8 byte
burstcount for 1000 extends improved from ~10.7 sec to ~7 sec.

[1] All tests are performed on an x86 based, locked down, single purpose
closed system. It has Infineon TPM 1.2 using LPC Bus.

[2] From the TCG Specification "TCG PC Client Specific TPM Interface
Specification (TIS), Family 1.2":

"NOTE : It takes roughly 330 ns per byte transfer on LPC. 256 bytes would
take 84 us, which is a long time to stall the CPU. Chipsets may not be
designed to post this much data to LPC; therefore, the CPU itself is
stalled for much of this time. Sending 1 kB would take 350 μs. Therefore,
even if the TPM_STS_x.burstCount field is a high value, software SHOULD
be interruptible during this period."

[3] From the TCG Specification 2.0, "TCG PC Client Platform TPM Profile
(PTP) Specification":

"It takes roughly 330 ns per byte transfer on LPC. 256 bytes would take
84 us. Chipsets may not be designed to post this much data to LPC;
therefore, the CPU itself is stalled for much of this time. Sending 1 kB
would take 350 us. Therefore, even if the TPM_STS_x.burstCount field is a
high value, software should be interruptible during this period. For SPI,
assuming 20MHz clock and 64-byte transfers, it would take about 120 usec
to move 256B of data. Sending 1kB would take about 500 usec. If the
transactions are done using 4 bytes at a time, then it would take about
1 msec. to transfer 1kB of data."

Signed-off-by: Nayna Jain <[email protected]>
Reviewed-by: Mimi Zohar <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
---
drivers/char/tpm/tpm.h | 4 +++-
drivers/char/tpm/tpm_tis_core.c | 5 +++--
2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index ca05828b6981..9824cccb2c76 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -54,7 +54,9 @@ enum tpm_timeout {
TPM_TIMEOUT = 5, /* msecs */
TPM_TIMEOUT_RETRY = 100, /* msecs */
TPM_TIMEOUT_RANGE_US = 300, /* usecs */
- TPM_TIMEOUT_POLL = 1 /* msecs */
+ TPM_TIMEOUT_POLL = 1, /* msecs */
+ TPM_TIMEOUT_USECS_MIN = 100, /* usecs */
+ TPM_TIMEOUT_USECS_MAX = 500 /* usecs */
};

/* TPM addresses */
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 493401f5fd39..b77a8dcfb822 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -84,7 +84,8 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
}
} else {
do {
- tpm_msleep(TPM_TIMEOUT_POLL);
+ usleep_range(TPM_TIMEOUT_USECS_MIN,
+ TPM_TIMEOUT_USECS_MAX);
status = chip->ops->status(chip);
if ((status & mask) == mask)
return 0;
@@ -228,7 +229,7 @@ static int get_burstcount(struct tpm_chip *chip)
burstcnt = (value >> 8) & 0xFFFF;
if (burstcnt)
return burstcnt;
- tpm_msleep(TPM_TIMEOUT_POLL);
+ usleep_range(TPM_TIMEOUT_USECS_MIN, TPM_TIMEOUT_USECS_MAX);
} while (time_before(jiffies, stop));
return -EBUSY;
}
--
2.13.3
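
The LPC figure quoted from the specifications can be compared against the new sleep window with a little arithmetic. This is a rough sketch: the helper names are hypothetical, the 330 ns per byte figure is the approximate value from the quoted text, and only bus transfer time is modelled, not any processing inside the TPM.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate figure from the quoted TIS/PTP text: 330 ns per byte on LPC. */
#define LPC_NS_PER_BYTE		330

/* Values added to enum tpm_timeout by this patch, in usecs. */
#define TPM_TIMEOUT_USECS_MIN	100
#define TPM_TIMEOUT_USECS_MAX	500

static_assert(TPM_TIMEOUT_USECS_MIN < TPM_TIMEOUT_USECS_MAX,
	      "sleep window must be a valid range");

/* Hypothetical helper: estimated LPC transfer time in nanoseconds. */
uint64_t lpc_transfer_ns(uint64_t bytes)
{
	return bytes * LPC_NS_PER_BYTE;
}

/* Hypothetical helper: does the estimate fit within the shortest sleep? */
int fits_in_min_sleep(uint64_t bytes)
{
	return lpc_transfer_ns(bytes) <=
	       (uint64_t)TPM_TIMEOUT_USECS_MIN * 1000;
}
```

Under these assumptions a 256-byte transfer is estimated at 84480 ns, matching the "84 us" in the quoted text, and an 8-byte burst at 2640 ns, far below even the 100 usec lower bound of the sleep window.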


2018-05-08 16:34:34

by Jay Freyensee

Subject: Re: [PATCH v3 1/2] tpm: reduce poll sleep time in tpm_transmit()


> do {
> - tpm_msleep(TPM_POLL_SLEEP);
> + tpm_msleep(TPM_TIMEOUT_POLL);
>
I'm just curious why it was decided to still use tpm_msleep() here
instead of usleep_range() which was used in the 2nd patch.

Otherwise,

Acked-by: Jay Freyensee <[email protected]>


2018-05-08 16:35:03

by Jay Freyensee

Subject: Re: [PATCH v3 2/2] tpm: reduce polling time to usecs for even finer granularity



On 5/7/18 9:07 AM, Nayna Jain wrote:
> The TPM burstcount and status commands are supposed to return very
> quickly [2][3]. This patch further reduces the TPM poll sleep time to usecs
> in get_burstcount() and wait_for_tpm_stat() by calling usleep_range()
> directly.
>
> After this change, performance on a system[1] with a TPM 1.2 with an 8 byte
> burstcount for 1000 extends improved from ~10.7 sec to ~7 sec.
>
> [1] All tests are performed on an x86 based, locked down, single purpose
> closed system. It has Infineon TPM 1.2 using LPC Bus.
>
> [2] From the TCG Specification "TCG PC Client Specific TPM Interface
> Specification (TIS), Family 1.2":
>
> "NOTE : It takes roughly 330 ns per byte transfer on LPC. 256 bytes would
> take 84 us, which is a long time to stall the CPU. Chipsets may not be
> designed to post this much data to LPC; therefore, the CPU itself is
> stalled for much of this time. Sending 1 kB would take 350 μs. Therefore,
> even if the TPM_STS_x.burstCount field is a high value, software SHOULD
> be interruptible during this period."
>
> [3] From the TCG Specification 2.0, "TCG PC Client Platform TPM Profile
> (PTP) Specification":
>
> "It takes roughly 330 ns per byte transfer on LPC. 256 bytes would take
> 84 us. Chipsets may not be designed to post this much data to LPC;
> therefore, the CPU itself is stalled for much of this time. Sending 1 kB
> would take 350 us. Therefore, even if the TPM_STS_x.burstCount field is a
> high value, software should be interruptible during this period. For SPI,
> assuming 20MHz clock and 64-byte transfers, it would take about 120 usec
> to move 256B of data. Sending 1kB would take about 500 usec. If the
> transactions are done using 4 bytes at a time, then it would take about
> 1 msec. to transfer 1kB of data."
>
> Signed-off-by: Nayna Jain <[email protected]>
> Reviewed-by: Mimi Zohar <[email protected]>
> Reviewed-by: Jarkko Sakkinen <[email protected]>
> ---

Acked-by: Jay Freyensee <[email protected]>


2018-05-10 12:44:53

by Nayna Jain

Subject: Re: [PATCH v3 1/2] tpm: reduce poll sleep time in tpm_transmit()



On 05/08/2018 10:04 PM, J Freyensee wrote:
>
>>           do {
>> -            tpm_msleep(TPM_POLL_SLEEP);
>> +            tpm_msleep(TPM_TIMEOUT_POLL);
>>
> I'm just curious why it was decided to still use tpm_msleep() here
> instead of usleep_range() which was used in the 2nd patch.

TPM_TIMEOUT_POLL is in msec i.e. 1 msec and usleep_range() is used only
when timeout is needed in usecs.

>
> Otherwise,
>
> Acked-by: Jay Freyensee <[email protected]>

Thanks !!

Thanks & Regards,
    - Nayna



2018-05-14 10:42:22

by Nayna Jain

Subject: Re: [PATCH v3 1/2] tpm: reduce poll sleep time in tpm_transmit()



On 05/10/2018 06:11 PM, Nayna Jain wrote:
>
>
> On 05/08/2018 10:04 PM, J Freyensee wrote:
>>
>>>           do {
>>> -            tpm_msleep(TPM_POLL_SLEEP);
>>> +            tpm_msleep(TPM_TIMEOUT_POLL);
>>>
>> I'm just curious why it was decided to still use tpm_msleep() here
>> instead of usleep_range() which was used in the 2nd patch.
>
> TPM_TIMEOUT_POLL is in msec i.e. 1 msec and usleep_range() is used
> only when timeout is needed in usecs.

Just to add a bit more detail:

usleep_range() is used in wait_for_tpm_stat() and get_burstcount(), which
are expected to return quickly. tpm_transmit() is a generic function
used across all drivers and commands. Some of the commands (e.g. hash,
key generation) take longer than others (e.g. extend). The sleep time in
tpm_transmit() is reduced but kept in msecs to balance between the
shorter and longer commands.

Thanks & Regards,
    - Nayna

>
>>
>> Otherwise,
>>
>> Acked-by: Jay Freyensee <[email protected]>
>
> Thanks !!
>
> Thanks & Regards,
>     - Nayna
>
>
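
The trade-off described in this reply, shorter sleeps for quick status checks and msec sleeps for longer commands, can be sketched by counting how many sleep/check rounds a wait needs. `polls_needed()` is a hypothetical helper, and the 200 msec command duration used in the note below is purely an illustrative assumption, not a measured value.

```c
#include <assert.h>
#include <stdint.h>

/* Poll intervals taken from the patches, expressed in usecs. */
#define TPM_TIMEOUT_POLL_US	1000	/* TPM_TIMEOUT_POLL = 1 msec */
#define TPM_TIMEOUT_USECS_MIN	100

/*
 * Hypothetical helper: number of sleep/check rounds until an operation
 * of the given duration is observed as complete (rounded up).
 */
uint64_t polls_needed(uint64_t duration_us, uint64_t interval_us)
{
	assert(interval_us > 0);	/* avoid dividing by zero */
	return (duration_us + interval_us - 1) / interval_us;
}
```

For an assumed 200 msec command, `polls_needed(200000, TPM_TIMEOUT_POLL_US)` is 200 rounds, while `polls_needed(200000, TPM_TIMEOUT_USECS_MIN)` would be 2000 rounds. The finer interval therefore costs many more wake-ups on long commands, while it only pays off on waits that finish well under a millisecond.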


2018-05-14 10:48:12

by Jarkko Sakkinen

Subject: Re: [PATCH v3 1/2] tpm: reduce poll sleep time in tpm_transmit()

On Mon, May 07, 2018 at 12:07:32PM -0400, Nayna Jain wrote:
> tpm_try_transmit currently checks TPM status every 5 msecs between
> send and recv. It does so in a loop for the maximum timeout as defined
> in the TPM Interface Specification. However, the TPM may return before
> 5 msecs. Thus the polling interval for each iteration can be reduced,
> which improves overall performance. This patch changes the polling sleep
> time from 5 msecs to 1 msec.
>
> Additionally, this patch renames TPM_POLL_SLEEP to TPM_TIMEOUT_POLL and
> moves it to tpm.h as an enum value.
>
> After this change, performance on a system[1] with a TPM 1.2 with an 8 byte
> burstcount for 1000 extends improved from ~14 sec to ~10.7 sec.
>
> [1] All tests are performed on an x86 based, locked down, single purpose
> closed system. It has Infineon TPM 1.2 using LPC Bus.
>
> Signed-off-by: Nayna Jain <[email protected]>

Reviewed-by: Jarkko Sakkinen <[email protected]>

/Jarkko

2018-05-14 10:50:53

by Jarkko Sakkinen

Subject: Re: [PATCH v3 1/2] tpm: reduce poll sleep time in tpm_transmit()

On Mon, May 14, 2018 at 01:46:00PM +0300, Jarkko Sakkinen wrote:
> On Mon, May 07, 2018 at 12:07:32PM -0400, Nayna Jain wrote:
> > tpm_try_transmit currently checks TPM status every 5 msecs between
> > send and recv. It does so in a loop for the maximum timeout as defined
> > in the TPM Interface Specification. However, the TPM may return before
> > 5 msecs. Thus the polling interval for each iteration can be reduced,
> > which improves overall performance. This patch changes the polling sleep
> > time from 5 msecs to 1 msec.
> >
> > Additionally, this patch renames TPM_POLL_SLEEP to TPM_TIMEOUT_POLL and
> > moves it to tpm.h as an enum value.
> >
> > After this change, performance on a system[1] with a TPM 1.2 with an 8 byte
> > burstcount for 1000 extends improved from ~14 sec to ~10.7 sec.
> >
> > [1] All tests are performed on an x86 based, locked down, single purpose
> > closed system. It has Infineon TPM 1.2 using LPC Bus.
> >
> > Signed-off-by: Nayna Jain <[email protected]>
>
> Reviewed-by: Jarkko Sakkinen <[email protected]>

Tested-by: Jarkko Sakkinen <[email protected]>

/Jarkko

2018-05-14 10:53:13

by Jarkko Sakkinen

Subject: Re: [PATCH v3 2/2] tpm: reduce polling time to usecs for even finer granularity

On Mon, May 07, 2018 at 12:07:33PM -0400, Nayna Jain wrote:
> The TPM burstcount and status commands are supposed to return very
> quickly [2][3]. This patch further reduces the TPM poll sleep time to usecs
> in get_burstcount() and wait_for_tpm_stat() by calling usleep_range()
> directly.
>
> After this change, performance on a system[1] with a TPM 1.2 with an 8 byte
> burstcount for 1000 extends improved from ~10.7 sec to ~7 sec.
>
> [1] All tests are performed on an x86 based, locked down, single purpose
> closed system. It has Infineon TPM 1.2 using LPC Bus.
>
> [2] From the TCG Specification "TCG PC Client Specific TPM Interface
> Specification (TIS), Family 1.2":
>
> "NOTE : It takes roughly 330 ns per byte transfer on LPC. 256 bytes would
> take 84 us, which is a long time to stall the CPU. Chipsets may not be
> designed to post this much data to LPC; therefore, the CPU itself is
> stalled for much of this time. Sending 1 kB would take 350 μs. Therefore,
> even if the TPM_STS_x.burstCount field is a high value, software SHOULD
> be interruptible during this period."
>
> [3] From the TCG Specification 2.0, "TCG PC Client Platform TPM Profile
> (PTP) Specification":
>
> "It takes roughly 330 ns per byte transfer on LPC. 256 bytes would take
> 84 us. Chipsets may not be designed to post this much data to LPC;
> therefore, the CPU itself is stalled for much of this time. Sending 1 kB
> would take 350 us. Therefore, even if the TPM_STS_x.burstCount field is a
> high value, software should be interruptible during this period. For SPI,
> assuming 20MHz clock and 64-byte transfers, it would take about 120 usec
> to move 256B of data. Sending 1kB would take about 500 usec. If the
> transactions are done using 4 bytes at a time, then it would take about
> 1 msec. to transfer 1kB of data."
>
> Signed-off-by: Nayna Jain <[email protected]>
> Reviewed-by: Mimi Zohar <[email protected]>
> Reviewed-by: Jarkko Sakkinen <[email protected]>
> ---
> drivers/char/tpm/tpm.h | 4 +++-
> drivers/char/tpm/tpm_tis_core.c | 5 +++--
> 2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index ca05828b6981..9824cccb2c76 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -54,7 +54,9 @@ enum tpm_timeout {
> TPM_TIMEOUT = 5, /* msecs */
> TPM_TIMEOUT_RETRY = 100, /* msecs */
> TPM_TIMEOUT_RANGE_US = 300, /* usecs */
> - TPM_TIMEOUT_POLL = 1 /* msecs */
> + TPM_TIMEOUT_POLL = 1, /* msecs */
> + TPM_TIMEOUT_USECS_MIN = 100, /* usecs */
> + TPM_TIMEOUT_USECS_MAX = 500 /* usecs */
> };
>
> /* TPM addresses */
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 493401f5fd39..b77a8dcfb822 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -84,7 +84,8 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
> }
> } else {
> do {
> - tpm_msleep(TPM_TIMEOUT_POLL);
> + usleep_range(TPM_TIMEOUT_USECS_MIN,
> + TPM_TIMEOUT_USECS_MAX);

This is not properly aligned, and it is split into two lines for no good
reason.

> status = chip->ops->status(chip);
> if ((status & mask) == mask)
> return 0;
> @@ -228,7 +229,7 @@ static int get_burstcount(struct tpm_chip *chip)
> burstcnt = (value >> 8) & 0xFFFF;
> if (burstcnt)
> return burstcnt;
> - tpm_msleep(TPM_TIMEOUT_POLL);
> + usleep_range(TPM_TIMEOUT_USECS_MIN, TPM_TIMEOUT_USECS_MAX);

And it is inconsistent with this in terms of how the code is laid out...

/Jarkko