2020-05-26 18:35:14

by Mario Limonciello

[permalink] [raw]
Subject: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

This reverts commit d23d12484307b40eea549b8a858f5fffad913897.

This commit has caused regressions for the XPS 9560 containing
a Nuvoton TPM.

As mentioned by the reporter all TPM2 commands are failing with:
ERROR:tcti:src/tss2-tcti/tcti-device.c:290:tcti_device_receive()
Failed to read response from fd 3, got errno 1: Operation not permitted

The reporter bisected this issue back to this commit which was
backported to stable as commit 4d6ebc4.

Cc: Jeffrin Jose T <[email protected]>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206275
Fixes: d23d124 ("tpm: fix invalid locking in NONBLOCKING mode")
Reported-by: Alex Guzman <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
---
drivers/char/tpm/tpm-dev-common.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c
index 87f449340202..bc9d7c7ddc01 100644
--- a/drivers/char/tpm/tpm-dev-common.c
+++ b/drivers/char/tpm/tpm-dev-common.c
@@ -61,12 +61,6 @@ static void tpm_dev_async_work(struct work_struct *work)

mutex_lock(&priv->buffer_mutex);
priv->command_enqueued = false;
- ret = tpm_try_get_ops(priv->chip);
- if (ret) {
- priv->response_length = ret;
- goto out;
- }
-
ret = tpm_dev_transmit(priv->chip, priv->space, priv->data_buffer,
sizeof(priv->data_buffer));
tpm_put_ops(priv->chip);
@@ -74,7 +68,6 @@ static void tpm_dev_async_work(struct work_struct *work)
priv->response_length = ret;
mod_timer(&priv->user_read_timer, jiffies + (120 * HZ));
}
-out:
mutex_unlock(&priv->buffer_mutex);
wake_up_interruptible(&priv->async_wait);
}
@@ -211,7 +204,6 @@ ssize_t tpm_common_write(struct file *file, const char __user *buf,
if (file->f_flags & O_NONBLOCK) {
priv->command_enqueued = true;
queue_work(tpm_dev_wq, &priv->async_work);
- tpm_put_ops(priv->chip);
mutex_unlock(&priv->buffer_mutex);
return size;
}
--
2.25.1


2020-05-26 19:16:56

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote:
> This reverts commit d23d12484307b40eea549b8a858f5fffad913897.
>
> This commit has caused regressions for the XPS 9560 containing
> a Nuvoton TPM.

Presumably this is using the tis driver?

> As mentioned by the reporter all TPM2 commands are failing with:
> ERROR:tcti:src/tss2-tcti/tcti-device.c:290:tcti_device_receive()
> Failed to read response from fd 3, got errno 1: Operation not
> permitted
>
> The reporter bisected this issue back to this commit which was
> backported to stable as commit 4d6ebc4.

I think the problem is request_locality ... for some inexplicable
reason a failure there returns -1, which is EPERM to user space.

That seems to be a bug in the async code since everything else gives a
ESPIPE error if tpm_try_get_ops fails ... at least no-one assumes it
gives back a sensible return code.

What I think is happening is that with the patch the TPM goes through a
quick sequence of request, relinquish, request, relinquish and it's the
third request which is failing (likely timing out). Without the patch,
the patch there's only one request,relinquish cycle because the ops are
held while the async work is executed. I have a vague recollection
that there is a problem with too many locality request in quick
succession, but I'll defer to Jason, who I think understands the
intricacies of localities better than I do.

If that's the problem, the solution looks simple enough: just move the
ops get down because the priv state is already protected by the buffer
mutex

James

---

diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c
index 87f449340202..1784530b8387 100644
--- a/drivers/char/tpm/tpm-dev-common.c
+++ b/drivers/char/tpm/tpm-dev-common.c
@@ -189,15 +189,6 @@ ssize_t tpm_common_write(struct file *file, const char __user *buf,
goto out;
}

- /* atomic tpm command send and result receive. We only hold the ops
- * lock during this period so that the tpm can be unregistered even if
- * the char dev is held open.
- */
- if (tpm_try_get_ops(priv->chip)) {
- ret = -EPIPE;
- goto out;
- }
-
priv->response_length = 0;
priv->response_read = false;
*off = 0;
@@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file, const char __user *buf,
if (file->f_flags & O_NONBLOCK) {
priv->command_enqueued = true;
queue_work(tpm_dev_wq, &priv->async_work);
- tpm_put_ops(priv->chip);
mutex_unlock(&priv->buffer_mutex);
return size;
}

+ /* atomic tpm command send and result receive. We only hold the ops
+ * lock during this period so that the tpm can be unregistered even if
+ * the char dev is held open.
+ */
+ if (tpm_try_get_ops(priv->chip)) {
+ ret = -EPIPE;
+ goto out;
+ }
+
ret = tpm_dev_transmit(priv->chip, priv->space, priv->data_buffer,
sizeof(priv->data_buffer));
tpm_put_ops(priv->chip);

2020-05-26 19:26:45

by Mario Limonciello

[permalink] [raw]
Subject: RE: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

> On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote:
> > This reverts commit d23d12484307b40eea549b8a858f5fffad913897.
> >
> > This commit has caused regressions for the XPS 9560 containing
> > a Nuvoton TPM.
>
> Presumably this is using the tis driver?

Correct.

>
> > As mentioned by the reporter all TPM2 commands are failing with:
> > ERROR:tcti:src/tss2-tcti/tcti-device.c:290:tcti_device_receive()
> > Failed to read response from fd 3, got errno 1: Operation not
> > permitted
> >
> > The reporter bisected this issue back to this commit which was
> > backported to stable as commit 4d6ebc4.
>
> I think the problem is request_locality ... for some inexplicable
> reason a failure there returns -1, which is EPERM to user space.
>
> That seems to be a bug in the async code since everything else gives a
> ESPIPE error if tpm_try_get_ops fails ... at least no-one assumes it
> gives back a sensible return code.
>
> What I think is happening is that with the patch the TPM goes through a
> quick sequence of request, relinquish, request, relinquish and it's the
> third request which is failing (likely timing out). Without the patch,
> the patch there's only one request,relinquish cycle because the ops are
> held while the async work is executed. I have a vague recollection
> that there is a problem with too many locality request in quick
> succession, but I'll defer to Jason, who I think understands the
> intricacies of localities better than I do.

Thanks, I don't pretend to understand the nuances of this particular code,
but I was hoping that the request to revert got some attention since Alex's
kernel Bugzilla and message a few months ago to linux integrity weren't.

>
> If that's the problem, the solution looks simple enough: just move the
> ops get down because the priv state is already protected by the buffer
> mutex

Yeah, if that works for Alex's situation it certainly sounds like a better
solution than reverting this patch as this patch actually does fix a problem
reported by Jeffrin originally.

Could you propose a specific patch that Alex and Jeffrin can perhaps both try?

>
> James
>
> ---
>
> diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-
> common.c
> index 87f449340202..1784530b8387 100644
> --- a/drivers/char/tpm/tpm-dev-common.c
> +++ b/drivers/char/tpm/tpm-dev-common.c
> @@ -189,15 +189,6 @@ ssize_t tpm_common_write(struct file *file, const char
> __user *buf,
> goto out;
> }
>
> - /* atomic tpm command send and result receive. We only hold the ops
> - * lock during this period so that the tpm can be unregistered even if
> - * the char dev is held open.
> - */
> - if (tpm_try_get_ops(priv->chip)) {
> - ret = -EPIPE;
> - goto out;
> - }
> -
> priv->response_length = 0;
> priv->response_read = false;
> *off = 0;
> @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file, const char
> __user *buf,
> if (file->f_flags & O_NONBLOCK) {
> priv->command_enqueued = true;
> queue_work(tpm_dev_wq, &priv->async_work);
> - tpm_put_ops(priv->chip);
> mutex_unlock(&priv->buffer_mutex);
> return size;
> }
>
> + /* atomic tpm command send and result receive. We only hold the ops
> + * lock during this period so that the tpm can be unregistered even if
> + * the char dev is held open.
> + */
> + if (tpm_try_get_ops(priv->chip)) {
> + ret = -EPIPE;
> + goto out;
> + }
> +
> ret = tpm_dev_transmit(priv->chip, priv->space, priv->data_buffer,
> sizeof(priv->data_buffer));
> tpm_put_ops(priv->chip);

2020-05-26 19:41:17

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On Tue, 2020-05-26 at 19:23 +0000, [email protected] wrote:
> > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote:
> > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897.
> > >
> > > This commit has caused regressions for the XPS 9560 containing
> > > a Nuvoton TPM.
> >
> > Presumably this is using the tis driver?
>
> Correct.
>
> >
> > > As mentioned by the reporter all TPM2 commands are failing with:
> > > ERROR:tcti:src/tss2-tcti/tcti-
> > > device.c:290:tcti_device_receive()
> > > Failed to read response from fd 3, got errno 1: Operation not
> > > permitted
> > >
> > > The reporter bisected this issue back to this commit which was
> > > backported to stable as commit 4d6ebc4.
> >
> > I think the problem is request_locality ... for some inexplicable
> > reason a failure there returns -1, which is EPERM to user space.
> >
> > That seems to be a bug in the async code since everything else
> > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one
> > assumes it gives back a sensible return code.
> >
> > What I think is happening is that with the patch the TPM goes
> > through a quick sequence of request, relinquish, request,
> > relinquish and it's the third request which is failing (likely
> > timing out). Without the patch, the patch there's only one
> > request,relinquish cycle because the ops are held while the async
> > work is executed. I have a vague recollection that there is a
> > problem with too many locality request in quick succession, but
> > I'll defer to Jason, who I think understands the intricacies of
> > localities better than I do.
>
> Thanks, I don't pretend to understand the nuances of this particular
> code, but I was hoping that the request to revert got some attention
> since Alex's kernel Bugzilla and message a few months ago to linux
> integrity weren't.
>
> >
> > If that's the problem, the solution looks simple enough: just move
> > the ops get down because the priv state is already protected by the
> > buffer mutex
>
> Yeah, if that works for Alex's situation it certainly sounds like a
> better solution than reverting this patch as this patch actually does
> fix a problem reported by Jeffrin originally.
>
> Could you propose a specific patch that Alex and Jeffrin can perhaps
> both try?

Um, what's wrong with the one I originally attached and which you quote
below? It's only compile tested, but I think it will work, if the
theory is correct.

James

> >
> > James
> >
> > ---
> >
> > diff --git a/drivers/char/tpm/tpm-dev-common.c
> > b/drivers/char/tpm/tpm-dev-
> > common.c
> > index 87f449340202..1784530b8387 100644
> > --- a/drivers/char/tpm/tpm-dev-common.c
> > +++ b/drivers/char/tpm/tpm-dev-common.c
> > @@ -189,15 +189,6 @@ ssize_t tpm_common_write(struct file *file,
> > const char
> > __user *buf,
> > goto out;
> > }
> >
> > - /* atomic tpm command send and result receive. We only
> > hold the ops
> > - * lock during this period so that the tpm can be
> > unregistered even if
> > - * the char dev is held open.
> > - */
> > - if (tpm_try_get_ops(priv->chip)) {
> > - ret = -EPIPE;
> > - goto out;
> > - }
> > -
> > priv->response_length = 0;
> > priv->response_read = false;
> > *off = 0;
> > @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file,
> > const char
> > __user *buf,
> > if (file->f_flags & O_NONBLOCK) {
> > priv->command_enqueued = true;
> > queue_work(tpm_dev_wq, &priv->async_work);
> > - tpm_put_ops(priv->chip);
> > mutex_unlock(&priv->buffer_mutex);
> > return size;
> > }
> >
> > + /* atomic tpm command send and result receive. We only
> > hold the ops
> > + * lock during this period so that the tpm can be
> > unregistered even if
> > + * the char dev is held open.
> > + */
> > + if (tpm_try_get_ops(priv->chip)) {
> > + ret = -EPIPE;
> > + goto out;
> > + }
> > +
> > ret = tpm_dev_transmit(priv->chip, priv->space, priv-
> > > data_buffer,
> >
> > sizeof(priv->data_buffer));
> > tpm_put_ops(priv->chip);

2020-05-26 19:42:38

by Tadeusz Struk

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On 5/26/20 12:14 PM, James Bottomley wrote:
> + /* atomic tpm command send and result receive. We only hold the ops
> + * lock during this period so that the tpm can be unregistered even if
> + * the char dev is held open.
> + */
> + if (tpm_try_get_ops(priv->chip)) {
> + ret = -EPIPE;
> + goto out;
> + }
> +
Hi James,
This won't help if the message is read by an async tcti. If the problem lies
in the chip get locality code, perhaps this could help to debug the root-cause
instead of masking it out in the upper layer code:

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 2435216bd10a..da5ecd0376bf 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -202,20 +202,22 @@ static int request_locality(struct tpm_chip *chip, int l)
return rc;

stop = jiffies + chip->timeout_a;
+ timeout = stop - jiffies;

if (chip->flags & TPM_CHIP_FLAG_IRQ) {
again:
timeout = stop - jiffies;
if ((long)timeout <= 0)
- return -1;
+ goto out;
+
rc = wait_event_interruptible_timeout(priv->int_queue,
- (check_locality
- (chip, l)),
+ check_locality(chip, l),
timeout);
if (rc > 0)
return l;
if (rc == -ERESTARTSYS && freezing(current)) {
clear_thread_flag(TIF_SIGPENDING);
+ timeout = stop - jiffies;
goto again;
}
} else {
@@ -226,6 +228,10 @@ static int request_locality(struct tpm_chip *chip, int l)
tpm_msleep(TPM_TIMEOUT);
} while (time_before(jiffies, stop));
}
+out:
+ dev_warn(&chip->dev, "%s: failed to request locality %d after %lu ms\n",
+ __func__, l, timeout * HZ / 1000);
+
return -1;
}

2020-05-26 23:25:13

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On Tue, 2020-05-26 at 12:39 -0700, Tadeusz Struk wrote:
> On 5/26/20 12:14 PM, James Bottomley wrote:
> > + /* atomic tpm command send and result receive. We only
> > hold the ops
> > + * lock during this period so that the tpm can be
> > unregistered even if
> > + * the char dev is held open.
> > + */
> > + if (tpm_try_get_ops(priv->chip)) {
> > + ret = -EPIPE;
> > + goto out;
> > + }
> > +
>
> Hi James,
> This won't help if the message is read by an async tcti.

Why not? It moves the ops get underneath the async path, so it's now
all done in the direct read or the async read. That seems to be more
efficient.

> If the problem lies in the chip get locality code, perhaps this
> could help to debug the root-cause instead of masking it out in the
> upper layer code:

I don't think there is a root cause other than a TIS TPM is getting
annoyed by us cycling localities too rapidly because we don't do an
actual TPM operation between request and relinquish. Since the first
request/relinquish seems unnecessary for the async case, moving the ops
get eliminates the problem.

James

2020-05-27 01:00:05

by Tadeusz Struk

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On 5/26/20 1:00 PM, James Bottomley wrote:
> I don't think there is a root cause other than a TIS TPM is getting
> annoyed by us cycling localities too rapidly because we don't do an
> actual TPM operation between request and relinquish. Since the first
> request/relinquish seems unnecessary for the async case, moving the ops
> get eliminates the problem.

Could be, so maybe we could try both patches.
More debug info on the error path won't hurt.
Thanks,
Tadeusz

2020-05-27 04:23:31

by Alex Guzman

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On Tue, May 26, 2020 at 4:06 PM James Bottomley
<[email protected]> wrote:
>
> On Tue, 2020-05-26 at 15:19 -0700, Alex Guzman wrote:
> [...]
> > When using your patch, I get a hang when trying to use tpm2_getcap,
> > and dmesg shows some info.
>
> Are you sure it's all applied? This
>
> > [ 570.913803] tpm_tcg_write_bytes+0x2f/0x40
> > [ 570.913805] release_locality+0x49/0x220
> > [ 570.913807] tpm_relinquish_locality+0x1f/0x40
> > [ 570.913808] tpm_chip_stop+0x21/0x40
> > [ 570.913810] tpm_put_ops+0x9/0x30
> > [ 570.913811] tpm_common_write+0x179/0x190
> > [ 570.913813] vfs_write+0xb1/0x1a0
>
> Implies an unmatched tpm_put_ops() in the async write path, as though
> this hunk:
>
> > @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file,
> > const char __user *buf,
> > if (file->f_flags & O_NONBLOCK) {
> > priv->command_enqueued = true;
> > queue_work(tpm_dev_wq, &priv->async_work);
> > - tpm_put_ops(priv->chip);
> > mutex_unlock(&priv->buffer_mutex);
> > return size;
> > }
>
> Is missing. I actually booted the patch in my TPM based VM and it all
> seems to work OK when I execute tpm2_getcap (I verified it's using
> O_NONBLOCK) and tssgetcapability in sync mode.
>
> James
>

Oh, I did miss that bit. The patch had issues applying for some reason
and I missed the single-line removal when I was looking at the diff.

I gave it a spin on my machine again. getcap seems to work correctly
with and without having the async config flag set for tpm2-tss. The
pkcs11 plugin seems to work correctly again too. :)

2020-05-27 04:26:18

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On Tue, 2020-05-26 at 16:31 -0700, Alex Guzman wrote:
> On Tue, May 26, 2020 at 4:06 PM James Bottomley
> <[email protected]> wrote:
> >
> > On Tue, 2020-05-26 at 15:19 -0700, Alex Guzman wrote:
> > [...]
> > > When using your patch, I get a hang when trying to use
> > > tpm2_getcap, and dmesg shows some info.
> >
> > Are you sure it's all applied? This
> >
> > > [ 570.913803] tpm_tcg_write_bytes+0x2f/0x40
> > > [ 570.913805] release_locality+0x49/0x220
> > > [ 570.913807] tpm_relinquish_locality+0x1f/0x40
> > > [ 570.913808] tpm_chip_stop+0x21/0x40
> > > [ 570.913810] tpm_put_ops+0x9/0x30
> > > [ 570.913811] tpm_common_write+0x179/0x190
> > > [ 570.913813] vfs_write+0xb1/0x1a0
> >
> > Implies an unmatched tpm_put_ops() in the async write path, as
> > though this hunk:
> >
> > > @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file,
> > > const char __user *buf,
> > > if (file->f_flags & O_NONBLOCK) {
> > > priv->command_enqueued = true;
> > > queue_work(tpm_dev_wq, &priv->async_work);
> > > - tpm_put_ops(priv->chip);
> > > mutex_unlock(&priv->buffer_mutex);
> > > return size;
> > > }
> >
> > Is missing. I actually booted the patch in my TPM based VM and it
> > all seems to work OK when I execute tpm2_getcap (I verified it's
> > using O_NONBLOCK) and tssgetcapability in sync mode.
> >
> > James
> >
>
> Oh, I did miss that bit. The patch had issues applying for some
> reason and I missed the single-line removal when I was looking at the
> diff.

Sorry, that's likely my fault: I did it on top of my current TPM tree.
I'll prepare a version against the vanilla kernel with a real
changelog.

> I gave it a spin on my machine again. getcap seems to work correctly
> with and without having the async config flag set for tpm2-tss. The
> pkcs11 plugin seems to work correctly again too. :)

Great, thanks! I'll add your tested-by to the above.

James

2020-05-27 09:22:36

by Alex Guzman

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On Tue, 2020-05-26 at 12:38 -0700, James Bottomley wrote:
> On Tue, 2020-05-26 at 19:23 +0000, [email protected] wrote:
> > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote:
> > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897.
> > > >
> > > > This commit has caused regressions for the XPS 9560 containing
> > > > a Nuvoton TPM.
> > >
> > > Presumably this is using the tis driver?
> >
> > Correct.
> >
> > > > As mentioned by the reporter all TPM2 commands are failing
> > > > with:
> > > > ERROR:tcti:src/tss2-tcti/tcti-
> > > > device.c:290:tcti_device_receive()
> > > > Failed to read response from fd 3, got errno 1: Operation not
> > > > permitted
> > > >
> > > > The reporter bisected this issue back to this commit which was
> > > > backported to stable as commit 4d6ebc4.
> > >
> > > I think the problem is request_locality ... for some inexplicable
> > > reason a failure there returns -1, which is EPERM to user space.
> > >
> > > That seems to be a bug in the async code since everything else
> > > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one
> > > assumes it gives back a sensible return code.
> > >
> > > What I think is happening is that with the patch the TPM goes
> > > through a quick sequence of request, relinquish, request,
> > > relinquish and it's the third request which is failing (likely
> > > timing out). Without the patch, the patch there's only one
> > > request,relinquish cycle because the ops are held while the async
> > > work is executed. I have a vague recollection that there is a
> > > problem with too many locality request in quick succession, but
> > > I'll defer to Jason, who I think understands the intricacies of
> > > localities better than I do.
> >
> > Thanks, I don't pretend to understand the nuances of this
> > particular
> > code, but I was hoping that the request to revert got some
> > attention
> > since Alex's kernel Bugzilla and message a few months ago to linux
> > integrity weren't.
> >
> > > If that's the problem, the solution looks simple enough: just
> > > move
> > > the ops get down because the priv state is already protected by
> > > the
> > > buffer mutex
> >
> > Yeah, if that works for Alex's situation it certainly sounds like a
> > better solution than reverting this patch as this patch actually
> > does
> > fix a problem reported by Jeffrin originally.
> >
> > Could you propose a specific patch that Alex and Jeffrin can
> > perhaps
> > both try?
>
> Um, what's wrong with the one I originally attached and which you
> quote
> below? It's only compile tested, but I think it will work, if the
> theory is correct.
>
> James
>
> > > James
> > >
> > > ---
> > >
> > > diff --git a/drivers/char/tpm/tpm-dev-common.c
> > > b/drivers/char/tpm/tpm-dev-
> > > common.c
> > > index 87f449340202..1784530b8387 100644
> > > --- a/drivers/char/tpm/tpm-dev-common.c
> > > +++ b/drivers/char/tpm/tpm-dev-common.c
> > > @@ -189,15 +189,6 @@ ssize_t tpm_common_write(struct file *file,
> > > const char
> > > __user *buf,
> > > goto out;
> > > }
> > >
> > > - /* atomic tpm command send and result receive. We only
> > > hold the ops
> > > - * lock during this period so that the tpm can be
> > > unregistered even if
> > > - * the char dev is held open.
> > > - */
> > > - if (tpm_try_get_ops(priv->chip)) {
> > > - ret = -EPIPE;
> > > - goto out;
> > > - }
> > > -
> > > priv->response_length = 0;
> > > priv->response_read = false;
> > > *off = 0;
> > > @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file,
> > > const char
> > > __user *buf,
> > > if (file->f_flags & O_NONBLOCK) {
> > > priv->command_enqueued = true;
> > > queue_work(tpm_dev_wq, &priv->async_work);
> > > - tpm_put_ops(priv->chip);
> > > mutex_unlock(&priv->buffer_mutex);
> > > return size;
> > > }
> > >
> > > + /* atomic tpm command send and result receive. We only
> > > hold the ops
> > > + * lock during this period so that the tpm can be
> > > unregistered even if
> > > + * the char dev is held open.
> > > + */
> > > + if (tpm_try_get_ops(priv->chip)) {
> > > + ret = -EPIPE;
> > > + goto out;
> > > + }
> > > +
> > > ret = tpm_dev_transmit(priv->chip, priv->space, priv-
> > > > data_buffer,
> > >
> > > sizeof(priv->data_buffer));
> > > tpm_put_ops(priv->chip);

When using your patch, I get a hang when trying to use tpm2_getcap, and
dmesg shows some info.



Attachments:
buglog.txt (3.67 kB)

2020-05-27 09:23:24

by Alex Guzman

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On Tue, 2020-05-26 at 14:33 -0700, Tadeusz Struk wrote:
> On 5/26/20 1:00 PM, James Bottomley wrote:
> > I don't think there is a root cause other than a TIS TPM is getting
> > annoyed by us cycling localities too rapidly because we don't do an
> > actual TPM operation between request and relinquish. Since the
> > first
> > request/relinquish seems unnecessary for the async case, moving the
> > ops
> > get eliminates the problem.
>
> Could be, so maybe we could try both patches.
> More debug info on the error path won't hurt.
> Thanks,
> Tadeusz

With your logging patch, I consistently see this message in dmesg when
tpm2_getcap fails:

tpm tpm0: request_locality: failed to request locality 0 after 750 ms

2020-05-27 09:23:49

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On Tue, 2020-05-26 at 15:19 -0700, Alex Guzman wrote:
[...]
> When using your patch, I get a hang when trying to use tpm2_getcap,
> and dmesg shows some info.

Are you sure it's all applied? This

> [ 570.913803] tpm_tcg_write_bytes+0x2f/0x40
> [ 570.913805] release_locality+0x49/0x220
> [ 570.913807] tpm_relinquish_locality+0x1f/0x40
> [ 570.913808] tpm_chip_stop+0x21/0x40
> [ 570.913810] tpm_put_ops+0x9/0x30
> [ 570.913811] tpm_common_write+0x179/0x190
> [ 570.913813] vfs_write+0xb1/0x1a0

Implies an unmatched tpm_put_ops() in the async write path, as though
this hunk:

> @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file,
> const char __user *buf,
> if (file->f_flags & O_NONBLOCK) {
> priv->command_enqueued = true;
> queue_work(tpm_dev_wq, &priv->async_work);
> - tpm_put_ops(priv->chip);
> mutex_unlock(&priv->buffer_mutex);
> return size;
> }

Is missing. I actually booted the patch in my TPM based VM and it all
seems to work OK when I execute tpm2_getcap (I verified it's using
O_NONBLOCK) and tssgetcapability in sync mode.

James

2020-05-27 20:14:03

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On Tue, 2020-05-26 at 12:38 -0700, James Bottomley wrote:
> On Tue, 2020-05-26 at 19:23 +0000, [email protected] wrote:
> > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote:
> > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897.
> > > >
> > > > This commit has caused regressions for the XPS 9560 containing
> > > > a Nuvoton TPM.
> > >
> > > Presumably this is using the tis driver?
> >
> > Correct.
> >
> > > > As mentioned by the reporter all TPM2 commands are failing with:
> > > > ERROR:tcti:src/tss2-tcti/tcti-
> > > > device.c:290:tcti_device_receive()
> > > > Failed to read response from fd 3, got errno 1: Operation not
> > > > permitted
> > > >
> > > > The reporter bisected this issue back to this commit which was
> > > > backported to stable as commit 4d6ebc4.
> > >
> > > I think the problem is request_locality ... for some inexplicable
> > > reason a failure there returns -1, which is EPERM to user space.
> > >
> > > That seems to be a bug in the async code since everything else
> > > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one
> > > assumes it gives back a sensible return code.
> > >
> > > What I think is happening is that with the patch the TPM goes
> > > through a quick sequence of request, relinquish, request,
> > > relinquish and it's the third request which is failing (likely
> > > timing out). Without the patch, the patch there's only one
> > > request,relinquish cycle because the ops are held while the async
> > > work is executed. I have a vague recollection that there is a
> > > problem with too many locality request in quick succession, but
> > > I'll defer to Jason, who I think understands the intricacies of
> > > localities better than I do.
> >
> > Thanks, I don't pretend to understand the nuances of this particular
> > code, but I was hoping that the request to revert got some attention
> > since Alex's kernel Bugzilla and message a few months ago to linux
> > integrity weren't.
> >
> > > If that's the problem, the solution looks simple enough: just move
> > > the ops get down because the priv state is already protected by the
> > > buffer mutex
> >
> > Yeah, if that works for Alex's situation it certainly sounds like a
> > better solution than reverting this patch as this patch actually does
> > fix a problem reported by Jeffrin originally.
> >
> > Could you propose a specific patch that Alex and Jeffrin can perhaps
> > both try?
>
> Um, what's wrong with the one I originally attached and which you quote
> below? It's only compile tested, but I think it will work, if the
> theory is correct.

Please send a legit patch, thanks.

/Jarkko

2020-05-27 20:23:33

by Mario Limonciello

[permalink] [raw]
Subject: RE: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

> -----Original Message-----
> From: Jarkko Sakkinen <[email protected]>
> Sent: Wednesday, May 27, 2020 3:09 PM
> To: James Bottomley; Limonciello, Mario; [email protected]; [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"
>
>
> [EXTERNAL EMAIL]
>
> On Tue, 2020-05-26 at 12:38 -0700, James Bottomley wrote:
> > On Tue, 2020-05-26 at 19:23 +0000, [email protected] wrote:
> > > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote:
> > > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897.
> > > > >
> > > > > This commit has caused regressions for the XPS 9560 containing
> > > > > a Nuvoton TPM.
> > > >
> > > > Presumably this is using the tis driver?
> > >
> > > Correct.
> > >
> > > > > As mentioned by the reporter all TPM2 commands are failing with:
> > > > > ERROR:tcti:src/tss2-tcti/tcti-
> > > > > device.c:290:tcti_device_receive()
> > > > > Failed to read response from fd 3, got errno 1: Operation not
> > > > > permitted
> > > > >
> > > > > The reporter bisected this issue back to this commit which was
> > > > > backported to stable as commit 4d6ebc4.
> > > >
> > > > I think the problem is request_locality ... for some inexplicable
> > > > reason a failure there returns -1, which is EPERM to user space.
> > > >
> > > > That seems to be a bug in the async code since everything else
> > > > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one
> > > > assumes it gives back a sensible return code.
> > > >
> > > > What I think is happening is that with the patch the TPM goes
> > > > through a quick sequence of request, relinquish, request,
> > > > relinquish and it's the third request which is failing (likely
> > > > timing out). Without the patch, the patch there's only one
> > > > request,relinquish cycle because the ops are held while the async
> > > > work is executed. I have a vague recollection that there is a
> > > > problem with too many locality request in quick succession, but
> > > > I'll defer to Jason, who I think understands the intricacies of
> > > > localities better than I do.
> > >
> > > Thanks, I don't pretend to understand the nuances of this particular
> > > code, but I was hoping that the request to revert got some attention
> > > since Alex's kernel Bugzilla and message a few months ago to linux
> > > integrity weren't.
> > >
> > > > If that's the problem, the solution looks simple enough: just move
> > > > the ops get down because the priv state is already protected by the
> > > > buffer mutex
> > >
> > > Yeah, if that works for Alex's situation it certainly sounds like a
> > > better solution than reverting this patch as this patch actually does
> > > fix a problem reported by Jeffrin originally.
> > >
> > > Could you propose a specific patch that Alex and Jeffrin can perhaps
> > > both try?
> >
> > Um, what's wrong with the one I originally attached and which you quote
> > below? It's only compile tested, but I think it will work, if the
> > theory is correct.
>
> Please send a legit patch, thanks.
>
> /Jarkko

Jarkko,

After the confirmation from Alex that this patch attached to the end of the thread
worked, James did send a proper patch that can be accessed here:
https://lore.kernel.org/linux-integrity/20200527155800.ya43xm2ltuwduwjg@cantor/T/#t

Thanks,

2020-05-28 00:32:58

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On Tue, May 26, 2020 at 12:39:37PM -0700, Tadeusz Struk wrote:
> On 5/26/20 12:14 PM, James Bottomley wrote:
> > + /* atomic tpm command send and result receive. We only hold the ops
> > + * lock during this period so that the tpm can be unregistered even if
> > + * the char dev is held open.
> > + */
> > + if (tpm_try_get_ops(priv->chip)) {
> > + ret = -EPIPE;
> > + goto out;
> > + }
> > +
> Hi James,
> This won't help if the message is read by an async tcti. If the problem lies
> in the chip get locality code, perhaps this could help to debug the root-cause
> instead of masking it out in the upper layer code:

What is TCTI and async TCTI? Not following.

> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 2435216bd10a..da5ecd0376bf 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -202,20 +202,22 @@ static int request_locality(struct tpm_chip *chip, int l)
> return rc;
>
> stop = jiffies + chip->timeout_a;
> + timeout = stop - jiffies;
>
> if (chip->flags & TPM_CHIP_FLAG_IRQ) {
> again:
> timeout = stop - jiffies;
> if ((long)timeout <= 0)
> - return -1;
> + goto out;
> +
> rc = wait_event_interruptible_timeout(priv->int_queue,
> - (check_locality
> - (chip, l)),
> + check_locality(chip, l),
> timeout);
> if (rc > 0)
> return l;
> if (rc == -ERESTARTSYS && freezing(current)) {
> clear_thread_flag(TIF_SIGPENDING);
> + timeout = stop - jiffies;
> goto again;
> }
> } else {
> @@ -226,6 +228,10 @@ static int request_locality(struct tpm_chip *chip, int l)
> tpm_msleep(TPM_TIMEOUT);
> } while (time_before(jiffies, stop));
> }
> +out:
> + dev_warn(&chip->dev, "%s: failed to request locality %d after %lu ms\n",
> + __func__, l, timeout * HZ / 1000);
> +
> return -1;
> }
>

/Jarkko

2020-05-28 00:46:15

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On Wed, May 27, 2020 at 08:18:56PM +0000, [email protected] wrote:
> > -----Original Message-----
> > From: Jarkko Sakkinen <[email protected]>
> > Sent: Wednesday, May 27, 2020 3:09 PM
> > To: James Bottomley; Limonciello, Mario; [email protected]; [email protected]
> > Cc: [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"
> >
> >
> > [EXTERNAL EMAIL]

What is this?

> > On Tue, 2020-05-26 at 12:38 -0700, James Bottomley wrote:
> > > On Tue, 2020-05-26 at 19:23 +0000, [email protected] wrote:
> > > > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote:
> > > > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897.
> > > > > >
> > > > > > This commit has caused regressions for the XPS 9560 containing
> > > > > > a Nuvoton TPM.
> > > > >
> > > > > Presumably this is using the tis driver?
> > > >
> > > > Correct.
> > > >
> > > > > > As mentioned by the reporter all TPM2 commands are failing with:
> > > > > > ERROR:tcti:src/tss2-tcti/tcti-
> > > > > > device.c:290:tcti_device_receive()
> > > > > > Failed to read response from fd 3, got errno 1: Operation not
> > > > > > permitted
> > > > > >
> > > > > > The reporter bisected this issue back to this commit which was
> > > > > > backported to stable as commit 4d6ebc4.
> > > > >
> > > > > I think the problem is request_locality ... for some inexplicable
> > > > > reason a failure there returns -1, which is EPERM to user space.
> > > > >
> > > > > That seems to be a bug in the async code since everything else
> > > > > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one
> > > > > assumes it gives back a sensible return code.
> > > > >
> > > > > What I think is happening is that with the patch the TPM goes
> > > > > through a quick sequence of request, relinquish, request,
> > > > > relinquish and it's the third request which is failing (likely
> > > > > timing out). Without the patch, the patch there's only one
> > > > > request,relinquish cycle because the ops are held while the async
> > > > > work is executed. I have a vague recollection that there is a
> > > > > problem with too many locality request in quick succession, but
> > > > > I'll defer to Jason, who I think understands the intricacies of
> > > > > localities better than I do.
> > > >
> > > > Thanks, I don't pretend to understand the nuances of this particular
> > > > code, but I was hoping that the request to revert got some attention
> > > > since Alex's kernel Bugzilla and message a few months ago to linux
> > > > integrity weren't.
> > > >
> > > > > If that's the problem, the solution looks simple enough: just move
> > > > > the ops get down because the priv state is already protected by the
> > > > > buffer mutex
> > > >
> > > > Yeah, if that works for Alex's situation it certainly sounds like a
> > > > better solution than reverting this patch as this patch actually does
> > > > fix a problem reported by Jeffrin originally.
> > > >
> > > > Could you propose a specific patch that Alex and Jeffrin can perhaps
> > > > both try?
> > >
> > > Um, what's wrong with the one I originally attached and which you quote
> > > below? It's only compile tested, but I think it will work, if the
> > > theory is correct.
> >
> > Please send a legit patch, thanks.
> >
> > /Jarkko
>
> Jarkko,
>
> After the confirmation from Alex that this patch attached to the end of the thread
> worked, James did send a proper patch that can be accessed here:
> https://lore.kernel.org/linux-integrity/20200527155800.ya43xm2ltuwduwjg@cantor/T/#t
>
> Thanks,

Hi thanks a lot! I did read the full discussions and agree with the
conclusions as I get a patch in proper form.

Please ping next time a bit earlier. It's not that I don't want to deal
with the issues quickly as possible. It's probably just that I've forgot
something or missed.

/Jarkko

2020-05-28 01:02:36

by Mario Limonciello

[permalink] [raw]
Subject: RE: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

> > > [EXTERNAL EMAIL]
>
> What is this?

Something my employer's mail system automatically tags in external email.

My mistakes in forgetting to remove it on the response.

>
> > > On Tue, 2020-05-26 at 12:38 -0700, James Bottomley wrote:
> > > > On Tue, 2020-05-26 at 19:23 +0000, [email protected] wrote:
> > > > > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote:
> > > > > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897.
> > > > > > >
> > > > > > > This commit has caused regressions for the XPS 9560 containing
> > > > > > > a Nuvoton TPM.
> > > > > >
> > > > > > Presumably this is using the tis driver?
> > > > >
> > > > > Correct.
> > > > >
> > > > > > > As mentioned by the reporter all TPM2 commands are failing with:
> > > > > > > ERROR:tcti:src/tss2-tcti/tcti-
> > > > > > > device.c:290:tcti_device_receive()
> > > > > > > Failed to read response from fd 3, got errno 1: Operation not
> > > > > > > permitted
> > > > > > >
> > > > > > > The reporter bisected this issue back to this commit which was
> > > > > > > backported to stable as commit 4d6ebc4.
> > > > > >
> > > > > > I think the problem is request_locality ... for some inexplicable
> > > > > > reason a failure there returns -1, which is EPERM to user space.
> > > > > >
> > > > > > That seems to be a bug in the async code since everything else
> > > > > > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one
> > > > > > assumes it gives back a sensible return code.
> > > > > >
> > > > > > What I think is happening is that with the patch the TPM goes
> > > > > > through a quick sequence of request, relinquish, request,
> > > > > > relinquish and it's the third request which is failing (likely
> > > > > > timing out). Without the patch, the patch there's only one
> > > > > > request,relinquish cycle because the ops are held while the async
> > > > > > work is executed. I have a vague recollection that there is a
> > > > > > problem with too many locality request in quick succession, but
> > > > > > I'll defer to Jason, who I think understands the intricacies of
> > > > > > localities better than I do.
> > > > >
> > > > > Thanks, I don't pretend to understand the nuances of this particular
> > > > > code, but I was hoping that the request to revert got some attention
> > > > > since Alex's kernel Bugzilla and message a few months ago to linux
> > > > > integrity weren't.
> > > > >
> > > > > > If that's the problem, the solution looks simple enough: just move
> > > > > > the ops get down because the priv state is already protected by the
> > > > > > buffer mutex
> > > > >
> > > > > Yeah, if that works for Alex's situation it certainly sounds like a
> > > > > better solution than reverting this patch as this patch actually does
> > > > > fix a problem reported by Jeffrin originally.
> > > > >
> > > > > Could you propose a specific patch that Alex and Jeffrin can perhaps
> > > > > both try?
> > > >
> > > > Um, what's wrong with the one I originally attached and which you quote
> > > > below? It's only compile tested, but I think it will work, if the
> > > > theory is correct.
> > >
> > > Please send a legit patch, thanks.
> > >
> > > /Jarkko
> >
> > Jarkko,
> >
> > After the confirmation from Alex that this patch attached to the end of the
> thread
> > worked, James did send a proper patch that can be accessed here:
> > https://lore.kernel.org/linux-
> integrity/20200527155800.ya43xm2ltuwduwjg@cantor/T/#t
> >
> > Thanks,
>
> Hi thanks a lot! I did read the full discussions and agree with the
> conclusions as I get a patch in proper form.
>
> Please ping next time a bit earlier. It's not that I don't want to deal
> with the issues quickly as possible. It's probably just that I've forgot
> something or missed.
>
> /Jarkko

Thanks!

I completely forgot about it too, it was mentioned to me right after holidays
and I forgot to follow up and see that it got sorted.

2020-05-28 04:44:31

by Tadeusz Struk

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On 5/27/20 5:30 PM, Jarkko Sakkinen wrote:
>> This won't help if the message is read by an async tcti. If the problem lies
>> in the chip get locality code, perhaps this could help to debug the root-cause
>> instead of masking it out in the upper layer code:
> What is TCTI and async TCTI? Not following.

TPM Command Transmission Interface (TCTI) as defined by TCG in
https://trustedcomputinggroup.org/resource/tss-tcti-specification/

the reason we added the O_NONBLOCK mode was to satisfy the TCG spec for async TCTI.

Thanks,
Tadeusz

2020-05-28 06:55:45

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On Thu, May 28, 2020 at 12:59:59AM +0000, [email protected] wrote:
> > > > [EXTERNAL EMAIL]
> >
> > What is this?
>
> Something my employer's mail system automatically tags in external email.
>
> My mistakes in forgetting to remove it on the response.

NP, just asking :-)

> > > > On Tue, 2020-05-26 at 12:38 -0700, James Bottomley wrote:
> > > > > On Tue, 2020-05-26 at 19:23 +0000, [email protected] wrote:
> > > > > > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote:
> > > > > > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897.
> > > > > > > >
> > > > > > > > This commit has caused regressions for the XPS 9560 containing
> > > > > > > > a Nuvoton TPM.
> > > > > > >
> > > > > > > Presumably this is using the tis driver?
> > > > > >
> > > > > > Correct.
> > > > > >
> > > > > > > > As mentioned by the reporter all TPM2 commands are failing with:
> > > > > > > > ERROR:tcti:src/tss2-tcti/tcti-
> > > > > > > > device.c:290:tcti_device_receive()
> > > > > > > > Failed to read response from fd 3, got errno 1: Operation not
> > > > > > > > permitted
> > > > > > > >
> > > > > > > > The reporter bisected this issue back to this commit which was
> > > > > > > > backported to stable as commit 4d6ebc4.
> > > > > > >
> > > > > > > I think the problem is request_locality ... for some inexplicable
> > > > > > > reason a failure there returns -1, which is EPERM to user space.
> > > > > > >
> > > > > > > That seems to be a bug in the async code since everything else
> > > > > > > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one
> > > > > > > assumes it gives back a sensible return code.
> > > > > > >
> > > > > > > What I think is happening is that with the patch the TPM goes
> > > > > > > through a quick sequence of request, relinquish, request,
> > > > > > > relinquish and it's the third request which is failing (likely
> > > > > > > timing out). Without the patch, the patch there's only one
> > > > > > > request,relinquish cycle because the ops are held while the async
> > > > > > > work is executed. I have a vague recollection that there is a
> > > > > > > problem with too many locality request in quick succession, but
> > > > > > > I'll defer to Jason, who I think understands the intricacies of
> > > > > > > localities better than I do.
> > > > > >
> > > > > > Thanks, I don't pretend to understand the nuances of this particular
> > > > > > code, but I was hoping that the request to revert got some attention
> > > > > > since Alex's kernel Bugzilla and message a few months ago to linux
> > > > > > integrity weren't.
> > > > > >
> > > > > > > If that's the problem, the solution looks simple enough: just move
> > > > > > > the ops get down because the priv state is already protected by the
> > > > > > > buffer mutex
> > > > > >
> > > > > > Yeah, if that works for Alex's situation it certainly sounds like a
> > > > > > better solution than reverting this patch as this patch actually does
> > > > > > fix a problem reported by Jeffrin originally.
> > > > > >
> > > > > > Could you propose a specific patch that Alex and Jeffrin can perhaps
> > > > > > both try?
> > > > >
> > > > > Um, what's wrong with the one I originally attached and which you quote
> > > > > below? It's only compile tested, but I think it will work, if the
> > > > > theory is correct.
> > > >
> > > > Please send a legit patch, thanks.
> > > >
> > > > /Jarkko
> > >
> > > Jarkko,
> > >
> > > After the confirmation from Alex that this patch attached to the end of the
> > thread
> > > worked, James did send a proper patch that can be accessed here:
> > > https://lore.kernel.org/linux-
> > integrity/20200527155800.ya43xm2ltuwduwjg@cantor/T/#t
> > >
> > > Thanks,
> >
> > Hi thanks a lot! I did read the full discussions and agree with the
> > conclusions as I get a patch in proper form.
> >
> > Please ping next time a bit earlier. It's not that I don't want to deal
> > with the issues quickly as possible. It's probably just that I've forgot
> > something or missed.
> >
> > /Jarkko
>
> Thanks!
>
> I completely forgot about it too, it was mentioned to me right after holidays
> and I forgot to follow up and see that it got sorted.

Yeah, sure, lets try to get a fix landed asap :-)

/Jarkko

2020-05-28 06:59:43

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

On Wed, May 27, 2020 at 09:40:25PM -0700, Tadeusz Struk wrote:
> On 5/27/20 5:30 PM, Jarkko Sakkinen wrote:
> >> This won't help if the message is read by an async tcti. If the problem lies
> >> in the chip get locality code, perhaps this could help to debug the root-cause
> >> instead of masking it out in the upper layer code:
> > What is TCTI and async TCTI? Not following.
>
> TPM Command Transmission Interface (TCTI) as defined by TCG in
> https://trustedcomputinggroup.org/resource/tss-tcti-specification/
>
> the reason we added the O_NONBLOCK mode was to satisfy the TCG spec for async TCTI.
>
> Thanks,
> Tadeusz

OK, thanks recalling.

/Jarkko