2021-02-05 01:56:36

by Lino Sanfilippo

Subject: [PATCH v3 0/2] TPM fixes

Changes in v3:
- drop the patch that introduces the new function tpm_chip_free()
- rework the commit messages for the patches (style, typos, etc.)
- add fixes tag to patch 2
- add James Bottomley to cc list
- add stable mailing list to cc list

Changes in v2:
- drop the patch that erroneously cleaned up after failed installation of
an action handler in tpmm_chip_alloc() (pointed out by Jarkko Sakkinen)
- make the commit message for patch 1 more detailed
- add fixes tags and kernel logs

Lino Sanfilippo (2):
tpm: fix reference counting for struct tpm_chip
tpm: in tpm2_del_space check if ops pointer is still valid

drivers/char/tpm/tpm-chip.c | 18 +++++++++++++++---
drivers/char/tpm/tpm2-space.c | 15 ++++++++++-----
drivers/char/tpm/tpm_ftpm_tee.c | 2 ++
drivers/char/tpm/tpm_vtpm_proxy.c | 1 +
4 files changed, 28 insertions(+), 8 deletions(-)

--
2.7.4


2021-02-05 02:36:56

by Lino Sanfilippo

Subject: [PATCH v3 1/2] tpm: fix reference counting for struct tpm_chip

From: Lino Sanfilippo <[email protected]>

The following sequence of operations results in a refcount warning:

1. Open device /dev/tpmrm
2. Remove module tpm_tis_spi
3. Write a TPM command to the file descriptor opened at step 1.

------------[ cut here ]------------
WARNING: CPU: 3 PID: 1161 at lib/refcount.c:25 kobject_get+0xa0/0xa4
refcount_t: addition on 0; use-after-free.
Modules linked in: tpm_tis_spi tpm_tis_core tpm mdio_bcm_unimac brcmfmac
sha256_generic libsha256 sha256_arm hci_uart btbcm bluetooth cfg80211 vc4
brcmutil ecdh_generic ecc snd_soc_core crc32_arm_ce libaes
raspberrypi_hwmon ac97_bus snd_pcm_dmaengine bcm2711_thermal snd_pcm
snd_timer genet snd phy_generic soundcore [last unloaded: spi_bcm2835]
CPU: 3 PID: 1161 Comm: hold_open Not tainted 5.10.0ls-main-dirty #2
Hardware name: BCM2711
[<c0410c3c>] (unwind_backtrace) from [<c040b580>] (show_stack+0x10/0x14)
[<c040b580>] (show_stack) from [<c1092174>] (dump_stack+0xc4/0xd8)
[<c1092174>] (dump_stack) from [<c0445a30>] (__warn+0x104/0x108)
[<c0445a30>] (__warn) from [<c0445aa8>] (warn_slowpath_fmt+0x74/0xb8)
[<c0445aa8>] (warn_slowpath_fmt) from [<c08435d0>] (kobject_get+0xa0/0xa4)
[<c08435d0>] (kobject_get) from [<bf0a715c>] (tpm_try_get_ops+0x14/0x54 [tpm])
[<bf0a715c>] (tpm_try_get_ops [tpm]) from [<bf0a7d6c>] (tpm_common_write+0x38/0x60 [tpm])
[<bf0a7d6c>] (tpm_common_write [tpm]) from [<c05a7ac0>] (vfs_write+0xc4/0x3c0)
[<c05a7ac0>] (vfs_write) from [<c05a7ee4>] (ksys_write+0x58/0xcc)
[<c05a7ee4>] (ksys_write) from [<c04001a0>] (ret_fast_syscall+0x0/0x4c)
Exception stack(0xc226bfa8 to 0xc226bff0)
bfa0: 00000000 000105b4 00000003 beafe664 00000014 00000000
bfc0: 00000000 000105b4 000103f8 00000004 00000000 00000000 b6f9c000 beafe684
bfe0: 0000006c beafe648 0001056c b6eb6944
---[ end trace d4b8409def9b8b1f ]---
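The three steps listed before the log can be driven from user space with a small helper. The following is only a sketch: the device node /dev/tpmrm0, the choice of TPM2_GetRandom as the command, and the names build_get_random() and hold_open_then_write() are assumptions made for illustration and are not part of the patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Fill buf with a TPM2_GetRandom command; returns its length or 0. */
static size_t build_get_random(uint8_t *buf, size_t len, uint16_t nbytes)
{
	static const uint8_t hdr[10] = {
		0x80, 0x01,             /* tag: TPM_ST_NO_SESSIONS */
		0x00, 0x00, 0x00, 0x0c, /* total command size: 12 bytes */
		0x00, 0x00, 0x01, 0x7b, /* command code: TPM_CC_GetRandom */
	};

	if (len < sizeof(hdr) + 2)
		return 0;
	memcpy(buf, hdr, sizeof(hdr));
	buf[10] = (uint8_t)(nbytes >> 8);   /* bytesRequested, big endian */
	buf[11] = (uint8_t)(nbytes & 0xff);
	return sizeof(hdr) + 2;
}

/* Step 1: open. Step 2: wait while the module is removed. Step 3: write. */
static int hold_open_then_write(const char *path, unsigned int delay_s)
{
	uint8_t cmd[12];
	size_t len = build_get_random(cmd, sizeof(cmd), 8);
	ssize_t n;
	int fd;

	assert(len == sizeof(cmd));

	fd = open(path, O_RDWR);
	if (fd < 0) {
		perror("open");
		return -1;
	}
	fprintf(stderr, "opened %s, remove the driver module now\n", path);
	sleep(delay_s);

	n = write(fd, cmd, len);
	if (n < 0)
		perror("write");
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}
```

Called from main() as hold_open_then_write("/dev/tpmrm0", 30), this leaves a 30 second window in which to run "rmmod tpm_tis_spi" from another shell.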

The reason for this warning is the attempt to get the chip->dev reference
in tpm_common_write() although the reference counter is already zero.

Since commit 8979b02aaf1d ("tpm: Fix reference count to main device") the
extra reference used to prevent a premature zero counter is never taken,
because the required TPM_CHIP_FLAG_TPM2 flag is never set.

Fix this by removing the flag condition.

Commit fdc915f7f719 ("tpm: expose spaces via a device link /dev/tpmrm<n>")
already introduced function tpm_devs_release() to release the extra
reference but did not implement the required put on chip->devs that results
in the call of this function.

Fix this also by installing an action handler that puts chip->devs as soon
as the chip is unregistered.

Fixes: fdc915f7f719 ("tpm: expose spaces via a device link /dev/tpmrm<n>")
Fixes: 8979b02aaf1d ("tpm: Fix reference count to main device")
Signed-off-by: Lino Sanfilippo <[email protected]>
---
drivers/char/tpm/tpm-chip.c | 18 +++++++++++++++---
drivers/char/tpm/tpm_ftpm_tee.c | 2 ++
drivers/char/tpm/tpm_vtpm_proxy.c | 1 +
3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index ddaeceb..3ace199 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -360,8 +360,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
* while cdevs is in use. The corresponding put
* is in the tpm_devs_release (TPM2 only)
*/
- if (chip->flags & TPM_CHIP_FLAG_TPM2)
- get_device(&chip->dev);
+ get_device(&chip->dev);

if (chip->dev_num == 0)
chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);
@@ -422,8 +421,21 @@ struct tpm_chip *tpmm_chip_alloc(struct device *pdev,
rc = devm_add_action_or_reset(pdev,
(void (*)(void *)) put_device,
&chip->dev);
- if (rc)
+ if (rc) {
+ put_device(&chip->devs);
return ERR_PTR(rc);
+ }
+
+ rc = devm_add_action_or_reset(pdev,
+ (void (*)(void *)) put_device,
+ &chip->devs);
+ if (rc) {
+ devm_remove_action(pdev,
+ (void (*)(void *)) put_device,
+ &chip->dev);
+ put_device(&chip->dev);
+ return ERR_PTR(rc);
+ }

dev_set_drvdata(pdev, chip);

diff --git a/drivers/char/tpm/tpm_ftpm_tee.c b/drivers/char/tpm/tpm_ftpm_tee.c
index 2ccdf8a..82858c2 100644
--- a/drivers/char/tpm/tpm_ftpm_tee.c
+++ b/drivers/char/tpm/tpm_ftpm_tee.c
@@ -286,6 +286,7 @@ static int ftpm_tee_probe(struct device *dev)

out_chip:
put_device(&pvt_data->chip->dev);
+ put_device(&pvt_data->chip->devs);
out_chip_alloc:
tee_shm_free(pvt_data->shm);
out_shm_alloc:
@@ -318,6 +319,7 @@ static int ftpm_tee_remove(struct device *dev)
tpm_chip_unregister(pvt_data->chip);

/* frees chip */
+ put_device(&pvt_data->chip->devs);
put_device(&pvt_data->chip->dev);

/* Free the shared memory pool */
diff --git a/drivers/char/tpm/tpm_vtpm_proxy.c b/drivers/char/tpm/tpm_vtpm_proxy.c
index 91c772e3..97b60f8 100644
--- a/drivers/char/tpm/tpm_vtpm_proxy.c
+++ b/drivers/char/tpm/tpm_vtpm_proxy.c
@@ -520,6 +520,7 @@ static struct proxy_dev *vtpm_proxy_create_proxy_dev(void)
*/
static inline void vtpm_proxy_delete_proxy_dev(struct proxy_dev *proxy_dev)
{
+ put_device(&proxy_dev->chip->devs);
put_device(&proxy_dev->chip->dev); /* frees chip */
kfree(proxy_dev);
}
--
2.7.4

2021-02-05 06:53:15

by Greg KH

Subject: Re: [PATCH v3 1/2] tpm: fix reference counting for struct tpm_chip

On Fri, Feb 05, 2021 at 12:50:42AM +0100, Lino Sanfilippo wrote:
> From: Lino Sanfilippo <[email protected]>
>
> The following sequence of operations results in a refcount warning:
>
> 1. Open device /dev/tpmrm
> 2. Remove module tpm_tis_spi
> 3. Write a TPM command to the file descriptor opened at step 1.
>
> ------------[ cut here ]------------
> WARNING: CPU: 3 PID: 1161 at lib/refcount.c:25 kobject_get+0xa0/0xa4
> refcount_t: addition on 0; use-after-free.
> Modules linked in: tpm_tis_spi tpm_tis_core tpm mdio_bcm_unimac brcmfmac
> sha256_generic libsha256 sha256_arm hci_uart btbcm bluetooth cfg80211 vc4
> brcmutil ecdh_generic ecc snd_soc_core crc32_arm_ce libaes
> raspberrypi_hwmon ac97_bus snd_pcm_dmaengine bcm2711_thermal snd_pcm
> snd_timer genet snd phy_generic soundcore [last unloaded: spi_bcm2835]
> CPU: 3 PID: 1161 Comm: hold_open Not tainted 5.10.0ls-main-dirty #2
> Hardware name: BCM2711
> [<c0410c3c>] (unwind_backtrace) from [<c040b580>] (show_stack+0x10/0x14)
> [<c040b580>] (show_stack) from [<c1092174>] (dump_stack+0xc4/0xd8)
> [<c1092174>] (dump_stack) from [<c0445a30>] (__warn+0x104/0x108)
> [<c0445a30>] (__warn) from [<c0445aa8>] (warn_slowpath_fmt+0x74/0xb8)
> [<c0445aa8>] (warn_slowpath_fmt) from [<c08435d0>] (kobject_get+0xa0/0xa4)
> [<c08435d0>] (kobject_get) from [<bf0a715c>] (tpm_try_get_ops+0x14/0x54 [tpm])
> [<bf0a715c>] (tpm_try_get_ops [tpm]) from [<bf0a7d6c>] (tpm_common_write+0x38/0x60 [tpm])
> [<bf0a7d6c>] (tpm_common_write [tpm]) from [<c05a7ac0>] (vfs_write+0xc4/0x3c0)
> [<c05a7ac0>] (vfs_write) from [<c05a7ee4>] (ksys_write+0x58/0xcc)
> [<c05a7ee4>] (ksys_write) from [<c04001a0>] (ret_fast_syscall+0x0/0x4c)
> Exception stack(0xc226bfa8 to 0xc226bff0)
> bfa0: 00000000 000105b4 00000003 beafe664 00000014 00000000
> bfc0: 00000000 000105b4 000103f8 00000004 00000000 00000000 b6f9c000 beafe684
> bfe0: 0000006c beafe648 0001056c b6eb6944
> ---[ end trace d4b8409def9b8b1f ]---
>
> The reason for this warning is the attempt to get the chip->dev reference
> in tpm_common_write() although the reference counter is already zero.
>
> Since commit 8979b02aaf1d ("tpm: Fix reference count to main device") the
> extra reference used to prevent a premature zero counter is never taken,
> because the required TPM_CHIP_FLAG_TPM2 flag is never set.
>
> Fix this by removing the flag condition.
>
> Commit fdc915f7f719 ("tpm: expose spaces via a device link /dev/tpmrm<n>")
> already introduced function tpm_devs_release() to release the extra
> reference but did not implement the required put on chip->devs that results
> in the call of this function.
>
> Fix this also by installing an action handler that puts chip->devs as soon
> as the chip is unregistered.
>
> Fixes: fdc915f7f719 ("tpm: expose spaces via a device link /dev/tpmrm<n>")
> Fixes: 8979b02aaf1d ("tpm: Fix reference count to main device")
> Signed-off-by: Lino Sanfilippo <[email protected]>
> ---
> drivers/char/tpm/tpm-chip.c | 18 +++++++++++++++---
> drivers/char/tpm/tpm_ftpm_tee.c | 2 ++
> drivers/char/tpm/tpm_vtpm_proxy.c | 1 +
> 3 files changed, 18 insertions(+), 3 deletions(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

2021-02-05 16:16:01

by Lino Sanfilippo

Subject: Re: [PATCH v3 1/2] tpm: fix reference counting for struct tpm_chip


On 05.02.21 16:15, Jason Gunthorpe wrote:
>
> No, the cdev layer holds the refcount on the device while open is
> being called.
>
> Jason
>

Yes, but the reference that is responsible for the chip deallocation is chip->dev
which is linked to chip->cdev and represents /dev/tpm, not /dev/tpmrm.
You are right, we don't have the issue with /dev/tpm for the reason you mentioned.
But /dev/tpmrm is represented by chip->cdevs and keeping this ref held by the cdev
layer won't protect us from the chip being freed (which is the reason why we need
the chip->dev reference in the first place).

And yes, the naming dev/devs/cdev/cdevs is quite confusing :(

Regards,
Lino

2021-02-05 21:53:46

by Lino Sanfilippo

Subject: Re: [PATCH v3 1/2] tpm: fix reference counting for struct tpm_chip

On 05.02.21 at 16:58, Jason Gunthorpe wrote:
>> the chip->dev reference in the first place).
>
> No, they are all chained together because they are all in the same
> struct:
>
> struct tpm_chip {
> struct device dev;
> struct device devs;
> struct cdev cdev;
> struct cdev cdevs;
>
> dev holds the refcount on memory, when it goes 0 the whole thing is
> kfreed.
>
> The rule is dev's refcount can't go to zero while any other refcount
> is != 0.
>
> For instance devs holds a get on dev that is put back only when devs
> goes to 0:
>
> static void tpm_devs_release(struct device *dev)
> {
> struct tpm_chip *chip = container_of(dev, struct tpm_chip, devs);
>
> /* release the master device reference */
> put_device(&chip->dev);
> }
>
> Both cdev elements do something similar inside the cdev layer.

Well, this chaining is exactly what does not work at the moment and what the patch is supposed
to fix: currently we never take the extra ref (not even in the TPM 2 case; note that
TPM_CHIP_FLAG_TPM2 is never set), so

- if (chip->flags & TPM_CHIP_FLAG_TPM2)
- get_device(&chip->dev);
+ get_device(&chip->dev);


and tpm_devs_release() is never called, since there is nothing that ever puts devs, so


+ rc = devm_add_action_or_reset(pdev,
+ (void (*)(void *)) put_device,
+ &chip->devs);


The race with only get_device()/put_device() in tpm_common_open()/tpm_common_release() is:

1. tpm chip is allocated with dev refcount = 1, devs refcount = 1
2. /dev/tpmrm is opened, but before we get the ref to dev in tpm_common_open() another thread
rmmods the chip driver
3. the chip is unregistered, dev is put with refcount = 0 and the whole chip struct is freed
4. Now open() proceeds, tries to grab the extra ref chip->dev from a chip that has already
been deallocated and the system crashes.

As I already wrote, that approach was my first thought, too, but since the result crashed due to the
race condition, I chose the approach in patch 1.

Regards,
Lino

> The net result is during any open() the tpm_chip is guaranteed to have
> a positive refcount.
>


2021-02-05 22:06:40

by Jason Gunthorpe

Subject: Re: [PATCH v3 1/2] tpm: fix reference counting for struct tpm_chip

On Fri, Feb 05, 2021 at 03:55:09PM +0100, Lino Sanfilippo wrote:
> Hi,
>
> On 05.02.21 14:05, Jason Gunthorpe wrote:
>
> >>
> >> Commit fdc915f7f719 ("tpm: expose spaces via a device link /dev/tpmrm<n>")
> >> already introduced function tpm_devs_release() to release the extra
> >> reference but did not implement the required put on chip->devs that results
> >> in the call of this function.
> >
> > Seems wonky, the devs is just supposed to be a side thing, nothing
> > should be using it as a primary reference count for a tpm.
> >
> > The bug here is only that tpm_common_open() did not get a kref on the
> > chip before putting it in priv and linking it to the fd. See the
> > comment before tpm_try_get_ops() indicating the caller must already
> > have taken care to ensure the chip is valid.
> >
> > This should be all you need to fix the oops:
> >
> > diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c
> > index 1784530b8387bb..1b738dca7fffb5 100644
> > --- a/drivers/char/tpm/tpm-dev-common.c
> > +++ b/drivers/char/tpm/tpm-dev-common.c
> > @@ -105,6 +105,7 @@ static void tpm_timeout_work(struct work_struct *work)
> > void tpm_common_open(struct file *file, struct tpm_chip *chip,
> > struct file_priv *priv, struct tpm_space *space)
> > {
> > + get_device(&chip->dev);
> > priv->chip = chip;
> > priv->space = space;
> > priv->response_read = true;
>
> This is racy, isn't it? Between the time we open the file and the time we grab the
> reference in tpm_common_open(), the chip can already be unregistered and freed.

No, the cdev layer holds the refcount on the device while open is
being called.

Jason

2021-02-05 22:43:11

by Lino Sanfilippo

Subject: Re: [PATCH v3 1/2] tpm: fix reference counting for struct tpm_chip

Hi,

On 05.02.21 14:05, Jason Gunthorpe wrote:

>>
>> Commit fdc915f7f719 ("tpm: expose spaces via a device link /dev/tpmrm<n>")
>> already introduced function tpm_devs_release() to release the extra
>> reference but did not implement the required put on chip->devs that results
>> in the call of this function.
>
> Seems wonky, the devs is just supposed to be a side thing, nothing
> should be using it as a primary reference count for a tpm.
>
> The bug here is only that tpm_common_open() did not get a kref on the
> chip before putting it in priv and linking it to the fd. See the
> comment before tpm_try_get_ops() indicating the caller must already
> have taken care to ensure the chip is valid.
>
> This should be all you need to fix the oops:
>
> diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c
> index 1784530b8387bb..1b738dca7fffb5 100644
> --- a/drivers/char/tpm/tpm-dev-common.c
> +++ b/drivers/char/tpm/tpm-dev-common.c
> @@ -105,6 +105,7 @@ static void tpm_timeout_work(struct work_struct *work)
> void tpm_common_open(struct file *file, struct tpm_chip *chip,
> struct file_priv *priv, struct tpm_space *space)
> {
> + get_device(&chip->dev);
> priv->chip = chip;
> priv->space = space;
> priv->response_read = true;

This is racy, isn't it? Between the time we open the file and the time we grab the
reference in tpm_common_open(), the chip can already be unregistered and freed.

As a matter of fact, this solution was the first thing that came to my mind, too,
until I noticed the possible race condition. I can only guess that this was what
James had in mind when he chose to take the extra reference to chip->dev in
tpm_chip_alloc() instead of tpm_common_open().


>> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
>> index ddaeceb..3ace199 100644
>> --- a/drivers/char/tpm/tpm-chip.c
>> +++ b/drivers/char/tpm/tpm-chip.c
>> @@ -360,8 +360,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
>> * while cdevs is in use. The corresponding put
>> * is in the tpm_devs_release (TPM2 only)
>> */
>> - if (chip->flags & TPM_CHIP_FLAG_TPM2)
>> - get_device(&chip->dev);
>> + get_device(&chip->dev);
>>
>> if (chip->dev_num == 0)
>> chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);
>> @@ -422,8 +421,21 @@ struct tpm_chip *tpmm_chip_alloc(struct device *pdev,
>> rc = devm_add_action_or_reset(pdev,
>> (void (*)(void *)) put_device,
>> &chip->dev);
>> - if (rc)
>> + if (rc) {
>> + put_device(&chip->devs);
>> return ERR_PTR(rc);
>
> This isn't right, read what 'or_reset' does
>

If installing the action handler fails, devm_add_action_or_reset() puts
chip->dev for us. But we also have to put chip->devs, since we hold a
reference to both chip->dev and chip->devs. Or am I missing something here?
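To make the error path concrete, here is a user-space model of it. This is a sketch and not kernel code: plain counters stand in for the struct device refcounts, add_action() stands in for devm_add_action(), and the names chip_model and alloc_first_action_fails() are made up for this illustration. The only kernel behaviour it relies on is that devm_add_action_or_reset() runs the action itself when registering it fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct chip_model {
	int dev_ref;   /* models the refcount of chip->dev  */
	int devs_ref;  /* models the refcount of chip->devs */
	bool freed;    /* set when dev_ref drops to zero */
};

static void put_dev(struct chip_model *c)
{
	if (--c->dev_ref == 0)
		c->freed = true;
}

/* Like tpm_devs_release(): the last put on devs drops one ref on dev. */
static void put_devs(struct chip_model *c)
{
	if (--c->devs_ref == 0)
		put_dev(c);
}

/* Stand-in for devm_add_action(); 'fail' simulates an allocation failure. */
static int add_action(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/* Same contract as devm_add_action_or_reset(): run the action on failure. */
static int add_action_or_reset(bool fail, void (*action)(struct chip_model *),
			       struct chip_model *c)
{
	int rc = add_action(fail);

	if (rc)
		action(c);
	return rc;
}

/* First error path of tpmm_chip_alloc() as changed by the patch. */
static int alloc_first_action_fails(struct chip_model *c, bool with_devs_put)
{
	int rc;

	c->dev_ref = 2;   /* initial ref plus the extra ref taken at alloc */
	c->devs_ref = 1;
	c->freed = false;

	rc = add_action_or_reset(true, put_dev, c);
	assert(rc == -ENOMEM);
	if (with_devs_put)
		put_devs(c);  /* the put added by the patch */
	return rc;
}
```

In this model, leaving out the put on devs ends with dev_ref == 1 and the chip never freed, while including it brings both counters to zero.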

> Jason
>

Regards,
Lino

2021-02-05 23:45:24

by Jason Gunthorpe

Subject: Re: [PATCH v3 1/2] tpm: fix reference counting for struct tpm_chip

On Fri, Feb 05, 2021 at 04:50:13PM +0100, Lino Sanfilippo wrote:
>
> On 05.02.21 16:15, Jason Gunthorpe wrote:
> >
> > No, the cdev layer holds the refcount on the device while open is
> > being called.
> >
> Yes, but the reference that is responsible for the chip deallocation is chip->dev
> which is linked to chip->cdev and represents /dev/tpm, not /dev/tpmrm.
> You are right, we don't have the issue with /dev/tpm for the reason you mentioned.
> But /dev/tpmrm is represented by chip->cdevs and keeping this ref held by the cdev
> layer won't protect us from the chip being freed (which is the reason why we need
> the chip->dev reference in the first place).

No, they are all chained together because they are all in the same
struct:

struct tpm_chip {
struct device dev;
struct device devs;
struct cdev cdev;
struct cdev cdevs;

dev holds the refcount on memory, when it goes 0 the whole thing is
kfreed.

The rule is dev's refcount can't go to zero while any other refcount
is != 0.

For instance devs holds a get on dev that is put back only when devs
goes to 0:

static void tpm_devs_release(struct device *dev)
{
struct tpm_chip *chip = container_of(dev, struct tpm_chip, devs);

/* release the master device reference */
put_device(&chip->dev);
}

Both cdev elements do something similar inside the cdev layer.

The net result is during any open() the tpm_chip is guaranteed to have
a positive refcount.
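
The chaining rule can be checked with a small user-space model. This is a sketch under stated assumptions, not kernel code: counters stand in for the two struct device refcounts, an open file on /dev/tpmrm is modelled as one extra reference on devs, and the names chip_model and remove_with_open_file() are invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct chip_model {
	int dev_ref;   /* models the refcount of chip->dev  */
	int devs_ref;  /* models the refcount of chip->devs */
	bool freed;    /* set when dev_ref drops to zero */
};

static void put_dev(struct chip_model *c)
{
	if (--c->dev_ref == 0)
		c->freed = true;
}

/* Like tpm_devs_release(): the last put on devs drops one ref on dev. */
static void put_devs(struct chip_model *c)
{
	if (--c->devs_ref == 0)
		put_dev(c);
}

/*
 * Allocate a chip, open /dev/tpmrm, then remove the driver while the
 * file is still open. Returns true if the chip was freed by the removal.
 *
 * extra_ref:          devs holds a reference on dev
 * put_devs_on_remove: driver removal also drops the initial devs ref
 */
static bool remove_with_open_file(struct chip_model *c, bool extra_ref,
				  bool put_devs_on_remove)
{
	c->dev_ref = 1;
	c->devs_ref = 1;
	c->freed = false;

	if (extra_ref)
		c->dev_ref++;     /* the get that devs holds on dev */

	c->devs_ref++;            /* open file (modelling assumption) */

	put_dev(c);               /* driver removal puts dev */
	if (put_devs_on_remove)
		put_devs(c);

	return c->freed;
}
```

Without the extra reference the removal frees the chip while devs_ref is still nonzero, which breaks the rule. With the extra reference and a put on devs at removal, the chip stays allocated until the final put_devs() that models close().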

Jason

2021-02-06 00:14:35

by Jason Gunthorpe

Subject: Re: [PATCH v3 1/2] tpm: fix reference counting for struct tpm_chip

On Fri, Feb 05, 2021 at 12:50:42AM +0100, Lino Sanfilippo wrote:
> From: Lino Sanfilippo <[email protected]>
>
> The following sequence of operations results in a refcount warning:
>
> 1. Open device /dev/tpmrm
> 2. Remove module tpm_tis_spi
> 3. Write a TPM command to the file descriptor opened at step 1.
>
> WARNING: CPU: 3 PID: 1161 at lib/refcount.c:25 kobject_get+0xa0/0xa4
> refcount_t: addition on 0; use-after-free.
> Modules linked in: tpm_tis_spi tpm_tis_core tpm mdio_bcm_unimac brcmfmac
> sha256_generic libsha256 sha256_arm hci_uart btbcm bluetooth cfg80211 vc4
> brcmutil ecdh_generic ecc snd_soc_core crc32_arm_ce libaes
> raspberrypi_hwmon ac97_bus snd_pcm_dmaengine bcm2711_thermal snd_pcm
> snd_timer genet snd phy_generic soundcore [last unloaded: spi_bcm2835]
> CPU: 3 PID: 1161 Comm: hold_open Not tainted 5.10.0ls-main-dirty #2
> Hardware name: BCM2711
> [<c0410c3c>] (unwind_backtrace) from [<c040b580>] (show_stack+0x10/0x14)
> [<c040b580>] (show_stack) from [<c1092174>] (dump_stack+0xc4/0xd8)
> [<c1092174>] (dump_stack) from [<c0445a30>] (__warn+0x104/0x108)
> [<c0445a30>] (__warn) from [<c0445aa8>] (warn_slowpath_fmt+0x74/0xb8)
> [<c0445aa8>] (warn_slowpath_fmt) from [<c08435d0>] (kobject_get+0xa0/0xa4)
> [<c08435d0>] (kobject_get) from [<bf0a715c>] (tpm_try_get_ops+0x14/0x54 [tpm])
> [<bf0a715c>] (tpm_try_get_ops [tpm]) from [<bf0a7d6c>] (tpm_common_write+0x38/0x60 [tpm])
> [<bf0a7d6c>] (tpm_common_write [tpm]) from [<c05a7ac0>] (vfs_write+0xc4/0x3c0)
> [<c05a7ac0>] (vfs_write) from [<c05a7ee4>] (ksys_write+0x58/0xcc)
> [<c05a7ee4>] (ksys_write) from [<c04001a0>] (ret_fast_syscall+0x0/0x4c)
> Exception stack(0xc226bfa8 to 0xc226bff0)
> bfa0: 00000000 000105b4 00000003 beafe664 00000014 00000000
> bfc0: 00000000 000105b4 000103f8 00000004 00000000 00000000 b6f9c000 beafe684
> bfe0: 0000006c beafe648 0001056c b6eb6944
>
> The reason for this warning is the attempt to get the chip->dev reference
> in tpm_common_write() although the reference counter is already zero.


> Since commit 8979b02aaf1d ("tpm: Fix reference count to main device") the
> extra reference used to prevent a premature zero counter is never taken,
> because the required TPM_CHIP_FLAG_TPM2 flag is never set.
>
> Fix this by removing the flag condition.
>
> Commit fdc915f7f719 ("tpm: expose spaces via a device link /dev/tpmrm<n>")
> already introduced function tpm_devs_release() to release the extra
> reference but did not implement the required put on chip->devs that results
> in the call of this function.

Seems wonky, the devs is just supposed to be a side thing, nothing
should be using it as a primary reference count for a tpm.

The bug here is only that tpm_common_open() did not get a kref on the
chip before putting it in priv and linking it to the fd. See the
comment before tpm_try_get_ops() indicating the caller must already
have taken care to ensure the chip is valid.

This should be all you need to fix the oops:

diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c
index 1784530b8387bb..1b738dca7fffb5 100644
--- a/drivers/char/tpm/tpm-dev-common.c
+++ b/drivers/char/tpm/tpm-dev-common.c
@@ -105,6 +105,7 @@ static void tpm_timeout_work(struct work_struct *work)
void tpm_common_open(struct file *file, struct tpm_chip *chip,
struct file_priv *priv, struct tpm_space *space)
{
+ get_device(&chip->dev);
priv->chip = chip;
priv->space = space;
priv->response_read = true;
@@ -261,6 +262,7 @@ void tpm_common_release(struct file *file, struct file_priv *priv)
flush_work(&priv->timeout_work);
file->private_data = NULL;
priv->response_length = 0;
+ put_device(&priv->chip->dev);
}

int __init tpm_dev_common_init(void)

> Fix this also by installing an action handler that puts chip->devs as soon
> as the chip is unregistered.
>
> Fixes: fdc915f7f719 ("tpm: expose spaces via a device link /dev/tpmrm<n>")
> Fixes: 8979b02aaf1d ("tpm: Fix reference count to main device")
> Signed-off-by: Lino Sanfilippo <[email protected]>
> drivers/char/tpm/tpm-chip.c | 18 +++++++++++++++---
> drivers/char/tpm/tpm_ftpm_tee.c | 2 ++
> drivers/char/tpm/tpm_vtpm_proxy.c | 1 +
> 3 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> index ddaeceb..3ace199 100644
> --- a/drivers/char/tpm/tpm-chip.c
> +++ b/drivers/char/tpm/tpm-chip.c
> @@ -360,8 +360,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
> * while cdevs is in use. The corresponding put
> * is in the tpm_devs_release (TPM2 only)
> */
> - if (chip->flags & TPM_CHIP_FLAG_TPM2)
> - get_device(&chip->dev);
> + get_device(&chip->dev);
>
> if (chip->dev_num == 0)
> chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);
> @@ -422,8 +421,21 @@ struct tpm_chip *tpmm_chip_alloc(struct device *pdev,
> rc = devm_add_action_or_reset(pdev,
> (void (*)(void *)) put_device,
> &chip->dev);
> - if (rc)
> + if (rc) {
> + put_device(&chip->devs);
> return ERR_PTR(rc);

This isn't right, read what 'or_reset' does

Jason

2021-02-06 02:45:07

by Jason Gunthorpe

Subject: Re: [PATCH v3 1/2] tpm: fix reference counting for struct tpm_chip

On Fri, Feb 05, 2021 at 10:50:02PM +0100, Lino Sanfilippo wrote:
> On 05.02.21 at 16:58, Jason Gunthorpe wrote:
> >> the chip->dev reference in the first place).
> >
> > No, they are all chained together because they are all in the same
> > struct:
> >
> > struct tpm_chip {
> > struct device dev;
> > struct device devs;
> > struct cdev cdev;
> > struct cdev cdevs;
> >
> > dev holds the refcount on memory, when it goes 0 the whole thing is
> > kfreed.
> >
> > The rule is dev's refcount can't go to zero while any other refcount
> > is != 0.
> >
> > For instance devs holds a get on dev that is put back only when devs
> > goes to 0:
> >
> > static void tpm_devs_release(struct device *dev)
> > {
> > struct tpm_chip *chip = container_of(dev, struct tpm_chip, devs);
> >
> > /* release the master device reference */
> > put_device(&chip->dev);
> > }
> >
> > Both cdev elements do something similar inside the cdev layer.
>
> Well, this chaining is exactly what does not work at the moment and what the patch is supposed
> to fix: currently we never take the extra ref (not even in the TPM 2 case; note that
> TPM_CHIP_FLAG_TPM2 is never set), so
>
> - if (chip->flags & TPM_CHIP_FLAG_TPM2)
> - get_device(&chip->dev);
> + get_device(&chip->dev);

Oh, hah, yes that is busted up. The patch sketch I sent to James is
the right way to handle it, feel free to take it up

> and tpm_devs_release() is never called, since there is nothing that ever puts devs, so

Yes, that is a pre-existing memory leak

> The race with only get_device()/put_device() in tpm_common_open()/tpm_common_release() is:

The refcount handling is busted up and not working the way it is
designed, when that is fixed there is no race.

Jason