2021-03-29 06:11:27

by Dan Carpenter

[permalink] [raw]
Subject: [PATCH 1/2] thunderbolt: Fix a leak in tb_retimer_add()

After the device_register() succeeds, then the correct way to clean up
is to call device_unregister(). The unregister calls both device_del()
and device_put(). Since this code was only device_del() it results in
a memory leak.

Fixes: dacb12877d92 ("thunderbolt: Add support for on-board retimers")
Signed-off-by: Dan Carpenter <[email protected]>
---
This is from a new static checker warning. Not tested. With new
warnings it's also possible that I have misunderstood something
fundamental so review carefully etc.

drivers/thunderbolt/retimer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thunderbolt/retimer.c b/drivers/thunderbolt/retimer.c
index 620bcf586ee2..7a5d61604c8b 100644
--- a/drivers/thunderbolt/retimer.c
+++ b/drivers/thunderbolt/retimer.c
@@ -347,7 +347,7 @@ static int tb_retimer_add(struct tb_port *port, u8 index, u32 auth_status)
ret = tb_retimer_nvm_add(rt);
if (ret) {
dev_err(&rt->dev, "failed to add NVM devices: %d\n", ret);
- device_del(&rt->dev);
+ device_unregister(&rt->dev);
return ret;
}

--
2.30.2


2021-03-29 13:04:10

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/2] thunderbolt: Fix a leak in tb_retimer_add()

On Mon, Mar 29, 2021 at 09:07:18AM +0300, Dan Carpenter wrote:
> After the device_register() succeeds, then the correct way to clean up
> is to call device_unregister(). The unregister calls both device_del()
> and device_put(). Since this code was only device_del() it results in
> a memory leak.
>
> Fixes: dacb12877d92 ("thunderbolt: Add support for on-board retimers")
> Signed-off-by: Dan Carpenter <[email protected]>
> ---
> This is from a new static checker warning. Not tested. With new
> warnings it's also possible that I have misunderstood something
> fundamental so review carefully etc.

It looks OK to me

Reviewed-by: Jason Gunthorpe <[email protected]>

This also highlights the code has an ordering issue too, it calls
device_register() then goes to do tb_retimer_nvm_add() however
device_register() makes sysfs attributes visible before the rt->nvm is
initialized and this:

static ssize_t nvm_authenticate_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
{
if (!rt->nvm) {

Isn't strong enough to close the potential racing. The nvm should be
setup before device_register and all the above tests in the sysfs
deleted so we can rely on the CPU barriers built into
device_register() for correctness.

[which is a general tip, be very suspicious if device_register() is
being error unwound]

Jason

2021-03-29 14:45:16

by Mika Westerberg

[permalink] [raw]
Subject: Re: [PATCH 1/2] thunderbolt: Fix a leak in tb_retimer_add()

Hi,

On Mon, Mar 29, 2021 at 10:02:20AM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 29, 2021 at 09:07:18AM +0300, Dan Carpenter wrote:
> > After the device_register() succeeds, then the correct way to clean up
> > is to call device_unregister(). The unregister calls both device_del()
> > and device_put(). Since this code was only device_del() it results in
> > a memory leak.
> >
> > Fixes: dacb12877d92 ("thunderbolt: Add support for on-board retimers")
> > Signed-off-by: Dan Carpenter <[email protected]>
> > ---
> > This is from a new static checker warning. Not tested. With new
> > warnings it's also possible that I have misunderstood something
> > fundamental so review carefully etc.
>
> It looks OK to me

I agree too.

> Reviewed-by: Jason Gunthorpe <[email protected]>

Thanks for the review!

> This also highlights the code has an ordering issue too, it calls
> device_register() then goes to do tb_retimer_nvm_add() however
> device_register() makes sysfs attributes visible before the rt->nvm is
> initialized and this:
>
> static ssize_t nvm_authenticate_store(struct device *dev,
> struct device_attribute *attr, const char *buf, size_t count)
> {
> if (!rt->nvm) {
>
> Isn't strong enough to close the potential racing. The nvm should be
> setup before device_register and all the above tests in the sysfs
> deleted so we can rely on the CPU barriers built into
> device_register() for correctness.
>
> [which is a general tip, be very suspicious if device_register() is
> being error unwound]

The nvm is a separate (physical Linux) device that gets added under this
one. It cannot be added before AFAICT.

The code you refer actually looks like this:

static ssize_t nvm_authenticate_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
{
...
if (!mutex_trylock(&rt->tb->lock)) {
ret = restart_syscall();
goto exit_rpm;
}

if (!rt->nvm) {
ret = -EAGAIN;
goto exit_unlock;
}


Idea here is that if the NVMem (nvm) is not yet registered the attribute is
there but we return -EAGAIN to the userspace.

2021-03-29 14:55:58

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/2] thunderbolt: Fix a leak in tb_retimer_add()

On Mon, Mar 29, 2021 at 05:43:23PM +0300, Mika Westerberg wrote:

> The nvm is a separate (physical Linux) device that gets added under this
> one. It cannot be added before AFAICT.

Hum, yes, but then it is odd that a parent is holding sysfs attributes
that refer to a child.

> The code you refer actually looks like this:
>
> static ssize_t nvm_authenticate_store(struct device *dev,
> struct device_attribute *attr, const char *buf, size_t count)
> {
> ...
> if (!mutex_trylock(&rt->tb->lock)) {
> ret = restart_syscall();
> goto exit_rpm;
> }

Is that lock held during tb_retimer_nvm_add() I looked for a bit and
didn't find something. So someplace more than 4 call site above
mandatory locking is being held?

static void tb_retimer_remove(struct tb_retimer *rt)
{
dev_info(&rt->dev, "retimer disconnected\n");
tb_nvm_free(rt->nvm);
device_unregister(&rt->dev);
}

Here too?

And this is why it is all trylock because it deadlocks with unregister
otherwise?

Jason

2021-03-29 15:08:05

by Mika Westerberg

[permalink] [raw]
Subject: Re: [PATCH 1/2] thunderbolt: Fix a leak in tb_retimer_add()

On Mon, Mar 29, 2021 at 11:54:05AM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 29, 2021 at 05:43:23PM +0300, Mika Westerberg wrote:
>
> > The nvm is a separate (physical Linux) device that gets added under this
> > one. It cannot be added before AFAICT.
>
> Hum, yes, but then it is odd that a parent is holding sysfs attributes
> that refer to a child.

Well the child (NVMem) comes from completely different subsystem that
does not have a concept of "authentication" or anythin similar. This is
what we add on top. We actually exposer two NVMem devices under each
retimer: one that is the current active one, and then the one that is
used to write the new firmware image.

> > The code you refer actually looks like this:
> >
> > static ssize_t nvm_authenticate_store(struct device *dev,
> > struct device_attribute *attr, const char *buf, size_t count)
> > {
> > ...
> > if (!mutex_trylock(&rt->tb->lock)) {
> > ret = restart_syscall();
> > goto exit_rpm;
> > }
>
> Is that lock held during tb_retimer_nvm_add() I looked for a bit and
> didn't find something. So someplace more than 4 call site above
> mandatory locking is being held?

Yes it is. It is called from tb_scan_port() where that lock is held.

> static void tb_retimer_remove(struct tb_retimer *rt)
> {
> dev_info(&rt->dev, "retimer disconnected\n");
> tb_nvm_free(rt->nvm);
> device_unregister(&rt->dev);
> }
>
> Here too?

Yes.

> And this is why it is all trylock because it deadlocks with unregister
> otherwise?

I tried to explain it in 09f11b6c99fe ("thunderbolt: Take domain lock in
switch sysfs attribute callbacks"), except that at that time we did not
have retimers exposed but the same applies here too.

2021-03-30 10:43:18

by Mika Westerberg

[permalink] [raw]
Subject: Re: [PATCH 1/2] thunderbolt: Fix a leak in tb_retimer_add()

On Mon, Mar 29, 2021 at 09:07:18AM +0300, Dan Carpenter wrote:
> After the device_register() succeeds, then the correct way to clean up
> is to call device_unregister(). The unregister calls both device_del()
> and device_put(). Since this code was only device_del() it results in
> a memory leak.
>
> Fixes: dacb12877d92 ("thunderbolt: Add support for on-board retimers")
> Signed-off-by: Dan Carpenter <[email protected]>
> ---
> This is from a new static checker warning. Not tested. With new
> warnings it's also possible that I have misunderstood something
> fundamental so review carefully etc.
>
> drivers/thunderbolt/retimer.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

Both patches applied to thunderbolt.git/fixes. Thanks Dan!