2020-11-03 19:51:41

by Jeremy Cline

[permalink] [raw]
Subject: [PATCH 0/3] drm/nouveau: fix a use-after-free in postclose()

This series fixes a number of use-after-frees in nouveau's postclose()
handler. It was discovered by pointing IGT's core_hotunplug tests at a
nouveau device, but the steps to reproduce it are simple:

1. Open the device file
2. Unbind the driver or remove the device
3. Close the file opened in step 1.

During the device removal, the nouveau_drm structure is de-allocated,
but is dereferenced in the postclose() handler.

One obvious solution is to ensure all the operations in the postclose()
handler are valid by extending the lifetime of the nouveau_drm
structure. This is possible with the new devm_drm_dev_alloc() interface,
but the change is somewhat invasive so I thought it best to submit that
work separately.

Instead, we make use of the drm_dev_unplug() API, clean up all clients
in the device removal call, and check to make sure the device has not
been unplugged in the postclose() handler. While this does not enable
hot-unplug support for nouveau, it's enough to avoid crashing the kernel
and leads to all the core_hotunplug tests to pass.

Jeremy Cline (3):
drm/nouveau: use drm_dev_unplug() during device removal
drm/nouveau: Add a dedicated mutex for the clients list
drm/nouveau: clean up all clients on device removal

drivers/gpu/drm/nouveau/nouveau_drm.c | 39 +++++++++++++++++++++++----
drivers/gpu/drm/nouveau/nouveau_drv.h | 5 ++++
2 files changed, 39 insertions(+), 5 deletions(-)

--
2.28.0


2020-11-03 19:51:51

by Jeremy Cline

[permalink] [raw]
Subject: [PATCH 1/3] drm/nouveau: use drm_dev_unplug() during device removal

Nouveau does not currently support hot-unplugging, but it still makes
sense to switch from drm_dev_unregister() to drm_dev_unplug().
drm_dev_unplug() calls drm_dev_unregister() after marking the device as
unplugged, but only after any device critical sections are finished.

Since nouveau isn't using drm_dev_enter() and drm_dev_exit(), there are
no critical sections so this is nearly functionally equivalent. However,
the DRM layer does check to see if the device is unplugged, and if it is
returns appropriate error codes.

In the future nouveau can add critical sections in order to truly
support hot-unplugging.

Signed-off-by: Jeremy Cline <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_drm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index d141a5f004af..4fe4d664c5f2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -792,7 +792,7 @@ nouveau_drm_device_remove(struct drm_device *dev)
struct nvkm_client *client;
struct nvkm_device *device;

- drm_dev_unregister(dev);
+ drm_dev_unplug(dev);

dev->irq_enabled = false;
client = nvxx_client(&drm->client.base);
--
2.28.0

2020-11-03 19:53:49

by Jeremy Cline

[permalink] [raw]
Subject: [PATCH 2/3] drm/nouveau: Add a dedicated mutex for the clients list

Rather than protecting the nouveau_drm clients list with the lock within
the "client" nouveau_cli, add a dedicated lock to serialize access to
the list. This is both clearer and necessary to avoid lockdep being
upset with us when we need to iterate through all the clients in the
list and potentially lock their mutex, which is the same class as the
lock protecting the entire list.

Signed-off-by: Jeremy Cline <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_drm.c | 9 +++++----
drivers/gpu/drm/nouveau/nouveau_drv.h | 5 +++++
2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 4fe4d664c5f2..d182b877258a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -557,6 +557,7 @@ nouveau_drm_device_init(struct drm_device *dev)
nvkm_dbgopt(nouveau_debug, "DRM");

INIT_LIST_HEAD(&drm->clients);
+ mutex_init(&drm->clients_lock);
spin_lock_init(&drm->tile.lock);

/* workaround an odd issue on nvc1 by disabling the device's
@@ -1089,9 +1090,9 @@ nouveau_drm_open(struct drm_device *dev, struct drm_file *fpriv)

fpriv->driver_priv = cli;

- mutex_lock(&drm->client.mutex);
+ mutex_lock(&drm->clients_lock);
list_add(&cli->head, &drm->clients);
- mutex_unlock(&drm->client.mutex);
+ mutex_unlock(&drm->clients_lock);

done:
if (ret && cli) {
@@ -1117,9 +1118,9 @@ nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
nouveau_abi16_fini(cli->abi16);
mutex_unlock(&cli->mutex);

- mutex_lock(&drm->client.mutex);
+ mutex_lock(&drm->clients_lock);
list_del(&cli->head);
- mutex_unlock(&drm->client.mutex);
+ mutex_unlock(&drm->clients_lock);

nouveau_cli_fini(cli);
kfree(cli);
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 9d04d1b36434..550e5f335146 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -141,6 +141,11 @@ struct nouveau_drm {

struct list_head clients;

+ /**
+ * @clients_lock: Protects access to the @clients list of &struct nouveau_cli.
+ */
+ struct mutex clients_lock;
+
u8 old_pm_cap;

struct {
--
2.28.0

2020-11-03 19:53:49

by Jeremy Cline

[permalink] [raw]
Subject: [PATCH 3/3] drm/nouveau: clean up all clients on device removal

The postclose handler can run after the device has been removed (or the
driver has been unbound) since userspace clients are free to hold the
file open the file as long as they want. Because the device removal
callback frees the entire nouveau_drm structure, any reference to it in
the postclose handler will result in a use-after-free.

To reproduce this, one must simply open the device file, unbind the
driver (or physically remove the device), and then close the device
file. This was found and can be reproduced easily with the IGT
core_hotunplug tests.

To avoid this, all clients are cleaned up in the device finialization
rather than deferring it to the postclose handler, and the postclose
handler is protected by a critical section which ensures the
drm_dev_unplug() and the postclose handler won't race.

This is not an ideal fix, since as I understand the proposed plan for
the kernel<->userspace interface for hotplug support, destroying the
client before the file is closed will cause problems. However, I believe
to properly fix this issue, the lifetime of the nouveau_drm structure
needs to be extended to match the drm_device, and this proved to be a
rather invasive change. Thus, I've broken this out so the fix can be
easily backported.

Signed-off-by: Jeremy Cline <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_drm.c | 30 +++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index d182b877258a..74fab777f4d0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -628,6 +628,7 @@ nouveau_drm_device_init(struct drm_device *dev)
static void
nouveau_drm_device_fini(struct drm_device *dev)
{
+ struct nouveau_cli *cli, *temp_cli;
struct nouveau_drm *drm = nouveau_drm(dev);

if (nouveau_pmops_runtime()) {
@@ -652,6 +653,24 @@ nouveau_drm_device_fini(struct drm_device *dev)
nouveau_ttm_fini(drm);
nouveau_vga_fini(drm);

+ /*
+ * There may be existing clients from as-yet unclosed files. For now,
+ * clean them up here rather than deferring until the file is closed,
+ * but this likely not correct if we want to support hot-unplugging
+ * properly.
+ */
+ mutex_lock(&drm->clients_lock);
+ list_for_each_entry_safe(cli, temp_cli, &drm->clients, head) {
+ list_del(&cli->head);
+ mutex_lock(&cli->mutex);
+ if (cli->abi16)
+ nouveau_abi16_fini(cli->abi16);
+ mutex_unlock(&cli->mutex);
+ nouveau_cli_fini(cli);
+ kfree(cli);
+ }
+ mutex_unlock(&drm->clients_lock);
+
nouveau_cli_fini(&drm->client);
nouveau_cli_fini(&drm->master);
nvif_parent_dtor(&drm->parent);
@@ -1110,6 +1129,16 @@ nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
{
struct nouveau_cli *cli = nouveau_cli(fpriv);
struct nouveau_drm *drm = nouveau_drm(dev);
+ int dev_index;
+
+ /*
+ * The device is gone, and as it currently stands all clients are
+ * cleaned up in the removal codepath. In the future this may change
+ * so that we can support hot-unplugging, but for now we immediately
+ * return to avoid a double-free situation.
+ */
+ if (!drm_dev_enter(dev, &dev_index))
+ return;

pm_runtime_get_sync(dev->dev);

@@ -1126,6 +1155,7 @@ nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
kfree(cli);
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);
+ drm_dev_exit(dev_index);
}

static const struct drm_ioctl_desc
--
2.28.0

2020-11-25 18:41:19

by Lyude Paul

[permalink] [raw]
Subject: Re: [PATCH 2/3] drm/nouveau: Add a dedicated mutex for the clients list

On Tue, 2020-11-03 at 14:49 -0500, Jeremy Cline wrote:
> Rather than protecting the nouveau_drm clients list with the lock within
> the "client" nouveau_cli, add a dedicated lock to serialize access to
> the list. This is both clearer and necessary to avoid lockdep being
> upset with us when we need to iterate through all the clients in the
> list and potentially lock their mutex, which is the same class as the
> lock protecting the entire list.
>
> Signed-off-by: Jeremy Cline <[email protected]>
> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 9 +++++----
>  drivers/gpu/drm/nouveau/nouveau_drv.h | 5 +++++
>  2 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c
> b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 4fe4d664c5f2..d182b877258a 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -557,6 +557,7 @@ nouveau_drm_device_init(struct drm_device *dev)
>                 nvkm_dbgopt(nouveau_debug, "DRM");
>  
>         INIT_LIST_HEAD(&drm->clients);
> +       mutex_init(&drm->clients_lock);

Looks like you forgot to hook up mutex_destroy() somewhere. Note there's
actually plenty of code in nouveau right now that forgets to do this, but at
some point we should probably fix that and avoid adding more spots where there's
no mutex_destroy().

>         spin_lock_init(&drm->tile.lock);
>  
>         /* workaround an odd issue on nvc1 by disabling the device's
> @@ -1089,9 +1090,9 @@ nouveau_drm_open(struct drm_device *dev, struct drm_file
> *fpriv)
>  
>         fpriv->driver_priv = cli;
>  
> -       mutex_lock(&drm->client.mutex);
> +       mutex_lock(&drm->clients_lock);
>         list_add(&cli->head, &drm->clients);
> -       mutex_unlock(&drm->client.mutex);
> +       mutex_unlock(&drm->clients_lock);
>  
>  done:
>         if (ret && cli) {
> @@ -1117,9 +1118,9 @@ nouveau_drm_postclose(struct drm_device *dev, struct
> drm_file *fpriv)
>                 nouveau_abi16_fini(cli->abi16);
>         mutex_unlock(&cli->mutex);
>  
> -       mutex_lock(&drm->client.mutex);
> +       mutex_lock(&drm->clients_lock);
>         list_del(&cli->head);
> -       mutex_unlock(&drm->client.mutex);
> +       mutex_unlock(&drm->clients_lock);
>  
>         nouveau_cli_fini(cli);
>         kfree(cli);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h
> b/drivers/gpu/drm/nouveau/nouveau_drv.h
> index 9d04d1b36434..550e5f335146 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drv.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
> @@ -141,6 +141,11 @@ struct nouveau_drm {
>  
>         struct list_head clients;
>  
> +       /**
> +        * @clients_lock: Protects access to the @clients list of &struct
> nouveau_cli.
> +        */
> +       struct mutex clients_lock;
> +
>         u8 old_pm_cap;
>  
>         struct {

--
Sincerely,
Lyude Paul (she/her)
Software Engineer at Red Hat

Note: I deal with a lot of emails and have a lot of bugs on my plate. If you've
asked me a question, are waiting for a review/merge on a patch, etc. and I
haven't responded in a while, please feel free to send me another email to check
on my status. I don't bite!

2020-11-25 18:48:53

by Lyude Paul

[permalink] [raw]
Subject: Re: [PATCH 3/3] drm/nouveau: clean up all clients on device removal

On Tue, 2020-11-03 at 14:49 -0500, Jeremy Cline wrote:
> The postclose handler can run after the device has been removed (or the
> driver has been unbound) since userspace clients are free to hold the
> file open the file as long as they want. Because the device removal
^
typo

> callback frees the entire nouveau_drm structure, any reference to it in
> the postclose handler will result in a use-after-free.
>
> To reproduce this, one must simply open the device file, unbind the
> driver (or physically remove the device), and then close the device
> file. This was found and can be reproduced easily with the IGT
> core_hotunplug tests.
>
> To avoid this, all clients are cleaned up in the device finialization
> rather than deferring it to the postclose handler, and the postclose
> handler is protected by a critical section which ensures the
> drm_dev_unplug() and the postclose handler won't race.
>
> This is not an ideal fix, since as I understand the proposed plan for
> the kernel<->userspace interface for hotplug support, destroying the
> client before the file is closed will cause problems. However, I believe
> to properly fix this issue, the lifetime of the nouveau_drm structure
> needs to be extended to match the drm_device, and this proved to be a
> rather invasive change. Thus, I've broken this out so the fix can be
> easily backported.

JFYI - if the intent is for this to be backported, you should add:

Cc: [email protected]
> Signed-off-by: Jeremy Cline <[email protected]>
> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 30 +++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c
> b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index d182b877258a..74fab777f4d0 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -628,6 +628,7 @@ nouveau_drm_device_init(struct drm_device *dev)
>  static void
>  nouveau_drm_device_fini(struct drm_device *dev)
>  {
> +       struct nouveau_cli *cli, *temp_cli;
>         struct nouveau_drm *drm = nouveau_drm(dev);
>  
>         if (nouveau_pmops_runtime()) {
> @@ -652,6 +653,24 @@ nouveau_drm_device_fini(struct drm_device *dev)
>         nouveau_ttm_fini(drm);
>         nouveau_vga_fini(drm);
>  
> +       /*
> +        * There may be existing clients from as-yet unclosed files. For now,
> +        * clean them up here rather than deferring until the file is closed,
> +        * but this likely not correct if we want to support hot-unplugging
> +        * properly.
> +        */
> +       mutex_lock(&drm->clients_lock);
> +       list_for_each_entry_safe(cli, temp_cli, &drm->clients, head) {
> +               list_del(&cli->head);
> +               mutex_lock(&cli->mutex);
> +               if (cli->abi16)
> +                       nouveau_abi16_fini(cli->abi16);
> +               mutex_unlock(&cli->mutex);
> +               nouveau_cli_fini(cli);
> +               kfree(cli);
> +       }
> +       mutex_unlock(&drm->clients_lock);
> +
>         nouveau_cli_fini(&drm->client);
>         nouveau_cli_fini(&drm->master);
>         nvif_parent_dtor(&drm->parent);
> @@ -1110,6 +1129,16 @@ nouveau_drm_postclose(struct drm_device *dev, struct
> drm_file *fpriv)
>  {
>         struct nouveau_cli *cli = nouveau_cli(fpriv);
>         struct nouveau_drm *drm = nouveau_drm(dev);
> +       int dev_index;
> +
> +       /*
> +        * The device is gone, and as it currently stands all clients are
> +        * cleaned up in the removal codepath. In the future this may change
> +        * so that we can support hot-unplugging, but for now we immediately
> +        * return to avoid a double-free situation.
> +        */
> +       if (!drm_dev_enter(dev, &dev_index))
> +               return;
>  
>         pm_runtime_get_sync(dev->dev);
>  
> @@ -1126,6 +1155,7 @@ nouveau_drm_postclose(struct drm_device *dev, struct
> drm_file *fpriv)
>         kfree(cli);
>         pm_runtime_mark_last_busy(dev->dev);
>         pm_runtime_put_autosuspend(dev->dev);
> +       drm_dev_exit(dev_index);
>  }
>  
>  static const struct drm_ioctl_desc

--
Sincerely,
Lyude Paul (she/her)
Software Engineer at Red Hat

Note: I deal with a lot of emails and have a lot of bugs on my plate. If you've
asked me a question, are waiting for a review/merge on a patch, etc. and I
haven't responded in a while, please feel free to send me another email to check
on my status. I don't bite!

2020-11-25 22:52:29

by Jeremy Cline

[permalink] [raw]
Subject: Re: [PATCH 2/3] drm/nouveau: Add a dedicated mutex for the clients list

On Wed, Nov 25, 2020 at 01:37:06PM -0500, Lyude Paul wrote:
> On Tue, 2020-11-03 at 14:49 -0500, Jeremy Cline wrote:
> > Rather than protecting the nouveau_drm clients list with the lock within
> > the "client" nouveau_cli, add a dedicated lock to serialize access to
> > the list. This is both clearer and necessary to avoid lockdep being
> > upset with us when we need to iterate through all the clients in the
> > list and potentially lock their mutex, which is the same class as the
> > lock protecting the entire list.
> >
> > Signed-off-by: Jeremy Cline <[email protected]>
> > ---
> >  drivers/gpu/drm/nouveau/nouveau_drm.c | 9 +++++----
> >  drivers/gpu/drm/nouveau/nouveau_drv.h | 5 +++++
> >  2 files changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c
> > b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > index 4fe4d664c5f2..d182b877258a 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > @@ -557,6 +557,7 @@ nouveau_drm_device_init(struct drm_device *dev)
> >                 nvkm_dbgopt(nouveau_debug, "DRM");
> >  
> >         INIT_LIST_HEAD(&drm->clients);
> > +       mutex_init(&drm->clients_lock);
>
> Looks like you forgot to hook up mutex_destroy() somewhere. Note there's
> actually plenty of code in nouveau right now that forgets to do this, but at
> some point we should probably fix that and avoid adding more spots where there's
> no mutex_destroy().
>

I'm guilty of having looked at the existing locking init code in nouveau
and modeling this work accordingly. I'll send out a fix for this shortly
and look at tidying up the rest of the locks in a separate series.
Thanks!

> >         spin_lock_init(&drm->tile.lock);
> >  
> >         /* workaround an odd issue on nvc1 by disabling the device's
> > @@ -1089,9 +1090,9 @@ nouveau_drm_open(struct drm_device *dev, struct drm_file
> > *fpriv)
> >  
> >         fpriv->driver_priv = cli;
> >  
> > -       mutex_lock(&drm->client.mutex);
> > +       mutex_lock(&drm->clients_lock);
> >         list_add(&cli->head, &drm->clients);
> > -       mutex_unlock(&drm->client.mutex);
> > +       mutex_unlock(&drm->clients_lock);
> >  
> >  done:
> >         if (ret && cli) {
> > @@ -1117,9 +1118,9 @@ nouveau_drm_postclose(struct drm_device *dev, struct
> > drm_file *fpriv)
> >                 nouveau_abi16_fini(cli->abi16);
> >         mutex_unlock(&cli->mutex);
> >  
> > -       mutex_lock(&drm->client.mutex);
> > +       mutex_lock(&drm->clients_lock);
> >         list_del(&cli->head);
> > -       mutex_unlock(&drm->client.mutex);
> > +       mutex_unlock(&drm->clients_lock);
> >  
> >         nouveau_cli_fini(cli);
> >         kfree(cli);
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h
> > b/drivers/gpu/drm/nouveau/nouveau_drv.h
> > index 9d04d1b36434..550e5f335146 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_drv.h
> > +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
> > @@ -141,6 +141,11 @@ struct nouveau_drm {
> >  
> >         struct list_head clients;
> >  
> > +       /**
> > +        * @clients_lock: Protects access to the @clients list of &struct
> > nouveau_cli.
> > +        */
> > +       struct mutex clients_lock;
> > +
> >         u8 old_pm_cap;
> >  
> >         struct {
>
> --
> Sincerely,
> Lyude Paul (she/her)
> Software Engineer at Red Hat
>
> Note: I deal with a lot of emails and have a lot of bugs on my plate. If you've
> asked me a question, are waiting for a review/merge on a patch, etc. and I
> haven't responded in a while, please feel free to send me another email to check
> on my status. I don't bite!
>

2020-11-25 22:57:03

by Jeremy Cline

[permalink] [raw]
Subject: [PATCH v2 0/3] drm/nouveau: fix a use-after-free in postclose()

This series fixes a number of use-after-frees in nouveau's postclose()
handler. It was discovered by pointing IGT's core_hotunplug tests at a
nouveau device, but the steps to reproduce it are simple:

1. Open the device file
2. Unbind the driver or remove the device
3. Close the file opened in step 1.

During the device removal, the nouveau_drm structure is de-allocated,
but is dereferenced in the postclose() handler.

One obvious solution is to ensure all the operations in the postclose()
handler are valid by extending the lifetime of the nouveau_drm
structure. This is possible with the new devm_drm_dev_alloc() interface,
but the change is somewhat invasive so I thought it best to submit that
work separately.

Instead, we make use of the drm_dev_unplug() API, clean up all clients
in the device removal call, and check to make sure the device has not
been unplugged in the postclose() handler. While this does not enable
hot-unplug support for nouveau, it's enough to avoid crashing the kernel
and leads to all the core_hotunplug tests to pass.

This series reroll addresses a missing mutex_destroy() call and a typo
in a commit message.

Jeremy Cline (3):
drm/nouveau: use drm_dev_unplug() during device removal
drm/nouveau: Add a dedicated mutex for the clients list
drm/nouveau: clean up all clients on device removal

drivers/gpu/drm/nouveau/nouveau_drm.c | 42 +++++++++++++++++++++++----
drivers/gpu/drm/nouveau/nouveau_drv.h | 5 ++++
2 files changed, 42 insertions(+), 5 deletions(-)

--
2.28.0

2020-11-25 22:57:56

by Jeremy Cline

[permalink] [raw]
Subject: [PATCH v2 2/3] drm/nouveau: Add a dedicated mutex for the clients list

Rather than protecting the nouveau_drm clients list with the lock within
the "client" nouveau_cli, add a dedicated lock to serialize access to
the list. This is both clearer and necessary to avoid lockdep being
upset with us when we need to iterate through all the clients in the
list and potentially lock their mutex, which is the same class as the
lock protecting the entire list.

Cc: [email protected]
Signed-off-by: Jeremy Cline <[email protected]>
---
Changes since v1:
- Add a mutex_destroy() call when destroying the device struct

drivers/gpu/drm/nouveau/nouveau_drm.c | 10 ++++++----
drivers/gpu/drm/nouveau/nouveau_drv.h | 5 +++++
2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 4fe4d664c5f2..6ee1adc9bd40 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -557,6 +557,7 @@ nouveau_drm_device_init(struct drm_device *dev)
nvkm_dbgopt(nouveau_debug, "DRM");

INIT_LIST_HEAD(&drm->clients);
+ mutex_init(&drm->clients_lock);
spin_lock_init(&drm->tile.lock);

/* workaround an odd issue on nvc1 by disabling the device's
@@ -654,6 +655,7 @@ nouveau_drm_device_fini(struct drm_device *dev)
nouveau_cli_fini(&drm->client);
nouveau_cli_fini(&drm->master);
nvif_parent_dtor(&drm->parent);
+ mutex_destroy(&drm->clients_lock);
kfree(drm);
}

@@ -1089,9 +1091,9 @@ nouveau_drm_open(struct drm_device *dev, struct drm_file *fpriv)

fpriv->driver_priv = cli;

- mutex_lock(&drm->client.mutex);
+ mutex_lock(&drm->clients_lock);
list_add(&cli->head, &drm->clients);
- mutex_unlock(&drm->client.mutex);
+ mutex_unlock(&drm->clients_lock);

done:
if (ret && cli) {
@@ -1117,9 +1119,9 @@ nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
nouveau_abi16_fini(cli->abi16);
mutex_unlock(&cli->mutex);

- mutex_lock(&drm->client.mutex);
+ mutex_lock(&drm->clients_lock);
list_del(&cli->head);
- mutex_unlock(&drm->client.mutex);
+ mutex_unlock(&drm->clients_lock);

nouveau_cli_fini(cli);
kfree(cli);
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 9d04d1b36434..550e5f335146 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -141,6 +141,11 @@ struct nouveau_drm {

struct list_head clients;

+ /**
+ * @clients_lock: Protects access to the @clients list of &struct nouveau_cli.
+ */
+ struct mutex clients_lock;
+
u8 old_pm_cap;

struct {
--
2.28.0

2020-11-25 22:59:20

by Jeremy Cline

[permalink] [raw]
Subject: [PATCH v2 1/3] drm/nouveau: use drm_dev_unplug() during device removal

Nouveau does not currently support hot-unplugging, but it still makes
sense to switch from drm_dev_unregister() to drm_dev_unplug().
drm_dev_unplug() calls drm_dev_unregister() after marking the device as
unplugged, but only after any device critical sections are finished.

Since nouveau isn't using drm_dev_enter() and drm_dev_exit(), there are
no critical sections so this is nearly functionally equivalent. However,
the DRM layer does check to see if the device is unplugged, and if it is
returns appropriate error codes.

In the future nouveau can add critical sections in order to truly
support hot-unplugging.

Cc: [email protected]
Signed-off-by: Jeremy Cline <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_drm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index d141a5f004af..4fe4d664c5f2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -792,7 +792,7 @@ nouveau_drm_device_remove(struct drm_device *dev)
struct nvkm_client *client;
struct nvkm_device *device;

- drm_dev_unregister(dev);
+ drm_dev_unplug(dev);

dev->irq_enabled = false;
client = nvxx_client(&drm->client.base);
--
2.28.0

2020-11-25 23:00:22

by Jeremy Cline

[permalink] [raw]
Subject: [PATCH v2 3/3] drm/nouveau: clean up all clients on device removal

The postclose handler can run after the device has been removed (or the
driver has been unbound) since userspace clients are free to hold the
file open as long as they want. Because the device removal callback
frees the entire nouveau_drm structure, any reference to it in the
postclose handler will result in a use-after-free.

To reproduce this, one must simply open the device file, unbind the
driver (or physically remove the device), and then close the device
file. This was found and can be reproduced easily with the IGT
core_hotunplug tests.

To avoid this, all clients are cleaned up in the device finalization
rather than deferring it to the postclose handler, and the postclose
handler is protected by a critical section which ensures the
drm_dev_unplug() and the postclose handler won't race.

This is not an ideal fix, since as I understand the proposed plan for
the kernel<->userspace interface for hotplug support, destroying the
client before the file is closed will cause problems. However, I believe
to properly fix this issue, the lifetime of the nouveau_drm structure
needs to be extended to match the drm_device, and this proved to be a
rather invasive change. Thus, I've broken this out so the fix can be
easily backported.

Cc: [email protected]
Signed-off-by: Jeremy Cline <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_drm.c | 30 +++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 6ee1adc9bd40..afaf1774ee35 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -628,6 +628,7 @@ nouveau_drm_device_init(struct drm_device *dev)
static void
nouveau_drm_device_fini(struct drm_device *dev)
{
+ struct nouveau_cli *cli, *temp_cli;
struct nouveau_drm *drm = nouveau_drm(dev);

if (nouveau_pmops_runtime()) {
@@ -652,6 +653,24 @@ nouveau_drm_device_fini(struct drm_device *dev)
nouveau_ttm_fini(drm);
nouveau_vga_fini(drm);

+ /*
+ * There may be existing clients from as-yet unclosed files. For now,
+ * clean them up here rather than deferring until the file is closed,
+ * but this likely not correct if we want to support hot-unplugging
+ * properly.
+ */
+ mutex_lock(&drm->clients_lock);
+ list_for_each_entry_safe(cli, temp_cli, &drm->clients, head) {
+ list_del(&cli->head);
+ mutex_lock(&cli->mutex);
+ if (cli->abi16)
+ nouveau_abi16_fini(cli->abi16);
+ mutex_unlock(&cli->mutex);
+ nouveau_cli_fini(cli);
+ kfree(cli);
+ }
+ mutex_unlock(&drm->clients_lock);
+
nouveau_cli_fini(&drm->client);
nouveau_cli_fini(&drm->master);
nvif_parent_dtor(&drm->parent);
@@ -1111,6 +1130,16 @@ nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
{
struct nouveau_cli *cli = nouveau_cli(fpriv);
struct nouveau_drm *drm = nouveau_drm(dev);
+ int dev_index;
+
+ /*
+ * The device is gone, and as it currently stands all clients are
+ * cleaned up in the removal codepath. In the future this may change
+ * so that we can support hot-unplugging, but for now we immediately
+ * return to avoid a double-free situation.
+ */
+ if (!drm_dev_enter(dev, &dev_index))
+ return;

pm_runtime_get_sync(dev->dev);

@@ -1127,6 +1156,7 @@ nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
kfree(cli);
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);
+ drm_dev_exit(dev_index);
}

static const struct drm_ioctl_desc
--
2.28.0

2021-03-26 22:04:14

by Lyude Paul

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] drm/nouveau: fix a use-after-free in postclose()

This patch series is:

Reviewed-by: Lyude Paul <[email protected]>

Btw - in the future if you need to send a respin of multiple patches, you need
to send it as it's own separate series instead of replying to the previous one
(one-off respins can just be posted as replies though), otherwise patchwork
won't pick it up

On Wed, 2020-11-25 at 15:26 -0500, Jeremy Cline wrote:
> This series fixes a number of use-after-frees in nouveau's postclose()
> handler. It was discovered by pointing IGT's core_hotunplug tests at a
> nouveau device, but the steps to reproduce it are simple:
>
> 1. Open the device file
> 2. Unbind the driver or remove the device
> 3. Close the file opened in step 1.
>
> During the device removal, the nouveau_drm structure is de-allocated,
> but is dereferenced in the postclose() handler.
>
> One obvious solution is to ensure all the operations in the postclose()
> handler are valid by extending the lifetime of the nouveau_drm
> structure. This is possible with the new devm_drm_dev_alloc() interface,
> but the change is somewhat invasive so I thought it best to submit that
> work separately.
>
> Instead, we make use of the drm_dev_unplug() API, clean up all clients
> in the device removal call, and check to make sure the device has not
> been unplugged in the postclose() handler. While this does not enable
> hot-unplug support for nouveau, it's enough to avoid crashing the kernel
> and leads to all the core_hotunplug tests to pass.
>
> This series reroll addresses a missing mutex_destroy() call and a typo
> in a commit message.
>
> Jeremy Cline (3):
>   drm/nouveau: use drm_dev_unplug() during device removal
>   drm/nouveau: Add a dedicated mutex for the clients list
>   drm/nouveau: clean up all clients on device removal
>
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 42 +++++++++++++++++++++++----
>  drivers/gpu/drm/nouveau/nouveau_drv.h |  5 ++++
>  2 files changed, 42 insertions(+), 5 deletions(-)
>

--
Sincerely,
Lyude Paul (she/her)
Software Engineer at Red Hat

Note: I deal with a lot of emails and have a lot of bugs on my plate. If you've
asked me a question, are waiting for a review/merge on a patch, etc. and I
haven't responded in a while, please feel free to send me another email to check
on my status. I don't bite!

2021-08-16 07:05:49

by Salvatore Bonaccorso

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] drm/nouveau: fix a use-after-free in postclose()

Hi,

On Fri, Mar 26, 2021 at 06:00:51PM -0400, Lyude Paul wrote:
> This patch series is:
>
> Reviewed-by: Lyude Paul <[email protected]>
>
> Btw - in the future if you need to send a respin of multiple patches, you need
> to send it as it's own separate series instead of replying to the previous one
> (one-off respins can just be posted as replies though), otherwise patchwork
> won't pick it up

Did this patch series somehow fall through the cracks or got lost?

Regards,
Salvatore

2021-08-17 20:37:40

by Lyude Paul

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] drm/nouveau: fix a use-after-free in postclose()

It may have been, we're in the process of trying to change around how we
currently accept nouveau patches to stop this from happening in the future.

Ben, whenever you get a moment can you take a look at this?

On Mon, 2021-08-16 at 09:03 +0200, Salvatore Bonaccorso wrote:
> Hi,
>
> On Fri, Mar 26, 2021 at 06:00:51PM -0400, Lyude Paul wrote:
> > This patch series is:
> >
> > Reviewed-by: Lyude Paul <[email protected]>
> >
> > Btw - in the future if you need to send a respin of multiple patches, you
> > need
> > to send it as it's own separate series instead of replying to the previous
> > one
> > (one-off respins can just be posted as replies though), otherwise
> > patchwork
> > won't pick it up
>
> Did this patch series somehow fall through the cracks or got lost?
>
> Regards,
> Salvatore
>

--
Cheers,
Lyude Paul (she/her)
Software Engineer at Red Hat

2021-10-11 11:29:26

by Salvatore Bonaccorso

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] drm/nouveau: fix a use-after-free in postclose()

Hi Ben,

On Tue, Aug 17, 2021 at 04:32:31PM -0400, Lyude Paul wrote:
> It may have been, we're in the process of trying to change around how we
> currently accept nouveau patches to stop this from happening in the future.
>
> Ben, whenever you get a moment can you take a look at this?
>
> On Mon, 2021-08-16 at 09:03 +0200, Salvatore Bonaccorso wrote:
> > Hi,
> >
> > On Fri, Mar 26, 2021 at 06:00:51PM -0400, Lyude Paul wrote:
> > > This patch series is:
> > >
> > > Reviewed-by: Lyude Paul <[email protected]>
> > >
> > > Btw - in the future if you need to send a respin of multiple patches, you
> > > need
> > > to send it as it's own separate series instead of replying to the previous
> > > one
> > > (one-off respins can just be posted as replies though), otherwise
> > > patchwork
> > > won't pick it up
> >
> > Did this patch series somehow fall through the cracks or got lost?

Looking some older threads, noticed this one. Ben did you got a chance
to look at it, or is it now irrelevant by other means?

Regards,
Salvatore

2021-10-11 13:37:58

by Karol Herbst

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] drm/nouveau: fix a use-after-free in postclose()

I am currently checking the ML for such old patches, just have to make
sure it's actually fine and not breaking stuff as well.

But I think we will pull this in soonish, I just also work on
improving our CI stuff at the same time by trying out some things.

On Mon, Oct 11, 2021 at 9:06 AM Salvatore Bonaccorso <[email protected]> wrote:
>
> Hi Ben,
>
> On Tue, Aug 17, 2021 at 04:32:31PM -0400, Lyude Paul wrote:
> > It may have been, we're in the process of trying to change around how we
> > currently accept nouveau patches to stop this from happening in the future.
> >
> > Ben, whenever you get a moment can you take a look at this?
> >
> > On Mon, 2021-08-16 at 09:03 +0200, Salvatore Bonaccorso wrote:
> > > Hi,
> > >
> > > On Fri, Mar 26, 2021 at 06:00:51PM -0400, Lyude Paul wrote:
> > > > This patch series is:
> > > >
> > > > Reviewed-by: Lyude Paul <[email protected]>
> > > >
> > > > Btw - in the future if you need to send a respin of multiple patches, you
> > > > need
> > > > to send it as it's own separate series instead of replying to the previous
> > > > one
> > > > (one-off respins can just be posted as replies though), otherwise
> > > > patchwork
> > > > won't pick it up
> > >
> > > Did this patch series somehow fall through the cracks or got lost?
>
> Looking some older threads, noticed this one. Ben did you got a chance
> to look at it, or is it now irrelevant by other means?
>
> Regards,
> Salvatore
>