2023-10-27 05:57:08

by Yu Wang

[permalink] [raw]
Subject: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device

With sample code as below, it may hit use-after-free issue when
releasing devcd device.

struct my_coredump_state {
struct completion dump_done;
...
};

static void my_coredump_free(void *data)
{
struct my_coredump_state *dump_state = data;
...
complete(&dump_state->dump_done);
}

static void my_dev_release(struct device *dev)
{
kfree(dev);
}

static void my_coredump()
{
struct my_coredump_state dump_state;
struct device *new_device =
kzalloc(sizeof(*new_device), GFP_KERNEL);

...
new_device->release = my_dev_release;
device_initialize(new_device);
...
device_add(new_device);
...
init_completion(&dump_state.dump_done);
dev_coredumpm(new_device, NULL, &dump_state, datalen, GFP_KERNEL,
my_coredump_read, my_coredump_free);
wait_for_completion(&dump_state.dump_done);
device_del(new_device);
put_device(new_device);
}

In devcoredump framework, devcd_dev_release() will be called when
releasing the devcd device, it will call the free() callback first
and try to delete the symlink in sysfs directory of the failing device.
Eventhough it has checked 'devcd->failing_dev->kobj.sd' before that,
there is no mechanism to ensure it's still available when accessing
it in kernfs_find_ns(), refer to the diagram as below:

Thread A was waiting for 'dump_state.dump_done' at #A-1-2 after
calling dev_coredumpm().
When thread B calling devcd->free() at #B-2-1, it wakes up
thread A from point #A-1-2, which will call device_del() to
delete the device.
If #B-2-2 comes before #A-3-1, but #B-4 comes after #A-4, it
will hit use-after-free issue when trying to access
'devcd->failing_dev->kobj.sd'.

#A-1-1: dev_coredumpm()
#A-1-2: wait_for_completion(&dump_state.dump_done)
#A-1-3: device_del()
#A-2: kobject_del()
#A-3-1: sysfs_remove_dir() --> set kobj->sd=NULL
#A-3-2: kernfs_put()
#A-4: kmem_cache_free() --> free kobj->sd

#B-1: devcd_dev_release()
#B-2-1: devcd->free(devcd->data)
#B-2-2: check devcd->failing_dev->kobj.sd
#B-2-3: sysfs_delete_link()
#B-3: kernfs_remove_by_name_ns()
#B-4: kernfs_find_ns() --> access devcd->failing_dev->kobj.sd

To fix this issue, put operations on devcd->failing_dev before
calling the free() callback in devcd_dev_release().

Signed-off-by: Yu Wang <[email protected]>
---
drivers/base/devcoredump.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index 91536ee05f14..35c704ddfeae 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -83,9 +83,6 @@ static void devcd_dev_release(struct device *dev)
{
struct devcd_entry *devcd = dev_to_devcd(dev);

- devcd->free(devcd->data);
- module_put(devcd->owner);
-
/*
* this seems racy, but I don't see a notifier or such on
* a struct device to know when it goes away?
@@ -95,6 +92,8 @@ static void devcd_dev_release(struct device *dev)
"devcoredump");

put_device(devcd->failing_dev);
+ devcd->free(devcd->data);
+ module_put(devcd->owner);
kfree(devcd);
}

--
2.17.1


2023-10-27 06:22:23

by Mukesh Ojha

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device



On 10/27/2023 11:25 AM, Yu Wang wrote:
> With sample code as below, it may hit use-after-free issue when
> releasing devcd device.
>
> struct my_coredump_state {
> struct completion dump_done;
> ...
> };
>
> static void my_coredump_free(void *data)
> {
> struct my_coredump_state *dump_state = data;
> ...
> complete(&dump_state->dump_done);
> }
>
> static void my_dev_release(struct device *dev)
> {
> kfree(dev);
> }
>
> static void my_coredump()
> {
> struct my_coredump_state dump_state;
> struct device *new_device =
> kzalloc(sizeof(*new_device), GFP_KERNEL);
>
> ...
> new_device->release = my_dev_release;
> device_initialize(new_device);
> ...
> device_add(new_device);
> ...
> init_completion(&dump_state.dump_done);
> dev_coredumpm(new_device, NULL, &dump_state, datalen, GFP_KERNEL,
> my_coredump_read, my_coredump_free);
> wait_for_completion(&dump_state.dump_done);
> device_del(new_device);
> put_device(new_device);
> }
>
> In devcoredump framework, devcd_dev_release() will be called when
> releasing the devcd device, it will call the free() callback first
> and try to delete the symlink in sysfs directory of the failing device.
> Eventhough it has checked 'devcd->failing_dev->kobj.sd' before that,
> there is no mechanism to ensure it's still available when accessing
> it in kernfs_find_ns(), refer to the diagram as below:
>
> Thread A was waiting for 'dump_state.dump_done' at #A-1-2 after
> calling dev_coredumpm().
> When thread B calling devcd->free() at #B-2-1, it wakes up
> thread A from point #A-1-2, which will call device_del() to
> delete the device.
> If #B-2-2 comes before #A-3-1, but #B-4 comes after #A-4, it
> will hit use-after-free issue when trying to access
> 'devcd->failing_dev->kobj.sd'.
>
> #A-1-1: dev_coredumpm()
> #A-1-2: wait_for_completion(&dump_state.dump_done)
> #A-1-3: device_del()
> #A-2: kobject_del()
> #A-3-1: sysfs_remove_dir() --> set kobj->sd=NULL
> #A-3-2: kernfs_put()
> #A-4: kmem_cache_free() --> free kobj->sd
>
> #B-1: devcd_dev_release()
> #B-2-1: devcd->free(devcd->data)
> #B-2-2: check devcd->failing_dev->kobj.sd
> #B-2-3: sysfs_delete_link()
> #B-3: kernfs_remove_by_name_ns()
> #B-4: kernfs_find_ns() --> access devcd->failing_dev->kobj.sd
>
> To fix this issue, put operations on devcd->failing_dev before
> calling the free() callback in devcd_dev_release().
>
> Signed-off-by: Yu Wang <[email protected]>

Awesome explanation.

Reviewed-by: Mukesh Ojha <[email protected]>

-Mukesh

> ---
> drivers/base/devcoredump.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
> index 91536ee05f14..35c704ddfeae 100644
> --- a/drivers/base/devcoredump.c
> +++ b/drivers/base/devcoredump.c
> @@ -83,9 +83,6 @@ static void devcd_dev_release(struct device *dev)
> {
> struct devcd_entry *devcd = dev_to_devcd(dev);
>
> - devcd->free(devcd->data);
> - module_put(devcd->owner);
> -
> /*
> * this seems racy, but I don't see a notifier or such on
> * a struct device to know when it goes away?
> @@ -95,6 +92,8 @@ static void devcd_dev_release(struct device *dev)
> "devcoredump");
>
> put_device(devcd->failing_dev);
> + devcd->free(devcd->data);
> + module_put(devcd->owner);
> kfree(devcd);
> }
>

2023-10-27 06:23:58

by Mukesh Ojha

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device



On 10/27/2023 11:25 AM, Yu Wang wrote:
> With sample code as below, it may hit use-after-free issue when
> releasing devcd device.
>
> struct my_coredump_state {
> struct completion dump_done;
> ...
> };
>
> static void my_coredump_free(void *data)
> {
> struct my_coredump_state *dump_state = data;
> ...
> complete(&dump_state->dump_done);
> }
>
> static void my_dev_release(struct device *dev)
> {
> kfree(dev);
> }
>
> static void my_coredump()
> {
> struct my_coredump_state dump_state;
> struct device *new_device =
> kzalloc(sizeof(*new_device), GFP_KERNEL);
>
> ...
> new_device->release = my_dev_release;
> device_initialize(new_device);
> ...
> device_add(new_device);
> ...
> init_completion(&dump_state.dump_done);
> dev_coredumpm(new_device, NULL, &dump_state, datalen, GFP_KERNEL,
> my_coredump_read, my_coredump_free);
> wait_for_completion(&dump_state.dump_done);
> device_del(new_device);
> put_device(new_device);
> }
>
> In devcoredump framework, devcd_dev_release() will be called when
> releasing the devcd device, it will call the free() callback first
> and try to delete the symlink in sysfs directory of the failing device.
> Eventhough it has checked 'devcd->failing_dev->kobj.sd' before that,
> there is no mechanism to ensure it's still available when accessing
> it in kernfs_find_ns(), refer to the diagram as below:
>
> Thread A was waiting for 'dump_state.dump_done' at #A-1-2 after
> calling dev_coredumpm().
> When thread B calling devcd->free() at #B-2-1, it wakes up
> thread A from point #A-1-2, which will call device_del() to
> delete the device.
> If #B-2-2 comes before #A-3-1, but #B-4 comes after #A-4, it
> will hit use-after-free issue when trying to access
> 'devcd->failing_dev->kobj.sd'.
>
> #A-1-1: dev_coredumpm()
> #A-1-2: wait_for_completion(&dump_state.dump_done)
> #A-1-3: device_del()
> #A-2: kobject_del()
> #A-3-1: sysfs_remove_dir() --> set kobj->sd=NULL
> #A-3-2: kernfs_put()
> #A-4: kmem_cache_free() --> free kobj->sd
>
> #B-1: devcd_dev_release()
> #B-2-1: devcd->free(devcd->data)
> #B-2-2: check devcd->failing_dev->kobj.sd
> #B-2-3: sysfs_delete_link()
> #B-3: kernfs_remove_by_name_ns()
> #B-4: kernfs_find_ns() --> access devcd->failing_dev->kobj.sd
>
> To fix this issue, put operations on devcd->failing_dev before
> calling the free() callback in devcd_dev_release().
>
> Signed-off-by: Yu Wang <[email protected]>
> ---
> drivers/base/devcoredump.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
> index 91536ee05f14..35c704ddfeae 100644
> --- a/drivers/base/devcoredump.c
> +++ b/drivers/base/devcoredump.c
> @@ -83,9 +83,6 @@ static void devcd_dev_release(struct device *dev)
> {
> struct devcd_entry *devcd = dev_to_devcd(dev);
>
> - devcd->free(devcd->data);
> - module_put(devcd->owner);
> -
> /*
> * this seems racy, but I don't see a notifier or such on
> * a struct device to know when it goes away?

Does this comment became obsolete now ?

-Mukesh

> @@ -95,6 +92,8 @@ static void devcd_dev_release(struct device *dev)
> "devcoredump");
>
> put_device(devcd->failing_dev);
> + devcd->free(devcd->data);
> + module_put(devcd->owner);
> kfree(devcd);
> }
>

2023-10-27 06:55:36

by Mukesh Ojha

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device



On 10/27/2023 11:53 AM, Mukesh Ojha wrote:
>
>
> On 10/27/2023 11:25 AM, Yu Wang wrote:
>> With sample code as below, it may hit use-after-free issue when
>> releasing devcd device.
>>
>>      struct my_coredump_state {
>>          struct completion dump_done;
>>          ...
>>      };
>>
>>      static void my_coredump_free(void *data)
>>      {
>>          struct my_coredump_state *dump_state = data;
>>          ...
>>          complete(&dump_state->dump_done);
>>      }
>>
>>      static void my_dev_release(struct device *dev)
>>      {
>>          kfree(dev);
>>      }
>>
>>      static void my_coredump()
>>      {
>>          struct my_coredump_state dump_state;
>>          struct device *new_device =
>>              kzalloc(sizeof(*new_device), GFP_KERNEL);
>>
>>          ...
>>          new_device->release = my_dev_release;
>>          device_initialize(new_device);
>>          ...
>>          device_add(new_device);
>>          ...
>>          init_completion(&dump_state.dump_done);
>>          dev_coredumpm(new_device, NULL, &dump_state, datalen,
>> GFP_KERNEL,
>>                        my_coredump_read, my_coredump_free);
>>          wait_for_completion(&dump_state.dump_done);
>>          device_del(new_device);
>>          put_device(new_device);
>>      }
>>
>> In devcoredump framework, devcd_dev_release() will be called when
>> releasing the devcd device, it will call the free() callback first
>> and try to delete the symlink in sysfs directory of the failing device.
>> Eventhough it has checked 'devcd->failing_dev->kobj.sd' before that,
>> there is no mechanism to ensure it's still available when accessing
>> it in kernfs_find_ns(), refer to the diagram as below:
>>
>>      Thread A was waiting for 'dump_state.dump_done' at #A-1-2 after
>>      calling dev_coredumpm().
>>      When thread B calling devcd->free() at #B-2-1, it wakes up
>>      thread A from point #A-1-2, which will call device_del() to
>>      delete the device.
>>      If #B-2-2 comes before #A-3-1, but #B-4 comes after #A-4, it
>>      will hit use-after-free issue when trying to access
>>      'devcd->failing_dev->kobj.sd'.
>>
>>      #A-1-1: dev_coredumpm()
>>        #A-1-2: wait_for_completion(&dump_state.dump_done)
>>        #A-1-3: device_del()
>>          #A-2: kobject_del()
>>            #A-3-1: sysfs_remove_dir() --> set kobj->sd=NULL
>>            #A-3-2: kernfs_put()
>>              #A-4: kmem_cache_free() --> free kobj->sd
>>
>>      #B-1: devcd_dev_release()
>>        #B-2-1: devcd->free(devcd->data)
>>        #B-2-2: check devcd->failing_dev->kobj.sd
>>        #B-2-3: sysfs_delete_link()
>>          #B-3: kernfs_remove_by_name_ns()
>>            #B-4: kernfs_find_ns() --> access devcd->failing_dev->kobj.sd
>>
>> To fix this issue, put operations on devcd->failing_dev before
>> calling the free() callback in devcd_dev_release().
>>
>> Signed-off-by: Yu Wang <[email protected]>
>> ---
>>   drivers/base/devcoredump.c | 5 ++---
>>   1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
>> index 91536ee05f14..35c704ddfeae 100644
>> --- a/drivers/base/devcoredump.c
>> +++ b/drivers/base/devcoredump.c
>> @@ -83,9 +83,6 @@ static void devcd_dev_release(struct device *dev)
>>   {
>>       struct devcd_entry *devcd = dev_to_devcd(dev);
>> -    devcd->free(devcd->data);
>> -    module_put(devcd->owner);
>> -
>>       /*
>>        * this seems racy, but I don't see a notifier or such on
>>        * a struct device to know when it goes away?
>
> Does this comment became obsolete now ?

I thought about this, looks like this comment could be valid
for dev_coredumpv() case.

-Mukesh
>
> -Mukesh
>
>> @@ -95,6 +92,8 @@ static void devcd_dev_release(struct device *dev)
>>                     "devcoredump");
>>       put_device(devcd->failing_dev);
>> +    devcd->free(devcd->data);
>> +    module_put(devcd->owner);
>>       kfree(devcd);
>>   }

2023-10-27 11:12:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device

On Thu, Oct 26, 2023 at 10:55:21PM -0700, Yu Wang wrote:
> With sample code as below, it may hit use-after-free issue when
> releasing devcd device.
>
> struct my_coredump_state {
> struct completion dump_done;
> ...
> };
>
> static void my_coredump_free(void *data)
> {
> struct my_coredump_state *dump_state = data;
> ...
> complete(&dump_state->dump_done);
> }
>
> static void my_dev_release(struct device *dev)
> {
> kfree(dev);
> }
>
> static void my_coredump()
> {
> struct my_coredump_state dump_state;
> struct device *new_device =
> kzalloc(sizeof(*new_device), GFP_KERNEL);
>
> ...
> new_device->release = my_dev_release;
> device_initialize(new_device);
> ...
> device_add(new_device);
> ...
> init_completion(&dump_state.dump_done);
> dev_coredumpm(new_device, NULL, &dump_state, datalen, GFP_KERNEL,
> my_coredump_read, my_coredump_free);
> wait_for_completion(&dump_state.dump_done);
> device_del(new_device);
> put_device(new_device);
> }

Is there any in-kernel user like this? If so, why not fix them up to
not do this?

thanks,

greg k-h

2023-10-27 11:12:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device

On Thu, Oct 26, 2023 at 10:55:21PM -0700, Yu Wang wrote:
> With sample code as below, it may hit use-after-free issue when
> releasing devcd device.
>
> struct my_coredump_state {
> struct completion dump_done;
> ...
> };
>
> static void my_coredump_free(void *data)
> {
> struct my_coredump_state *dump_state = data;
> ...
> complete(&dump_state->dump_done);
> }
>
> static void my_dev_release(struct device *dev)
> {
> kfree(dev);
> }
>
> static void my_coredump()
> {
> struct my_coredump_state dump_state;
> struct device *new_device =
> kzalloc(sizeof(*new_device), GFP_KERNEL);
>
> ...
> new_device->release = my_dev_release;
> device_initialize(new_device);
> ...
> device_add(new_device);
> ...
> init_completion(&dump_state.dump_done);
> dev_coredumpm(new_device, NULL, &dump_state, datalen, GFP_KERNEL,
> my_coredump_read, my_coredump_free);
> wait_for_completion(&dump_state.dump_done);
> device_del(new_device);
> put_device(new_device);
> }
>
> In devcoredump framework, devcd_dev_release() will be called when
> releasing the devcd device, it will call the free() callback first
> and try to delete the symlink in sysfs directory of the failing device.
> Eventhough it has checked 'devcd->failing_dev->kobj.sd' before that,
> there is no mechanism to ensure it's still available when accessing
> it in kernfs_find_ns(), refer to the diagram as below:
>
> Thread A was waiting for 'dump_state.dump_done' at #A-1-2 after
> calling dev_coredumpm().
> When thread B calling devcd->free() at #B-2-1, it wakes up
> thread A from point #A-1-2, which will call device_del() to
> delete the device.
> If #B-2-2 comes before #A-3-1, but #B-4 comes after #A-4, it
> will hit use-after-free issue when trying to access
> 'devcd->failing_dev->kobj.sd'.
>
> #A-1-1: dev_coredumpm()
> #A-1-2: wait_for_completion(&dump_state.dump_done)
> #A-1-3: device_del()
> #A-2: kobject_del()
> #A-3-1: sysfs_remove_dir() --> set kobj->sd=NULL
> #A-3-2: kernfs_put()
> #A-4: kmem_cache_free() --> free kobj->sd
>
> #B-1: devcd_dev_release()
> #B-2-1: devcd->free(devcd->data)
> #B-2-2: check devcd->failing_dev->kobj.sd
> #B-2-3: sysfs_delete_link()
> #B-3: kernfs_remove_by_name_ns()
> #B-4: kernfs_find_ns() --> access devcd->failing_dev->kobj.sd
>
> To fix this issue, put operations on devcd->failing_dev before
> calling the free() callback in devcd_dev_release().
>
> Signed-off-by: Yu Wang <[email protected]>
> ---
> drivers/base/devcoredump.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)

Also, what commit id does this fix?

thanks,

greg k-h

2023-10-27 12:47:32

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device

On Fri, 2023-10-27 at 13:11 +0200, Greg KH wrote:
> >
> > static void my_coredump()
> > {
> > struct my_coredump_state dump_state;
> > struct device *new_device =
> > kzalloc(sizeof(*new_device), GFP_KERNEL);
> >
> > ...
> > new_device->release = my_dev_release;
> > device_initialize(new_device);
> > ...
> > device_add(new_device);
> > ...
> > init_completion(&dump_state.dump_done);
> > dev_coredumpm(new_device, NULL, &dump_state, datalen, GFP_KERNEL,
> > my_coredump_read, my_coredump_free);
> > wait_for_completion(&dump_state.dump_done);
> > device_del(new_device);
> > put_device(new_device);
> > }
>
> Is there any in-kernel user like this? If so, why not fix them up to
> not do this?
>

Maybe this is only a simplified scenario and whenever you remove a
device when a coredump is still pending this can happen?

Actually, no, wait, what is this doing??? Why is there a completion and
all that stuff there? You shouldn't really care about the dump once you
have created it, and not pass NULL for the struct module pointer
either?!

johannes

2023-10-28 09:21:28

by Mukesh Ojha

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device



On 10/27/2023 11:52 AM, Mukesh Ojha wrote:
>
>
> On 10/27/2023 11:25 AM, Yu Wang wrote:
>> With sample code as below, it may hit use-after-free issue when
>> releasing devcd device.
>>
>>      struct my_coredump_state {
>>          struct completion dump_done;
>>          ...
>>      };
>>
>>      static void my_coredump_free(void *data)
>>      {
>>          struct my_coredump_state *dump_state = data;
>>          ...
>>          complete(&dump_state->dump_done);
>>      }
>>
>>      static void my_dev_release(struct device *dev)
>>      {
>>          kfree(dev);
>>      }
>>
>>      static void my_coredump()
>>      {
>>          struct my_coredump_state dump_state;
>>          struct device *new_device =
>>              kzalloc(sizeof(*new_device), GFP_KERNEL);
>>
>>          ...
>>          new_device->release = my_dev_release;
>>          device_initialize(new_device);
>>          ...
>>          device_add(new_device);
>>          ...
>>          init_completion(&dump_state.dump_done);
>>          dev_coredumpm(new_device, NULL, &dump_state, datalen,
>> GFP_KERNEL,
>>                        my_coredump_read, my_coredump_free);
>>          wait_for_completion(&dump_state.dump_done);
>>          device_del(new_device);
>>          put_device(new_device);
>>      }
>>
>> In devcoredump framework, devcd_dev_release() will be called when
>> releasing the devcd device, it will call the free() callback first
>> and try to delete the symlink in sysfs directory of the failing device.
>> Eventhough it has checked 'devcd->failing_dev->kobj.sd' before that,
>> there is no mechanism to ensure it's still available when accessing
>> it in kernfs_find_ns(), refer to the diagram as below:
>>
>>      Thread A was waiting for 'dump_state.dump_done' at #A-1-2 after
>>      calling dev_coredumpm().
>>      When thread B calling devcd->free() at #B-2-1, it wakes up
>>      thread A from point #A-1-2, which will call device_del() to
>>      delete the device.
>>      If #B-2-2 comes before #A-3-1, but #B-4 comes after #A-4, it
>>      will hit use-after-free issue when trying to access
>>      'devcd->failing_dev->kobj.sd'.
>>
>>      #A-1-1: dev_coredumpm()
>>        #A-1-2: wait_for_completion(&dump_state.dump_done)
>>        #A-1-3: device_del()
>>          #A-2: kobject_del()
>>            #A-3-1: sysfs_remove_dir() --> set kobj->sd=NULL
>>            #A-3-2: kernfs_put()
>>              #A-4: kmem_cache_free() --> free kobj->sd
>>
>>      #B-1: devcd_dev_release()
>>        #B-2-1: devcd->free(devcd->data)
>>        #B-2-2: check devcd->failing_dev->kobj.sd
>>        #B-2-3: sysfs_delete_link()
>>          #B-3: kernfs_remove_by_name_ns()
>>            #B-4: kernfs_find_ns() --> access devcd->failing_dev->kobj.sd
>>
>> To fix this issue, put operations on devcd->failing_dev before
>> calling the free() callback in devcd_dev_release().
>>
>> Signed-off-by: Yu Wang <[email protected]>
>
> Awesome explanation.

But it is still not solving the original race like device can go
anytime., what if caller of dev_coredumpm() is not waiting and
called device_del() right during this devcd_dev_release() where,
this race can still happen.

Although, dev_coredumpm() does keep reference to the device
devcd->failing_dev = get_device(dev);

Does it also need to keep reference to
kernfs_get(devcd->failing_dev->kobj.sd) inside dev_coredumpm() itself
and release reference kernfs_put(devcd->failing_dev->kobj.sd) after
sysfs_delete_link() to avoid this issue completely.

thoughts ?

>
> Reviewed-by: Mukesh Ojha <[email protected]>

Ignore this for now

-Mukesh

>
> -Mukesh
>
>> ---
>>   drivers/base/devcoredump.c | 5 ++---
>>   1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
>> index 91536ee05f14..35c704ddfeae 100644
>> --- a/drivers/base/devcoredump.c
>> +++ b/drivers/base/devcoredump.c
>> @@ -83,9 +83,6 @@ static void devcd_dev_release(struct device *dev)
>>   {
>>       struct devcd_entry *devcd = dev_to_devcd(dev);
>> -    devcd->free(devcd->data);
>> -    module_put(devcd->owner);
>> -
>>       /*
>>        * this seems racy, but I don't see a notifier or such on
>>        * a struct device to know when it goes away?
>> @@ -95,6 +92,8 @@ static void devcd_dev_release(struct device *dev)
>>                     "devcoredump");
>>       put_device(devcd->failing_dev);
>> +    devcd->free(devcd->data);
>> +    module_put(devcd->owner);
>>       kfree(devcd);
>>   }

2023-10-31 07:15:52

by Yu Wang

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device



On 10/27/2023 7:12 PM, Greg KH wrote:
> On Thu, Oct 26, 2023 at 10:55:21PM -0700, Yu Wang wrote:
>> With sample code as below, it may hit use-after-free issue when
>> releasing devcd device.
>>
>> struct my_coredump_state {
>> struct completion dump_done;
>> ...
>> };
>>
>> static void my_coredump_free(void *data)
>> {
>> struct my_coredump_state *dump_state = data;
>> ...
>> complete(&dump_state->dump_done);
>> }
>>
>> static void my_dev_release(struct device *dev)
>> {
>> kfree(dev);
>> }
>>
>> static void my_coredump()
>> {
>> struct my_coredump_state dump_state;
>> struct device *new_device =
>> kzalloc(sizeof(*new_device), GFP_KERNEL);
>>
>> ...
>> new_device->release = my_dev_release;
>> device_initialize(new_device);
>> ...
>> device_add(new_device);
>> ...
>> init_completion(&dump_state.dump_done);
>> dev_coredumpm(new_device, NULL, &dump_state, datalen, GFP_KERNEL,
>> my_coredump_read, my_coredump_free);
>> wait_for_completion(&dump_state.dump_done);
>> device_del(new_device);
>> put_device(new_device);
>> }
>>
>> In devcoredump framework, devcd_dev_release() will be called when
>> releasing the devcd device, it will call the free() callback first
>> and try to delete the symlink in sysfs directory of the failing device.
>> Eventhough it has checked 'devcd->failing_dev->kobj.sd' before that,
>> there is no mechanism to ensure it's still available when accessing
>> it in kernfs_find_ns(), refer to the diagram as below:
>>
>> Thread A was waiting for 'dump_state.dump_done' at #A-1-2 after
>> calling dev_coredumpm().
>> When thread B calling devcd->free() at #B-2-1, it wakes up
>> thread A from point #A-1-2, which will call device_del() to
>> delete the device.
>> If #B-2-2 comes before #A-3-1, but #B-4 comes after #A-4, it
>> will hit use-after-free issue when trying to access
>> 'devcd->failing_dev->kobj.sd'.
>>
>> #A-1-1: dev_coredumpm()
>> #A-1-2: wait_for_completion(&dump_state.dump_done)
>> #A-1-3: device_del()
>> #A-2: kobject_del()
>> #A-3-1: sysfs_remove_dir() --> set kobj->sd=NULL
>> #A-3-2: kernfs_put()
>> #A-4: kmem_cache_free() --> free kobj->sd
>>
>> #B-1: devcd_dev_release()
>> #B-2-1: devcd->free(devcd->data)
>> #B-2-2: check devcd->failing_dev->kobj.sd
>> #B-2-3: sysfs_delete_link()
>> #B-3: kernfs_remove_by_name_ns()
>> #B-4: kernfs_find_ns() --> access devcd->failing_dev->kobj.sd
>>
>> To fix this issue, put operations on devcd->failing_dev before
>> calling the free() callback in devcd_dev_release().
>>
>> Signed-off-by: Yu Wang <[email protected]>
>> ---
>> drivers/base/devcoredump.c | 5 ++---
>> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> Also, what commit id does this fix?

Thanks for your comment :)
Do you mean the commit which introduced this issue? It's from initial version of devcoredump.c.

>
> thanks,
>
> greg k-h

2023-10-31 07:40:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device

On Tue, Oct 31, 2023 at 03:15:12PM +0800, Yu Wang wrote:
>
>
> On 10/27/2023 7:12 PM, Greg KH wrote:
> > On Thu, Oct 26, 2023 at 10:55:21PM -0700, Yu Wang wrote:
> >> With sample code as below, it may hit use-after-free issue when
> >> releasing devcd device.
> >>
> >> struct my_coredump_state {
> >> struct completion dump_done;
> >> ...
> >> };
> >>
> >> static void my_coredump_free(void *data)
> >> {
> >> struct my_coredump_state *dump_state = data;
> >> ...
> >> complete(&dump_state->dump_done);
> >> }
> >>
> >> static void my_dev_release(struct device *dev)
> >> {
> >> kfree(dev);
> >> }
> >>
> >> static void my_coredump()
> >> {
> >> struct my_coredump_state dump_state;
> >> struct device *new_device =
> >> kzalloc(sizeof(*new_device), GFP_KERNEL);
> >>
> >> ...
> >> new_device->release = my_dev_release;
> >> device_initialize(new_device);
> >> ...
> >> device_add(new_device);
> >> ...
> >> init_completion(&dump_state.dump_done);
> >> dev_coredumpm(new_device, NULL, &dump_state, datalen, GFP_KERNEL,
> >> my_coredump_read, my_coredump_free);
> >> wait_for_completion(&dump_state.dump_done);
> >> device_del(new_device);
> >> put_device(new_device);
> >> }
> >>
> >> In devcoredump framework, devcd_dev_release() will be called when
> >> releasing the devcd device, it will call the free() callback first
> >> and try to delete the symlink in sysfs directory of the failing device.
> >> Eventhough it has checked 'devcd->failing_dev->kobj.sd' before that,
> >> there is no mechanism to ensure it's still available when accessing
> >> it in kernfs_find_ns(), refer to the diagram as below:
> >>
> >> Thread A was waiting for 'dump_state.dump_done' at #A-1-2 after
> >> calling dev_coredumpm().
> >> When thread B calling devcd->free() at #B-2-1, it wakes up
> >> thread A from point #A-1-2, which will call device_del() to
> >> delete the device.
> >> If #B-2-2 comes before #A-3-1, but #B-4 comes after #A-4, it
> >> will hit use-after-free issue when trying to access
> >> 'devcd->failing_dev->kobj.sd'.
> >>
> >> #A-1-1: dev_coredumpm()
> >> #A-1-2: wait_for_completion(&dump_state.dump_done)
> >> #A-1-3: device_del()
> >> #A-2: kobject_del()
> >> #A-3-1: sysfs_remove_dir() --> set kobj->sd=NULL
> >> #A-3-2: kernfs_put()
> >> #A-4: kmem_cache_free() --> free kobj->sd
> >>
> >> #B-1: devcd_dev_release()
> >> #B-2-1: devcd->free(devcd->data)
> >> #B-2-2: check devcd->failing_dev->kobj.sd
> >> #B-2-3: sysfs_delete_link()
> >> #B-3: kernfs_remove_by_name_ns()
> >> #B-4: kernfs_find_ns() --> access devcd->failing_dev->kobj.sd
> >>
> >> To fix this issue, put operations on devcd->failing_dev before
> >> calling the free() callback in devcd_dev_release().
> >>
> >> Signed-off-by: Yu Wang <[email protected]>
> >> ---
> >> drivers/base/devcoredump.c | 5 ++---
> >> 1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > Also, what commit id does this fix?
>
> Thanks for your comment :)
> Do you mean the commit which introduced this issue? It's from initial version of devcoredump.c.

Ok, but then what in-kernel code has the above pattern to cause this
"problem"? Why not fix that up?

thanks,

greg k-h

2023-10-31 08:30:41

by Yu Wang

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device

Thanks for your comments :)

On 10/27/2023 8:45 PM, Johannes Berg wrote:
> On Fri, 2023-10-27 at 13:11 +0200, Greg KH wrote:
>>>
>>> static void my_coredump()
>>> {
>>> struct my_coredump_state dump_state;
>>> struct device *new_device =
>>> kzalloc(sizeof(*new_device), GFP_KERNEL);
>>>
>>> ...
>>> new_device->release = my_dev_release;
>>> device_initialize(new_device);
>>> ...
>>> device_add(new_device);
>>> ...
>>> init_completion(&dump_state.dump_done);
>>> dev_coredumpm(new_device, NULL, &dump_state, datalen, GFP_KERNEL,
>>> my_coredump_read, my_coredump_free);
>>> wait_for_completion(&dump_state.dump_done);
>>> device_del(new_device);
>>> put_device(new_device);
>>> }
>>
>> Is there any in-kernel user like this? If so, why not fix them up to
>> not do this?

In this case, the device is temporarily added for dump only, so we need to
delete it when dump is completed.
The other users doesn't add/delete the device like this.

>>
>
> Maybe this is only a simplified scenario and whenever you remove a
> device when a coredump is still pending this can happen?

It removes the device when the @free function has been called, I think
the @free function should be considered as a completion signal, and so
we need to put @free at the end of falling-device-related-clean-up in
devcoredump framework (the change in the patch).

>
> Actually, no, wait, what is this doing??? Why is there a completion and
> all that stuff there? You shouldn't really care about the dump once you
> have created it, and not pass NULL for the struct module pointer
> either?!

Yes, I know we don't need to care about the dump data, but as mentioned
upon, the device is locally and temporarily created for this one-time dump
only, we need to free it when dump is completed, so introduce this completion.
Refer to drivers/remoteproc/remoteproc_coredump.c.

Regarding NULL for the struct module pointer, looks it's allowed for this
API (<remoteproc_coredump.c> also pass NULL)? But yes, it's not nice indeed,
we need this to get a reference of the calling module for safety.
Will correct in the next patch set.

>
> johannes

Best Regards,
Yu


2023-10-31 09:01:08

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device

On Tue, 2023-10-31 at 16:29 +0800, Yu Wang wrote:
>
> In this case, the device is temporarily added for dump only, so we need to
> delete it when dump is completed.
> The other users doesn't add/delete the device like this.

For good reason, I guess? I think this is probably a bad idea.

The whole point of this was to actually know which device created the
coredump? If you make one up on the fly that's ... pointless? Surely you
must have _some_ device that already exists?

> It removes the device when the @free function has been called, I think
> the @free function should be considered as a completion signal, and so
> we need to put @free at the end of falling-device-related-clean-up in
> devcoredump framework (the change in the patch).

That may be true for the case you have, but really, I wouldn't consider
that a bug now?

Besides, we do hav<e put_device() at the end, and I'm not sure I see how
the device can be removed before put_device()?

Can parts of the device there disappear before we release all the
references? Could that mean the scenario is also possible without your
contorted device creation and removal, just by unplugging the device
while a coredump exists in sysfs?

> Yes, I know we don't need to care about the dump data, but as mentioned
> upon, the device is locally and temporarily created for this one-time dump
> only, we need to free it when dump is completed, so introduce this completion.
> Refer to drivers/remoteproc/remoteproc_coredump.c.

I still don't think this is right ... Even the code there is awful, it
potentially blocks for 5 minutes? I'm not sure we should encourage that.

Note that you could also argue exactly the other way around - what if
the *free* function requires access to the device? It gets arbitrary
data after all.


> Regarding NULL for the struct module pointer, looks it's allowed for this
> API (<remoteproc_coredump.c> also pass NULL)? But yes, it's not nice indeed,
> we need this to get a reference of the calling module for safety.
> Will correct in the next patch set.

Well, it's not really allowed to literally write NULL - maybe we should
have a macro that fills in THIS_MODULE. But THIS_MODULE can be NULL at
runtime if it's in built-in code ... so we can't catch it.

But actually if you do have the completion you wouldn't care about this
specific case, but ...

johannes

2023-10-31 09:42:09

by Yu Wang

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device



On 10/31/2023 3:39 PM, Greg KH wrote:
> On Tue, Oct 31, 2023 at 03:15:12PM +0800, Yu Wang wrote:
>>
>>
>> On 10/27/2023 7:12 PM, Greg KH wrote:
>>> On Thu, Oct 26, 2023 at 10:55:21PM -0700, Yu Wang wrote:
>>>> With sample code as below, it may hit use-after-free issue when
>>>> releasing devcd device.
>>>>
>>>> struct my_coredump_state {
>>>> struct completion dump_done;
>>>> ...
>>>> };
>>>>
>>>> static void my_coredump_free(void *data)
>>>> {
>>>> struct my_coredump_state *dump_state = data;
>>>> ...
>>>> complete(&dump_state->dump_done);
>>>> }
>>>>
>>>> static void my_dev_release(struct device *dev)
>>>> {
>>>> kfree(dev);
>>>> }
>>>>
>>>> static void my_coredump()
>>>> {
>>>> struct my_coredump_state dump_state;
>>>> struct device *new_device =
>>>> kzalloc(sizeof(*new_device), GFP_KERNEL);
>>>>
>>>> ...
>>>> new_device->release = my_dev_release;
>>>> device_initialize(new_device);
>>>> ...
>>>> device_add(new_device);
>>>> ...
>>>> init_completion(&dump_state.dump_done);
>>>> dev_coredumpm(new_device, NULL, &dump_state, datalen, GFP_KERNEL,
>>>> my_coredump_read, my_coredump_free);
>>>> wait_for_completion(&dump_state.dump_done);
>>>> device_del(new_device);
>>>> put_device(new_device);
>>>> }
>>>>
>>>> In devcoredump framework, devcd_dev_release() will be called when
>>>> releasing the devcd device, it will call the free() callback first
>>>> and try to delete the symlink in sysfs directory of the failing device.
>>>> Eventhough it has checked 'devcd->failing_dev->kobj.sd' before that,
>>>> there is no mechanism to ensure it's still available when accessing
>>>> it in kernfs_find_ns(), refer to the diagram as below:
>>>>
>>>> Thread A was waiting for 'dump_state.dump_done' at #A-1-2 after
>>>> calling dev_coredumpm().
>>>> When thread B calling devcd->free() at #B-2-1, it wakes up
>>>> thread A from point #A-1-2, which will call device_del() to
>>>> delete the device.
>>>> If #B-2-2 comes before #A-3-1, but #B-4 comes after #A-4, it
>>>> will hit use-after-free issue when trying to access
>>>> 'devcd->failing_dev->kobj.sd'.
>>>>
>>>> #A-1-1: dev_coredumpm()
>>>> #A-1-2: wait_for_completion(&dump_state.dump_done)
>>>> #A-1-3: device_del()
>>>> #A-2: kobject_del()
>>>> #A-3-1: sysfs_remove_dir() --> set kobj->sd=NULL
>>>> #A-3-2: kernfs_put()
>>>> #A-4: kmem_cache_free() --> free kobj->sd
>>>>
>>>> #B-1: devcd_dev_release()
>>>> #B-2-1: devcd->free(devcd->data)
>>>> #B-2-2: check devcd->failing_dev->kobj.sd
>>>> #B-2-3: sysfs_delete_link()
>>>> #B-3: kernfs_remove_by_name_ns()
>>>> #B-4: kernfs_find_ns() --> access devcd->failing_dev->kobj.sd
>>>>
>>>> To fix this issue, put operations on devcd->failing_dev before
>>>> calling the free() callback in devcd_dev_release().
>>>>
>>>> Signed-off-by: Yu Wang <[email protected]>
>>>> ---
>>>> drivers/base/devcoredump.c | 5 ++---
>>>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> Also, what commit id does this fix?
>>
>> Thanks for your comment :)
>> Do you mean the commit which introduced this issue? It's from initial version of devcoredump.c.
>
> Ok, but then what in-kernel code has the above pattern to cause this
> "problem"? Why not fix that up?
>
We use this API as below:
<Create a device> -> <submit dump on it and wait for completion> -> <Remove the device>.

The difference with the in-kernel code is that the time between <submit dump on it and wait for completion>
and <remove the device> is very short and causes race between sysfs_delete_link() and device_del().
I think devcoredump framework should also cover this case.

> thanks,
>
> greg k-h

2023-10-31 09:51:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device

On Tue, Oct 31, 2023 at 05:41:29PM +0800, Yu Wang wrote:
>
>
> On 10/31/2023 3:39 PM, Greg KH wrote:
> > On Tue, Oct 31, 2023 at 03:15:12PM +0800, Yu Wang wrote:
> >>
> >>
> >> On 10/27/2023 7:12 PM, Greg KH wrote:
> >>> On Thu, Oct 26, 2023 at 10:55:21PM -0700, Yu Wang wrote:
> >>>> With sample code as below, it may hit use-after-free issue when
> >>>> releasing devcd device.
> >>>>
> >>>> struct my_coredump_state {
> >>>> struct completion dump_done;
> >>>> ...
> >>>> };
> >>>>
> >>>> static void my_coredump_free(void *data)
> >>>> {
> >>>> struct my_coredump_state *dump_state = data;
> >>>> ...
> >>>> complete(&dump_state->dump_done);
> >>>> }
> >>>>
> >>>> static void my_dev_release(struct device *dev)
> >>>> {
> >>>> kfree(dev);
> >>>> }
> >>>>
> >>>> static void my_coredump()
> >>>> {
> >>>> struct my_coredump_state dump_state;
> >>>> struct device *new_device =
> >>>> kzalloc(sizeof(*new_device), GFP_KERNEL);
> >>>>
> >>>> ...
> >>>> new_device->release = my_dev_release;
> >>>> device_initialize(new_device);
> >>>> ...
> >>>> device_add(new_device);
> >>>> ...
> >>>> init_completion(&dump_state.dump_done);
> >>>> dev_coredumpm(new_device, NULL, &dump_state, datalen, GFP_KERNEL,
> >>>> my_coredump_read, my_coredump_free);
> >>>> wait_for_completion(&dump_state.dump_done);
> >>>> device_del(new_device);
> >>>> put_device(new_device);
> >>>> }
> >>>>
> >>>> In devcoredump framework, devcd_dev_release() will be called when
> >>>> releasing the devcd device, it will call the free() callback first
> >>>> and try to delete the symlink in sysfs directory of the failing device.
> >>>> Eventhough it has checked 'devcd->failing_dev->kobj.sd' before that,
> >>>> there is no mechanism to ensure it's still available when accessing
> >>>> it in kernfs_find_ns(), refer to the diagram as below:
> >>>>
> >>>> Thread A was waiting for 'dump_state.dump_done' at #A-1-2 after
> >>>> calling dev_coredumpm().
> >>>> When thread B calling devcd->free() at #B-2-1, it wakes up
> >>>> thread A from point #A-1-2, which will call device_del() to
> >>>> delete the device.
> >>>> If #B-2-2 comes before #A-3-1, but #B-4 comes after #A-4, it
> >>>> will hit use-after-free issue when trying to access
> >>>> 'devcd->failing_dev->kobj.sd'.
> >>>>
> >>>> #A-1-1: dev_coredumpm()
> >>>> #A-1-2: wait_for_completion(&dump_state.dump_done)
> >>>> #A-1-3: device_del()
> >>>> #A-2: kobject_del()
> >>>> #A-3-1: sysfs_remove_dir() --> set kobj->sd=NULL
> >>>> #A-3-2: kernfs_put()
> >>>> #A-4: kmem_cache_free() --> free kobj->sd
> >>>>
> >>>> #B-1: devcd_dev_release()
> >>>> #B-2-1: devcd->free(devcd->data)
> >>>> #B-2-2: check devcd->failing_dev->kobj.sd
> >>>> #B-2-3: sysfs_delete_link()
> >>>> #B-3: kernfs_remove_by_name_ns()
> >>>> #B-4: kernfs_find_ns() --> access devcd->failing_dev->kobj.sd
> >>>>
> >>>> To fix this issue, put operations on devcd->failing_dev before
> >>>> calling the free() callback in devcd_dev_release().
> >>>>
> >>>> Signed-off-by: Yu Wang <[email protected]>
> >>>> ---
> >>>> drivers/base/devcoredump.c | 5 ++---
> >>>> 1 file changed, 2 insertions(+), 3 deletions(-)
> >>>
> >>> Also, what commit id does this fix?
> >>
> >> Thanks for your comment :)
> >> Do you mean the commit which introduced this issue? It's from initial version of devcoredump.c.
> >
> > Ok, but then what in-kernel code has the above pattern to cause this
> > "problem"? Why not fix that up?
> >
> We use this API as below:
> <Create a device> -> <submit dump on it and wait for completion> -> <Remove the device>.

What device are you creating? What driver does this?

> The difference with the in-kernel code is that the time between <submit dump on it and wait for completion>
> and <remove the device> is very short and causes race between sysfs_delete_link() and device_del().
> I think devcoredump framework should also cover this case.

I think you shouldn't abuse the api as are you sure this is what it was
actually designed for? :)

Perhaps your "wait for completion really isn't waiting long enough?
Rememember, you never really know when a device is going to be removed,
that's out of your control due to reference counting. You are doing the
reference counting correct, right? Pointers to the code that uses this
would be appreciated.

thanks,

greg k-h

2023-10-31 12:46:43

by Mukesh Ojha

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device



On 10/31/2023 2:29 PM, Johannes Berg wrote:
> On Tue, 2023-10-31 at 16:29 +0800, Yu Wang wrote:
>>
>> In this case, the device is temporarily added for dump only, so we need to
>> delete it when dump is completed.
>> The other users doesn't add/delete the device like this.
>
> For good reason, I guess? I think this is probably a bad idea.
>
> The whole point of this was to actually know which device created the
> coredump? If you make one up on the fly that's ... pointless? Surely you
> must have _some_ device that already exists?

Passing device name to be user space looks to be the reason.

Looks like here, it is trying to do what devcoredump is already doing,
like dev_coredump creates a custom device and deletes it after either
5 min or based on user space action. Same might being called from caller
of devcoredumpm().
devcd->free() should not be assumed to delete custom device
as devcd has reference to your device..and it can not be freed unless
devcoredump put reference to your device..

What is the assumption behind this comment 89-92 ?
sysfs_delete_link() is assuming the device is still there and
deleting the link..why is this needed if this is vulnerable..

82 static void devcd_dev_release(struct device *dev)
83 {
84 struct devcd_entry *devcd = dev_to_devcd(dev);
85
86 devcd->free(devcd->data);
87 module_put(devcd->owner);
88
89 /*
90 * this seems racy, but I don't see a notifier or such on
91 * a struct device to know when it goes away?
92 */
93 if (devcd->failing_dev->kobj.sd)
94 sysfs_delete_link(&devcd->failing_dev->kobj, &dev->kobj,
95 "devcoredump");
96
97 put_device(devcd->failing_dev);
98 kfree(devcd);


-Mukesh
>
>> It removes the device when the @free function has been called, I think
>> the @free function should be considered as a completion signal, and so
>> we need to put @free at the end of falling-device-related-clean-up in
>> devcoredump framework (the change in the patch).
>
> That may be true for the case you have, but really, I wouldn't consider
> that a bug now? >
> Besides, we do hav<e put_device() at the end, and I'm not sure I see how
> the device can be removed before put_device()?
>
> Can parts of the device there disappear before we release all the
> references? Could that mean the scenario is also possible without your
> contorted device creation and removal, just by unplugging the device
> while a coredump exists in sysfs?


>
>> Yes, I know we don't need to care about the dump data, but as mentioned
>> upon, the device is locally and temporarily created for this one-time dump
>> only, we need to free it when dump is completed, so introduce this completion.
>> Refer to drivers/remoteproc/remoteproc_coredump.c.
>
> I still don't think this is right ... Even the code there is awful, it
> potentially blocks for 5 minutes? I'm not sure we should encourage that.
>
> Note that you could also argue exactly the other way around - what if
> the *free* function requires access to the device? It gets arbitrary
> data after all.
>
>
>> Regarding NULL for the struct module pointer, looks it's allowed for this
>> API (<remoteproc_coredump.c> also pass NULL)? But yes, it's not nice indeed,
>> we need this to get a reference of the calling module for safety.
>> Will correct in the next patch set.
>
> Well, it's not really allowed to literally write NULL - maybe we should
> have a macro that fills in THIS_MODULE. But THIS_MODULE can be NULL at
> runtime if it's in built-in code ... so we can't catch it.
>
> But actually if you do have the completion you wouldn't care about this
> specific case, but ...
>
> johannes

2023-10-31 13:02:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device

On Tue, Oct 31, 2023 at 06:16:08PM +0530, Mukesh Ojha wrote:
>
>
> On 10/31/2023 2:29 PM, Johannes Berg wrote:
> > On Tue, 2023-10-31 at 16:29 +0800, Yu Wang wrote:
> > >
> > > In this case, the device is temporarily added for dump only, so we need to
> > > delete it when dump is completed.
> > > The other users doesn't add/delete the device like this.
> >
> > For good reason, I guess? I think this is probably a bad idea.
> >
> > The whole point of this was to actually know which device created the
> > coredump? If you make one up on the fly that's ... pointless? Surely you
> > must have _some_ device that already exists?
>
> Passing device name to be user space looks to be the reason.

Wait, again, why are you creating a fake device just to dump data?
That's not what this api is for at all, why are you abusing it in ways
it was not designed to be used?

And I will strongly argue, that if no in-kernel users are having
problems, perhaps it is your out-of-tree code?

Unless you can show any in-kernel user of this trigging the issue, I
don't think there's anything we need to do here, do you?

thanks,

greg k-h