2022-04-25 20:52:54

by Mukesh Ojha

[permalink] [raw]
Subject: [PATCH v2 ] devcoredump : Serialize devcd_del work

In the following scenario (see diagram), one thread X running
dev_coredumpm() adds a devcd device to the framework, which sends a
uevent notification to userspace; another thread Y reads this uevent
and calls devcd_data_write(), which eventually tries to delete a
queued timer that is not initialized/queued yet.

So, debug objects report a warning; in the meantime, the timer is
initialized and queued from the X path. From the Y path it then gets
reinitialized again, timer->entry.pprev is set to NULL, and
try_to_grab_pending() gets stuck.

To fix this, introduce a mutex to serialize the behaviour.

cpu0(X)                                cpu1(Y)

dev_coredump() uevent sent to userspace
device_add()  ======================> userspace process Y reads the uevents
                                      writes to devcd fd which
                                      results into writes to

                                      devcd_data_write()
                                        mod_delayed_work()
                                          try_to_grab_pending()
                                            del_timer()
                                              debug_assert_init()
INIT_DELAYED_WORK
schedule_delayed_work
                                              debug_object_fixup()
                                                timer_fixup_assert_init()
                                                  timer_setup()
                                                    do_init_timer() ==> reinitializes the
                                                                        timer to
                                                                        timer->entry.pprev=NULL

                                              timer_pending()
                                                !hlist_unhashed_lockless(&timer->entry)
                                                  !h->pprev ==> del_timer() checks
                                                                and finds it to be NULL
                                              try_to_grab_pending() gets stuck

Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Mukesh Ojha <[email protected]>
---
v1->v2:
- Added del_wk_queued to serialize the race between devcd_data_write()
and disabled_store().

drivers/base/devcoredump.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index f4d794d..3e6fd6b 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -25,6 +25,8 @@ struct devcd_entry {
 	struct device devcd_dev;
 	void *data;
 	size_t datalen;
+	struct mutex mutex;
+	bool del_wk_queued;
 	struct module *owner;
 	ssize_t (*read)(char *buffer, loff_t offset, size_t count,
 			void *data, size_t datalen);
@@ -84,7 +86,12 @@ static ssize_t devcd_data_write(struct file *filp, struct kobject *kobj,
 	struct device *dev = kobj_to_dev(kobj);
 	struct devcd_entry *devcd = dev_to_devcd(dev);
 
-	mod_delayed_work(system_wq, &devcd->del_wk, 0);
+	mutex_lock(&devcd->mutex);
+	if (!devcd->del_wk_queued) {
+		devcd->del_wk_queued = true;
+		mod_delayed_work(system_wq, &devcd->del_wk, 0);
+	}
+	mutex_unlock(&devcd->mutex);
 
 	return count;
 }
@@ -112,7 +119,12 @@ static int devcd_free(struct device *dev, void *data)
 {
 	struct devcd_entry *devcd = dev_to_devcd(dev);
 
+	mutex_lock(&devcd->mutex);
+	if (!devcd->del_wk_queued)
+		devcd->del_wk_queued = true;
+
 	flush_delayed_work(&devcd->del_wk);
+	mutex_unlock(&devcd->mutex);
 	return 0;
 }
 
@@ -278,13 +290,15 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 	devcd->read = read;
 	devcd->free = free;
 	devcd->failing_dev = get_device(dev);
-
+	mutex_init(&devcd->mutex);
 	device_initialize(&devcd->devcd_dev);
 
 	dev_set_name(&devcd->devcd_dev, "devcd%d",
 		     atomic_inc_return(&devcd_count));
 	devcd->devcd_dev.class = &devcd_class;
 
+	mutex_lock(&devcd->mutex);
+	devcd->del_wk_queued = false;
 	if (device_add(&devcd->devcd_dev))
 		goto put_device;
 
@@ -301,10 +315,11 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 
 	INIT_DELAYED_WORK(&devcd->del_wk, devcd_del);
 	schedule_delayed_work(&devcd->del_wk, DEVCD_TIMEOUT);
-
+	mutex_unlock(&devcd->mutex);
 	return;
  put_device:
 	put_device(&devcd->devcd_dev);
+	mutex_unlock(&devcd->mutex);
  put_module:
 	module_put(owner);
  free:
--
2.7.4


2022-04-25 22:22:34

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v2 ] devcoredump : Serialize devcd_del work

On Mon, Apr 25 2022 at 18:39, Mukesh Ojha wrote:
> v1->v2:
> - Added del_wk_queued to serialize the race between devcd_data_write()
> and disabled_store().

How so?

Neither the flag nor the mutex can prevent a race against the work
being executed in parallel.

disabled_store()                            worker()

class_for_each_device(&devcd_class, NULL, NULL, devcd_free)
  ...
  while ((dev = class_dev_iter_next(&iter)) {
                                            devcd_del()
                                              device_del()
                                                put_device() <- last reference
    error = fn(dev, data)                   devcd_dev_release()
      devcd_free(dev, data)                   kfree(devcd)
        mutex_lock(&devcd->mutex);


There is zero protection of the class iterator against the work being
executed and removing the device and freeing its data. IOW, at the
point where fn(), i.e. devcd_free(), dereferences 'dev' to acquire the
mutex, it might be gone. No?

If disabled_store() really needs to flush all instances immediately,
then it requires global serialization, not device specific serialization.

Johannes, can you please explain whether this immediate flush in
disabled_store() is really required and if so, why?

Thanks,

tglx

2022-04-26 00:26:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 ] devcoredump : Serialize devcd_del work

On Mon, Apr 25, 2022 at 06:39:53PM +0530, Mukesh Ojha wrote:
> In following scenario(diagram), when one thread X running dev_coredumpm() adds devcd
> device to the framework which sends uevent notification to userspace
> and another thread Y reads this uevent and call to devcd_data_write()
> which eventually try to delete the queued timer that is not initialized/queued yet.
>
> So, debug object reports some warning and in the meantime, timer is initialized
> and queued from X path. and from Y path, it gets reinitialized again and
> timer->entry.pprev=NULL and try_to_grab_pending() stucks.

Nit, please wrap your lines at 72 columns like git asked you to when
you made the commit.

>
> To fix this, introduce mutex to serialize the behaviour.
>
> cpu0(X)                                cpu1(Y)
>
> dev_coredump() uevent sent to userspace
> device_add()  ======================> userspace process Y reads the uevents
>                                       writes to devcd fd which
>                                       results into writes to
>
>                                       devcd_data_write()
>                                         mod_delayed_work()
>                                           try_to_grab_pending()
>                                             del_timer()
>                                               debug_assert_init()
> INIT_DELAYED_WORK
> schedule_delayed_work
>                                               debug_object_fixup()
>                                                 timer_fixup_assert_init()
>                                                   timer_setup()
>                                                     do_init_timer() ==> reinitialized the
>                                                                         timer to
>                                                                         timer->entry.pprev=NULL
>
>                                               timer_pending()
>                                                 !hlist_unhashed_lockless(&timer->entry)
>                                                   !h->pprev ==> del_timer checks
>                                                                 and finds it to be NULL
>                                               try_to_grab_pending() stucks.

Mix of tabs and spaces? This can all go left a bit as well.

>
> Link: https://lore.kernel.org/lkml/[email protected]/
> Signed-off-by: Mukesh Ojha <[email protected]>
> ---
> v1->v2:
> - Added del_wk_queued to serialize the race between devcd_data_write()
> and disabled_store().
>
> drivers/base/devcoredump.c | 21 ++++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
> index f4d794d..3e6fd6b 100644
> --- a/drivers/base/devcoredump.c
> +++ b/drivers/base/devcoredump.c
> @@ -25,6 +25,8 @@ struct devcd_entry {
>  	struct device devcd_dev;
>  	void *data;
>  	size_t datalen;
> +	struct mutex mutex;

Document what this lock is for here please. I think checkpatch asks you
for that, right?

> +	bool del_wk_queued;

Please spell this out better, you can use vowels :)

>  	struct module *owner;
>  	ssize_t (*read)(char *buffer, loff_t offset, size_t count,
>  			void *data, size_t datalen);
> @@ -84,7 +86,12 @@ static ssize_t devcd_data_write(struct file *filp, struct kobject *kobj,
>  	struct device *dev = kobj_to_dev(kobj);
>  	struct devcd_entry *devcd = dev_to_devcd(dev);
>
> -	mod_delayed_work(system_wq, &devcd->del_wk, 0);
> +	mutex_lock(&devcd->mutex);
> +	if (!devcd->del_wk_queued) {
> +		devcd->del_wk_queued = true;
> +		mod_delayed_work(system_wq, &devcd->del_wk, 0);
> +	}
> +	mutex_unlock(&devcd->mutex);
>
>  	return count;
>  }
> @@ -112,7 +119,12 @@ static int devcd_free(struct device *dev, void *data)
>  {
>  	struct devcd_entry *devcd = dev_to_devcd(dev);
>
> +	mutex_lock(&devcd->mutex);
> +	if (!devcd->del_wk_queued)
> +		devcd->del_wk_queued = true;
> +
>  	flush_delayed_work(&devcd->del_wk);
> +	mutex_unlock(&devcd->mutex);
>  	return 0;
>  }
>
> @@ -278,13 +290,15 @@ void dev_coredumpm(struct device *dev, struct module *owner,
>  	devcd->read = read;
>  	devcd->free = free;
>  	devcd->failing_dev = get_device(dev);
> -
> +	mutex_init(&devcd->mutex);

Why drop the blank line?

>  	device_initialize(&devcd->devcd_dev);
>
>  	dev_set_name(&devcd->devcd_dev, "devcd%d",
>  		     atomic_inc_return(&devcd_count));
>  	devcd->devcd_dev.class = &devcd_class;
>
> +	mutex_lock(&devcd->mutex);

Why lock this here?

> +	devcd->del_wk_queued = false;

This was already set to false above, right? And if you want to
explicitly initialize it, do it where the other variables are
initialized up by mutex_init() please.

thanks,

greg k-h

2022-04-26 02:19:57

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v2 ] devcoredump : Serialize devcd_del work

Cc+: Kees

On Mon, Apr 25 2022 at 19:19, Johannes Berg wrote:
> On Mon, 2022-04-25 at 19:00 +0200, Thomas Gleixner wrote:
>>
>> Johannes, can you please explain whether this immediate flush in
>> disabled_store() is really required and if so, why?
>>
> I don't really know, as I remember that requirement (or maybe even code,
> not sure) came from Kees, who needed the lockdown.
>
> Given the use case (ChromeOS?) I'm not sure I see a need to flush all of
> them, since I guess a typical system would set the lockdown early in
> boot and hopefully not have a crash-dump around already.
>
> That said, I don't think the diagram you made works - fn() during the
> iteration is guaranteed to be invoked with a reference of its own, so
> the put_device() there can't be the last reference, only as fn() returns
> you'd put the last reference *there*, freeing it.

Bah, you are right, it's magically protected by the klist ref, which
prevents devcd from going away. Damned obvious.

This really needs comments why this all can magically "work".

Thanks,

tglx




2022-04-26 17:28:35

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2 ] devcoredump : Serialize devcd_del work

On Mon, 2022-04-25 at 19:00 +0200, Thomas Gleixner wrote:
>
> Johannes, can you please explain whether this immediate flush in
> disabled_store() is really required and if so, why?
>

I don't really know, as I remember that requirement (or maybe even code,
not sure) came from Kees, who needed the lockdown.

Given the use case (ChromeOS?) I'm not sure I see a need to flush all of
them, since I guess a typical system would set the lockdown early in
boot and hopefully not have a crash-dump around already.


That said, I don't think the diagram you made works - fn() during the
iteration is guaranteed to be invoked with a reference of its own, so
the put_device() there can't be the last reference, only as fn() returns
you'd put the last reference *there*, freeing it.

johannes

2022-04-27 10:10:07

by Mukesh Ojha

[permalink] [raw]
Subject: Re: [PATCH v2 ] devcoredump : Serialize devcd_del work



On 4/26/2022 1:07 AM, Thomas Gleixner wrote:
> Cc+: Kees
>
> On Mon, Apr 25 2022 at 19:19, Johannes Berg wrote:
>> On Mon, 2022-04-25 at 19:00 +0200, Thomas Gleixner wrote:
>>>
>>> Johannes, can you please explain whether this immediate flush in
>>> disabled_store() is really required and if so, why?
>>>
>> I don't really know, as I remember that requirement (or maybe even code,
>> not sure) came from Kees, who needed the lockdown.
>>
>> Given the use case (ChromeOS?) I'm not sure I see a need to flush all of
>> them, since I guess a typical system would set the lockdown early in
>> boot and hopefully not have a crash-dump around already.
>>
>> That said, I don't think the diagram you made works - fn() during the
>> iteration is guaranteed to be invoked with a reference of its own, so
>> the put_device() there can't be the last reference, only as fn() returns
>> you'd put the last reference *there*, freeing it.
>
> Bah, you are right, it's magically protected by the klist ref, which
> prevents devcd from going away. Damned obvious.
>
> This really needs comments why this all can magically "work".
>
> Thanks,
>
> tglx
>

Thank you all for your time reviewing this.
I tried to address a few of the comments in v3 here:

https://lore.kernel.org/lkml/[email protected]/

Meanwhile, we would still like to hear from Kees about the reason for
the immediate flush in disabled_store().

Regards,
-Mukesh

2022-04-27 11:11:49

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2 ] devcoredump : Serialize devcd_del work

On Tue, Apr 26, 2022 at 07:34:07PM +0530, Mukesh Ojha wrote:
>
>
> On 4/26/2022 1:07 AM, Thomas Gleixner wrote:
> > Cc+: Kees
> >
> > On Mon, Apr 25 2022 at 19:19, Johannes Berg wrote:
> > > On Mon, 2022-04-25 at 19:00 +0200, Thomas Gleixner wrote:
> > > >
> > > > Johannes, can you please explain whether this immediate flush in
> > > > disabled_store() is really required and if so, why?
> > > >
> > > I don't really know, as I remember that requirement (or maybe even code,
> > > not sure) came from Kees, who needed the lockdown.
> > >
> > > Given the use case (ChromeOS?) I'm not sure I see a need to flush all of
> > > them, since I guess a typical system would set the lockdown early in
> > > boot and hopefully not have a crash-dump around already.
> > >
> > > That said, I don't think the diagram you made works - fn() during the
> > > iteration is guaranteed to be invoked with a reference of its own, so
> > > the put_device() there can't be the last reference, only as fn() returns
> > > you'd put the last reference *there*, freeing it.
> >
> > Bah, you are right, it's magically protected by the klist ref, which
> > prevents devcd from going away. Damned obvious.
> >
> > This really needs comments why this all can magically "work".
> >
> > Thanks,
> >
> > tglx
> >
>
> Thanks you all for your time in reviewing this.
> I tried to address few comments in v3 here.
>
> https://lore.kernel.org/lkml/[email protected]/
>
> While, we would like to hear from Kees about reason of immediate flush from
> disabled_store().

This is lost to ancient history in my brain right now. Do you have any
references to past threads on this? The only thing I remember about
device memory dumping was just to make sure that lockdown's
CONFIDENTIALITY mode would block it.

--
Kees Cook

2022-04-27 12:23:12

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2 ] devcoredump : Serialize devcd_del work

On Tue, 2022-04-26 at 14:25 -0700, Kees Cook wrote:
> >
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > While, we would like to hear from Kees about reason of immediate flush from
> > disabled_store().
>
> This is lost to ancient history in my brain right now. Do you have any
> references to past threads on this? The only thing I remember about
> device memory dumping was just to make sure that lockdown's
> CONFIDENTIALITY mode would block it.
>

Hm, no, I don't, sorry. I don't even have an email record of it now,
perhaps it went to my @intel.com and has long expired?

Anyway, I seem to remember you asking to have a disable bit and I
implemented it in commit d45333294da8, which you reviewed, but I have no
record of you asking for it even though I think you did :)

johannes