2014-11-06 07:35:09

by Yijing Wang

[permalink] [raw]
Subject: [PATCH] sysfs: driver core: Fix glue dir race condition

There is a race condition when removing glue directory.
It can be reproduced in following test:

path 1: Add first child device
device_add()
get_device_parent()
/*find parent from glue_dirs.list*/
list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
if (k->parent == parent_kobj) {
kobj = kobject_get(k);
break;
}
....
class_dir_create_and_add()

path2: Remove last child device under glue dir
device_del()
cleanup_device_parent()
cleanup_glue_dir()
kobject_put(glue_dir);

If path2 has been called cleanup_glue_dir(), but not
call kobject_put(glue_dir), the glue dir is still
in parent's kset list. Meanwhile, path1 find the glue
dir from the glue_dirs.list. Path2 may release glue dir
before path1 call kobject_get(). So kernel will report
the warning and bug_on.

This fix keep glue dir around once it created suggested
by Tejun Heo.

The following calltrace is captured in kernel 3.4, but
the latest kernel still has this bug.

-----------------------------------------------------
<4>[ 3965.441471] WARNING: at ...include/linux/kref.h:41 kobject_get+0x33/0x40()
<4>[ 3965.441474] Hardware name: Romley
<4>[ 3965.441475] Modules linked in: isd_iop(O) isd_xda(O)...
...
<4>[ 3965.441605] Call Trace:
<4>[ 3965.441611] [<ffffffff8103717a>] warn_slowpath_common+0x7a/0xb0
<4>[ 3965.441615] [<ffffffff810371c5>] warn_slowpath_null+0x15/0x20
<4>[ 3965.441618] [<ffffffff81215963>] kobject_get+0x33/0x40
<4>[ 3965.441624] [<ffffffff812d1e45>] get_device_parent.isra.11+0x135/0x1f0
<4>[ 3965.441627] [<ffffffff812d22d4>] device_add+0xd4/0x6d0
<4>[ 3965.441631] [<ffffffff812d0dbc>] ? dev_set_name+0x3c/0x40
....
<2>[ 3965.441912] kernel BUG at ..../fs/sysfs/group.c:65!
<4>[ 3965.441915] invalid opcode: 0000 [#1] SMP
...
<4>[ 3965.686743] [<ffffffff811a677e>] sysfs_create_group+0xe/0x10
<4>[ 3965.686748] [<ffffffff810cfb04>] blk_trace_init_sysfs+0x14/0x20
<4>[ 3965.686753] [<ffffffff811fcabb>] blk_register_queue+0x3b/0x120
<4>[ 3965.686756] [<ffffffff812030bc>] add_disk+0x1cc/0x490
....
-------------------------------------------------------

Signed-off-by: Yijing Wang <[email protected]>
Signed-off-by: Weng Meiling <[email protected]>
Cc: <[email protected]> #3.4+
---
drivers/base/core.c | 21 +--------------------
1 files changed, 1 insertions(+), 20 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 14d1629..5b7a3e9 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -761,7 +761,7 @@ static struct kobject *get_device_parent(struct device *dev,
spin_lock(&dev->class->p->glue_dirs.list_lock);
list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
if (k->parent == parent_kobj) {
- kobj = kobject_get(k);
+ kobj = k;
break;
}
spin_unlock(&dev->class->p->glue_dirs.list_lock);
@@ -786,21 +786,6 @@ static struct kobject *get_device_parent(struct device *dev,
return NULL;
}

-static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
-{
- /* see if we live in a "glue" directory */
- if (!glue_dir || !dev->class ||
- glue_dir->kset != &dev->class->p->glue_dirs)
- return;
-
- kobject_put(glue_dir);
-}
-
-static void cleanup_device_parent(struct device *dev)
-{
- cleanup_glue_dir(dev, dev->kobj.parent);
-}
-
static int device_add_class_symlinks(struct device *dev)
{
int error;
@@ -1094,7 +1079,6 @@ done:
kobject_uevent(&dev->kobj, KOBJ_REMOVE);
kobject_del(&dev->kobj);
Error:
- cleanup_device_parent(dev);
if (parent)
put_device(parent);
name_error:
@@ -1215,7 +1199,6 @@ void device_del(struct device *dev)
blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
BUS_NOTIFY_REMOVED_DEVICE, dev);
kobject_uevent(&dev->kobj, KOBJ_REMOVE);
- cleanup_device_parent(dev);
kobject_del(&dev->kobj);
put_device(parent);
}
@@ -1873,7 +1856,6 @@ int device_move(struct device *dev, struct device *new_parent,
__func__, new_parent ? dev_name(new_parent) : "<NULL>");
error = kobject_move(&dev->kobj, new_parent_kobj);
if (error) {
- cleanup_glue_dir(dev, new_parent_kobj);
put_device(new_parent);
goto out;
}
@@ -1902,7 +1884,6 @@ int device_move(struct device *dev, struct device *new_parent,
set_dev_node(dev, dev_to_node(old_parent));
}
}
- cleanup_glue_dir(dev, new_parent_kobj);
put_device(new_parent);
goto out;
}
--
1.7.1


2014-11-06 16:55:58

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] sysfs: driver core: Fix glue dir race condition

Maybe "fix glue dir race condition by not removing them" is a better
title?

On Thu, Nov 06, 2014 at 04:16:38PM +0800, Yijing Wang wrote:
> There is a race condition when removing glue directory.
> It can be reproduced in following test:
>
> path 1: Add first child device
> device_add()
> get_device_parent()
> /*find parent from glue_dirs.list*/
> list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
> if (k->parent == parent_kobj) {
> kobj = kobject_get(k);
> break;
> }
> ....
> class_dir_create_and_add()
>
> path2: Remove last child device under glue dir
> device_del()
> cleanup_device_parent()
> cleanup_glue_dir()
> kobject_put(glue_dir);
>
> If path2 has been called cleanup_glue_dir(), but not
> call kobject_put(glue_dir), the glue dir is still
> in parent's kset list. Meanwhile, path1 find the glue
> dir from the glue_dirs.list. Path2 may release glue dir
> before path1 call kobject_get(). So kernel will report
> the warning and bug_on.
>
> This fix keep glue dir around once it created suggested
> by Tejun Heo.

I think you prolly want to explain why this is okay / desired.
e.g. list how the glue dir is used and how many of them are there and
explain that there's no real benefit in removing them.

...
> Signed-off-by: Yijing Wang <[email protected]>
> Signed-off-by: Weng Meiling <[email protected]>
> Cc: <[email protected]> #3.4+

Except for the above nits.

Reviewed-by: Tejun Heo <[email protected]>

--
tejun

2014-11-06 17:22:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] sysfs: driver core: Fix glue dir race condition

On Thu, Nov 06, 2014 at 11:55:47AM -0500, Tejun Heo wrote:
> Maybe "fix glue dir race condition by not removing them" is a better
> title?
>
> On Thu, Nov 06, 2014 at 04:16:38PM +0800, Yijing Wang wrote:
> > There is a race condition when removing glue directory.
> > It can be reproduced in following test:
> >
> > path 1: Add first child device
> > device_add()
> > get_device_parent()
> > /*find parent from glue_dirs.list*/
> > list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
> > if (k->parent == parent_kobj) {
> > kobj = kobject_get(k);
> > break;
> > }
> > ....
> > class_dir_create_and_add()
> >
> > path2: Remove last child device under glue dir
> > device_del()
> > cleanup_device_parent()
> > cleanup_glue_dir()
> > kobject_put(glue_dir);
> >
> > If path2 has been called cleanup_glue_dir(), but not
> > call kobject_put(glue_dir), the glue dir is still
> > in parent's kset list. Meanwhile, path1 find the glue
> > dir from the glue_dirs.list. Path2 may release glue dir
> > before path1 call kobject_get(). So kernel will report
> > the warning and bug_on.
> >
> > This fix keep glue dir around once it created suggested
> > by Tejun Heo.
>
> I think you prolly want to explain why this is okay / desired.
> e.g. list how the glue dir is used and how many of them are there and
> explain that there's no real benefit in removing them.

I'd really _like_ to remove them if at all possible, as if there isn't
any "children" in the subdirectory, there shouldn't be a need for that
directory to be there.

This seems to be the "classic" problem we have of a kref in a list that
can be found while the last instance could be removed at the same time.
I hate to just throw another lock at the problem, but wouldn't a lock to
protect the list of glue_dirs be the answer here?

thanks,

greg k-h

2014-11-07 01:24:10

by Yijing Wang

[permalink] [raw]
Subject: Re: [PATCH] sysfs: driver core: Fix glue dir race condition

On 2014/11/7 0:55, Tejun Heo wrote:
> Maybe "fix glue dir race condition by not removing them" is a better
> title?

Yes, it's better, thank you!

>
> On Thu, Nov 06, 2014 at 04:16:38PM +0800, Yijing Wang wrote:
>> There is a race condition when removing glue directory.
>> It can be reproduced in following test:
>>
>> path 1: Add first child device
>> device_add()
>> get_device_parent()
>> /*find parent from glue_dirs.list*/
>> list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
>> if (k->parent == parent_kobj) {
>> kobj = kobject_get(k);
>> break;
>> }
>> ....
>> class_dir_create_and_add()
>>
>> path2: Remove last child device under glue dir
>> device_del()
>> cleanup_device_parent()
>> cleanup_glue_dir()
>> kobject_put(glue_dir);
>>
>> If path2 has been called cleanup_glue_dir(), but not
>> call kobject_put(glue_dir), the glue dir is still
>> in parent's kset list. Meanwhile, path1 find the glue
>> dir from the glue_dirs.list. Path2 may release glue dir
>> before path1 call kobject_get(). So kernel will report
>> the warning and bug_on.
>>
>> This fix keep glue dir around once it created suggested
>> by Tejun Heo.
>
> I think you prolly want to explain why this is okay / desired.
> e.g. list how the glue dir is used and how many of them are there and
> explain that there's no real benefit in removing them.
>

Right, I will add the explanation. :)

> ...
>> Signed-off-by: Yijing Wang <[email protected]>
>> Signed-off-by: Weng Meiling <[email protected]>
>> Cc: <[email protected]> #3.4+
>
> Except for the above nits.
>
> Reviewed-by: Tejun Heo <[email protected]>

Thanks!

>


--
Thanks!
Yijing

2014-11-07 01:47:21

by Yijing Wang

[permalink] [raw]
Subject: Re: [PATCH] sysfs: driver core: Fix glue dir race condition

On 2014/11/7 1:22, Greg KH wrote:
> On Thu, Nov 06, 2014 at 11:55:47AM -0500, Tejun Heo wrote:
>> Maybe "fix glue dir race condition by not removing them" is a better
>> title?
>>
>> On Thu, Nov 06, 2014 at 04:16:38PM +0800, Yijing Wang wrote:
>>> There is a race condition when removing glue directory.
>>> It can be reproduced in following test:
>>>
>>> path 1: Add first child device
>>> device_add()
>>> get_device_parent()
>>> /*find parent from glue_dirs.list*/
>>> list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
>>> if (k->parent == parent_kobj) {
>>> kobj = kobject_get(k);
>>> break;
>>> }
>>> ....
>>> class_dir_create_and_add()
>>>
>>> path2: Remove last child device under glue dir
>>> device_del()
>>> cleanup_device_parent()
>>> cleanup_glue_dir()
>>> kobject_put(glue_dir);
>>>
>>> If path2 has been called cleanup_glue_dir(), but not
>>> call kobject_put(glue_dir), the glue dir is still
>>> in parent's kset list. Meanwhile, path1 find the glue
>>> dir from the glue_dirs.list. Path2 may release glue dir
>>> before path1 call kobject_get(). So kernel will report
>>> the warning and bug_on.
>>>
>>> This fix keep glue dir around once it created suggested
>>> by Tejun Heo.
>>
>> I think you prolly want to explain why this is okay / desired.
>> e.g. list how the glue dir is used and how many of them are there and
>> explain that there's no real benefit in removing them.
>
> I'd really _like_ to remove them if at all possible, as if there isn't
> any "children" in the subdirectory, there shouldn't be a need for that
> directory to be there.
>
> This seems to be the "classic" problem we have of a kref in a list that
> can be found while the last instance could be removed at the same time.
> I hate to just throw another lock at the problem, but wouldn't a lock to
> protect the list of glue_dirs be the answer here?

Hi Greg, in this case, we need to protect the race condition between traverse dev->class->p->glue_dirs.list
and kobject_put(glue_dir) in cleanup_glue_dir().

glue_dirs.list_lock only used to protect glue_dirs.list, but what we want to protect is
don't call kobject_put(glue_dir) to decrease glue_dir ref count during we traverse
dev->class->p->glue_dirs.list.


---------------------------------------------------------------------------
/* find our class-directory at the parent and reference it */
spin_lock(&dev->class->p->glue_dirs.list_lock);
list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry) ------>A
if (k->parent == parent_kobj) {
kobj = kobject_get(k);
break;
}
spin_unlock(&dev->class->p->glue_dirs.list_lock);
------------------------------------------------------------------------------
static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
{
/* see if we live in a "glue" directory */
if (!glue_dir || !dev->class ||
glue_dir->kset != &dev->class->p->glue_dirs)
return;

kobject_put(glue_dir); --------------->B
}
------------------------------------------------------------------------------


Tejun introduced a mutex gdp_mutex in commit 77d3d7c1d561f49 to fix the race condition in get_device_parent().
We could reuse the mutex to fix the race condition between glue_dirs.list traverse and kobject_put(glue_dir).

Greg, the two solutions (reuse the gdp_mutex and don't remove glue_dir), which one do you prefer ?


diff --git a/drivers/base/core.c b/drivers/base/core.c
index 28b808c..645eacf 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -724,12 +724,12 @@ class_dir_create_and_add(struct class *class, struct kobject *parent_kobj)
return &dir->kobj;
}

+static DEFINE_MUTEX(gdp_mutex);

static struct kobject *get_device_parent(struct device *dev,
struct device *parent)
{
if (dev->class) {
- static DEFINE_MUTEX(gdp_mutex);
struct kobject *kobj = NULL;
struct kobject *parent_kobj;
struct kobject *k;
@@ -793,7 +793,9 @@ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
glue_dir->kset != &dev->class->p->glue_dirs)
return;

+ mutex_lock(&gdp_mutex);
kobject_put(glue_dir);
+ mutex_unlock(&gdp_mutex);
}

static void cleanup_device_parent(struct device *dev)









>
> thanks,
>
> greg k-h
>
> .
>


--
Thanks!
Yijing

2014-11-07 02:48:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] sysfs: driver core: Fix glue dir race condition

On Fri, Nov 07, 2014 at 09:44:40AM +0800, Yijing Wang wrote:
> On 2014/11/7 1:22, Greg KH wrote:
> > On Thu, Nov 06, 2014 at 11:55:47AM -0500, Tejun Heo wrote:
> >> Maybe "fix glue dir race condition by not removing them" is a better
> >> title?
> >>
> >> On Thu, Nov 06, 2014 at 04:16:38PM +0800, Yijing Wang wrote:
> >>> There is a race condition when removing glue directory.
> >>> It can be reproduced in following test:
> >>>
> >>> path 1: Add first child device
> >>> device_add()
> >>> get_device_parent()
> >>> /*find parent from glue_dirs.list*/
> >>> list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
> >>> if (k->parent == parent_kobj) {
> >>> kobj = kobject_get(k);
> >>> break;
> >>> }
> >>> ....
> >>> class_dir_create_and_add()
> >>>
> >>> path2: Remove last child device under glue dir
> >>> device_del()
> >>> cleanup_device_parent()
> >>> cleanup_glue_dir()
> >>> kobject_put(glue_dir);
> >>>
> >>> If path2 has been called cleanup_glue_dir(), but not
> >>> call kobject_put(glue_dir), the glue dir is still
> >>> in parent's kset list. Meanwhile, path1 find the glue
> >>> dir from the glue_dirs.list. Path2 may release glue dir
> >>> before path1 call kobject_get(). So kernel will report
> >>> the warning and bug_on.
> >>>
> >>> This fix keep glue dir around once it created suggested
> >>> by Tejun Heo.
> >>
> >> I think you prolly want to explain why this is okay / desired.
> >> e.g. list how the glue dir is used and how many of them are there and
> >> explain that there's no real benefit in removing them.
> >
> > I'd really _like_ to remove them if at all possible, as if there isn't
> > any "children" in the subdirectory, there shouldn't be a need for that
> > directory to be there.
> >
> > This seems to be the "classic" problem we have of a kref in a list that
> > can be found while the last instance could be removed at the same time.
> > I hate to just throw another lock at the problem, but wouldn't a lock to
> > protect the list of glue_dirs be the answer here?
>
> Hi Greg, in this case, we need to protect the race condition between traverse dev->class->p->glue_dirs.list
> and kobject_put(glue_dir) in cleanup_glue_dir().
>
> glue_dirs.list_lock only used to protect glue_dirs.list, but what we want to protect is
> don't call kobject_put(glue_dir) to decrease glue_dir ref count during we traverse
> dev->class->p->glue_dirs.list.
>
>
> ---------------------------------------------------------------------------
> /* find our class-directory at the parent and reference it */
> spin_lock(&dev->class->p->glue_dirs.list_lock);
> list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry) ------>A
> if (k->parent == parent_kobj) {
> kobj = kobject_get(k);
> break;
> }
> spin_unlock(&dev->class->p->glue_dirs.list_lock);
> ------------------------------------------------------------------------------
> static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
> {
> /* see if we live in a "glue" directory */
> if (!glue_dir || !dev->class ||
> glue_dir->kset != &dev->class->p->glue_dirs)
> return;
>
> kobject_put(glue_dir); --------------->B
> }
> ------------------------------------------------------------------------------
>
>
> Tejun introduced a mutex gdp_mutex in commit 77d3d7c1d561f49 to fix the race condition in get_device_parent().
> We could reuse the mutex to fix the race condition between glue_dirs.list traverse and kobject_put(glue_dir).
>
> Greg, the two solutions (reuse the gdp_mutex and don't remove glue_dir), which one do you prefer ?
>
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 28b808c..645eacf 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -724,12 +724,12 @@ class_dir_create_and_add(struct class *class, struct kobject *parent_kobj)
> return &dir->kobj;
> }
>
> +static DEFINE_MUTEX(gdp_mutex);
>
> static struct kobject *get_device_parent(struct device *dev,
> struct device *parent)
> {
> if (dev->class) {
> - static DEFINE_MUTEX(gdp_mutex);
> struct kobject *kobj = NULL;
> struct kobject *parent_kobj;
> struct kobject *k;
> @@ -793,7 +793,9 @@ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
> glue_dir->kset != &dev->class->p->glue_dirs)
> return;
>
> + mutex_lock(&gdp_mutex);
> kobject_put(glue_dir);
> + mutex_unlock(&gdp_mutex);
> }
>
> static void cleanup_device_parent(struct device *dev)
>

I much prefer this patch over the other one, as it keeps the same
behavior as today, and fixes the existing bug.

Have you tested it out to see if it works properly? If so, can you
resend it in a "proper" form so I can apply it?

thanks,

greg k-h

2014-11-07 03:13:21

by Yijing Wang

[permalink] [raw]
Subject: Re: [PATCH] sysfs: driver core: Fix glue dir race condition

>> +static DEFINE_MUTEX(gdp_mutex);
>>
>> static struct kobject *get_device_parent(struct device *dev,
>> struct device *parent)
>> {
>> if (dev->class) {
>> - static DEFINE_MUTEX(gdp_mutex);
>> struct kobject *kobj = NULL;
>> struct kobject *parent_kobj;
>> struct kobject *k;
>> @@ -793,7 +793,9 @@ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
>> glue_dir->kset != &dev->class->p->glue_dirs)
>> return;
>>
>> + mutex_lock(&gdp_mutex);
>> kobject_put(glue_dir);
>> + mutex_unlock(&gdp_mutex);
>> }
>>
>> static void cleanup_device_parent(struct device *dev)
>>
>
> I much prefer this patch over the other one, as it keeps the same
> behavior as today, and fixes the existing bug.
>
> Have you tested it out to see if it works properly? If so, can you
> resend it in a "proper" form so I can apply it?

Yes, we tested it in our system, I will resend it now, thanks!

>
> thanks,
>
> greg k-h
>
> .
>


--
Thanks!
Yijing

2014-11-07 05:52:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] sysfs: driver core: Fix glue dir race condition

On Fri, Nov 07, 2014 at 11:12:19AM +0800, Yijing Wang wrote:
> >> +static DEFINE_MUTEX(gdp_mutex);
> >>
> >> static struct kobject *get_device_parent(struct device *dev,
> >> struct device *parent)
> >> {
> >> if (dev->class) {
> >> - static DEFINE_MUTEX(gdp_mutex);
> >> struct kobject *kobj = NULL;
> >> struct kobject *parent_kobj;
> >> struct kobject *k;
> >> @@ -793,7 +793,9 @@ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
> >> glue_dir->kset != &dev->class->p->glue_dirs)
> >> return;
> >>
> >> + mutex_lock(&gdp_mutex);
> >> kobject_put(glue_dir);
> >> + mutex_unlock(&gdp_mutex);
> >> }
> >>
> >> static void cleanup_device_parent(struct device *dev)
> >>
> >
> > I much prefer this patch over the other one, as it keeps the same
> > behavior as today, and fixes the existing bug.
> >
> > Have you tested it out to see if it works properly? If so, can you
> > resend it in a "proper" form so I can apply it?
>
> Yes, we tested it in our system, I will resend it now, thanks!

Wonderful, thanks for that, and persisting with this. I'll queue up
that patch tomorrow morning.

greg k-h