2019-05-16 14:26:36

by Muchun Song

Subject: [PATCH v4] driver core: Fix use-after-free and double free on glue directory

There is a race condition between removing a glue directory and adding a
new device under that glue directory. It can be reproduced as follows:

path 1: Add the child device under glue dir
device_add()
    get_device_parent()
        mutex_lock(&gdp_mutex);
        ....
        /* find parent from glue_dirs.list */
        list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
            if (k->parent == parent_kobj) {
                kobj = kobject_get(k);
                break;
            }
        ....
        mutex_unlock(&gdp_mutex);
        ....
    ....
    kobject_add()
        kobject_add_internal()
            create_dir()
                sysfs_create_dir_ns()
                    if (kobj->parent)
                        parent = kobj->parent->sd;
                    ....
                    kernfs_create_dir_ns(parent)
                        kernfs_new_node()
                            kernfs_get(parent)
                        ....
                        /* link in */
                        rc = kernfs_add_one(kn);
                        if (!rc)
                            return kn;

                        kernfs_put(kn)
                            ....
                            repeat:
                            kmem_cache_free(kn)
                            kn = parent;

                            if (kn) {
                                if (atomic_dec_and_test(&kn->count))
                                    goto repeat;
                            }
                        ....

path 2: Remove last child device under glue dir
device_del()
    cleanup_device_parent()
        cleanup_glue_dir()
            mutex_lock(&gdp_mutex);
            if (!kobject_has_children(glue_dir))
                kobject_del(glue_dir);
            kobject_put(glue_dir);
            mutex_unlock(&gdp_mutex);

Suppose path 1 adds a new device under the glue dir before path 2 removes
the last child device. The glue_dir kobject reference count is raised to
2 by kobject_get(k) in get_device_parent(). Path 1 has then called
kernfs_new_node(), but has not yet called kernfs_get(parent). Meanwhile,
path 2 calls kobject_del(glue_dir) because kobject_has_children() returns
0. As a result, glue_dir->sd is freed and its reference count drops to 0.
When path 1 then calls kernfs_get(parent), it triggers the warning in
kernfs_get() (WARN_ON(!atomic_read(&kn->count))) and raises the reference
count back to 1. Because glue_dir->sd has already been freed by path 2,
the following kernfs_add_one() call in path 1 fails (this is also a
use-after-free), and atomic_dec_and_test() is called to drop the
reference. Because the count reaches 0 again, kmem_cache_free() is called
on glue_dir->sd a second time, which is a double free.
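
The reference-count sequence described above can be replayed in a small
single-threaded user-space model. This is only a sketch under simplifying
assumptions: struct node, node_get(), node_put() and replay_race() are
hypothetical names rather than kernel APIs, frees are counted instead of
performed, and glue_dir->sd is assumed to hold exactly one reference
before path 2 runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kernfs_node; only what the model needs. */
struct node {
	int count;           /* models kn->count */
	int times_freed;     /* counts kmem_cache_free() calls instead of freeing */
	struct node *parent;
};

static int warnings;     /* how often the WARN_ON in node_get() would fire */

/* Models kernfs_get(): taking a reference on a zero-count node is a bug. */
static void node_get(struct node *kn)
{
	if (kn->count == 0)
		warnings++;  /* WARN_ON(!atomic_read(&kn->count)) */
	kn->count++;
}

/* Models the kernfs_put() loop: free the node, then put its parent. */
static void node_put(struct node *kn)
{
	while (kn && --kn->count == 0) {
		struct node *parent = kn->parent;

		kn->times_freed++;  /* stands in for kmem_cache_free(kn) */
		kn = parent;
	}
}

/* Replays the interleaving from the commit message on one thread. */
static void replay_race(struct node *glue_sd, struct node *child)
{
	/* Assumed initial state: glue_sd holds a single reference. */
	glue_sd->count = 1;
	glue_sd->times_freed = 0;
	glue_sd->parent = NULL;

	/* path 2: kobject_del(glue_dir) drops the last reference. */
	node_put(glue_sd);          /* count 1 -> 0, freed once */

	/* path 1: kernfs_new_node() references the stale parent. */
	child->count = 1;
	child->times_freed = 0;
	child->parent = glue_sd;
	node_get(glue_sd);          /* warning fires, count 0 -> 1 */

	/* path 1: kernfs_add_one() fails, so the new node is put again. */
	node_put(child);            /* frees child, then glue_sd a second time */
}
```

In this model, replay_race() leaves warnings at 1 and the glue node's
times_freed at 2, matching the WARN_ON followed by the slub BUG_ON in the
call traces.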

To avoid this, ensure that the lookup of the glue dir and the creation of
the child object(s) are done under a single instance of gdp_mutex, so we
never see a stale "empty" but still potentially used glue dir around.
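
The locking shape this describes can be sketched in user space as follows.
It is a simplified illustration, not the kernel code: struct glue_dir and
the glue_*() helpers are hypothetical names, a pthread mutex stands in for
gdp_mutex, and the child count is decremented under the lock purely to
keep the model small.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a glue directory. */
struct glue_dir {
	bool registered;   /* still linked into the modelled sysfs tree */
	int children;      /* number of child devices below it */
};

static pthread_mutex_t gdp_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Look up (or re-create) the glue dir and return with gdp_mutex held,
 * loosely mirroring get_device_parent_locked() for the glue-dir case.
 */
static struct glue_dir *glue_get_locked(struct glue_dir *g)
{
	pthread_mutex_lock(&gdp_mutex);
	if (!g->registered)
		g->registered = true;   /* models class_dir_create_and_add() */
	return g;
}

/* Add the child while the lookup's lock is still held, then unlock. */
static void glue_add_child_and_unlock(struct glue_dir *g)
{
	g->children++;                  /* models kobject_add() of the child */
	pthread_mutex_unlock(&gdp_mutex);
}

/* Models cleanup_glue_dir(): delete the dir only once it has no children. */
static void glue_remove_child(struct glue_dir *g)
{
	pthread_mutex_lock(&gdp_mutex);
	g->children--;
	if (g->children == 0)
		g->registered = false;  /* models kobject_del(glue_dir) */
	pthread_mutex_unlock(&gdp_mutex);
}
```

Because the removal path takes the same mutex, it can only observe the
glue dir either before the lookup or after the child has been added, never
in between.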

The following call trace was captured on kernel 4.14 with the following
patch applied:

commit 726e41097920 ("drivers: core: Remove glue dirs from sysfs earlier")

--------------------------------------------------------------------------
[ 3.633703] WARNING: CPU: 4 PID: 513 at .../fs/kernfs/dir.c:494
This is the WARN_ON(!atomic_read(&kn->count)) in kernfs_get().
....
[ 3.633986] Call trace:
[ 3.633991] kernfs_create_dir_ns+0xa8/0xb0
[ 3.633994] sysfs_create_dir_ns+0x54/0xe8
[ 3.634001] kobject_add_internal+0x22c/0x3f0
[ 3.634005] kobject_add+0xe4/0x118
[ 3.634011] device_add+0x200/0x870
[ 3.634017] _request_firmware+0x958/0xc38
[ 3.634020] request_firmware_into_buf+0x4c/0x70
....
[ 3.634064] kernel BUG at .../mm/slub.c:294!
This is the BUG_ON(object == fp) in set_freepointer().
....
[ 3.634346] Call trace:
[ 3.634351] kmem_cache_free+0x504/0x6b8
[ 3.634355] kernfs_put+0x14c/0x1d8
[ 3.634359] kernfs_create_dir_ns+0x88/0xb0
[ 3.634362] sysfs_create_dir_ns+0x54/0xe8
[ 3.634366] kobject_add_internal+0x22c/0x3f0
[ 3.634370] kobject_add+0xe4/0x118
[ 3.634374] device_add+0x200/0x870
[ 3.634378] _request_firmware+0x958/0xc38
[ 3.634381] request_firmware_into_buf+0x4c/0x70
--------------------------------------------------------------------------

Fixes: 726e41097920 ("drivers: core: Remove glue dirs from sysfs earlier")

Signed-off-by: Muchun Song <[email protected]>
---

Change in v4:
1. Add some kerneldoc comments.
2. Remove unlock_if_glue_dir().
3. Rename get_device_parent_locked_if_glue_dir() to
get_device_parent_locked.
4. Update commit message.
Change in v3:
Add change log.
Change in v2:
Fix device_move() also.

drivers/base/core.c | 108 +++++++++++++++++++++++++++++++++++++-------
1 file changed, 92 insertions(+), 16 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 4aeaa0c92bda..2251e391a352 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1739,8 +1739,23 @@ class_dir_create_and_add(struct class *class, struct kobject *parent_kobj)

static DEFINE_MUTEX(gdp_mutex);

-static struct kobject *get_device_parent(struct device *dev,
- struct device *parent)
+/**
+ * __get_device_parent() - Get the parent device kobject.
+ * @dev: Pointer to the device structure.
+ * @parent: Pointer to the parent device structure.
+ * @lock: When we live in a glue directory, should we hold the
+ * gdp_mutex lock when this function returns? If @lock
+ * is true, this function returns with the gdp_mutex
+ * held. Otherwise it will not.
+ *
+ * Note: Only when we live in a glue directory and @lock is
+ * true, the function will return with the gdp_mutex held.
+ * In this case, the caller is responsible for releasing the
+ * gdp_mutex lock.
+ */
+static struct kobject *__get_device_parent(struct device *dev,
+ struct device *parent,
+ bool lock)
{
if (dev->class) {
struct kobject *kobj = NULL;
@@ -1778,16 +1793,32 @@ static struct kobject *get_device_parent(struct device *dev,
break;
}
spin_unlock(&dev->class->p->glue_dirs.list_lock);
- if (kobj) {
- mutex_unlock(&gdp_mutex);
- return kobj;
- }

- /* or create a new class-directory at the parent device */
- k = class_dir_create_and_add(dev->class, parent_kobj);
- /* do not emit an uevent for this simple "glue" directory */
- mutex_unlock(&gdp_mutex);
- return k;
+ /**
+ * If not found, create a new class-directory at the
+ * parent device and do not emit an uevent for this
+ * simple "glue" directory.
+ */
+ if (!kobj)
+ kobj = class_dir_create_and_add(dev->class,
+ parent_kobj);
+
+ /**
+ * If the caller wants to add/move a new directory
+ * under the glue directory next, we should leave
+ * the function with the gdp_mutex held, and then
+ * release the gdp_mutex lock after adding/moving
+ * the new directory.
+ *
+ * This ensures that the lookup of the glue dir
+ * and creation of the child object(s) are done
+ * under a single instance of gdp_mutex so we never
+ * see a stale "empty" but still potentially used
+ * glue dir around.
+ */
+ if (!lock)
+ mutex_unlock(&gdp_mutex);
+ return kobj;
}

/* subsystems can specify a default root directory for their devices */
@@ -1799,6 +1830,36 @@ static struct kobject *get_device_parent(struct device *dev,
return NULL;
}

+static inline struct kobject *get_device_parent(struct device *dev,
+ struct device *parent)
+{
+ return __get_device_parent(dev, parent, false);
+}
+
+/**
+ * Note: When this function returns successfully, the gdp_mutex
+ * lock may be held (when we live in a glue directory). The caller
+ * can determine whether we hold the lock by live_in_glue_dir().
+ *
+ * If true is returned by live_in_glue_dir(), the caller should
+ * drop the gdp_mutex lock.
+ */
+static inline struct kobject *get_device_parent_locked(struct device *dev,
+ struct device *parent)
+{
+ struct kobject *kobj = __get_device_parent(dev, parent, true);
+
+ /**
+ * When creating a new glue directory fails, there
+ * is no need for us to leave the function with
+ * the gdp_mutex held.
+ */
+ if (IS_ERR(kobj))
+ mutex_unlock(&gdp_mutex);
+
+ return kobj;
+}
+
static inline bool live_in_glue_dir(struct kobject *kobj,
struct device *dev)
{
@@ -2040,7 +2101,7 @@ int device_add(struct device *dev)
pr_debug("device: '%s': %s\n", dev_name(dev), __func__);

parent = get_device(dev->parent);
- kobj = get_device_parent(dev, parent);
+ kobj = get_device_parent_locked(dev, parent);
if (IS_ERR(kobj)) {
error = PTR_ERR(kobj);
goto parent_error;
@@ -2055,10 +2116,17 @@ int device_add(struct device *dev)
/* first, register with generic layer. */
/* we require the name to be set before, and pass NULL */
error = kobject_add(&dev->kobj, dev->kobj.parent, NULL);
- if (error) {
- glue_dir = get_glue_dir(dev);
+
+ glue_dir = get_glue_dir(dev);
+ /**
+ * Drops the mutex possibly acquired by get_device_parent_locked().
+ * See the comment in __get_device_parent().
+ */
+ if (live_in_glue_dir(glue_dir, dev))
+ mutex_unlock(&gdp_mutex);
+
+ if (error)
goto Error;
- }

/* notify platform of device entry */
error = device_platform_notify(dev, KOBJ_ADD);
@@ -2972,7 +3040,7 @@ int device_move(struct device *dev, struct device *new_parent,

device_pm_lock();
new_parent = get_device(new_parent);
- new_parent_kobj = get_device_parent(dev, new_parent);
+ new_parent_kobj = get_device_parent_locked(dev, new_parent);
if (IS_ERR(new_parent_kobj)) {
error = PTR_ERR(new_parent_kobj);
put_device(new_parent);
@@ -2982,6 +3050,14 @@ int device_move(struct device *dev, struct device *new_parent,
pr_debug("device: '%s': %s: moving to '%s'\n", dev_name(dev),
__func__, new_parent ? dev_name(new_parent) : "<NULL>");
error = kobject_move(&dev->kobj, new_parent_kobj);
+
+ /**
+ * Drops the mutex possibly acquired by get_device_parent_locked().
+ * See the comment in __get_device_parent().
+ */
+ if (live_in_glue_dir(new_parent_kobj, dev))
+ mutex_unlock(&gdp_mutex);
+
if (error) {
cleanup_glue_dir(dev, new_parent_kobj);
put_device(new_parent);
--
2.17.1


2019-05-24 19:06:42

by Greg Kroah-Hartman

Subject: Re: [PATCH v4] driver core: Fix use-after-free and double free on glue directory

On Thu, May 16, 2019 at 10:23:42PM +0800, Muchun Song wrote:
> There is a race condition between removing glue directory and adding a new
> device under the glue directory. It can be reproduced in following test:

<snip>

Is this related to:
Subject: [PATCH v3] drivers: core: Remove glue dirs early only when refcount is 1

?

If so, why is the solution so different?

thanks,

greg k-h

2019-05-25 12:16:37

by Muchun Song

Subject: Re: [PATCH v4] driver core: Fix use-after-free and double free on glue directory

Hi greg k-h,

Greg KH <[email protected]> wrote on Sat, May 25, 2019 at 3:04 AM:
>
> On Thu, May 16, 2019 at 10:23:42PM +0800, Muchun Song wrote:
> > There is a race condition between removing glue directory and adding a new
> > device under the glue directory. It can be reproduced in following test:
>
> <snip>
>
> Is this related to:
> Subject: [PATCH v3] drivers: core: Remove glue dirs early only when refcount is 1
>
> ?
>
> If so, why is the solution so different?

In the v1 patch, the solution was to remove glue dirs early only when the
refcount is 1. The v1 patch looked like this:

@@ -1825,7 +1825,7 @@ static void cleanup_glue_dir(struct device *dev,
struct kobject *glue_dir)
return;

mutex_lock(&gdp_mutex);
- if (!kobject_has_children(glue_dir))
+ if (!kobject_has_children(glue_dir) && kref_read(&glue_dir->kref) == 1)
kobject_del(glue_dir);
kobject_put(glue_dir);
mutex_unlock(&gdp_mutex);
-----------------------------------------------------------------------

But Ben suggested the following:

I find relying on the object count for such decisions rather fragile as
it could be taken temporarily for other reasons, couldn't it ? In which
case we would just fail...

Ideally, the looking up of the glue dir and creation of its child
should be protected by the same lock instance (the gdp_mutex in that
case).
-----------------------------------------------------------------------

So from the v2 patch onward, a different solution based on Ben's
suggestion is used. But I forgot to update the commit message until the
v4 patch. Thanks.

Yours,
Muchun
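
One reading of the fragility Ben describes can be illustrated with a small
user-space model of the v1 condition. This is a hedged sketch: struct
glue_dir, transient_get() and cleanup_v1() are hypothetical names, and the
model only shows that the early removal is skipped when some unrelated
holder happens to own a temporary reference. It makes no claim about what
the kernel does afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the glue dir kobject. */
struct glue_dir {
	int refcount;      /* models kref_read(&glue_dir->kref) */
	int children;      /* models kobject_has_children() */
	bool registered;   /* false once kobject_del() has run */
};

/* Models an unrelated caller briefly taking a reference (kobject_get()). */
static void transient_get(struct glue_dir *g)
{
	g->refcount++;
}

/* Models the v1 cleanup_glue_dir() condition quoted above. */
static void cleanup_v1(struct glue_dir *g)
{
	if (g->children == 0 && g->refcount == 1)
		g->registered = false;  /* kobject_del(glue_dir) */
	g->refcount--;                  /* kobject_put(glue_dir) */
}
```

With refcount == 1 the empty glue dir is deleted early; after a
transient_get() the same call leaves it registered, so the outcome depends
on who else holds a reference at that moment.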

2019-06-18 13:40:46

by Muchun Song

Subject: Re: [PATCH v4] driver core: Fix use-after-free and double free on glue directory

Ping guys ? I think this is worth fixing.

Muchun Song <[email protected]> wrote on Sat, May 25, 2019 at 8:15 PM:

>
> Hi greg k-h,
>
> Greg KH <[email protected]> wrote on Sat, May 25, 2019 at 3:04 AM:
> >
> > On Thu, May 16, 2019 at 10:23:42PM +0800, Muchun Song wrote:
> > > There is a race condition between removing glue directory and adding a new
> > > device under the glue directory. It can be reproduced in following test:
> >
> > <snip>
> >
> > Is this related to:
> > Subject: [PATCH v3] drivers: core: Remove glue dirs early only when refcount is 1
> >
> > ?
> >
> > If so, why is the solution so different?
>
> In the v1 patch, the solution is that remove glue dirs early only when
> refcount is 1. So
> the v1 patch like below:
>
> @@ -1825,7 +1825,7 @@ static void cleanup_glue_dir(struct device *dev,
> struct kobject *glue_dir)
> return;
>
> mutex_lock(&gdp_mutex);
> - if (!kobject_has_children(glue_dir))
> + if (!kobject_has_children(glue_dir) && kref_read(&glue_dir->kref) == 1)
> kobject_del(glue_dir);
> kobject_put(glue_dir);
> mutex_unlock(&gdp_mutex);
> -----------------------------------------------------------------------
>
> But from Ben's suggestion as below:
>
> I find relying on the object count for such decisions rather fragile as
> it could be taken temporarily for other reasons, couldn't it ? In which
> case we would just fail...
>
> Ideally, the looking up of the glue dir and creation of its child
> should be protected by the same lock instance (the gdp_mutex in that
> case).
> -----------------------------------------------------------------------
>
> So another solution is used from Ben's suggestion in the v2 patch. But
> I forgot to update the commit message until the v4 patch. Thanks.
>
> Yours,
> Muchun

2019-06-18 14:13:38

by Benjamin Herrenschmidt

Subject: Re: [PATCH v4] driver core: Fix use-after-free and double free on glue directory

On Tue, 2019-06-18 at 21:40 +0800, Muchun Song wrote:
> Ping guys ? I think this is worth fixing.

I agree :-)

My opinion hasn't changed though: the right fix isn't making guesses
based on the refcount but solving the actual race, which is the mutex
being dropped between looking for the object's existence and deciding to
create it :-)

Cheers,
Ben.

> Muchun Song <[email protected]> wrote on Sat, May 25, 2019 at 8:15 PM:
>
> >
> > Hi greg k-h,
> >
> > Greg KH <[email protected]> wrote on Sat, May 25, 2019 at 3:04 AM:
> > >
> > > On Thu, May 16, 2019 at 10:23:42PM +0800, Muchun Song wrote:
> > > > There is a race condition between removing glue directory and
> > > > adding a new
> > > > device under the glue directory. It can be reproduced in
> > > > following test:
> > >
> > > <snip>
> > >
> > > Is this related to:
> > > Subject: [PATCH v3] drivers: core: Remove glue dirs early
> > > only when refcount is 1
> > >
> > > ?
> > >
> > > If so, why is the solution so different?
> >
> > In the v1 patch, the solution is that remove glue dirs early only
> > when
> > refcount is 1. So
> > the v1 patch like below:
> >
> > @@ -1825,7 +1825,7 @@ static void cleanup_glue_dir(struct device
> > *dev,
> > struct kobject *glue_dir)
> > return;
> >
> > mutex_lock(&gdp_mutex);
> > - if (!kobject_has_children(glue_dir))
> > + if (!kobject_has_children(glue_dir) && kref_read(&glue_dir-
> > >kref) == 1)
> > kobject_del(glue_dir);
> > kobject_put(glue_dir);
> > mutex_unlock(&gdp_mutex);
> > -----------------------------------------------------------------
> > ------
> >
> > But from Ben's suggestion as below:
> >
> > I find relying on the object count for such decisions rather
> > fragile as
> > it could be taken temporarily for other reasons, couldn't it ? In
> > which
> > case we would just fail...
> >
> > Ideally, the looking up of the glue dir and creation of its child
> > should be protected by the same lock instance (the gdp_mutex in
> > that
> > case).
> > -----------------------------------------------------------------
> > ------
> >
> > So another solution is used from Ben's suggestion in the v2 patch.
> > But
> > I forgot to update the commit message until the v4 patch. Thanks.
> >
> > Yours,
> > Muchun

2019-06-18 15:29:39

by Greg Kroah-Hartman

Subject: Re: [PATCH v4] driver core: Fix use-after-free and double free on glue directory

On Tue, Jun 18, 2019 at 09:40:13PM +0800, Muchun Song wrote:
> Ping guys ? I think this is worth fixing.

That's great (no context here), but I need people to actually agree on
what the correct fix should be. I had two different patches that were
saying they fixed the same issue, and that feels really wrong.

So can everyone actually agree on one patch please?

thanks,

greg k-h

2019-06-18 16:10:35

by Muchun Song

Subject: Re: [PATCH v4] driver core: Fix use-after-free and double free on glue directory

Greg KH <[email protected]> wrote on Tue, Jun 18, 2019 at 11:29 PM:
>
> On Tue, Jun 18, 2019 at 09:40:13PM +0800, Muchun Song wrote:
> > Ping guys ? I think this is worth fixing.
>
> That's great (no context here), but I need people to actually agree on
> what the correct fix should be. I had two different patches that were
> saying they fixed the same issue, and that feels really wrong.

Another patch:
Subject: [PATCH v3] drivers: core: Remove glue dirs early only
when refcount is 1

My first v1 patch:
Subject: [PATCH] driver core: Fix use-after-free and double free
on glue directory

The two patches above are almost the same: both fixes are based on the
refcount. As for why the solution changed between v1 and v4, see the
discussion in this mail thread:

Subject: [PATCH] driver core: Fix use-after-free and double free
on glue directory

Thanks.

Yours,
Muchun

2019-06-18 16:14:27

by Greg Kroah-Hartman

Subject: Re: [PATCH v4] driver core: Fix use-after-free and double free on glue directory

On Wed, Jun 19, 2019 at 12:09:40AM +0800, Muchun Song wrote:
> Greg KH <[email protected]> wrote on Tue, Jun 18, 2019 at 11:29 PM:
> >
> > On Tue, Jun 18, 2019 at 09:40:13PM +0800, Muchun Song wrote:
> > > Ping guys ? I think this is worth fixing.
> >
> > That's great (no context here), but I need people to actually agree on
> > what the correct fix should be. I had two different patches that were
> > saying they fixed the same issue, and that feels really wrong.
>
> Another patch:
> Subject: [PATCH v3] drivers: core: Remove glue dirs early only
> when refcount is 1
>
> My first v1 patch:
> Subject: [PATCH] driver core: Fix use-after-free and double free
> on glue directory
>
> The above two patches are almost the same that fix is based on the refcount.
> But why we change the solution from v1 to v4? Some discussion can
> refer to the mail:
>
> Subject: [PATCH] driver core: Fix use-after-free and double free
> on glue directory

Again, I am totally confused and do not see a patch in an email that I
can apply...

Someone needs to get people to agree here...

thanks,

greg k-h

2019-06-18 21:53:55

by Benjamin Herrenschmidt

Subject: Re: [PATCH v4] driver core: Fix use-after-free and double free on glue directory

On Tue, 2019-06-18 at 18:13 +0200, Greg KH wrote:
>
> Again, I am totally confused and do not see a patch in an email that
> I
> can apply...
>
> Someone needs to get people to agree here...

I think he was hoping you would choose which solution you preferred here
:-) His original or the one I suggested instead. I don't think there's
anybody else with understanding of sysfs guts around to form an
opinion.

Cheers,
Ben.


2019-06-25 15:07:38

by Muchun Song

Subject: Re: [PATCH v4] driver core: Fix use-after-free and double free on glue directory

Benjamin Herrenschmidt <[email protected]> wrote on Wed, Jun 19, 2019 at 5:51 AM:
>
> On Tue, 2019-06-18 at 18:13 +0200, Greg KH wrote:
> >
> > Again, I am totally confused and do not see a patch in an email that
> > I
> > can apply...
> >
> > Someone needs to get people to agree here...
>
> I think he was hoping you would chose which solution you prefered here

Yeah, right, I am hoping you would choose which solution you prefer here.
Thanks.

> :-) His original or the one I suggested instead. I don't think there's
> anybody else with understanding of sysfs guts around to form an
> opinion.
>

Yours,
Muchun

2019-06-25 22:58:39

by Benjamin Herrenschmidt

Subject: Re: [PATCH v4] driver core: Fix use-after-free and double free on glue directory

On Tue, 2019-06-25 at 23:06 +0800, Muchun Song wrote:
> Benjamin Herrenschmidt <[email protected]> wrote on Wed, Jun 19, 2019 at 5:51 AM:
> >
> > On Tue, 2019-06-18 at 18:13 +0200, Greg KH wrote:
> > >
> > > Again, I am totally confused and do not see a patch in an email
> > > that
> > > I
> > > can apply...
> > >
> > > Someone needs to get people to agree here...
> >
> > I think he was hoping you would chose which solution you prefered
> > here
>
> Yeah, right, I am hoping you would chose which solution you prefered
> here.
> Thanks.
>
> > :-) His original or the one I suggested instead. I don't think
> > there's
> > anybody else with understanding of sysfs guts around to form an
> > opinion.
> >

Muchun, I don't think Greg still has the previous emails. He deals with
too much to keep track of old stuff.

Can you send both patches tagged as [OPT1] and [OPT2] along with a
comment in one go so Greg can see both and decide ?

I think looking at the refcount is fragile. I might be wrong, but I
think it mostly papers over the root of the problem, which is the fact
that the lock isn't taken across both operations, thus exposing the
race. But I'm happy if Greg prefers your approach as long as it's
fixed.

Cheers,
Ben.

2019-06-26 00:59:33

by Greg Kroah-Hartman

Subject: Re: [PATCH v4] driver core: Fix use-after-free and double free on glue directory

On Wed, Jun 26, 2019 at 08:56:00AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2019-06-25 at 23:06 +0800, Muchun Song wrote:
> > Benjamin Herrenschmidt <[email protected]> wrote on Wed, Jun 19, 2019 at 5:51 AM:
> > >
> > > On Tue, 2019-06-18 at 18:13 +0200, Greg KH wrote:
> > > >
> > > > Again, I am totally confused and do not see a patch in an email
> > > > that
> > > > I
> > > > can apply...
> > > >
> > > > Someone needs to get people to agree here...
> > >
> > > I think he was hoping you would chose which solution you prefered
> > > here
> >
> > Yeah, right, I am hoping you would chose which solution you prefered
> > here.
> > Thanks.
> >
> > > :-) His original or the one I suggested instead. I don't think
> > > there's
> > > anybody else with understanding of sysfs guts around to form an
> > > opinion.
> > >
>
> Muchun, I don't think Greg still has the previous emails. He deals with
> too much to keep track of old stuff.
>
> Can you send both patches tagged as [OPT1] and [OPT2] along with a
> comment in one go so Greg can see both and decide ?

That would be wonderful, thank you, as I can't really find the "latest"
versions of both options.

> I think looking at the refcount is fragile, I might be wrong, but I
> think it mostly paper over the root of the problem which is the fact
> that the lock isn't taken accross both operations, thus exposing the
> race. But I'm happy if Greg prefers your approach as long as it's
> fixed.

I'll look at them and try to figure this out next week, thanks.

greg k-h