2019-04-30 13:57:44

by Prateek Sood

[permalink] [raw]
Subject: [PATCH] drivers: core: Remove glue dirs early only when refcount is 1

While loading firmware blobs parallely in different threads, it is possible
to free sysfs node of glue_dirs in device_del() from a thread while another
thread is trying to add subdir from device_add() in glue_dirs sysfs node.

CPU1 CPU2
_request_firmware_load()
device_add()
get_device_parent()
class_dir_create_and_add()
kobject_add_internal()
create_dir() // glue_dir

_request_firmware_load()
device_add()
get_device_parent()
kobject_get() //glue_dir

device_del()
cleanup_glue_dir()
kobject_del()

kobject_add()
kobject_add_internal()
create_dir() // in glue_dir
kernfs_create_dir_ns()

sysfs_remove_dir() //glue_dir->sd=NULL
sysfs_put() // free glue_dir->sd

kernfs_new_node()
kernfs_get(glue_dir)

Fix this race by making sure that kernfs_node for glue_dir is released only
when refcount for glue_dir kobj is 1.

Signed-off-by: Prateek Sood <[email protected]>
---
drivers/base/core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 4aeaa0c..3955d07 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1820,12 +1820,15 @@ static inline struct kobject *get_glue_dir(struct device *dev)
*/
static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
{
+ unsigned int refcount;
+
/* see if we live in a "glue" directory */
if (!live_in_glue_dir(glue_dir, dev))
return;

mutex_lock(&gdp_mutex);
- if (!kobject_has_children(glue_dir))
+ refcount = kref_read(&glue_dir->kref);
+ if (!kobject_has_children(glue_dir) && !--refcount)
kobject_del(glue_dir);
kobject_put(glue_dir);
mutex_unlock(&gdp_mutex);
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


2019-05-01 04:25:44

by Prateek Sood

[permalink] [raw]
Subject: [PATCH V2] drivers: core: Remove glue dirs early only when refcount is 1

While loading firmware blobs parallely in different threads, it is possible
to free sysfs node of glue_dirs in device_del() from a thread while another
thread is trying to add subdir from device_add() in glue_dirs sysfs node.

CPU1 CPU2
fw_load_sysfs_fallback()
device_add()
get_device_parent()
class_dir_create_and_add()
kobject_add_internal()
create_dir() // glue_dir

fw_load_sysfs_fallback()
device_add()
get_device_parent()
kobject_get() //glue_dir

device_del()
cleanup_glue_dir()
kobject_del()

kobject_add()
kobject_add_internal()
create_dir() // in glue_dir
kernfs_create_dir_ns()

sysfs_remove_dir() //glue_dir->sd=NULL
sysfs_put() // free glue_dir->sd

kernfs_new_node()
kernfs_get(glue_dir)

Fix this race by making sure that kernfs_node for glue_dir is released only
when refcount for glue_dir kobj is 1.

Signed-off-by: Prateek Sood <[email protected]>
---
drivers/base/core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 4aeaa0c..3955d07 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1820,12 +1820,15 @@ static inline struct kobject *get_glue_dir(struct device *dev)
*/
static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
{
+ unsigned int refcount;
+
/* see if we live in a "glue" directory */
if (!live_in_glue_dir(glue_dir, dev))
return;

mutex_lock(&gdp_mutex);
- if (!kobject_has_children(glue_dir))
+ refcount = kref_read(&glue_dir->kref);
+ if (!kobject_has_children(glue_dir) && !--refcount)
kobject_del(glue_dir);
kobject_put(glue_dir);
mutex_unlock(&gdp_mutex);
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2019-05-01 06:54:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH V2] drivers: core: Remove glue dirs early only when refcount is 1

On Wed, May 01, 2019 at 09:52:47AM +0530, Prateek Sood wrote:
> While loading firmware blobs parallely in different threads, it is possible
> to free sysfs node of glue_dirs in device_del() from a thread while another
> thread is trying to add subdir from device_add() in glue_dirs sysfs node.
>
> CPU1 CPU2
> fw_load_sysfs_fallback()
> device_add()
> get_device_parent()
> class_dir_create_and_add()
> kobject_add_internal()
> create_dir() // glue_dir
>
> fw_load_sysfs_fallback()
> device_add()
> get_device_parent()
> kobject_get() //glue_dir
>
> device_del()
> cleanup_glue_dir()
> kobject_del()
>
> kobject_add()
> kobject_add_internal()
> create_dir() // in glue_dir
> kernfs_create_dir_ns()
>
> sysfs_remove_dir() //glue_dir->sd=NULL
> sysfs_put() // free glue_dir->sd
>
> kernfs_new_node()
> kernfs_get(glue_dir)
>
> Fix this race by making sure that kernfs_node for glue_dir is released only
> when refcount for glue_dir kobj is 1.
>
> Signed-off-by: Prateek Sood <[email protected]>
> ---
> drivers/base/core.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)

What changed from v1? That always has to go below the --- line.

v3 please.

2019-05-01 12:01:53

by Prateek Sood

[permalink] [raw]
Subject: [PATCH v3] drivers: core: Remove glue dirs early only when refcount is 1

While loading firmware blobs parallely in different threads, it is possible
to free sysfs node of glue_dirs in device_del() from a thread while another
thread is trying to add subdir from device_add() in glue_dirs sysfs node.

CPU1 CPU2
fw_load_sysfs_fallback()
device_add()
get_device_parent()
class_dir_create_and_add()
kobject_add_internal()
create_dir() // glue_dir

fw_load_sysfs_fallback()
device_add()
get_device_parent()
kobject_get() //glue_dir

device_del()
cleanup_glue_dir()
kobject_del()

kobject_add()
kobject_add_internal()
create_dir() // in glue_dir
kernfs_create_dir_ns()

sysfs_remove_dir() //glue_dir->sd=NULL
sysfs_put() // free glue_dir->sd

kernfs_new_node()
kernfs_get(glue_dir)

Fix this race by making sure that kernfs_node for glue_dir is released only
when refcount for glue_dir kobj is 1.

Signed-off-by: Prateek Sood <[email protected]>

---

Changes from v2->v3:
- Added patch version change related comments.

Changes from v1->v2:
- Updated callstack from _request_firmware_load() to fw_load_sysfs_fallback().


drivers/base/core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 4aeaa0c..3955d07 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1820,12 +1820,15 @@ static inline struct kobject *get_glue_dir(struct device *dev)
*/
static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
{
+ unsigned int refcount;
+
/* see if we live in a "glue" directory */
if (!live_in_glue_dir(glue_dir, dev))
return;

mutex_lock(&gdp_mutex);
- if (!kobject_has_children(glue_dir))
+ refcount = kref_read(&glue_dir->kref);
+ if (!kobject_has_children(glue_dir) && !--refcount)
kobject_del(glue_dir);
kobject_put(glue_dir);
mutex_unlock(&gdp_mutex);
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2019-05-06 05:12:56

by Prateek Sood

[permalink] [raw]
Subject: Re: [PATCH v3] drivers: core: Remove glue dirs early only when refcount is 1

On 5/1/19 5:29 PM, Prateek Sood wrote:
> While loading firmware blobs parallely in different threads, it is possible
> to free sysfs node of glue_dirs in device_del() from a thread while another
> thread is trying to add subdir from device_add() in glue_dirs sysfs node.
>
> CPU1 CPU2
> fw_load_sysfs_fallback()
> device_add()
> get_device_parent()
> class_dir_create_and_add()
> kobject_add_internal()
> create_dir() // glue_dir
>
> fw_load_sysfs_fallback()
> device_add()
> get_device_parent()
> kobject_get() //glue_dir
>
> device_del()
> cleanup_glue_dir()
> kobject_del()
>
> kobject_add()
> kobject_add_internal()
> create_dir() // in glue_dir
> kernfs_create_dir_ns()
>
> sysfs_remove_dir() //glue_dir->sd=NULL
> sysfs_put() // free glue_dir->sd
>
> kernfs_new_node()
> kernfs_get(glue_dir)
>
> Fix this race by making sure that kernfs_node for glue_dir is released only
> when refcount for glue_dir kobj is 1.
>
> Signed-off-by: Prateek Sood <[email protected]>
>
> ---
>
> Changes from v2->v3:
> - Added patch version change related comments.
>
> Changes from v1->v2:
> - Updated callstack from _request_firmware_load() to fw_load_sysfs_fallback().
>
>
> drivers/base/core.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 4aeaa0c..3955d07 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1820,12 +1820,15 @@ static inline struct kobject *get_glue_dir(struct device *dev)
> */
> static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
> {
> + unsigned int refcount;
> +
> /* see if we live in a "glue" directory */
> if (!live_in_glue_dir(glue_dir, dev))
> return;
>
> mutex_lock(&gdp_mutex);
> - if (!kobject_has_children(glue_dir))
> + refcount = kref_read(&glue_dir->kref);
> + if (!kobject_has_children(glue_dir) && !--refcount)
> kobject_del(glue_dir);
> kobject_put(glue_dir);
> mutex_unlock(&gdp_mutex);
>

Folks,

Please share feedback on the race condition and the patch to
fix it.

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc., is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project

2019-05-06 06:24:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3] drivers: core: Remove glue dirs early only when refcount is 1

On Mon, May 06, 2019 at 10:41:34AM +0530, Prateek Sood wrote:
> On 5/1/19 5:29 PM, Prateek Sood wrote:
> > While loading firmware blobs parallely in different threads, it is possible
> > to free sysfs node of glue_dirs in device_del() from a thread while another
> > thread is trying to add subdir from device_add() in glue_dirs sysfs node.
> >
> > CPU1 CPU2
> > fw_load_sysfs_fallback()
> > device_add()
> > get_device_parent()
> > class_dir_create_and_add()
> > kobject_add_internal()
> > create_dir() // glue_dir
> >
> > fw_load_sysfs_fallback()
> > device_add()
> > get_device_parent()
> > kobject_get() //glue_dir
> >
> > device_del()
> > cleanup_glue_dir()
> > kobject_del()
> >
> > kobject_add()
> > kobject_add_internal()
> > create_dir() // in glue_dir
> > kernfs_create_dir_ns()
> >
> > sysfs_remove_dir() //glue_dir->sd=NULL
> > sysfs_put() // free glue_dir->sd
> >
> > kernfs_new_node()
> > kernfs_get(glue_dir)
> >
> > Fix this race by making sure that kernfs_node for glue_dir is released only
> > when refcount for glue_dir kobj is 1.
> >
> > Signed-off-by: Prateek Sood <[email protected]>
> >
> > ---
> >
> > Changes from v2->v3:
> > - Added patch version change related comments.
> >
> > Changes from v1->v2:
> > - Updated callstack from _request_firmware_load() to fw_load_sysfs_fallback().
> >
> >
> > drivers/base/core.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 4aeaa0c..3955d07 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -1820,12 +1820,15 @@ static inline struct kobject *get_glue_dir(struct device *dev)
> > */
> > static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
> > {
> > + unsigned int refcount;
> > +
> > /* see if we live in a "glue" directory */
> > if (!live_in_glue_dir(glue_dir, dev))
> > return;
> >
> > mutex_lock(&gdp_mutex);
> > - if (!kobject_has_children(glue_dir))
> > + refcount = kref_read(&glue_dir->kref);
> > + if (!kobject_has_children(glue_dir) && !--refcount)
> > kobject_del(glue_dir);
> > kobject_put(glue_dir);
> > mutex_unlock(&gdp_mutex);
> >
>
> Folks,
>
> Please share feedback on the race condition and the patch to
> fix it.

Please relax, we will get to this eventually, it has only been a week...

greg k-h

2019-05-24 19:06:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3] drivers: core: Remove glue dirs early only when refcount is 1

On Wed, May 01, 2019 at 05:29:59PM +0530, Prateek Sood wrote:
> While loading firmware blobs parallely in different threads, it is possible
> to free sysfs node of glue_dirs in device_del() from a thread while another
> thread is trying to add subdir from device_add() in glue_dirs sysfs node.

<snip>

Is this the same fix that:
Subject: [PATCH v4] driver core: Fix use-after-free and double free on glue directory
is also trying to fix?

Why is it doing so in such a different way?

thanks,

greg k-h