by Ming Lei

[permalink] [raw]

Subject: Re: Possible race at user mode helper in request_firmware

On Wed, 16 Mar 2016 15:21:24 +0530
Chandra Sekhar Lingutla <[email protected]> wrote:

> Hi Ming,
>
> On 03/16/2016 02:34 PM, Chandra Sekhar Lingutla wrote:
> > On 03/16/2016 02:28 PM, Ming Lei wrote:
> >> On Wed, Mar 16, 2016 at 4:48 PM, Chandra Sekhar Lingutla
> >> <[email protected]> wrote:
> >>> On 03/16/2016 02:12 PM, Ming Lei wrote:
> >>>>
> >>>> On Wed, Mar 16, 2016 at 3:02 PM, Chandra Sekhar Lingutla
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>> Hi Ming,
> >>>>>
> >>>>>
> >>>>> On 03/16/2016 07:36 AM, Ming Lei wrote:
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Mar 15, 2016 at 11:09 PM, Ming Lei <[email protected]>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> On Mon, Mar 14, 2016 at 8:00 PM, <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I see possible race in _request_firmware_load function, on which I
> >>>>>>>> wanted to
> >>>>>>>> take your opinion.
> >>>>>>>>
> >>>>>>>> When system is going to low power mode, device_cache_fw_images() is
> >>>>>>>> called
> >>>>>>>> from pm notifier which schedules asynchronous workers to cache devices
> >>>>>>>> firmware.
> >>>>>>>> If more than 2 async requests falls under user helper mode, then there
> >>>>>>>> is a
> >>>>>>>> use after free of "firmware" directory kobject.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> If two requests are for getting same firmware image, only one can be
> >>>>>>> sent
> >>>>>>> out and be completed with user helper, please see
> >>>>>>> fw_lookup_and_allocate_buf(),
> >>>>>>> so the race shouldn't have been triggered in this case.
> >>>>>>>
> >>>>>>> So I guess your test must be involved with at least two different
> >>>>>>> requests.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> The race will hit, when one asynchronous worker calls device_del()
> >>>>>>>> while
> >>>>>>>> other asynchronous worker calls get_device_parent() in
> >>>>>>>> _request_firmware_load().
> >>>>>>>> After loading firmware, first worker calls device_del(), and
> >>>>>>>> "firmware"
> >>>>>>>> directory ref count is 1, so cleanup_glue_dir() calls kobject_put()
> >>>>>>>> with
> >>>>>>>> gdp_mutex lock held.
> >>>>>>>> Meanwhile, second async worker calls device_add(), and
> >>>>>>>> get_device_parent()
> >>>>>>>> is still able to find the firmware directory kobject in ksets after
> >>>>>>>> getting
> >>>>>>>> gdp_mutex.
> >>>>>>>> This kobject gets added as parent to the second device kobject.
> >>>>>>>> By the time of accessing the parent kobject, its ref count was 0 and
> >>>>>>>> prints
> >>>>>>>> below warning stack followed by crash.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> The trick here is just that we don't use 'struct device' in
> >>>>>>> device_cache_fw_images(),
> >>>>>>> then the firmware device's parent is null during calling device_add()
> >>>>>>> from
> >>>>>>> _request_firmware_load(), but this should be allowed by driver core.
> >>>>>>>
> >>>>>>> From firmware class view, both device_add()/device_del() is run with
> >>>>>>> pair,
> >>>>>>> and driver core still complains during get_device_parent(), so
> >>>>>>> something
> >>>>>>> inside driver core should be be wrong, IMO.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Thinking about the issue further, I believe it is a false positive which
> >>>>>> is
> >>>>>> caused by CONFIG_DEBUG_KOBJECT_RELEASE, follows the story:
> >>>>>>
> >>>>>> 1) device_add() path
> >>>>>>
> >>>>>> get_device_parent()
> >>>>>> ......
> >>>>>> mutex_lock(&gdp_mutex);
> >>>>>>
> >>>>>> /* find our class-directory at the parent and
> >>>>>> reference
> >>>>>> it */
> >>>>>> spin_lock(&dev->class->p->glue_dirs.list_lock);
> >>>>>> list_for_each_entry(k, &dev->class->p->glue_dirs.list,
> >>>>>> entry)
> >>>>>> if (k->parent == parent_kobj) {
> >>>>>> kobj = kobject_get(k);
> >>>>>> break;
> >>>>>> }
> >>>>>> spin_unlock(&dev->class->p->glue_dirs.list_lock);
> >>>>>> if (kobj) {
> >>>>>> mutex_unlock(&gdp_mutex);
> >>>>>> return kobj;
> >>>>>> }
> >>>>>> k = class_dir_create_and_add(dev->class, parent_kobj);
> >>>>>> mutex_unlock(&gdp_mutex);
> >>>>>> ......
> >>>>>>
> >>>>>> 2) device_del() path
> >>>>>>
> >>>>>> cleanup_device_parent()
> >>>>>> ->cleanup_glue_dir()
> >>>>>> ->mutex_lock(&gdp_mutex);
> >>>>>> ->kobject_put(glue_dir);
> >>>>>> ->kobject_cleanup()
> >>>>>> ->kobject_del()
> >>>>>> ->kobj_kset_leave()
> >>>>>> ->mutex_unlock(&gdp_mutex);
> >>>>>>
> >>>>>> So gdp_mutex is held wrt. both looking up/adding the parent kobject
> >>>>>> and removing the parent kobject. Actually with this lock, both glur_dir
> >>>>>> kobject's referece counting and joining/leaving kset are run atomically.
> >>>>>>
> >>>>>> The race you found is triggered by CONFIG_DEBUG_KOBJECT_RELEASE,
> >>>>>> which causes kobject_cleanup() to be run in workqueue context and the
> >>>>>> lock of gdp_mutex can't be hold in that situation anymore. That is the
> >>>>>> root
> >>>>>> cause for your race.
> >>>>>>
> >>>>>> That also said DEBUG_KOBJECT_RELEASE might cause trouble in
> >>>>>> some situations.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Ming
> >>>>>>
> >>>>> I agree that, CONFIG_DEBUG_KOBJECT_RELEASE can cause the race, as it
> >>>>> delays
> >>>>> call to kobject_cleanup() which deletes kobj entry in glue_dir list.
> >>>>>
> >>>>> I see the race even after disabling CONFIG_DEBUG_KOBJECT_RELEASE.
> >>>>>
> >>>>> The race is between kobject_put() and kobject_get() for the "firmware"
> >>>>> directory.
> >>>>>
> >>>>> In device_del() path, I see the race after the call of
> >>>>> cleanup_device_parent().
> >>>>> cleanup_device_parent() calls kobject_put(glue_dir) with gdp_mutex lock
> >>>>> holding,
> >>>>> So ideally, to call kobject_put/kobject_get(glue_dir) we need to acquire
> >>>>> gdp_mutex.
> >>>>>
> >>>>> but kobject_del(kobj) will not hold the gdp_mutex lock while calling
> >>>>> kobject_put(kobj->parent).
> >>>>>
> >>>>> Here is the race:
> >>>>>
> >>>>> Consider 2 async works calls request_firmware() for 2 different binaries
> >>>>> (fw1.bin, fw2.bin)
> >>>>> Async work1 got scheduled first, and it falls in user helper mode so it
> >>>>> creates "firmware" dir
> >>>>> and then creates "fw1.bin" directory to load data from user.
> >>>>> After load completion or timeout, it starts cleanup dir's, meantime async
> >>>>> work2 got scheduled and
> >>>>> it also falls in user helper mode, and story starts here:
> >>>>>
> >>>>> PATH1 req fw for "fw1.bin"
> >>>>> PATH2
> >>>>> req fw for "fw2.bin"
> >>>>> _request_firmware_load()
> >>>>> device_del(dev1)
> >>>>>
> >>>>> ->cleanup_device_parent()
> >>>>>
> >>>>> mutex_lock(&gdp_mutex);
> >>>>>
> >>>>> _request_firmware_load("fw2.bin")
> >>>>> kobject_put(kobj->parent); --> firmware ref ->1
> >>>>> ->device_add(dev2)
> >>>>> mutex_unlock(&gdp_mutex);
> >>>>> ->get_device_parent()
> >>>>> ->kobject_del()
> >>>>> mutex_lock(&gdp_mutex)
> >>>>> kobj_kset_leave(kobj); -->del "fw1.bin" kobj in kset
> >>>>> list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
> >>>>> kobject_put(kobj->parent); /* No lock :)*/
> >>>>> if (k->parent == parent_kobj) {
> >>>>>
> >>>>> ->kref_put() --> firmware ref ->0
> >>>>> kobj = kobject_get(k); -> k is "firmware"
> >>>>> dir
> >>>>> ->kboject_release()
> >>>>> break;
> >>>>>
> >>>>> ->kobject_del()
> >>>>> }
> >>>>>
> >>>>> /*here deletes firmware entry in kset*/
> >>>>> mutex_unlock(&gdp_mutex)
> >>>>>
> >>>>> ->dev2->kobj.parent = kobj -> kobj is released already
> >>>>>
> >>>>>
> >>>>>
> >>>>> PATH 1 is started removing the kobject of "fw1.bin" and its parent
> >>>>> "firmware"
> >>>>> .
> >>>>> cleanup_device_parent(dev1) makes firmware ref count to 1, by calling
> >>>>> kobject_put(firmware) with gdp_mutex held.
> >>>>>
> >>>>> and kobject_del(dev1->kobj) will again calls kobject_put(kobj->parent)
> >>>>> without holding gdp_mutex.
> >>>>>
> >>>>> In PATH2, glue_dir is able to find "firmware" kobj, as firmware dir kobj
> >>>>> is
> >>>>> deleted from the glue_dir list at kboject_release().
> >>>>> and it calls kobject_get(firmware), by the time PATH1 calls
> >>>>> kobject_release(firmware).
> >>>>>
> >>>>> So PATH2 got reference of freed firmware kobj and added as parent to
> >>>>> "fw2.bin" kobj.
> >>>>>
> >>>>> I think, here is possible fix for this race:
> >>>>>
> >>>>> ------------------------8<-------------------------------------
> >>>>> Signed-off-by: Lingutla Chandrasekhar <[email protected]>
> >>>>>
> >>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >>>>> index 5d8b713..47bc082 100644
> >>>>> --- a/drivers/base/core.c
> >>>>> +++ b/drivers/base/core.c
> >>>>> @@ -1224,7 +1224,13 @@ void device_del(struct device *dev)
> >>>>> BUS_NOTIFY_REMOVED_DEVICE,
> >>>>> dev);
> >>>>> kobject_uevent(&dev->kobj, KOBJ_REMOVE);
> >>>>> cleanup_device_parent(dev);
> >>>>> - kobject_del(&dev->kobj);
> >>>>> +
> >>>>> + if (dev->kobj.parent->kset == &dev->class->p->glue_dirs) {
> >>>>> + mutex_lock(&gdp_mutex);
> >>>>> + kobject_del(&dev->kobj);
> >>>>> + mutex_unlock(&gdp_mutex);
> >>>>> + } else
> >>>>> + kobject_del(&dev->kobj);
> >>>>> put_device(parent);
> >>>>> }
> >>>>> EXPORT_SYMBOL_GPL(device_del);
> >>>>>
> >>>>> --------------------------->8-----------------------------------
> >>>>
> >>>>
> >>>> OK, one simpler way is to move cleanup_device_parent(dev)
> >>>> after kobject_del(&dev->kobj), which looks more clean too.
> >>>>
> >>> No, We can't move that, bcz kobject_del(&dev->kobj) will clear the parent.
> >>
> >> Yeah, but I think it is better to do that and just call
> >> cleanup_glue_dir() directly
> >> because cleanup_device_parent() should have released the glue_dir.
> >>
> Sorry, i am not able understand correctly, If we reorder kobje_del and cleanup_device_parent(),
> kobject_del() clears kobj->parent, and cleanup_device_parent() gets glue_dir with dev->kobj->parent(),
> cleanup_glue_dir() returns without calling kobject_put() as glue_dir is NULL, so firmware dir becomes dangling kobj.
> Am I missing something here?

I mean something like below, and what do you think about it?

---

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0a8bdad..3f2e1f8 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -836,11 +836,31 @@ static struct kobject *get_device_parent(struct device *dev,
return NULL;
}

+static inline bool live_in_glue_dir(struct kobject *kobj,
+ struct device *dev)
+{
+ if (!kobj || !dev->class ||
+ kobj->kset != &dev->class->p->glue_dirs)
+ return true;
+ return false;
+}
+
+static inline struct kobject *get_glue_dir(struct device *dev)
+{
+ if (live_in_glue_dir(&dev->kobj, dev))
+ return dev->kobj.parent;
+ return NULL;
+}
+
+/*
+ * make sure cleaning up dir as the last step, we need to make
+ * sure .release handler of kobject is run with holding the
+ * global lock
+ */
static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
{
/* see if we live in a "glue" directory */
- if (!glue_dir || !dev->class ||
- glue_dir->kset != &dev->class->p->glue_dirs)
+ if (!live_in_glue_dir(glue_dir, dev))
return;

mutex_lock(&gdp_mutex);
@@ -848,11 +868,6 @@ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
mutex_unlock(&gdp_mutex);
}

-static void cleanup_device_parent(struct device *dev)
-{
- cleanup_glue_dir(dev, dev->kobj.parent);
-}
-
static int device_add_class_symlinks(struct device *dev)
{
struct device_node *of_node = dev_of_node(dev);
@@ -1028,6 +1043,7 @@ int device_add(struct device *dev)
struct kobject *kobj;
struct class_interface *class_intf;
int error = -EINVAL;
+ struct kobject *glue_dir = NULL;

dev = get_device(dev);
if (!dev)
@@ -1072,8 +1088,10 @@ int device_add(struct device *dev)
/* first, register with generic layer. */
/* we require the name to be set before, and pass NULL */
error = kobject_add(&dev->kobj, dev->kobj.parent, NULL);
- if (error)
+ if (error) {
+ glue_dir = get_glue_dir(dev);
goto Error;
+ }

/* notify platform of device entry */
if (platform_notify)
@@ -1154,9 +1172,10 @@ done:
device_remove_file(dev, &dev_attr_uevent);
attrError:
kobject_uevent(&dev->kobj, KOBJ_REMOVE);
+ glue_dir = get_glue_dir(dev);
kobject_del(&dev->kobj);
Error:
- cleanup_device_parent(dev);
+ cleanup_glue_dir(dev, glue_dir);
put_device(parent);
name_error:
kfree(dev->p);
@@ -1232,6 +1251,7 @@ EXPORT_SYMBOL_GPL(put_device);
void device_del(struct device *dev)
{
struct device *parent = dev->parent;
+ struct kobject *glue_dir = NULL;
struct class_interface *class_intf;

/* Notify clients of device removal. This call must come
@@ -1276,8 +1296,9 @@ void device_del(struct device *dev)
blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
BUS_NOTIFY_REMOVED_DEVICE, dev);
kobject_uevent(&dev->kobj, KOBJ_REMOVE);
- cleanup_device_parent(dev);
+ glue_dir = get_glue_dir(dev);
kobject_del(&dev->kobj);
+ cleanup_glue_dir(dev, glue_dir);
put_device(parent);
}
EXPORT_SYMBOL_GPL(device_del);

>
> >> BTW, failure path of device_add() has been broken already.
> >>
> > Yah, forgot to mention that. we need to reorder like above fix, can we?.
> >
> >>>
> >>>
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> [21147.455038] ------------[ cut here ]------------
> >>>>>>>> [21147.458910] WARNING: at <...>/kernel/include/linux/kref.h:47
> >>>>>>>> kobject_get+0x50/0x68()
> >>>>>>>> [21147.481199] Modules linked in: core_ctl(PO) qdrbg_module(O)
> >>>>>>>> qcrypto_module(O)
> >>>>>>>> [21147.481256] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O
> >>>>>>>> 3.10.84-g0957845-00427-g56a05c2 #1
> >>>>>>>> [21147.481284] Workqueue: events_unbound async_run_entry_fn
> >>>>>>>> [21147.481296] Call trace:
> >>>>>>>> [21147.481315] [<ffffffc000206b60>] dump_backtrace+0x0/0x270
> >>>>>>>> [21147.481331] [<ffffffc000206de0>] show_stack+0x10/0x1c
> >>>>>>>> [21147.481353] [<ffffffc000d569ec>] dump_stack+0x1c/0x28
> >>>>>>>> [21147.481370] [<ffffffc00021cc18>] warn_slowpath_common+0x74/0x9c
> >>>>>>>> [21147.481383] [<ffffffc00021ce4c>] warn_slowpath_null+0x14/0x20
> >>>>>>>> [21147.481397] [<ffffffc000456b10>] kobject_get+0x4c/0x68
> >>>>>>>> [21147.481416] [<ffffffc000609f64>] get_device_parent+0xa0/0x194
> >>>>>>>> [21147.481430] [<ffffffc00060a3f4>] device_add+0x100/0x600
> >>>>>>>> [21147.481446] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
> >>>>>>>> [21147.481459] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
> >>>>>>>> [21147.481473] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
> >>>>>>>> [21147.481490] [<ffffffc00061c65c>]
> >>>>>>>> __async_dev_cache_fw_image+0x14/0x54
> >>>>>>>> [21147.481505] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
> >>>>>>>> [21147.481521] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
> >>>>>>>> [21147.481535] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
> >>>>>>>> [21147.481553] [<ffffffc00023e500>] kthread+0xac/0xb8
> >>>>>>>> [21147.481563] ---[ end trace dabc98ea48b8ba59 ]---
> >>>>>>>>
> >>>>>>>> [21147.486436] Unable to handle kernel paging request at virtual
> >>>>>>>> address
> >>>>>>>> 0x6b6b6b6b6b6b6bcb
> >>>>>>>> [21147.493816] pgd = ffffffc0a982d000
> >>>>>>>> [21147.498337] [6b6b6b6b6b6b6bcb] *pgd=0000000000000000
> >>>>>>>> [21147.502287] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> >>>>>>>> [21147.507826] Modules linked in: core_ctl(PO) qdrbg_module(O)
> >>>>>>>> qcrypto_module(O)
> >>>>>>>> [21147.514951] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O
> >>>>>>>> 3.10.84-g0957845-00427-g56a05c2 #1
> >>>>>>>> [21147.524686] Workqueue: events_unbound async_run_entry_fn
> >>>>>>>> [21147.529965] task: ffffffc008138000 ti: ffffffc007880000 task.ti:
> >>>>>>>> ffffffc007880000
> >>>>>>>> [21147.537438] PC is at sysfs_create_dir+0x34/0xd4
> >>>>>>>> [21147.541949] LR is at kobject_add_internal+0x120/0x24c
> >>>>>>>> [21147.546979] pc : [<ffffffc000361fa8>] lr : [<ffffffc000456f5c>]
> >>>>>>>> pstate:
> >>>>>>>> 80000145
> >>>>>>>> [21147.996295] Process kworker/u16:8 (pid: 23935, stack limit =
> >>>>>>>> 0xffffffc007880058)
> >>>>>>>> [21148.003673] Call trace:
> >>>>>>>> [21148.006117] [<ffffffc000361fa8>] sysfs_create_dir+0x34/0xd4
> >>>>>>>> [21148.011663] [<ffffffc000456f58>] kobject_add_internal+0x11c/0x24c
> >>>>>>>> [21148.017737] [<ffffffc0004573ec>] kobject_add+0xc8/0xe4
> >>>>>>>> [21148.022863] [<ffffffc00060a410>] device_add+0x11c/0x600
> >>>>>>>> [21148.028069] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
> >>>>>>>> [21148.033886] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
> >>>>>>>> [21148.039439] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
> >>>>>>>> [21148.044822] [<ffffffc00061c65c>]
> >>>>>>>> __async_dev_cache_fw_image+0x14/0x54
> >>>>>>>> [21148.051249] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
> >>>>>>>> [21148.057064] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
> >>>>>>>> [21148.062792] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
> >>>>>>>> [21148.068263] [<ffffffc00023e500>] kthread+0xac/0xb8
> >>>>>>>> [21148.073039] Code: b5000094 14000026 d000ac54 9125a294 (7940c281)
> >>>>>>>> [21148.080312] ---[ end trace dabc98ea48b8ba5a ]---
> >>>>>>>>
> >>>>>>>> Below is the callstack where the parent is freed from the slub_debug
> >>>>>>>> info.
> >>>>>>>> 0xFFFFFFC000608D74 \\vmlinux\base/core\class_dir_release+0x0C
> >>>>>>>> 0xFFFFFFC0002FE6BC \\vmlinux\slub\free_debug_processing\fail+0x114
> >>>>>>>> 0xFFFFFFC0002FE7A0 \\vmlinux\slub\__slab_free+0x44
> >>>>>>>> 0xFFFFFFC0002FEE60 \\vmlinux\slub\kfree+0x1F8
> >>>>>>>> 0xFFFFFFC000608D70 \\vmlinux\base/core\class_dir_release+0x8
> >>>>>>>> 0xFFFFFFC000456D70 \\vmlinux\kobject\kobject_release+0x134
> >>>>>>>> 0xFFFFFFC000456E10 \\vmlinux\kobject\kobject_put+0x58
> >>>>>>>> 0xFFFFFFC000456C28 \\vmlinux\kobject\kobject_del+0x64
> >>>>>>>> 0xFFFFFFC000609D78 \\vmlinux\base/core\device_del+0x150
> >>>>>>>> 0xFFFFFFC00061C524
> >>>>>>>> \\vmlinux\firmware_class\_request_firmware\out+0x71C
> >>>>>>>> 0xFFFFFFC00061C58C \\vmlinux\firmware_class\request_firmware+0x30
> >>>>>>>> 0xFFFFFFC00061C5EC \\vmlinux\firmware_class\cache_firmware+0x54
> >>>>>>>> 0xFFFFFFC00061C65C
> >>>>>>>> \\vmlinux\firmware_class\__async_dev_cache_fw_image+0x14
> >>>>>>>> 0xFFFFFFC000243794 \\vmlinux\async\async_run_entry_fn+0x6C
> >>>>>>>> 0xFFFFFFC000237E88 \\vmlinux\workqueue\process_one_work+0x264
> >>>>>>>> 0xFFFFFFC0002392C0 \\vmlinux\workqueue\worker_thread\recheck+0x1A0
> >>>>>>>> 0xFFFFFFC00023E500 \\vmlinux\kthread\kthread+0xAC
> >>>>>>>>
> >>>>>>>> For similar type of issue, I see there is an existing fix: "sysfs:
> >>>>>>>> driver
> >>>>>>>> core: Fix glue dir race condition by gdp_mutex".
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/base/core.c?id=e4a60d139060975eb956717e4f63ae348d4d8cc5
> >>>>>>>>
> >>>>>>>> Still I am able to reproduce the issue. I have verified this race on
> >>>>>>>> kernels: 3.10, 3.18 and 4.4.
> >>>>>>>>
> >>>>>>>> I followed below procedure to reproduce the issue:
> >>>>>>>> 1. Enable "CONFIG_DEBUG_KOBJECT_RELEASE"
> >>>>>>>> 2. Use test_firmware, modified the test for async calls
> >>>>>>>> 3. Replace WARN_ON with BUG_ON in kref_get().
> >>>>>>>> I ran below script from shell:
> >>>>>>>> count=0
> >>>>>>>> while [ 1 ]
> >>>>>>>> do
> >>>>>>>> echo 3 >
> >>>>>>>> /sys/devices/virtual/misc/test_firmware/trigger_ufw
> >>>>>>>> if [ $? -ne 0 ]; then
> >>>>>>>> print "Exiting.. "
> >>>>>>>> exit 1
> >>>>>>>> fi
> >>>>>>>> count=$(($count +1))
> >>>>>>>> echo "count $count"
> >>>>>>>> done
> >>>>>>>>
> >>>>>>>> With below patch, I could reproduce the issue in the 2nd iteration.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> I will run your test script to see if there is new findings.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> -------------------------8<----------------------------------------------
> >>>>>>>>
> >>>>>>>> To reproduce the race condition, use firmware test and
> >>>>>>>> CONFIG_DEBUG_KOBJECT_RELEASE=y, which delays kobject
> >>>>>>>> releases. So that we can catch the bug easily.
> >>>>>>>>
> >>>>>>>> Enabled below flags in kernel config file:
> >>>>>>>>
> >>>>>>>> +CONFIG_DEBUG_KOBJECT_RELEASE=y
> >>>>>>>> +CONFIG_TEST_FIRMWARE=y
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Looks I can't reproduce the issue with your approach.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> diff --git a/include/linux/kref.h b/include/linux/kref.h
> >>>>>>>> index 484604d..727fb24 100644
> >>>>>>>> --- a/include/linux/kref.h
> >>>>>>>> +++ b/include/linux/kref.h
> >>>>>>>> @@ -44,7 +44,7 @@ static inline void kref_get(struct kref *kref)
> >>>>>>>> * condition when this kref is freeing by some other thread
> >>>>>>>> right
> >>>>>>>> now.
> >>>>>>>> * In this case one should use kref_get_unless_zero()
> >>>>>>>> */
> >>>>>>>> - WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
> >>>>>>>> + BUG_ON(atomic_inc_return(&kref->refcount) < 2);
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> /**
> >>>>>>>> diff --git a/lib/test_firmware.c b/lib/test_firmware.c
> >>>>>>>> old mode 100644
> >>>>>>>> new mode 100755
> >>>>>>>> index 86374c1..14c9598
> >>>>>>>> --- a/lib/test_firmware.c
> >>>>>>>> +++ b/lib/test_firmware.c
> >>>>>>>> @@ -18,6 +18,7 @@
> >>>>>>>> #include <linux/miscdevice.h>
> >>>>>>>> #include <linux/slab.h>
> >>>>>>>> #include <linux/uaccess.h>
> >>>>>>>> +#include <linux/async.h>
> >>>>>>>>
> >>>>>>>> +
> >>>>>>>> +static ASYNC_DOMAIN_EXCLUSIVE(ufw_domain);
> >>>>>>>> +
> >>>>>>>> +static void __async_ufw_work(void *name,
> >>>>>>>> + async_cookie_t cookie)
> >>>>>>>> +{
> >>>>>>>> + const struct firmware *fw;
> >>>>>>>> + const char *fw_name = name;
> >>>>>>>> + int ret;
> >>>>>>>> +
> >>>>>>>> + pr_err("requesting %s\n", fw_name);
> >>>>>>>> +
> >>>>>>>> + ret = request_firmware(&fw, fw_name, NULL);
> >>>>>>>> + pr_err("loaded: %zu\n", fw ? fw->size : 0);
> >>>>>>>> + if (!ret)
> >>>>>>>> + kfree(fw);
> >>>>>>>> +
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static char *name[] = { "fw1.bin", "fw2.bin"};
> >>>>>>>> +
> >>>>>>>> +static ssize_t trigger_ufw_store(struct device *dev,
> >>>>>>>> + struct device_attribute *attr,
> >>>>>>>> + const char *buf, size_t count)
> >>>>>>>> +{
> >>>>>>>> + int rc, i, c=0;
> >>>>>>>> + u32 start;
> >>>>>>>> + char *fw_name;
> >>>>>>>> +
> >>>>>>>> + rc = kstrtou32(buf, 0, &start);
> >>>>>>>> + if (rc){
> >>>>>>>> + pr_err("Invalid option\n");
> >>>>>>>> + return rc;
> >>>>>>>> + }
> >>>>>>>> + pr_err(" no of iterations %d\n", start);
> >>>>>>>> +
> >>>>>>>> + for (i=0; i< start; i++)
> >>>>>>>> + {
> >>>>>>>> + if (c >= 2)
> >>>>>>>> + c = 0;
> >>>>>>>> + fw_name = name[c++];
> >>>>>>>> + async_schedule_domain(__async_ufw_work, (void *)fw_name,
> >>>>>>>> &ufw_domain);
> >>>>>>>> + }
> >>>>>>>> + async_synchronize_full_domain(&ufw_domain);
> >>>>>>>> + return count;
> >>>>>>>> +}
> >>>>>>>>
> >>>>>>>> static DEVICE_ATTR_WO(trigger_request);
> >>>>>>>> +static DEVICE_ATTR_WO(trigger_ufw);
> >>>>>>>>
> >>>>>>>> static int __init test_firmware_init(void)
> >>>>>>>> {
> >>>>>>>> @@ -92,6 +140,12 @@ static int __init test_firmware_init(void)
> >>>>>>>> goto dereg;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> + rc = device_create_file(test_fw_misc_device.this_device,
> >>>>>>>> + &dev_attr_trigger_ufw);
> >>>>>>>> + if (rc) {
> >>>>>>>> + pr_err("could not create sysfs interface: %d\n", rc);
> >>>>>>>> + goto dereg;
> >>>>>>>> + }
> >>>>>>>> pr_warn("interface ready\n");
> >>>>>>>>
> >>>>>>>> return 0;
> >>>>>>>> @@ -106,7 +160,9 @@ static void __exit test_firmware_exit(void)
> >>>>>>>> {
> >>>>>>>> release_firmware(test_firmware);
> >>>>>>>> device_remove_file(test_fw_misc_device.this_device,
> >>>>>>>> - &dev_attr_trigger_request);
> >>>>>>>> + &dev_attr_trigger_request);
> >>>>>>>> + device_remove_file(test_fw_misc_device.this_device,
> >>>>>>>> + &dev_attr_trigger_ufw);
> >>>>>>>> misc_deregister(&test_fw_misc_device);
> >>>>>>>> pr_warn("removed interface\n");
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ----------------------------8<--------------------------------------------
> >>>>>>>>
> >>>>>>>> Thanks and regards,
> >>>>>>>> Chandrasekhar L.
> >>>>>>>> --
> >>>>>>>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> >>>>>>>> member of
> >>>>>>>> Code Aurora Forum,
> >>>>>>>> a Linux Foundation Collaborative Project.
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html

2016-03-17 05:50:36

by Lingutla Chandrasekhar

[permalink] [raw]

Subject: Re: Possible race at user mode helper in request_firmware

Hi Ming,

On 03/16/2016 03:55 PM, Ming Lei wrote:
> On Wed, 16 Mar 2016 15:21:24 +0530
> Chandra Sekhar Lingutla <[email protected]> wrote:
>
>> Hi Ming,
>>
>> On 03/16/2016 02:34 PM, Chandra Sekhar Lingutla wrote:
>>> On 03/16/2016 02:28 PM, Ming Lei wrote:
>>>> On Wed, Mar 16, 2016 at 4:48 PM, Chandra Sekhar Lingutla
>>>> <[email protected]> wrote:
>>>>> On 03/16/2016 02:12 PM, Ming Lei wrote:
>>>>>>
>>>>>> On Wed, Mar 16, 2016 at 3:02 PM, Chandra Sekhar Lingutla
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi Ming,
>>>>>>>
>>>>>>>
>>>>>>> On 03/16/2016 07:36 AM, Ming Lei wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 15, 2016 at 11:09 PM, Ming Lei <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On Mon, Mar 14, 2016 at 8:00 PM, <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I see possible race in _request_firmware_load function, on which I
>>>>>>>>>> wanted to
>>>>>>>>>> take your opinion.
>>>>>>>>>>
>>>>>>>>>> When system is going to low power mode, device_cache_fw_images() is
>>>>>>>>>> called
>>>>>>>>>> from pm notifier which schedules asynchronous workers to cache devices
>>>>>>>>>> firmware.
>>>>>>>>>> If more than 2 async requests falls under user helper mode, then there
>>>>>>>>>> is a
>>>>>>>>>> use after free of "firmware" directory kobject.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If two requests are for getting same firmware image, only one can be
>>>>>>>>> sent
>>>>>>>>> out and be completed with user helper, please see
>>>>>>>>> fw_lookup_and_allocate_buf(),
>>>>>>>>> so the race shouldn't have been triggered in this case.
>>>>>>>>>
>>>>>>>>> So I guess your test must be involved with at least two different
>>>>>>>>> requests.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The race will hit, when one asynchronous worker calls device_del()
>>>>>>>>>> while
>>>>>>>>>> other asynchronous worker calls get_device_parent() in
>>>>>>>>>> _request_firmware_load().
>>>>>>>>>> After loading firmware, first worker calls device_del(), and
>>>>>>>>>> "firmware"
>>>>>>>>>> directory ref count is 1, so cleanup_glue_dir() calls kobject_put()
>>>>>>>>>> with
>>>>>>>>>> gdp_mutex lock held.
>>>>>>>>>> Meanwhile, second async worker calls device_add(), and
>>>>>>>>>> get_device_parent()
>>>>>>>>>> is still able to find the firmware directory kobject in ksets after
>>>>>>>>>> getting
>>>>>>>>>> gdp_mutex.
>>>>>>>>>> This kobject gets added as parent to the second device kobject.
>>>>>>>>>> By the time of accessing the parent kobject, its ref count was 0 and
>>>>>>>>>> prints
>>>>>>>>>> below warning stack followed by crash.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The trick here is just that we don't use 'struct device' in
>>>>>>>>> device_cache_fw_images(),
>>>>>>>>> then the firmware device's parent is null during calling device_add()
>>>>>>>>> from
>>>>>>>>> _request_firmware_load(), but this should be allowed by driver core.
>>>>>>>>>
>>>>>>>>> From firmware class view, both device_add()/device_del() is run with
>>>>>>>>> pair,
>>>>>>>>> and driver core still complains during get_device_parent(), so
>>>>>>>>> something
>>>>>>>>> inside driver core should be be wrong, IMO.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thinking about the issue further, I believe it is a false positive which
>>>>>>>> is
>>>>>>>> caused by CONFIG_DEBUG_KOBJECT_RELEASE, follows the story:
>>>>>>>>
>>>>>>>> 1) device_add() path
>>>>>>>>
>>>>>>>> get_device_parent()
>>>>>>>> ......
>>>>>>>> mutex_lock(&gdp_mutex);
>>>>>>>>
>>>>>>>> /* find our class-directory at the parent and
>>>>>>>> reference
>>>>>>>> it */
>>>>>>>> spin_lock(&dev->class->p->glue_dirs.list_lock);
>>>>>>>> list_for_each_entry(k, &dev->class->p->glue_dirs.list,
>>>>>>>> entry)
>>>>>>>> if (k->parent == parent_kobj) {
>>>>>>>> kobj = kobject_get(k);
>>>>>>>> break;
>>>>>>>> }
>>>>>>>> spin_unlock(&dev->class->p->glue_dirs.list_lock);
>>>>>>>> if (kobj) {
>>>>>>>> mutex_unlock(&gdp_mutex);
>>>>>>>> return kobj;
>>>>>>>> }
>>>>>>>> k = class_dir_create_and_add(dev->class, parent_kobj);
>>>>>>>> mutex_unlock(&gdp_mutex);
>>>>>>>> ......
>>>>>>>>
>>>>>>>> 2) device_del() path
>>>>>>>>
>>>>>>>> cleanup_device_parent()
>>>>>>>> ->cleanup_glue_dir()
>>>>>>>> ->mutex_lock(&gdp_mutex);
>>>>>>>> ->kobject_put(glue_dir);
>>>>>>>> ->kobject_cleanup()
>>>>>>>> ->kobject_del()
>>>>>>>> ->kobj_kset_leave()
>>>>>>>> ->mutex_unlock(&gdp_mutex);
>>>>>>>>
>>>>>>>> So gdp_mutex is held wrt. both looking up/adding the parent kobject
>>>>>>>> and removing the parent kobject. Actually with this lock, both glur_dir
>>>>>>>> kobject's referece counting and joining/leaving kset are run atomically.
>>>>>>>>
>>>>>>>> The race you found is triggered by CONFIG_DEBUG_KOBJECT_RELEASE,
>>>>>>>> which causes kobject_cleanup() to be run in workqueue context and the
>>>>>>>> lock of gdp_mutex can't be hold in that situation anymore. That is the
>>>>>>>> root
>>>>>>>> cause for your race.
>>>>>>>>
>>>>>>>> That also said DEBUG_KOBJECT_RELEASE might cause trouble in
>>>>>>>> some situations.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Ming
>>>>>>>>
>>>>>>> I agree that, CONFIG_DEBUG_KOBJECT_RELEASE can cause the race, as it
>>>>>>> delays
>>>>>>> call to kobject_cleanup() which deletes kobj entry in glue_dir list.
>>>>>>>
>>>>>>> I see the race even after disabling CONFIG_DEBUG_KOBJECT_RELEASE.
>>>>>>>
>>>>>>> The race is between kobject_put() and kobject_get() for the "firmware"
>>>>>>> directory.
>>>>>>>
>>>>>>> In device_del() path, I see the race after the call of
>>>>>>> cleanup_device_parent().
>>>>>>> cleanup_device_parent() calls kobject_put(glue_dir) with gdp_mutex lock
>>>>>>> holding,
>>>>>>> So ideally, to call kobject_put/kobject_get(glue_dir) we need to acquire
>>>>>>> gdp_mutex.
>>>>>>>
>>>>>>> but kobject_del(kobj) will not hold the gdp_mutex lock while calling
>>>>>>> kobject_put(kobj->parent).
>>>>>>>
>>>>>>> Here is the race:
>>>>>>>
>>>>>>> Consider 2 async works calls request_firmware() for 2 different binaries
>>>>>>> (fw1.bin, fw2.bin)
>>>>>>> Async work1 got scheduled first, and it falls in user helper mode so it
>>>>>>> creates "firmware" dir
>>>>>>> and then creates "fw1.bin" directory to load data from user.
>>>>>>> After load completion or timeout, it starts cleanup dir's, meantime async
>>>>>>> work2 got scheduled and
>>>>>>> it also falls in user helper mode, and story starts here:
>>>>>>>
>>>>>>> PATH1 req fw for "fw1.bin"
>>>>>>> PATH2
>>>>>>> req fw for "fw2.bin"
>>>>>>> _request_firmware_load()
>>>>>>> device_del(dev1)
>>>>>>>
>>>>>>> ->cleanup_device_parent()
>>>>>>>
>>>>>>> mutex_lock(&gdp_mutex);
>>>>>>>
>>>>>>> _request_firmware_load("fw2.bin")
>>>>>>> kobject_put(kobj->parent); --> firmware ref ->1
>>>>>>> ->device_add(dev2)
>>>>>>> mutex_unlock(&gdp_mutex);
>>>>>>> ->get_device_parent()
>>>>>>> ->kobject_del()
>>>>>>> mutex_lock(&gdp_mutex)
>>>>>>> kobj_kset_leave(kobj); -->del "fw1.bin" kobj in kset
>>>>>>> list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
>>>>>>> kobject_put(kobj->parent); /* No lock :)*/
>>>>>>> if (k->parent == parent_kobj) {
>>>>>>>
>>>>>>> ->kref_put() --> firmware ref ->0
>>>>>>> kobj = kobject_get(k); -> k is "firmware"
>>>>>>> dir
>>>>>>> ->kboject_release()
>>>>>>> break;
>>>>>>>
>>>>>>> ->kobject_del()
>>>>>>> }
>>>>>>>
>>>>>>> /*here deletes firmware entry in kset*/
>>>>>>> mutex_unlock(&gdp_mutex)
>>>>>>>
>>>>>>> ->dev2->kobj.parent = kobj -> kobj is released already
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> PATH 1 is started removing the kobject of "fw1.bin" and its parent
>>>>>>> "firmware"
>>>>>>> .
>>>>>>> cleanup_device_parent(dev1) makes firmware ref count to 1, by calling
>>>>>>> kobject_put(firmware) with gdp_mutex held.
>>>>>>>
>>>>>>> and kobject_del(dev1->kobj) will again calls kobject_put(kobj->parent)
>>>>>>> without holding gdp_mutex.
>>>>>>>
>>>>>>> In PATH2, glue_dir is able to find "firmware" kobj, as firmware dir kobj
>>>>>>> is
>>>>>>> deleted from the glue_dir list at kboject_release().
>>>>>>> and it calls kobject_get(firmware), by the time PATH1 calls
>>>>>>> kobject_release(firmware).
>>>>>>>
>>>>>>> So PATH2 got reference of freed firmware kobj and added as parent to
>>>>>>> "fw2.bin" kobj.
>>>>>>>
>>>>>>> I think, here is possible fix for this race:
>>>>>>>
>>>>>>> ------------------------8<-------------------------------------
>>>>>>> Signed-off-by: Lingutla Chandrasekhar <[email protected]>
>>>>>>>
>>>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>>>>>> index 5d8b713..47bc082 100644
>>>>>>> --- a/drivers/base/core.c
>>>>>>> +++ b/drivers/base/core.c
>>>>>>> @@ -1224,7 +1224,13 @@ void device_del(struct device *dev)
>>>>>>> BUS_NOTIFY_REMOVED_DEVICE,
>>>>>>> dev);
>>>>>>> kobject_uevent(&dev->kobj, KOBJ_REMOVE);
>>>>>>> cleanup_device_parent(dev);
>>>>>>> - kobject_del(&dev->kobj);
>>>>>>> +
>>>>>>> + if (dev->kobj.parent->kset == &dev->class->p->glue_dirs) {
>>>>>>> + mutex_lock(&gdp_mutex);
>>>>>>> + kobject_del(&dev->kobj);
>>>>>>> + mutex_unlock(&gdp_mutex);
>>>>>>> + } else
>>>>>>> + kobject_del(&dev->kobj);
>>>>>>> put_device(parent);
>>>>>>> }
>>>>>>> EXPORT_SYMBOL_GPL(device_del);
>>>>>>>
>>>>>>> --------------------------->8-----------------------------------
>>>>>>
>>>>>>
>>>>>> OK, one simpler way is to move cleanup_device_parent(dev)
>>>>>> after kobject_del(&dev->kobj), which looks more clean too.
>>>>>>
>>>>> No, We can't move that, bcz kobject_del(&dev->kobj) will clear the parent.
>>>>
>>>> Yeah, but I think it is better to do that and just call
>>>> cleanup_glue_dir() directly
>>>> because cleanup_device_parent() should have released the glue_dir.
>>>>
>> Sorry, i am not able understand correctly, If we reorder kobje_del and cleanup_device_parent(),
>> kobject_del() clears kobj->parent, and cleanup_device_parent() gets glue_dir with dev->kobj->parent(),
>> cleanup_glue_dir() returns without calling kobject_put() as glue_dir is NULL, so firmware dir becomes dangling kobj.
>> Am I missing something here?
>
> I mean something like below, and what do you think about it?
>
> ---
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 0a8bdad..3f2e1f8 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -836,11 +836,31 @@ static struct kobject *get_device_parent(struct device *dev,
> return NULL;
> }
>
> +static inline bool live_in_glue_dir(struct kobject *kobj,
> + struct device *dev)
> +{
> + if (!kobj || !dev->class ||
> + kobj->kset != &dev->class->p->glue_dirs)
> + return true;
> + return false;
> +}
> +
> +static inline struct kobject *get_glue_dir(struct device *dev)
> +{
> + if (live_in_glue_dir(&dev->kobj, dev))
> + return dev->kobj.parent;
> + return NULL;
> +}
> +
> +/*
> + * make sure cleaning up dir as the last step, we need to make
> + * sure .release handler of kobject is run with holding the
> + * global lock
> + */
> static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
> {
> /* see if we live in a "glue" directory */
> - if (!glue_dir || !dev->class ||
> - glue_dir->kset != &dev->class->p->glue_dirs)
> + if (!live_in_glue_dir(glue_dir, dev))
> return;
>
> mutex_lock(&gdp_mutex);
> @@ -848,11 +868,6 @@ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
> mutex_unlock(&gdp_mutex);
> }
>
> -static void cleanup_device_parent(struct device *dev)
> -{
> - cleanup_glue_dir(dev, dev->kobj.parent);
> -}
> -
> static int device_add_class_symlinks(struct device *dev)
> {
> struct device_node *of_node = dev_of_node(dev);
> @@ -1028,6 +1043,7 @@ int device_add(struct device *dev)
> struct kobject *kobj;
> struct class_interface *class_intf;
> int error = -EINVAL;
> + struct kobject *glue_dir = NULL;
>
> dev = get_device(dev);
> if (!dev)
> @@ -1072,8 +1088,10 @@ int device_add(struct device *dev)
> /* first, register with generic layer. */
> /* we require the name to be set before, and pass NULL */
> error = kobject_add(&dev->kobj, dev->kobj.parent, NULL);
> - if (error)
> + if (error) {
> + glue_dir = get_glue_dir(dev);
> goto Error;
> + }
>
> /* notify platform of device entry */
> if (platform_notify)
> @@ -1154,9 +1172,10 @@ done:
> device_remove_file(dev, &dev_attr_uevent);
> attrError:
> kobject_uevent(&dev->kobj, KOBJ_REMOVE);
> + glue_dir = get_glue_dir(dev);
> kobject_del(&dev->kobj);
> Error:
> - cleanup_device_parent(dev);
> + cleanup_glue_dir(dev, glue_dir);
> put_device(parent);
> name_error:
> kfree(dev->p);
> @@ -1232,6 +1251,7 @@ EXPORT_SYMBOL_GPL(put_device);
> void device_del(struct device *dev)
> {
> struct device *parent = dev->parent;
> + struct kobject *glue_dir = NULL;
> struct class_interface *class_intf;
>
> /* Notify clients of device removal. This call must come
> @@ -1276,8 +1296,9 @@ void device_del(struct device *dev)
> blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
> BUS_NOTIFY_REMOVED_DEVICE, dev);
> kobject_uevent(&dev->kobj, KOBJ_REMOVE);
> - cleanup_device_parent(dev);
> + glue_dir = get_glue_dir(dev);
> kobject_del(&dev->kobj);
> + cleanup_glue_dir(dev, glue_dir);
> put_device(parent);
> }
> EXPORT_SYMBOL_GPL(device_del);
>
>
This looks more clean to me.

>
>
>
>
>>
>>>> BTW, failure path of device_add() has been broken already.
>>>>
>>> Yah, forgot to mention that. we need to reorder like above fix, can we?.
>>>
>>>>>
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [21147.455038] ------------[ cut here ]------------
>>>>>>>>>> [21147.458910] WARNING: at <...>/kernel/include/linux/kref.h:47
>>>>>>>>>> kobject_get+0x50/0x68()
>>>>>>>>>> [21147.481199] Modules linked in: core_ctl(PO) qdrbg_module(O)
>>>>>>>>>> qcrypto_module(O)
>>>>>>>>>> [21147.481256] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O
>>>>>>>>>> 3.10.84-g0957845-00427-g56a05c2 #1
>>>>>>>>>> [21147.481284] Workqueue: events_unbound async_run_entry_fn
>>>>>>>>>> [21147.481296] Call trace:
>>>>>>>>>> [21147.481315] [<ffffffc000206b60>] dump_backtrace+0x0/0x270
>>>>>>>>>> [21147.481331] [<ffffffc000206de0>] show_stack+0x10/0x1c
>>>>>>>>>> [21147.481353] [<ffffffc000d569ec>] dump_stack+0x1c/0x28
>>>>>>>>>> [21147.481370] [<ffffffc00021cc18>] warn_slowpath_common+0x74/0x9c
>>>>>>>>>> [21147.481383] [<ffffffc00021ce4c>] warn_slowpath_null+0x14/0x20
>>>>>>>>>> [21147.481397] [<ffffffc000456b10>] kobject_get+0x4c/0x68
>>>>>>>>>> [21147.481416] [<ffffffc000609f64>] get_device_parent+0xa0/0x194
>>>>>>>>>> [21147.481430] [<ffffffc00060a3f4>] device_add+0x100/0x600
>>>>>>>>>> [21147.481446] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
>>>>>>>>>> [21147.481459] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
>>>>>>>>>> [21147.481473] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
>>>>>>>>>> [21147.481490] [<ffffffc00061c65c>]
>>>>>>>>>> __async_dev_cache_fw_image+0x14/0x54
>>>>>>>>>> [21147.481505] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
>>>>>>>>>> [21147.481521] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
>>>>>>>>>> [21147.481535] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
>>>>>>>>>> [21147.481553] [<ffffffc00023e500>] kthread+0xac/0xb8
>>>>>>>>>> [21147.481563] ---[ end trace dabc98ea48b8ba59 ]---
>>>>>>>>>>
>>>>>>>>>> [21147.486436] Unable to handle kernel paging request at virtual
>>>>>>>>>> address
>>>>>>>>>> 0x6b6b6b6b6b6b6bcb
>>>>>>>>>> [21147.493816] pgd = ffffffc0a982d000
>>>>>>>>>> [21147.498337] [6b6b6b6b6b6b6bcb] *pgd=0000000000000000
>>>>>>>>>> [21147.502287] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>>>>>>>>> [21147.507826] Modules linked in: core_ctl(PO) qdrbg_module(O)
>>>>>>>>>> qcrypto_module(O)
>>>>>>>>>> [21147.514951] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O
>>>>>>>>>> 3.10.84-g0957845-00427-g56a05c2 #1
>>>>>>>>>> [21147.524686] Workqueue: events_unbound async_run_entry_fn
>>>>>>>>>> [21147.529965] task: ffffffc008138000 ti: ffffffc007880000 task.ti:
>>>>>>>>>> ffffffc007880000
>>>>>>>>>> [21147.537438] PC is at sysfs_create_dir+0x34/0xd4
>>>>>>>>>> [21147.541949] LR is at kobject_add_internal+0x120/0x24c
>>>>>>>>>> [21147.546979] pc : [<ffffffc000361fa8>] lr : [<ffffffc000456f5c>]
>>>>>>>>>> pstate:
>>>>>>>>>> 80000145
>>>>>>>>>> [21147.996295] Process kworker/u16:8 (pid: 23935, stack limit =
>>>>>>>>>> 0xffffffc007880058)
>>>>>>>>>> [21148.003673] Call trace:
>>>>>>>>>> [21148.006117] [<ffffffc000361fa8>] sysfs_create_dir+0x34/0xd4
>>>>>>>>>> [21148.011663] [<ffffffc000456f58>] kobject_add_internal+0x11c/0x24c
>>>>>>>>>> [21148.017737] [<ffffffc0004573ec>] kobject_add+0xc8/0xe4
>>>>>>>>>> [21148.022863] [<ffffffc00060a410>] device_add+0x11c/0x600
>>>>>>>>>> [21148.028069] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
>>>>>>>>>> [21148.033886] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
>>>>>>>>>> [21148.039439] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
>>>>>>>>>> [21148.044822] [<ffffffc00061c65c>]
>>>>>>>>>> __async_dev_cache_fw_image+0x14/0x54
>>>>>>>>>> [21148.051249] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
>>>>>>>>>> [21148.057064] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
>>>>>>>>>> [21148.062792] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
>>>>>>>>>> [21148.068263] [<ffffffc00023e500>] kthread+0xac/0xb8
>>>>>>>>>> [21148.073039] Code: b5000094 14000026 d000ac54 9125a294 (7940c281)
>>>>>>>>>> [21148.080312] ---[ end trace dabc98ea48b8ba5a ]---
>>>>>>>>>>
>>>>>>>>>> Below is the callstack where the parent is freed from the slub_debug
>>>>>>>>>> info.
>>>>>>>>>> 0xFFFFFFC000608D74 \\vmlinux\base/core\class_dir_release+0x0C
>>>>>>>>>> 0xFFFFFFC0002FE6BC \\vmlinux\slub\free_debug_processing\fail+0x114
>>>>>>>>>> 0xFFFFFFC0002FE7A0 \\vmlinux\slub\__slab_free+0x44
>>>>>>>>>> 0xFFFFFFC0002FEE60 \\vmlinux\slub\kfree+0x1F8
>>>>>>>>>> 0xFFFFFFC000608D70 \\vmlinux\base/core\class_dir_release+0x8
>>>>>>>>>> 0xFFFFFFC000456D70 \\vmlinux\kobject\kobject_release+0x134
>>>>>>>>>> 0xFFFFFFC000456E10 \\vmlinux\kobject\kobject_put+0x58
>>>>>>>>>> 0xFFFFFFC000456C28 \\vmlinux\kobject\kobject_del+0x64
>>>>>>>>>> 0xFFFFFFC000609D78 \\vmlinux\base/core\device_del+0x150
>>>>>>>>>> 0xFFFFFFC00061C524
>>>>>>>>>> \\vmlinux\firmware_class\_request_firmware\out+0x71C
>>>>>>>>>> 0xFFFFFFC00061C58C \\vmlinux\firmware_class\request_firmware+0x30
>>>>>>>>>> 0xFFFFFFC00061C5EC \\vmlinux\firmware_class\cache_firmware+0x54
>>>>>>>>>> 0xFFFFFFC00061C65C
>>>>>>>>>> \\vmlinux\firmware_class\__async_dev_cache_fw_image+0x14
>>>>>>>>>> 0xFFFFFFC000243794 \\vmlinux\async\async_run_entry_fn+0x6C
>>>>>>>>>> 0xFFFFFFC000237E88 \\vmlinux\workqueue\process_one_work+0x264
>>>>>>>>>> 0xFFFFFFC0002392C0 \\vmlinux\workqueue\worker_thread\recheck+0x1A0
>>>>>>>>>> 0xFFFFFFC00023E500 \\vmlinux\kthread\kthread+0xAC
>>>>>>>>>>
>>>>>>>>>> For similar type of issue, I see there is an existing fix: "sysfs:
>>>>>>>>>> driver
>>>>>>>>>> core: Fix glue dir race condition by gdp_mutex".
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/base/core.c?id=e4a60d139060975eb956717e4f63ae348d4d8cc5
>>>>>>>>>>
>>>>>>>>>> Still I am able to reproduce the issue. I have verified this race on
>>>>>>>>>> kernels: 3.10, 3.18 and 4.4.
>>>>>>>>>>
>>>>>>>>>> I followed below procedure to reproduce the issue:
>>>>>>>>>> 1. Enable "CONFIG_DEBUG_KOBJECT_RELEASE"
>>>>>>>>>> 2. Use test_firmware, modified the test for async calls
>>>>>>>>>> 3. Replace WARN_ON with BUG_ON in kref_get().
>>>>>>>>>> I ran below script from shell:
>>>>>>>>>> count=0
>>>>>>>>>> while [ 1 ]
>>>>>>>>>> do
>>>>>>>>>> echo 3 >
>>>>>>>>>> /sys/devices/virtual/misc/test_firmware/trigger_ufw
>>>>>>>>>> if [ $? -ne 0 ]; then
>>>>>>>>>> print "Exiting.. "
>>>>>>>>>> exit 1
>>>>>>>>>> fi
>>>>>>>>>> count=$(($count +1))
>>>>>>>>>> echo "count $count"
>>>>>>>>>> done
>>>>>>>>>>
>>>>>>>>>> With below patch, I could reproduce the issue in the 2nd iteration.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I will run your test script to see if there is new findings.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -------------------------8<----------------------------------------------
>>>>>>>>>>
>>>>>>>>>> To reproduce the race condition, use firmware test and
>>>>>>>>>> CONFIG_DEBUG_KOBJECT_RELEASE=y, which delays kobject
>>>>>>>>>> releases. So that we can catch the bug easily.
>>>>>>>>>>
>>>>>>>>>> Enabled below flags in kernel config file:
>>>>>>>>>>
>>>>>>>>>> +CONFIG_DEBUG_KOBJECT_RELEASE=y
>>>>>>>>>> +CONFIG_TEST_FIRMWARE=y
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Looks I can't reproduce the issue with your approach.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> diff --git a/include/linux/kref.h b/include/linux/kref.h
>>>>>>>>>> index 484604d..727fb24 100644
>>>>>>>>>> --- a/include/linux/kref.h
>>>>>>>>>> +++ b/include/linux/kref.h
>>>>>>>>>> @@ -44,7 +44,7 @@ static inline void kref_get(struct kref *kref)
>>>>>>>>>> * condition when this kref is freeing by some other thread
>>>>>>>>>> right
>>>>>>>>>> now.
>>>>>>>>>> * In this case one should use kref_get_unless_zero()
>>>>>>>>>> */
>>>>>>>>>> - WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
>>>>>>>>>> + BUG_ON(atomic_inc_return(&kref->refcount) < 2);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> /**
>>>>>>>>>> diff --git a/lib/test_firmware.c b/lib/test_firmware.c
>>>>>>>>>> old mode 100644
>>>>>>>>>> new mode 100755
>>>>>>>>>> index 86374c1..14c9598
>>>>>>>>>> --- a/lib/test_firmware.c
>>>>>>>>>> +++ b/lib/test_firmware.c
>>>>>>>>>> @@ -18,6 +18,7 @@
>>>>>>>>>> #include <linux/miscdevice.h>
>>>>>>>>>> #include <linux/slab.h>
>>>>>>>>>> #include <linux/uaccess.h>
>>>>>>>>>> +#include <linux/async.h>
>>>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>> +static ASYNC_DOMAIN_EXCLUSIVE(ufw_domain);
>>>>>>>>>> +
>>>>>>>>>> +static void __async_ufw_work(void *name,
>>>>>>>>>> + async_cookie_t cookie)
>>>>>>>>>> +{
>>>>>>>>>> + const struct firmware *fw;
>>>>>>>>>> + const char *fw_name = name;
>>>>>>>>>> + int ret;
>>>>>>>>>> +
>>>>>>>>>> + pr_err("requesting %s\n", fw_name);
>>>>>>>>>> +
>>>>>>>>>> + ret = request_firmware(&fw, fw_name, NULL);
>>>>>>>>>> + pr_err("loaded: %zu\n", fw ? fw->size : 0);
>>>>>>>>>> + if (!ret)
>>>>>>>>>> + kfree(fw);
>>>>>>>>>> +
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static char *name[] = { "fw1.bin", "fw2.bin"};
>>>>>>>>>> +
>>>>>>>>>> +static ssize_t trigger_ufw_store(struct device *dev,
>>>>>>>>>> + struct device_attribute *attr,
>>>>>>>>>> + const char *buf, size_t count)
>>>>>>>>>> +{
>>>>>>>>>> + int rc, i, c=0;
>>>>>>>>>> + u32 start;
>>>>>>>>>> + char *fw_name;
>>>>>>>>>> +
>>>>>>>>>> + rc = kstrtou32(buf, 0, &start);
>>>>>>>>>> + if (rc){
>>>>>>>>>> + pr_err("Invalid option\n");
>>>>>>>>>> + return rc;
>>>>>>>>>> + }
>>>>>>>>>> + pr_err(" no of iterations %d\n", start);
>>>>>>>>>> +
>>>>>>>>>> + for (i=0; i< start; i++)
>>>>>>>>>> + {
>>>>>>>>>> + if (c >= 2)
>>>>>>>>>> + c = 0;
>>>>>>>>>> + fw_name = name[c++];
>>>>>>>>>> + async_schedule_domain(__async_ufw_work, (void *)fw_name,
>>>>>>>>>> &ufw_domain);
>>>>>>>>>> + }
>>>>>>>>>> + async_synchronize_full_domain(&ufw_domain);
>>>>>>>>>> + return count;
>>>>>>>>>> +}
>>>>>>>>>>
>>>>>>>>>> static DEVICE_ATTR_WO(trigger_request);
>>>>>>>>>> +static DEVICE_ATTR_WO(trigger_ufw);
>>>>>>>>>>
>>>>>>>>>> static int __init test_firmware_init(void)
>>>>>>>>>> {
>>>>>>>>>> @@ -92,6 +140,12 @@ static int __init test_firmware_init(void)
>>>>>>>>>> goto dereg;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> + rc = device_create_file(test_fw_misc_device.this_device,
>>>>>>>>>> + &dev_attr_trigger_ufw);
>>>>>>>>>> + if (rc) {
>>>>>>>>>> + pr_err("could not create sysfs interface: %d\n", rc);
>>>>>>>>>> + goto dereg;
>>>>>>>>>> + }
>>>>>>>>>> pr_warn("interface ready\n");
>>>>>>>>>>
>>>>>>>>>> return 0;
>>>>>>>>>> @@ -106,7 +160,9 @@ static void __exit test_firmware_exit(void)
>>>>>>>>>> {
>>>>>>>>>> release_firmware(test_firmware);
>>>>>>>>>> device_remove_file(test_fw_misc_device.this_device,
>>>>>>>>>> - &dev_attr_trigger_request);
>>>>>>>>>> + &dev_attr_trigger_request);
>>>>>>>>>> + device_remove_file(test_fw_misc_device.this_device,
>>>>>>>>>> + &dev_attr_trigger_ufw);
>>>>>>>>>> misc_deregister(&test_fw_misc_device);
>>>>>>>>>> pr_warn("removed interface\n");
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ----------------------------8<--------------------------------------------
>>>>>>>>>>
>>>>>>>>>> Thanks and regards,
>>>>>>>>>> Chandrasekhar L.
>>>>>>>>>> --
>>>>>>>>>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
>>>>>>>>>> member of
>>>>>>>>>> Code Aurora Forum,
>>>>>>>>>> a Linux Foundation Collaborative Project.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>