Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935037AbcCPIm4 (ORCPT ); Wed, 16 Mar 2016 04:42:56 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:34562 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752639AbcCPImO (ORCPT ); Wed, 16 Mar 2016 04:42:14 -0400 MIME-Version: 1.0 In-Reply-To: <56E904EC.8090802@codeaurora.org> References: <56E2768C.1050000@codeaurora.com> <56E904EC.8090802@codeaurora.org> Date: Wed, 16 Mar 2016 16:42:11 +0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Possible race at user mode helper in request_firmware From: Ming Lei To: Chandra Sekhar Lingutla Cc: Linux Kernel Mailing List , Greg Kroah-Hartman , linux-arm-msm@vger.kernel.org, Russell King Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 18399 Lines: 485 On Wed, Mar 16, 2016 at 3:02 PM, Chandra Sekhar Lingutla wrote: > Hi Ming, > > > On 03/16/2016 07:36 AM, Ming Lei wrote: >> >> On Tue, Mar 15, 2016 at 11:09 PM, Ming Lei wrote: >>> >>> Hi, >>> >>> On Mon, Mar 14, 2016 at 8:00 PM, wrote: >>>> >>>> Hello, >>>> >>>> I see possible race in _request_firmware_load function, on which I >>>> wanted to >>>> take your opinion. >>>> >>>> When system is going to low power mode, device_cache_fw_images() is >>>> called >>>> from pm notifier which schedules asynchronous workers to cache devices >>>> firmware. >>>> If more than 2 async requests falls under user helper mode, then there >>>> is a >>>> use after free of "firmware" directory kobject. >>> >>> >>> If two requests are for getting same firmware image, only one can be sent >>> out and be completed with user helper, please see >>> fw_lookup_and_allocate_buf(), >>> so the race shouldn't have been triggered in this case. >>> >>> So I guess your test must be involved with at least two different >>> requests. >>> >>>> >>>> The race will hit, when one asynchronous worker calls device_del() while >>>> other asynchronous worker calls get_device_parent() in >>>> _request_firmware_load(). >>>> After loading firmware, first worker calls device_del(), and "firmware" >>>> directory ref count is 1, so cleanup_glue_dir() calls kobject_put() with >>>> gdp_mutex lock held. >>>> Meanwhile, second async worker calls device_add(), and >>>> get_device_parent() >>>> is still able to find the firmware directory kobject in ksets after >>>> getting >>>> gdp_mutex. >>>> This kobject gets added as parent to the second device kobject. >>>> By the time of accessing the parent kobject, its ref count was 0 and >>>> prints >>>> below warning stack followed by crash. >>> >>> >>> The trick here is just that we don't use 'struct device' in >>> device_cache_fw_images(), >>> then the firmware device's parent is null during calling device_add() >>> from >>> _request_firmware_load(), but this should be allowed by driver core. >>> >>> From firmware class view, both device_add()/device_del() is run with >>> pair, >>> and driver core still complains during get_device_parent(), so something >>> inside driver core should be be wrong, IMO. >> >> >> Thinking about the issue further, I believe it is a false positive which >> is >> caused by CONFIG_DEBUG_KOBJECT_RELEASE, follows the story: >> >> 1) device_add() path >> >> get_device_parent() >> ...... >> mutex_lock(&gdp_mutex); >> >> /* find our class-directory at the parent and reference >> it */ >> spin_lock(&dev->class->p->glue_dirs.list_lock); >> list_for_each_entry(k, &dev->class->p->glue_dirs.list, >> entry) >> if (k->parent == parent_kobj) { >> kobj = kobject_get(k); >> break; >> } >> spin_unlock(&dev->class->p->glue_dirs.list_lock); >> if (kobj) { >> mutex_unlock(&gdp_mutex); >> return kobj; >> } >> k = class_dir_create_and_add(dev->class, parent_kobj); >> mutex_unlock(&gdp_mutex); >> ...... >> >> 2) device_del() path >> >> cleanup_device_parent() >> ->cleanup_glue_dir() >> ->mutex_lock(&gdp_mutex); >> ->kobject_put(glue_dir); >> ->kobject_cleanup() >> ->kobject_del() >> ->kobj_kset_leave() >> ->mutex_unlock(&gdp_mutex); >> >> So gdp_mutex is held wrt. both looking up/adding the parent kobject >> and removing the parent kobject. Actually with this lock, both glur_dir >> kobject's referece counting and joining/leaving kset are run atomically. >> >> The race you found is triggered by CONFIG_DEBUG_KOBJECT_RELEASE, >> which causes kobject_cleanup() to be run in workqueue context and the >> lock of gdp_mutex can't be hold in that situation anymore. That is the >> root >> cause for your race. >> >> That also said DEBUG_KOBJECT_RELEASE might cause trouble in >> some situations. >> >> Thanks, >> Ming >> > I agree that, CONFIG_DEBUG_KOBJECT_RELEASE can cause the race, as it delays > call to kobject_cleanup() which deletes kobj entry in glue_dir list. > > I see the race even after disabling CONFIG_DEBUG_KOBJECT_RELEASE. > > The race is between kobject_put() and kobject_get() for the "firmware" > directory. > > In device_del() path, I see the race after the call of > cleanup_device_parent(). > cleanup_device_parent() calls kobject_put(glue_dir) with gdp_mutex lock > holding, > So ideally, to call kobject_put/kobject_get(glue_dir) we need to acquire > gdp_mutex. > > but kobject_del(kobj) will not hold the gdp_mutex lock while calling > kobject_put(kobj->parent). > > Here is the race: > > Consider 2 async works calls request_firmware() for 2 different binaries > (fw1.bin, fw2.bin) > Async work1 got scheduled first, and it falls in user helper mode so it > creates "firmware" dir > and then creates "fw1.bin" directory to load data from user. > After load completion or timeout, it starts cleanup dir's, meantime async > work2 got scheduled and > it also falls in user helper mode, and story starts here: > > PATH1 req fw for "fw1.bin" PATH2 > req fw for "fw2.bin" > _request_firmware_load() > device_del(dev1) > > ->cleanup_device_parent() > > mutex_lock(&gdp_mutex); > _request_firmware_load("fw2.bin") > kobject_put(kobj->parent); --> firmware ref ->1 > ->device_add(dev2) > mutex_unlock(&gdp_mutex); > ->get_device_parent() > ->kobject_del() > mutex_lock(&gdp_mutex) > kobj_kset_leave(kobj); -->del "fw1.bin" kobj in kset > list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry) > kobject_put(kobj->parent); /* No lock :)*/ > if (k->parent == parent_kobj) { > > ->kref_put() --> firmware ref ->0 > kobj = kobject_get(k); -> k is "firmware" > dir > ->kboject_release() > break; > > ->kobject_del() > } > > /*here deletes firmware entry in kset*/ > mutex_unlock(&gdp_mutex) > > ->dev2->kobj.parent = kobj -> kobj is released already > > > > PATH 1 is started removing the kobject of "fw1.bin" and its parent > "firmware" > . > cleanup_device_parent(dev1) makes firmware ref count to 1, by calling > kobject_put(firmware) with gdp_mutex held. > > and kobject_del(dev1->kobj) will again calls kobject_put(kobj->parent) > without holding gdp_mutex. > > In PATH2, glue_dir is able to find "firmware" kobj, as firmware dir kobj is > deleted from the glue_dir list at kboject_release(). > and it calls kobject_get(firmware), by the time PATH1 calls > kobject_release(firmware). > > So PATH2 got reference of freed firmware kobj and added as parent to > "fw2.bin" kobj. > > I think, here is possible fix for this race: > > ------------------------8<------------------------------------- > Signed-off-by: Lingutla Chandrasekhar > > diff --git a/drivers/base/core.c b/drivers/base/core.c > index 5d8b713..47bc082 100644 > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -1224,7 +1224,13 @@ void device_del(struct device *dev) > BUS_NOTIFY_REMOVED_DEVICE, > dev); > kobject_uevent(&dev->kobj, KOBJ_REMOVE); > cleanup_device_parent(dev); > - kobject_del(&dev->kobj); > + > + if (dev->kobj.parent->kset == &dev->class->p->glue_dirs) { > + mutex_lock(&gdp_mutex); > + kobject_del(&dev->kobj); > + mutex_unlock(&gdp_mutex); > + } else > + kobject_del(&dev->kobj); > put_device(parent); > } > EXPORT_SYMBOL_GPL(device_del); > > --------------------------->8----------------------------------- OK, one simpler way is to move cleanup_device_parent(dev) after kobject_del(&dev->kobj), which looks more clean too. > > >>> >>>> >>>> [21147.455038] ------------[ cut here ]------------ >>>> [21147.458910] WARNING: at <...>/kernel/include/linux/kref.h:47 >>>> kobject_get+0x50/0x68() >>>> [21147.481199] Modules linked in: core_ctl(PO) qdrbg_module(O) >>>> qcrypto_module(O) >>>> [21147.481256] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O >>>> 3.10.84-g0957845-00427-g56a05c2 #1 >>>> [21147.481284] Workqueue: events_unbound async_run_entry_fn >>>> [21147.481296] Call trace: >>>> [21147.481315] [] dump_backtrace+0x0/0x270 >>>> [21147.481331] [] show_stack+0x10/0x1c >>>> [21147.481353] [] dump_stack+0x1c/0x28 >>>> [21147.481370] [] warn_slowpath_common+0x74/0x9c >>>> [21147.481383] [] warn_slowpath_null+0x14/0x20 >>>> [21147.481397] [] kobject_get+0x4c/0x68 >>>> [21147.481416] [] get_device_parent+0xa0/0x194 >>>> [21147.481430] [] device_add+0x100/0x600 >>>> [21147.481446] [] _request_firmware+0x8b4/0xa80 >>>> [21147.481459] [] request_firmware+0x30/0x3c >>>> [21147.481473] [] cache_firmware+0x54/0xb0 >>>> [21147.481490] [] __async_dev_cache_fw_image+0x14/0x54 >>>> [21147.481505] [] async_run_entry_fn+0x6c/0x12c >>>> [21147.481521] [] process_one_work+0x264/0x3dc >>>> [21147.481535] [] worker_thread+0x1f0/0x340 >>>> [21147.481553] [] kthread+0xac/0xb8 >>>> [21147.481563] ---[ end trace dabc98ea48b8ba59 ]--- >>>> >>>> [21147.486436] Unable to handle kernel paging request at virtual address >>>> 0x6b6b6b6b6b6b6bcb >>>> [21147.493816] pgd = ffffffc0a982d000 >>>> [21147.498337] [6b6b6b6b6b6b6bcb] *pgd=0000000000000000 >>>> [21147.502287] Internal error: Oops: 96000004 [#1] PREEMPT SMP >>>> [21147.507826] Modules linked in: core_ctl(PO) qdrbg_module(O) >>>> qcrypto_module(O) >>>> [21147.514951] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O >>>> 3.10.84-g0957845-00427-g56a05c2 #1 >>>> [21147.524686] Workqueue: events_unbound async_run_entry_fn >>>> [21147.529965] task: ffffffc008138000 ti: ffffffc007880000 task.ti: >>>> ffffffc007880000 >>>> [21147.537438] PC is at sysfs_create_dir+0x34/0xd4 >>>> [21147.541949] LR is at kobject_add_internal+0x120/0x24c >>>> [21147.546979] pc : [] lr : [] >>>> pstate: >>>> 80000145 >>>> [21147.996295] Process kworker/u16:8 (pid: 23935, stack limit = >>>> 0xffffffc007880058) >>>> [21148.003673] Call trace: >>>> [21148.006117] [] sysfs_create_dir+0x34/0xd4 >>>> [21148.011663] [] kobject_add_internal+0x11c/0x24c >>>> [21148.017737] [] kobject_add+0xc8/0xe4 >>>> [21148.022863] [] device_add+0x11c/0x600 >>>> [21148.028069] [] _request_firmware+0x8b4/0xa80 >>>> [21148.033886] [] request_firmware+0x30/0x3c >>>> [21148.039439] [] cache_firmware+0x54/0xb0 >>>> [21148.044822] [] __async_dev_cache_fw_image+0x14/0x54 >>>> [21148.051249] [] async_run_entry_fn+0x6c/0x12c >>>> [21148.057064] [] process_one_work+0x264/0x3dc >>>> [21148.062792] [] worker_thread+0x1f0/0x340 >>>> [21148.068263] [] kthread+0xac/0xb8 >>>> [21148.073039] Code: b5000094 14000026 d000ac54 9125a294 (7940c281) >>>> [21148.080312] ---[ end trace dabc98ea48b8ba5a ]--- >>>> >>>> Below is the callstack where the parent is freed from the slub_debug >>>> info. >>>> 0xFFFFFFC000608D74 \\vmlinux\base/core\class_dir_release+0x0C >>>> 0xFFFFFFC0002FE6BC \\vmlinux\slub\free_debug_processing\fail+0x114 >>>> 0xFFFFFFC0002FE7A0 \\vmlinux\slub\__slab_free+0x44 >>>> 0xFFFFFFC0002FEE60 \\vmlinux\slub\kfree+0x1F8 >>>> 0xFFFFFFC000608D70 \\vmlinux\base/core\class_dir_release+0x8 >>>> 0xFFFFFFC000456D70 \\vmlinux\kobject\kobject_release+0x134 >>>> 0xFFFFFFC000456E10 \\vmlinux\kobject\kobject_put+0x58 >>>> 0xFFFFFFC000456C28 \\vmlinux\kobject\kobject_del+0x64 >>>> 0xFFFFFFC000609D78 \\vmlinux\base/core\device_del+0x150 >>>> 0xFFFFFFC00061C524 \\vmlinux\firmware_class\_request_firmware\out+0x71C >>>> 0xFFFFFFC00061C58C \\vmlinux\firmware_class\request_firmware+0x30 >>>> 0xFFFFFFC00061C5EC \\vmlinux\firmware_class\cache_firmware+0x54 >>>> 0xFFFFFFC00061C65C >>>> \\vmlinux\firmware_class\__async_dev_cache_fw_image+0x14 >>>> 0xFFFFFFC000243794 \\vmlinux\async\async_run_entry_fn+0x6C >>>> 0xFFFFFFC000237E88 \\vmlinux\workqueue\process_one_work+0x264 >>>> 0xFFFFFFC0002392C0 \\vmlinux\workqueue\worker_thread\recheck+0x1A0 >>>> 0xFFFFFFC00023E500 \\vmlinux\kthread\kthread+0xAC >>>> >>>> For similar type of issue, I see there is an existing fix: "sysfs: >>>> driver >>>> core: Fix glue dir race condition by gdp_mutex". >>>> >>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/base/core.c?id=e4a60d139060975eb956717e4f63ae348d4d8cc5 >>>> >>>> Still I am able to reproduce the issue. I have verified this race on >>>> kernels: 3.10, 3.18 and 4.4. >>>> >>>> I followed below procedure to reproduce the issue: >>>> 1. Enable "CONFIG_DEBUG_KOBJECT_RELEASE" >>>> 2. Use test_firmware, modified the test for async calls >>>> 3. Replace WARN_ON with BUG_ON in kref_get(). >>>> I ran below script from shell: >>>> count=0 >>>> while [ 1 ] >>>> do >>>> echo 3 > >>>> /sys/devices/virtual/misc/test_firmware/trigger_ufw >>>> if [ $? -ne 0 ]; then >>>> print "Exiting.. " >>>> exit 1 >>>> fi >>>> count=$(($count +1)) >>>> echo "count $count" >>>> done >>>> >>>> With below patch, I could reproduce the issue in the 2nd iteration. >>> >>> >>> I will run your test script to see if there is new findings. >>> >>>> >>>> >>>> -------------------------8<---------------------------------------------- >>>> >>>> To reproduce the race condition, use firmware test and >>>> CONFIG_DEBUG_KOBJECT_RELEASE=y, which delays kobject >>>> releases. So that we can catch the bug easily. >>>> >>>> Enabled below flags in kernel config file: >>>> >>>> +CONFIG_DEBUG_KOBJECT_RELEASE=y >>>> +CONFIG_TEST_FIRMWARE=y >>> >>> >>> Looks I can't reproduce the issue with your approach. >>> >>>> >>>> >>>> diff --git a/include/linux/kref.h b/include/linux/kref.h >>>> index 484604d..727fb24 100644 >>>> --- a/include/linux/kref.h >>>> +++ b/include/linux/kref.h >>>> @@ -44,7 +44,7 @@ static inline void kref_get(struct kref *kref) >>>> * condition when this kref is freeing by some other thread right >>>> now. >>>> * In this case one should use kref_get_unless_zero() >>>> */ >>>> - WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2); >>>> + BUG_ON(atomic_inc_return(&kref->refcount) < 2); >>>> } >>>> >>>> /** >>>> diff --git a/lib/test_firmware.c b/lib/test_firmware.c >>>> old mode 100644 >>>> new mode 100755 >>>> index 86374c1..14c9598 >>>> --- a/lib/test_firmware.c >>>> +++ b/lib/test_firmware.c >>>> @@ -18,6 +18,7 @@ >>>> #include >>>> #include >>>> #include >>>> +#include >>>> >>>> + >>>> +static ASYNC_DOMAIN_EXCLUSIVE(ufw_domain); >>>> + >>>> +static void __async_ufw_work(void *name, >>>> + async_cookie_t cookie) >>>> +{ >>>> + const struct firmware *fw; >>>> + const char *fw_name = name; >>>> + int ret; >>>> + >>>> + pr_err("requesting %s\n", fw_name); >>>> + >>>> + ret = request_firmware(&fw, fw_name, NULL); >>>> + pr_err("loaded: %zu\n", fw ? fw->size : 0); >>>> + if (!ret) >>>> + kfree(fw); >>>> + >>>> +} >>>> + >>>> +static char *name[] = { "fw1.bin", "fw2.bin"}; >>>> + >>>> +static ssize_t trigger_ufw_store(struct device *dev, >>>> + struct device_attribute *attr, >>>> + const char *buf, size_t count) >>>> +{ >>>> + int rc, i, c=0; >>>> + u32 start; >>>> + char *fw_name; >>>> + >>>> + rc = kstrtou32(buf, 0, &start); >>>> + if (rc){ >>>> + pr_err("Invalid option\n"); >>>> + return rc; >>>> + } >>>> + pr_err(" no of iterations %d\n", start); >>>> + >>>> + for (i=0; i< start; i++) >>>> + { >>>> + if (c >= 2) >>>> + c = 0; >>>> + fw_name = name[c++]; >>>> + async_schedule_domain(__async_ufw_work, (void *)fw_name, >>>> &ufw_domain); >>>> + } >>>> + async_synchronize_full_domain(&ufw_domain); >>>> + return count; >>>> +} >>>> >>>> static DEVICE_ATTR_WO(trigger_request); >>>> +static DEVICE_ATTR_WO(trigger_ufw); >>>> >>>> static int __init test_firmware_init(void) >>>> { >>>> @@ -92,6 +140,12 @@ static int __init test_firmware_init(void) >>>> goto dereg; >>>> } >>>> >>>> + rc = device_create_file(test_fw_misc_device.this_device, >>>> + &dev_attr_trigger_ufw); >>>> + if (rc) { >>>> + pr_err("could not create sysfs interface: %d\n", rc); >>>> + goto dereg; >>>> + } >>>> pr_warn("interface ready\n"); >>>> >>>> return 0; >>>> @@ -106,7 +160,9 @@ static void __exit test_firmware_exit(void) >>>> { >>>> release_firmware(test_firmware); >>>> device_remove_file(test_fw_misc_device.this_device, >>>> - &dev_attr_trigger_request); >>>> + &dev_attr_trigger_request); >>>> + device_remove_file(test_fw_misc_device.this_device, >>>> + &dev_attr_trigger_ufw); >>>> misc_deregister(&test_fw_misc_device); >>>> pr_warn("removed interface\n"); >>>> } >>>> >>>> >>>> ----------------------------8<-------------------------------------------- >>>> >>>> Thanks and regards, >>>> Chandrasekhar L. >>>> -- >>>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a >>>> member of >>>> Code Aurora Forum, >>>> a Linux Foundation Collaborative Project.