Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966425AbcCPI6e (ORCPT ); Wed, 16 Mar 2016 04:58:34 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:34698 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933401AbcCPI6Z (ORCPT ); Wed, 16 Mar 2016 04:58:25 -0400 MIME-Version: 1.0 In-Reply-To: <56E91DF7.7050005@codeaurora.org> References: <56E2768C.1050000@codeaurora.com> <56E904EC.8090802@codeaurora.org> <56E91DF7.7050005@codeaurora.org> Date: Wed, 16 Mar 2016 16:58:22 +0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Possible race at user mode helper in request_firmware From: Ming Lei To: Chandra Sekhar Lingutla Cc: Linux Kernel Mailing List , Greg Kroah-Hartman , linux-arm-msm@vger.kernel.org, Russell King Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 19972 Lines: 528 On Wed, Mar 16, 2016 at 4:48 PM, Chandra Sekhar Lingutla wrote: > On 03/16/2016 02:12 PM, Ming Lei wrote: >> >> On Wed, Mar 16, 2016 at 3:02 PM, Chandra Sekhar Lingutla >> wrote: >>> >>> Hi Ming, >>> >>> >>> On 03/16/2016 07:36 AM, Ming Lei wrote: >>>> >>>> >>>> On Tue, Mar 15, 2016 at 11:09 PM, Ming Lei >>>> wrote: >>>>> >>>>> >>>>> Hi, >>>>> >>>>> On Mon, Mar 14, 2016 at 8:00 PM, wrote: >>>>>> >>>>>> >>>>>> Hello, >>>>>> >>>>>> I see possible race in _request_firmware_load function, on which I >>>>>> wanted to >>>>>> take your opinion. >>>>>> >>>>>> When system is going to low power mode, device_cache_fw_images() is >>>>>> called >>>>>> from pm notifier which schedules asynchronous workers to cache devices >>>>>> firmware. >>>>>> If more than 2 async requests falls under user helper mode, then there >>>>>> is a >>>>>> use after free of "firmware" directory kobject. >>>>> >>>>> >>>>> >>>>> If two requests are for getting same firmware image, only one can be >>>>> sent >>>>> out and be completed with user helper, please see >>>>> fw_lookup_and_allocate_buf(), >>>>> so the race shouldn't have been triggered in this case. >>>>> >>>>> So I guess your test must be involved with at least two different >>>>> requests. >>>>> >>>>>> >>>>>> The race will hit, when one asynchronous worker calls device_del() >>>>>> while >>>>>> other asynchronous worker calls get_device_parent() in >>>>>> _request_firmware_load(). >>>>>> After loading firmware, first worker calls device_del(), and >>>>>> "firmware" >>>>>> directory ref count is 1, so cleanup_glue_dir() calls kobject_put() >>>>>> with >>>>>> gdp_mutex lock held. >>>>>> Meanwhile, second async worker calls device_add(), and >>>>>> get_device_parent() >>>>>> is still able to find the firmware directory kobject in ksets after >>>>>> getting >>>>>> gdp_mutex. >>>>>> This kobject gets added as parent to the second device kobject. >>>>>> By the time of accessing the parent kobject, its ref count was 0 and >>>>>> prints >>>>>> below warning stack followed by crash. >>>>> >>>>> >>>>> >>>>> The trick here is just that we don't use 'struct device' in >>>>> device_cache_fw_images(), >>>>> then the firmware device's parent is null during calling device_add() >>>>> from >>>>> _request_firmware_load(), but this should be allowed by driver core. >>>>> >>>>> From firmware class view, both device_add()/device_del() is run with >>>>> pair, >>>>> and driver core still complains during get_device_parent(), so >>>>> something >>>>> inside driver core should be be wrong, IMO. >>>> >>>> >>>> >>>> Thinking about the issue further, I believe it is a false positive which >>>> is >>>> caused by CONFIG_DEBUG_KOBJECT_RELEASE, follows the story: >>>> >>>> 1) device_add() path >>>> >>>> get_device_parent() >>>> ...... >>>> mutex_lock(&gdp_mutex); >>>> >>>> /* find our class-directory at the parent and >>>> reference >>>> it */ >>>> spin_lock(&dev->class->p->glue_dirs.list_lock); >>>> list_for_each_entry(k, &dev->class->p->glue_dirs.list, >>>> entry) >>>> if (k->parent == parent_kobj) { >>>> kobj = kobject_get(k); >>>> break; >>>> } >>>> spin_unlock(&dev->class->p->glue_dirs.list_lock); >>>> if (kobj) { >>>> mutex_unlock(&gdp_mutex); >>>> return kobj; >>>> } >>>> k = class_dir_create_and_add(dev->class, parent_kobj); >>>> mutex_unlock(&gdp_mutex); >>>> ...... >>>> >>>> 2) device_del() path >>>> >>>> cleanup_device_parent() >>>> ->cleanup_glue_dir() >>>> ->mutex_lock(&gdp_mutex); >>>> ->kobject_put(glue_dir); >>>> ->kobject_cleanup() >>>> ->kobject_del() >>>> ->kobj_kset_leave() >>>> ->mutex_unlock(&gdp_mutex); >>>> >>>> So gdp_mutex is held wrt. both looking up/adding the parent kobject >>>> and removing the parent kobject. Actually with this lock, both glur_dir >>>> kobject's referece counting and joining/leaving kset are run atomically. >>>> >>>> The race you found is triggered by CONFIG_DEBUG_KOBJECT_RELEASE, >>>> which causes kobject_cleanup() to be run in workqueue context and the >>>> lock of gdp_mutex can't be hold in that situation anymore. That is the >>>> root >>>> cause for your race. >>>> >>>> That also said DEBUG_KOBJECT_RELEASE might cause trouble in >>>> some situations. >>>> >>>> Thanks, >>>> Ming >>>> >>> I agree that, CONFIG_DEBUG_KOBJECT_RELEASE can cause the race, as it >>> delays >>> call to kobject_cleanup() which deletes kobj entry in glue_dir list. >>> >>> I see the race even after disabling CONFIG_DEBUG_KOBJECT_RELEASE. >>> >>> The race is between kobject_put() and kobject_get() for the "firmware" >>> directory. >>> >>> In device_del() path, I see the race after the call of >>> cleanup_device_parent(). >>> cleanup_device_parent() calls kobject_put(glue_dir) with gdp_mutex lock >>> holding, >>> So ideally, to call kobject_put/kobject_get(glue_dir) we need to acquire >>> gdp_mutex. >>> >>> but kobject_del(kobj) will not hold the gdp_mutex lock while calling >>> kobject_put(kobj->parent). >>> >>> Here is the race: >>> >>> Consider 2 async works calls request_firmware() for 2 different binaries >>> (fw1.bin, fw2.bin) >>> Async work1 got scheduled first, and it falls in user helper mode so it >>> creates "firmware" dir >>> and then creates "fw1.bin" directory to load data from user. >>> After load completion or timeout, it starts cleanup dir's, meantime async >>> work2 got scheduled and >>> it also falls in user helper mode, and story starts here: >>> >>> PATH1 req fw for "fw1.bin" >>> PATH2 >>> req fw for "fw2.bin" >>> _request_firmware_load() >>> device_del(dev1) >>> >>> ->cleanup_device_parent() >>> >>> mutex_lock(&gdp_mutex); >>> >>> _request_firmware_load("fw2.bin") >>> kobject_put(kobj->parent); --> firmware ref ->1 >>> ->device_add(dev2) >>> mutex_unlock(&gdp_mutex); >>> ->get_device_parent() >>> ->kobject_del() >>> mutex_lock(&gdp_mutex) >>> kobj_kset_leave(kobj); -->del "fw1.bin" kobj in kset >>> list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry) >>> kobject_put(kobj->parent); /* No lock :)*/ >>> if (k->parent == parent_kobj) { >>> >>> ->kref_put() --> firmware ref ->0 >>> kobj = kobject_get(k); -> k is "firmware" >>> dir >>> ->kboject_release() >>> break; >>> >>> ->kobject_del() >>> } >>> >>> /*here deletes firmware entry in kset*/ >>> mutex_unlock(&gdp_mutex) >>> >>> ->dev2->kobj.parent = kobj -> kobj is released already >>> >>> >>> >>> PATH 1 is started removing the kobject of "fw1.bin" and its parent >>> "firmware" >>> . >>> cleanup_device_parent(dev1) makes firmware ref count to 1, by calling >>> kobject_put(firmware) with gdp_mutex held. >>> >>> and kobject_del(dev1->kobj) will again calls kobject_put(kobj->parent) >>> without holding gdp_mutex. >>> >>> In PATH2, glue_dir is able to find "firmware" kobj, as firmware dir kobj >>> is >>> deleted from the glue_dir list at kboject_release(). >>> and it calls kobject_get(firmware), by the time PATH1 calls >>> kobject_release(firmware). >>> >>> So PATH2 got reference of freed firmware kobj and added as parent to >>> "fw2.bin" kobj. >>> >>> I think, here is possible fix for this race: >>> >>> ------------------------8<------------------------------------- >>> Signed-off-by: Lingutla Chandrasekhar >>> >>> diff --git a/drivers/base/core.c b/drivers/base/core.c >>> index 5d8b713..47bc082 100644 >>> --- a/drivers/base/core.c >>> +++ b/drivers/base/core.c >>> @@ -1224,7 +1224,13 @@ void device_del(struct device *dev) >>> BUS_NOTIFY_REMOVED_DEVICE, >>> dev); >>> kobject_uevent(&dev->kobj, KOBJ_REMOVE); >>> cleanup_device_parent(dev); >>> - kobject_del(&dev->kobj); >>> + >>> + if (dev->kobj.parent->kset == &dev->class->p->glue_dirs) { >>> + mutex_lock(&gdp_mutex); >>> + kobject_del(&dev->kobj); >>> + mutex_unlock(&gdp_mutex); >>> + } else >>> + kobject_del(&dev->kobj); >>> put_device(parent); >>> } >>> EXPORT_SYMBOL_GPL(device_del); >>> >>> --------------------------->8----------------------------------- >> >> >> OK, one simpler way is to move cleanup_device_parent(dev) >> after kobject_del(&dev->kobj), which looks more clean too. >> > No, We can't move that, bcz kobject_del(&dev->kobj) will clear the parent. Yeah, but I think it is better to do that and just call cleanup_glue_dir() directly because cleanup_device_parent() should have released the glue_dir. BTW, failure path of device_add() has been broken already. > > >>> >>> >>>>> >>>>>> >>>>>> [21147.455038] ------------[ cut here ]------------ >>>>>> [21147.458910] WARNING: at <...>/kernel/include/linux/kref.h:47 >>>>>> kobject_get+0x50/0x68() >>>>>> [21147.481199] Modules linked in: core_ctl(PO) qdrbg_module(O) >>>>>> qcrypto_module(O) >>>>>> [21147.481256] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O >>>>>> 3.10.84-g0957845-00427-g56a05c2 #1 >>>>>> [21147.481284] Workqueue: events_unbound async_run_entry_fn >>>>>> [21147.481296] Call trace: >>>>>> [21147.481315] [] dump_backtrace+0x0/0x270 >>>>>> [21147.481331] [] show_stack+0x10/0x1c >>>>>> [21147.481353] [] dump_stack+0x1c/0x28 >>>>>> [21147.481370] [] warn_slowpath_common+0x74/0x9c >>>>>> [21147.481383] [] warn_slowpath_null+0x14/0x20 >>>>>> [21147.481397] [] kobject_get+0x4c/0x68 >>>>>> [21147.481416] [] get_device_parent+0xa0/0x194 >>>>>> [21147.481430] [] device_add+0x100/0x600 >>>>>> [21147.481446] [] _request_firmware+0x8b4/0xa80 >>>>>> [21147.481459] [] request_firmware+0x30/0x3c >>>>>> [21147.481473] [] cache_firmware+0x54/0xb0 >>>>>> [21147.481490] [] >>>>>> __async_dev_cache_fw_image+0x14/0x54 >>>>>> [21147.481505] [] async_run_entry_fn+0x6c/0x12c >>>>>> [21147.481521] [] process_one_work+0x264/0x3dc >>>>>> [21147.481535] [] worker_thread+0x1f0/0x340 >>>>>> [21147.481553] [] kthread+0xac/0xb8 >>>>>> [21147.481563] ---[ end trace dabc98ea48b8ba59 ]--- >>>>>> >>>>>> [21147.486436] Unable to handle kernel paging request at virtual >>>>>> address >>>>>> 0x6b6b6b6b6b6b6bcb >>>>>> [21147.493816] pgd = ffffffc0a982d000 >>>>>> [21147.498337] [6b6b6b6b6b6b6bcb] *pgd=0000000000000000 >>>>>> [21147.502287] Internal error: Oops: 96000004 [#1] PREEMPT SMP >>>>>> [21147.507826] Modules linked in: core_ctl(PO) qdrbg_module(O) >>>>>> qcrypto_module(O) >>>>>> [21147.514951] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O >>>>>> 3.10.84-g0957845-00427-g56a05c2 #1 >>>>>> [21147.524686] Workqueue: events_unbound async_run_entry_fn >>>>>> [21147.529965] task: ffffffc008138000 ti: ffffffc007880000 task.ti: >>>>>> ffffffc007880000 >>>>>> [21147.537438] PC is at sysfs_create_dir+0x34/0xd4 >>>>>> [21147.541949] LR is at kobject_add_internal+0x120/0x24c >>>>>> [21147.546979] pc : [] lr : [] >>>>>> pstate: >>>>>> 80000145 >>>>>> [21147.996295] Process kworker/u16:8 (pid: 23935, stack limit = >>>>>> 0xffffffc007880058) >>>>>> [21148.003673] Call trace: >>>>>> [21148.006117] [] sysfs_create_dir+0x34/0xd4 >>>>>> [21148.011663] [] kobject_add_internal+0x11c/0x24c >>>>>> [21148.017737] [] kobject_add+0xc8/0xe4 >>>>>> [21148.022863] [] device_add+0x11c/0x600 >>>>>> [21148.028069] [] _request_firmware+0x8b4/0xa80 >>>>>> [21148.033886] [] request_firmware+0x30/0x3c >>>>>> [21148.039439] [] cache_firmware+0x54/0xb0 >>>>>> [21148.044822] [] >>>>>> __async_dev_cache_fw_image+0x14/0x54 >>>>>> [21148.051249] [] async_run_entry_fn+0x6c/0x12c >>>>>> [21148.057064] [] process_one_work+0x264/0x3dc >>>>>> [21148.062792] [] worker_thread+0x1f0/0x340 >>>>>> [21148.068263] [] kthread+0xac/0xb8 >>>>>> [21148.073039] Code: b5000094 14000026 d000ac54 9125a294 (7940c281) >>>>>> [21148.080312] ---[ end trace dabc98ea48b8ba5a ]--- >>>>>> >>>>>> Below is the callstack where the parent is freed from the slub_debug >>>>>> info. >>>>>> 0xFFFFFFC000608D74 \\vmlinux\base/core\class_dir_release+0x0C >>>>>> 0xFFFFFFC0002FE6BC \\vmlinux\slub\free_debug_processing\fail+0x114 >>>>>> 0xFFFFFFC0002FE7A0 \\vmlinux\slub\__slab_free+0x44 >>>>>> 0xFFFFFFC0002FEE60 \\vmlinux\slub\kfree+0x1F8 >>>>>> 0xFFFFFFC000608D70 \\vmlinux\base/core\class_dir_release+0x8 >>>>>> 0xFFFFFFC000456D70 \\vmlinux\kobject\kobject_release+0x134 >>>>>> 0xFFFFFFC000456E10 \\vmlinux\kobject\kobject_put+0x58 >>>>>> 0xFFFFFFC000456C28 \\vmlinux\kobject\kobject_del+0x64 >>>>>> 0xFFFFFFC000609D78 \\vmlinux\base/core\device_del+0x150 >>>>>> 0xFFFFFFC00061C524 >>>>>> \\vmlinux\firmware_class\_request_firmware\out+0x71C >>>>>> 0xFFFFFFC00061C58C \\vmlinux\firmware_class\request_firmware+0x30 >>>>>> 0xFFFFFFC00061C5EC \\vmlinux\firmware_class\cache_firmware+0x54 >>>>>> 0xFFFFFFC00061C65C >>>>>> \\vmlinux\firmware_class\__async_dev_cache_fw_image+0x14 >>>>>> 0xFFFFFFC000243794 \\vmlinux\async\async_run_entry_fn+0x6C >>>>>> 0xFFFFFFC000237E88 \\vmlinux\workqueue\process_one_work+0x264 >>>>>> 0xFFFFFFC0002392C0 \\vmlinux\workqueue\worker_thread\recheck+0x1A0 >>>>>> 0xFFFFFFC00023E500 \\vmlinux\kthread\kthread+0xAC >>>>>> >>>>>> For similar type of issue, I see there is an existing fix: "sysfs: >>>>>> driver >>>>>> core: Fix glue dir race condition by gdp_mutex". >>>>>> >>>>>> >>>>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/base/core.c?id=e4a60d139060975eb956717e4f63ae348d4d8cc5 >>>>>> >>>>>> Still I am able to reproduce the issue. I have verified this race on >>>>>> kernels: 3.10, 3.18 and 4.4. >>>>>> >>>>>> I followed below procedure to reproduce the issue: >>>>>> 1. Enable "CONFIG_DEBUG_KOBJECT_RELEASE" >>>>>> 2. Use test_firmware, modified the test for async calls >>>>>> 3. Replace WARN_ON with BUG_ON in kref_get(). >>>>>> I ran below script from shell: >>>>>> count=0 >>>>>> while [ 1 ] >>>>>> do >>>>>> echo 3 > >>>>>> /sys/devices/virtual/misc/test_firmware/trigger_ufw >>>>>> if [ $? -ne 0 ]; then >>>>>> print "Exiting.. " >>>>>> exit 1 >>>>>> fi >>>>>> count=$(($count +1)) >>>>>> echo "count $count" >>>>>> done >>>>>> >>>>>> With below patch, I could reproduce the issue in the 2nd iteration. >>>>> >>>>> >>>>> >>>>> I will run your test script to see if there is new findings. >>>>> >>>>>> >>>>>> >>>>>> >>>>>> -------------------------8<---------------------------------------------- >>>>>> >>>>>> To reproduce the race condition, use firmware test and >>>>>> CONFIG_DEBUG_KOBJECT_RELEASE=y, which delays kobject >>>>>> releases. So that we can catch the bug easily. >>>>>> >>>>>> Enabled below flags in kernel config file: >>>>>> >>>>>> +CONFIG_DEBUG_KOBJECT_RELEASE=y >>>>>> +CONFIG_TEST_FIRMWARE=y >>>>> >>>>> >>>>> >>>>> Looks I can't reproduce the issue with your approach. >>>>> >>>>>> >>>>>> >>>>>> diff --git a/include/linux/kref.h b/include/linux/kref.h >>>>>> index 484604d..727fb24 100644 >>>>>> --- a/include/linux/kref.h >>>>>> +++ b/include/linux/kref.h >>>>>> @@ -44,7 +44,7 @@ static inline void kref_get(struct kref *kref) >>>>>> * condition when this kref is freeing by some other thread >>>>>> right >>>>>> now. >>>>>> * In this case one should use kref_get_unless_zero() >>>>>> */ >>>>>> - WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2); >>>>>> + BUG_ON(atomic_inc_return(&kref->refcount) < 2); >>>>>> } >>>>>> >>>>>> /** >>>>>> diff --git a/lib/test_firmware.c b/lib/test_firmware.c >>>>>> old mode 100644 >>>>>> new mode 100755 >>>>>> index 86374c1..14c9598 >>>>>> --- a/lib/test_firmware.c >>>>>> +++ b/lib/test_firmware.c >>>>>> @@ -18,6 +18,7 @@ >>>>>> #include >>>>>> #include >>>>>> #include >>>>>> +#include >>>>>> >>>>>> + >>>>>> +static ASYNC_DOMAIN_EXCLUSIVE(ufw_domain); >>>>>> + >>>>>> +static void __async_ufw_work(void *name, >>>>>> + async_cookie_t cookie) >>>>>> +{ >>>>>> + const struct firmware *fw; >>>>>> + const char *fw_name = name; >>>>>> + int ret; >>>>>> + >>>>>> + pr_err("requesting %s\n", fw_name); >>>>>> + >>>>>> + ret = request_firmware(&fw, fw_name, NULL); >>>>>> + pr_err("loaded: %zu\n", fw ? fw->size : 0); >>>>>> + if (!ret) >>>>>> + kfree(fw); >>>>>> + >>>>>> +} >>>>>> + >>>>>> +static char *name[] = { "fw1.bin", "fw2.bin"}; >>>>>> + >>>>>> +static ssize_t trigger_ufw_store(struct device *dev, >>>>>> + struct device_attribute *attr, >>>>>> + const char *buf, size_t count) >>>>>> +{ >>>>>> + int rc, i, c=0; >>>>>> + u32 start; >>>>>> + char *fw_name; >>>>>> + >>>>>> + rc = kstrtou32(buf, 0, &start); >>>>>> + if (rc){ >>>>>> + pr_err("Invalid option\n"); >>>>>> + return rc; >>>>>> + } >>>>>> + pr_err(" no of iterations %d\n", start); >>>>>> + >>>>>> + for (i=0; i< start; i++) >>>>>> + { >>>>>> + if (c >= 2) >>>>>> + c = 0; >>>>>> + fw_name = name[c++]; >>>>>> + async_schedule_domain(__async_ufw_work, (void *)fw_name, >>>>>> &ufw_domain); >>>>>> + } >>>>>> + async_synchronize_full_domain(&ufw_domain); >>>>>> + return count; >>>>>> +} >>>>>> >>>>>> static DEVICE_ATTR_WO(trigger_request); >>>>>> +static DEVICE_ATTR_WO(trigger_ufw); >>>>>> >>>>>> static int __init test_firmware_init(void) >>>>>> { >>>>>> @@ -92,6 +140,12 @@ static int __init test_firmware_init(void) >>>>>> goto dereg; >>>>>> } >>>>>> >>>>>> + rc = device_create_file(test_fw_misc_device.this_device, >>>>>> + &dev_attr_trigger_ufw); >>>>>> + if (rc) { >>>>>> + pr_err("could not create sysfs interface: %d\n", rc); >>>>>> + goto dereg; >>>>>> + } >>>>>> pr_warn("interface ready\n"); >>>>>> >>>>>> return 0; >>>>>> @@ -106,7 +160,9 @@ static void __exit test_firmware_exit(void) >>>>>> { >>>>>> release_firmware(test_firmware); >>>>>> device_remove_file(test_fw_misc_device.this_device, >>>>>> - &dev_attr_trigger_request); >>>>>> + &dev_attr_trigger_request); >>>>>> + device_remove_file(test_fw_misc_device.this_device, >>>>>> + &dev_attr_trigger_ufw); >>>>>> misc_deregister(&test_fw_misc_device); >>>>>> pr_warn("removed interface\n"); >>>>>> } >>>>>> >>>>>> >>>>>> >>>>>> ----------------------------8<-------------------------------------------- >>>>>> >>>>>> Thanks and regards, >>>>>> Chandrasekhar L. >>>>>> -- >>>>>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a >>>>>> member of >>>>>> Code Aurora Forum, >>>>>> a Linux Foundation Collaborative Project.