Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755650AbcCNMBU (ORCPT ); Mon, 14 Mar 2016 08:01:20 -0400 Received: from smtp.codeaurora.org ([198.145.29.96]:58955 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752234AbcCNMBJ (ORCPT ); Mon, 14 Mar 2016 08:01:09 -0400 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Mon, 14 Mar 2016 17:31:08 +0530 From: clingutla@codeaurora.org To: ming.lei@canonical.com, linux-kernel@vger.kernel.org Cc: gregkh@linuxfoundation.org, linux-arm-msm@vger.kernel.org Subject: Possible race at user mode helper in request_firmware Message-ID: <69c9d897d4c1703fb5efe6d2d9dd9f61@codeaurora.org> User-Agent: Roundcube Webmail/1.1.4 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9778 Lines: 257 Hello, I see possible race in _request_firmware_load function, on which I wanted to take your opinion. When system is going to low power mode, device_cache_fw_images() is called from pm notifier which schedules async workers to cache devices firmware. If more than 2 async requests falls under user helper mode, then there is a use after free of "firmware" directory kobject. The race will hit, when one asynchronous worker calls device_del() while other asynchronous worker calls get_device_parent() in _request_firmware_load(). After loading firmware, first worker calls device_del(), and "firmware" directory ref count is 1, so cleanup_glue_dir() calls kobject_put() with gdp_mutex lock held. Meanwhile, second async worker calls device_add(), and get_device_parent() is still able to find the firmware directory kobject in ksets after getting gdp_mutex. This kobject gets added as parent to the second device kobject. By the time of accessing the parent kobject, its ref count was 0 and prints below warning stack followed by crash. [21147.455038] ------------[ cut here ]------------ [21147.458910] WARNING: at <...>/kernel/include/linux/kref.h:47 kobject_get+0x50/0x68() [21147.481199] Modules linked in: core_ctl(PO) qdrbg_module(O) qcrypto_module(O) [21147.481256] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O 3.10.84-g0957845-00427-g56a05c2 #1 [21147.481284] Workqueue: events_unbound async_run_entry_fn [21147.481296] Call trace: [21147.481315] [] dump_backtrace+0x0/0x270 [21147.481331] [] show_stack+0x10/0x1c [21147.481353] [] dump_stack+0x1c/0x28 [21147.481370] [] warn_slowpath_common+0x74/0x9c [21147.481383] [] warn_slowpath_null+0x14/0x20 [21147.481397] [] kobject_get+0x4c/0x68 [21147.481416] [] get_device_parent+0xa0/0x194 [21147.481430] [] device_add+0x100/0x600 [21147.481446] [] _request_firmware+0x8b4/0xa80 [21147.481459] [] request_firmware+0x30/0x3c [21147.481473] [] cache_firmware+0x54/0xb0 [21147.481490] [] __async_dev_cache_fw_image+0x14/0x54 [21147.481505] [] async_run_entry_fn+0x6c/0x12c [21147.481521] [] process_one_work+0x264/0x3dc [21147.481535] [] worker_thread+0x1f0/0x340 [21147.481553] [] kthread+0xac/0xb8 [21147.481563] ---[ end trace dabc98ea48b8ba59 ]--- [21147.486436] Unable to handle kernel paging request at virtual address 0x6b6b6b6b6b6b6bcb [21147.493816] pgd = ffffffc0a982d000 [21147.498337] [6b6b6b6b6b6b6bcb] *pgd=0000000000000000 [21147.502287] Internal error: Oops: 96000004 [#1] PREEMPT SMP [21147.507826] Modules linked in: core_ctl(PO) qdrbg_module(O) qcrypto_module(O) [21147.514951] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O 3.10.84-g0957845-00427-g56a05c2 #1 [21147.524686] Workqueue: events_unbound async_run_entry_fn [21147.529965] task: ffffffc008138000 ti: ffffffc007880000 task.ti: ffffffc007880000 [21147.537438] PC is at sysfs_create_dir+0x34/0xd4 [21147.541949] LR is at kobject_add_internal+0x120/0x24c [21147.546979] pc : [] lr : [] pstate: 80000145 [21147.996295] Process kworker/u16:8 (pid: 23935, stack limit = 0xffffffc007880058) [21148.003673] Call trace: [21148.006117] [] sysfs_create_dir+0x34/0xd4 [21148.011663] [] kobject_add_internal+0x11c/0x24c [21148.017737] [] kobject_add+0xc8/0xe4 [21148.022863] [] device_add+0x11c/0x600 [21148.028069] [] _request_firmware+0x8b4/0xa80 [21148.033886] [] request_firmware+0x30/0x3c [21148.039439] [] cache_firmware+0x54/0xb0 [21148.044822] [] __async_dev_cache_fw_image+0x14/0x54 [21148.051249] [] async_run_entry_fn+0x6c/0x12c [21148.057064] [] process_one_work+0x264/0x3dc [21148.062792] [] worker_thread+0x1f0/0x340 [21148.068263] [] kthread+0xac/0xb8 [21148.073039] Code: b5000094 14000026 d000ac54 9125a294 (7940c281) [21148.080312] ---[ end trace dabc98ea48b8ba5a ]--- Below is the callstack where the parent is freed from the slub_debug info. 0xFFFFFFC000608D74 \\vmlinux\base/core\class_dir_release+0x0C 0xFFFFFFC0002FE6BC \\vmlinux\slub\free_debug_processing\fail+0x114 0xFFFFFFC0002FE7A0 \\vmlinux\slub\__slab_free+0x44 0xFFFFFFC0002FEE60 \\vmlinux\slub\kfree+0x1F8 0xFFFFFFC000608D70 \\vmlinux\base/core\class_dir_release+0x8 0xFFFFFFC000456D70 \\vmlinux\kobject\kobject_release+0x134 0xFFFFFFC000456E10 \\vmlinux\kobject\kobject_put+0x58 0xFFFFFFC000456C28 \\vmlinux\kobject\kobject_del+0x64 0xFFFFFFC000609D78 \\vmlinux\base/core\device_del+0x150 0xFFFFFFC00061C524 \\vmlinux\firmware_class\_request_firmware\out+0x71C 0xFFFFFFC00061C58C \\vmlinux\firmware_class\request_firmware+0x30 0xFFFFFFC00061C5EC \\vmlinux\firmware_class\cache_firmware+0x54 0xFFFFFFC00061C65C \\vmlinux\firmware_class\__async_dev_cache_fw_image+0x14 0xFFFFFFC000243794 \\vmlinux\async\async_run_entry_fn+0x6C 0xFFFFFFC000237E88 \\vmlinux\workqueue\process_one_work+0x264 0xFFFFFFC0002392C0 \\vmlinux\workqueue\worker_thread\recheck+0x1A0 0xFFFFFFC00023E500 \\vmlinux\kthread\kthread+0xAC For similar type of issue, I see there is an existing fix: "sysfs: driver core: Fix glue dir race condition by gdp_mutex". https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/base/core.c?id=e4a60d139060975eb956717e4f63ae348d4d8cc5 Still I am able to reproduce the issue. I have verified this race on kernels: 3.10, 3.18 and 4.4. I followed below procedure to reproduce the issue: 1. Enable "CONFIG_DEBUG_KOBJECT_RELEASE" 2. Use test_firmware, modified the test for async calls 3. Replace WARN_ON with BUG_ON in kref_get(). I ran below script from shell: count=0 while [ 1 ] do echo 3 > /sys/devices/virtual/misc/test_firmware/trigger_ufw if [ $? -ne 0 ]; then print "Exiting.. " exit 1 fi count=$(($count +1)) echo "count $count" done With below patch, I could reproduce the issue in the 2nd iteration. -------------------------8<---------------------------------------------- To reproduce the race condition, use firmware test and CONFIG_DEBUG_KOBJECT_RELEASE=y, which delays kobject releases. So that we can catch the bug easily. Enabled below flags in kernel config file: +CONFIG_DEBUG_KOBJECT_RELEASE=y +CONFIG_TEST_FIRMWARE=y diff --git a/include/linux/kref.h b/include/linux/kref.h index 484604d..727fb24 100644 --- a/include/linux/kref.h +++ b/include/linux/kref.h @@ -44,7 +44,7 @@ static inline void kref_get(struct kref *kref) * condition when this kref is freeing by some other thread right now. * In this case one should use kref_get_unless_zero() */ - WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2); + BUG_ON(atomic_inc_return(&kref->refcount) < 2); } /** diff --git a/lib/test_firmware.c b/lib/test_firmware.c old mode 100644 new mode 100755 index 86374c1..14c9598 --- a/lib/test_firmware.c +++ b/lib/test_firmware.c @@ -18,6 +18,7 @@ #include #include #include +#include + +static ASYNC_DOMAIN_EXCLUSIVE(ufw_domain); + +static void __async_ufw_work(void *name, + async_cookie_t cookie) +{ + const struct firmware *fw; + const char *fw_name = name; + int ret; + + pr_err("requesting %s\n", fw_name); + + ret = request_firmware(&fw, fw_name, NULL); + pr_err("loaded: %zu\n", fw ? fw->size : 0); + if (!ret) + kfree(fw); + +} + +static char *name[] = { "fw1.bin", "fw2.bin"}; + +static ssize_t trigger_ufw_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + int rc, i, c=0; + u32 start; + char *fw_name; + + rc = kstrtou32(buf, 0, &start); + if (rc){ + pr_err("Invalid option\n"); + return rc; + } + pr_err(" no of iterations %d\n", start); + + for (i=0; i< start; i++) + { + if (c >= 2) + c = 0; + fw_name = name[c++]; + async_schedule_domain(__async_ufw_work, (void *)fw_name, &ufw_domain); + } + async_synchronize_full_domain(&ufw_domain); + return count; +} static DEVICE_ATTR_WO(trigger_request); +static DEVICE_ATTR_WO(trigger_ufw); static int __init test_firmware_init(void) { @@ -92,6 +140,12 @@ static int __init test_firmware_init(void) goto dereg; } + rc = device_create_file(test_fw_misc_device.this_device, + &dev_attr_trigger_ufw); + if (rc) { + pr_err("could not create sysfs interface: %d\n", rc); + goto dereg; + } pr_warn("interface ready\n"); return 0; @@ -106,7 +160,9 @@ static void __exit test_firmware_exit(void) { release_firmware(test_firmware); device_remove_file(test_fw_misc_device.this_device, - &dev_attr_trigger_request); + &dev_attr_trigger_request); + device_remove_file(test_fw_misc_device.this_device, + &dev_attr_trigger_ufw); misc_deregister(&test_fw_misc_device); pr_warn("removed interface\n"); } ----------------------------8<-------------------------------------------- Thanks and regards, Chandrasekhar L. -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.