MIME-Version: 1.0
In-Reply-To: <CACVXFVP4m-qLFFJZsOLSRfQ=JzLnxgqeoT7RcNi=Qv==76OdFQ@mail.gmail.com>
References: <56E2768C.1050000@codeaurora.com>
	<CACVXFVP4m-qLFFJZsOLSRfQ=JzLnxgqeoT7RcNi=Qv==76OdFQ@mail.gmail.com>
Date: Wed, 16 Mar 2016 10:06:36 +0800
Message-ID: <CACVXFVPyB3fkpgU-bO34V4SC-6+iiQBQDk4VOJrK7u_S_Eqa6g@mail.gmail.com>
Subject: Re: Possible race at user mode helper in request_firmware
From: Ming Lei <ming.lei@canonical.com>
To: clingutla@codeaurora.org
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        linux-arm-msm@vger.kernel.org, Russell King <linux@arm.linux.org.uk>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 13315
Lines: 333

On Tue, Mar 15, 2016 at 11:09 PM, Ming Lei <ming.lei@canonical.com> wrote:
> Hi,
>
> On Mon, Mar 14, 2016 at 8:00 PM,  <clingutla@codeaurora.org> wrote:
>> Hello,
>>
>> I see possible race in _request_firmware_load function, on which I wanted to
>> take your opinion.
>>
>> When system is going to low power mode, device_cache_fw_images() is called
>> from pm notifier which schedules asynchronous workers to cache devices
>> firmware.
>> If more than 2 async requests falls under user helper mode, then there is a
>> use after free of "firmware" directory kobject.
>
> If two requests are for getting same firmware image, only one can be sent
> out and be completed with user helper, please see fw_lookup_and_allocate_buf(),
> so the race shouldn't have been triggered in this case.
>
> So I guess your test must be involved with at least two different requests.
>
>>
>> The race will hit, when one asynchronous worker calls device_del() while
>> other asynchronous worker calls get_device_parent() in
>> _request_firmware_load().
>> After loading firmware, first worker calls device_del(), and "firmware"
>> directory ref count is 1, so cleanup_glue_dir() calls kobject_put() with
>> gdp_mutex lock held.
>> Meanwhile, second async worker calls device_add(), and get_device_parent()
>> is still able to find the firmware directory kobject in ksets after getting
>> gdp_mutex.
>> This kobject gets added as parent to the second device kobject.
>> By the time of accessing the parent kobject, its ref count was 0 and prints
>> below warning stack followed by crash.
>
> The trick here is just that we don't use 'struct device' in
> device_cache_fw_images(),
> then the firmware device's parent is null during calling device_add() from
> _request_firmware_load(), but this should be allowed by driver core.
>
> From firmware class view, both device_add()/device_del() is run with pair,
> and driver core still complains during get_device_parent(), so something
> inside driver core should be be wrong, IMO.

Thinking about the issue further, I believe it is a false positive which is
caused by CONFIG_DEBUG_KOBJECT_RELEASE, follows the story:

1) device_add() path

get_device_parent()
               ......
               mutex_lock(&gdp_mutex);

                /* find our class-directory at the parent and reference it */
                spin_lock(&dev->class->p->glue_dirs.list_lock);
                list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
                        if (k->parent == parent_kobj) {
                                kobj = kobject_get(k);
                                break;
                        }
                spin_unlock(&dev->class->p->glue_dirs.list_lock);
                if (kobj) {
                        mutex_unlock(&gdp_mutex);
                        return kobj;
                }
                k = class_dir_create_and_add(dev->class, parent_kobj);
                mutex_unlock(&gdp_mutex);
                ......

2) device_del() path

cleanup_device_parent()
       ->cleanup_glue_dir()
            ->mutex_lock(&gdp_mutex);
            ->kobject_put(glue_dir);
                   ->kobject_cleanup()
                         ->kobject_del()
                                ->kobj_kset_leave()
            ->mutex_unlock(&gdp_mutex);

So gdp_mutex is held wrt. both looking up/adding the parent kobject
and removing the parent kobject. Actually with this lock, both glur_dir
kobject's referece counting and joining/leaving kset are run atomically.

The race you found is triggered by CONFIG_DEBUG_KOBJECT_RELEASE,
which causes kobject_cleanup() to be run in workqueue context and the
lock of gdp_mutex can't be hold in that situation anymore. That is the root
cause for your race.

That also said DEBUG_KOBJECT_RELEASE might cause trouble in
some situations.

Thanks,
Ming

>
>>
>> [21147.455038] ------------[ cut here ]------------
>> [21147.458910] WARNING: at <...>/kernel/include/linux/kref.h:47
>> kobject_get+0x50/0x68()
>> [21147.481199] Modules linked in: core_ctl(PO) qdrbg_module(O)
>> qcrypto_module(O)
>> [21147.481256] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O
>> 3.10.84-g0957845-00427-g56a05c2 #1
>> [21147.481284] Workqueue: events_unbound async_run_entry_fn
>> [21147.481296] Call trace:
>> [21147.481315] [<ffffffc000206b60>] dump_backtrace+0x0/0x270
>> [21147.481331] [<ffffffc000206de0>] show_stack+0x10/0x1c
>> [21147.481353] [<ffffffc000d569ec>] dump_stack+0x1c/0x28
>> [21147.481370] [<ffffffc00021cc18>] warn_slowpath_common+0x74/0x9c
>> [21147.481383] [<ffffffc00021ce4c>] warn_slowpath_null+0x14/0x20
>> [21147.481397] [<ffffffc000456b10>] kobject_get+0x4c/0x68
>> [21147.481416] [<ffffffc000609f64>] get_device_parent+0xa0/0x194
>> [21147.481430] [<ffffffc00060a3f4>] device_add+0x100/0x600
>> [21147.481446] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
>> [21147.481459] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
>> [21147.481473] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
>> [21147.481490] [<ffffffc00061c65c>] __async_dev_cache_fw_image+0x14/0x54
>> [21147.481505] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
>> [21147.481521] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
>> [21147.481535] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
>> [21147.481553] [<ffffffc00023e500>] kthread+0xac/0xb8
>> [21147.481563] ---[ end trace dabc98ea48b8ba59 ]---
>>
>> [21147.486436] Unable to handle kernel paging request at virtual address
>> 0x6b6b6b6b6b6b6bcb
>> [21147.493816] pgd = ffffffc0a982d000
>> [21147.498337] [6b6b6b6b6b6b6bcb] *pgd=0000000000000000
>> [21147.502287] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>> [21147.507826] Modules linked in: core_ctl(PO) qdrbg_module(O)
>> qcrypto_module(O)
>> [21147.514951] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O
>> 3.10.84-g0957845-00427-g56a05c2 #1
>> [21147.524686] Workqueue: events_unbound async_run_entry_fn
>> [21147.529965] task: ffffffc008138000 ti: ffffffc007880000 task.ti:
>> ffffffc007880000
>> [21147.537438] PC is at sysfs_create_dir+0x34/0xd4
>> [21147.541949] LR is at kobject_add_internal+0x120/0x24c
>> [21147.546979] pc : [<ffffffc000361fa8>] lr : [<ffffffc000456f5c>] pstate:
>> 80000145
>> [21147.996295] Process kworker/u16:8 (pid: 23935, stack limit =
>> 0xffffffc007880058)
>> [21148.003673] Call trace:
>> [21148.006117] [<ffffffc000361fa8>] sysfs_create_dir+0x34/0xd4
>> [21148.011663] [<ffffffc000456f58>] kobject_add_internal+0x11c/0x24c
>> [21148.017737] [<ffffffc0004573ec>] kobject_add+0xc8/0xe4
>> [21148.022863] [<ffffffc00060a410>] device_add+0x11c/0x600
>> [21148.028069] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
>> [21148.033886] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
>> [21148.039439] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
>> [21148.044822] [<ffffffc00061c65c>] __async_dev_cache_fw_image+0x14/0x54
>> [21148.051249] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
>> [21148.057064] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
>> [21148.062792] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
>> [21148.068263] [<ffffffc00023e500>] kthread+0xac/0xb8
>> [21148.073039] Code: b5000094 14000026 d000ac54 9125a294 (7940c281)
>> [21148.080312] ---[ end trace dabc98ea48b8ba5a ]---
>>
>> Below is the callstack where the parent is freed from the slub_debug info.
>> 0xFFFFFFC000608D74  \\vmlinux\base/core\class_dir_release+0x0C
>> 0xFFFFFFC0002FE6BC \\vmlinux\slub\free_debug_processing\fail+0x114
>> 0xFFFFFFC0002FE7A0 \\vmlinux\slub\__slab_free+0x44
>> 0xFFFFFFC0002FEE60 \\vmlinux\slub\kfree+0x1F8
>> 0xFFFFFFC000608D70 \\vmlinux\base/core\class_dir_release+0x8
>> 0xFFFFFFC000456D70 \\vmlinux\kobject\kobject_release+0x134
>> 0xFFFFFFC000456E10 \\vmlinux\kobject\kobject_put+0x58
>> 0xFFFFFFC000456C28 \\vmlinux\kobject\kobject_del+0x64
>> 0xFFFFFFC000609D78 \\vmlinux\base/core\device_del+0x150
>> 0xFFFFFFC00061C524 \\vmlinux\firmware_class\_request_firmware\out+0x71C
>> 0xFFFFFFC00061C58C \\vmlinux\firmware_class\request_firmware+0x30
>> 0xFFFFFFC00061C5EC \\vmlinux\firmware_class\cache_firmware+0x54
>> 0xFFFFFFC00061C65C \\vmlinux\firmware_class\__async_dev_cache_fw_image+0x14
>> 0xFFFFFFC000243794 \\vmlinux\async\async_run_entry_fn+0x6C
>> 0xFFFFFFC000237E88 \\vmlinux\workqueue\process_one_work+0x264
>> 0xFFFFFFC0002392C0 \\vmlinux\workqueue\worker_thread\recheck+0x1A0
>> 0xFFFFFFC00023E500 \\vmlinux\kthread\kthread+0xAC
>>
>> For similar type of issue, I see there is an existing fix: "sysfs: driver
>> core: Fix glue dir race condition by gdp_mutex".
>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/base/core.c?id=e4a60d139060975eb956717e4f63ae348d4d8cc5
>>
>> Still I am able to reproduce the issue. I have verified this race on
>> kernels: 3.10, 3.18 and 4.4.
>>
>> I followed below procedure to reproduce the issue:
>>      1. Enable "CONFIG_DEBUG_KOBJECT_RELEASE"
>>      2. Use test_firmware, modified the test for async calls
>>      3. Replace WARN_ON with BUG_ON in kref_get().
>> I ran below script from shell:
>>     count=0
>>     while [ 1 ]
>>     do
>>             echo 3 > /sys/devices/virtual/misc/test_firmware/trigger_ufw
>>             if [ $? -ne 0 ]; then
>>                     print "Exiting.. "
>>                     exit 1
>>             fi
>>             count=$(($count +1))
>>             echo "count $count"
>>      done
>>
>> With below patch, I could reproduce the issue in the 2nd iteration.
>
> I will run your test script to see if there is new findings.
>
>>
>> -------------------------8<----------------------------------------------
>>
>> To reproduce the race condition, use firmware test and
>> CONFIG_DEBUG_KOBJECT_RELEASE=y, which delays kobject
>> releases. So that we can catch the bug easily.
>>
>> Enabled below flags in kernel config file:
>>
>> +CONFIG_DEBUG_KOBJECT_RELEASE=y
>> +CONFIG_TEST_FIRMWARE=y
>
> Looks I can't reproduce the issue with your approach.
>
>>
>>
>> diff --git a/include/linux/kref.h b/include/linux/kref.h
>> index 484604d..727fb24 100644
>> --- a/include/linux/kref.h
>> +++ b/include/linux/kref.h
>> @@ -44,7 +44,7 @@ static inline void kref_get(struct kref *kref)
>>       * condition when this kref is freeing by some other thread right
>> now.
>>       * In this case one should use kref_get_unless_zero()
>>       */
>> -    WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
>> +    BUG_ON(atomic_inc_return(&kref->refcount) < 2);
>>  }
>>
>>  /**
>> diff --git a/lib/test_firmware.c b/lib/test_firmware.c
>> old mode 100644
>> new mode 100755
>> index 86374c1..14c9598
>> --- a/lib/test_firmware.c
>> +++ b/lib/test_firmware.c
>> @@ -18,6 +18,7 @@
>>  #include <linux/miscdevice.h>
>>  #include <linux/slab.h>
>>  #include <linux/uaccess.h>
>> +#include <linux/async.h>
>>
>> +
>> +static ASYNC_DOMAIN_EXCLUSIVE(ufw_domain);
>> +
>> +static void __async_ufw_work(void *name,
>> +                                       async_cookie_t cookie)
>> +{
>> +    const struct firmware *fw;
>> +    const char *fw_name = name;
>> +    int ret;
>> +
>> +    pr_err("requesting %s\n", fw_name);
>> +
>> +    ret = request_firmware(&fw, fw_name, NULL);
>> +    pr_err("loaded: %zu\n", fw ? fw->size : 0);
>> +        if (!ret)
>> +                kfree(fw);
>> +
>> +}
>> +
>> +static char *name[] = { "fw1.bin", "fw2.bin"};
>> +
>> +static ssize_t trigger_ufw_store(struct device *dev,
>> +        struct device_attribute *attr,
>> +        const char *buf, size_t count)
>> +{
>> +    int rc, i, c=0;
>> +    u32 start;
>> +    char *fw_name;
>> +
>> +    rc = kstrtou32(buf, 0, &start);
>> +    if (rc){
>> +        pr_err("Invalid option\n");
>> +        return rc;
>> +    }
>> +    pr_err(" no of iterations %d\n", start);
>> +
>> +    for (i=0; i< start; i++)
>> +    {
>> +        if (c >= 2)
>> +            c = 0;
>> +        fw_name = name[c++];
>> +        async_schedule_domain(__async_ufw_work, (void *)fw_name,
>> &ufw_domain);
>> +    }
>> +    async_synchronize_full_domain(&ufw_domain);
>> +    return count;
>> +}
>>
>>  static DEVICE_ATTR_WO(trigger_request);
>> +static DEVICE_ATTR_WO(trigger_ufw);
>>
>>  static int __init test_firmware_init(void)
>>  {
>> @@ -92,6 +140,12 @@ static int __init test_firmware_init(void)
>>          goto dereg;
>>      }
>>
>> +    rc = device_create_file(test_fw_misc_device.this_device,
>> +                             &dev_attr_trigger_ufw);
>> +    if (rc) {
>> +        pr_err("could not create sysfs interface: %d\n", rc);
>> +        goto dereg;
>> +    }
>>      pr_warn("interface ready\n");
>>
>>      return 0;
>> @@ -106,7 +160,9 @@ static void __exit test_firmware_exit(void)
>>  {
>>      release_firmware(test_firmware);
>>      device_remove_file(test_fw_misc_device.this_device,
>> -               &dev_attr_trigger_request);
>> +            &dev_attr_trigger_request);
>> +    device_remove_file(test_fw_misc_device.this_device,
>> +            &dev_attr_trigger_ufw);
>>      misc_deregister(&test_fw_misc_device);
>>      pr_warn("removed interface\n");
>>  }
>>
>> ----------------------------8<--------------------------------------------
>>
>> Thanks and regards,
>> Chandrasekhar L.
>> --
>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of
>> Code Aurora Forum,
>>  a Linux Foundation Collaborative Project.