Here is a test case:
while true; do losetup /dev/loop0 ./loop.file; losetup -d /dev/loop0; done &
while true; do rmmod loop; done &
that produces the following oops:
[ 181.486775] loop: module is already loaded
[ 181.490901] BUG: unable to handle kernel paging request at ffffffffa18a600c
[ 181.497891] IP: [<ffffffff81310351>] kobject_put+0x11/0x60
[ 181.503391] PGD 1a0c067 PUD 1a0d063 PMD 2693af067 PTE 0
[ 181.508673] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 181.513245] Modules linked in: loop(+) sata_mv acpi_cpufreq mperf freq_table processor msr cpuid genrtc bnx2x libcrc32c mdio ipv6 [last unloaded: loop]
[ 181.527016] CPU 19
[ 181.528945] Pid: 19213, comm: modprobe Tainted: G W 3.9.0-dbg-DEV #8
[ 181.538432] RIP: 0010:[<ffffffff81310351>] [<ffffffff81310351>] kobject_put+0x11/0x60
[ 181.546359] RSP: 0018:ffff88065c2e1de8 EFLAGS: 00010282
[ 181.551667] RAX: 000000000000001e RBX: ffffffffa18a5fd0 RCX: ffff88087bc4ecc8
[ 181.558811] RDX: 0000000000000000 RSI: ffff88087bc4d078 RDI: ffffffffa18a5fd0
[ 181.565935] RBP: ffff88065c2e1df8 R08: 0000000000000002 R09: 0000000000000000
[ 181.573064] R10: 0000000000000766 R11: 0000000000000000 R12: ffffffffa18aef80
[ 181.580193] R13: ffff88065c2e1ee8 R14: ffffffffffffffea R15: ffffffffa18a5fd0
[ 181.587327] FS: 00007f871df56700(0000) GS:ffff88087bc40000(0063) knlGS:00000000f76816b0
[ 181.595407] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 181.601156] CR2: ffffffffa18a600c CR3: 0000000867923000 CR4: 00000000000007e0
[ 181.608288] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 181.615416] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 181.622545] Process modprobe (pid: 19213, threadinfo ffff88065c2e0000, task ffff88065c20e180)
[ 181.631065] Stack:
[ 181.633075] 0000000000000002 ffffffffa18aef98 ffff88065c2e1ed8 ffffffff810e691c
[ 181.640517] ffffffff8132ba60 ffffffff8131b42e 0000000000000000 57ff842b5f752e80
[ 181.647958] a80065d4ba000000 7400000000000000 ffffffffa18aef98 ffffffffa18af1e0
[ 181.655411] Call Trace:
[ 181.657859] [<ffffffff810e691c>] load_module+0xf4c/0x15b0
[ 181.663338] [<ffffffff8132ba60>] ? ddebug_proc_write+0x110/0x110
[ 181.669428] [<ffffffff8131b42e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 181.675864] [<ffffffff810e7077>] sys_init_module+0xf7/0x140
[ 181.681520] [<ffffffff815e6515>] cstar_dispatch+0x7/0x1f
[ 181.686912] Code: c7 c7 a0 8d ac 81 31 c0 e8 ed
This is a race condition between kset_find_obj() and kobject_put().
kset_find_obj() might return a kobject whose refcount is already 0
because that kobject is being freed by kobject_put() in another thread.
Here is the timeline for the crash, in the case where kset_find_obj()
searches for an object that nobody holds while another thread is doing
kobject_put() on the same kobject:
THREAD A (calls kset_find_obj())     THREAD B (calls kobject_put())
spin_lock()
                                     atomic_dec_return(kobj->kref), counter gets zero here
                                     ... starts kobject cleanup ...
                                     spin_lock() // WAITS for thread A in kobj_kset_leave()
iterate over kset->list
atomic_inc(kobj->kref) (counter becomes 1)
spin_unlock()
                                     spin_lock() // taken
                                     // it does not know that thread A increased the
                                     // counter, so it removes obj from the list
                                     spin_unlock()
                                     vfree(module) // frees the module object containing kobj

// kobj points to freed memory area!!
kobject_put(kobj) // OOPS!!!!
The crash above happens because module.c uses kset_find_obj() while
somebody is unloading a module. That module.c code was introduced in
6494a93d55fad5862. In fact we do not need the kobj in module.c; all we
need is to check whether the name exists in the kset. For this,
introduce a function that is thread-safe with respect to the
kobject_put() path.
Add documentation that kset_find_obj() is not thread-safe with
kobject_put() if no external refcounts are held.
Other kset_find_obj() callers should be checked to make sure they do
not race with kobject_put().
Signed-off-by: Anatol Pomozov <[email protected]>
---
include/linux/kobject.h | 1 +
include/linux/kref.h | 8 +++++---
kernel/module.c | 5 +----
lib/kobject.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 54 insertions(+), 8 deletions(-)
diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index 939b112..ac6afe6 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -191,6 +191,7 @@ static inline struct kobj_type *get_ktype(struct kobject *kobj)
}
extern struct kobject *kset_find_obj(struct kset *, const char *);
+extern bool kset_check_name_exists(struct kset *kset, const char *name);
/* The global /sys/kernel/ kobject for people to chain off of */
extern struct kobject *kernel_kobj;
diff --git a/include/linux/kref.h b/include/linux/kref.h
index 4972e6e..dbb6e01 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -39,8 +39,10 @@ static inline void kref_init(struct kref *kref)
*/
static inline void kref_get(struct kref *kref)
{
- WARN_ON(!atomic_read(&kref->refcount));
- atomic_inc(&kref->refcount);
+ /* If the refcount was 0 before incrementing then we have a race
+ * condition: this kref is being freed by some other thread right now.
+ */
+ WARN_ON(atomic_inc_return(&kref->refcount) < 2);
}
/**
@@ -100,7 +102,7 @@ static inline int kref_put_mutex(struct kref *kref,
struct mutex *lock)
{
WARN_ON(release == NULL);
- if (unlikely(!atomic_add_unless(&kref->refcount, -1, 1))) {
+ if (unlikely(!atomic_add_unless(&kref->refcount, -1, 1))) {
mutex_lock(lock);
if (unlikely(!atomic_dec_and_test(&kref->refcount))) {
mutex_unlock(lock);
diff --git a/kernel/module.c b/kernel/module.c
index 0925c9a..84bcd6f 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1606,7 +1606,6 @@ static void module_remove_modinfo_attrs(struct module *mod)
static int mod_sysfs_init(struct module *mod)
{
int err;
- struct kobject *kobj;
if (!module_sysfs_initialized) {
printk(KERN_ERR "%s: module sysfs not initialized\n",
@@ -1615,10 +1614,8 @@ static int mod_sysfs_init(struct module *mod)
goto out;
}
- kobj = kset_find_obj(module_kset, mod->name);
- if (kobj) {
+ if (kset_check_name_exists(module_kset, mod->name)) {
printk(KERN_ERR "%s: module is already loaded\n", mod->name);
- kobject_put(kobj);
err = -EINVAL;
goto out;
}
diff --git a/lib/kobject.c b/lib/kobject.c
index e07ee1f..ae47cd1 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -734,6 +734,14 @@ void kset_unregister(struct kset *k)
}
/**
+ * kobj_compare_name - returns true if the given kobj's name equals @name
+ */
+static bool kobj_compare_name(struct kobject *kobj, const char *name)
+{
+ return kobject_name(kobj) && !strcmp(kobject_name(kobj), name);
+}
+
+/**
* kset_find_obj - search for object in kset.
* @kset: kset we're looking in.
* @name: object's name.
@@ -741,6 +749,13 @@ void kset_unregister(struct kset *k)
* Lock kset via @kset->subsys, and iterate over @kset->list,
* looking for a matching kobject. If matching object is found
* take a reference and return the object.
+ *
+ * This function races with kobject_put(). kset_find_obj() may be
+ * called only if the found object already has a refcount of 1 or more.
+ * If the found kobject is not held and has a refcount of 0, it is on
+ * the kobject_put() path and the returned kobject will soon be invalid.
+ * Use this function only when you are sure the returned kobject's
+ * refcount is held by someone for the whole duration of the call.
*/
struct kobject *kset_find_obj(struct kset *kset, const char *name)
{
@@ -750,8 +765,39 @@ struct kobject *kset_find_obj(struct kset *kset, const char *name)
spin_lock(&kset->list_lock);
list_for_each_entry(k, &kset->list, entry) {
- if (kobject_name(k) && !strcmp(kobject_name(k), name)) {
+ if (kobj_compare_name(k, name)) {
ret = kobject_get(k);
+ WARN_ON(atomic_read(&k->kref.refcount) < 2);
+ break;
+ }
+ }
+
+ spin_unlock(&kset->list_lock);
+ return ret;
+}
+
+/**
+ * kset_check_name_exists - returns true iff the object name exists in kset.
+ * @kset: kset we're looking in.
+ * @name: object's name.
+ *
+ * Unlike kset_find_obj(), kset_check_name_exists() can be called even
+ * if the kobject being searched for is not held externally. It is
+ * thread-safe even if somebody calls kobject_put() on the kobject.
+ *
+ * Note that the result is valid only at the moment this function runs;
+ * the kset might be modified right after it returns.
+ */
+bool kset_check_name_exists(struct kset *kset, const char *name)
+{
+ struct kobject *k;
+ bool ret = false;
+
+ spin_lock(&kset->list_lock);
+
+ list_for_each_entry(k, &kset->list, entry) {
+ if (kobj_compare_name(k, name)) {
+ ret = true;
break;
}
}
--
1.8.1.3
On Fri, Apr 12, 2013 at 3:32 PM, Anatol Pomozov
<[email protected]> wrote:
>
> Here is timeline for the crash in case if kset_find_obj() searches for
> an object that nobody holds and another thread is doing kobject_put()
> on the same kobject:
>
> THREAD A (calls kset_find_obj()) THREAD B (calls kobject_put())
> spin_lock()
> atomic_dec_return(kobj->kref), counter gets zero here
> ... starts kobject cleanup ....
> spin_lock() // WAIT thread A in kobj_kset_leave()
> iterate over kset->list
> atomic_inc(kobj->kref) (counter becomes 1)
> spin_unlock()
> spin_lock() // taken
> // it does not know that thread A increased counter so it
> remove obj from list
> spin_unlock()
> vfree(module) // frees module object with containing kobj
>
> // kobj points to freed memory area!!
> kobject_put(kobj) // OOPS!!!!
This is a much more generic bug in kobjects, and I would hate to add
some random workaround for just one case of this bug like you do. The
more fundamental bug needs to be fixed too.
I think the more fundamental bugfix is to just fix kobject_get() to
return NULL if the refcount was zero, because in that case the kobject
no longer really exists.
So instead of having
kref_get(&kobj->kref);
it should do
if (!atomic_inc_not_zero(&kobj->kref.refcount))
kobj = NULL;
and I think that should fix your race automatically, no? Proper patch
attached (but TOTALLY UNTESTED - it seems to compile, though).
The problem is that we lose the warning for when the refcount is zero
and somebody does a kobject_get(), but that is ok *assuming* that
people actually check the return value of kobject_get() rather than
just "know" that if they passed in a non-NULL kobj, they'll get it
right back.
Greg - please take a look... I'm adding Al to the discussion too,
because Al just *loooves* these kinds of races ;)
Linus
On Fri, Apr 12, 2013 at 04:47:50PM -0700, Linus Torvalds wrote:
> On Fri, Apr 12, 2013 at 3:32 PM, Anatol Pomozov
> <[email protected]> wrote:
> >
> > Here is timeline for the crash in case if kset_find_obj() searches for
> > an object that nobody holds and another thread is doing kobject_put()
> > on the same kobject:
> >
> > THREAD A (calls kset_find_obj()) THREAD B (calls kobject_put())
> > spin_lock()
> > atomic_dec_return(kobj->kref), counter gets zero here
> > ... starts kobject cleanup ....
> > spin_lock() // WAIT thread A in kobj_kset_leave()
> > iterate over kset->list
> > atomic_inc(kobj->kref) (counter becomes 1)
> > spin_unlock()
> > spin_lock() // taken
> > // it does not know that thread A increased counter so it
> > remove obj from list
> > spin_unlock()
> > vfree(module) // frees module object with containing kobj
> >
> > // kobj points to freed memory area!!
> > kobject_put(kobj) // OOPS!!!!
>
> This is a much more generic bug in kobjects, and I would hate to add
> some random workaround for just one case of this bug like you do. The
> more fundamental bug needs to be fixed too.
>
> I think the more fundamental bugfix is to just fix kobject_get() to
> return NULL if the refcount was zero, because in that case the kobject
> no longer really exists.
>
> So instead of having
>
> kref_get(&kobj->kref);
>
> it should do
>
> if (!atomic_inc_not_zero(&kobj->kref.refcount))
> kobj = NULL;
>
> and I think that should fix your race automatically, no? Proper patch
> attached (but TOTALLY UNTESTED - it seems to compile, though).
>
> The problem is that we lose the warning for when the refcount is zero
> and somebody does a kobject_get(), but that is ok *assuming* that
> people actually check the return value of kobject_get() rather than
> just "know" that if they passed in a non-NULL kobj, they'll get it
> right back.
>
> Greg - please take a look... I'm adding Al to the discussion too,
> because Al just *loooves* these kinds of races ;)
We "should" have some type of "higher-up" lock to prevent the
release/get races from happening, we have that in the driver core, and I
thought we had such a lock already in the module subsystem as well,
which will prevent any of this from being needed.
Rusty, don't we have a lock for this somewhere?
Linus, I think your patch will reduce the window the race could happen,
but it should still be there, although testing with it would be
interesting to see if the original problem can be triggered with it.
I'll look at it some more tomorrow, about to go to dinner now...
thanks,
greg k-h
Hi
On Fri, Apr 12, 2013 at 4:53 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Fri, Apr 12, 2013 at 04:47:50PM -0700, Linus Torvalds wrote:
>> On Fri, Apr 12, 2013 at 3:32 PM, Anatol Pomozov
>> <[email protected]> wrote:
>> >
>> > Here is timeline for the crash in case if kset_find_obj() searches for
>> > an object that nobody holds and another thread is doing kobject_put()
>> > on the same kobject:
>> >
>> > THREAD A (calls kset_find_obj()) THREAD B (calls kobject_put())
>> > spin_lock()
>> > atomic_dec_return(kobj->kref), counter gets zero here
>> > ... starts kobject cleanup ....
>> > spin_lock() // WAIT thread A in kobj_kset_leave()
>> > iterate over kset->list
>> > atomic_inc(kobj->kref) (counter becomes 1)
>> > spin_unlock()
>> > spin_lock() // taken
>> > // it does not know that thread A increased counter so it
>> > remove obj from list
>> > spin_unlock()
>> > vfree(module) // frees module object with containing kobj
>> >
>> > // kobj points to freed memory area!!
>> > kobject_put(kobj) // OOPS!!!!
>>
>> This is a much more generic bug in kobjects, and I would hate to add
>> some random workaround for just one case of this bug like you do. The
>> more fundamental bug needs to be fixed too.
>>
>> I think the more fundamental bugfix is to just fix kobject_get() to
>> return NULL if the refcount was zero, because in that case the kobject
>> no longer really exists.
>>
>> So instead of having
>>
>> kref_get(&kobj->kref);
>>
>> it should do
>>
>> if (!atomic_inc_not_zero(&kobj->kref.refcount))
>> kobj = NULL;
>>
>> and I think that should fix your race automatically, no? Proper patch
>> attached (but TOTALLY UNTESTED - it seems to compile, though).
>>
>> The problem is that we lose the warning for when the refcount is zero
>> and somebody does a kobject_get(), but that is ok *assuming* that
>> people actually check the return value of kobject_get() rather than
>> just "know" that if they passed in a non-NULL kobj, they'll get it
>> right back.
>>
>> Greg - please take a look... I'm adding Al to the discussion too,
>> because Al just *loooves* these kinds of races ;)
>
> We "should" have some type of "higher-up" lock to prevent the
> release/get races from happening, we have that in the driver core, and I
> thought we had such a lock already in the module subsystem as well,
> which will prevent any of this from being needed.
>
> Rusty, don't we have a lock for this somewhere?
>
> Linus, I think your patch will reduce the window the race could happen,
> but it should still be there, although testing with it would be
> interesting to see if the original problem can be triggered with it.
Linus' patch should fix the module race condition. vfree(module) cannot
be called while we hold kobj->kset->list_lock. vfree() is called in
THREAD B only after it acquires the lock and removes kobj from the list.
So if kobj is found by THREAD A in kset->list and we have not released
the lock, then the memory is not freed.
>
> I'll look at it some more tomorrow, about to go to dinner now...
>
> thanks,
>
> greg k-h
On Fri, Apr 12, 2013 at 4:53 PM, Greg Kroah-Hartman
<[email protected]> wrote:
>
> Linus, I think your patch will reduce the window the race could happen,
> but it should still be there, although testing with it would be
> interesting to see if the original problem can be triggered with it.
Well, with my patch, there's no way you'll ever look up an object with
a zero refcount, so you'll never release it twice. The atomic
operations (atomic_inc_not_zero()) do guarantee that.
The "kset->list_lock" means that the list traversal is safe too.
So one particular race is definitely gone.
Now, what people who call "kset_find_obj()" really expect when not
locked against the last kobject_put(), I don't know. But at least it's
conceptually safe now. They'll either get NULL (either because the
object doesn't exist on the list, or because it does exist but is
about to be removed), or they will get a valid object that has *not*
started to be torn down yet.
Linus
I ran the test case for ~30 minutes with no crash. Before the patch it
took ~10 seconds for me to reproduce the crash.
The only harmless warning I see is:
[ 1553.658421] ------------[ cut here ]------------
[ 1553.663211] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xbb/0xe0()
[ 1553.669571] Hardware name: MCP55
[ 1553.672834] sysfs: cannot create duplicate filename '/module/loop'
[ 1553.679035] Modules linked in: loop(+) sata_mv acpi_cpufreq mperf
freq_table processor msr cpuid genrtc bnx2x libcrc32c mdio ipv6 [last
unloaded: loop]
[ 1553.692983] Pid: 25935, comm: modprobe Tainted: G W
3.9.0-dbg-DEV #1
[ 1553.700221] Call Trace:
[ 1553.702699] [<ffffffff81087aaf>] warn_slowpath_common+0x7f/0xc0
[ 1553.708724] [<ffffffff81087ba6>] warn_slowpath_fmt+0x46/0x50
[ 1553.714495] [<ffffffff813167e0>] ? strlcat+0x60/0x80
[ 1553.719567] [<ffffffff8121981b>] sysfs_add_one+0xbb/0xe0
[ 1553.724988] [<ffffffff812199cf>] create_dir+0x7f/0xd0
[ 1553.730150] [<ffffffff81219d59>] sysfs_create_dir+0x89/0xe0
[ 1553.735831] [<ffffffff813104ad>] kobject_add_internal+0x9d/0x2a0
[ 1553.741945] [<ffffffff81310ca3>] kobject_init_and_add+0x63/0x90
[ 1553.747973] [<ffffffff81311194>] ? kset_find_obj+0x64/0x90
[ 1553.753571] [<ffffffff810e65fa>] load_module+0xc2a/0x15b0
[ 1553.759077] [<ffffffff8132ba70>] ? ddebug_proc_write+0x110/0x110
[ 1553.765191] [<ffffffff8131b43e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1553.771652] [<ffffffff810e7077>] sys_init_module+0xf7/0x140
[ 1553.777335] [<ffffffff815e6515>] cstar_dispatch+0x7/0x1f
[ 1553.782753] ---[ end trace a5e2ab42bb81f1b3 ]---
[ 1553.787405] ------------[ cut here ]------------
[ 1553.792041] WARNING: at lib/kobject.c:196 kobject_add_internal+0x234/0x2a0()
[ 1553.799087] Hardware name: MCP55
[ 1553.802316] kobject_add_internal failed for loop with -EEXIST,
don't try to register things with the same name in the same directory.
[ 1553.814308] Modules linked in: loop(+) sata_mv acpi_cpufreq mperf
freq_table processor msr cpuid genrtc bnx2x libcrc32c mdio ipv6 [last
unloaded: loop]
[ 1553.828122] Pid: 25935, comm: modprobe Tainted: G W
3.9.0-dbg-DEV #1
[ 1553.835342] Call Trace:
[ 1553.837799] [<ffffffff81087aaf>] warn_slowpath_common+0x7f/0xc0
[ 1553.843801] [<ffffffff81087ba6>] warn_slowpath_fmt+0x46/0x50
[ 1553.849548] [<ffffffff81310644>] kobject_add_internal+0x234/0x2a0
[ 1553.855731] [<ffffffff81310ca3>] kobject_init_and_add+0x63/0x90
[ 1553.861750] [<ffffffff81311194>] ? kset_find_obj+0x64/0x90
[ 1553.867329] [<ffffffff810e65fa>] load_module+0xc2a/0x15b0
[ 1553.872821] [<ffffffff8132ba70>] ? ddebug_proc_write+0x110/0x110
[ 1553.878938] [<ffffffff8131b43e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1553.885382] [<ffffffff810e7077>] sys_init_module+0xf7/0x140
[ 1553.891037] [<ffffffff815e6515>] cstar_dispatch+0x7/0x1f
That happens because the kobject is on the kobject_put() path with its
refcount == 0, but sysfs is not cleaned up yet. kset_find_obj() returned
NULL (as per Linus' patch), so kobject_init_and_add() tries to add a
sysfs file with an existing name and fails.
On Fri, Apr 12, 2013 at 5:11 PM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Apr 12, 2013 at 4:53 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
>>
>> Linus, I think your patch will reduce the window the race could happen,
>> but it should still be there, although testing with it would be
>> interesting to see if the original problem can be triggered with it.
>
> Well, with my patch, there's no way you'll ever look up an object with
> a zero refcount, so you'll never release it twice. The atomic
> operations (atomic_inc_not_zero()) do guarantee that.
>
> The "kset->list_lock" means that the list traversal is safe too.
>
> So one particular race is definitely gone.
>
> Now, what people who call "kset_find_obj()" really expect when not
> locked against the last kobject_put(), I don't know. But at least it's
> conceptually safe now. They'll either get NULL (either because the
> object doesn't exist on the list, or because it does exist but is
> about to be removed), or they will get a valid object that has *not*
> started to be torn down yet.
>
> Linus
Hi
On Fri, Apr 12, 2013 at 4:47 PM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Apr 12, 2013 at 3:32 PM, Anatol Pomozov
> <[email protected]> wrote:
>>
>> Here is timeline for the crash in case if kset_find_obj() searches for
>> an object that nobody holds and another thread is doing kobject_put()
>> on the same kobject:
>>
>> THREAD A (calls kset_find_obj()) THREAD B (calls kobject_put())
>> spin_lock()
>> atomic_dec_return(kobj->kref), counter gets zero here
>> ... starts kobject cleanup ....
>> spin_lock() // WAIT thread A in kobj_kset_leave()
>> iterate over kset->list
>> atomic_inc(kobj->kref) (counter becomes 1)
>> spin_unlock()
>> spin_lock() // taken
>> // it does not know that thread A increased counter so it
>> remove obj from list
>> spin_unlock()
>> vfree(module) // frees module object with containing kobj
>>
>> // kobj points to freed memory area!!
>> kobject_put(kobj) // OOPS!!!!
>
> This is a much more generic bug in kobjects, and I would hate to add
> some random workaround for just one case of this bug like you do. The
> more fundamental bug needs to be fixed too.
>
> I think the more fundamental bugfix is to just fix kobject_get() to
> return NULL if the refcount was zero, because in that case the kobject
> no longer really exists.
>
> So instead of having
>
> kref_get(&kobj->kref);
>
> it should do
>
> if (!atomic_inc_not_zero(&kobj->kref.refcount))
> kobj = NULL;
Does it make sense to move it to a separate function in kref.h?

/** Useful when kref_get() races with kref_put() and the refcount might be 0 */
int kref_get_not_zero(struct kref *kref)
{
	return atomic_inc_not_zero(&kref->refcount);
}

or maybe instead change the default behavior of kref_get() to
atomic_inc_not_zero() and force callers to check the return value of
kref_get()?
>
> and I think that should fix your race automatically, no? Proper patch
> attached (but TOTALLY UNTESTED - it seems to compile, though).
>
> The problem is that we lose the warning for when the refcount is zero
> and somebody does a kobject_get(), but that is ok *assuming* that
> people actually check the return value of kobject_get() rather than
> just "know" that if they passed in a non-NULL kobj, they'll get it
> right back.
>
> Greg - please take a look... I'm adding Al to the discussion too,
> because Al just *loooves* these kinds of races ;)
>
> Linus
On Sat, Apr 13, 2013 at 8:41 AM, Anatol Pomozov
<[email protected]> wrote:
>
> Does it make sense to move it to a separate function in kref.h?
>
> /** Useful when kref_get is racing with kref_put and refcounter might be 0 */
> int kref_get_not_zero(struct kref *kref) {
> return atomic_inc_not_zero(&kref->refcount);
> }
It turns out we have that, except it's called "unless_zero", because
it uses "atomic_add_unless(x,1,0)", rather than the simplified
"atomic_inc_not_zero(x)".
> or maybe instead change default behavior of kref_get() to
> atomic_inc_not_zero and force callers check the return value from
> kref_get()?
That would be painful, and _most_ users should have a preexisting
refcount. So it's probably better in the long run to just keep the
warning (but perhaps fix it to be SMP-safe). So I think the part of
your patch that made kref_get() use atomic_inc_return() is probably a
good idea regardless.
Also, I changed my patch to be minimal, and not change other users of
kobject_get(). So other users (not kset_find_obj()) will continue to
get the warning, and kset_find_obj() uses the safe version. So this is
what I'm planning on committing as the minimal patch and marking for
stable. The rest (including that atomic_inc_return() in kref_get)
would be cleanup.
Can you give this a quick test?
Linus
Hi
On Sat, Apr 13, 2013 at 10:53 AM, Linus Torvalds
<[email protected]> wrote:
> On Sat, Apr 13, 2013 at 8:41 AM, Anatol Pomozov
> <[email protected]> wrote:
>>
>> Does it make sense to move it to a separate function in kref.h?
>>
>> /** Useful when kref_get is racing with kref_put and refcounter might be 0 */
>> int kref_get_not_zero(struct kref *kref) {
>> return atomic_inc_not_zero(&kref->refcount);
>> }
>
> It turns out we have that, except it's called "unless_zero", because
> it uses "atomic_add_unless(x,1,0)", rather than the simplified
> "atomic_inc_not_zero(x)".
>
>> or maybe instead change default behavior of kref_get() to
>> atomic_inc_not_zero and force callers check the return value from
>> kref_get()?
>
> That would be painful, and _most_ users should have a preexisting
> refcount. So it's probably better in the long run to just keep the
> warning (but perhaps fix it to be SMP-safe). So I think the part of
> your patch that made kref_get() use atomic_inc_return() is probably a
> good idea regardless.
>
> Also, I changed my patch to be minimal, and not change other users of
> kobject_get(). So other users (not kset_find_obj()) will continue to
> get the warning, and kset_find_obj() uses the safe version.
Looks good to me.
> So this is
> what I'm planning on committing as the minimal patch and marking for
> stable. The rest (including that atomic_inc_return() in kref_get)
> would be cleanup.
>
> Can you give this a quick test?
I ran the test case for ~60 minutes with XFS tests in parallel - no issues.
On Fri, Apr 12, 2013 at 04:47:50PM -0700, Linus Torvalds wrote:
> This is a much more generic bug in kobjects, and I would hate to add
> some random workaround for just one case of this bug like you do. The
> more fundamental bug needs to be fixed too.
>
> I think the more fundamental bugfix is to just fix kobject_get() to
> return NULL if the refcount was zero, because in that case the kobject
> no longer really exists.
>
> So instead of having
>
> kref_get(&kobj->kref);
>
> it should do
>
> if (!atomic_inc_not_zero(&kobj->kref.refcount))
> kobj = NULL;
>
> and I think that should fix your race automatically, no? Proper patch
> attached (but TOTALLY UNTESTED - it seems to compile, though).
>
> The problem is that we lose the warning for when the refcount is zero
> and somebody does a kobject_get(), but that is ok *assuming* that
> people actually check the return value of kobject_get() rather than
> just "know" that if they passed in a non-NULL kobj, they'll get it
> right back.
>
> Greg - please take a look... I'm adding Al to the discussion too,
> because Al just *loooves* these kinds of races ;)
Unless I'm misreading what's going on, we have the following to thank for that:
	/* remove from sysfs if the caller did not do it */
	if (kobj->state_in_sysfs) {
		pr_debug("kobject: '%s' (%p): auto cleanup kobject_del\n",
			 kobject_name(kobj), kobj);
		kobject_del(kobj);
	}
in kobject_cleanup(). Why don't we require kobject_del() before the final
kobject_put(), if the sucker had been added? FWIW, I thought it *was*
required all along...
Hi
On Sat, Apr 13, 2013 at 8:35 PM, Al Viro <[email protected]> wrote:
> On Fri, Apr 12, 2013 at 04:47:50PM -0700, Linus Torvalds wrote:
>> This is a much more generic bug in kobjects, and I would hate to add
>> some random workaround for just one case of this bug like you do. The
>> more fundamental bug needs to be fixed too.
>>
>> I think the more fundamental bugfix is to just fix kobject_get() to
>> return NULL if the refcount was zero, because in that case the kobject
>> no longer really exists.
>>
>> So instead of having
>>
>> kref_get(&kobj->kref);
>>
>> it should do
>>
>> if (!atomic_inc_not_zero(&kobj->kref.refcount))
>> kobj = NULL;
>>
>> and I think that should fix your race automatically, no? Proper patch
>> attached (but TOTALLY UNTESTED - it seems to compile, though).
>>
>> The problem is that we lose the warning for when the refcount is zero
>> and somebody does a kobject_get(), but that is ok *assuming* that
>> people actually check the return value of kobject_get() rather than
>> just "know" that if they passed in a non-NULL kobj, they'll get it
>> right back.
>>
>> Greg - please take a look... I'm adding Al to the discussion too,
>> because Al just *loooves* these kinds of races ;)
>
> Unless I'm misreading what's going on, we have the following to thank for that:
> /* remove from sysfs if the caller did not do it */
> if (kobj->state_in_sysfs) {
> pr_debug("kobject: '%s' (%p): auto cleanup kobject_del\n",
> kobject_name(kobj), kobj);
> kobject_del(kobj);
> }
> in kobject_cleanup(). Why don't we require kobject_del() before the final
> kobject_put(), if the sucker had been added? FWIW, I thought it *was*
> required all along...
But the kobject_release()/kobject_cleanup() function is called as a
result of an atomic decrement-and-test. Until we perform the atomic
operation we don't know whether it is the final kobject_put() or not:

kobject_put() {
	if (atomic_dec_and_test(&kobj->kref.refcount)) {
		// refcount dropped to 0, so clean up sysfs
		kobject_release(kobj);
	}
}
On Sat, Apr 13, 2013 at 09:42:06PM -0700, Anatol Pomozov wrote:
> > in kobject_cleanup(). Why don't we require kobject_del() before the final
> > kobject_put(), if the sucker had been added? FWIW, I thought it *was*
> > required all along...
>
> But the kobject_release()/kobject_cleanup() function is called as a
> result of an atomic decrement-and-test. Until we perform the atomic
> operation we don't know whether it is the final kobject_put() or not:
>
> kobject_put() {
> 	if (atomic_dec_and_test(&kobj->kref.refcount)) {
> 		// refcount dropped to 0, so clean up sysfs
> 		kobject_release(kobj);
> 	}
> }
Yes, of course, but WTF do we play with kobject_del() on that path at all?
Let the caller do it when it decides that object shouldn't be possible to
see anymore. Which is not the same thing as "the last reference is gone"...
Sigh... kobject model sucks, film at 11... ;-/