We hit a warning on a rather old kernel: RHEL7-based 3.10.0-xxx.
Please don't assume it is really close to 3.10: both Red Hat and we (Virtuozzo)
backport a lot of things from modern mainline kernels and make our own changes.
------------[ cut here ]------------
WARNING: CPU: 2 PID: 63923 at fs/kernfs/dir.c:377 kernfs_get+0x2f/0x40
CPU: 2 PID: 63923 Comm: kworker/2:7 ve: 0 Kdump: loaded Not tainted
3.10.0-957.10.1.vz7.85.12 #1 85.12
...
Call Trace:
[<ffffffffa7f92e67>] dump_stack+0x19/0x1b
[<ffffffffa78987b8>] __warn+0xd8/0x100
[<ffffffffa78988fd>] warn_slowpath_null+0x1d/0x20
[<ffffffffa7aecaff>] kernfs_get+0x2f/0x40
[<ffffffffa7aed233>] __kernfs_remove+0x113/0x260
[<ffffffffa7aee201>] kernfs_remove+0x21/0x30
[<ffffffffa7af1010>] sysfs_remove_dir+0x50/0x80
[<ffffffffa7b9fb38>] kobject_del+0x18/0x50
[<ffffffffa7a38a4d>] sysfs_slab_remove+0x3d/0x50
[<ffffffffa79f1e6b>] do_kmem_cache_release+0x3b/0x70
[<ffffffffa79f2aa1>] memcg_destroy_kmem_caches+0xb1/0xf0
[<ffffffffa7a4ed5c>] mem_cgroup_css_free+0x4c/0x280
[<ffffffffa79377fc>] cgroup_free_fn+0x4c/0x120
[<ffffffffa78bc222>] process_one_work+0x182/0x440
[<ffffffffa78bd3d6>] worker_thread+0x126/0x3c0
[<ffffffffa78c4441>] kthread+0xd1/0xe0
The warning has been triggered only once and so far I'm unable to reproduce it.
I'm not completely sure why __kernfs_remove() believes "pos" should always
have kn->count > 0 just because it holds kernfs_mutex: kernfs_notify_workfn()
can definitely do a kernfs_put() outside of kernfs_mutex.
So I suppose kernfs_put() should be moved under kernfs_mutex in
kernfs_notify_workfn().
Konstantin Khorenko (1):
kernfs: keep kernfs node alive for __kernfs_remove()
fs/kernfs/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
2.15.1
__kernfs_remove(), which is called under kernfs_mutex,
assumes nobody kills the kernfs node while it is working on it
and "get"s the current kernfs node for that.
But we hit a warning in kernfs_get(): kn->count == 0 already:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 63923 at fs/kernfs/dir.c:377 kernfs_get+0x2f/0x40
...
Call Trace:
[<ffffffffa7f92e67>] dump_stack+0x19/0x1b
[<ffffffffa78987b8>] __warn+0xd8/0x100
[<ffffffffa78988fd>] warn_slowpath_null+0x1d/0x20
[<ffffffffa7aecaff>] kernfs_get+0x2f/0x40
[<ffffffffa7aed233>] __kernfs_remove+0x113/0x260
[<ffffffffa7aee201>] kernfs_remove+0x21/0x30
[<ffffffffa7af1010>] sysfs_remove_dir+0x50/0x80
[<ffffffffa7b9fb38>] kobject_del+0x18/0x50
[<ffffffffa7a38a4d>] sysfs_slab_remove+0x3d/0x50
[<ffffffffa79f1e6b>] do_kmem_cache_release+0x3b/0x70
[<ffffffffa79f2aa1>] memcg_destroy_kmem_caches+0xb1/0xf0
[<ffffffffa7a4ed5c>] mem_cgroup_css_free+0x4c/0x280
[<ffffffffa79377fc>] cgroup_free_fn+0x4c/0x120
[<ffffffffa78bc222>] process_one_work+0x182/0x440
[<ffffffffa78bd3d6>] worker_thread+0x126/0x3c0
[<ffffffffa78c4441>] kthread+0xd1/0xe0
This could happen, for example, because kernfs_notify_workfn()
does kernfs_put(kn) outside of the section protected by kernfs_mutex,
so move kernfs_put(kn) under the mutex.
Signed-off-by: Konstantin Khorenko <[email protected]>
---
fs/kernfs/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index ae948aaa4c53..ab9c7e2064cc 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -915,8 +915,8 @@ static void kernfs_notify_workfn(struct work_struct *work)
 		iput(inode);
 	}
-	mutex_unlock(&kernfs_mutex);
 	kernfs_put(kn);
+	mutex_unlock(&kernfs_mutex);
 	goto repeat;
 }
--
2.15.1
On Tue, Apr 16, 2019 at 06:53:35PM +0300, Konstantin Khorenko wrote:
> __kernfs_remove() which is called under kernfs_mutex,
> assumes nobody kills kernfs node while it's working on it
> and "get"s current kernfs node for that.
>
> But we hit a warning in kernfs_get(): kn->count == 0 already:
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 63923 at fs/kernfs/dir.c:377 kernfs_get+0x2f/0x40
> ...
> Call Trace:
> [<ffffffffa7f92e67>] dump_stack+0x19/0x1b
> [<ffffffffa78987b8>] __warn+0xd8/0x100
> [<ffffffffa78988fd>] warn_slowpath_null+0x1d/0x20
> [<ffffffffa7aecaff>] kernfs_get+0x2f/0x40
> [<ffffffffa7aed233>] __kernfs_remove+0x113/0x260
> [<ffffffffa7aee201>] kernfs_remove+0x21/0x30
> [<ffffffffa7af1010>] sysfs_remove_dir+0x50/0x80
> [<ffffffffa7b9fb38>] kobject_del+0x18/0x50
> [<ffffffffa7a38a4d>] sysfs_slab_remove+0x3d/0x50
> [<ffffffffa79f1e6b>] do_kmem_cache_release+0x3b/0x70
> [<ffffffffa79f2aa1>] memcg_destroy_kmem_caches+0xb1/0xf0
> [<ffffffffa7a4ed5c>] mem_cgroup_css_free+0x4c/0x280
> [<ffffffffa79377fc>] cgroup_free_fn+0x4c/0x120
> [<ffffffffa78bc222>] process_one_work+0x182/0x440
> [<ffffffffa78bd3d6>] worker_thread+0x126/0x3c0
> [<ffffffffa78c4441>] kthread+0xd1/0xe0
>
> This could be for example because of kernfs_notify_workfn() which
> does kernfs_put(kn) out of kernfs_mutex held section,
> so move kernfs_put(kn) under the mutex.
This patch doesn't really make sense to me. Can you give a more
concrete scenario where this would help?
Thanks.
--
tejun
On 04/16/2019 10:17 PM, Tejun Heo wrote:
> On Tue, Apr 16, 2019 at 06:53:35PM +0300, Konstantin Khorenko wrote:
>> __kernfs_remove() which is called under kernfs_mutex,
>> assumes nobody kills kernfs node while it's working on it
>> and "get"s current kernfs node for that.
>>
>> But we hit a warning in kernfs_get(): kn->count == 0 already:
>> ------------[ cut here ]------------
>> WARNING: CPU: 2 PID: 63923 at fs/kernfs/dir.c:377 kernfs_get+0x2f/0x40
>> ...
>> Call Trace:
>> [<ffffffffa7f92e67>] dump_stack+0x19/0x1b
>> [<ffffffffa78987b8>] __warn+0xd8/0x100
>> [<ffffffffa78988fd>] warn_slowpath_null+0x1d/0x20
>> [<ffffffffa7aecaff>] kernfs_get+0x2f/0x40
>> [<ffffffffa7aed233>] __kernfs_remove+0x113/0x260
>> [<ffffffffa7aee201>] kernfs_remove+0x21/0x30
>> [<ffffffffa7af1010>] sysfs_remove_dir+0x50/0x80
>> [<ffffffffa7b9fb38>] kobject_del+0x18/0x50
>> [<ffffffffa7a38a4d>] sysfs_slab_remove+0x3d/0x50
>> [<ffffffffa79f1e6b>] do_kmem_cache_release+0x3b/0x70
>> [<ffffffffa79f2aa1>] memcg_destroy_kmem_caches+0xb1/0xf0
>> [<ffffffffa7a4ed5c>] mem_cgroup_css_free+0x4c/0x280
>> [<ffffffffa79377fc>] cgroup_free_fn+0x4c/0x120
>> [<ffffffffa78bc222>] process_one_work+0x182/0x440
>> [<ffffffffa78bd3d6>] worker_thread+0x126/0x3c0
>> [<ffffffffa78c4441>] kthread+0xd1/0xe0
>>
>> This could be for example because of kernfs_notify_workfn() which
>> does kernfs_put(kn) out of kernfs_mutex held section,
>> so move kernfs_put(kn) under the mutex.
>
> This patch doesn't really make sense to me. Can you give a more
> concrete scenario where this would help?
I don't know the full scenario unfortunately, but the idea is the following:
__kernfs_remove() is called under kernfs_mutex, and if
!(!kn || (kn->parent && RB_EMPTY_NODE(&kn->rb)))
it assumes that nothing can change while we hold the mutex and that
every kernfs descendant has kn->count > 0.
=====
	/* deactivate and unlink the subtree node-by-node */
	do {
		pos = kernfs_leftmost_descendant(kn);
		/*
		 * kernfs_drain() drops kernfs_mutex temporarily and @pos's
		 * base ref could have been put by someone else by the time
		 * the function returns.  Make sure it doesn't go away
		 * underneath us.
		 */
		kernfs_get(pos);
=====
At the same time, kernfs_notify_workfn() can do a kernfs_put() outside of kernfs_mutex,
which could be the last put and drop kn->count to 0 at any moment.
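To make the suspected ordering concrete, here is a small user-space model of it (not kernel code). The names struct node, node_get(), node_put() and replay_suspected_race() are made up for illustration, and the starting assumption that the notify work holds the only remaining reference is exactly the part that has not been proven. The two steps are replayed sequentially rather than on two threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kernfs_node; only the refcount is modeled. */
struct node {
	atomic_int count;
};

/* Models kernfs_get(): returns false where the kernel would WARN (count is 0). */
static bool node_get(struct node *n)
{
	if (atomic_load(&n->count) == 0)
		return false;
	atomic_fetch_add(&n->count, 1);
	return true;
}

/* Models kernfs_put(): returns true if this call dropped the last reference. */
static bool node_put(struct node *n)
{
	return atomic_fetch_sub(&n->count, 1) == 1;
}

/*
 * Replays the suspected interleaving in order:
 *   1. the notify work drops its reference after unlocking the mutex;
 *   2. the remover, holding the mutex, tries to take a reference on the node.
 * Returns what the remover's get reports.
 */
static bool replay_suspected_race(void)
{
	struct node pos;
	bool last;

	atomic_init(&pos.count, 1);	/* ASSUMPTION: only the notifier's ref is left */
	last = node_put(&pos);		/* kernfs_notify_workfn(): kernfs_put(kn) */
	assert(last);
	return node_get(&pos);		/* __kernfs_remove(): kernfs_get(pos) */
}
```

In this model replay_suspected_race() returns false, which corresponds to the WARN in kernfs_get(). In the real kernel the node would also be freed once the count hits zero, so the model understates the problem; it only shows why a put outside the mutex looks suspicious if nothing else pins the node.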
Thank you.
--
Best regards,
Konstantin Khorenko,
Virtuozzo Linux Kernel Team
Hello,
On Wed, Apr 17, 2019 at 04:12:29PM +0000, Konstantin Khorenko wrote:
> i don't know the full scenario unfortunately, but the idea is the following:
>
> __kernfs_remove() is called under kernfs_mutex and if
> !(!kn || (kn->parent && RB_EMPTY_NODE(&kn->rb)))
>
> it assumes that nothing can change while we hold the mutex and
> for each kernfs descendant should have kn->count > 0.
>
> =====
> /* deactivate and unlink the subtree node-by-node */
> do {
> pos = kernfs_leftmost_descendant(kn);
>
> /*
> * kernfs_drain() drops kernfs_mutex temporarily and @pos's
> * base ref could have been put by someone else by the time
> * the function returns. Make sure it doesn't go away
> * underneath us.
> */
> kernfs_get(pos);
> =====
>
> At the same time kernfs_notify_workfn() can do a kernfs_put() out of kernfs_mutex
> which probably can be the last put and dec kn->count to 0 any moment.
Yeah, but the caller of __kernfs_remove() should be holding the ref,
so I don't see how it'd reach zero. Also, just putting that one
kernfs_put() inside mutex can't possibly be the right solution given
that the function is allowed to be called from any context. I think
we need to understand what's going on better before making changes.
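A sketch of that argument as a user-space model (not kernel code): if the caller of __kernfs_remove() holds its own reference and the notify path only drops a reference it took itself, the count seen by the remover stays above zero. The names ref_node, ref_get(), ref_put() and count_seen_by_remover() are hypothetical, and the balanced get/put in the notify path is an assumption about how the references are paired.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for struct kernfs_node; only the refcount is modeled. */
struct ref_node {
	atomic_int count;
};

static void ref_get(struct ref_node *n)
{
	atomic_fetch_add(&n->count, 1);
}

/* Returns the count remaining after the put. */
static int ref_put(struct ref_node *n)
{
	return atomic_fetch_sub(&n->count, 1) - 1;
}

/*
 * The caller of the removal path holds one reference for the whole
 * operation. The notify path takes its own reference and later drops it.
 * Returns the count the remover would observe afterwards.
 */
static int count_seen_by_remover(void)
{
	struct ref_node kn;

	atomic_init(&kn.count, 1);	/* ref held by the caller of __kernfs_remove() */
	ref_get(&kn);			/* ASSUMPTION: notify path pins kn before queueing work */
	ref_put(&kn);			/* kernfs_notify_workfn(): drops only its own ref */
	return atomic_load(&kn.count);
}
```

Here count_seen_by_remover() returns 1, so a later get would not warn. For the count to reach zero, some path has to drop a reference it does not own, and that is the part that still needs to be tracked down.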
Thanks.
--
tejun