2021-09-10 02:46:07

by Waiman Long

[permalink] [raw]
Subject: [PATCH 1/2] cgroup: Fix incorrect warning from cgroup_apply_control_disable()

It was found that the following warning was displayed when remounting
controllers from cgroup v2 to v1:

[ 8042.997778] WARNING: CPU: 88 PID: 80682 at kernel/cgroup/cgroup.c:3130 cgroup_apply_control_disable+0x158/0x190
:
[ 8043.091109] RIP: 0010:cgroup_apply_control_disable+0x158/0x190
[ 8043.096946] Code: ff f6 45 54 01 74 39 48 8d 7d 10 48 c7 c6 e0 46 5a a4 e8 7b 67 33 00 e9 41 ff ff ff 49 8b 84 24 e8 01 00 00 0f b7 40 08 eb 95 <0f> 0b e9 5f ff ff ff 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3
[ 8043.115692] RSP: 0018:ffffba8a47c23d28 EFLAGS: 00010202
[ 8043.120916] RAX: 0000000000000036 RBX: ffffffffa624ce40 RCX: 000000000000181a
[ 8043.128047] RDX: ffffffffa63c43e0 RSI: ffffffffa63c43e0 RDI: ffff9d7284ee1000
[ 8043.135180] RBP: ffff9d72874c5800 R08: ffffffffa624b090 R09: 0000000000000004
[ 8043.142314] R10: ffffffffa624b080 R11: 0000000000002000 R12: ffff9d7284ee1000
[ 8043.149447] R13: ffff9d7284ee1000 R14: ffffffffa624ce70 R15: ffffffffa6269e20
[ 8043.156576] FS: 00007f7747cff740(0000) GS:ffff9d7a5fc00000(0000) knlGS:0000000000000000
[ 8043.164663] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8043.170409] CR2: 00007f7747e96680 CR3: 0000000887d60001 CR4: 00000000007706e0
[ 8043.177539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8043.184673] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8043.191804] PKRU: 55555554
[ 8043.194517] Call Trace:
[ 8043.196970] rebind_subsystems+0x18c/0x470
[ 8043.201070] cgroup_setup_root+0x16c/0x2f0
[ 8043.205177] cgroup1_root_to_use+0x204/0x2a0
[ 8043.209456] cgroup1_get_tree+0x3e/0x120
[ 8043.213384] vfs_get_tree+0x22/0xb0
[ 8043.216883] do_new_mount+0x176/0x2d0
[ 8043.220550] __x64_sys_mount+0x103/0x140
[ 8043.224474] do_syscall_64+0x38/0x90
[ 8043.228063] entry_SYSCALL_64_after_hwframe+0x44/0xae

It was caused by the "WARN_ON_ONCE(percpu_ref_is_dying(&css->refcnt))"
check. It was because the css had not been completely removed from
a previous call to css_kill() by cgroup_apply_control_disable() and
so the warning was triggered when cgroup_apply_control_disable() was
called again soon afterward.

Eliminating this incorrect warning by using percpu_ref_is_dying() as
escape for further action instead.

Fixes: 945ba19968888 ("cgroup: combine cgroup_mutex locking and offline css draining")
Signed-off-by: Waiman Long <[email protected]>
---
kernel/cgroup/cgroup.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 881ce1470beb..e31bca9fcd46 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3140,7 +3140,16 @@ static void cgroup_apply_control_disable(struct cgroup *cgrp)
if (!css)
continue;

- WARN_ON_ONCE(percpu_ref_is_dying(&css->refcnt));
+ /*
+ * A kill_css() might have been called previously, but
+ * the css may still linger for a while before being
+ * removed. Skip it in this case.
+ */
+ if (percpu_ref_is_dying(&css->refcnt)) {
+ WARN_ON_ONCE(css->parent &&
+ cgroup_ss_mask(dsct) & (1 << ss->id));
+ continue;
+ }

if (css->parent &&
!(cgroup_ss_mask(dsct) & (1 << ss->id))) {
--
2.18.1


2021-09-14 00:49:07

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix incorrect warning from cgroup_apply_control_disable()

Hello,

On Thu, Sep 09, 2021 at 10:42:55PM -0400, Waiman Long wrote:
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 881ce1470beb..e31bca9fcd46 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -3140,7 +3140,16 @@ static void cgroup_apply_control_disable(struct cgroup *cgrp)
> if (!css)
> continue;
>
> - WARN_ON_ONCE(percpu_ref_is_dying(&css->refcnt));
> + /*
> + * A kill_css() might have been called previously, but
> + * the css may still linger for a while before being
> + * removed. Skip it in this case.
> + */
> + if (percpu_ref_is_dying(&css->refcnt)) {
> + WARN_ON_ONCE(css->parent &&
> + cgroup_ss_mask(dsct) & (1 << ss->id));
> + continue;
> + }

This warning did help me catch some gnarly bugs. Any chance we can keep it
for normal cases and elide it just for remounting?

Thanks.

--
tejun

2021-09-14 00:54:24

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix incorrect warning from cgroup_apply_control_disable()

On 9/13/21 2:45 PM, Tejun Heo wrote:
> On Mon, Sep 13, 2021 at 02:43:44PM -0400, Waiman Long wrote:
>>> The problem with percpu_ref_is_dying() is the fact that it becomes true
>>> after percpu_ref_exit() is called in css_free_rwork_fn() which has an
>>> RCU delay. If you want to catch the fact that kill_css() has been
>>> called, we can check the CSS_DYING flag which is set in kill_css() by
>>> commit 33c35aa481786 ("cgroup: Prevent kill_css() from being called more
>>> than once"). Will that be an acceptable alternative?
>> Something like
>>
>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>> index 881ce1470beb..851e54800ad8 100644
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -3140,6 +3140,9 @@ static void cgroup_apply_control_disable(struct cgroup
>> *cg
>>                         if (!css)
>>                                 continue;
>>
>> +                       if (css->flags & CSS_DYING)
>> +                               continue;
>> +
> So, I don't think this would be correct. It is assumed that there are no
> dying csses when control reaches this point. The right fix is making sure
> that remount path clears up dying csses before calling into this path.

Thanks for the suggestion. I will look into that.

Cheers,
Longman

2021-09-14 00:54:40

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix incorrect warning from cgroup_apply_control_disable()

On 9/13/21 2:35 PM, Waiman Long wrote:
> On 9/13/21 2:05 PM, Tejun Heo wrote:
>> Hello,
>>
>> On Thu, Sep 09, 2021 at 10:42:55PM -0400, Waiman Long wrote:
>>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>>> index 881ce1470beb..e31bca9fcd46 100644
>>> --- a/kernel/cgroup/cgroup.c
>>> +++ b/kernel/cgroup/cgroup.c
>>> @@ -3140,7 +3140,16 @@ static void
>>> cgroup_apply_control_disable(struct cgroup *cgrp)
>>>               if (!css)
>>>                   continue;
>>>   - WARN_ON_ONCE(percpu_ref_is_dying(&css->refcnt));
>>> +            /*
>>> +             * A kill_css() might have been called previously, but
>>> +             * the css may still linger for a while before being
>>> +             * removed. Skip it in this case.
>>> +             */
>>> +            if (percpu_ref_is_dying(&css->refcnt)) {
>>> +                WARN_ON_ONCE(css->parent &&
>>> +                    cgroup_ss_mask(dsct) & (1 << ss->id));
>>> +                continue;
>>> +            }
>> This warning did help me catch some gnarly bugs. Any chance we can
>> keep it
>> for normal cases and elide it just for remounting?
>
> The problem with percpu_ref_is_dying() is the fact that it becomes
> true after percpu_ref_exit() is called in css_free_rwork_fn() which
> has an RCU delay. If you want to catch the fact that kill_css() has
> been called, we can check the CSS_DYING flag which is set in
> kill_css() by commit 33c35aa481786 ("cgroup: Prevent kill_css() from
> being called more than once"). Will that be an acceptable alternative?

Something like

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 881ce1470beb..851e54800ad8 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3140,6 +3140,9 @@ static void cgroup_apply_control_disable(struct
cgroup *cg
                        if (!css)
                                continue;

+                       if (css->flags & CSS_DYING)
+                               continue;
+
WARN_ON_ONCE(percpu_ref_is_dying(&css->refcnt));

                        if (css->parent &&

Cheers,
Longman

2021-09-14 00:55:04

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix incorrect warning from cgroup_apply_control_disable()

On Mon, Sep 13, 2021 at 02:43:44PM -0400, Waiman Long wrote:
> > The problem with percpu_ref_is_dying() is the fact that it becomes true
> > after percpu_ref_exit() is called in css_free_rwork_fn() which has an
> > RCU delay. If you want to catch the fact that kill_css() has been
> > called, we can check the CSS_DYING flag which is set in kill_css() by
> > commit 33c35aa481786 ("cgroup: Prevent kill_css() from being called more
> > than once"). Will that be an acceptable alternative?
>
> Something like
>
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 881ce1470beb..851e54800ad8 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -3140,6 +3140,9 @@ static void cgroup_apply_control_disable(struct cgroup
> *cg
> ??????????????????????? if (!css)
> ??????????????????????????????? continue;
>
> +?????????????????????? if (css->flags & CSS_DYING)
> +?????????????????????????????? continue;
> +

So, I don't think this would be correct. It is assumed that there are no
dying csses when control reaches this point. The right fix is making sure
that remount path clears up dying csses before calling into this path.

Thanks.

--
tejun

2021-09-14 00:56:00

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix incorrect warning from cgroup_apply_control_disable()

On 9/13/21 2:05 PM, Tejun Heo wrote:
> Hello,
>
> On Thu, Sep 09, 2021 at 10:42:55PM -0400, Waiman Long wrote:
>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>> index 881ce1470beb..e31bca9fcd46 100644
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -3140,7 +3140,16 @@ static void cgroup_apply_control_disable(struct cgroup *cgrp)
>> if (!css)
>> continue;
>>
>> - WARN_ON_ONCE(percpu_ref_is_dying(&css->refcnt));
>> + /*
>> + * A kill_css() might have been called previously, but
>> + * the css may still linger for a while before being
>> + * removed. Skip it in this case.
>> + */
>> + if (percpu_ref_is_dying(&css->refcnt)) {
>> + WARN_ON_ONCE(css->parent &&
>> + cgroup_ss_mask(dsct) & (1 << ss->id));
>> + continue;
>> + }
> This warning did help me catch some gnarly bugs. Any chance we can keep it
> for normal cases and elide it just for remounting?

The problem with percpu_ref_is_dying() is the fact that it becomes true
after percpu_ref_exit() is called in css_free_rwork_fn() which has an
RCU delay. If you want to catch the fact that kill_css() has been
called, we can check the CSS_DYING flag which is set in kill_css() by
commit 33c35aa481786 ("cgroup: Prevent kill_css() from being called more
than once"). Will that be an acceptable alternative?

Cheers,
Longman

2021-09-14 00:56:35

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix incorrect warning from cgroup_apply_control_disable()

Hello,

On Mon, Sep 13, 2021 at 02:35:03PM -0400, Waiman Long wrote:
> The problem with percpu_ref_is_dying() is the fact that it becomes true
> after percpu_ref_exit() is called in css_free_rwork_fn() which has an RCU
> delay. If you want to catch the fact that kill_css() has been called, we can
> check the CSS_DYING flag which is set in kill_css() by commit 33c35aa481786
> ("cgroup: Prevent kill_css() from being called more than once"). Will that
> be an acceptable alternative?

So, it's checking that cgroup_lock_and_drain_offline() which is used by
cgroup_lock_and_drain_offline() actually waited for csses which are in the
process of going offline because some operations aren't safe when racing
against them. I think the right solution here is making sure that [u]mount
path drains the offlining cgroups rather than getting rid of the warning. I
think the warning is pointing out an actual problem.

Thanks.

--
tejun