2021-09-08 05:28:04

by brookxu.cn

[permalink] [raw]
Subject: [RFC PATCH 1/3] misc_cgroup: introduce misc.events and misc_events.local

From: Chunguang Xu <[email protected]>

Introduce misc.events and misc.events.local to make it easier for
us to understand the pressure of resources. The main idea comes
from mem_cgroup.

Signed-off-by: Chunguang Xu <[email protected]>
---
include/linux/misc_cgroup.h | 9 +++++++
kernel/cgroup/misc.c | 50 ++++++++++++++++++++++++++++++++++++-
2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index da2367e2ac1e..d29f1743fae9 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -46,6 +46,15 @@ struct misc_res {
*/
struct misc_cg {
struct cgroup_subsys_state css;
+
+ /* misc.events */
+ atomic_long_t events[MISC_CG_RES_TYPES];
+ struct cgroup_file events_file;
+
+ /* misc.events.local */
+ atomic_long_t events_local[MISC_CG_RES_TYPES];
+ struct cgroup_file events_local_file;
+
struct misc_res res[MISC_CG_RES_TYPES];
};

diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index ec02d963cad1..2a3d14be21e5 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -140,7 +140,7 @@ static void misc_cg_cancel_charge(enum misc_res_type type, struct misc_cg *cg,
int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
unsigned long amount)
{
- struct misc_cg *i, *j;
+ struct misc_cg *i, *j, *k;
int ret;
struct misc_res *res;
int new_usage;
@@ -171,6 +171,16 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
return 0;

err_charge:
+ if (cgroup_subsys_on_dfl(misc_cgrp_subsys)) {
+ atomic_long_inc(&i->events_local[type]);
+ cgroup_file_notify(&i->events_local_file);
+
+ for (k = i; k; k = parent_misc(k)) {
+ atomic_long_inc(&k->events[type]);
+ cgroup_file_notify(&k->events_file);
+ }
+ }
+
for (j = cg; j != i; j = parent_misc(j))
misc_cg_cancel_charge(type, j, amount);
misc_cg_cancel_charge(type, i, amount);
@@ -335,6 +345,32 @@ static int misc_cg_capacity_show(struct seq_file *sf, void *v)
return 0;
}

+static int misc_events_show(struct seq_file *sf, void *v)
+{
+ struct misc_cg *cg = css_misc(seq_css(sf));
+ unsigned long count, i;
+
+ for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+ count = atomic_long_read(&cg->events[i]);
+ if (READ_ONCE(misc_res_capacity[i]) || count)
+ seq_printf(sf, "%s %lu\n", misc_res_name[i], count);
+ }
+ return 0;
+}
+
+static int misc_events_local_show(struct seq_file *sf, void *v)
+{
+ struct misc_cg *cg = css_misc(seq_css(sf));
+ unsigned long count, i;
+
+ for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+ count = atomic_long_read(&cg->events_local[i]);
+ if (READ_ONCE(misc_res_capacity[i]) || count)
+ seq_printf(sf, "%s %lu\n", misc_res_name[i], count);
+ }
+ return 0;
+}
+
/* Misc cgroup interface files */
static struct cftype misc_cg_files[] = {
{
@@ -353,6 +389,18 @@ static struct cftype misc_cg_files[] = {
.seq_show = misc_cg_capacity_show,
.flags = CFTYPE_ONLY_ON_ROOT,
},
+ {
+ .name = "events",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .file_offset = offsetof(struct misc_cg, events_file),
+ .seq_show = misc_events_show,
+ },
+ {
+ .name = "events.local",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .file_offset = offsetof(struct misc_cg, events_local_file),
+ .seq_show = misc_events_local_show,
+ },
{}
};

--
2.30.0


2021-09-08 05:28:04

by brookxu.cn

[permalink] [raw]
Subject: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

From: Chunguang Xu <[email protected]>

In scenarios where containers are frequently created and deleted,
a large number of error logs maybe generated. This log provides
less information, we can get more detailed failure records through
misc.events, misc.events.local and misc.failcnt. From this, perhaps
we can remove it.

Signed-off-by: Chunguang Xu <[email protected]>
---
include/linux/misc_cgroup.h | 1 -
kernel/cgroup/misc.c | 7 -------
2 files changed, 8 deletions(-)

diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index 4d630cd3e4bd..838e26149263 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -36,7 +36,6 @@ struct misc_cg;
struct misc_res {
unsigned long max;
atomic_long_t usage;
- bool failed;
};

/**
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index d8f99fd58981..b9c7e488413f 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -157,13 +157,6 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
new_usage = atomic_long_add_return(amount, &res->usage);
if (new_usage > READ_ONCE(res->max) ||
new_usage > READ_ONCE(misc_res_capacity[type])) {
- if (!res->failed) {
- pr_info("cgroup: charge rejected by the misc controller for %s resource in ",
- misc_res_name[type]);
- pr_cont_cgroup_path(i->css.cgroup);
- pr_cont("\n");
- res->failed = true;
- }
ret = -EBUSY;
goto err_charge;
}
--
2.30.0

2021-09-08 05:37:41

by brookxu.cn

[permalink] [raw]
Subject: [RFC PATCH 2/3] misc_cgroup: introduce misc.failcnt for V1

From: Chunguang Xu <[email protected]>

introduce misc.failcnt for cgroup v1 to make it easier for us to
understand the pressure of resources.

Signed-off-by: Chunguang Xu <[email protected]>
---
include/linux/misc_cgroup.h | 3 +++
kernel/cgroup/misc.c | 54 +++++++++++++++++++++++++++++++++++--
2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index d29f1743fae9..4d630cd3e4bd 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -55,6 +55,9 @@ struct misc_cg {
atomic_long_t events_local[MISC_CG_RES_TYPES];
struct cgroup_file events_local_file;

+ /* misc.failcnt for v1 only */
+ atomic_long_t nr_failed[MISC_CG_RES_TYPES];
+
struct misc_res res[MISC_CG_RES_TYPES];
};

diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index 2a3d14be21e5..d8f99fd58981 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -179,7 +179,8 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
atomic_long_inc(&k->events[type]);
cgroup_file_notify(&k->events_file);
}
- }
+ } else
+ atomic_long_inc(&i->nr_failed[type]);

for (j = cg; j != i; j = parent_misc(j))
misc_cg_cancel_charge(type, j, amount);
@@ -371,6 +372,29 @@ static int misc_events_local_show(struct seq_file *sf, void *v)
return 0;
}

+/**
+ * misc_cg_failcnt_show() - Show the fail count of the misc cgroup.
+ * @sf: Interface file
+ * @v: Arguments passed
+ *
+ * Context: Any context.
+ * Return: 0 to denote successful print.
+ */
+static int misc_cg_failcnt_show(struct seq_file *sf, void *v)
+{
+ int i;
+ unsigned long failcnt;
+ struct misc_cg *cg = css_misc(seq_css(sf));
+
+ for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+ failcnt = atomic_long_read(&cg->nr_failed[i]);
+ if (READ_ONCE(misc_res_capacity[i]) || failcnt)
+ seq_printf(sf, "%s %lu\n", misc_res_name[i], failcnt);
+ }
+
+ return 0;
+}
+
/* Misc cgroup interface files */
static struct cftype misc_cg_files[] = {
{
@@ -404,6 +428,32 @@ static struct cftype misc_cg_files[] = {
{}
};

+/* Misc cgroup interface files for v1*/
+static struct cftype misc_cg_legacy_files[] = {
+ {
+ .name = "max",
+ .write = misc_cg_max_write,
+ .seq_show = misc_cg_max_show,
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
+ {
+ .name = "current",
+ .seq_show = misc_cg_current_show,
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
+ {
+ .name = "capacity",
+ .seq_show = misc_cg_capacity_show,
+ .flags = CFTYPE_ONLY_ON_ROOT,
+ },
+ {
+ .name = "failcnt",
+ .seq_show = misc_cg_failcnt_show,
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
+ {}
+};
+
/**
* misc_cg_alloc() - Allocate misc cgroup.
* @parent_css: Parent cgroup.
@@ -450,6 +500,6 @@ static void misc_cg_free(struct cgroup_subsys_state *css)
struct cgroup_subsys misc_cgrp_subsys = {
.css_alloc = misc_cg_alloc,
.css_free = misc_cg_free,
- .legacy_cftypes = misc_cg_files,
+ .legacy_cftypes = misc_cg_legacy_files,
.dfl_cftypes = misc_cg_files,
};
--
2.30.0

2021-09-09 14:38:34

by Michal Koutný

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] misc_cgroup: introduce misc.events and misc_events.local

Hello Chunguang.

The new version looks like a good step generally.

My main remark is that I wouldn't make a distinct v1 and v2 interface,
it's a new controller so I think the v2 could be exposed in both cases
(or in other words, don't create new v1-specific features).

On Wed, Sep 08, 2021 at 01:24:34PM +0800, brookxu <[email protected]> wrote:
> Introduce misc.events and misc.events.local to make it easier for
> us to understand the pressure of resources. The main idea comes
> from mem_cgroup.

BTW what are the events you really are interesed in? (See also the
proposal in my reply to 1/3.)

> @@ -171,6 +171,16 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
> return 0;
>
> err_charge:
> + if (cgroup_subsys_on_dfl(misc_cgrp_subsys)) {
> + atomic_long_inc(&i->events_local[type]);
> + cgroup_file_notify(&i->events_local_file);
> +
> + for (k = i; k; k = parent_misc(k)) {
> + atomic_long_inc(&k->events[type]);
> + cgroup_file_notify(&k->events_file);
> + }
> + }
> +

No reason to wrap this for the default hierarchy only.

> +static int misc_events_show(struct seq_file *sf, void *v)
> +{
> + struct misc_cg *cg = css_misc(seq_css(sf));
> + unsigned long count, i;
> +
> + for (i = 0; i < MISC_CG_RES_TYPES; i++) {
> + count = atomic_long_read(&cg->events[i]);
> + if (READ_ONCE(misc_res_capacity[i]) || count)
> + seq_printf(sf, "%s %lu\n", misc_res_name[i], count);

More future-proof key would be
seq_printf(sf, "%s.max %lu\n", misc_res_name[i], count);
or
seq_printf(sf, "max.%s %lu\n", misc_res_name[i], count);

(Which one is a judgement call but I'd include the "name" of event type too.)

> +static int misc_events_local_show(struct seq_file *sf, void *v)
> +{
> + struct misc_cg *cg = css_misc(seq_css(sf));
> + unsigned long count, i;
> +
> + for (i = 0; i < MISC_CG_RES_TYPES; i++) {
> + count = atomic_long_read(&cg->events_local[i]);
> + if (READ_ONCE(misc_res_capacity[i]) || count)
> + seq_printf(sf, "%s %lu\n", misc_res_name[i], count);

Ditto.

Thanks,
Michal

2021-09-09 14:38:49

by Michal Koutný

[permalink] [raw]
Subject: Re: [RFC PATCH 2/3] misc_cgroup: introduce misc.failcnt for V1

On Wed, Sep 08, 2021 at 01:24:35PM +0800, brookxu <[email protected]> wrote:
> introduce misc.failcnt for cgroup v1 to make it easier for us to
> understand the pressure of resources.

As argued in the patch 1/3, I'd drop this change completely.

Thanks,
Michal

2021-09-09 14:39:08

by Michal Koutný

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

On Wed, Sep 08, 2021 at 01:24:36PM +0800, brookxu <[email protected]> wrote:
> This log provides less information, we can get more detailed failure
> records through
> misc.events, misc.events.local and misc.failcnt.
> From this, perhaps we can remove it.

I hope: a) it's not used widely, b) no-one relies on parsing the
message so this is an acceptable change.

> @@ -157,13 +157,6 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
> new_usage = atomic_long_add_return(amount, &res->usage);
> if (new_usage > READ_ONCE(res->max) ||
> new_usage > READ_ONCE(misc_res_capacity[type])) {
> - if (!res->failed) {
> - pr_info("cgroup: charge rejected by the misc controller for %s resource in ",
> - misc_res_name[type]);
> - pr_cont_cgroup_path(i->css.cgroup);
> - pr_cont("\n");
> - res->failed = true;
> - }

`i` is the misc_cg whose limit was hit. In a sense, I think the current
implementation of the log message (before your patch) is not as useful
as it could be. The logged message here should not refer to `i` but `cg`
(i.e. the cgroup where the actual chargee resides). It's basically the
idea from [1].

So there could be two type of events (not referring to the v1-specific
failcnt):
- max - number of times the cgroup's misc.max was hit,
- fail - number of times operation failed (rejected) in the cgroup.

The former would tell you which limit is probably set too low, the
latter would capture which cgroup workload is affected. (The difference
would become apparent with >1 level deep hierarchies.)

Regards,
Michal

[1] https://lore.kernel.org/lkml/[email protected]/

2021-09-09 16:53:16

by Vipin Sharma

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

On Thu, Sep 9, 2021 at 7:37 AM Michal Koutný <[email protected]> wrote:
>
> On Wed, Sep 08, 2021 at 01:24:36PM +0800, brookxu <[email protected]> wrote:
>
> So there could be two type of events (not referring to the v1-specific
> failcnt):
> - max - number of times the cgroup's misc.max was hit,
> - fail - number of times operation failed (rejected) in the cgroup.
>
> The former would tell you which limit is probably set too low, the
> latter would capture which cgroup workload is affected. (The difference
> would become apparent with >1 level deep hierarchies.)

We are adding two files in this patch series, misc.events and
misc.events.local. I think "fail" should go in misc.events.local and
its name should be changed to "max". So, both files will have "max"
entry but the one in misc.events will refer to the failures in the
current cgroup and its children, and the one in misc.events.local will
refer to the originating cgroup.

>
> Regards,
> Michal
>
> [1] https://lore.kernel.org/lkml/[email protected]/
>

2021-09-09 16:59:11

by Vipin Sharma

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] misc_cgroup: introduce misc.events and misc_events.local

On Tue, Sep 7, 2021 at 10:24 PM brookxu <[email protected]> wrote:
>
> From: Chunguang Xu <[email protected]>
>
> Introduce misc.events and misc.events.local to make it easier for
> us to understand the pressure of resources. The main idea comes
> from mem_cgroup.
>
> Signed-off-by: Chunguang Xu <[email protected]>
> ---
> include/linux/misc_cgroup.h | 9 +++++++
> kernel/cgroup/misc.c | 50 ++++++++++++++++++++++++++++++++++++-
> 2 files changed, 58 insertions(+), 1 deletion(-)

Thanks for the changes. Please also update the documentation at
Documentation/admin-guide/cgroup-v2.rst for the new changes in the
misc cgroup.

2021-09-09 17:12:05

by Vipin Sharma

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] misc_cgroup: introduce misc.events and misc_events.local

On Thu, Sep 9, 2021 at 7:37 AM Michal Koutný <[email protected]> wrote:
>
> Hello Chunguang.
>
> The new version looks like a good step generally.
>
> My main remark is that I wouldn't make a distinct v1 and v2 interface,
> it's a new controller so I think the v2 could be exposed in both cases
> (or in other words, don't create new v1-specific features).

I agree with Michal. We can have the same interface for v1 otherwise
there will not be any form of feedback in v1 for failures.

>
> On Wed, Sep 08, 2021 at 01:24:34PM +0800, brookxu <[email protected]> wrote:
> > +static int misc_events_show(struct seq_file *sf, void *v)
> > +{
> > + struct misc_cg *cg = css_misc(seq_css(sf));
> > + unsigned long count, i;
> > +
> > + for (i = 0; i < MISC_CG_RES_TYPES; i++) {
> > + count = atomic_long_read(&cg->events[i]);
> > + if (READ_ONCE(misc_res_capacity[i]) || count)
> > + seq_printf(sf, "%s %lu\n", misc_res_name[i], count);
>
> More future-proof key would be
> seq_printf(sf, "%s.max %lu\n", misc_res_name[i], count);
> or
> seq_printf(sf, "max.%s %lu\n", misc_res_name[i], count);
>
> (Which one is a judgement call but I'd include the "name" of event type too.)
>
I am inclined more towards "%s.max", it looks nice to see the resource
name before its corresponding events.

2021-09-09 19:00:50

by Michal Koutný

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

On Thu, Sep 09, 2021 at 09:49:56AM -0700, Vipin Sharma <[email protected]> wrote:
> We are adding two files in this patch series, misc.events and
> misc.events.local. I think "fail" should go in misc.events.local and
> its name should be changed to "max".

I consider the max vs fail orthogonal to local vs hierarchical. I.e.
both entries can be in both files:

(1) misc.events.local:max number of times the cgroup's misc.max was hit
(2) misc.events.local:fail number of times operation failed in the cgroup
(3) misc.events:max number of times the cgroup's misc.max was hit in the subtree
(4) misc.events:fail number of times operation failed in the subtree

Is that too many? Admittedly, I assume (1) and (4) will be the more useful ones.
However, I'm afraid overloading "max" as suggested might be (more)
confusing. ('subtree' above := self or descendant)

Michal

2021-09-10 05:22:34

by brookxu.cn

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] misc_cgroup: introduce misc.events and misc_events.local



Vipin Sharma wrote on 2021/9/10 1:08 上午:
> On Thu, Sep 9, 2021 at 7:37 AM Michal Koutný <[email protected]> wrote:
>>
>> Hello Chunguang.
>>
>> The new version looks like a good step generally.
>>
>> My main remark is that I wouldn't make a distinct v1 and v2 interface,
>> it's a new controller so I think the v2 could be exposed in both cases
>> (or in other words, don't create new v1-specific features).
>
> I agree with Michal. We can have the same interface for v1 otherwise
> there will not be any form of feedback in v1 for failures.

Yeah, this is more reasonable. But there is still one question, whether we
need to be consistent with other cgroup subsystems, events and events.local
under v1 should not support hierarchy?

>>
>> On Wed, Sep 08, 2021 at 01:24:34PM +0800, brookxu <[email protected]> wrote:
>>> +static int misc_events_show(struct seq_file *sf, void *v)
>>> +{
>>> + struct misc_cg *cg = css_misc(seq_css(sf));
>>> + unsigned long count, i;
>>> +
>>> + for (i = 0; i < MISC_CG_RES_TYPES; i++) {
>>> + count = atomic_long_read(&cg->events[i]);
>>> + if (READ_ONCE(misc_res_capacity[i]) || count)
>>> + seq_printf(sf, "%s %lu\n", misc_res_name[i], count);
>>
>> More future-proof key would be
>> seq_printf(sf, "%s.max %lu\n", misc_res_name[i], count);
>> or
>> seq_printf(sf, "max.%s %lu\n", misc_res_name[i], count);
>>
>> (Which one is a judgement call but I'd include the "name" of event type too.)
>>
> I am inclined more towards "%s.max", it looks nice to see the resource
> name before its corresponding events

I also think %s.max may be more intuitive.

2021-09-10 05:26:38

by brookxu.cn

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] misc_cgroup: introduce misc.events and misc_events.local



Vipin Sharma wrote on 2021/9/10 12:56 上午:
> On Tue, Sep 7, 2021 at 10:24 PM brookxu <[email protected]> wrote:
>>
>> From: Chunguang Xu <[email protected]>
>>
>> Introduce misc.events and misc.events.local to make it easier for
>> us to understand the pressure of resources. The main idea comes
>> from mem_cgroup.
>>
>> Signed-off-by: Chunguang Xu <[email protected]>
>> ---
>> include/linux/misc_cgroup.h | 9 +++++++
>> kernel/cgroup/misc.c | 50 ++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 58 insertions(+), 1 deletion(-)
>
> Thanks for the changes. Please also update the documentation at
> Documentation/admin-guide/cgroup-v2.rst for the new changes in the
> misc cgroup.
>
All right, thanks.

2021-09-10 05:32:53

by brookxu.cn

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood


Thanks for your time.

Michal Koutný wrote on 2021/9/10 2:56 上午:
> On Thu, Sep 09, 2021 at 09:49:56AM -0700, Vipin Sharma <[email protected]> wrote:
>> We are adding two files in this patch series, misc.events and
>> misc.events.local. I think "fail" should go in misc.events.local and
>> its name should be changed to "max".
>
> I consider the max vs fail orthogonal to local vs hierarchical. I.e.
> both entries can be in both files:
>
> (1) misc.events.local:max number of times the cgroup's misc.max was hit
> (2) misc.events.local:fail number of times operation failed in the cgroup
> (3) misc.events:max number of times the cgroup's misc.max was hit in the subtree
> (4) misc.events:fail number of times operation failed in the subtree
>
> Is that too many? Admittedly, I assume (1) and (4) will be the more useful ones.
> However, I'm afraid overloading "max" as suggested might be (more)
> confusing. ('subtree' above := self or descendant)

I am a bit confused here. For misc_cgroup, we can only be rejected when the count
touch Limit, but there may be other more reasons for other subsystems. Therefore,
when we are rejected, does it mean that we have touch Limit? If so, do we still
need to distinguish between max and fail? (for misc_cgroup)

> Michal
>

2021-09-10 09:24:57

by Michal Koutný

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

On Fri, Sep 10, 2021 at 01:30:46PM +0800, brookxu <[email protected]> wrote:
> I am a bit confused here. For misc_cgroup, we can only be rejected when the count
> touch Limit, but there may be other more reasons for other subsystems.

Sorry, I wasn't clear about that -- the failures I meant to be counted
here were only the ones caused by (an ancestor) limit. Maybe there's a
better naem for that.

> Therefore, when we are rejected, does it mean that we have touch
> Limit? If so, do we still need to distinguish between max and fail?
> (for misc_cgroup)

r
`- c1
`- c2.max
`- c3
`- c4.max
`- task t
`- c5

Assuming c2.max < c4.max, when a task t calls try_charge and it fails
because of c2.max, then the 'max' event is counted to c2 (telling that
the limit is perhaps low) and the 'fail' event is counted to c4 (telling
you where the troubles originated). That is my idea. Although in the
case of short-lived cgroups, you'd likely only get the hierarchically
aggregated 'fail' events from c3 or higher with lower (spatial)
precision.
What would be the type of information useful for your troubleshooting?

Cheers,
Michal

2021-09-10 10:35:46

by Michal Koutný

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] misc_cgroup: introduce misc.events and misc_events.local

On Fri, Sep 10, 2021 at 01:20:37PM +0800, brookxu <[email protected]> wrote:
> Yeah, this is more reasonable. But there is still one question, whether we
> need to be consistent with other cgroup subsystems, events and events.local
> under v1 should not support hierarchy?

My take is that it's acceptable to present the v2-like files in v1 too
for the sake of simplicity since:
- this is not used yet,
- the v1 is less conventional and
- the presence of events.local would cater even to cases with tasks in
inner nodes.

It'd be good to have Tejun's insight on this too.

Michal

2021-09-10 14:31:14

by brookxu.cn

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

Thanks for your time.

On 2021/9/10 5:23 PM, Michal Koutný wrote:
> On Fri, Sep 10, 2021 at 01:30:46PM +0800, brookxu <[email protected]> wrote:
>> I am a bit confused here. For misc_cgroup, we can only be rejected when the count
>> touch Limit, but there may be other more reasons for other subsystems.
>
> Sorry, I wasn't clear about that -- the failures I meant to be counted
> here were only the ones caused by (an ancestor) limit. Maybe there's a
> better naem for that.
>
>> Therefore, when we are rejected, does it mean that we have touch
>> Limit? If so, do we still need to distinguish between max and fail?
>> (for misc_cgroup)
>
> r
> `- c1
> `- c2.max
> `- c3
> `- c4.max
> `- task t
> `- c5
>
> Assuming c2.max < c4.max, when a task t calls try_charge and it fails
> because of c2.max, then the 'max' event is counted to c2 (telling that
> the limit is perhaps low) and the 'fail' event is counted to c4 (telling
> you where the troubles originated). That is my idea. Although in the
> case of short-lived cgroups, you'd likely only get the hierarchically
> aggregated 'fail' events from c3 or higher with lower (spatial)
> precision.
> What would be the type of information useful for your troubleshooting?

Through events and events.local, we can determine which node has
insufficient resources. For example, when the ‘events’ is large, we
traverse down and use events.local to determine which node has
insufficient resources. 'fail' counter does not seem to provide more
effective information in this regard. When 'fail' is big, it seems that
we still need to use events and events.local to determine the node of
insufficient resources. I am not very sure what details can we learn
through 'fail' counter.

>
> Cheers,
> Michal
>

2021-09-10 15:38:08

by Michal Koutný

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

On Fri, Sep 10, 2021 at 10:29:21PM +0800, "brookxu.cn" <[email protected]> wrote:
> Through events and events.local, we can determine which node has
> insufficient resources. For example, when the ‘events’ is large, we traverse
> down and use events.local to determine which node has insufficient
> resources.

IIUC, this works in situations when the limits are set in decreasing
fashion (from root down) till very (controller) leaves. That's a valid
config and you're right that following 'max' events gets you to the
misbehaving/underprovisioned cgroup.

> 'fail' counter does not seem to provide more effective
> information in this regard. When 'fail' is big, it seems that we still need
> to use events and events.local to determine the node of insufficient
> resources.
> I am not very sure what details can we learn through 'fail' counter.

If there's a limit on certain level with otherwise unconstrained cgroup
structure below (a valid config too), the 'fail' counter would help
determining what the affected cgroup is. Does that make sense to you?

The log messages as implemented currently, aren't as useful as proposed
'fail' counter (they would need report 'cg' path, not 'i').

I see justification for 'fail' events as a replacement for the dropped
log messages. Although it's not a complete replacement due to longer
persistence of the log, illustrated e.g. with the short-lived containers
whose cgroups/fail counts are gone).


Michal

2021-09-10 16:31:35

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

Hello,

On Fri, Sep 10, 2021 at 05:36:09PM +0200, Michal Koutn? wrote:
> If there's a limit on certain level with otherwise unconstrained cgroup
> structure below (a valid config too), the 'fail' counter would help
> determining what the affected cgroup is. Does that make sense to you?

While the desire to make the interface complete is understandable, I don't
think we need to go too far in that direction given that debugging these
configuration issues requires human intervention anyway and providing
overall information is often enough of aid especially for simple controllers
like misc/pid. So, let's stick to something consistent and simple even if
not complete and definitely not name them "fail" even if we add them.

Thanks.

--
tejun

2021-09-10 16:34:44

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] misc_cgroup: introduce misc.events and misc_events.local

Hello,

On Fri, Sep 10, 2021 at 12:33:06PM +0200, Michal Koutný wrote:
> On Fri, Sep 10, 2021 at 01:20:37PM +0800, brookxu <[email protected]> wrote:
> > Yeah, this is more reasonable. But there is still one question, whether we
> > need to be consistent with other cgroup subsystems, events and events.local
> > under v1 should not support hierarchy?
>
> My take is that it's acceptable to present the v2-like files in v1 too
> for the sake of simplicity since:
> - this is not used yet,
> - the v1 is less conventional and
> - the presence of events.local would cater even to cases with tasks in
> inner nodes.
>
> It'd be good to have Tejun's insight on this too.

My general appraoch is

* If it's trivial both in terms of complexity and effort to add support for
cgroup1, oh well, why not?

* Otherwise, don't bother.

* cgroup1 interface is wildly inconsistent anyway, so I wouldn't worry much
about that.

Thanks.

--
tejun

2021-09-10 17:21:19

by Vipin Sharma

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

On Fri, Sep 10, 2021 at 9:29 AM Tejun Heo <[email protected]> wrote:
>
> Hello,
>
> On Fri, Sep 10, 2021 at 05:36:09PM +0200, Michal Koutnı wrote:
> > If there's a limit on certain level with otherwise unconstrained cgroup
> > structure below (a valid config too), the 'fail' counter would help
> > determining what the affected cgroup is. Does that make sense to you?
>
> While the desire to make the interface complete is understandable, I don't
> think we need to go too far in that direction given that debugging these
> configuration issues requires human intervention anyway and providing
> overall information is often enough of aid especially for simple controllers
> like misc/pid. So, let's stick to something consistent and simple even if
> not complete and definitely not name them "fail" even if we add them.
>

I understand what Michal is proposing regarding fail vs max and local
vs hierarchical. I think this will provide complete information but it
will be too many interfaces for a simple controller like misc and
might not even get used by anyone.

Chunguang's case was to avoid printing so many messages, I agree we
should remove the log message and add a file event.

For now, I think we can just have one file, events.local
(non-hierarchical) which has %s.max type entries. This will tell us
which cgroup is under pressure and I believe this is helpful.

Regarding the original cgroup which started the charge should be
easier to identify because those processes will not be able to proceed
or will use some alternate logic, and the job owner should be able to
notice it.

If in future there is a need to find the originating cgroup we can
resume this discussion during that time.

2021-09-10 17:21:39

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

Hello,

On Fri, Sep 10, 2021 at 10:16:46AM -0700, Vipin Sharma wrote:
> For now, I think we can just have one file, events.local
> (non-hierarchical) which has %s.max type entries. This will tell us
> which cgroup is under pressure and I believe this is helpful.

How about just sticking with hierarchical events? Do you see a lot of
downsides with that compared to .local?

Thanks.

--
tejun

2021-09-10 17:33:21

by Vipin Sharma

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

On Fri, Sep 10, 2021 at 10:19 AM Tejun Heo <[email protected]> wrote:
>
> Hello,
>
> On Fri, Sep 10, 2021 at 10:16:46AM -0700, Vipin Sharma wrote:
> > For now, I think we can just have one file, events.local
> > (non-hierarchical) which has %s.max type entries. This will tell us
> > which cgroup is under pressure and I believe this is helpful.
>
> How about just sticking with hierarchical events? Do you see a lot of
> downsides with that compared to .local?
>

Local would have made it easier to identify constrained cgroup and the
lesser number of file notify events will be fired.

I don't have a strong aversion to hierarchical events only as well if
that is more suited for the cgroup v2 style.

2021-09-10 17:39:36

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

Hello,

On Fri, Sep 10, 2021 at 10:31:02AM -0700, Vipin Sharma wrote:
> I don't have a strong aversion to hierarchical events only as well if
> that is more suited for the cgroup v2 style.

It's not a hard requirement but I'd prefer hierarchical over local unless
there are strong reasons against doing so.

Thanks.

--
tejun