2021-08-14 00:26:11

by brookxu.cn

[permalink] [raw]
Subject: [PATCH v2] misc_cgroup: use a counter to count the number of failures

From: Chunguang Xu <[email protected]>

For a container, we only print an error log when the resource
charge fails. There may be some problems here:

1. If a large number of containers are created and deleted,
there will be a lot of error logs.
2. According to an error log, we cannot better understand
the actual pressure of resources.

Therefore, perhaps we should use a failcnt counter to count
the number of failures, so that we can easily understand the
actual pressure of resources and avoid too many error log..

v2: rename failcnt to nr_fails.

Signed-off-by: Chunguang Xu <[email protected]>
---
include/linux/misc_cgroup.h | 4 ++--
kernel/cgroup/misc.c | 37 ++++++++++++++++++++++++++++++-------
2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index da2367e..59706b2 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -31,12 +31,12 @@ enum misc_res_type {
* struct misc_res: Per cgroup per misc type resource
* @max: Maximum limit on the resource.
* @usage: Current usage of the resource.
- * @failed: True if charged failed for the resource in a cgroup.
+ * @nr_fails: Failure count of the resource
*/
struct misc_res {
unsigned long max;
atomic_long_t usage;
- bool failed;
+ atomic_long_t nr_fails;
};

/**
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index ec02d96..c35477e 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -157,13 +157,7 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
new_usage = atomic_long_add_return(amount, &res->usage);
if (new_usage > READ_ONCE(res->max) ||
new_usage > READ_ONCE(misc_res_capacity[type])) {
- if (!res->failed) {
- pr_info("cgroup: charge rejected by the misc controller for %s resource in ",
- misc_res_name[type]);
- pr_cont_cgroup_path(i->css.cgroup);
- pr_cont("\n");
- res->failed = true;
- }
+ atomic_long_inc(&res->nr_fails);
ret = -EBUSY;
goto err_charge;
}
@@ -312,6 +306,29 @@ static int misc_cg_current_show(struct seq_file *sf, void *v)
}

/**
+ * misc_cg_failcnt_show() - Show the fail count of the misc cgroup.
+ * @sf: Interface file
+ * @v: Arguments passed
+ *
+ * Context: Any context.
+ * Return: 0 to denote successful print.
+ */
+static int misc_cg_failcnt_show(struct seq_file *sf, void *v)
+{
+ int i;
+ unsigned long nr_fails;
+ struct misc_cg *cg = css_misc(seq_css(sf));
+
+ for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+ nr_fails = atomic_long_read(&cg->res[i].nr_fails);
+ if (READ_ONCE(misc_res_capacity[i]) || nr_fails)
+ seq_printf(sf, "%s %lu\n", misc_res_name[i], nr_fails);
+ }
+
+ return 0;
+}
+
+/**
* misc_cg_capacity_show() - Show the total capacity of misc res on the host.
* @sf: Interface file
* @v: Arguments passed
@@ -349,6 +366,11 @@ static int misc_cg_capacity_show(struct seq_file *sf, void *v)
.flags = CFTYPE_NOT_ON_ROOT,
},
{
+ .name = "failcnt",
+ .seq_show = misc_cg_failcnt_show,
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
+ {
.name = "capacity",
.seq_show = misc_cg_capacity_show,
.flags = CFTYPE_ONLY_ON_ROOT,
@@ -382,6 +404,7 @@ static int misc_cg_capacity_show(struct seq_file *sf, void *v)
for (i = 0; i < MISC_CG_RES_TYPES; i++) {
WRITE_ONCE(cg->res[i].max, MAX_NUM);
atomic_long_set(&cg->res[i].usage, 0);
+ atomic_long_set(&cg->res[i].nr_fails, 0);
}

return &cg->css;
--
1.8.3.1


2021-08-14 01:18:22

by Vipin Sharma

[permalink] [raw]
Subject: Re: [PATCH v2] misc_cgroup: use a counter to count the number of failures

On Fri, Aug 13, 2021 at 5:15 PM brookxu <[email protected]> wrote:
>
> From: Chunguang Xu <[email protected]>
>
> For a container, we only print an error log when the resource
> charge fails. There may be some problems here:
>
> 1. If a large number of containers are created and deleted,
> there will be a lot of error logs.
> 2. According to an error log, we cannot better understand
> the actual pressure of resources.
>
> Therefore, perhaps we should use a failcnt counter to count
> the number of failures, so that we can easily understand the
> actual pressure of resources and avoid too many error log..
>
> v2: rename failcnt to nr_fails.
>
> Signed-off-by: Chunguang Xu <[email protected]>
> ---
> include/linux/misc_cgroup.h | 4 ++--
> kernel/cgroup/misc.c | 37 ++++++++++++++++++++++++++++++-------
> 2 files changed, 32 insertions(+), 9 deletions(-)
>

Acked-by: Vipin Sharma <[email protected]>

2021-08-24 16:45:32

by Michal Koutný

[permalink] [raw]
Subject: Re: [PATCH v2] misc_cgroup: use a counter to count the number of failures

Hello.

On Sat, Aug 14, 2021 at 08:15:16AM +0800, brookxu <[email protected]> wrote:
> 1. If a large number of containers are created and deleted,
> there will be a lot of error logs.
> 2. According to an error log, we cannot better understand
> the actual pressure of resources.
>
> Therefore, perhaps we should use a failcnt counter to count
> the number of failures, so that we can easily understand the
> actual pressure of resources and avoid too many error log..

This is an understandable use case and generally the implementation via
the counter is good as well.

However, the non-hierarchical failcnt interface looks like v1ism to me
(I think new features should come with v2 first in mind).
What about exposing this in misc.events file with max.$res_name entries?

Or if the hierarchical reporting is unnecessary now, there can be just
misc.events.local for starters.

(That reminds me the forgotten pids.events[.local] rework [1], oops.)

Michal

https://lore.kernel.org/lkml/[email protected]/#t

2021-08-24 19:13:06

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v2] misc_cgroup: use a counter to count the number of failures

Hello,

On Tue, Aug 24, 2021 at 06:44:23PM +0200, Michal Koutn? wrote:
> However, the non-hierarchical failcnt interface looks like v1ism to me
> (I think new features should come with v2 first in mind).
> What about exposing this in misc.events file with max.$res_name entries?

Ah yeah, good point. misc.events sounds like a good spot to put these.

> Or if the hierarchical reporting is unnecessary now, there can be just
> misc.events.local for starters.

I'd prefer to stick with hierarchical counting as the first step at least.

> (That reminds me the forgotten pids.events[.local] rework [1], oops.)
>
> https://lore.kernel.org/lkml/[email protected]/#t

I think both counters are useful - the number of failures due to this type
of limit in this subhierarchy, and the number of failures caused by this
particular limit in this subhierarchy. It's a pretty subtle difference to
encapsulate in a counter name tho.

Thanks.

--
tejun

2021-08-25 06:53:06

by brookxu.cn

[permalink] [raw]
Subject: Re: [PATCH v2] misc_cgroup: use a counter to count the number of failures



Tejun Heo wrote on 2021/8/25 3:08:
> Hello,
>
> On Tue, Aug 24, 2021 at 06:44:23PM +0200, Michal Koutný wrote:
>> However, the non-hierarchical failcnt interface looks like v1ism to me
>> (I think new features should come with v2 first in mind).
>> What about exposing this in misc.events file with max.$res_name entries?
>
> Ah yeah, good point. misc.events sounds like a good spot to put these.
>
>> Or if the hierarchical reporting is unnecessary now, there can be just
>> misc.events.local for starters.
>
> I'd prefer to stick with hierarchical counting as the first step at least.
>
>> (That reminds me the forgotten pids.events[.local] rework [1], oops.)
>>
>> https://lore.kernel.org/lkml/[email protected]/#t
>
> I think both counters are useful - the number of failures due to this type
> of limit in this subhierarchy, and the number of failures caused by this
> particular limit in this subhierarchy. It's a pretty subtle difference to
> encapsulate in a counter name tho.

Thanks all for good suggestion, I try to do it in next version.

> Thanks.
>

2021-08-26 01:39:51

by brookxu.cn

[permalink] [raw]
Subject: Re: [PATCH v2] misc_cgroup: use a counter to count the number of failures



Tejun Heo wrote on 2021/8/25 3:08:
> Hello,
>
> On Tue, Aug 24, 2021 at 06:44:23PM +0200, Michal Koutný wrote:
>> However, the non-hierarchical failcnt interface looks like v1ism to me
>> (I think new features should come with v2 first in mind).
>> What about exposing this in misc.events file with max.$res_name entries?
>
> Ah yeah, good point. misc.events sounds like a good spot to put these.
>
>> Or if the hierarchical reporting is unnecessary now, there can be just
>> misc.events.local for starters.
>
> I'd prefer to stick with hierarchical counting as the first step at least.
>
>> (That reminds me the forgotten pids.events[.local] rework [1], oops.)
>>
>> https://lore.kernel.org/lkml/[email protected]/#t

The core logic of pids cgroup and misc cgroup is similar. Is it possible for
us to merge pids cgroup into misc cgroup?

> I think both counters are useful - the number of failures due to this type
> of limit in this subhierarchy, and the number of failures caused by this
> particular limit in this subhierarchy. It's a pretty subtle difference to
> encapsulate in a counter name tho.
>
> Thanks.
>

2021-08-26 11:30:52

by Michal Koutný

[permalink] [raw]
Subject: Re: [PATCH v2] misc_cgroup: use a counter to count the number of failures

On Thu, Aug 26, 2021 at 09:34:45AM +0800, brookxu <[email protected]> wrote:
> The core logic of pids cgroup and misc cgroup is similar.

Yes, the latter is conceptually a generalization of the former and it can
be tempting to use the general form. Beware that pids controller would
need to retain its existing API (and the behavior of being an
independent controller) and that would be IMO exceptions
counterweighting the generalization.

> Is it possible for us to merge pids cgroup into misc cgroup?

Technically it might be possible but I can't see the benefit (but maybe
you envisioned something else where my reasoning won't apply).

Michal

2021-09-08 05:31:21

by brookxu.cn

[permalink] [raw]
Subject: Re: [PATCH v2] misc_cgroup: use a counter to count the number of failures

Hi all:

I have sentout misc_cgroup events and events.local related patches.
I want to make corresponding changes to pids cgroup by the way. Do
you think it is ok?

Thanks.

2021-09-09 15:07:01

by Michal Koutný

[permalink] [raw]
Subject: Re: [PATCH v2] misc_cgroup: use a counter to count the number of failures

Hi.

On Wed, Sep 08, 2021 at 01:29:18PM +0800, brookxu <[email protected]> wrote:
> I have sentout misc_cgroup events and events.local related patches.
> I want to make corresponding changes to pids cgroup by the way. Do
> you think it is ok?

The pids controller is longer out there, so some changes should be more
careful (e.g. I wouldn't drop the log message).

Also, you can base it on [1] (there should be just missing the
.local/hierarchical event files separation).

HTH,
Michal

[1] https://lore.kernel.org/lkml/[email protected]/