2023-05-18 13:08:05

by Hao Jia

Subject: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic

In cgroup_base_stat_flush(), {rstatc, cgrp}->last_bstat needs to be
updated to the current {rstatc, cgrp}->bstat. Assign the value directly
instead of adding delta to the previous value.

Signed-off-by: Hao Jia <[email protected]>
---
kernel/cgroup/rstat.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 9c4c55228567..3e5c4c1c92c6 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -376,14 +376,14 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 	/* propagate percpu delta to global */
 	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);
 	cgroup_base_stat_add(&cgrp->bstat, &delta);
-	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
+	rstatc->last_bstat = rstatc->bstat;

 	/* propagate global delta to parent (unless that's root) */
 	if (cgroup_parent(parent)) {
 		delta = cgrp->bstat;
 		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
 		cgroup_base_stat_add(&parent->bstat, &delta);
-		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
+		cgrp->last_bstat = cgrp->bstat;
 	}
 }

--
2.37.0



2023-05-19 04:38:16

by Hao Jia

Subject: Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic



On 2023/5/18 Hao Jia wrote:
> In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat
> needs to be updated to the current {rstatc, cgrp}->bstat, directly
> assigning values instead of adding the last value to delta.
>
> Signed-off-by: Hao Jia <[email protected]>
> ---
> kernel/cgroup/rstat.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index 9c4c55228567..3e5c4c1c92c6 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -376,14 +376,14 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
>  	/* propagate percpu delta to global */
>  	cgroup_base_stat_sub(&delta, &rstatc->last_bstat); *(1)*
>  	cgroup_base_stat_add(&cgrp->bstat, &delta);
> -	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
> +	rstatc->last_bstat = rstatc->bstat; *(2)*

This is wrong: the value of rstatc->bstat at (1) and (2) may not be the
same, because rstatc->bstat may be updated on another CPU in between.
Sorry for the noise.
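
To make the problem concrete, here is a minimal user-space sketch of that race. It is not kernel code: `struct base_stat` is a one-counter stand-in for `struct cgroup_base_stat`, `pcpu_model`, `flush_reread()` and `lost_units_demo()` are hypothetical names, and the update from another CPU is simulated by a plain store between the sample taken at (1) and the re-read at (2).

```c
#include <stdint.h>

/* Hypothetical one-counter stand-in for struct cgroup_base_stat. */
struct base_stat {
	uint64_t usage;
};

/* Hypothetical model of one per-CPU slot plus the cgroup-wide total. */
struct pcpu_model {
	struct base_stat bstat;       /* live per-CPU counter */
	struct base_stat last_bstat;  /* value accounted by the previous flush */
	struct base_stat global;      /* cgroup-wide total */
};

/*
 * Models the first version of the patch: the delta is computed from
 * `sample` (the value used at (1)), but last_bstat is set from the live
 * counter (the value read at (2)).
 */
static void flush_reread(struct pcpu_model *m, struct base_stat sample)
{
	uint64_t delta = sample.usage - m->last_bstat.usage;

	m->global.usage += delta;
	m->last_bstat = m->bstat;	/* re-read: may be newer than `sample` */
}

/* Returns how many units go missing when an update lands between (1) and (2). */
static uint64_t lost_units_demo(void)
{
	struct pcpu_model m = { {10}, {0}, {0} };
	struct base_stat sample = m.bstat;	/* sample is 10 */

	m.bstat.usage = 15;			/* simulated update on another CPU */
	flush_reread(&m, sample);		/* global = 10, last_bstat = 15 */

	sample = m.bstat;			/* next flush samples 15 */
	flush_reread(&m, sample);		/* delta = 15 - 15 = 0 */

	return m.bstat.usage - m.global.usage;
}
```

In this model the five units added between the two reads are never propagated, because last_bstat already claims they were accounted.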

>
>  	/* propagate global delta to parent (unless that's root) */
>  	if (cgroup_parent(parent)) {
>  		delta = cgrp->bstat;
>  		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
>  		cgroup_base_stat_add(&parent->bstat, &delta);
> -		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
> +		cgrp->last_bstat = cgrp->bstat;
>  	}
>  }
>

Maybe something like this?


In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat
needs to be updated to the current {rstatc, cgrp}->bstat after the
calculation.

For the rstatc->last_bstat case, rstatc->bstat may be updated on other
CPUs during our calculation, so two reads of rstatc->bstat may return
inconsistent values. So we use the temporary variable @cur to record the
rstatc->bstat value that was read, and use @cur to update
rstatc->last_bstat.

For the cgrp->last_bstat case, we already hold cgroup_rstat_lock, so
cgrp->bstat will not change during the calculation process, and it can
be directly used to update cgrp->last_bstat.

It is better for us to assign directly instead of using
cgroup_base_stat_add() to update {rstatc, cgrp}->last_bstat.

Signed-off-by: Hao Jia <[email protected]>
---
kernel/cgroup/rstat.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 9c4c55228567..17a6a1fcc2d4 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -360,7 +360,7 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 {
 	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
 	struct cgroup *parent = cgroup_parent(cgrp);
-	struct cgroup_base_stat delta;
+	struct cgroup_base_stat delta, cur;
 	unsigned seq;

 	/* Root-level stats are sourced from system-wide CPU stats */
@@ -370,20 +370,21 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 	/* fetch the current per-cpu values */
 	do {
 		seq = __u64_stats_fetch_begin(&rstatc->bsync);
-		delta = rstatc->bstat;
+		cur = rstatc->bstat;
 	} while (__u64_stats_fetch_retry(&rstatc->bsync, seq));

 	/* propagate percpu delta to global */
+	delta = cur;
 	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);
 	cgroup_base_stat_add(&cgrp->bstat, &delta);
-	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
+	rstatc->last_bstat = cur;

 	/* propagate global delta to parent (unless that's root) */
 	if (cgroup_parent(parent)) {
 		delta = cgrp->bstat;
 		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
 		cgroup_base_stat_add(&parent->bstat, &delta);
-		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
+		cgrp->last_bstat = cgrp->bstat;
 	}
 }


2023-05-23 15:32:40

by Michal Koutný

Subject: Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic

Hello Jia.

On Fri, May 19, 2023 at 12:15:57PM +0800, Hao Jia <[email protected]> wrote:
> Maybe something like this?

(Next time please send with a version bump in subject.)


> In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat
> needs to be updated to the current {rstatc, cgrp}->bstat after the
> calculation.
>
> For the rstatc->last_bstat case, rstatc->bstat may be updated on other
> cpus during our calculation, resulting in inconsistent rstatc->bstat
> statistics for the two reads. So we use the temporary variable @cur to
> record the read rstatc->bstat statistics, and use @cur to update
> rstatc->last_bstat.

If a concurrent update happens after the sample of bstat was taken for
the calculation, it won't be reflected in the flushed result.
But a subsequent flush will use the updated bstat, and the difference
from last_bstat will account for that concurrent change (and any other
changes between the flushes).

IOW, flushing cannot prevent concurrent updates, but it gives
eventually consistent results (i.e. when repeated without more updates).
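
A small user-space sketch of this argument, under stated assumptions: `struct base_stat` is a one-counter stand-in for `struct cgroup_base_stat`, and `pcpu_model`, `flush_add()`, `flush_assign()` and `two_flushes()` are hypothetical names. Both update styles work from the same sampled value, and an update that lands right after the first sample is simulated with a plain store.

```c
#include <stdint.h>

/* Hypothetical one-counter stand-in for struct cgroup_base_stat. */
struct base_stat {
	uint64_t usage;
};

/* Hypothetical model of one per-CPU slot plus the cgroup-wide total. */
struct pcpu_model {
	struct base_stat bstat;       /* live per-CPU counter */
	struct base_stat last_bstat;  /* value accounted by the previous flush */
	struct base_stat global;      /* cgroup-wide total */
};

/* Existing style: last_bstat += delta. */
static void flush_add(struct pcpu_model *m, struct base_stat sample)
{
	uint64_t delta = sample.usage - m->last_bstat.usage;

	m->global.usage += delta;
	m->last_bstat.usage += delta;
}

/* Proposed style: last_bstat = the sampled value. */
static void flush_assign(struct pcpu_model *m, struct base_stat sample)
{
	uint64_t delta = sample.usage - m->last_bstat.usage;

	m->global.usage += delta;
	m->last_bstat = sample;
}

/*
 * Runs two flushes with an update landing just after the first sample
 * and returns the final cgroup-wide total.
 */
static uint64_t two_flushes(void (*flush)(struct pcpu_model *, struct base_stat))
{
	struct pcpu_model m = { {10}, {0}, {0} };
	struct base_stat sample = m.bstat;	/* first sample: 10 */

	m.bstat.usage = 15;			/* simulated concurrent update */
	flush(&m, sample);			/* global = 10, update not seen yet */
	flush(&m, m.bstat);			/* global = 15, update picked up */

	return m.global.usage;
}
```

In this model `two_flushes(flush_add)` and `two_flushes(flush_assign)` end with the same total, which matches the point that the second flush accounts for whatever the first one missed.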

> It is better for us to assign directly instead of using
> cgroup_base_stat_add() to update {rstatc, cgrp}->last_bstat.

Or do you mean the copying is faster than arithmetics?

Thanks,
Michal



2023-05-24 07:18:19

by Hao Jia

Subject: Re: [External] Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic



On 2023/5/23 Michal Koutný wrote:
> Hello Jia.
>
> On Fri, May 19, 2023 at 12:15:57PM +0800, Hao Jia <[email protected]> wrote:
>> Maybe something like this?
>
> (Next time please send with a version bump in subject.)

Thanks for your review, I will do it in the next version.

>
>
>> In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat
>> needs to be updated to the current {rstatc, cgrp}->bstat after the
>> calculation.
>>
>> For the rstatc->last_bstat case, rstatc->bstat may be updated on other
>> cpus during our calculation, resulting in inconsistent rstatc->bstat
>> statistics for the two reads. So we use the temporary variable @cur to
>> record the read rstatc->bstat statistics, and use @cur to update
>> rstatc->last_bstat.
>
> If a concurrent update happens after sample of bstat was taken for
> calculation, it won't be reflected in the flushed result.
> But subsequent flush will use the updated bstat and the difference from
> last_bstat would account for that concurrent change (and any other
> changes between the flushes).
>
> IOW flushing cannot prevent concurrent updates but it will give
> eventually consistent (repeated without more updates) results.
>

Yes, so we need @cur to record the bstat value after the sequence fetch
is completed.


>> It is better for us to assign directly instead of using
>> cgroup_base_stat_add() to update {rstatc, cgrp}->last_bstat.
>
> Or do you mean the copying is faster than arithmetics?
>

Yes, although the difference may not be noticeable.
Another reason is readability: when we complete an update, we simply
snapshot the current bstat into last_bstat, which is easier for readers
to understand. The arithmetic version is somewhat obscure.

Thanks,
Hao

2023-05-24 08:14:33

by Michal Koutný

Subject: Re: [External] Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic

On Wed, May 24, 2023 at 02:54:10PM +0800, Hao Jia <[email protected]> wrote:
> Yes, so we need @cur to record the bstat value after the sequence fetch is
> completed.

No, I still don't see a problem that it solves. If you find incorrect
data being reported, please explain it more/with an example.

> Yes, but it may not be obvious.
> Another reason is that when we complete an update, we snapshot last_bstat as
> the current bstat, which is better for readers to understand. Arithmetics is
> somewhat obscure.

The readability here is subjective. It'd be interesting to have some
data comparing arithmetics vs copying though.

HTH,
Michal



2023-05-24 09:05:25

by Hao Jia

Subject: Re: [External] Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic



On 2023/5/24 Michal Koutný wrote:
> On Wed, May 24, 2023 at 02:54:10PM +0800, Hao Jia <[email protected]> wrote:
>> Yes, so we need @cur to record the bstat value after the sequence fetch is
>> completed.
>
> No, I still don't see a problem that it solves. If you find incorrect
> data being reported, please explain it more/with an example.

Sorry for the confusion.

My first patch looked like this:

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 9c4c55228567..3e5c4c1c92c6 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -376,14 +376,14 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 	/* propagate percpu delta to global */
 	cgroup_base_stat_sub(&delta, &rstatc->last_bstat); (1) <---
 	cgroup_base_stat_add(&cgrp->bstat, &delta);
-	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
+	rstatc->last_bstat = rstatc->bstat; (2) <--

 	/* propagate global delta to parent (unless that's root) */
 	if (cgroup_parent(parent)) {
 		delta = cgrp->bstat;
 		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
 		cgroup_base_stat_add(&parent->bstat, &delta);
-		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
+		cgrp->last_bstat = cgrp->bstat;
 	}
 }

If I understand correctly, rstatc->bstat may differ between (1) and (2):
by the time we reach (2), rstatc->bstat may have been updated on another
CPU. Also, we should not read rstatc->bstat directly; it should be read
through the sequence fetch loop, in the following way:

	do {
		seq = __u64_stats_fetch_begin(&rstatc->bsync);
		cur = rstatc->bstat;
	} while (__u64_stats_fetch_retry(&rstatc->bsync, seq));
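
For readers unfamiliar with this retry pattern, the following is a rough user-space sketch of a sequence-counter read loop written with C11 atomics. It is an illustration only, not the kernel's u64_stats implementation: `struct seq_stat`, `seq_stat_write()`, `seq_stat_read()` and `seq_stat_demo()` are hypothetical names, and a single writer is assumed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical two-field counter guarded by a sequence counter. */
struct seq_stat {
	atomic_uint seq;		/* odd while a write is in progress */
	_Atomic uint64_t utime;
	_Atomic uint64_t stime;
};

struct stat_snapshot {
	uint64_t utime;
	uint64_t stime;
};

/* Single-writer update: make seq odd, write the fields, make seq even. */
static void seq_stat_write(struct seq_stat *s, uint64_t utime, uint64_t stime)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->utime, utime, memory_order_relaxed);
	atomic_store_explicit(&s->stime, stime, memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader: retry until one whole pass sees the same even sequence number. */
static struct stat_snapshot seq_stat_read(struct seq_stat *s)
{
	struct stat_snapshot snap;
	unsigned int begin, end;

	do {
		begin = atomic_load_explicit(&s->seq, memory_order_acquire);
		snap.utime = atomic_load_explicit(&s->utime, memory_order_relaxed);
		snap.stime = atomic_load_explicit(&s->stime, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);

	return snap;
}

/* Writes once, reads back through the retry loop, returns utime + stime. */
static uint64_t seq_stat_demo(void)
{
	struct seq_stat s;
	struct stat_snapshot snap;

	atomic_init(&s.seq, 0);
	atomic_init(&s.utime, 0);
	atomic_init(&s.stime, 0);

	seq_stat_write(&s, 7, 9);
	snap = seq_stat_read(&s);

	return snap.utime + snap.stime;
}
```

The value copied out of such a loop is a consistent snapshot, which is why it makes sense to reuse that local copy rather than read the shared counter a second time.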


>
>> Yes, but it may not be obvious.
>> Another reason is that when we complete an update, we snapshot last_bstat as
>> the current bstat, which is better for readers to understand. Arithmetics is
>> somewhat obscure.
>
> The readability here is subjective. It'd be interesting to have some
> data comparing arithmetics vs copying though.

Thanks for your suggestion, I plan to use RDTSC to compare the time
consumption of arithmetics vs copying. Do you have better suggestions or
tools?

Thanks,
Hao

2023-06-12 03:38:03

by Hao Jia

Subject: Re: [External] Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic



On 2023/5/24 Michal Koutný wrote:
> On Wed, May 24, 2023 at 02:54:10PM +0800, Hao Jia <[email protected]> wrote:
>> Yes, so we need @cur to record the bstat value after the sequence fetch is
>> completed.
>
> No, I still don't see a problem that it solves. If you find incorrect
> data being reported, please explain it more/with an example.
>
>> Yes, but it may not be obvious.
>> Another reason is that when we complete an update, we snapshot last_bstat as
>> the current bstat, which is better for readers to understand. Arithmetics is
>> somewhat obscure.
>
> The readability here is subjective. It'd be interesting to have some
> data comparing arithmetics vs copying though.
>

Sorry for the late reply. I used RDTSC on my machine (an Intel
Xeon(R) Platinum 8260 [email protected] machine with 2 NUMA nodes, each of
which has 24 cores with SMT2 enabled, so 96 CPUs in total) to compare
the time consumption of arithmetics vs copying. There is almost no
difference in the time consumption between arithmetics and copying.
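
The measurement code itself is not part of this thread. As a rough idea of how such a comparison could be set up in user space, here is a sketch under loud assumptions: it uses the standard `clock()` rather than RDTSC, `struct base_stat` is a three-counter stand-in for `struct cgroup_base_stat`, and `time_flushes()` and `variants_agree()` are hypothetical helpers. It makes no claim about the numbers it would produce.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical three-counter stand-in for struct cgroup_base_stat. */
struct base_stat {
	uint64_t utime, stime, sum_exec_runtime;
};

static void base_stat_add(struct base_stat *dst, const struct base_stat *src)
{
	dst->utime += src->utime;
	dst->stime += src->stime;
	dst->sum_exec_runtime += src->sum_exec_runtime;
}

static void base_stat_sub(struct base_stat *dst, const struct base_stat *src)
{
	dst->utime -= src->utime;
	dst->stime -= src->stime;
	dst->sum_exec_runtime -= src->sum_exec_runtime;
}

/*
 * Runs `iters` simulated flushes and returns the elapsed CPU time in
 * seconds. use_assign != 0 selects "last = cur"; 0 selects "last += delta".
 * The final values are stored through the out pointers so the results
 * are observable and the two variants can be compared.
 */
static double time_flushes(int use_assign, long iters,
			   struct base_stat *out_last,
			   struct base_stat *out_global)
{
	struct base_stat last = {0, 0, 0}, global = {0, 0, 0};
	volatile uint64_t source = 0;	/* volatile: real load and store each pass */
	clock_t start = clock();

	for (long i = 0; i < iters; i++) {
		source = source + 3;
		uint64_t v = source;
		struct base_stat cur = { v, v + 1, v + 2 };
		struct base_stat delta = cur;

		base_stat_sub(&delta, &last);
		base_stat_add(&global, &delta);
		if (use_assign)
			last = cur;			/* copying */
		else
			base_stat_add(&last, &delta);	/* arithmetic */
	}

	clock_t end = clock();

	*out_last = last;
	*out_global = global;
	return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Returns 1 if both variants end with identical last and global values. */
static int variants_agree(long iters)
{
	struct base_stat la, ga, lb, gb;

	(void)time_flushes(0, iters, &la, &ga);
	(void)time_flushes(1, iters, &lb, &gb);

	return la.utime == lb.utime && la.stime == lb.stime &&
	       la.sum_exec_runtime == lb.sum_exec_runtime &&
	       ga.utime == gb.utime && ga.stime == gb.stime &&
	       ga.sum_exec_runtime == gb.sum_exec_runtime;
}
```

Calling `time_flushes()` once per variant with a large iteration count and comparing the returned times gives a first impression; a cycle counter such as RDTSC would offer finer resolution on x86.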



> HTH,
> Michal

2023-06-13 12:04:39

by Michal Koutný

Subject: Re: [External] Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic

On Mon, Jun 12, 2023 at 11:13:41AM +0800, Hao Jia <[email protected]> wrote:
> Sorry for replying you so late. I am using RDTSC on my machine (an Intel
> Xeon(R) Platinum 8260 [email protected] machine with 2 NUMA nodes each of which
> has 24 cores with SMT2 enabled, so 96 CPUs in total.) to compare the time
> consumption of arithmetics vs copying. There is almost no difference in the
> time consumption between arithmetics and copying.

Thanks for carrying out the measurement and sharing the result, even
though it does not make a convincing case for the change.

Michal