From: jiebin sun <[email protected]>
Skip the redundant update of stats_flush_threshold. Once the
global variable stats_flush_threshold has exceeded the value that
triggers __mem_cgroup_flush_stats(), further increments are unnecessary.
Tested with the patch applied on pts/hackbench-1.0.0 Count:4 (160 threads):
Score gain: 1.95x
Share of CPU cycles in __mod_memcg_lruvec_state: 44.88% -> 0.12%
CPU: ICX 8380 x 2 sockets
Core count: 40 x 2 physical cores
Benchmark: pts/hackbench-1.0.0 Count:4 (160 threads)
Signed-off-by: Jiebin Sun <[email protected]>
---
mm/memcontrol.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index abec50f31fe6..9e8c6f24c694 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -626,7 +626,14 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 
 	x = __this_cpu_add_return(stats_updates, abs(val));
 	if (x > MEMCG_CHARGE_BATCH) {
-		atomic_add(x / MEMCG_CHARGE_BATCH, &stats_flush_threshold);
+		/*
+		 * If stats_flush_threshold exceeds the threshold
+		 * (>num_online_cpus()), cgroup stats update will be triggered
+		 * in __mem_cgroup_flush_stats(). Increasing this var further
+		 * is redundant and simply adds overhead in atomic update.
+		 */
+		if (atomic_read(&stats_flush_threshold) <= num_online_cpus())
+			atomic_add(x / MEMCG_CHARGE_BATCH, &stats_flush_threshold);
 		__this_cpu_write(stats_updates, 0);
 	}
 }
--
2.31.1
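The effect of the guard on the update path can be illustrated with a small single-threaded userspace model. This is a sketch, not kernel code: it uses C11 atomics, and every model_* name as well as the MODEL_BATCH and MODEL_NR_CPUS values are hypothetical stand-ins for MEMCG_CHARGE_BATCH, num_online_cpus() and the per-CPU stats_updates counter. It does not measure cache-line contention; it only counts how many atomic adds reach the shared counter between two flushes, with and without the check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for MEMCG_CHARGE_BATCH and num_online_cpus();
 * the values are illustrative, not taken from any particular kernel. */
#define MODEL_BATCH   32u
#define MODEL_NR_CPUS 4

static atomic_int model_flush_threshold;                /* models stats_flush_threshold */
static unsigned int model_stats_updates[MODEL_NR_CPUS]; /* models per-CPU stats_updates */
static unsigned long model_rmw_count;                   /* atomic adds actually issued */

/* Models memcg_rstat_updated(); `guarded` selects the patched behavior. */
static void model_rstat_updated(int cpu, int val, bool guarded)
{
	unsigned int x;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_stats_updates[cpu] += (unsigned int)abs(val);
	x = model_stats_updates[cpu];
	if (x > MODEL_BATCH) {
		/* The check is a plain load, which can be served from a shared
		 * copy of the cache line; the add is a read-modify-write that
		 * needs the line in exclusive state. */
		if (!guarded ||
		    atomic_load(&model_flush_threshold) <= MODEL_NR_CPUS) {
			atomic_fetch_add(&model_flush_threshold,
					 (int)(x / MODEL_BATCH));
			model_rmw_count++;
		}
		model_stats_updates[cpu] = 0;
	}
}

/* Feeds `n` single-unit updates round-robin across the modeled CPUs with
 * no flush in between, and reports the atomic adds issued and the final
 * counter value. */
static void model_run(int n, bool guarded, unsigned long *rmws, int *threshold)
{
	atomic_store(&model_flush_threshold, 0);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_stats_updates[cpu] = 0;
	model_rmw_count = 0;

	for (int i = 0; i < n; i++)
		model_rstat_updated(i % MODEL_NR_CPUS, 1, guarded);

	*rmws = model_rmw_count;
	*threshold = atomic_load(&model_flush_threshold);
}
```

With 10000 updates on the 4 modeled CPUs, the unguarded variant issues 300 atomic adds, while the guarded variant stops after 5, as soon as the counter exceeds MODEL_NR_CPUS. On real hardware each skipped add is one less exclusive acquisition of a cache line shared by all CPUs.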
On Sat, Jul 23, 2022 at 12:49:49AM +0800, Jiebin Sun wrote:
Yes, this makes sense. No need to dirty a cacheline if we are already
over the threshold.
Acked-by: Shakeel Butt <[email protected]>
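Skipping the add once the counter is over the threshold does not change when a flush fires, because the counter only grows between flushes. The sketch below models both sides to show this. The flush condition (counter greater than the number of online CPUs) follows the patch comment; resetting the counter to zero after a flush is an assumption of this sketch, and all model_* names and the MODEL_NR_CPUS value are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for num_online_cpus(); illustrative value only. */
#define MODEL_NR_CPUS 4

static atomic_int model_threshold;	/* models stats_flush_threshold */

/* Update side: account `batches` units; when `guarded` is set, skip the
 * atomic add once the counter is already above the trigger value. */
static void model_account(int batches, bool guarded)
{
	assert(batches >= 0);
	if (!guarded || atomic_load(&model_threshold) <= MODEL_NR_CPUS)
		atomic_fetch_add(&model_threshold, batches);
}

/* Flush side: fire when the counter exceeds the number of CPUs, then
 * reset it (the reset to zero is an assumption of this sketch). */
static bool model_maybe_flush(void)
{
	if (atomic_load(&model_threshold) > MODEL_NR_CPUS) {
		atomic_store(&model_threshold, 0);
		return true;
	}
	return false;
}

/* Accounts `n` single-unit batches starting from zero, then asks whether
 * a flush would fire. */
static bool model_flush_fires_after(int n, bool guarded)
{
	atomic_store(&model_threshold, 0);
	for (int i = 0; i < n; i++)
		model_account(1, guarded);
	return model_maybe_flush();
}
```

For any number of accounted batches, the guarded and unguarded variants reach the same flush decision; only the final counter value differs. With concurrent updaters the load and the add are separate operations, so several CPUs may pass the check at once and overshoot. That only makes the counter larger, so the flush decision is still unaffected.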
On Sat, Jul 23, 2022 at 12:49:49AM +0800, Jiebin Sun wrote:
Reviewed-by: Roman Gushchin <[email protected]>
Good optimization, thanks!
>
>On Sat, Jul 23, 2022 at 12:49:49AM +0800, Jiebin Sun wrote:
>
>Reviewed-by: Roman Gushchin <[email protected]>
>
>Good optimization, thanks!
Looks good. Nice performance improvement.
Reviewed-by: Tim Chen <[email protected]>
On Sat, Jul 23, 2022 at 12:49:49AM +0800, Jiebin Sun wrote:
Acked-by: Muchun Song <[email protected]>
Thanks.