LinuxLists.cc - [PATCH v2 0/4] sock: Improve condition on sockmem pressure

2023-05-22 07:12:17

Subject: [PATCH v2 0/4] sock: Improve condition on sockmem pressure

Currently the memcg's status is also accounted into the socket's
memory pressure to alleviate the memcg's memstall. But there are
still cases that can be improved. Please check the patches for
detailed info.

v2:
- Splited into several patches and modified commit log for
better readability.
- Make memcg's pressure consideration function-wide in
__sk_mem_raise_allocated().

v1: https://lore.kernel.org/netdev/[email protected]/

Abel Wu (4):
sock: Always take memcg pressure into consideration
sock: Fix misuse of sk_under_memory_pressure()
sock: Consider memcg pressure when raising sockmem
sock: Remove redundant cond of memcg pressure

include/net/sock.h | 11 +++++++----
net/core/sock.c | 32 +++++++++++++++++++++++---------
2 files changed, 30 insertions(+), 13 deletions(-)

--
2.37.3

2023-05-22 07:15:26

by Abel Wu

[permalink] [raw]

Subject: [PATCH v2 4/4] sock: Remove redundant cond of memcg pressure

Now with the preivous patch, __sk_mem_raise_allocated() considers
the memory pressure of both global and the socket's memcg on a func-
wide level, making the condition of memcg's pressure in question
redundant.

Signed-off-by: Abel Wu <[email protected]>
---
net/core/sock.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 7641d64293af..baccbb58a11a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3029,9 +3029,14 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
if (sk_has_memory_pressure(sk)) {
u64 alloc;

- if (!sk_under_memory_pressure(sk))
+ if (!sk_under_global_memory_pressure(sk))
return 1;
alloc = sk_sockets_allocated_read_positive(sk);
+ /*
+ * If under global pressure, allow the sockets that are below
+ * average memory usage to raise, trying to be fair among all
+ * the sockets under global constrains.
+ */
if (sk_prot_mem_limits(sk, 2) > alloc *
sk_mem_pages(sk->sk_wmem_queued +
atomic_read(&sk->sk_rmem_alloc) +
--
2.37.3

2023-05-22 07:15:56

by Abel Wu

[permalink] [raw]

Subject: [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem

For now __sk_mem_raise_allocated() mainly considers global socket
memory pressure and allows to raise if no global pressure observed,
including the sockets whose memcgs are in pressure, which might
result in longer memcg memstall.

So take net-memcg's pressure into consideration when allocating
socket memory to alleviate long tail latencies.

Signed-off-by: Abel Wu <[email protected]>
---
net/core/sock.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 801df091e37a..7641d64293af 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2977,21 +2977,30 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
{
bool memcg_charge = mem_cgroup_sockets_enabled && sk->sk_memcg;
struct proto *prot = sk->sk_prot;
- bool charged = true;
+ bool charged = true, pressured = false;
long allocated;

sk_memory_allocated_add(sk, amt);
allocated = sk_memory_allocated(sk);
- if (memcg_charge &&
- !(charged = mem_cgroup_charge_skmem(sk->sk_memcg, amt,
- gfp_memcg_charge())))
- goto suppress_allocation;
+
+ if (memcg_charge) {
+ charged = mem_cgroup_charge_skmem(sk->sk_memcg, amt,
+ gfp_memcg_charge());
+ if (!charged)
+ goto suppress_allocation;
+ if (mem_cgroup_under_socket_pressure(sk->sk_memcg))
+ pressured = true;
+ }

/* Under limit. */
- if (allocated <= sk_prot_mem_limits(sk, 0)) {
+ if (allocated <= sk_prot_mem_limits(sk, 0))
sk_leave_memory_pressure(sk);
+ else
+ pressured = true;
+
+ /* No pressure observed in global/memcg. */
+ if (!pressured)
return 1;
- }

/* Under pressure. */
if (allocated > sk_prot_mem_limits(sk, 1))
--
2.37.3

2023-05-22 07:39:41

by Abel Wu

[permalink] [raw]

Subject: [PATCH v2 1/4] sock: Always take memcg pressure into consideration

The sk_under_memory_pressure() is called to check whether there is
memory pressure related to this socket. But now it ignores the net-
memcg's pressure if the proto of the socket doesn't care about the
global pressure, which may put burden on its memcg compaction or
reclaim path (also remember that socket memory is un-reclaimable).

So always check the memcg's vm status to alleviate memstalls when
it's in pressure.

Signed-off-by: Abel Wu <[email protected]>
---
include/net/sock.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 8b7ed7167243..c73d9bad7ac7 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1411,14 +1411,12 @@ static inline bool sk_has_memory_pressure(const struct sock *sk)

static inline bool sk_under_memory_pressure(const struct sock *sk)
{
- if (!sk->sk_prot->memory_pressure)
- return false;
-
if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
mem_cgroup_under_socket_pressure(sk->sk_memcg))
return true;

- return !!*sk->sk_prot->memory_pressure;
+ return sk->sk_prot->memory_pressure &&
+ *sk->sk_prot->memory_pressure;
}

static inline long
--
2.37.3

2023-05-22 07:46:38

by Abel Wu

[permalink] [raw]

Subject: [PATCH v2 2/4] sock: Fix misuse of sk_under_memory_pressure()

The status of global socket memory pressure is updated when:

a) __sk_mem_raise_allocated():

enter: sk_memory_allocated(sk) > sysctl_mem[1]
leave: sk_memory_allocated(sk) <= sysctl_mem[0]

b) __sk_mem_reduce_allocated():

leave: sk_under_memory_pressure(sk) &&
sk_memory_allocated(sk) < sysctl_mem[0]

So the conditions of leaving global pressure are inconstant, which
may lead to the situation that one pressured net-memcg prevents the
global pressure from being cleared when there is indeed no global
pressure, thus the global constrains are still in effect unexpectedly
on the other sockets.

This patch fixes this by ignoring the net-memcg's pressure when
deciding whether should leave global memory pressure.

Fixes: e1aab161e013 ("socket: initial cgroup code")
Signed-off-by: Abel Wu <[email protected]>
---
include/net/sock.h | 9 +++++++--
net/core/sock.c | 2 +-
2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index c73d9bad7ac7..bf930d4db7f0 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1409,14 +1409,19 @@ static inline bool sk_has_memory_pressure(const struct sock *sk)
return sk->sk_prot->memory_pressure != NULL;
}

+static inline bool sk_under_global_memory_pressure(const struct sock *sk)
+{
+ return sk->sk_prot->memory_pressure &&
+ *sk->sk_prot->memory_pressure;
+}
+
static inline bool sk_under_memory_pressure(const struct sock *sk)
{
if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
mem_cgroup_under_socket_pressure(sk->sk_memcg))
return true;

- return sk->sk_prot->memory_pressure &&
- *sk->sk_prot->memory_pressure;
+ return sk_under_global_memory_pressure(sk);
}

static inline long
diff --git a/net/core/sock.c b/net/core/sock.c
index 5440e67bcfe3..801df091e37a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3095,7 +3095,7 @@ void __sk_mem_reduce_allocated(struct sock *sk, int amount)
if (mem_cgroup_sockets_enabled && sk->sk_memcg)
mem_cgroup_uncharge_skmem(sk->sk_memcg, amount);

- if (sk_under_memory_pressure(sk) &&
+ if (sk_under_global_memory_pressure(sk) &&
(sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)))
sk_leave_memory_pressure(sk);
}
--
2.37.3

2023-05-22 13:06:05

by Simon Horman

[permalink] [raw]

Subject: Re: [PATCH v2 4/4] sock: Remove redundant cond of memcg pressure

On Mon, May 22, 2023 at 03:01:22PM +0800, Abel Wu wrote:
> Now with the preivous patch, __sk_mem_raise_allocated() considers

nit: s/preivous/previous/

> the memory pressure of both global and the socket's memcg on a func-
> wide level, making the condition of memcg's pressure in question
> redundant.
>
> Signed-off-by: Abel Wu <[email protected]>
> ---
> net/core/sock.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 7641d64293af..baccbb58a11a 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -3029,9 +3029,14 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
> if (sk_has_memory_pressure(sk)) {
> u64 alloc;
>
> - if (!sk_under_memory_pressure(sk))
> + if (!sk_under_global_memory_pressure(sk))
> return 1;
> alloc = sk_sockets_allocated_read_positive(sk);
> + /*
> + * If under global pressure, allow the sockets that are below
> + * average memory usage to raise, trying to be fair among all
> + * the sockets under global constrains.
> + */

nit:
/* Multi-line comments in networking code
* look like this.
*/

> if (sk_prot_mem_limits(sk, 2) > alloc *
> sk_mem_pages(sk->sk_wmem_queued +
> atomic_read(&sk->sk_rmem_alloc) +
> --
> 2.37.3
>
>

2023-05-22 13:27:14

by Simon Horman

[permalink] [raw]

Subject: Re: [PATCH v2 2/4] sock: Fix misuse of sk_under_memory_pressure()

On Mon, May 22, 2023 at 03:01:20PM +0800, Abel Wu wrote:
> The status of global socket memory pressure is updated when:
>
> a) __sk_mem_raise_allocated():
>
> enter: sk_memory_allocated(sk) > sysctl_mem[1]
> leave: sk_memory_allocated(sk) <= sysctl_mem[0]
>
> b) __sk_mem_reduce_allocated():
>
> leave: sk_under_memory_pressure(sk) &&
> sk_memory_allocated(sk) < sysctl_mem[0]
>
> So the conditions of leaving global pressure are inconstant, which
> may lead to the situation that one pressured net-memcg prevents the
> global pressure from being cleared when there is indeed no global
> pressure, thus the global constrains are still in effect unexpectedly
> on the other sockets.
>
> This patch fixes this by ignoring the net-memcg's pressure when
> deciding whether should leave global memory pressure.
>
> Fixes: e1aab161e013 ("socket: initial cgroup code")

really pedantic nit:

Fixes: e1aab161e013 ("socket: initial cgroup code.")

> Signed-off-by: Abel Wu <[email protected]>

...

2023-05-22 13:29:28

by Simon Horman

[permalink] [raw]

Subject: Re: [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem

2023-05-23 03:30:43

by Abel Wu

[permalink] [raw]

Subject: Re: Re: [PATCH v2 4/4] sock: Remove redundant cond of memcg pressure

Hi Simon, thanks for reviewing! I will fix the coding style issues
next version!

Thanks,
Abel

On 5/22/23 8:54 PM, Simon Horman wrote:
> On Mon, May 22, 2023 at 03:01:22PM +0800, Abel Wu wrote:
>> Now with the preivous patch, __sk_mem_raise_allocated() considers
>
> nit: s/preivous/previous/
>
>> the memory pressure of both global and the socket's memcg on a func-
>> wide level, making the condition of memcg's pressure in question
>> redundant.
>>
>> Signed-off-by: Abel Wu <[email protected]>
>> ---
>> net/core/sock.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 7641d64293af..baccbb58a11a 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -3029,9 +3029,14 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>> if (sk_has_memory_pressure(sk)) {
>> u64 alloc;
>>
>> - if (!sk_under_memory_pressure(sk))
>> + if (!sk_under_global_memory_pressure(sk))
>> return 1;
>> alloc = sk_sockets_allocated_read_positive(sk);
>> + /*
>> + * If under global pressure, allow the sockets that are below
>> + * average memory usage to raise, trying to be fair among all
>> + * the sockets under global constrains.
>> + */
>
> nit:
> /* Multi-line comments in networking code
> * look like this.
> */
>
>> if (sk_prot_mem_limits(sk, 2) > alloc *
>> sk_mem_pages(sk->sk_wmem_queued +
>> atomic_read(&sk->sk_rmem_alloc) +
>> --
>> 2.37.3
>>
>>

2023-05-25 01:41:40

by Shakeel Butt

[permalink] [raw]

Subject: Re: [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem

On Mon, May 22, 2023 at 03:01:21PM +0800, Abel Wu wrote:
> For now __sk_mem_raise_allocated() mainly considers global socket
> memory pressure and allows to raise if no global pressure observed,
> including the sockets whose memcgs are in pressure, which might
> result in longer memcg memstall.
>
> So take net-memcg's pressure into consideration when allocating
> socket memory to alleviate long tail latencies.
>
> Signed-off-by: Abel Wu <[email protected]>

Hi Abel,

Have you seen any real world production issue which is fixed by this
patch or is it more of a fix after reading code?

This code is quite subtle and small changes can cause unintended
behavior changes. At the moment the tcp memory accounting and memcg
accounting is intermingled and I think we should decouple them.

> ---
> net/core/sock.c | 23 ++++++++++++++++-------
> 1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 801df091e37a..7641d64293af 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2977,21 +2977,30 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
> {
> bool memcg_charge = mem_cgroup_sockets_enabled && sk->sk_memcg;
> struct proto *prot = sk->sk_prot;
> - bool charged = true;
> + bool charged = true, pressured = false;
> long allocated;
>
> sk_memory_allocated_add(sk, amt);
> allocated = sk_memory_allocated(sk);
> - if (memcg_charge &&
> - !(charged = mem_cgroup_charge_skmem(sk->sk_memcg, amt,
> - gfp_memcg_charge())))
> - goto suppress_allocation;
> +
> + if (memcg_charge) {
> + charged = mem_cgroup_charge_skmem(sk->sk_memcg, amt,
> + gfp_memcg_charge());
> + if (!charged)
> + goto suppress_allocation;
> + if (mem_cgroup_under_socket_pressure(sk->sk_memcg))

The memcg under pressure callback does a upward memcg tree walk, do
please make sure you have tested the performance impact of this.

> + pressured = true;
> + }
>
> /* Under limit. */
> - if (allocated <= sk_prot_mem_limits(sk, 0)) {
> + if (allocated <= sk_prot_mem_limits(sk, 0))
> sk_leave_memory_pressure(sk);
> + else
> + pressured = true;
> +
> + /* No pressure observed in global/memcg. */
> + if (!pressured)
> return 1;
> - }
>
> /* Under pressure. */
> if (allocated > sk_prot_mem_limits(sk, 1))
> --
> 2.37.3
>

2023-05-29 12:05:03

by Abel Wu

[permalink] [raw]

Subject: Re: [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem

Hi Shakeel, thanks for reviewing! And sorry for replying so late,
I was on a vocation :)

On 5/25/23 9:22 AM, Shakeel Butt wrote:
> On Mon, May 22, 2023 at 03:01:21PM +0800, Abel Wu wrote:
>> For now __sk_mem_raise_allocated() mainly considers global socket
>> memory pressure and allows to raise if no global pressure observed,
>> including the sockets whose memcgs are in pressure, which might
>> result in longer memcg memstall.
>>
>> So take net-memcg's pressure into consideration when allocating
>> socket memory to alleviate long tail latencies.
>>
>> Signed-off-by: Abel Wu <[email protected]>
>
> Hi Abel,
>
> Have you seen any real world production issue which is fixed by this
> patch or is it more of a fix after reading code?

The latter. But we do observe one common case in the production env
that p2p service, which mainly downloads container images, running
inside a container with tight memory limit can easily be throttled and
keep memstalled for a long period of time and sometimes even be OOM-
killed. This service shows burst usage of TCP memory and I think it
indeed needs suppressing sockmem allocation if memcg is already under
pressure. The memcg pressure is usually caused by too many page caches
and the dirty ones starting to be wrote back to slow backends. So it
is insane to continuously receive net data to consume more memory.

>
> This code is quite subtle and small changes can cause unintended
> behavior changes. At the moment the tcp memory accounting and memcg
> accounting is intermingled and I think we should decouple them.

My original intention to post this patchset is to clarify that:

- proto pressure only considers sysctl_mem[] (patch 2)
- memcg pressure only indicates the pressure inside itself
- consider both whenever needs allocation or reclaim (patch 1,3)

In this way, the two kinds of pressure maintain purer semantics, and
socket core can react on both of them properly and consistently.

>
>> ---
>> net/core/sock.c | 23 ++++++++++++++++-------
>> 1 file changed, 16 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 801df091e37a..7641d64293af 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -2977,21 +2977,30 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>> {
>> bool memcg_charge = mem_cgroup_sockets_enabled && sk->sk_memcg;
>> struct proto *prot = sk->sk_prot;
>> - bool charged = true;
>> + bool charged = true, pressured = false;
>> long allocated;
>>
>> sk_memory_allocated_add(sk, amt);
>> allocated = sk_memory_allocated(sk);
>> - if (memcg_charge &&
>> - !(charged = mem_cgroup_charge_skmem(sk->sk_memcg, amt,
>> - gfp_memcg_charge())))
>> - goto suppress_allocation;
>> +
>> + if (memcg_charge) {
>> + charged = mem_cgroup_charge_skmem(sk->sk_memcg, amt,
>> + gfp_memcg_charge());
>> + if (!charged)
>> + goto suppress_allocation;
>> + if (mem_cgroup_under_socket_pressure(sk->sk_memcg))
>
> The memcg under pressure callback does a upward memcg tree walk, do
> please make sure you have tested the performance impact of this.

Yes, I have tested several benchmarks on a dual socket machine modeled
Intel Xeon(R) Platinum 8260 with SNC disabled, that is 2 NUMA nodes each
of which has 24C/48T. All the benchmarks are done inside a separate
cgroup in a clean host. Below shows the result of tbench4 and netperf:

tbench4 Throughput (misleading but traditional)
baseline patchset
Hmean 1 377.62 ( 0.00%) 375.06 * -0.68%*
Hmean 2 753.99 ( 0.00%) 753.21 * -0.10%*
Hmean 4 1503.50 ( 0.00%) 1493.07 * -0.69%*
Hmean 8 2941.43 ( 0.00%) 2925.18 * -0.55%*
Hmean 16 5637.59 ( 0.00%) 5603.64 * -0.60%*
Hmean 32 9042.90 ( 0.00%) 9022.53 * -0.23%*
Hmean 64 10530.55 ( 0.00%) 10554.89 * 0.23%*
Hmean 128 24230.20 ( 0.00%) 24424.74 * 0.80%*
Hmean 256 23798.21 ( 0.00%) 23941.24 * 0.60%*
Hmean 384 23620.63 ( 0.00%) 23569.54 * -0.22%*

netperf-udp
baseline patchset
Hmean send-64 281.99 ( 0.00%) 274.50 * -2.65%*
Hmean send-128 556.70 ( 0.00%) 545.82 * -1.96%*
Hmean send-256 1102.60 ( 0.00%) 1091.21 * -1.03%*
Hmean send-1024 4180.48 ( 0.00%) 4073.87 * -2.55%*
Hmean send-2048 7837.61 ( 0.00%) 7707.12 * -1.66%*
Hmean send-3312 12157.49 ( 0.00%) 11845.03 * -2.57%*
Hmean send-4096 14512.64 ( 0.00%) 14156.45 * -2.45%*
Hmean send-8192 24015.40 ( 0.00%) 23920.94 ( -0.39%)
Hmean send-16384 39875.21 ( 0.00%) 39696.67 ( -0.45%)
Hmean recv-64 281.99 ( 0.00%) 274.50 * -2.65%*
Hmean recv-128 556.70 ( 0.00%) 545.82 * -1.96%*
Hmean recv-256 1102.60 ( 0.00%) 1091.21 * -1.03%*
Hmean recv-1024 4180.48 ( 0.00%) 4073.76 * -2.55%*
Hmean recv-2048 7837.61 ( 0.00%) 7707.11 * -1.67%*
Hmean recv-3312 12157.49 ( 0.00%) 11845.03 * -2.57%*
Hmean recv-4096 14512.62 ( 0.00%) 14156.45 * -2.45%*
Hmean recv-8192 24015.29 ( 0.00%) 23920.88 ( -0.39%)
Hmean recv-16384 39873.93 ( 0.00%) 39696.02 ( -0.45%)

netperf-tcp
baseline patchset
Hmean 64 1777.05 ( 0.00%) 1793.04 ( 0.90%)
Hmean 128 3364.25 ( 0.00%) 3451.05 * 2.58%*
Hmean 256 6309.21 ( 0.00%) 6506.84 * 3.13%*
Hmean 1024 19571.52 ( 0.00%) 19606.65 ( 0.18%)
Hmean 2048 26467.00 ( 0.00%) 26658.12 ( 0.72%)
Hmean 3312 31312.36 ( 0.00%) 31403.54 ( 0.29%)
Hmean 4096 33263.37 ( 0.00%) 33278.77 ( 0.05%)
Hmean 8192 39961.82 ( 0.00%) 40149.77 ( 0.47%)
Hmean 16384 46065.33 ( 0.00%) 46683.67 ( 1.34%)

Except slight regression in netperf-udp, no obvious performance win
or loss. But as you reminded me of the cost of hierarchical behavior,
I re-tested the cases in a 5-level depth cgroup (originally 2-level),
and the results are:

tbench4 Throughput (misleading but traditional)
baseline patchset
Hmean 1 361.93 ( 0.00%) 367.58 * 1.56%*
Hmean 2 734.39 ( 0.00%) 730.33 * -0.55%*
Hmean 4 1426.82 ( 0.00%) 1440.81 * 0.98%*
Hmean 8 2848.86 ( 0.00%) 2860.87 * 0.42%*
Hmean 16 5436.72 ( 0.00%) 5491.72 * 1.01%*
Hmean 32 8743.34 ( 0.00%) 8913.27 * 1.94%*
Hmean 64 10345.41 ( 0.00%) 10436.92 * 0.88%*
Hmean 128 23390.36 ( 0.00%) 23353.09 * -0.16%*
Hmean 256 23823.20 ( 0.00%) 23509.79 * -1.32%*
Hmean 384 23268.09 ( 0.00%) 23178.10 * -0.39%*

netperf-udp
baseline patchset
Hmean send-64 278.31 ( 0.00%) 275.68 * -0.94%*
Hmean send-128 554.52 ( 0.00%) 547.46 ( -1.27%)
Hmean send-256 1106.64 ( 0.00%) 1103.01 ( -0.33%)
Hmean send-1024 4135.84 ( 0.00%) 4057.47 * -1.89%*
Hmean send-2048 7816.13 ( 0.00%) 7732.71 * -1.07%*
Hmean send-3312 12068.32 ( 0.00%) 11895.94 * -1.43%*
Hmean send-4096 14358.02 ( 0.00%) 14304.06 ( -0.38%)
Hmean send-8192 24041.57 ( 0.00%) 24061.70 ( 0.08%)
Hmean send-16384 39996.09 ( 0.00%) 39936.08 ( -0.15%)
Hmean recv-64 278.31 ( 0.00%) 275.68 * -0.94%*
Hmean recv-128 554.52 ( 0.00%) 547.46 ( -1.27%)
Hmean recv-256 1106.64 ( 0.00%) 1103.01 ( -0.33%)
Hmean recv-1024 4135.84 ( 0.00%) 4057.47 * -1.89%*
Hmean recv-2048 7816.13 ( 0.00%) 7732.71 * -1.07%*
Hmean recv-3312 12068.32 ( 0.00%) 11895.94 * -1.43%*
Hmean recv-4096 14357.99 ( 0.00%) 14304.04 ( -0.38%)
Hmean recv-8192 24041.43 ( 0.00%) 24061.58 ( 0.08%)
Hmean recv-16384 39995.72 ( 0.00%) 39935.68 ( -0.15%)

netperf-tcp
baseline patchset
Hmean 64 1779.93 ( 0.00%) 1784.75 ( 0.27%)
Hmean 128 3380.32 ( 0.00%) 3424.14 ( 1.30%)
Hmean 256 6383.37 ( 0.00%) 6504.97 * 1.90%*
Hmean 1024 19345.07 ( 0.00%) 19604.06 * 1.34%*
Hmean 2048 26547.60 ( 0.00%) 26743.94 * 0.74%*
Hmean 3312 30948.40 ( 0.00%) 31419.11 * 1.52%*
Hmean 4096 32888.83 ( 0.00%) 33125.01 * 0.72%*
Hmean 8192 40020.38 ( 0.00%) 39949.53 ( -0.18%)
Hmean 16384 46084.48 ( 0.00%) 46300.43 ( 0.47%)

Still no obvious difference, and even the udp regression is reduced.

Thanks & Best,
Abel

2023-05-29 21:47:12

by Shakeel Butt

[permalink] [raw]

Subject: Re: [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem

+linux-mm and cgroups

On Mon, May 29, 2023 at 07:58:45PM +0800, Abel Wu wrote:
> Hi Shakeel, thanks for reviewing! And sorry for replying so late,
> I was on a vocation :)
>
> On 5/25/23 9:22 AM, Shakeel Butt wrote:
> > On Mon, May 22, 2023 at 03:01:21PM +0800, Abel Wu wrote:
> > > For now __sk_mem_raise_allocated() mainly considers global socket
> > > memory pressure and allows to raise if no global pressure observed,
> > > including the sockets whose memcgs are in pressure, which might
> > > result in longer memcg memstall.
> > >
> > > So take net-memcg's pressure into consideration when allocating
> > > socket memory to alleviate long tail latencies.
> > >
> > > Signed-off-by: Abel Wu <[email protected]>
> >
> > Hi Abel,
> >
> > Have you seen any real world production issue which is fixed by this
> > patch or is it more of a fix after reading code?
>
> The latter. But we do observe one common case in the production env
> that p2p service, which mainly downloads container images, running
> inside a container with tight memory limit can easily be throttled and
> keep memstalled for a long period of time and sometimes even be OOM-
> killed. This service shows burst usage of TCP memory and I think it
> indeed needs suppressing sockmem allocation if memcg is already under
> pressure. The memcg pressure is usually caused by too many page caches
> and the dirty ones starting to be wrote back to slow backends. So it
> is insane to continuously receive net data to consume more memory.
>

We actually made an intentional decision to not throttle the incoming
traffic under memory pressure. See 720ca52bcef22 ("net-memcg: avoid
stalls when under memory pressure"). If you think the throttling
behavior is preferred for your application, please propose the patch
separately and we can work on how to enable flexible policy here.

> >
> > This code is quite subtle and small changes can cause unintended
> > behavior changes. At the moment the tcp memory accounting and memcg
> > accounting is intermingled and I think we should decouple them.
>
> My original intention to post this patchset is to clarify that:
>
> - proto pressure only considers sysctl_mem[] (patch 2)
> - memcg pressure only indicates the pressure inside itself
> - consider both whenever needs allocation or reclaim (patch 1,3)
>
> In this way, the two kinds of pressure maintain purer semantics, and
> socket core can react on both of them properly and consistently.

Can you please resend you patch series (without patch 3) and Cc to
linux-mm, cgroups list and memcg maintainers as well?

2023-05-30 10:06:30

by Abel Wu

[permalink] [raw]

Subject: Re: Re: [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem

On 5/30/23 5:12 AM, Shakeel Butt wrote:
>
> +linux-mm and cgroups
>
> On Mon, May 29, 2023 at 07:58:45PM +0800, Abel Wu wrote:
>> Hi Shakeel, thanks for reviewing! And sorry for replying so late,
>> I was on a vocation :)
>>
>> On 5/25/23 9:22 AM, Shakeel Butt wrote:
>>> On Mon, May 22, 2023 at 03:01:21PM +0800, Abel Wu wrote:
>>>> For now __sk_mem_raise_allocated() mainly considers global socket
>>>> memory pressure and allows to raise if no global pressure observed,
>>>> including the sockets whose memcgs are in pressure, which might
>>>> result in longer memcg memstall.
>>>>
>>>> So take net-memcg's pressure into consideration when allocating
>>>> socket memory to alleviate long tail latencies.
>>>>
>>>> Signed-off-by: Abel Wu <[email protected]>
>>>
>>> Hi Abel,
>>>
>>> Have you seen any real world production issue which is fixed by this
>>> patch or is it more of a fix after reading code?
>>
>> The latter. But we do observe one common case in the production env
>> that p2p service, which mainly downloads container images, running
>> inside a container with tight memory limit can easily be throttled and
>> keep memstalled for a long period of time and sometimes even be OOM-
>> killed. This service shows burst usage of TCP memory and I think it
>> indeed needs suppressing sockmem allocation if memcg is already under
>> pressure. The memcg pressure is usually caused by too many page caches
>> and the dirty ones starting to be wrote back to slow backends. So it
>> is insane to continuously receive net data to consume more memory.
>>
>
> We actually made an intentional decision to not throttle the incoming
> traffic under memory pressure. See 720ca52bcef22 ("net-memcg: avoid
> stalls when under memory pressure"). If you think the throttling
> behavior is preferred for your application, please propose the patch
> separately and we can work on how to enable flexible policy here.

Ah I see. Thanks for providing the context. So suppressing the alloc
under memcg pressure could further keep senders waiting if SACKed segs
get dropped from the OFO queue.

>
>>>
>>> This code is quite subtle and small changes can cause unintended
>>> behavior changes. At the moment the tcp memory accounting and memcg
>>> accounting is intermingled and I think we should decouple them.
>>
>> My original intention to post this patchset is to clarify that:
>>
>> - proto pressure only considers sysctl_mem[] (patch 2)
>> - memcg pressure only indicates the pressure inside itself
>> - consider both whenever needs allocation or reclaim (patch 1,3)
>>
>> In this way, the two kinds of pressure maintain purer semantics, and
>> socket core can react on both of them properly and consistently.
>
> Can you please resend you patch series (without patch 3) and Cc to
> linux-mm, cgroups list and memcg maintainers as well?

Yeah, absolutely.

Thanks,
Abel