If a TCP socket is allocated in IRQ context or cloned from an unassociated
(i.e. not associated with a memcg) socket in IRQ context, it remains
unassociated for its whole life. Almost half of the TCP sockets created on
the system are created in IRQ context, so the memory used by such sockets
is not accounted to any memcg.

This issue is more widespread in cgroup v1, where network memory
accounting is opt-in, but it can also happen in cgroup v2 if the source
socket for the cloning was created in the root memcg.

To fix the issue, do the late association of the unassociated sockets at
accept() time, in process context, and then force charge the memory
buffer already reserved by the socket.

Signed-off-by: Shakeel Butt <[email protected]>
---
net/ipv4/inet_connection_sock.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index a4db79b1b643..df9c8ef024a2 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -482,6 +482,13 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
 		}
 		spin_unlock_bh(&queue->fastopenq.lock);
 	}
+
+	if (mem_cgroup_sockets_enabled && !newsk->sk_memcg) {
+		mem_cgroup_sk_alloc(newsk);
+		if (newsk->sk_memcg)
+			mem_cgroup_charge_skmem(newsk->sk_memcg,
+					sk_mem_pages(newsk->sk_forward_alloc));
+	}
 out:
 	release_sock(sk);
 	if (req)
--
2.25.0.265.gbab2e86ba0-goog
On Fri, Feb 21, 2020 at 05:04:56PM -0800, Shakeel Butt wrote:
> If a TCP socket is allocated in IRQ context or cloned from unassociated
> (i.e. not associated to a memcg) in IRQ context then it will remain
> unassociated for its whole life. Almost half of the TCPs created on the
> system are created in IRQ context, so, memory used by such sockets will
> not be accounted by the memcg.
>
> This issue is more widespread in cgroup v1 where network memory
> accounting is opt-in but it can happen in cgroup v2 if the source socket
> for the cloning was created in root memcg.
>
> To fix the issue, just do the late association of the unassociated
> sockets at accept() time in the process context and then force charge
> the memory buffer already reserved by the socket.
>
> Signed-off-by: Shakeel Butt <[email protected]>
Hello, Shakeel!
> ---
> net/ipv4/inet_connection_sock.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index a4db79b1b643..df9c8ef024a2 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -482,6 +482,13 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
> }
> spin_unlock_bh(&queue->fastopenq.lock);
> }
> +
> + if (mem_cgroup_sockets_enabled && !newsk->sk_memcg) {
> + mem_cgroup_sk_alloc(newsk);
> + if (newsk->sk_memcg)
> + mem_cgroup_charge_skmem(newsk->sk_memcg,
> + sk_mem_pages(newsk->sk_forward_alloc));
> + }
Looks good to me from the memcg side. Let's see what the networking people say...
Btw, do you plan to make a separate patch for associating the socket with the default
cgroup on the unified hierarchy? I mean cgroup_sk_alloc().
Thank you for working on it!
Roman
On Fri, Feb 21, 2020 at 5:49 PM Roman Gushchin <[email protected]> wrote:
>
> On Fri, Feb 21, 2020 at 05:04:56PM -0800, Shakeel Butt wrote:
> > If a TCP socket is allocated in IRQ context or cloned from unassociated
> > (i.e. not associated to a memcg) in IRQ context then it will remain
> > unassociated for its whole life. Almost half of the TCPs created on the
> > system are created in IRQ context, so, memory used by such sockets will
> > not be accounted by the memcg.
> >
> > This issue is more widespread in cgroup v1 where network memory
> > accounting is opt-in but it can happen in cgroup v2 if the source socket
> > for the cloning was created in root memcg.
> >
> > To fix the issue, just do the late association of the unassociated
> > sockets at accept() time in the process context and then force charge
> > the memory buffer already reserved by the socket.
> >
> > Signed-off-by: Shakeel Butt <[email protected]>
>
> Hello, Shakeel!
>
> > ---
> > net/ipv4/inet_connection_sock.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index a4db79b1b643..df9c8ef024a2 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -482,6 +482,13 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
> > }
> > spin_unlock_bh(&queue->fastopenq.lock);
> > }
> > +
> > + if (mem_cgroup_sockets_enabled && !newsk->sk_memcg) {
> > + mem_cgroup_sk_alloc(newsk);
> > + if (newsk->sk_memcg)
> > + mem_cgroup_charge_skmem(newsk->sk_memcg,
> > + sk_mem_pages(newsk->sk_forward_alloc));
> > + }
>
> Looks good for me from the memcg side. Let's see what networking people will say...
>
> Btw, do you plan to make a separate patch for associating the socket with the default
> cgroup on the unified hierarchy? I mean cgroup_sk_alloc().
>
Yes. I tried to do that here but was not able to do it without adding a
(newsk->sk_cgrp_data.val) check, which I cannot do in this file because
sk_cgrp_data might not be compiled in. I will send a separate patch.
Shakeel
On Fri, Feb 21, 2020 at 5:05 PM Shakeel Butt <[email protected]> wrote:
>
> If a TCP socket is allocated in IRQ context or cloned from unassociated
> (i.e. not associated to a memcg) in IRQ context then it will remain
> unassociated for its whole life. Almost half of the TCPs created on the
> system are created in IRQ context, so, memory used by such sockets will
> not be accounted by the memcg.
>
> This issue is more widespread in cgroup v1 where network memory
> accounting is opt-in but it can happen in cgroup v2 if the source socket
> for the cloning was created in root memcg.
>
> To fix the issue, just do the late association of the unassociated
> sockets at accept() time in the process context and then force charge
> the memory buffer already reserved by the socket.
>
> Signed-off-by: Shakeel Butt <[email protected]>
> ---
> net/ipv4/inet_connection_sock.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index a4db79b1b643..df9c8ef024a2 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -482,6 +482,13 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
> }
> spin_unlock_bh(&queue->fastopenq.lock);
> }
> +
> + if (mem_cgroup_sockets_enabled && !newsk->sk_memcg) {
> + mem_cgroup_sk_alloc(newsk);
> + if (newsk->sk_memcg)
> + mem_cgroup_charge_skmem(newsk->sk_memcg,
> + sk_mem_pages(newsk->sk_forward_alloc));
I am not sure what you are trying to do here.

sk->sk_forward_alloc is not the total amount of memory used by a TCP socket.
It is only the part that has been reserved but not yet consumed.

For example, every skb that has been stored in the TCP receive queue or
the out-of-order queue might have used memory.

I guess that if we assume that a not-yet-accepted socket cannot have
any outstanding data in its transmit queue, you need to use
sk->sk_rmem_alloc as well.
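
The difference between the two charges can be sketched in userspace. This is
only a model of the arithmetic, not kernel code: the struct, the function
names and the 4096-byte accounting quantum are assumptions made for
illustration, and the byte counts are invented.

```c
/*
 * Userspace model of the page count charged at accept() time.
 * Assumption: memory is accounted in 4096-byte quanta and the byte
 * count is rounded up to whole pages.
 */
#define MODEL_MEM_QUANTUM	4096
#define MODEL_MEM_QUANTUM_SHIFT	12

/* Hypothetical stand-in for the two socket fields discussed above. */
struct model_sock {
	int forward_alloc;	/* reserved but not yet consumed */
	int rmem_alloc;		/* already consumed by queued receive skbs */
};

/* Round a byte count up to whole pages. */
int model_mem_pages(int amt)
{
	return (amt + MODEL_MEM_QUANTUM - 1) >> MODEL_MEM_QUANTUM_SHIFT;
}

/* Charge as computed in the patch: reserved memory only. */
int pages_forward_only(const struct model_sock *sk)
{
	return model_mem_pages(sk->forward_alloc);
}

/* Charge that also counts data already in the receive queue. */
int pages_with_rmem(const struct model_sock *sk)
{
	return model_mem_pages(sk->forward_alloc + sk->rmem_alloc);
}
```

With 1024 bytes reserved and 2 MiB already queued, the first function
returns 1 page while the second returns 513, which is the gap the review
comment points at.
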
To test this patch, make sure to add a delay before accept(), so that
2MB worth of data can be queued before accept() happens.
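
One possible shape for such a test is sketched below as a userspace helper.
It is not part of the patch: the function name is hypothetical, it uses the
loopback interface, and how much data actually gets queued before accept()
depends on the socket buffer limits of the machine, so reaching 2MB may
require raising them.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: connect to a loopback listener, send up to
 * `want` bytes, wait `delay_sec` seconds, then accept() and report how
 * many bytes were already queued on the accepted socket.
 * Returns -1 on error.
 */
long queued_bytes_after_delayed_accept(size_t want, unsigned int delay_sec)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	char buf[4096];
	size_t sent = 0;
	long result = -1;
	int lfd = -1, cfd = -1, afd = -1, queued = 0;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;		/* let the kernel pick a free port */
	memset(buf, 'x', sizeof(buf));

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		goto out;
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
		goto out;

	/* The handshake completes in the kernel, before any accept(). */
	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0)
		goto out;
	if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	/* Non-blocking sends: stop as soon as the buffers are full. */
	if (fcntl(cfd, F_SETFL, O_NONBLOCK) < 0)
		goto out;
	while (sent < want) {
		size_t chunk = want - sent;
		ssize_t n;

		if (chunk > sizeof(buf))
			chunk = sizeof(buf);
		n = send(cfd, buf, chunk, 0);
		if (n <= 0)
			break;		/* buffers full, or a real error */
		sent += (size_t)n;
	}

	sleep(delay_sec);		/* the delay before accept() */

	afd = accept(lfd, NULL, NULL);
	if (afd < 0)
		goto out;
	/* FIONREAD reports the bytes waiting in the receive queue. */
	if (ioctl(afd, FIONREAD, &queued) == 0)
		result = queued;
out:
	if (afd >= 0)
		close(afd);
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return result;
}
```

Calling queued_bytes_after_delayed_accept(2 * 1024 * 1024, 5) from a process
inside the memcg under test, and comparing the memcg's socket memory
counters before and after, would show whether the queued data was charged.
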
Thanks.
On Sun, Feb 23, 2020 at 11:29 PM Eric Dumazet <[email protected]> wrote:
>
> On Fri, Feb 21, 2020 at 5:05 PM Shakeel Butt <[email protected]> wrote:
> >
> > If a TCP socket is allocated in IRQ context or cloned from unassociated
> > (i.e. not associated to a memcg) in IRQ context then it will remain
> > unassociated for its whole life. Almost half of the TCPs created on the
> > system are created in IRQ context, so, memory used by such sockets will
> > not be accounted by the memcg.
> >
> > This issue is more widespread in cgroup v1 where network memory
> > accounting is opt-in but it can happen in cgroup v2 if the source socket
> > for the cloning was created in root memcg.
> >
> > To fix the issue, just do the late association of the unassociated
> > sockets at accept() time in the process context and then force charge
> > the memory buffer already reserved by the socket.
> >
> > Signed-off-by: Shakeel Butt <[email protected]>
> > ---
> > net/ipv4/inet_connection_sock.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index a4db79b1b643..df9c8ef024a2 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -482,6 +482,13 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
> > }
> > spin_unlock_bh(&queue->fastopenq.lock);
> > }
> > +
> > + if (mem_cgroup_sockets_enabled && !newsk->sk_memcg) {
> > + mem_cgroup_sk_alloc(newsk);
> > + if (newsk->sk_memcg)
> > + mem_cgroup_charge_skmem(newsk->sk_memcg,
> > + sk_mem_pages(newsk->sk_forward_alloc));
>
> I am not sure what you are trying to do here.
>
> sk->sk_forward_alloc is not the total amount of memory used by a TCP socket.
> It is only some part that has been reserved, but not yet consumed.
>
> For example, every skb that has been stored in TCP receive queue or
> out-of-order queue might have
> used memory.
>
> I guess that if we assume that a not yet accepted socket can not have
> any outstanding data in its transmit queue,
> you need to use sk->sk_rmem_alloc as well.
Thanks a lot. I will add that with a comment. BTW, for my own knowledge,
which field represents the transmit queue size?
>
> To test this patch, make sure to add a delay before accept(), so that
> 2MB worth of data can be queued before accept() happens.
Yes, I will test this with a delay.
thanks,
Shakeel