From: Eric Dumazet
Date: Wed, 4 Mar 2020 20:37:53 -0800
Subject: Re: [PATCH v2] net: memcg: late association of sock to memcg
To: Shakeel Butt
Cc: Roman Gushchin, Johannes Weiner, Michal Hocko, Andrew Morton, "David S . Miller", Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev, linux-mm, Cgroups, LKML
References: <20200304233856.257891-1-shakeelb@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 4, 2020 at 6:19 PM Shakeel Butt wrote:
>
> On Wed, Mar 4, 2020 at 5:36 PM Eric Dumazet wrote:
> >
> > On Wed, Mar 4, 2020 at 3:39 PM Shakeel Butt wrote:
> > >
> > > If a TCP socket is allocated in IRQ context or cloned from unassociated
> > > (i.e. not associated to a memcg) in IRQ context then it will remain
> > > unassociated for its whole life. Almost half of the TCPs created on the
> > > system are created in IRQ context, so, memory used by such sockets will
> > > not be accounted by the memcg.
> > >
> > > This issue is more widespread in cgroup v1 where network memory
> > > accounting is opt-in but it can happen in cgroup v2 if the source socket
> > > for the cloning was created in root memcg.
> > >
> > > To fix the issue, just do the late association of the unassociated
> > > sockets at accept() time in the process context and then force charge
> > > the memory buffer already reserved by the socket.
> > >
> > > Signed-off-by: Shakeel Butt
> > > ---
> > > Changes since v1:
> > > - added sk->sk_rmem_alloc to initial charging.
> > > - added synchronization to get memory usage and set sk_memcg race-free.
> > >
> > >  net/ipv4/inet_connection_sock.c | 19 +++++++++++++++++++
> > >  1 file changed, 19 insertions(+)
> > >
> > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > > index a4db79b1b643..7bcd657cd45e 100644
> > > --- a/net/ipv4/inet_connection_sock.c
> > > +++ b/net/ipv4/inet_connection_sock.c
> > > @@ -482,6 +482,25 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
> > >  		}
> > >  		spin_unlock_bh(&queue->fastopenq.lock);
> > >  	}
> > > +
> > > +	if (mem_cgroup_sockets_enabled && !newsk->sk_memcg) {
> > > +		int amt;
> > > +
> > > +		/* atomically get the memory usage and set sk->sk_memcg. */
> > > +		lock_sock(newsk);
> > > +
> > > +		/* The sk has not been accepted yet, no need to look at
> > > +		 * sk->sk_wmem_queued.
> > > +		 */
> > > +		amt = sk_mem_pages(newsk->sk_forward_alloc +
> > > +				   atomic_read(&sk->sk_rmem_alloc));
> > > +		mem_cgroup_sk_alloc(newsk);
> > > +
> > > +		release_sock(newsk);
> > > +
> > > +		if (newsk->sk_memcg)
> >
> > Most sockets in accept queue should have amt == 0, so maybe avoid
> > calling this thing only when amt == 0 ?
> >
>
> Thanks, will do in the next version. BTW I have tested with adding
> mdelay() here and running iperf3 and I did see non-zero amt.
>
> > Also I would release_sock(newsk) after this, otherwise incoming
> > packets could mess with newsk->sk_forward_alloc
> >
> I think that is fine. Once sk->sk_memcg is set then
> mem_cgroup_charge_skmem() will be called for new incoming packets.
> Here we just need to call mem_cgroup_charge_skmem() with amt before
> sk->sk_memcg was set.

Unfortunately, as soon as release_sock(newsk) is done, incoming packets
can be fed to the socket and completely change its memory usage.

For example, the whole queue might be zapped or collapsed if we receive
a RST packet, or if memory pressure asks us to prune the out-of-order
queue.

So you might charge something, then never uncharge it, since at close()
time the socket will have zero bytes to uncharge.

>
> > if (amt && newsk->sk_memcg)
> > 	mem_cgroup_charge_skmem(newsk->sk_memcg, amt);
> > release_sock(newsk);
> >
> > Also, I wonder if mem_cgroup_charge_skmem() has been used at all
> > these last four years
> > on arches with PAGE_SIZE != 4096
> >
> > ( SK_MEM_QUANTUM is not anymore PAGE_SIZE, but 4096)
> >
> Oh so sk_mem_pages() does not really give the number of pages. Yeah
> this needs a fix for non-4096 page size architectures. Though I can
> understand why this has not been caught yet. Network memory accounting
> is opt-in in cgroup v1 and most of the users still use v1. In cgroup
> v2, it is enabled and there is no way to opt-out. Facebook is a
> well-known v2 user and it seems like they don't have non-4096 page
> size arch systems.