Received: by 10.223.176.5 with SMTP id f5csp56700wra; Thu, 1 Feb 2018 15:29:52 -0800 (PST) X-Google-Smtp-Source: AH8x225BZWMaalDYlLhNVscS8SuscqJqqSjzP46i9+kG8QjGNamzK+hrnL6eFerX7dcf09E13Si8 X-Received: by 10.101.76.14 with SMTP id u14mr29507681pgq.363.1517527792423; Thu, 01 Feb 2018 15:29:52 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1517527792; cv=none; d=google.com; s=arc-20160816; b=i2zMEedp5U0ERVMuCdLr2YkHWTOowVlEdYFm20OtNo/Cb4y31wB5fiq/uM0PmuQn57 /a2Oi4I9qugrG+uR9+eLtLX8zH3JaEOCXmXmVTy0lWdzVE+f0GbupZp1qyfQn7Ena05+ c39CHScHgkg2ByDMwQ9ZUknUfzYXOnMOaJlokz44FttskE5T2HUMed8SWNc/UbwQhCiD AAw/pHABInRvY3OvxnHB2SJw6la0OrH4o3nrK2cyHFtqBKGgNfiaN52qm2CweaNqNNcW zCayCtOt1QZb1YBQ93DeRUB7ob2M3pdSmyblQteI9neLQSglybQjianZgoVB1A/d2M/a NJEQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=Mjvc6rGErRF0hfCBjKfPGGXHotfZGUX9iedRzqhIM7A=; b=qjJHCP/y184j2HKzntSoHdE4ds5e79YDOhEwr2Aki2GvxYrCI03KFhgQaSAYMfQ8n0 p3GIvz6kxCaQNUNNUeHneA3TNCPQOBk1yTY+hAqv5uapv6/MNF1IEkgakk7cssZv0DGm 1b2EIWIGC3bmFB8GYFQSyTS814YSzJZA9b3eWwCUPdew2dJcg9BMB6eVFVUcPEf25GNw RQc+hFR1RusLDdcnPm2Anbqz3VWDcAcSuCji8qhpqapzfhkKuQsJjm1tEh2FLqNsT+5l b7h/4YCaBPh2uySnKRbg07KOj1pz4Xh6poLF2+CH4zDgXjK6R+5SROdTLvZHvLdFzAN8 emvg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=bKBIsX+w; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z17-v6si532860plo.794.2018.02.01.15.29.37; Thu, 01 Feb 2018 15:29:52 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=bKBIsX+w; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751759AbeBAX1Y (ORCPT + 99 others); Thu, 1 Feb 2018 18:27:24 -0500 Received: from mail-wr0-f193.google.com ([209.85.128.193]:41021 "EHLO mail-wr0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751638AbeBAX1R (ORCPT ); Thu, 1 Feb 2018 18:27:17 -0500 Received: by mail-wr0-f193.google.com with SMTP id v15so20806548wrb.8 for ; Thu, 01 Feb 2018 15:27:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=Mjvc6rGErRF0hfCBjKfPGGXHotfZGUX9iedRzqhIM7A=; b=bKBIsX+wt2z+kfCKU/SZyQrjF7oVBnM8Avw3Gu6DN0vfm4PpSc7f5lt4Y/TzmhjAdl 7g7CSudp1fIQDw36qARCBctubQ0s+KEL3FkUnoPkb6t/3RzysGJduPi7MIyjcbG8/YWZ djXqR8komS9/Cf3AsSJjTUT8xU5hSJNBmOmBbXhlZUmT8+uIg/k+097NmAxzohPGQFPO BZ84rYoql/U+9TH69zKhScN/LO+/QtTNdE3fcsEJG456wP1E+te7Rv8RK8rpfxvWdOVY 6ChSjaNVlVbUk5fEFWDKITJQrXfOtWNnZmuxsIPnhVTlCRAl8bKsnUpxELAZ2H49M02G b5mw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=Mjvc6rGErRF0hfCBjKfPGGXHotfZGUX9iedRzqhIM7A=; b=dBa19Td/z/Xr3IU+PrtUqQbyvhj8Oq+hE9Bi000su9JCh9UHr7uarOLU7fyYJA3Ibn PQvc1euVSpgJABiA6EwCEilbiemxVcLKscihX0Vp7NlZJcl/HR5BVuxqr1mqmMM8KPaR vMUX4ojx/pt2lEVXKI7vtfukPvVy74EN2A9UmW1fV3OZu9gRhTanx4Ss4YqvRygy+r3N 3OrRkPzP3GpcyiOX4GrhZAwhBTFX0ieeRsMklkVCDfNQUP+RoZOXXhJ58vnAg3NJFRTM uS45MPY7t8LekUnqDinINME0I3REgTyQVBP/x5jST2dLWT26IcRBPMnyBSDsILtJ9J4K MOiw== X-Gm-Message-State: AKwxytcH+WmMoC0kdC47FuotQWlWrzXN44La5Nz4tkPpQWuCC8ug/kAM kpCpLvI3dckNPXv6Z5OL1QM1Dp2cmLG2dHhKBhx2DovO3Wc= X-Received: by 10.223.130.206 with SMTP id 72mr11814329wrc.248.1517527635598; Thu, 01 Feb 2018 15:27:15 -0800 (PST) MIME-Version: 1.0 Received: by 10.28.147.15 with HTTP; Thu, 1 Feb 2018 15:27:14 -0800 (PST) In-Reply-To: <20180201225542.GA13072@castle> References: <20180125001911.15597-1-guro@fb.com> <20180125.120302.1117695034222616751.davem@davemloft.net> <20180131215401.GA8956@castle> <20180201.101655.1316424669256047119.davem@davemloft.net> <20180201202158.GA11477@castle.DHCP.thefacebook.com> <20180201225542.GA13072@castle> From: Eric Dumazet Date: Thu, 1 Feb 2018 15:27:14 -0800 Message-ID: Subject: Re: [PATCH net] net: memcontrol: charge allocated memory after mem_cgroup_sk_alloc() To: Roman Gushchin Cc: "David S. Miller" , netdev , LKML , kernel-team , Johannes Weiner , Tejun Heo Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Well, this memcg stuff is so confusing. My recollection is that we had : https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=d979a39d7242e0601bf9b60e89628fb8ac577179 https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=75cb070960ade40fba5de32138390f3c85c90941 https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=c0576e3975084d4699b7bfef578613fb8e1144f6 And commit a590b90d472f2c176c140576ee3ab44df7f67839 as well Honestly bug was closed months ago for us, based on stack traces on the wild. No C repro or whatever, but reproducing it would be a matter of having a TCP listener constantly doing a socket()/setsockopt(REUSEADDR)/bind()/listen()/close() in a loop, while connections are attempted to the listening port. On Thu, Feb 1, 2018 at 2:55 PM, Roman Gushchin wrote: > On Thu, Feb 01, 2018 at 01:17:56PM -0800, Eric Dumazet wrote: >> On Thu, Feb 1, 2018 at 12:22 PM, Roman Gushchin wrote: >> > On Thu, Feb 01, 2018 at 10:16:55AM -0500, David Miller wrote: >> >> From: Roman Gushchin >> >> Date: Wed, 31 Jan 2018 21:54:08 +0000 >> >> >> >> > So I really start thinking that reverting 9f1c2674b328 >> >> > ("net: memcontrol: defer call to mem_cgroup_sk_alloc()") >> >> > and fixing the original issue differently might be easier >> >> > and a proper way to go. Does it makes sense? >> >> >> >> You'll need to work that out with Eric Dumazet who added the >> >> change in question which you think we should revert. >> > >> > Eric, >> > >> > can you, please, provide some details about the use-after-free problem >> > that you've fixed with commit 9f1c2674b328 ("net: memcontrol: defer call >> > to mem_cgroup_sk_alloc()" ? Do you know how to reproduce it? >> > >> > Deferring mem_cgroup_sk_alloc() breaks socket memory accounting >> > and makes it much more fragile in general. So, I wonder, if there are >> > solutions for the use-after-free problem. >> > >> > Thank you! >> > >> > Roman >> >> Unfortunately bug is not public (Google-Bug-Id 67556600 for Googlers >> following this thread ) >> >> Our kernel has a debug feature on percpu_ref_get_many() which detects >> the typical use-after-free problem of >> doing atomic_long_add(nr, &ref->count); while ref->count is 0, or >> memory already freed. >> >> Bug was serious because css_put() will release the css a second time. >> >> Stack trace looked like : >> >> Oct 8 00:23:14 lphh23 kernel: [27239.568098] >> [] dump_stack+0x4d/0x6c >> Oct 8 00:23:14 lphh23 kernel: [27239.568108] [] ? >> cgroup_get+0x43/0x50 >> Oct 8 00:23:14 lphh23 kernel: [27239.568114] [] >> warn_slowpath_common+0xac/0xc8 >> Oct 8 00:23:14 lphh23 kernel: [27239.568117] [] >> warn_slowpath_null+0x1a/0x1c >> Oct 8 00:23:14 lphh23 kernel: [27239.568120] [] >> cgroup_get+0x43/0x50 >> Oct 8 00:23:14 lphh23 kernel: [27239.568123] [] >> cgroup_sk_alloc+0x64/0x90 > > Hm, that looks strange... It's cgroup_sk_alloc(), > not mem_cgroup_sk_alloc(), which was removed by 9f1c2674b328. > > I thought, that it's css_get() in mem_cgroup_sk_alloc(), which > you removed, but the stacktrace you've posted is different. > > void mem_cgroup_sk_alloc(struct sock *sk) { > /* > * Socket cloning can throw us here with sk_memcg already > * filled. It won't however, necessarily happen from > * process context. So the test for root memcg given > * the current task's memcg won't help us in this case. > * > * Respecting the original socket's memcg is a better > * decision in this case. > */ > if (sk->sk_memcg) { > BUG_ON(mem_cgroup_is_root(sk->sk_memcg)); >>>> css_get(&sk->sk_memcg->css); > return; > } > > Is it possible to reproduce the issue on an upstream kernel? > Any ideas of what can trigger it? > > Btw, with the following patch applied (below) and cgroup v2 enabled, > the issue, which I'm talking about, can be reproduced in seconds after reboot > by doing almost any network activity. Just sshing to a machine is enough. > The corresponding warning will be printed to dmesg. > > What is a proper way to fix the socket memory accounting in this case, > what do you think? > > Thank you! > > Roman > > -- > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index c51c589..55fb890 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -276,6 +276,8 @@ struct mem_cgroup { > struct list_head event_list; > spinlock_t event_list_lock; > > + atomic_t tcpcnt; > + > struct mem_cgroup_per_node *nodeinfo[0]; > /* WARNING: nodeinfo must be the last member here */ > }; > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 19eea69..c69ff04 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -5623,6 +5623,8 @@ static int memory_stat_show(struct seq_file *m, void *v) > seq_printf(m, "workingset_nodereclaim %lu\n", > stat[WORKINGSET_NODERECLAIM]); > > + seq_printf(m, "tcpcnt %d\n", atomic_read(&memcg->tcpcnt)); > + > return 0; > } > > @@ -6139,6 +6141,8 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages) > { > gfp_t gfp_mask = GFP_KERNEL; > > + atomic_add(nr_pages, &memcg->tcpcnt); > + > if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) { > struct page_counter *fail; > > @@ -6171,6 +6175,11 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages) > */ > void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages) > { > + int v = atomic_sub_return(nr_pages, &memcg->tcpcnt); > + if (v < 0) { > + pr_info("@@@ %p %d \n", memcg, v); > + } > + > if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) { > page_counter_uncharge(&memcg->tcpmem, nr_pages); > return;