From: SeongJae Park
To: Eric Dumazet
CC: SeongJae Park, David Miller, Al Viro, Jakub Kicinski, Greg Kroah-Hartman, netdev, LKML
Subject: Re: Re: [PATCH net v2 0/2] Revert the 'socket_alloc' life cycle change
Date: Tue, 5 May 2020 17:07:17 +0200
Message-ID: <20200505150717.5688-1-sjpark@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 5 May 2020 07:53:39 -0700 Eric Dumazet wrote:

> On Tue, May 5, 2020 at 4:54 AM SeongJae Park wrote:
> >
> > CC-ing stable@vger.kernel.org and adding some more explanations.
> >
> > On Tue, 5 May 2020 10:10:33 +0200 SeongJae Park wrote:
> >
> > > From: SeongJae Park
> > >
> > > The commit 6d7855c54e1e ("sockfs: switch to ->free_inode()") made the
> > > deallocation of 'socket_alloc' happen asynchronously using RCU, the
> > > same as 'sock.wq'. The following commit 333f7909a857 ("coallocate
> > > socket_wq with socket itself") gave the two the same life cycle.
> > >
> > > The changes made the code much simpler, but also made 'socket_alloc'
> > > live longer than before. For this reason, user programs that
> > > intensively repeat allocation and deallocation of sockets can cause
> > > memory pressure on recent kernels.
> >
> > I found this problem on a production virtual machine with 4 GB of memory
> > while running lebench[1]. The 'poll big' test of lebench opens 1000
> > sockets, polls them, and closes them. This test is repeated 10,000 times.
> > Therefore it should consume only 1000 'socket_alloc' objects at once. As
> > the size of socket_alloc is about 800 bytes, that is only about 800 KiB.
> > However, on recent kernels, it could consume up to 10,000,000 objects
> > (about 8 GiB). On the test machine, I confirmed that it consumed about
> > 4 GB of system memory and resulted in OOM.
> >
> > [1] https://github.com/LinuxPerfStudy/LEBench
>
> To be fair, I have not backported Al's patches to Google production
> kernels, nor have I tried this benchmark.
>
> Why do we have 10,000,000 objects around? Could this be because of
> some RCU problem?

Mainly because of a long RCU grace period, as you guessed. I have no idea
how the grace period became so long in this case.

As my test machine was a virtual machine instance, I guess a problem like
RCU reader preemption[2] might have affected this.

[2] https://www.usenix.org/system/files/conference/atc17/atc17-prasad.pdf

>
> Once Al's patches are reverted, do you have 10,000,000 sock_alloc around?

No. Neither the old kernel from before Al's patches nor the recent kernel
with Al's patches reverted reproduced the problem.


Thanks,
SeongJae Park

>
> Thanks.
>
> >
> > >
> > > To avoid the problem, this commit reverts the changes.
> >
> > I also tried to make a fixup rather than reverts, but I couldn't easily
> > find a simple one. As the commits 6d7855c54e1e and 333f7909a857 were for
> > code refactoring rather than performance optimization, I thought
> > introducing a complex fixup for this problem would make no sense.
> > Meanwhile, the memory pressure regression could affect real machines. To
> > this end, I decided to quickly revert the commits first and consider
> > better refactoring later.
> >
> >
> > Thanks,
> > SeongJae Park
> >
> > >
> > > SeongJae Park (2):
> > >   Revert "coallocate socket_wq with socket itself"
> > >   Revert "sockfs: switch to ->free_inode()"
> > >
> > >  drivers/net/tap.c      |  5 +++--
> > >  drivers/net/tun.c      |  8 +++++---
> > >  include/linux/if_tap.h |  1 +
> > >  include/linux/net.h    |  4 ++--
> > >  include/net/sock.h     |  4 ++--
> > >  net/core/sock.c        |  2 +-
> > >  net/socket.c           | 23 ++++++++++++++++-------
> > >  7 files changed, 30 insertions(+), 17 deletions(-)
> > >
> > > --
> > > 2.17.1
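
For readers who want to check the numbers in the quoted report, here is a
minimal sketch in Python. It is not LEBench's code. The names live_bytes,
poll_big_round and SOCKET_ALLOC_BYTES are hypothetical, and the choice of
AF_INET/SOCK_STREAM sockets is an assumption made only for illustration.
The 800-byte object size, 1000 sockets and 10,000 rounds are taken from the
report above.

```python
import select
import socket

SOCKET_ALLOC_BYTES = 800  # approximate object size quoted in the report


def live_bytes(n_sockets: int, rounds: int, freed_promptly: bool) -> int:
    """Upper bound on bytes held by 'socket_alloc' objects.

    freed_promptly=True models objects released at close(), so only one
    round's worth is live. False models the worst case in which no object
    from any round has been freed yet (e.g. a very long RCU grace period).
    """
    live_objects = n_sockets if freed_promptly else n_sockets * rounds
    return live_objects * SOCKET_ALLOC_BYTES


def poll_big_round(n_sockets: int) -> int:
    """Open n_sockets sockets, poll them once, close them; return the count.

    Assumption: TCP sockets stand in for whatever the benchmark opens.
    """
    socks = []
    try:
        for _ in range(n_sockets):
            socks.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
        poller = select.poll()
        for s in socks:
            poller.register(s, select.POLLIN)
        poller.poll(0)  # zero timeout: do not block
        return len(socks)
    finally:
        for s in socks:
            s.close()


if __name__ == "__main__":
    print(live_bytes(1000, 10_000, freed_promptly=True))   # 800000
    print(live_bytes(1000, 10_000, freed_promptly=False))  # 8000000000
```

The two printed values are 800,000 bytes (roughly 781 KiB) and
8,000,000,000 bytes (8 GB, roughly 7.45 GiB), which matches the "about
800 KiB" and "about 8 GiB" figures in the report. Repeating
poll_big_round(1000) 10,000 times would approximate the workload described
above, but whether it reproduces the memory growth depends on the kernel and
on RCU grace period behaviour, which this sketch does not model.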