From: SeongJae Park
Subject: [PATCH v3 0/1] net: Reduce rcu_barrier() contentions from 'unshare(CLONE_NEWNET)'
Date: Fri, 11 Dec 2020 09:20:31 +0100
Message-ID: <20201211082032.26965-1-sjpark@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: SeongJae Park

On a few of our systems, I found that frequent 'unshare(CLONE_NEWNET)' calls make the number of active slab objects, including those of the 'sock_inode_cache' type, increase rapidly and continuously. As a result, the systems come under memory pressure.

In more detail, I made an artificial reproducer that resembles the workload in which we found the problem but reproduces it faster. It merely repeats 'unshare(CLONE_NEWNET)' 50,000 times in a loop, which takes about 2 minutes. On a machine with 40 CPU cores and 70GB of DRAM, the available memory decreased at a fast and steady rate (about 120MB per second, 15GB in total within the 2 minutes). Note that the issue does not reproduce on every machine; on my 6 CPU core machine, it did not.

'cleanup_net()' and 'fqdir_work_fn()' are the functions that deallocate the relevant memory objects. They are invoked asynchronously from workqueues and internally use 'rcu_barrier()' to ensure safe destruction. 'cleanup_net()' works in a batched manner in a single-thread worker, while 'fqdir_work_fn()' runs once per 'fqdir_exit()' call in the 'system_wq'. Under the workload, 'fqdir_work_fn()' was therefore invoked frequently, which made the contention on 'rcu_barrier()' high. More specifically, the global mutex 'rcu_state.barrier_mutex' became the bottleneck.
I tried making 'rcu_barrier()' and the subsequent lightweight work in 'fqdir_work_fn()' be processed in batch by a dedicated single-thread worker, and confirmed that it works. After the change, no continuous memory reduction was observed, only some fluctuation; the available memory reduction was only up to about 400MB. The following patch implements the change.

I think this is the right point fix for this issue, but one might blame different parts instead:

1. User: frequent 'unshare()' calls. From some point of view, such frequent 'unshare()' calls might simply seem insane.

2. Global mutex in 'rcu_barrier()'. Because of the global mutex, 'rcu_barrier()' callers can wait long after their callbacks have already started, until the whole call finishes. Similar issues could therefore arise from other 'rcu_barrier()' usages. Maybe a wait-queue-like mechanism could notify the waiters as soon as the desired point is reached.

I personally believe applying the point fix for now and improving 'rcu_barrier()' in the long term makes sense. If I'm missing something or you have a different opinion, please feel free to let me know.

Patch History
-------------

Changes from v2
(https://lore.kernel.org/lkml/20201210080844.23741-1-sjpark@amazon.com/)
- Add numbers after the patch (Eric Dumazet)
- Make only 'rcu_barrier()' and subsequent lightweight works serialized
  (Eric Dumazet)

Changes from v1
(https://lore.kernel.org/netdev/20201208094529.23266-1-sjpark@amazon.com/)
- Keep xmas tree variable ordering (Jakub Kicinski)
- Add more numbers (Eric Dumazet)
- Use 'llist_for_each_entry_safe()' (Eric Dumazet)

SeongJae Park (1):
  net/ipv4/inet_fragment: Batch fqdir destroy works

 include/net/inet_frag.h  |  1 +
 net/ipv4/inet_fragment.c | 45 +++++++++++++++++++++++++++++++++-------
 2 files changed, 39 insertions(+), 7 deletions(-)

-- 
2.17.1