From: SeongJae Park
To: Eric Dumazet
CC: SeongJae Park, SeongJae Park, ...
Subject: Re: [PATCH 1/1] net/ipv4/inet_fragment: Batch fqdir destroy works
Date: Thu, 10 Dec 2020 08:27:07 +0100
Message-ID: <20201210072707.16079-1-sjpark@amazon.com>
In-Reply-To: <6d3e32f6-c2df-a1a6-3568-b7387cd0c933@gmail.com>

On Thu, 10 Dec 2020 01:17:58 +0100 Eric Dumazet wrote:

>
>
> On 12/8/20 10:45 AM, SeongJae Park wrote:
> > From: SeongJae Park
> >
> > In 'fqdir_exit()', a work for destruction of the 'fqdir' is enqueued.
> > The work function, 'fqdir_work_fn()', calls 'rcu_barrier()'.  In case
> > of intensive 'fqdir_exit()' (e.g., frequent 'unshare(CLONE_NEWNET)'
> > system calls), this increased contention could result in unacceptably
> > high latency of 'rcu_barrier()'.  This commit avoids such contention
> > by doing the destruction in a batched manner, similarly to that of
> > 'cleanup_net()'.
>
> Any numbers to share ?  I have never seen an issue.

On our machine with 40 CPU cores and 70GB of DRAM, 15GB of available memory
was consumed within 2 minutes while my artificial reproducer was running.
The reproducer merely calls 'unshare(CLONE_NEWNET)' in a loop, 50,000 times.
Though artificial, the reproducer resembles the behavior of our real
workloads.

While the reproducer was running, 'cleanup_net()' was called only 4 times.
The first two calls finished quickly, but the third took about 30 seconds,
and the fourth had not finished by the time the reproducer completed.  We
also confirmed that the third and fourth calls were simply waiting in
'rcu_barrier()'.

I think you have not seen this issue before because we are making very
intensive 'unshare()' calls.  Also, it is not reproducible on all hardware;
on my 6-CPU machine, the problem did not reproduce.
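For reference, the core of the reproducer is essentially just the loop
below (a minimal userspace sketch, not the exact program I used; it needs
CAP_SYS_ADMIN, and the error handling is only illustrative):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
        int i;

        /* Each unshare() moves the task into a fresh netns; the previous
         * netns is torn down via cleanup_net(), which in turn triggers
         * fqdir_exit() and its rcu_barrier()-bound destroy work.
         */
        for (i = 0; i < 50000; i++) {
                if (unshare(CLONE_NEWNET)) {
                        perror("unshare");
                        return 1;
                }
        }
        return 0;
}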
> >
> >
> > Signed-off-by: SeongJae Park
> > ---
> >  include/net/inet_frag.h  |  2 +-
> >  net/ipv4/inet_fragment.c | 28 ++++++++++++++++++++--------
> >  2 files changed, 21 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> > index bac79e817776..558893d8810c 100644
> > --- a/include/net/inet_frag.h
> > +++ b/include/net/inet_frag.h
> > @@ -20,7 +20,7 @@ struct fqdir {
> >
> >  	/* Keep atomic mem on separate cachelines in structs that include it */
> >  	atomic_long_t mem ____cacheline_aligned_in_smp;
> > -	struct work_struct destroy_work;
> > +	struct llist_node destroy_list;
> >  };
> >
> >  /**
> > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> > index 10d31733297d..796b559137c5 100644
> > --- a/net/ipv4/inet_fragment.c
> > +++ b/net/ipv4/inet_fragment.c
> > @@ -145,12 +145,19 @@ static void inet_frags_free_cb(void *ptr, void *arg)
> >  		inet_frag_destroy(fq);
> >  }
> >
> > +static LLIST_HEAD(destroy_list);
> > +
> >  static void fqdir_work_fn(struct work_struct *work)
> >  {
> > -	struct fqdir *fqdir = container_of(work, struct fqdir, destroy_work);
> > -	struct inet_frags *f = fqdir->f;
> > +	struct llist_node *kill_list;
> > +	struct fqdir *fqdir;
> > +	struct inet_frags *f;
> > +
> > +	/* Atomically snapshot the list of fqdirs to destroy */
> > +	kill_list = llist_del_all(&destroy_list);
> >
> > -	rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
> > +	llist_for_each_entry(fqdir, kill_list, destroy_list)
> > +		rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
>
> OK, it seems rhashtable_free_and_destroy() has cond_resched() so we are not going
> to hold this cpu for long periods.
>
> >  	/* We need to make sure all ongoing call_rcu(..., inet_frag_destroy_rcu)
> >  	 * have completed, since they need to dereference fqdir.
> > @@ -158,10 +165,13 @@ static void fqdir_work_fn(struct work_struct *work)
> >  	 */
> >  	rcu_barrier();
> >
> > -	if (refcount_dec_and_test(&f->refcnt))
> > -		complete(&f->completion);
> > +	llist_for_each_entry(fqdir, kill_list, destroy_list) {
>
> Don't we need the llist_for_each_entry_safe() variant here ???

Oh, indeed.  I will do so in the next version; a rough sketch of the fixed
loop is at the end of this mail.

>
> > +		f = fqdir->f;
> > +		if (refcount_dec_and_test(&f->refcnt))
> > +			complete(&f->completion);
> >
> > -	kfree(fqdir);
> > +		kfree(fqdir);
> > +	}


Thanks,
SeongJae Park
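P.S. To be concrete, the second loop in the next version would look roughly
like this (untested sketch; 'tmp' is just an illustrative name for the
extra cursor that the '_safe' variant needs):

        struct fqdir *fqdir, *tmp;
        struct inet_frags *f;
        ...
        /* The '_safe' variant caches the next node before the loop body
         * runs, so kfree(fqdir) cannot invalidate the iteration.
         */
        llist_for_each_entry_safe(fqdir, tmp, kill_list, destroy_list) {
                f = fqdir->f;
                if (refcount_dec_and_test(&f->refcnt))
                        complete(&f->completion);

                kfree(fqdir);
        }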