From: SeongJae Park
To: Eric Dumazet
CC: SeongJae Park, David Miller, SeongJae Park, Jakub Kicinski,
	"Alexey Kuznetsov", Florian Westphal, "Paul E. McKenney", netdev, LKML
Subject: Re: [PATCH v3 1/1] net/ipv4/inet_fragment: Batch fqdir destroy works
Date: Fri, 11 Dec 2020 11:46:22 +0100
Message-ID: <20201211104622.23231-1-sjpark@amazon.com>
X-Mailer: git-send-email 2.17.1
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 11 Dec 2020 11:41:36 +0100 Eric Dumazet wrote:

> On Fri, Dec 11, 2020 at 11:33 AM SeongJae Park wrote:
> >
> > On Fri, 11 Dec 2020 09:43:41 +0100 Eric Dumazet wrote:
> >
> > > On Fri, Dec 11, 2020 at 9:21 AM SeongJae Park wrote:
> > > >
> > > > From: SeongJae Park
> > > >
> > > > For each 'fqdir_exit()' call, a work for destroy of the 'fqdir' is
> > > > enqueued.  The work function, 'fqdir_work_fn()', internally calls
> > > > 'rcu_barrier()'.  In case of intensive 'fqdir_exit()' (e.g., frequent
> > > > 'unshare()' system calls), this increased contention could result in
> > > > unacceptably high latency of 'rcu_barrier()'.  This commit avoids such
> > > > contention by doing the 'rcu_barrier()' and subsequent lightweight works
> > > > in a batched manner using a dedicated singlethread worker, as similar to
> > > > that of 'cleanup_net()'.
> > >
> > > Not sure why you submit a patch series with a single patch.
> > >
> > > Your cover letter contains interesting info that would better be
> > > captured in this changelog IMO
> >
> > I thought someone might think this is not a kernel issue but the
> > reproducer is insane, or that 'rcu_barrier()' needs modification.
> > I wanted to do such discussion on the cover letter.  Seems I misjudged.
> > I will make this a single patch and move the detailed information here
> > from the next version.
> >
> > > >
> > > > Signed-off-by: SeongJae Park
> > > > ---
> > > >  include/net/inet_frag.h  |  1 +
> > > >  net/ipv4/inet_fragment.c | 45 +++++++++++++++++++++++++++++++++-------
> > > >  2 files changed, 39 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> > > > index bac79e817776..48cc5795ceda 100644
> > > > --- a/include/net/inet_frag.h
> > > > +++ b/include/net/inet_frag.h
> > > > @@ -21,6 +21,7 @@ struct fqdir {
> > > >  	/* Keep atomic mem on separate cachelines in structs that include it */
> > > >  	atomic_long_t mem ____cacheline_aligned_in_smp;
> > > >  	struct work_struct destroy_work;
> > > > +	struct llist_node free_list;
> > > >  };
> > > >
> > > >  /**
> > > > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> > > > index 10d31733297d..a6fc4afcc323 100644
> > > > --- a/net/ipv4/inet_fragment.c
> > > > +++ b/net/ipv4/inet_fragment.c
> > > > @@ -145,12 +145,17 @@ static void inet_frags_free_cb(void *ptr, void *arg)
> > > >  	inet_frag_destroy(fq);
> > > >  }
> > > >
> > > > -static void fqdir_work_fn(struct work_struct *work)
> > > > +static struct workqueue_struct *fqdir_wq;
> > > > +static LLIST_HEAD(free_list);
> > > > +
> > > > +static void fqdir_free_fn(struct work_struct *work)
> > > >  {
> > > > -	struct fqdir *fqdir = container_of(work, struct fqdir, destroy_work);
> > > > -	struct inet_frags *f = fqdir->f;
> > > > +	struct llist_node *kill_list;
> > > > +	struct fqdir *fqdir, *tmp;
> > > > +	struct inet_frags *f;
> > > >
> > > > -	rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
> > > > +	/* Atomically snapshot the list of fqdirs to free */
> > > > +	kill_list = llist_del_all(&free_list);
> > > >
> > > >  	/* We need to make sure all ongoing call_rcu(..., inet_frag_destroy_rcu)
> > > >  	 * have completed, since they need to dereference fqdir.
> > > > @@ -158,12 +163,38 @@ static void fqdir_work_fn(struct work_struct *work)
> > > >  	 */
> > > >  	rcu_barrier();
> > > >
> > > > -	if (refcount_dec_and_test(&f->refcnt))
> > > > -		complete(&f->completion);
> > > > +	llist_for_each_entry_safe(fqdir, tmp, kill_list, free_list) {
> > > > +		f = fqdir->f;
> > > > +		if (refcount_dec_and_test(&f->refcnt))
> > > > +			complete(&f->completion);
> > > >
> > > > -	kfree(fqdir);
> > > > +		kfree(fqdir);
> > > > +	}
> > > >  }
> > > >
> > > > +static DECLARE_WORK(fqdir_free_work, fqdir_free_fn);
> > > > +
> > > > +static void fqdir_work_fn(struct work_struct *work)
> > > > +{
> > > > +	struct fqdir *fqdir = container_of(work, struct fqdir, destroy_work);
> > > > +
> > > > +	rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
> > > > +
> > > > +	if (llist_add(&fqdir->free_list, &free_list))
> > > > +		queue_work(fqdir_wq, &fqdir_free_work);
> > >
> > > I think you misunderstood me.
> > >
> > > Since this fqdir_free_work will have at most one instance, you can use
> > > system_wq here, there is no risk of abuse.
> > >
> > > My suggestion was to not use system_wq for fqdir_exit(), to better
> > > control the number of threads in rhashtable cleanups.
> > >
> > > void fqdir_exit(struct fqdir *fqdir)
> > > {
> > >         INIT_WORK(&fqdir->destroy_work, fqdir_work_fn);
> > > -       queue_work(system_wq, &fqdir->destroy_work);
> > > +       queue_work(fqdir_wq, &fqdir->destroy_work);
> > > }
> >
> > Oh, got it.  I definitely misunderstood.  My fault, sorry.
> >
> > > >
> > > > +}
> > > > +
> > > > +static int __init fqdir_wq_init(void)
> > > > +{
> > > > +	fqdir_wq = create_singlethread_workqueue("fqdir");
> > >
> > > And here, I suggest to use a non-ordered work queue, allowing one
> > > thread per cpu, to allow concurrent rhashtable cleanups.
> > >
> > > Also, "fqdir" name is rather vague, this is an implementation detail?
> > >
> > > 	fqdir_wq = create_workqueue("inet_frag_wq");
> >
> > So, what you are suggesting is to use a dedicated non-ordered work queue
> > (fqdir_wq) for rhashtable cleanup and do the remaining works with
> > system_wq in the batched manner, right?  IOW, doing the below change on
> > top of this patch.
> >
> > --- a/net/ipv4/inet_fragment.c
> > +++ b/net/ipv4/inet_fragment.c
> > @@ -145,7 +145,7 @@ static void inet_frags_free_cb(void *ptr, void *arg)
> >  	inet_frag_destroy(fq);
> >  }
> >
> > -static struct workqueue_struct *fqdir_wq;
> > +static struct workqueue_struct *inet_frag_wq;
> >  static LLIST_HEAD(free_list);
>
> Nit: Please prefix this free_list, like fqdir_free_list, to avoid
> namespace pollution.

Sure!

> >  static void fqdir_free_fn(struct work_struct *work)
> > @@ -181,14 +181,14 @@ static void fqdir_work_fn(struct work_struct *work)
> >  	rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
> >
> >  	if (llist_add(&fqdir->free_list, &free_list))
> > -		queue_work(fqdir_wq, &fqdir_free_work);
> > +		queue_work(system_wq, &fqdir_free_work);
> >  }
> >
> >  static int __init fqdir_wq_init(void)
> >  {
> > -	fqdir_wq = create_singlethread_workqueue("fqdir");
> > -	if (!fqdir_wq)
> > -		panic("Could not create fqdir workq");
> > +	inet_frag_wq = create_workqueue("inet_frag_wq");
> > +	if (!inet_frag_wq)
> > +		panic("Could not create inet frag workq");
> >  	return 0;
> >  }
> >
> > @@ -218,7 +218,7 @@ EXPORT_SYMBOL(fqdir_init);
> >  void fqdir_exit(struct fqdir *fqdir)
> >  {
> >  	INIT_WORK(&fqdir->destroy_work, fqdir_work_fn);
> > -	queue_work(system_wq, &fqdir->destroy_work);
> > +	queue_work(inet_frag_wq, &fqdir->destroy_work);
> >  }
> >  EXPORT_SYMBOL(fqdir_exit);
> >
> > If I'm still misunderstanding, please let me know.
>
> I think that with the above changes, we should be good ;)

Thank you for your quick and nice reply.  I will send the next version soon!


Thanks,
SeongJae Park