From: Eric Dumazet
Date: Fri, 11 Dec 2020 11:41:36 +0100
Subject: Re: [PATCH v3 1/1] net/ipv4/inet_fragment: Batch fqdir destroy works
To: SeongJae Park
Cc: David Miller, SeongJae Park, Jakub Kicinski, Alexey Kuznetsov,
	Florian Westphal, "Paul E. McKenney", netdev, rcu@vger.kernel.org, LKML
In-Reply-To: <20201211102844.13120-1-sjpark@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Dec 11, 2020 at 11:33 AM SeongJae Park wrote:
>
> On Fri, 11 Dec 2020 09:43:41 +0100 Eric Dumazet wrote:
>
> > On Fri, Dec 11, 2020 at 9:21 AM SeongJae Park wrote:
> > >
> > > From: SeongJae Park
> > >
> > > For each 'fqdir_exit()' call, a work for destroying the 'fqdir' is
> > > enqueued.  The work function, 'fqdir_work_fn()', internally calls
> > > 'rcu_barrier()'.  In case of intensive 'fqdir_exit()' (e.g., frequent
> > > 'unshare()' system calls), this increased contention could result in
> > > unacceptably high latency of 'rcu_barrier()'.  This commit avoids such
> > > contention by doing the 'rcu_barrier()' and the subsequent lightweight
> > > works in a batched manner using a dedicated single-threaded worker,
> > > similar to 'cleanup_net()'.
> > >
> >
> > Not sure why you submit a patch series with a single patch.
> >
> > Your cover letter contains interesting info that would better be
> > captured in this changelog IMO
>
> I thought someone might think this is not a kernel issue, but that the
> reproducer is insane or that 'rcu_barrier()' needs modification.  I wanted
> to have that discussion in the cover letter.  Seems I misjudged.  I will
> make this a single patch and move the detailed information here from the
> next version.
>
> >
> > >
> > > Signed-off-by: SeongJae Park
> > > ---
> > >  include/net/inet_frag.h  |  1 +
> > >  net/ipv4/inet_fragment.c | 45 +++++++++++++++++++++++++++++++++-------
> > >  2 files changed, 39 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> > > index bac79e817776..48cc5795ceda 100644
> > > --- a/include/net/inet_frag.h
> > > +++ b/include/net/inet_frag.h
> > > @@ -21,6 +21,7 @@ struct fqdir {
> > >  	/* Keep atomic mem on separate cachelines in structs that include it */
> > >  	atomic_long_t		mem ____cacheline_aligned_in_smp;
> > >  	struct work_struct	destroy_work;
> > > +	struct llist_node	free_list;
> > >  };
> > >
> > >  /**
> > > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> > > index 10d31733297d..a6fc4afcc323 100644
> > > --- a/net/ipv4/inet_fragment.c
> > > +++ b/net/ipv4/inet_fragment.c
> > > @@ -145,12 +145,17 @@ static void inet_frags_free_cb(void *ptr, void *arg)
> > >  		inet_frag_destroy(fq);
> > >  }
> > >
> > > -static void fqdir_work_fn(struct work_struct *work)
> > > +static struct workqueue_struct *fqdir_wq;
> > > +static LLIST_HEAD(free_list);
> > > +
> > > +static void fqdir_free_fn(struct work_struct *work)
> > >  {
> > > -	struct fqdir *fqdir = container_of(work, struct fqdir, destroy_work);
> > > -	struct inet_frags *f = fqdir->f;
> > > +	struct llist_node *kill_list;
> > > +	struct fqdir *fqdir, *tmp;
> > > +	struct inet_frags *f;
> > >
> > > -	rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
> > > +	/* Atomically snapshot the list of fqdirs to free */
> > > +	kill_list = llist_del_all(&free_list);
> > >
> > >  	/* We need to make sure all ongoing call_rcu(..., inet_frag_destroy_rcu)
> > >  	 * have completed, since they need to dereference fqdir.
> > > @@ -158,12 +163,38 @@ static void fqdir_work_fn(struct work_struct *work)
> > >  	 */
> > >  	rcu_barrier();
> > >
> > > -	if (refcount_dec_and_test(&f->refcnt))
> > > -		complete(&f->completion);
> > > +	llist_for_each_entry_safe(fqdir, tmp, kill_list, free_list) {
> > > +		f = fqdir->f;
> > > +		if (refcount_dec_and_test(&f->refcnt))
> > > +			complete(&f->completion);
> > >
> > > -	kfree(fqdir);
> > > +		kfree(fqdir);
> > > +	}
> > >  }
> > >
> > > +static DECLARE_WORK(fqdir_free_work, fqdir_free_fn);
> > > +
> > > +static void fqdir_work_fn(struct work_struct *work)
> > > +{
> > > +	struct fqdir *fqdir = container_of(work, struct fqdir, destroy_work);
> > > +
> > > +	rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
> > > +
> > > +	if (llist_add(&fqdir->free_list, &free_list))
> > > +		queue_work(fqdir_wq, &fqdir_free_work);
> >
> > I think you misunderstood me.
> >
> > Since this fqdir_free_work will have at most one instance, you can use
> > system_wq here, there is no risk of abuse.
> >
> > My suggestion was to not use system_wq for fqdir_exit(), to better
> > control the number of threads in rhashtable cleanups.
> >
> > void fqdir_exit(struct fqdir *fqdir)
> > {
> >         INIT_WORK(&fqdir->destroy_work, fqdir_work_fn);
> > -       queue_work(system_wq, &fqdir->destroy_work);
> > +       queue_work(fqdir_wq, &fqdir->destroy_work);
> > }
>
> Oh, got it.  I definitely misunderstood.  My fault, sorry.
>
> >
> > >
> > > +}
> > > +
> > > +static int __init fqdir_wq_init(void)
> > > +{
> > > +	fqdir_wq = create_singlethread_workqueue("fqdir");
> >
> > And here, I suggest to use a non-ordered work queue, allowing one
> > thread per cpu, to allow concurrent rhashtable cleanups
> >
> > Also "fqdir" name is rather vague, this is an implementation detail ?
> >
> > fqdir_wq = create_workqueue("inet_frag_wq");
>
> So, what you are suggesting is to use a dedicated non-ordered work queue
> (fqdir_wq) for rhashtable cleanup and do the remaining works with system_wq
> in the batched manner, right?  IOW, doing the below change on top of this
> patch:
>
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -145,7 +145,7 @@ static void inet_frags_free_cb(void *ptr, void *arg)
>  		inet_frag_destroy(fq);
>  }
>
> -static struct workqueue_struct *fqdir_wq;
> +static struct workqueue_struct *inet_frag_wq;
>  static LLIST_HEAD(free_list);

Nit: please prefix this free_list, like fqdir_free_list, to avoid namespace
pollution.
>
>  static void fqdir_free_fn(struct work_struct *work)
> @@ -181,14 +181,14 @@ static void fqdir_work_fn(struct work_struct *work)
>  	rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
>
>  	if (llist_add(&fqdir->free_list, &free_list))
> -		queue_work(fqdir_wq, &fqdir_free_work);
> +		queue_work(system_wq, &fqdir_free_work);
>  }
>
>  static int __init fqdir_wq_init(void)
>  {
> -	fqdir_wq = create_singlethread_workqueue("fqdir");
> -	if (!fqdir_wq)
> -		panic("Could not create fqdir workq");
> +	inet_frag_wq = create_workqueue("inet_frag_wq");
> +	if (!inet_frag_wq)
> +		panic("Could not create inet frag workq");
>  	return 0;
>  }
>
> @@ -218,7 +218,7 @@ EXPORT_SYMBOL(fqdir_init);
>  void fqdir_exit(struct fqdir *fqdir)
>  {
>  	INIT_WORK(&fqdir->destroy_work, fqdir_work_fn);
> -	queue_work(system_wq, &fqdir->destroy_work);
> +	queue_work(inet_frag_wq, &fqdir->destroy_work);
>  }
>  EXPORT_SYMBOL(fqdir_exit);
>
> If I'm still misunderstanding, please let me know.
>

I think that with the above changes, we should be good ;)

Thanks !