From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Florian Westphal, Karel Rericha, Shmulik Ladkani, Eyal Birger,
    Pablo Neira Ayuso, Sasha Levin, kadlec@netfilter.org,
    davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
    netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
    netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 5.17 143/149] netfilter: conntrack: revisit gc autotuning
Date: Fri, 1 Apr 2022 10:25:30 -0400
Message-Id: <20220401142536.1948161-143-sashal@kernel.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220401142536.1948161-1-sashal@kernel.org>
References: <20220401142536.1948161-1-sashal@kernel.org>
MIME-Version: 1.0
X-stable: review
X-Patchwork-Hint: Ignore
Content-Transfer-Encoding: 8bit

From: Florian Westphal

[ Upstream commit 2cfadb761d3d0219412fd8150faea60c7e863833 ]

As of commit 4608fdfc07e1
("netfilter: conntrack: collect all entries in one cycle"),
conntrack gc was changed to run every 2 minutes.

On systems where the conntrack hash table is set to a large value, most
evictions happen from the gc worker rather than the packet path due to
hash table distribution.
This causes netlink event overflows when events are collected.

This change collects the average expiry of scanned entries and
reschedules to the average remaining value, within a 1 to 60 second
interval.

To avoid event overflows, reschedule after each bucket and add a
limit for both run time and number of evictions per run.

If more entries have to be evicted, reschedule and restart 1 jiffy
into the future.

Reported-by: Karel Rericha
Cc: Shmulik Ladkani
Cc: Eyal Birger
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Sasha Levin
---
 net/netfilter/nf_conntrack_core.c | 85 ++++++++++++++++++++++++-------
 1 file changed, 68 insertions(+), 17 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index bf1e17c678f1..7552e1e9fd62 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -67,6 +67,8 @@ EXPORT_SYMBOL_GPL(nf_conntrack_hash);
 struct conntrack_gc_work {
 	struct delayed_work	dwork;
 	u32			next_bucket;
+	u32			avg_timeout;
+	u32			start_time;
 	bool			exiting;
 	bool			early_drop;
 };
@@ -78,8 +80,19 @@ static __read_mostly bool nf_conntrack_locks_all;
 /* serialize hash resizes and nf_ct_iterate_cleanup */
 static DEFINE_MUTEX(nf_conntrack_mutex);
 
-#define GC_SCAN_INTERVAL	(120u * HZ)
+#define GC_SCAN_INTERVAL_MAX	(60ul * HZ)
+#define GC_SCAN_INTERVAL_MIN	(1ul * HZ)
+
+/* clamp timeouts to this value (TCP unacked) */
+#define GC_SCAN_INTERVAL_CLAMP	(300ul * HZ)
+
+/* large initial bias so that we don't scan often just because we have
+ * three entries with a 1s timeout.
+ */
+#define GC_SCAN_INTERVAL_INIT	INT_MAX
+
 #define GC_SCAN_MAX_DURATION	msecs_to_jiffies(10)
+#define GC_SCAN_EXPIRED_MAX	(64000u / HZ)
 
 #define MIN_CHAINLEN	8u
 #define MAX_CHAINLEN	(32u - MIN_CHAINLEN)
@@ -1421,16 +1434,28 @@ static bool gc_worker_can_early_drop(const struct nf_conn *ct)
 
 static void gc_worker(struct work_struct *work)
 {
-	unsigned long end_time = jiffies + GC_SCAN_MAX_DURATION;
 	unsigned int i, hashsz, nf_conntrack_max95 = 0;
-	unsigned long next_run = GC_SCAN_INTERVAL;
+	u32 end_time, start_time = nfct_time_stamp;
 	struct conntrack_gc_work *gc_work;
+	unsigned int expired_count = 0;
+	unsigned long next_run;
+	s32 delta_time;
+
 	gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
 
 	i = gc_work->next_bucket;
 	if (gc_work->early_drop)
 		nf_conntrack_max95 = nf_conntrack_max / 100u * 95u;
 
+	if (i == 0) {
+		gc_work->avg_timeout = GC_SCAN_INTERVAL_INIT;
+		gc_work->start_time = start_time;
+	}
+
+	next_run = gc_work->avg_timeout;
+
+	end_time = start_time + GC_SCAN_MAX_DURATION;
+
 	do {
 		struct nf_conntrack_tuple_hash *h;
 		struct hlist_nulls_head *ct_hash;
@@ -1447,6 +1472,7 @@ static void gc_worker(struct work_struct *work)
 
 		hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) {
 			struct nf_conntrack_net *cnet;
+			unsigned long expires;
 			struct net *net;
 
 			tmp = nf_ct_tuplehash_to_ctrack(h);
@@ -1456,11 +1482,29 @@ static void gc_worker(struct work_struct *work)
 				continue;
 			}
 
+			if (expired_count > GC_SCAN_EXPIRED_MAX) {
+				rcu_read_unlock();
+
+				gc_work->next_bucket = i;
+				gc_work->avg_timeout = next_run;
+
+				delta_time = nfct_time_stamp - gc_work->start_time;
+
+				/* re-sched immediately if total cycle time is exceeded */
+				next_run = delta_time < (s32)GC_SCAN_INTERVAL_MAX;
+				goto early_exit;
+			}
+
 			if (nf_ct_is_expired(tmp)) {
 				nf_ct_gc_expired(tmp);
+				expired_count++;
 				continue;
 			}
 
+			expires = clamp(nf_ct_expires(tmp), GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_CLAMP);
+			next_run += expires;
+			next_run /= 2u;
+
 			if (nf_conntrack_max95 == 0 || gc_worker_skip_ct(tmp))
 				continue;
 
@@ -1478,8 +1522,10 @@ static void gc_worker(struct work_struct *work)
 				continue;
 			}
 
-			if (gc_worker_can_early_drop(tmp))
+			if (gc_worker_can_early_drop(tmp)) {
 				nf_ct_kill(tmp);
+				expired_count++;
+			}
 
 			nf_ct_put(tmp);
 		}
@@ -1492,33 +1538,38 @@ static void gc_worker(struct work_struct *work)
 		cond_resched();
 		i++;
 
-		if (time_after(jiffies, end_time) && i < hashsz) {
+		delta_time = nfct_time_stamp - end_time;
+		if (delta_time > 0 && i < hashsz) {
+			gc_work->avg_timeout = next_run;
 			gc_work->next_bucket = i;
 			next_run = 0;
-			break;
+			goto early_exit;
 		}
 	} while (i < hashsz);
 
+	gc_work->next_bucket = 0;
+
+	next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_MAX);
+
+	delta_time = max_t(s32, nfct_time_stamp - gc_work->start_time, 1);
+	if (next_run > (unsigned long)delta_time)
+		next_run -= delta_time;
+	else
+		next_run = 1;
+
+early_exit:
 	if (gc_work->exiting)
 		return;
 
-	/*
-	 * Eviction will normally happen from the packet path, and not
-	 * from this gc worker.
-	 *
-	 * This worker is only here to reap expired entries when system went
-	 * idle after a busy period.
-	 */
-	if (next_run) {
+	if (next_run)
 		gc_work->early_drop = false;
-		gc_work->next_bucket = 0;
-	}
+
 	queue_delayed_work(system_power_efficient_wq, &gc_work->dwork, next_run);
 }
 
 static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work)
 {
-	INIT_DEFERRABLE_WORK(&gc_work->dwork, gc_worker);
+	INIT_DELAYED_WORK(&gc_work->dwork, gc_worker);
 	gc_work->exiting = false;
 }
 
-- 
2.34.1
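
The rescheduling arithmetic described in the commit message can be tried
outside the kernel. The following is a minimal userspace sketch in C, not
the kernel code: HZ = 1000 and the helper names clamp_ul(), fold_timeout()
and next_delay() are assumptions made for illustration only, while the
GC_SCAN_INTERVAL_* values are copied from the patch.

```c
/* Userspace sketch of the gc rescheduling arithmetic; all values in ticks.
 * HZ = 1000 is an assumption for this sketch, not taken from the patch. */
#define HZ 1000ul

#define GC_SCAN_INTERVAL_MAX   (60ul * HZ)
#define GC_SCAN_INTERVAL_MIN   (1ul * HZ)
#define GC_SCAN_INTERVAL_CLAMP (300ul * HZ)
#define GC_SCAN_INTERVAL_INIT  2147483647ul /* INT_MAX, the large initial bias */

/* Hypothetical stand-in for the kernel's clamp() macro. */
static unsigned long clamp_ul(unsigned long v, unsigned long lo, unsigned long hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Fold one live entry's remaining timeout into the running estimate,
 * mirroring "next_run += expires; next_run /= 2u;" in the patch. */
static unsigned long fold_timeout(unsigned long next_run, unsigned long remaining)
{
	unsigned long expires = clamp_ul(remaining, GC_SCAN_INTERVAL_MIN,
					 GC_SCAN_INTERVAL_CLAMP);

	return (next_run + expires) / 2u;
}

/* Delay before the next full scan once a complete pass has finished:
 * clamp the estimate to [MIN, MAX], then subtract the time the pass took
 * (counted as at least one tick), never returning less than one tick. */
static unsigned long next_delay(unsigned long next_run, unsigned long elapsed)
{
	next_run = clamp_ul(next_run, GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_MAX);
	if (elapsed < 1)
		elapsed = 1;

	return next_run > elapsed ? next_run - elapsed : 1;
}
```

With these definitions, folding three entries with a 1 s timeout into the
initial INT_MAX bias still leaves an estimate far above 60 s, so
next_delay() clamps to GC_SCAN_INTERVAL_MAX. That is the effect the
"large initial bias" comment in the patch describes.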