Received: by 2002:a05:6512:3d0e:0:0:0:0 with SMTP id d14csp48265lfv; Tue, 12 Apr 2022 16:46:59 -0700 (PDT) X-Google-Smtp-Source: ABdhPJw4GNmzgsq3vYQYJjgGbdsluT3XXwm56Nm1zoOJq1+NLp3KeLPLXsVBByDUlIDFAbZHu6hO X-Received: by 2002:a17:902:9307:b0:154:78ba:ed40 with SMTP id bc7-20020a170902930700b0015478baed40mr39546235plb.92.1649807219082; Tue, 12 Apr 2022 16:46:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1649807219; cv=none; d=google.com; s=arc-20160816; b=JZwQKM4L7l43iiLDKP/8SKOyU3xq8l3utsKtTQrMDaAsg7NiylXRzb70om5jRw0sGQ rzs1VsC5FWtT7GHYoxdLfS4QuNpPMvKadRo5Dukl+2/FPtO9EmciKIHL4MQSSR1197vy Zi77mR2e56swqUV3jit6b5EKfmGipyU3BcC7ZIo+DgwrIFxhWqtCxFTzZVsq8RP+oELK vZqzATBPp4VIXleypHFnLqLJ7SZMQ0OW9Lu9yCGP6jMfoV9rmFQxyaIMDyFaHyo2MMP8 e2BH1stXVASTwjlsm/+WlKQNI4X/35917SauTfFzAnuclCq9dQ/zsT/HW8MfJd2IfqoS iAWA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=h8dcCl5oxK4whR5L7xZKF/Zp4mnNG9r5+1k0xpAGJy0=; b=u0D83U5OSXE4yeNhr0P1ym5/4MNnluRcw9+fH4ehp6RfMItiekE+xEotcjcqp07/UX xbImYfA8vSnVsSdpvzxS8Ag1HADJxOdqYvyeWXAv9riYbVyUBEjmUiBW9aXRYlVZOyzH RBiSXWQ9NFqUj8MIAMqumJpk/9eTtesqoWJi20p4VFbM61CnOqNZoTlGkpGGolaOMSUg w2jX0EJuv2xnsaI7JSLu2bRP5V/8YG+ca8n8fTsvwEfj4SCr4hIajg72FrfyDJMn6g49 M36jkLNTF56T1QjeuKA6kwn3FZlQBBAcjGym+d6nv/j/+T2aBpIJZHRPNilzwViv+Ra7 0Jvw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b="lKvfQ/Fs"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id m14-20020a170902bb8e00b00156999baa94si12544298pls.435.2022.04.12.16.46.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Apr 2022 16:46:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b="lKvfQ/Fs"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 4CA3912F16F; Tue, 12 Apr 2022 14:39:18 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384581AbiDLImE (ORCPT + 99 others); Tue, 12 Apr 2022 04:42:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45812 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357241AbiDLHjx (ORCPT ); Tue, 12 Apr 2022 03:39:53 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 74978DB1; Tue, 12 Apr 2022 00:14:12 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 2D937B81B4F; Tue, 12 Apr 2022 07:14:11 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 92C64C385A1; Tue, 12 Apr 2022 07:14:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1649747650; bh=qTmYQrOjWDd3Zahu3bZmLVJ06wcEPA21y1qIWN7atiA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=lKvfQ/Fs/BiNRpDgJPMA5u8a77nTTB/e9EViRGtdZERjVaU9K6C+CX2tCfSir/WwO PO7yW9Sp2bc5yapK7BN8j7JtXYC4+rsSzK9TDc01cl9GAS2HfdMUIwxmB1FS7oGwec BKeP88ltEe5fADP04UzGbPs6TzOjkORDNU02Hzs4= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Karel Rericha , Shmulik Ladkani , Eyal Birger , Florian Westphal , Pablo Neira Ayuso , Sasha Levin Subject: [PATCH 5.17 149/343] netfilter: conntrack: revisit gc autotuning Date: Tue, 12 Apr 2022 08:29:27 +0200 Message-Id: <20220412062955.680707128@linuxfoundation.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220412062951.095765152@linuxfoundation.org> References: <20220412062951.095765152@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Florian Westphal [ Upstream commit 2cfadb761d3d0219412fd8150faea60c7e863833 ] as of commit 4608fdfc07e1 ("netfilter: conntrack: collect all entries in one cycle") conntrack gc was changed to run every 2 minutes. On systems where conntrack hash table is set to large value, most evictions happen from gc worker rather than the packet path due to hash table distribution. This causes netlink event overflows when events are collected. This change collects average expiry of scanned entries and reschedules to the average remaining value, within 1 to 60 second interval. To avoid event overflows, reschedule after each bucket and add a limit for both run time and number of evictions per run. If more entries have to be evicted, reschedule and restart 1 jiffy into the future. Reported-by: Karel Rericha Cc: Shmulik Ladkani Cc: Eyal Birger Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- net/netfilter/nf_conntrack_core.c | 85 ++++++++++++++++++++++++------- 1 file changed, 68 insertions(+), 17 deletions(-) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index bf1e17c678f1..7552e1e9fd62 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -67,6 +67,8 @@ EXPORT_SYMBOL_GPL(nf_conntrack_hash); struct conntrack_gc_work { struct delayed_work dwork; u32 next_bucket; + u32 avg_timeout; + u32 start_time; bool exiting; bool early_drop; }; @@ -78,8 +80,19 @@ static __read_mostly bool nf_conntrack_locks_all; /* serialize hash resizes and nf_ct_iterate_cleanup */ static DEFINE_MUTEX(nf_conntrack_mutex); -#define GC_SCAN_INTERVAL (120u * HZ) +#define GC_SCAN_INTERVAL_MAX (60ul * HZ) +#define GC_SCAN_INTERVAL_MIN (1ul * HZ) + +/* clamp timeouts to this value (TCP unacked) */ +#define GC_SCAN_INTERVAL_CLAMP (300ul * HZ) + +/* large initial bias so that we don't scan often just because we have + * three entries with a 1s timeout. + */ +#define GC_SCAN_INTERVAL_INIT INT_MAX + #define GC_SCAN_MAX_DURATION msecs_to_jiffies(10) +#define GC_SCAN_EXPIRED_MAX (64000u / HZ) #define MIN_CHAINLEN 8u #define MAX_CHAINLEN (32u - MIN_CHAINLEN) @@ -1421,16 +1434,28 @@ static bool gc_worker_can_early_drop(const struct nf_conn *ct) static void gc_worker(struct work_struct *work) { - unsigned long end_time = jiffies + GC_SCAN_MAX_DURATION; unsigned int i, hashsz, nf_conntrack_max95 = 0; - unsigned long next_run = GC_SCAN_INTERVAL; + u32 end_time, start_time = nfct_time_stamp; struct conntrack_gc_work *gc_work; + unsigned int expired_count = 0; + unsigned long next_run; + s32 delta_time; + gc_work = container_of(work, struct conntrack_gc_work, dwork.work); i = gc_work->next_bucket; if (gc_work->early_drop) nf_conntrack_max95 = nf_conntrack_max / 100u * 95u; + if (i == 0) { + gc_work->avg_timeout = GC_SCAN_INTERVAL_INIT; + gc_work->start_time = start_time; + } + + next_run = gc_work->avg_timeout; + + end_time = start_time + GC_SCAN_MAX_DURATION; + do { struct nf_conntrack_tuple_hash *h; struct hlist_nulls_head *ct_hash; @@ -1447,6 +1472,7 @@ static void gc_worker(struct work_struct *work) hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) { struct nf_conntrack_net *cnet; + unsigned long expires; struct net *net; tmp = nf_ct_tuplehash_to_ctrack(h); @@ -1456,11 +1482,29 @@ static void gc_worker(struct work_struct *work) continue; } + if (expired_count > GC_SCAN_EXPIRED_MAX) { + rcu_read_unlock(); + + gc_work->next_bucket = i; + gc_work->avg_timeout = next_run; + + delta_time = nfct_time_stamp - gc_work->start_time; + + /* re-sched immediately if total cycle time is exceeded */ + next_run = delta_time < (s32)GC_SCAN_INTERVAL_MAX; + goto early_exit; + } + if (nf_ct_is_expired(tmp)) { nf_ct_gc_expired(tmp); + expired_count++; continue; } + expires = clamp(nf_ct_expires(tmp), GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_CLAMP); + next_run += expires; + next_run /= 2u; + if (nf_conntrack_max95 == 0 || gc_worker_skip_ct(tmp)) continue; @@ -1478,8 +1522,10 @@ static void gc_worker(struct work_struct *work) continue; } - if (gc_worker_can_early_drop(tmp)) + if (gc_worker_can_early_drop(tmp)) { nf_ct_kill(tmp); + expired_count++; + } nf_ct_put(tmp); } @@ -1492,33 +1538,38 @@ static void gc_worker(struct work_struct *work) cond_resched(); i++; - if (time_after(jiffies, end_time) && i < hashsz) { + delta_time = nfct_time_stamp - end_time; + if (delta_time > 0 && i < hashsz) { + gc_work->avg_timeout = next_run; gc_work->next_bucket = i; next_run = 0; - break; + goto early_exit; } } while (i < hashsz); + gc_work->next_bucket = 0; + + next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_MAX); + + delta_time = max_t(s32, nfct_time_stamp - gc_work->start_time, 1); + if (next_run > (unsigned long)delta_time) + next_run -= delta_time; + else + next_run = 1; + +early_exit: if (gc_work->exiting) return; - /* - * Eviction will normally happen from the packet path, and not - * from this gc worker. - * - * This worker is only here to reap expired entries when system went - * idle after a busy period. - */ - if (next_run) { + if (next_run) gc_work->early_drop = false; - gc_work->next_bucket = 0; - } + queue_delayed_work(system_power_efficient_wq, &gc_work->dwork, next_run); } static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work) { - INIT_DEFERRABLE_WORK(&gc_work->dwork, gc_worker); + INIT_DELAYED_WORK(&gc_work->dwork, gc_worker); gc_work->exiting = false; } -- 2.35.1