From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Michal Kubecek, Florian Westphal, Pablo Neira Ayuso, Sasha Levin
Subject: [PATCH 4.19 06/33] netfilter: conntrack: collect all entries in one cycle
Date: Wed, 1 Sep 2021 14:27:55 +0200
Message-Id: <20210901122250.993208505@linuxfoundation.org>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20210901122250.752620302@linuxfoundation.org>
References: <20210901122250.752620302@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Florian Westphal

[ Upstream commit 4608fdfc07e116f9fc0895beb40abad7cdb5ee3d ]

Michal Kubecek reports that conntrack gc is responsible for frequent
wakeups (every 125ms) on idle systems.

On busy systems, timed-out entries are evicted during lookup.
The gc worker is only needed to remove entries after the system becomes
idle following a busy period.

To resolve this, always scan the entire table.  If the scan is taking
too long, reschedule so other work_structs can run, and resume from the
next bucket.

After a completed scan, wait for 2 minutes before the next cycle.
Heuristics for faster re-scheduling are removed.

GC_SCAN_INTERVAL could be exposed as a sysctl in the future to allow
tuning this as needed, or even to turn the gc worker off.
Reported-by: Michal Kubecek
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Sasha Levin
---
 net/netfilter/nf_conntrack_core.c | 71 ++++++++++---------------------
 1 file changed, 22 insertions(+), 49 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index c5590d36b775..a38caf317dbb 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -70,10 +70,9 @@ EXPORT_SYMBOL_GPL(nf_conntrack_hash);
 
 struct conntrack_gc_work {
         struct delayed_work     dwork;
-        u32                     last_bucket;
+        u32                     next_bucket;
         bool                    exiting;
         bool                    early_drop;
-        long                    next_gc_run;
 };
 
 static __read_mostly struct kmem_cache *nf_conntrack_cachep;
@@ -81,12 +80,8 @@ static __read_mostly spinlock_t nf_conntrack_locks_all_lock;
 static __read_mostly DEFINE_SPINLOCK(nf_conntrack_locks_all_lock);
 static __read_mostly bool nf_conntrack_locks_all;
 
-/* every gc cycle scans at most 1/GC_MAX_BUCKETS_DIV part of table */
-#define GC_MAX_BUCKETS_DIV      128u
-/* upper bound of full table scan */
-#define GC_MAX_SCAN_JIFFIES     (16u * HZ)
-/* desired ratio of entries found to be expired */
-#define GC_EVICT_RATIO  50u
+#define GC_SCAN_INTERVAL        (120u * HZ)
+#define GC_SCAN_MAX_DURATION    msecs_to_jiffies(10)
 
 static struct conntrack_gc_work conntrack_gc_work;
 
@@ -1198,17 +1193,13 @@ static void nf_ct_offload_timeout(struct nf_conn *ct)
 
 static void gc_worker(struct work_struct *work)
 {
-        unsigned int min_interval = max(HZ / GC_MAX_BUCKETS_DIV, 1u);
-        unsigned int i, goal, buckets = 0, expired_count = 0;
-        unsigned int nf_conntrack_max95 = 0;
+        unsigned long end_time = jiffies + GC_SCAN_MAX_DURATION;
+        unsigned int i, hashsz, nf_conntrack_max95 = 0;
+        unsigned long next_run = GC_SCAN_INTERVAL;
         struct conntrack_gc_work *gc_work;
-        unsigned int ratio, scanned = 0;
-        unsigned long next_run;
-
         gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
 
-        goal = nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV;
-        i = gc_work->last_bucket;
+        i = gc_work->next_bucket;
         if (gc_work->early_drop)
                 nf_conntrack_max95 = nf_conntrack_max / 100u * 95u;
 
@@ -1216,22 +1207,21 @@ static void gc_worker(struct work_struct *work)
                 struct nf_conntrack_tuple_hash *h;
                 struct hlist_nulls_head *ct_hash;
                 struct hlist_nulls_node *n;
-                unsigned int hashsz;
                 struct nf_conn *tmp;
 
-                i++;
                 rcu_read_lock();
 
                 nf_conntrack_get_ht(&ct_hash, &hashsz);
-                if (i >= hashsz)
-                        i = 0;
+                if (i >= hashsz) {
+                        rcu_read_unlock();
+                        break;
+                }
 
                 hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) {
                         struct net *net;
 
                         tmp = nf_ct_tuplehash_to_ctrack(h);
 
-                        scanned++;
                         if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
                                 nf_ct_offload_timeout(tmp);
                                 continue;
@@ -1239,7 +1229,6 @@
 
                         if (nf_ct_is_expired(tmp)) {
                                 nf_ct_gc_expired(tmp);
-                                expired_count++;
                                 continue;
                         }
 
@@ -1271,7 +1260,14 @@
                  */
                 rcu_read_unlock();
                 cond_resched();
-        } while (++buckets < goal);
+                i++;
+
+                if (time_after(jiffies, end_time) && i < hashsz) {
+                        gc_work->next_bucket = i;
+                        next_run = 0;
+                        break;
+                }
+        } while (i < hashsz);
 
         if (gc_work->exiting)
                 return;
@@ -1282,40 +1278,17 @@
          *
          * This worker is only here to reap expired entries when system went
          * idle after a busy period.
-         *
-         * The heuristics below are supposed to balance conflicting goals:
-         *
-         * 1. Minimize time until we notice a stale entry
-         * 2. Maximize scan intervals to not waste cycles
-         *
-         * Normally, expire ratio will be close to 0.
-         *
-         * As soon as a sizeable fraction of the entries have expired
-         * increase scan frequency.
          */
-        ratio = scanned ? expired_count * 100 / scanned : 0;
-        if (ratio > GC_EVICT_RATIO) {
-                gc_work->next_gc_run = min_interval;
-        } else {
-                unsigned int max = GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV;
-
-                BUILD_BUG_ON((GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV) == 0);
-
-                gc_work->next_gc_run += min_interval;
-                if (gc_work->next_gc_run > max)
-                        gc_work->next_gc_run = max;
+        if (next_run) {
+                gc_work->early_drop = false;
+                gc_work->next_bucket = 0;
         }
-
-        next_run = gc_work->next_gc_run;
-        gc_work->last_bucket = i;
-        gc_work->early_drop = false;
         queue_delayed_work(system_power_efficient_wq, &gc_work->dwork, next_run);
 }
 
 static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work)
 {
         INIT_DEFERRABLE_WORK(&gc_work->dwork, gc_worker);
-        gc_work->next_gc_run = HZ;
         gc_work->exiting = false;
 }
 
-- 
2.30.2
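
[Editor's note] For readers who want the scheduling idea without the kernel context, below is a minimal, self-contained userspace C sketch of what the commit message describes: scan every bucket, break out early when a small time budget is exceeded and remember the next bucket to resume from, reschedule immediately in that case, and back off for the long scan interval only after a complete pass. This is not the kernel code; the names (gc_state, scan_once, reap_bucket) and the constants are invented for illustration only.

/*
 * Userspace sketch of a time-bounded, resumable full-table scan.
 * Build: cc -O2 sketch.c -o sketch
 */
#include <stdio.h>
#include <time.h>

#define HASH_BUCKETS        1024
#define SCAN_INTERVAL_SEC   120                 /* analogous to GC_SCAN_INTERVAL */
#define SCAN_BUDGET_NSEC    (10 * 1000 * 1000)  /* analogous to GC_SCAN_MAX_DURATION */

struct gc_state {
        unsigned int next_bucket;   /* where the next invocation resumes */
};

static long long now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Stand-in for evicting expired entries in one hash bucket. */
static void reap_bucket(unsigned int bucket)
{
        (void)bucket;
}

/*
 * Returns the delay (in seconds) before the worker should run again:
 * 0 if the scan was cut short by the time budget, SCAN_INTERVAL_SEC
 * after a completed pass over the whole table.
 */
static unsigned int scan_once(struct gc_state *gc)
{
        long long deadline = now_ns() + SCAN_BUDGET_NSEC;
        unsigned int i = gc->next_bucket;

        while (i < HASH_BUCKETS) {
                reap_bucket(i);
                i++;

                /* Budget exhausted mid-table: remember where to resume. */
                if (now_ns() > deadline && i < HASH_BUCKETS) {
                        gc->next_bucket = i;
                        return 0;
                }
        }

        /* Full pass done: restart from bucket 0 next time, back off. */
        gc->next_bucket = 0;
        return SCAN_INTERVAL_SEC;
}

int main(void)
{
        struct gc_state gc = { .next_bucket = 0 };
        unsigned int next_run = scan_once(&gc);

        printf("next run in %u s, resume at bucket %u\n",
               next_run, gc.next_bucket);
        return 0;
}

The kernel patch does the same thing with jiffies/time_after() for the budget check, cond_resched() between buckets, and queue_delayed_work() with either 0 or GC_SCAN_INTERVAL as the delay.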