Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp4734484pxb; Thu, 14 Oct 2021 10:53:19 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy0JBH4LtwTcegFuIG+PeUznPjwunxrrFcmDrWIx4mWNGUU5dxjr/qDOhBQrPvXwegsAFNr X-Received: by 2002:a17:906:6011:: with SMTP id o17mr435426ejj.157.1634233999010; Thu, 14 Oct 2021 10:53:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1634233999; cv=none; d=google.com; s=arc-20160816; b=DXWyBYnQUKJKNzZQQP4caC0MJK2Uy1wsYZdCYGgOHT4/r06L9ExdRyMuLgomceBi9g zcap35+BObgVa9V11FZeJXtumm3y8MzxWoMmkFKUHflFOp86B1/yfVwNe8Cb6HKA3Khq lvuAOqiF+eDbAkLtSsC0IILArCZ/WRSvMK/uXmkrA326iOGH61nwN/ahL0fuf5fM6hPP BxI0tvfo6EBo8k5mdpmN/cHz//+3IqoSswQfQqYMb37HbGQg2cyLE8z1bs0HzIeSoZp0 Wfe1Zul4DiWfRCa2ISzLlZ7YaFvw7euZ2RG9XU2vK6q4KTzK4sqTtvnnXKyWkESdbKma lwuQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=TgBeuMD0JZtCD02gZgy2lEgMfIDPNJYnDr0bkXHP/IA=; b=T0NEuLXuguj49I9PHAXBVTwNDripsUQPIOZ+UOftbr/j8Qxct46jxv7ZqA++VMsJhl Rp4epl7f9JoG6JYf4BXCkzPoqcM445EU6R+Dom0yPuhvdQBdC6lpF7aW4uXqcrHDMv2P GQFH2x31V+LfiHKllpxkmiLsKdpHrQ+PT3hv8RgWrYqy37YQxjRFCBneKROVXupy6jv2 2Wz5A1kNq4u8UGSCVZgNov+yrq89sbXyJu41VxZrTT749vER4T4p4SOmOP+gTG5CrRHM 2wGrgjBRle9X4/47pF8nq7a/LKQpNIV0WJUpwQh6TH1UJPuBogxfcoSnst5KGvKowh9h khKg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=F1uAcc3I; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id n14si4882978eda.482.2021.10.14.10.52.55; Thu, 14 Oct 2021 10:53:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=F1uAcc3I; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232698AbhJNPDp (ORCPT + 99 others); Thu, 14 Oct 2021 11:03:45 -0400 Received: from mail.kernel.org ([198.145.29.99]:42916 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232720AbhJNPCR (ORCPT ); Thu, 14 Oct 2021 11:02:17 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id C912B6120C; Thu, 14 Oct 2021 14:59:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1634223558; bh=jrcEUgW68Qkqt5B8DOxIH62D8gSCwxGLreQ0NMB8Lc4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=F1uAcc3IAMlRckK2oBCDjeds0nPSRJaYh9fheGN1ePybnt4C6esoRb5TWhK8NhoNG Y1GWpeBsvmuRsDsAzcPfz7qLfeAkN1vGMC5K2BZ+MgvxZrIzWoAp96UyZ7pqM8KIRe qpmkLA1iPogJtTb878FPHsP9er89wvnRIHeeQCD4= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Florian Westphal , Pablo Neira Ayuso , Sasha Levin Subject: [PATCH 5.4 06/16] netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic Date: Thu, 14 Oct 2021 16:54:09 +0200 Message-Id: <20211014145207.525597975@linuxfoundation.org> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211014145207.314256898@linuxfoundation.org> References: <20211014145207.314256898@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Florian Westphal [ Upstream commit 30db406923b9285a9bac06a6af5e74bd6d0f1d06 ] masq_inet6_event is called asynchronously from system work queue, because the inet6 notifier is atomic and nf_iterate_cleanup can sleep. The ipv4 and device notifiers call nf_iterate_cleanup directly. This is legal, but these notifiers are called with RTNL mutex held. A large conntrack table with many devices coming and going will have severe impact on the system usability, with 'ip a' blocking for several seconds. This change places the defer code into a helper and makes it more generic so ipv4 and ifdown notifiers can be converted to defer the cleanup walk as well in a follow patch. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- net/netfilter/nf_nat_masquerade.c | 122 ++++++++++++++++++------------ 1 file changed, 75 insertions(+), 47 deletions(-) diff --git a/net/netfilter/nf_nat_masquerade.c b/net/netfilter/nf_nat_masquerade.c index 8e8a65d46345..415919a6ac1a 100644 --- a/net/netfilter/nf_nat_masquerade.c +++ b/net/netfilter/nf_nat_masquerade.c @@ -9,8 +9,19 @@ #include +struct masq_dev_work { + struct work_struct work; + struct net *net; + union nf_inet_addr addr; + int ifindex; + int (*iter)(struct nf_conn *i, void *data); +}; + +#define MAX_MASQ_WORKER_COUNT 16 + static DEFINE_MUTEX(masq_mutex); static unsigned int masq_refcnt __read_mostly; +static atomic_t masq_worker_count __read_mostly; unsigned int nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum, @@ -63,6 +74,63 @@ nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum, } EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv4); +static void iterate_cleanup_work(struct work_struct *work) +{ + struct masq_dev_work *w; + + w = container_of(work, struct masq_dev_work, work); + + nf_ct_iterate_cleanup_net(w->net, w->iter, (void *)w, 0, 0); + + put_net(w->net); + kfree(w); + atomic_dec(&masq_worker_count); + module_put(THIS_MODULE); +} + +/* Iterate conntrack table in the background and remove conntrack entries + * that use the device/address being removed. + * + * In case too many work items have been queued already or memory allocation + * fails iteration is skipped, conntrack entries will time out eventually. + */ +static void nf_nat_masq_schedule(struct net *net, union nf_inet_addr *addr, + int ifindex, + int (*iter)(struct nf_conn *i, void *data), + gfp_t gfp_flags) +{ + struct masq_dev_work *w; + + if (atomic_read(&masq_worker_count) > MAX_MASQ_WORKER_COUNT) + return; + + net = maybe_get_net(net); + if (!net) + return; + + if (!try_module_get(THIS_MODULE)) + goto err_module; + + w = kzalloc(sizeof(*w), gfp_flags); + if (w) { + /* We can overshoot MAX_MASQ_WORKER_COUNT, no big deal */ + atomic_inc(&masq_worker_count); + + INIT_WORK(&w->work, iterate_cleanup_work); + w->ifindex = ifindex; + w->net = net; + w->iter = iter; + if (addr) + w->addr = *addr; + schedule_work(&w->work); + return; + } + + module_put(THIS_MODULE); + err_module: + put_net(net); +} + static int device_cmp(struct nf_conn *i, void *ifindex) { const struct nf_conn_nat *nat = nfct_nat(i); @@ -136,8 +204,6 @@ static struct notifier_block masq_inet_notifier = { }; #if IS_ENABLED(CONFIG_IPV6) -static atomic_t v6_worker_count __read_mostly; - static int nat_ipv6_dev_get_saddr(struct net *net, const struct net_device *dev, const struct in6_addr *daddr, unsigned int srcprefs, @@ -187,13 +253,6 @@ nf_nat_masquerade_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range, } EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv6); -struct masq_dev_work { - struct work_struct work; - struct net *net; - struct in6_addr addr; - int ifindex; -}; - static int inet6_cmp(struct nf_conn *ct, void *work) { struct masq_dev_work *w = (struct masq_dev_work *)work; @@ -204,21 +263,7 @@ static int inet6_cmp(struct nf_conn *ct, void *work) tuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple; - return ipv6_addr_equal(&w->addr, &tuple->dst.u3.in6); -} - -static void iterate_cleanup_work(struct work_struct *work) -{ - struct masq_dev_work *w; - - w = container_of(work, struct masq_dev_work, work); - - nf_ct_iterate_cleanup_net(w->net, inet6_cmp, (void *)w, 0, 0); - - put_net(w->net); - kfree(w); - atomic_dec(&v6_worker_count); - module_put(THIS_MODULE); + return nf_inet_addr_cmp(&w->addr, &tuple->dst.u3); } /* atomic notifier; can't call nf_ct_iterate_cleanup_net (it can sleep). @@ -233,36 +278,19 @@ static int masq_inet6_event(struct notifier_block *this, { struct inet6_ifaddr *ifa = ptr; const struct net_device *dev; - struct masq_dev_work *w; - struct net *net; + union nf_inet_addr addr; - if (event != NETDEV_DOWN || atomic_read(&v6_worker_count) >= 16) + if (event != NETDEV_DOWN) return NOTIFY_DONE; dev = ifa->idev->dev; - net = maybe_get_net(dev_net(dev)); - if (!net) - return NOTIFY_DONE; - if (!try_module_get(THIS_MODULE)) - goto err_module; + memset(&addr, 0, sizeof(addr)); - w = kmalloc(sizeof(*w), GFP_ATOMIC); - if (w) { - atomic_inc(&v6_worker_count); + addr.in6 = ifa->addr; - INIT_WORK(&w->work, iterate_cleanup_work); - w->ifindex = dev->ifindex; - w->net = net; - w->addr = ifa->addr; - schedule_work(&w->work); - - return NOTIFY_DONE; - } - - module_put(THIS_MODULE); - err_module: - put_net(net); + nf_nat_masq_schedule(dev_net(dev), &addr, dev->ifindex, inet6_cmp, + GFP_ATOMIC); return NOTIFY_DONE; } -- 2.33.0