Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp4828764pxb; Thu, 14 Oct 2021 12:49:20 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyOG91wLt0zaZu3gSdVNgda5hMq4Av5PqIhCD7SJ+bi9TiocoyaweKvxPouuM8DiNbQYFGS X-Received: by 2002:a17:906:ae14:: with SMTP id le20mr1254408ejb.89.1634240960155; Thu, 14 Oct 2021 12:49:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1634240960; cv=none; d=google.com; s=arc-20160816; b=0FQnXtbGNdHhqQ42goB7CqY6ccYaSHW7cmr60fE119Wm59vlOorNj8cojuIUGxdbBn ONcSPBVpaNQdKZNqZSv+slFd9b5W8ld2kg/yXEqnrPtwqUfa+hK5AkRCBdZceiv/LEiM k6zArRO15RXd912hteHFyzUioVR8tOpMhYBnNqojZQ8l9pF64K+UUfNLqcfLbMpcL/lv ywxgaB4qN3O5zIAlX0RuIX9EZgvXqphFHWq6V9rZ4Z9DGznuHIXBVekwLK8wBgGptB4A IU+2cv4ryqw30atfFWG1yZmJYW16EWK2jooD4RyeW7qROi3Oisd+ibc9eMJGKE/UEt2y 1xig== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=TgBeuMD0JZtCD02gZgy2lEgMfIDPNJYnDr0bkXHP/IA=; b=fk56kXU28bpzsm0/kcpv0QoII+r4aCbS6gAWSZLHnsSl6FTXdbVKszmA201GBwZAb7 qTLKOC0yRW99a58Nkg10ntsqBxu7795GUXuhEoehffE6SI8JGqsFNQ/pA6g57cg2lQgv v1oQacRpwqU1MUvLEONxgWSkP+wAfHjvRFXN49CEh4jRyNsyMAO6PrG3Dr3VwdkIO385 b5PIDQrE6oujsoI2XACGz/oW35E3I/x67apXksfBJSOVfcsAqgY4pwgIn8ImkZ/jijhM uoC5wA9u0XM2gxpNKD9U2AAz6f8CUlts9/jhNAIBMHrAThRo8rqAuXuKvy8peOjj9c8j 6IrA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=T3paWbfL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s11si4671366edj.334.2021.10.14.12.48.56; Thu, 14 Oct 2021 12:49:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=T3paWbfL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233550AbhJNPGF (ORCPT + 99 others); Thu, 14 Oct 2021 11:06:05 -0400 Received: from mail.kernel.org ([198.145.29.99]:51704 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233316AbhJNPEZ (ORCPT ); Thu, 14 Oct 2021 11:04:25 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id A0F6E611C0; Thu, 14 Oct 2021 15:00:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1634223643; bh=jrcEUgW68Qkqt5B8DOxIH62D8gSCwxGLreQ0NMB8Lc4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=T3paWbfLryQzgo6AH4/dUdVwuGdqDxCm2ldyIwVyhp7/hLUwlTi4fHfoW29bXSrdn isuLr3TFMPsU/Y7/WWM5sEmzYNklhOSEHBZdEMHsQiycdfG5dkT/LOsGlchpBwbTR/ DmJqDCZZLkBoR7eBFNQBXmTA4L66mY0l8FIrMSYI= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Florian Westphal , Pablo Neira Ayuso , Sasha Levin Subject: [PATCH 5.14 12/30] netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic Date: Thu, 14 Oct 2021 16:54:17 +0200 Message-Id: <20211014145209.928413054@linuxfoundation.org> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211014145209.520017940@linuxfoundation.org> References: <20211014145209.520017940@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Florian Westphal [ Upstream commit 30db406923b9285a9bac06a6af5e74bd6d0f1d06 ] masq_inet6_event is called asynchronously from system work queue, because the inet6 notifier is atomic and nf_iterate_cleanup can sleep. The ipv4 and device notifiers call nf_iterate_cleanup directly. This is legal, but these notifiers are called with RTNL mutex held. A large conntrack table with many devices coming and going will have severe impact on the system usability, with 'ip a' blocking for several seconds. This change places the defer code into a helper and makes it more generic so ipv4 and ifdown notifiers can be converted to defer the cleanup walk as well in a follow patch. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- net/netfilter/nf_nat_masquerade.c | 122 ++++++++++++++++++------------ 1 file changed, 75 insertions(+), 47 deletions(-) diff --git a/net/netfilter/nf_nat_masquerade.c b/net/netfilter/nf_nat_masquerade.c index 8e8a65d46345..415919a6ac1a 100644 --- a/net/netfilter/nf_nat_masquerade.c +++ b/net/netfilter/nf_nat_masquerade.c @@ -9,8 +9,19 @@ #include +struct masq_dev_work { + struct work_struct work; + struct net *net; + union nf_inet_addr addr; + int ifindex; + int (*iter)(struct nf_conn *i, void *data); +}; + +#define MAX_MASQ_WORKER_COUNT 16 + static DEFINE_MUTEX(masq_mutex); static unsigned int masq_refcnt __read_mostly; +static atomic_t masq_worker_count __read_mostly; unsigned int nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum, @@ -63,6 +74,63 @@ nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum, } EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv4); +static void iterate_cleanup_work(struct work_struct *work) +{ + struct masq_dev_work *w; + + w = container_of(work, struct masq_dev_work, work); + + nf_ct_iterate_cleanup_net(w->net, w->iter, (void *)w, 0, 0); + + put_net(w->net); + kfree(w); + atomic_dec(&masq_worker_count); + module_put(THIS_MODULE); +} + +/* Iterate conntrack table in the background and remove conntrack entries + * that use the device/address being removed. + * + * In case too many work items have been queued already or memory allocation + * fails iteration is skipped, conntrack entries will time out eventually. + */ +static void nf_nat_masq_schedule(struct net *net, union nf_inet_addr *addr, + int ifindex, + int (*iter)(struct nf_conn *i, void *data), + gfp_t gfp_flags) +{ + struct masq_dev_work *w; + + if (atomic_read(&masq_worker_count) > MAX_MASQ_WORKER_COUNT) + return; + + net = maybe_get_net(net); + if (!net) + return; + + if (!try_module_get(THIS_MODULE)) + goto err_module; + + w = kzalloc(sizeof(*w), gfp_flags); + if (w) { + /* We can overshoot MAX_MASQ_WORKER_COUNT, no big deal */ + atomic_inc(&masq_worker_count); + + INIT_WORK(&w->work, iterate_cleanup_work); + w->ifindex = ifindex; + w->net = net; + w->iter = iter; + if (addr) + w->addr = *addr; + schedule_work(&w->work); + return; + } + + module_put(THIS_MODULE); + err_module: + put_net(net); +} + static int device_cmp(struct nf_conn *i, void *ifindex) { const struct nf_conn_nat *nat = nfct_nat(i); @@ -136,8 +204,6 @@ static struct notifier_block masq_inet_notifier = { }; #if IS_ENABLED(CONFIG_IPV6) -static atomic_t v6_worker_count __read_mostly; - static int nat_ipv6_dev_get_saddr(struct net *net, const struct net_device *dev, const struct in6_addr *daddr, unsigned int srcprefs, @@ -187,13 +253,6 @@ nf_nat_masquerade_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range, } EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv6); -struct masq_dev_work { - struct work_struct work; - struct net *net; - struct in6_addr addr; - int ifindex; -}; - static int inet6_cmp(struct nf_conn *ct, void *work) { struct masq_dev_work *w = (struct masq_dev_work *)work; @@ -204,21 +263,7 @@ static int inet6_cmp(struct nf_conn *ct, void *work) tuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple; - return ipv6_addr_equal(&w->addr, &tuple->dst.u3.in6); -} - -static void iterate_cleanup_work(struct work_struct *work) -{ - struct masq_dev_work *w; - - w = container_of(work, struct masq_dev_work, work); - - nf_ct_iterate_cleanup_net(w->net, inet6_cmp, (void *)w, 0, 0); - - put_net(w->net); - kfree(w); - atomic_dec(&v6_worker_count); - module_put(THIS_MODULE); + return nf_inet_addr_cmp(&w->addr, &tuple->dst.u3); } /* atomic notifier; can't call nf_ct_iterate_cleanup_net (it can sleep). @@ -233,36 +278,19 @@ static int masq_inet6_event(struct notifier_block *this, { struct inet6_ifaddr *ifa = ptr; const struct net_device *dev; - struct masq_dev_work *w; - struct net *net; + union nf_inet_addr addr; - if (event != NETDEV_DOWN || atomic_read(&v6_worker_count) >= 16) + if (event != NETDEV_DOWN) return NOTIFY_DONE; dev = ifa->idev->dev; - net = maybe_get_net(dev_net(dev)); - if (!net) - return NOTIFY_DONE; - if (!try_module_get(THIS_MODULE)) - goto err_module; + memset(&addr, 0, sizeof(addr)); - w = kmalloc(sizeof(*w), GFP_ATOMIC); - if (w) { - atomic_inc(&v6_worker_count); + addr.in6 = ifa->addr; - INIT_WORK(&w->work, iterate_cleanup_work); - w->ifindex = dev->ifindex; - w->net = net; - w->addr = ifa->addr; - schedule_work(&w->work); - - return NOTIFY_DONE; - } - - module_put(THIS_MODULE); - err_module: - put_net(net); + nf_nat_masq_schedule(dev_net(dev), &addr, dev->ifindex, inet6_cmp, + GFP_ATOMIC); return NOTIFY_DONE; } -- 2.33.0