From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Dan Streetman, "David S. Miller"
Subject: [PATCH 4.14 32/71] net: tcp: close sock if net namespace is exiting
Date: Mon, 29 Jan 2018 13:57:00 +0100
Message-Id: <20180129123829.368488430@linuxfoundation.org>
In-Reply-To: <20180129123827.271171825@linuxfoundation.org>
References: <20180129123827.271171825@linuxfoundation.org>

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Streetman

[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]

When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.

For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open. However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open. In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence. The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:

unregister_netdevice: waiting for lo to become free. Usage count = 1

These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.

After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 include/net/net_namespace.h |   10 ++++++++++
 net/ipv4/tcp.c              |    3 +++
 net/ipv4/tcp_timer.c        |   15 +++++++++++++++
 3 files changed, 28 insertions(+)

--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -223,6 +223,11 @@ int net_eq(const struct net *net1, const
 	return net1 == net2;
 }
 
+static inline int check_net(const struct net *net)
+{
+	return atomic_read(&net->count) != 0;
+}
+
 void net_drop_ns(void *);
 
 #else
@@ -246,6 +251,11 @@ int net_eq(const struct net *net1, const
 {
 	return 1;
 }
+
+static inline int check_net(const struct net *net)
+{
+	return 1;
+}
 #define net_drop_ns NULL
 #endif
 
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2273,6 +2273,9 @@ adjudge_to_death:
 			tcp_send_active_reset(sk, GFP_ATOMIC);
 			__NET_INC_STATS(sock_net(sk),
 					LINUX_MIB_TCPABORTONMEMORY);
+		} else if (!check_net(sock_net(sk))) {
+			/* Not possible to send reset; just close */
+			tcp_set_state(sk, TCP_CLOSE);
 		}
 	}
 
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -50,11 +50,19 @@ static void tcp_write_err(struct sock *s
  *  to prevent DoS attacks. It is called when a retransmission timeout
  *  or zero probe timeout occurs on orphaned socket.
  *
+ *  Also close if our net namespace is exiting; in that case there is no
+ *  hope of ever communicating again since all netns interfaces are already
+ *  down (or about to be down), and we need to release our dst references,
+ *  which have been moved to the netns loopback interface, so the namespace
+ *  can finish exiting.  This condition is only possible if we are a kernel
+ *  socket, as those do not hold references to the namespace.
+ *
  *  Criteria is still not confirmed experimentally and may change.
  *  We kill the socket, if:
  *  1. If number of orphaned sockets exceeds an administratively configured
  *     limit.
  *  2. If we have strong memory pressure.
+ *  3. If our net namespace is exiting.
  */
 static int tcp_out_of_resources(struct sock *sk, bool do_reset)
 {
@@ -83,6 +91,13 @@ static int tcp_out_of_resources(struct s
 		__NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY);
 		return 1;
 	}
+
+	if (!check_net(sock_net(sk))) {
+		/* Not possible to send reset; just close */
+		tcp_done(sk);
+		return 1;
+	}
+
 	return 0;
 }
 
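
The liveness test added by this patch can be sketched outside the kernel.
What follows is a minimal userspace model, not kernel code: struct
model_net, struct model_sock, model_check_net() and
model_out_of_resources() are hypothetical names invented for this
illustration, and a C11 atomic_int stands in for net->count. It assumes
only what the patch shows: check_net() reports a live namespace while the
count is nonzero, and the timer path closes the socket once the count has
reached zero.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct model_net {
	atomic_int count;	/* models net->count: users keeping the netns alive */
};

enum model_state { MODEL_FIN_WAIT, MODEL_CLOSE };

struct model_sock {
	struct model_net *net;	/* namespace the socket lives in (no reference held) */
	enum model_state state;	/* simplified TCP state */
	bool holds_lo_ref;	/* models a dst reference pinned on the loopback device */
};

/* Mirrors check_net() from the patch: nonzero while the namespace is alive. */
static inline int model_check_net(struct model_net *net)
{
	return atomic_load(&net->count) != 0;
}

/*
 * Models only the branch added to tcp_out_of_resources(): if the namespace
 * is exiting, close at once and drop the loopback reference.
 * Returns 1 if the socket was closed, 0 if it was left alone.
 */
static int model_out_of_resources(struct model_sock *sk)
{
	if (!model_check_net(sk->net)) {
		/* Not possible to send reset; just close */
		sk->state = MODEL_CLOSE;
		sk->holds_lo_ref = false;
		return 1;
	}
	return 0;
}
```

While the modelled count is 1, model_out_of_resources() returns 0 and the
socket keeps its state. Once the count is stored as 0 it returns 1, the
state becomes MODEL_CLOSE, and the modelled loopback reference is dropped,
which corresponds to the step that lets the namespace finish exiting.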