From: Dan Streetman
To: "David S. Miller"
Cc: Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Dan Streetman, Dan Streetman
Subject: [PATCH] net: tcp: close sock if net namespace is exiting
Date: Thu, 18 Jan 2018 16:14:26 -0500
Message-Id: <20180118211426.24441-1-ddstreet@ieee.org>

When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for the FIN sequence.

For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open. However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open. In this case, if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence. The sock's dst(s)
hold a reference to their interface; those references are all transferred
to the namespace's loopback interface when the real interfaces are taken
down. When the namespace tries to take down its loopback interface, it
hangs waiting for all references to the loopback interface to be released,
which results in messages like:

unregister_netdevice: waiting for lo to become free. Usage count = 1

These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.

After this change, the tcp socket notices the exiting net namespace and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman
---
 include/net/net_namespace.h | 10 ++++++++++
 net/ipv4/tcp.c              |  3 +++
 net/ipv4/tcp_timer.c        | 15 +++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index f8a84a2c2341..f306b2aa15a4 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -223,6 +223,11 @@ int net_eq(const struct net *net1, const struct net *net2)
 	return net1 == net2;
 }
 
+static inline int check_net(const struct net *net)
+{
+	return refcount_read(&net->count) != 0;
+}
+
 void net_drop_ns(void *);
 
 #else
@@ -247,6 +252,11 @@ int net_eq(const struct net *net1, const struct net *net2)
 	return 1;
 }
 
+static inline int check_net(const struct net *net)
+{
+	return 1;
+}
+
 #define net_drop_ns NULL
 #endif
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index d7cf861bf699..9389193e73f3 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2298,6 +2298,9 @@ void tcp_close(struct sock *sk, long timeout)
 			tcp_send_active_reset(sk, GFP_ATOMIC);
 			__NET_INC_STATS(sock_net(sk),
 					LINUX_MIB_TCPABORTONMEMORY);
+		} else if (!check_net(sock_net(sk))) {
+			/* Not possible to send reset; just close */
+			tcp_set_state(sk, TCP_CLOSE);
 		}
 	}
 
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 6db3124cdbda..41b40b805aa3 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -48,11 +48,19 @@ static void tcp_write_err(struct sock *sk)
  *  to prevent DoS attacks. It is called when a retransmission timeout
  *  or zero probe timeout occurs on orphaned socket.
  *
+ *  Also close if our net namespace is exiting; in that case there is no
+ *  hope of ever communicating again since all netns interfaces are already
+ *  down (or about to be down), and we need to release our dst references,
+ *  which have been moved to the netns loopback interface, so the namespace
+ *  can finish exiting. This condition is only possible if we are a kernel
+ *  socket, as those do not hold references to the namespace.
+ *
  *  Criteria is still not confirmed experimentally and may change.
  *  We kill the socket, if:
  *  1. If number of orphaned sockets exceeds an administratively configured
  *     limit.
  *  2. If we have strong memory pressure.
+ *  3. If our net namespace is exiting.
  */
 static int tcp_out_of_resources(struct sock *sk, bool do_reset)
 {
@@ -81,6 +89,13 @@ static int tcp_out_of_resources(struct sock *sk, bool do_reset)
 		__NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY);
 		return 1;
 	}
+
+	if (!check_net(sock_net(sk))) {
+		/* Not possible to send reset; just close */
+		tcp_done(sk);
+		return 1;
+	}
+
 	return 0;
 }
 
-- 
2.14.1
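To make the reference-counting argument in the commit message concrete, here is a minimal user-space C model. It is a sketch for illustration, not kernel code. struct model_net, struct model_sock and the model_* helpers are hypothetical names: the real struct net keeps a refcount_t that the patch's check_net() reads with refcount_read(), while this model assumes a plain C11 atomic_int. It shows a normal socket pinning its namespace while a kernel socket does not, so a check_net()-style liveness test can drop to zero while a kernel socket is still open.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct net: only the reference
 * count that check_net() reads is modeled (atomic_int, not refcount_t). */
struct model_net {
	atomic_int count;
};

/* Same test as check_net() in the patch: the namespace counts as alive
 * only while its reference count is non-zero. */
static int model_check_net(struct model_net *net)
{
	return atomic_load(&net->count) != 0;
}

/* Hypothetical socket: a normal (user) socket pins its namespace,
 * a kernel socket does not. */
struct model_sock {
	struct model_net *net;
	bool kern;
};

static void model_sock_create(struct model_sock *sk, struct model_net *net, bool kern)
{
	sk->net = net;
	sk->kern = kern;
	if (!kern)
		atomic_fetch_add(&net->count, 1);	/* user socket takes a ref */
}

static void model_sock_release(struct model_sock *sk)
{
	if (sk->kern)
		return;		/* kernel socket never took a ref, so drops none */
	assert(atomic_load(&sk->net->count) > 0);
	atomic_fetch_sub(&sk->net->count, 1);
}
```

For example, starting from a count of 1 held by the namespace's creator, creating one normal and one kernel socket raises the count to 2. Once the creator's reference and the normal socket are both released, model_check_net() returns 0 although the kernel socket is still open. That is the state in which the patched tcp_close() moves the socket to TCP_CLOSE and tcp_out_of_resources() calls tcp_done() instead of waiting.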
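The extended criteria list in the tcp_out_of_resources() comment can be read as a small decision function. The sketch below is a simplified user-space model under stated assumptions, not the kernel implementation: struct model_orphan_state and model_out_of_resources() are hypothetical, the orphan-limit and memory-pressure checks are reduced to booleans, and the kernel's own reset decision is reduced to the do_reset flag. The order follows the hunk: the resource checks come first, and only then does the new netns check close the socket without a reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the conditions tcp_out_of_resources() weighs. */
struct model_orphan_state {
	bool too_many_orphans;	/* 1. orphans exceed the administratively configured limit */
	bool memory_pressure;	/* 2. strong memory pressure */
	bool netns_alive;	/* 3. stands in for check_net(sock_net(sk)) */
};

/* Returns true when the orphaned socket should be killed; *send_reset
 * reports whether a RST would be sent before closing. */
static bool model_out_of_resources(const struct model_orphan_state *st,
				   bool do_reset, bool *send_reset)
{
	assert(st != NULL && send_reset != NULL);
	*send_reset = false;

	if (st->too_many_orphans || st->memory_pressure) {
		*send_reset = do_reset;	/* resources exhausted: optional RST, then close */
		return true;
	}
	if (!st->netns_alive)
		return true;		/* netns exiting: no reset possible, just close */
	return false;			/* keep the socket and retry later */
}
```

With the namespace gone and no resource pressure, the model kills the socket and reports no reset; with the namespace alive it keeps the socket; under memory pressure it kills the socket and passes do_reset through unchanged.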