From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Sowmini Varadhan, Santosh Shilimkar, "David S. Miller"
Subject: [PATCH 4.15 19/45] rds: tcp: correctly sequence cleanup on netns deletion.
Date: Fri, 23 Feb 2018 19:28:58 +0100
Message-Id: <20180223170718.380458441@linuxfoundation.org>
In-Reply-To: <20180223170715.197760019@linuxfoundation.org>
References: <20180223170715.197760019@linuxfoundation.org>

4.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sowmini Varadhan

commit 681648e67d43cf269c5590ecf021ed481f4551fc upstream.

Commit 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
introduces a regression in rds-tcp netns cleanup. cleanup_net() (and
thus the rds_tcp_dev_event notification) is only called from put_net()
when all netns refcounts go to 0, but this cannot happen if the
rds_connection itself is holding a c_net ref that it expects to
release in rds_tcp_kill_sock.

Instead, the rds_tcp_kill_sock callback should make sure to
tear down state carefully, ensuring that the socket teardown
is only done after all data structures and workqs that depend
on it are quiesced.

The original motivation for commit 8edc3affc077 ("rds: tcp: Take explicit
refcounts on struct net") was to resolve a race condition reported by
syzkaller where workqs for tx/rx/connect were triggered after the
namespace was deleted. Those worker threads should have been
cancelled/flushed before socket teardown, and indeed
rds_conn_path_destroy() does try to sequence this by doing

     /* cancel cp_send_w */
     /* cancel cp_recv_w */
     /* flush cp_down_w */
     /* free data structures */

Here the "flush cp_down_w" will trigger rds_conn_shutdown and thus
invoke rds_tcp_conn_path_shutdown() to close the tcp socket, so
we ought to have satisfied the requirement that "socket-close is
done after all other dependent state is quiesced". However,
rds_conn_shutdown has a bug in that it *always* triggers the reconnect
workq (and if the connection is successful, we always restart the tx/rx
workqs, so with the right timing we risk the race conditions reported
by syzkaller).

Netns deletion is like module teardown: there is no need to restart a
reconnect in this case. We can use the c_destroy_in_prog bit
to avoid restarting the reconnect.

Fixes: 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 net/rds/connection.c |    3 ++-
 net/rds/rds.h        |    6 +++---
 net/rds/tcp.c        |    4 ++--
 3 files changed, 7 insertions(+), 6 deletions(-)

--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -366,6 +366,8 @@ void rds_conn_shutdown(struct rds_conn_p
 	 * to the conn hash, so we never trigger a reconnect on this
 	 * conn - the reconnect is always triggered by the active peer. */
 	cancel_delayed_work_sync(&cp->cp_conn_w);
+	if (conn->c_destroy_in_prog)
+		return;
 	rcu_read_lock();
 	if (!hlist_unhashed(&conn->c_hash_node)) {
 		rcu_read_unlock();
@@ -445,7 +447,6 @@ void rds_conn_destroy(struct rds_connect
 	 */
 	rds_cong_remove_conn(conn);
 
-	put_net(conn->c_net);
 	kfree(conn->c_path);
 	kmem_cache_free(rds_conn_slab, conn);
 
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -150,7 +150,7 @@ struct rds_connection {
 
 	/* Protocol version */
 	unsigned int		c_version;
-	struct net		*c_net;
+	possible_net_t		c_net;
 
 	struct list_head	c_map_item;
 	unsigned long		c_map_queued;
@@ -165,13 +165,13 @@ struct rds_connection {
 static inline
 struct net *rds_conn_net(struct rds_connection *conn)
 {
-	return conn->c_net;
+	return read_pnet(&conn->c_net);
 }
 
 static inline
 void rds_conn_net_set(struct rds_connection *conn, struct net *net)
 {
-	conn->c_net = get_net(net);
+	write_pnet(&conn->c_net, net);
 }
 
 #define RDS_FLAG_CONG_BITMAP	0x01
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -528,7 +528,7 @@ static void rds_tcp_kill_sock(struct net
 	rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w);
 	spin_lock_irq(&rds_tcp_conn_lock);
 	list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) {
-		struct net *c_net = tc->t_cpath->cp_conn->c_net;
+		struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);
 
 		if (net != c_net || !tc->t_sock)
 			continue;
@@ -587,7 +587,7 @@ static void rds_tcp_sysctl_reset(struct
 
 	spin_lock_irq(&rds_tcp_conn_lock);
 	list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) {
-		struct net *c_net = tc->t_cpath->cp_conn->c_net;
+		struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);
 
 		if (net != c_net || !tc->t_sock)
 			continue;
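
The effect of the c_destroy_in_prog check described in the commit message
can be illustrated with a small user-space model. This is only a sketch,
not kernel code: struct model_conn, model_conn_shutdown() and
model_shutdown_once() are hypothetical names, and the "a hashed connection
queues a reconnect" branch is an assumption about the code that follows
the hunk, which the diff does not show in full.

```c
/* User-space model only; names mirror the patch but are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct model_conn {
	bool destroy_in_prog;	/* stands in for conn->c_destroy_in_prog */
	bool hashed;		/* stands in for !hlist_unhashed(&conn->c_hash_node) */
	int reconnects_queued;	/* how many reconnects this model has queued */
};

/* Models the tail of rds_conn_shutdown() with the early return applied. */
static void model_conn_shutdown(struct model_conn *conn)
{
	assert(conn);

	/* The real function cancels cp_conn_w at this point. */
	if (conn->destroy_in_prog)
		return;		/* teardown in progress: do not requeue */

	/* Assumption: a connection still in the hash gets a reconnect. */
	if (conn->hashed)
		conn->reconnects_queued++;
}

/* Runs one shutdown on a hashed connection and reports reconnects queued. */
static int model_shutdown_once(bool destroy_in_prog)
{
	struct model_conn conn = {
		.destroy_in_prog = destroy_in_prog,
		.hashed = true,
		.reconnects_queued = 0,
	};

	model_conn_shutdown(&conn);
	return conn.reconnects_queued;
}
```

In this model, model_shutdown_once(false) queues one reconnect, while
model_shutdown_once(true) queues none, which is the behaviour the patch
aims for during netns deletion.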