From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Sowmini Varadhan, Santosh Shilimkar, "David S. Miller"
Subject: [PATCH 4.14 041/159] rds: tcp: correctly sequence cleanup on netns deletion.
Date: Fri, 23 Feb 2018 19:25:49 +0100
Message-Id: <20180223170748.406395347@linuxfoundation.org>
In-Reply-To: <20180223170743.086611315@linuxfoundation.org>
References: <20180223170743.086611315@linuxfoundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sowmini Varadhan

commit 681648e67d43cf269c5590ecf021ed481f4551fc upstream.

Commit 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
introduces a regression in rds-tcp netns cleanup. The cleanup_net(),
(and thus rds_tcp_dev_event notification) is only called from put_net()
when all netns refcounts go to 0, but this cannot happen if the
rds_connection itself is holding a c_net ref that it expects to
release in rds_tcp_kill_sock.

Instead, the rds_tcp_kill_sock callback should make sure to
tear down state carefully, ensuring that the socket teardown
is only done after all data-structures and workqs that depend
on it are quiesced.

The original motivation for commit 8edc3affc077 ("rds: tcp: Take explicit
refcounts on struct net") was to resolve a race condition reported by
syzkaller where workqs for tx/rx/connect were triggered after the
namespace was deleted. Those worker threads should have been
cancelled/flushed before socket tear-down and indeed,
rds_conn_path_destroy() does try to sequence this by doing
     /* cancel cp_send_w */
     /* cancel cp_recv_w */
     /* flush cp_down_w */
     /* free data structures */
Here the "flush cp_down_w" will trigger rds_conn_shutdown and thus
invoke rds_tcp_conn_path_shutdown() to close the tcp socket, so that
we ought to have satisfied the requirement that "socket-close is
done after all other dependent state is quiesced". However,
rds_conn_shutdown has a bug in that it *always* triggers the reconnect
workq (and if connection is successful, we always restart tx/rx
workqs so with the right timing, we risk the race conditions reported
by syzkaller).

Netns deletion is like module teardown- no need to restart a
reconnect in this case. We can use the c_destroy_in_prog bit
to avoid restarting the reconnect.

Fixes: 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

---
 net/rds/connection.c |    3 ++-
 net/rds/rds.h        |    6 +++---
 net/rds/tcp.c        |    4 ++--
 3 files changed, 7 insertions(+), 6 deletions(-)

--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -366,6 +366,8 @@ void rds_conn_shutdown(struct rds_conn_p
 	 * to the conn hash, so we never trigger a reconnect on this
 	 * conn - the reconnect is always triggered by the active peer. */
 	cancel_delayed_work_sync(&cp->cp_conn_w);
+	if (conn->c_destroy_in_prog)
+		return;
 	rcu_read_lock();
 	if (!hlist_unhashed(&conn->c_hash_node)) {
 		rcu_read_unlock();
@@ -445,7 +447,6 @@ void rds_conn_destroy(struct rds_connect
 	 */
 	rds_cong_remove_conn(conn);
 
-	put_net(conn->c_net);
 	kfree(conn->c_path);
 	kmem_cache_free(rds_conn_slab, conn);
 
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -150,7 +150,7 @@ struct rds_connection {
 
 	/* Protocol version */
 	unsigned int		c_version;
-	struct net		*c_net;
+	possible_net_t		c_net;
 
 	struct list_head	c_map_item;
 	unsigned long		c_map_queued;
@@ -165,13 +165,13 @@ struct rds_connection {
 static inline
 struct net *rds_conn_net(struct rds_connection *conn)
 {
-	return conn->c_net;
+	return read_pnet(&conn->c_net);
 }
 
 static inline
 void rds_conn_net_set(struct rds_connection *conn, struct net *net)
 {
-	conn->c_net = get_net(net);
+	write_pnet(&conn->c_net, net);
 }
 
 #define RDS_FLAG_CONG_BITMAP	0x01
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -527,7 +527,7 @@ static void rds_tcp_kill_sock(struct net
 	rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w);
 	spin_lock_irq(&rds_tcp_conn_lock);
 	list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) {
-		struct net *c_net = tc->t_cpath->cp_conn->c_net;
+		struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);
 
 		if (net != c_net || !tc->t_sock)
 			continue;
@@ -586,7 +586,7 @@ static void rds_tcp_sysctl_reset(struct
 
 	spin_lock_irq(&rds_tcp_conn_lock);
 	list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) {
-		struct net *c_net = tc->t_cpath->cp_conn->c_net;
+		struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);
 
 		if (net != c_net || !tc->t_sock)
 			continue;
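
Illustrative note (not part of the patch): the effect of the new
c_destroy_in_prog check in rds_conn_shutdown() can be sketched in plain
user-space C. This is only a rough model under stated assumptions: struct
model_conn, model_queue_reconnect(), model_conn_shutdown() and
reconnects_after_shutdown() are hypothetical names invented for this sketch,
and the reconnect workqueue is reduced to a counter. It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rds_connection; not the kernel type. */
struct model_conn {
	bool destroy_in_prog;	/* models conn->c_destroy_in_prog */
	bool hashed;		/* models !hlist_unhashed(&conn->c_hash_node) */
	int reconnects_queued;	/* counts reconnect requests (models the workq) */
};

/* Models queueing the reconnect work; here it only bumps a counter. */
static void model_queue_reconnect(struct model_conn *conn)
{
	conn->reconnects_queued++;
}

/*
 * Models the tail of rds_conn_shutdown() after the patch: once the
 * transport has been shut down, skip the reconnect entirely when the
 * connection is being destroyed (e.g. on netns deletion).
 */
static void model_conn_shutdown(struct model_conn *conn)
{
	/* ... transport shutdown and cancel of pending connect work ... */
	if (conn->destroy_in_prog)
		return;
	if (conn->hashed)
		model_queue_reconnect(conn);
}

/* Returns how many reconnects one shutdown requests for a hashed conn. */
static int reconnects_after_shutdown(bool destroy_in_prog)
{
	struct model_conn conn = {
		.destroy_in_prog = destroy_in_prog,
		.hashed = true,
		.reconnects_queued = 0,
	};

	assert(conn.reconnects_queued == 0);
	model_conn_shutdown(&conn);
	return conn.reconnects_queued;
}
```

In this model, a shutdown of a live, hashed connection requests one
reconnect, while a shutdown during destruction requests none, which is the
property the commit message relies on so that tx/rx work is not restarted
after the socket has been torn down.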