From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Feras Daoud, Leon Romanovsky, Doug Ledford, Sasha Levin
Subject: [PATCH 4.9 044/177] IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow
Date: Fri, 23 Mar 2018 10:52:52 +0100
Message-Id: <20180323094207.192106795@linuxfoundation.org>
In-Reply-To: <20180323094205.090519271@linuxfoundation.org>
References: <20180323094205.090519271@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Feras Daoud

[ Upstream commit 3e31a490e01a6e67cbe9f6e1df2f3ff0fbf48972 ]

rtnl_lock must be taken before ipoib_stop is called. The flow then
clears the IPOIB_FLAG_ADMIN_UP and IPOIB_FLAG_OPER_UP flags, and waits
for the mcast completion if IPOIB_MCAST_FLAG_BUSY is set.

On the other hand, the multicast join task initializes an mcast
completion, sets IPOIB_MCAST_FLAG_BUSY and calls ipoib_mcast_join. If
the IPOIB_FLAG_OPER_UP flag is not set, this call returns -EINVAL
without ever signalling the mcast completion, which leads to a deadlock.

    ipoib_stop                          |
        |                               |
    clear_bit(IPOIB_FLAG_ADMIN_UP)      |
        |                               |
    Context Switch                      |
        |                       ipoib_mcast_join_task
        |                               |
        |                       spin_lock_irq(lock)
        |                               |
        |                       init_completion(mcast)
        |                               |
        |                       set_bit(IPOIB_MCAST_FLAG_BUSY)
        |                               |
        |                       Context Switch
        |                               |
    clear_bit(IPOIB_FLAG_OPER_UP)       |
        |                               |
    spin_lock_irqsave(lock)             |
        |                               |
    Context Switch                      |
        |                       ipoib_mcast_join
        |                       return (-EINVAL)
        |                               |
        |                       spin_unlock_irq(lock)
        |                               |
        |                       Context Switch
        |                               |
    ipoib_mcast_dev_flush               |
    wait_for_completion(mcast)          |

ipoib_stop will wait for the mcast completion forever and will never
release the rtnl_lock. As a result, a panic occurs with the following
trace:

    [13441.639268] Call Trace:
    [13441.640150]  [] schedule+0x29/0x70
    [13441.641038]  [] schedule_timeout+0x239/0x2d0
    [13441.641914]  [] ? complete+0x47/0x50
    [13441.642765]  [] ? flush_workqueue_prep_pwqs+0x16d/0x200
    [13441.643580]  [] wait_for_completion+0x116/0x170
    [13441.644434]  [] ? wake_up_state+0x20/0x20
    [13441.645293]  [] ipoib_mcast_dev_flush+0x150/0x190 [ib_ipoib]
    [13441.646159]  [] ipoib_ib_dev_down+0x37/0x60 [ib_ipoib]
    [13441.647013]  [] ipoib_stop+0x75/0x150 [ib_ipoib]

Fixes: 08bc327629cb ("IB/ipoib: fix for rare multicast join race condition")
Signed-off-by: Feras Daoud
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -487,6 +487,9 @@ static int ipoib_mcast_join(struct net_d
 	    !test_bit(IPOIB_FLAG_OPER_UP, &priv->flags))
 		return -EINVAL;
 
+	init_completion(&mcast->done);
+	set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+
 	ipoib_dbg_mcast(priv, "joining MGID %pI6\n", mcast->mcmember.mgid.raw);
 
 	rec.mgid = mcast->mcmember.mgid;
@@ -645,8 +648,6 @@ void ipoib_mcast_join_task(struct work_s
 			if (mcast->backoff == 1 ||
 			    time_after_eq(jiffies, mcast->delay_until)) {
 				/* Found the next unjoined group */
-				init_completion(&mcast->done);
-				set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
 				if (ipoib_mcast_join(dev, mcast)) {
 					spin_unlock_irq(&priv->lock);
 					return;
@@ -666,11 +667,9 @@ out:
 		queue_delayed_work(priv->wq, &priv->mcast_task,
 				   delay_until - jiffies);
 	}
-	if (mcast) {
-		init_completion(&mcast->done);
-		set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+	if (mcast)
 		ipoib_mcast_join(dev, mcast);
-	}
+
 	spin_unlock_irq(&priv->lock);
 }
 
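
For reviewers who want to reason about the ordering change without the driver at hand, the following is a minimal userspace sketch of the state involved. It is not kernel code and does not use threads. The `model_*` names, the two structs and the `join_issued` field are hypothetical and exist only for illustration: `busy` stands in for IPOIB_MCAST_FLAG_BUSY, `oper_up` for IPOIB_FLAG_OPER_UP, and `join_issued` for "a join request is in flight, so something will eventually signal the completion".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the device state; not the driver's structure. */
struct model_dev {
	bool oper_up;		/* stands in for IPOIB_FLAG_OPER_UP */
};

/* Hypothetical model of one multicast group. */
struct model_mcast {
	bool busy;		/* stands in for IPOIB_MCAST_FLAG_BUSY */
	bool join_issued;	/* a join is in flight and will signal completion */
};

/*
 * Ordering before the patch: the caller marks the group busy first,
 * and only then does the join path check the OPER_UP flag.
 */
int model_join_before_patch(const struct model_dev *dev, struct model_mcast *m)
{
	m->busy = true;			/* init_completion() + set_bit(BUSY) */
	if (!dev->oper_up)
		return -EINVAL;		/* busy stays set, nothing will complete */
	m->join_issued = true;
	return 0;
}

/*
 * Ordering after the patch: the flag check comes first, so an early
 * -EINVAL leaves the group untouched.
 */
int model_join_after_patch(const struct model_dev *dev, struct model_mcast *m)
{
	if (!dev->oper_up)
		return -EINVAL;
	m->busy = true;			/* init_completion() + set_bit(BUSY) */
	m->join_issued = true;
	return 0;
}

/*
 * Models the flush side: it waits only on busy groups, and that wait
 * can finish only if a join is actually in flight.
 */
bool model_flush_would_hang(const struct model_mcast *m)
{
	return m->busy && !m->join_issued;
}
```

With `oper_up` cleared, the pre-patch ordering leaves a group that is busy but has no join in flight, which is the state in which the flush waits forever in the scenario above. The post-patch ordering returns -EINVAL with the group still idle, so the flush has nothing to wait for.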