Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756890Ab3CZXCT (ORCPT ); Tue, 26 Mar 2013 19:02:19 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:53426 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752172Ab3CZXCM (ORCPT ); Tue, 26 Mar 2013 19:02:12 -0400 From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Dean Luick , Mike Marciniszyn , Roland Dreier Subject: [ 34/49] IPoIB: Fix send lockup due to missed TX completion Date: Tue, 26 Mar 2013 16:01:31 -0700 Message-Id: <20130326225843.099781572@linuxfoundation.org> X-Mailer: git-send-email 1.8.1.rc1.5.g7e0651a In-Reply-To: <20130326225839.554028294@linuxfoundation.org> References: <20130326225839.554028294@linuxfoundation.org> User-Agent: quilt/0.60-5.1.1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2145 Lines: 57 3.0-stable review patch. If anyone has any objections, please let me know. ------------------ From: Mike Marciniszyn commit 1ee9e2aa7b31427303466776f455d43e5e3c9275 upstream. Commit f0dc117abdfa ("IPoIB: Fix TX queue lockup with mixed UD/CM traffic") attempts to solve an issue where unprocessed UD send completions can deadlock the netdev. The patch doesn't fully resolve the issue because if more than half the tx_outstanding's were UD and all of the destinations are RC reachable, arming the CQ doesn't solve the issue. This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the ib_req_notify_cq(). If the rc is above 0, the UD send cq completion callback is called directly to re-arm the send completion timer. This issue is seen in very large parallel filesystem deployments and the patch has been shown to correct the issue. Reviewed-by: Dean Luick Signed-off-by: Mike Marciniszyn Signed-off-by: Roland Dreier Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -753,9 +753,13 @@ void ipoib_cm_send(struct net_device *de if (++priv->tx_outstanding == ipoib_sendq_size) { ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n", tx->qp->qp_num); - if (ib_req_notify_cq(priv->send_cq, IB_CQ_NEXT_COMP)) - ipoib_warn(priv, "request notify on send CQ failed\n"); netif_stop_queue(dev); + rc = ib_req_notify_cq(priv->send_cq, + IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS); + if (rc < 0) + ipoib_warn(priv, "request notify on send CQ failed\n"); + else if (rc) + ipoib_send_comp_handler(priv->send_cq, dev); } } } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/