Received: by 2002:ac0:a581:0:0:0:0:0 with SMTP id m1-v6csp3061425imm;
        Sun, 1 Jul 2018 11:20:03 -0700 (PDT)
X-Google-Smtp-Source: AAOMgpfvof7ynZpNu9QwEO5jzdMmxgPDGSVYyrBpISjwm6q924j+IADEDpznkXYrnnKnvyyj3mvj
X-Received: by 2002:a62:2605:: with SMTP id m5-v6mr22536273pfm.223.1530469203746;
        Sun, 01 Jul 2018 11:20:03 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1530469203; cv=none;
        d=google.com; s=arc-20160816;
        b=pGOpQkdjt6nw126YlCCF1o8h5bLTm52EakTENZ8cn9lF4sPUarlXs5X75RT7ziVYYt
         MP7oS9cvFpbiArYbb3YvceKeOxe+NxAQEBgukq2p3qaldTDRqEYFrKRVpjEYbSJ5OIo6
         5wFEdnW+vK1EfhBIkLHRCy5jOal7n9HplJ8UO9J72l6IYTsQ9Dmv316P4khgupXE/Qjt
         O2YoOAl5d4PPeQjZWcZYK1JU3fviIjYbj+j2os5oGcrByEIeutS7vKgjEDp8pdcITRaT
         SQ1PAeWrVfMWlVnIkIVL8MVlugTRfUZrNqVWOy4BwyXRmGolgMwyItYEjJ1GnQJzSiRB
         +hYw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
        h=list-id:precedence:sender:mime-version:user-agent:references
         :in-reply-to:message-id:date:subject:cc:to:from
         :arc-authentication-results;
        bh=K6GQlYTy2UNqbcNT2j5j5du3Pk215bQBuDI7C0sAdcU=;
        b=B6kwYxomjuuDP7Ugkl28f2se/TwywCPjJGzSZ5dog5F9P33VIx4R/HmIb6dIayROl3
         l6SCma5YB1SIiZ8N3r7BKiJe1bm7mttR6LAH4IKZZmwn5wSbH9Lt7ROaSQjoR7H4ry3t
         nm1+VW3p6LAIXTRt94Ki83ePaqv06StP+zwrnLeA1NhgBwIO87vTB78HrL8h9fvnKmaN
         dmnCMXJCCGG1BHF0s4lL4l/1PjgtrDOP4C0rMMqPqYFZ3AaWfj+WanLltks+iROS0pf7
         xSWxLgYdeYzmwY2w+QIvss1ZcyrhAQQ/qRlhM25rMo3UskBaDK31jYScC7VndSWxxKK3
         gnHA==
ARC-Authentication-Results: i=1; mx.google.com;
        spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org
Return-Path:
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
        by mx.google.com with ESMTP id h8-v6si13777582pls.69.2018.07.01.11.19.49;
        Sun, 01 Jul 2018 11:20:03 -0700 (PDT)
Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67;
Authentication-Results: mx.google.com;
        spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S965255AbeGASSY (ORCPT + 99 others);
        Sun, 1 Jul 2018 14:18:24 -0400
Received: from mail.linuxfoundation.org ([140.211.169.12]:33730 "EHLO
        mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S965234AbeGAQZr (ORCPT );
        Sun, 1 Jul 2018 12:25:47 -0400
Received: from localhost (LFbn-1-12247-202.w90-92.abo.wanadoo.fr [90.92.61.202])
        by mail.linuxfoundation.org (Postfix) with ESMTPSA id 186EEACD;
        Sun, 1 Jul 2018 16:25:46 +0000 (UTC)
From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org,
        Erez Shitrit, Leon Romanovsky, Jason Gunthorpe
Subject: [PATCH 4.9 036/101] IB/mlx5: Fetch soft WQEs on fatal error state
Date: Sun, 1 Jul 2018 18:21:22 +0200
Message-Id: <20180701160758.608940997@linuxfoundation.org>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180701160757.138608453@linuxfoundation.org>
References: <20180701160757.138608453@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Erez Shitrit

commit 7b74a83cf54a3747e22c57e25712bd70eef8acee upstream.

On a fatal error the driver simulates CQEs for ULPs that rely on
completion of all their posted work requests.

For GSI traffic, mlx5 has its own mechanism that sends the completions
via software CQEs directly to the relevant CQ.

This should be kept in the fatal error state too, so the driver should
simulate such CQEs with the specified error state in order to complete
GSI QP work requests.

Without the fix, the following deadlock might appear:
        schedule_timeout+0x274/0x350
        wait_for_common+0xec/0x240
        mcast_remove_one+0xd0/0x120 [ib_core]
        ib_unregister_device+0x12c/0x230 [ib_core]
        mlx5_ib_remove+0xc4/0x270 [mlx5_ib]
        mlx5_detach_device+0x184/0x1a0 [mlx5_core]
        mlx5_unload_one+0x308/0x340 [mlx5_core]
        mlx5_pci_err_detected+0x74/0xe0 [mlx5_core]

Cc: # 4.7
Fixes: 89ea94a7b6c4 ("IB/mlx5: Reset flow support for IB kernel ULPs")
Signed-off-by: Erez Shitrit
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

---
 drivers/infiniband/hw/mlx5/cq.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -645,7 +645,7 @@ repoll:
 }
 
 static int poll_soft_wc(struct mlx5_ib_cq *cq, int num_entries,
-			struct ib_wc *wc)
+			struct ib_wc *wc, bool is_fatal_err)
 {
 	struct mlx5_ib_dev *dev = to_mdev(cq->ibcq.device);
 	struct mlx5_ib_wc *soft_wc, *next;
@@ -658,6 +658,10 @@ static int poll_soft_wc(struct mlx5_ib_c
 		mlx5_ib_dbg(dev, "polled software generated completion on CQ 0x%x\n",
 			    cq->mcq.cqn);
 
+		if (unlikely(is_fatal_err)) {
+			soft_wc->wc.status = IB_WC_WR_FLUSH_ERR;
+			soft_wc->wc.vendor_err = MLX5_CQE_SYNDROME_WR_FLUSH_ERR;
+		}
 		wc[npolled++] = soft_wc->wc;
 		list_del(&soft_wc->list);
 		kfree(soft_wc);
@@ -678,12 +682,17 @@ int mlx5_ib_poll_cq(struct ib_cq *ibcq,
 
 	spin_lock_irqsave(&cq->lock, flags);
 	if (mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) {
-		mlx5_ib_poll_sw_comp(cq, num_entries, wc, &npolled);
+		/* make sure no soft wqe's are waiting */
+		if (unlikely(!list_empty(&cq->wc_list)))
+			soft_polled = poll_soft_wc(cq, num_entries, wc, true);
+
+		mlx5_ib_poll_sw_comp(cq, num_entries - soft_polled,
+				     wc + soft_polled, &npolled);
 		goto out;
 	}
 
 	if (unlikely(!list_empty(&cq->wc_list)))
-		soft_polled = poll_soft_wc(cq, num_entries, wc);
+		soft_polled = poll_soft_wc(cq, num_entries, wc, false);
 
 	for (npolled = 0; npolled < num_entries - soft_polled; npolled++) {
 		if (mlx5_poll_one(cq, &cur_qp, wc + soft_polled + npolled))
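

For readers who want to see the polling order outside the driver, here
is a minimal userspace sketch of the fatal-error path described above.
It is an illustration only, not the mlx5 code: struct cq_model,
poll_soft(), poll_flushed() and poll_cq_fatal() are hypothetical names,
and the fixed-size array stands in for the kernel's linked list of soft
completions. It shows the two points the patch relies on: queued soft
completions are drained first with their status rewritten to a flush
error, and the simulated completions then get only the remaining budget
and the remaining slots of the output array.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel types, not the real definitions. */
enum wc_status { WC_SUCCESS = 0, WC_WR_FLUSH_ERR = 1 };

struct wc {
	int wr_id;
	enum wc_status status;
};

struct cq_model {
	struct wc soft[8];     /* queued software-generated completions */
	int soft_head;         /* next soft completion to hand out */
	int soft_tail;         /* one past the last queued soft completion */
	int hw_flush_pending;  /* simulated flush completions still owed */
};

/* Drain queued soft completions; on fatal error, report them as flushed. */
static int poll_soft(struct cq_model *cq, int num_entries, struct wc *out,
		     int is_fatal_err)
{
	int npolled = 0;

	assert(num_entries >= 0);
	while (npolled < num_entries && cq->soft_head < cq->soft_tail) {
		struct wc w = cq->soft[cq->soft_head++];

		if (is_fatal_err)
			w.status = WC_WR_FLUSH_ERR;
		out[npolled++] = w;
	}
	return npolled;
}

/* Generate simulated flush completions for the remaining budget. */
static int poll_flushed(struct cq_model *cq, int num_entries, struct wc *out)
{
	int npolled = 0;

	while (npolled < num_entries && cq->hw_flush_pending > 0) {
		out[npolled].wr_id = -1;  /* placeholder id for this model */
		out[npolled].status = WC_WR_FLUSH_ERR;
		npolled++;
		cq->hw_flush_pending--;
	}
	return npolled;
}

/* Fatal-error poll: soft completions first, then simulated ones. */
static int poll_cq_fatal(struct cq_model *cq, int num_entries, struct wc *out)
{
	int soft_polled = 0;

	if (cq->soft_head < cq->soft_tail)
		soft_polled = poll_soft(cq, num_entries, out, 1);

	return soft_polled + poll_flushed(cq, num_entries - soft_polled,
					  out + soft_polled);
}
```

With two queued soft completions, three owed flush completions and a
budget of four entries, poll_cq_fatal() in this model returns the two
soft entries (both marked as flushed) followed by two simulated entries,
leaving one simulated entry for the next poll.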