From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Bob Pearson, Jason Gunthorpe, Sasha Levin
Subject: [PATCH 5.19 0846/1157] RDMA/rxe: Fix rnr retry behavior
Date: Mon, 15 Aug 2022 20:03:22 +0200
Message-Id: <20220815180513.340055812@linuxfoundation.org>
In-Reply-To: <20220815180439.416659447@linuxfoundation.org>
References: <20220815180439.416659447@linuxfoundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Bob Pearson

[ Upstream commit 445fd4f4fb76d513de6b05b08b3a4d0bb980fc80 ]

Currently the completer tasklet sets the same flag (qp->req.need_retry)
for both the retransmit timer and the rnr timer, so that if either timer
fires it will attempt to perform a retry flow on the send queue. This has
the effect of responding to an RNR NAK at the first retransmit timer
event, which might not allow the requested rnr timeout to elapse.

This patch adds a new flag (qp->req.wait_for_rnr_timer) which, if set,
prevents a retry flow until the rnr nak timer fires.

This patch fixes rnr retry errors which can be observed by running the
pyverbs test_rdmacm_async_traffic_external_qp multiple times. With this
patch applied they do not occur.

Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
Link: https://lore.kernel.org/linux-rdma/2bafda9e-2bb6-186d-12a1-179e8f6a2678@talpey.com/
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220630190425.2251-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
---
 drivers/infiniband/sw/rxe/rxe_comp.c  |  8 +++++++-
 drivers/infiniband/sw/rxe/rxe_qp.c    |  1 +
 drivers/infiniband/sw/rxe/rxe_req.c   | 15 +++++++++++++--
 drivers/infiniband/sw/rxe/rxe_verbs.h |  1 +
 4 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index da3a398053b8..4fc31bb7eee6 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -114,6 +114,8 @@ void retransmit_timer(struct timer_list *t)
 {
 	struct rxe_qp *qp = from_timer(qp, t, retrans_timer);
 
+	pr_debug("%s: fired for qp#%d\n", __func__, qp->elem.index);
+
 	if (qp->valid) {
 		qp->comp.timeout = 1;
 		rxe_run_task(&qp->comp.task, 1);
@@ -730,11 +732,15 @@ int rxe_completer(void *arg)
 			break;
 
 		case COMPST_RNR_RETRY:
+			/* we come here if we received an RNR NAK */
 			if (qp->comp.rnr_retry > 0) {
 				if (qp->comp.rnr_retry != 7)
 					qp->comp.rnr_retry--;
 
-				qp->req.need_retry = 1;
+				/* don't start a retry flow until the
+				 * rnr timer has fired
+				 */
+				qp->req.wait_for_rnr_timer = 1;
 				pr_debug("qp#%d set rnr nak timer\n",
 					 qp_num(qp));
 				mod_timer(&qp->rnr_nak_timer,
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index b79e1b43454e..834f40ad00af 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -507,6 +507,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
 	atomic_set(&qp->ssn, 0);
 	qp->req.opcode = -1;
 	qp->req.need_retry = 0;
+	qp->req.wait_for_rnr_timer = 0;
 	qp->req.noack_pkts = 0;
 	qp->resp.msn = 0;
 	qp->resp.opcode = -1;
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 15fefc689ca3..9f8e3db179cc 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -101,7 +101,11 @@ void rnr_nak_timer(struct timer_list *t)
 {
 	struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
 
-	pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
+	pr_debug("%s: fired for qp#%d\n", __func__, qp_num(qp));
+
+	/* request a send queue retry */
+	qp->req.need_retry = 1;
+	qp->req.wait_for_rnr_timer = 0;
 	rxe_run_task(&qp->req.task, 1);
 }
 
@@ -622,10 +626,17 @@ int rxe_requester(void *arg)
 		qp->req.need_rd_atomic = 0;
 		qp->req.wait_psn = 0;
 		qp->req.need_retry = 0;
+		qp->req.wait_for_rnr_timer = 0;
 		goto exit;
 	}
 
-	if (unlikely(qp->req.need_retry)) {
+	/* we come here if the retransmit timer has fired
+	 * or if the rnr timer has fired. If the retransmit
+	 * timer fires while we are processing an RNR NAK wait
+	 * until the rnr timer has fired before starting the
+	 * retry flow
+	 */
+	if (unlikely(qp->req.need_retry && !qp->req.wait_for_rnr_timer)) {
 		req_retry(qp);
 		qp->req.need_retry = 0;
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index ac464e68c923..9bdf33346511 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -124,6 +124,7 @@ struct rxe_req_info {
 	int			need_rd_atomic;
 	int			wait_psn;
 	int			need_retry;
+	int			wait_for_rnr_timer;
 	int			noack_pkts;
 	struct rxe_task		task;
 };
-- 
2.35.1
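
For readers who want to see the flag interplay in isolation, here is a minimal user-space sketch of the two flags this patch touches. It is not driver code: struct req_model and every model_* function are hypothetical names invented for illustration, and the retransmit path is assumed (following the commit message, not code shown in this diff) to end up setting need_retry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two flags in struct rxe_req_info. */
struct req_model {
	int need_retry;
	int wait_for_rnr_timer;
};

/* Completer saw an RNR NAK: hold off retries until the rnr timer fires. */
static void model_rnr_nak(struct req_model *r)
{
	r->wait_for_rnr_timer = 1;
}

/* Assumption: a retransmit timeout ends up requesting a retry. */
static void model_retransmit_timeout(struct req_model *r)
{
	r->need_retry = 1;
}

/* rnr timer fired: request a send queue retry and lift the hold. */
static void model_rnr_timer(struct req_model *r)
{
	r->need_retry = 1;
	r->wait_for_rnr_timer = 0;
}

/* One requester pass; returns true if a retry flow would start. */
static bool model_requester(struct req_model *r)
{
	if (r->need_retry && !r->wait_for_rnr_timer) {
		r->need_retry = 0;
		return true;
	}
	return false;
}

/* Scenario from the commit message: retransmit timer fires during an RNR wait. */
static void model_demo(void)
{
	struct req_model r = { 0, 0 };

	model_rnr_nak(&r);
	model_retransmit_timeout(&r);
	assert(!model_requester(&r));	/* retry deferred while waiting */

	model_rnr_timer(&r);
	assert(model_requester(&r));	/* retry starts after the rnr timer */
	assert(!model_requester(&r));	/* need_retry was consumed */
}
```

Calling model_demo() from a test main() walks through the case the patch addresses. Without the wait_for_rnr_timer check in model_requester(), the first assertion would fail, which corresponds to the early retry described in the commit message.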