From: "brookxu.cn"
To: kbusch@kernel.org, axboe@kernel.dk, hch@lst.de, sagi@grimberg.me
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH] nvme: fix reconnection fail due to reserved tag allocation
Date: Wed, 28 Feb 2024 17:14:17 +0800
Message-Id: <20240228091417.40110-1-brookxu.cn@gmail.com>
X-Mailer: git-send-email 2.25.1
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Chunguang Xu

We found an issue in a production environment while using NVMe over
RDMA: admin_q reconnection failed forever even though the remote target
and the network were fine. After digging into it, we found it may be
caused by an ABBA deadlock in tag allocation. In my case, the tag was
held by a keep alive request waiting inside admin_q. Because we quiesce
admin_q while resetting the controller, the request is marked as idle
and will not be processed until the reset succeeds. As fabric_q shares
its tagset with admin_q, reconnecting to the remote target needs a tag
for the connect command, but the only reserved tag was held by the keep
alive command waiting inside admin_q. As a result, we failed to
reconnect admin_q forever.

To work around this issue, I think we should not retry the keep alive
request while the controller is reconnecting. We have already stopped
keep alive while resetting the controller and will start it again once
initialization finishes, so it should be OK to drop it.

Signed-off-by: Chunguang Xu
---
 drivers/nvme/host/core.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0a96362912ce..07ed2b6a75fb 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -371,6 +371,8 @@ enum nvme_disposition {
 
 static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
 {
+	struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
+
 	if (likely(nvme_req(req)->status == 0))
 		return COMPLETE;
 
@@ -382,6 +384,12 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
 	    nvme_req(req)->retries >= nvme_max_retries)
 		return COMPLETE;
 
+	if (nvme_ctrl_state(ctrl) != NVME_CTRL_LIVE) {
+		if (nvme_req(req)->cmd->common.opcode ==
+				nvme_admin_keep_alive)
+			return COMPLETE;
+	}
+
 	if (req->cmd_flags & REQ_NVME_MPATH) {
 		if (nvme_is_path_error(nvme_req(req)->status) ||
 		    blk_queue_dying(req->q))
@@ -1296,8 +1304,7 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
 	ctrl->ka_last_check_time = jiffies;
 	ctrl->comp_seen = false;
 	spin_lock_irqsave(&ctrl->lock, flags);
-	if (ctrl->state == NVME_CTRL_LIVE ||
-	    ctrl->state == NVME_CTRL_CONNECTING)
+	if (ctrl->state == NVME_CTRL_LIVE)
 		startka = true;
 	spin_unlock_irqrestore(&ctrl->lock, flags);
 	if (startka)
-- 
2.25.1
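
For readers who want to see the tag starvation in isolation, the scenario
from the commit message can be approximated by a small user-space model.
This is only a sketch under stated assumptions, not kernel code: every
toy_* name is hypothetical, the tagset is reduced to a single reserved
tag, and a requeued keep alive is assumed to keep its tag until admin_q
is unquiesced, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; none of these names are kernel APIs. */
enum toy_ctrl_state { TOY_LIVE, TOY_RESETTING, TOY_CONNECTING };
enum toy_disposition { TOY_COMPLETE, TOY_RETRY };

struct toy_ctrl {
	enum toy_ctrl_state state;
	bool admin_q_quiesced;  /* requeued admin requests cannot be dispatched */
	bool reserved_tag_busy; /* the one reserved tag shared by admin_q and fabrics_q */
};

/* Non-blocking allocation of the single reserved tag. */
static bool toy_alloc_reserved_tag(struct toy_ctrl *c)
{
	if (c->reserved_tag_busy)
		return false;
	c->reserved_tag_busy = true;
	return true;
}

static void toy_free_reserved_tag(struct toy_ctrl *c)
{
	assert(c->reserved_tag_busy);
	c->reserved_tag_busy = false;
}

/*
 * Disposition of a failed keep alive. drop_when_not_live models the rule
 * this patch adds: complete instead of retrying unless the ctrl is LIVE.
 */
static enum toy_disposition
toy_decide_keep_alive(const struct toy_ctrl *c, bool drop_when_not_live)
{
	if (drop_when_not_live && c->state != TOY_LIVE)
		return TOY_COMPLETE;
	return TOY_RETRY;
}

/* Returns true if the connect command obtains a tag during reconnect. */
static bool toy_reconnect(bool drop_when_not_live)
{
	struct toy_ctrl c = { .state = TOY_LIVE };

	/* A keep alive is in flight and owns the reserved tag. */
	if (!toy_alloc_reserved_tag(&c))
		return false;

	/* Reset starts: admin_q is quiesced and the keep alive fails. */
	c.state = TOY_RESETTING;
	c.admin_q_quiesced = true;

	/*
	 * A completed request releases its tag. A retried request is
	 * requeued and, in this model, only makes progress (and releases
	 * the tag) if admin_q is not quiesced.
	 */
	if (toy_decide_keep_alive(&c, drop_when_not_live) == TOY_COMPLETE ||
	    !c.admin_q_quiesced)
		toy_free_reserved_tag(&c);

	/* Reconnect: the connect command needs the same reserved tag. */
	c.state = TOY_CONNECTING;
	return toy_alloc_reserved_tag(&c);
}
```

Calling toy_reconnect(false) models the behaviour before the patch and
returns false, because the requeued keep alive still owns the tag;
toy_reconnect(true) models the new disposition rule and returns true.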