From: Jaesoo Lee
Date: Tue, 11 Dec 2018 15:16:30 -0800
Subject: Re: [PATCH] nvme-rdma: complete requests from ->timeout
To: nitzanc@mellanox.com
Cc: sagi@grimberg.me, keith.busch@intel.com, axboe@fb.com, hch@lst.de, Roland Dreier, Prabhath Sajeepa, Ashish Karkare, linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org
References: <1543535954-28073-1-git-send-email-jalee@purestorage.com> <773bb91f-40a4-a525-f7b8-db821b402821@mellanox.com>
In-Reply-To: <773bb91f-40a4-a525-f7b8-db821b402821@mellanox.com>
X-Mailing-List: linux-kernel@vger.kernel.org

I cannot reproduce the bug with the patch applied. In my failure
scenarios, completing the request on errors in nvme_rdma_send_done
appears to unblock __nvme_submit_sync_cmd. I also think this is safe
from double completions.

However, the nvme_rdma_timeout code still does not seem to be free of
the double-completion problem. So it looks promising to me if you could
split out the nvme_rdma_wr_error handling code as a separate patch.

On Tue, Dec 11, 2018 at 1:14 AM Nitzan Carmi wrote:
>
> I was just in the middle of sending this upstream when I saw your
> mail, and I too thought that it addresses the same bug, although I see a
> slightly different call trace than yours.
>
> I would be happy if you could verify that this patch works for you too,
> so that we can push it upstream.
>
> On 11/12/2018 01:40, Jaesoo Lee wrote:
> > It seems that your patch is addressing the same bug.
> > I will see if that works for our failure scenarios.
> >
> > Why don't you send it upstream?
> >
> > On Sun, Dec 9, 2018 at 6:22 AM Nitzan Carmi wrote:
> >>
> >> Hi,
> >> We encountered a similar issue.
> >> I think that the problem is that error_recovery might not even be
> >> queued, in case we're in DELETING state (or CONNECTING state, for that
> >> matter), because we cannot move from those states to RESETTING.
> >>
> >> We prepared some patches which handle completions in case such a
> >> scenario happens (which, in fact, might happen in numerous error flows).
> >>
> >> Does it solve your problem?
> >> Nitzan.
> >>
> >>
> >> On 30/11/2018 03:30, Sagi Grimberg wrote:
> >>>
> >>>> This does not hold at least for NVMe RDMA host driver. An example
> >>>> scenario is when the RDMA connection is gone while the controller
> >>>> is being deleted. In this case, the nvmf_reg_write32() for sending
> >>>> shutdown admin command by the delete_work could be hung forever if
> >>>> the command is not completed by the timeout handler.
> >>>
> >>> If the queue is gone, this means that the queue has already flushed and
> >>> any commands that were inflight have completed with a flush error
> >>> completion...
> >>>
> >>> Can you describe the scenario that caused this hang? When did the
> >>> queue become "gone" and when did the shutdown command execute?
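
For readers following the thread, the two issues discussed above (error recovery not being queued from the DELETING or CONNECTING states, and the risk of completing a request twice) can be modelled in user space roughly as follows. This is only a sketch under stated assumptions, not the driver code or either patch: the types, the state enum and every function name here (try_enter_resetting, error_recovery, complete_once, timeout_handler, simulate_timeout) are hypothetical, and the only rule taken from the thread is that DELETING and CONNECTING cannot move to RESETTING.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; these are NOT the kernel's definitions. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING, CTRL_DELETING };

struct ctrl {
    enum ctrl_state state;
    bool err_work_queued;   /* set when recovery work has been queued */
};

struct request {
    int completions;        /* how many times the request was completed */
};

/* Assumption from the thread: only a LIVE controller may enter RESETTING;
 * DELETING and CONNECTING cannot make that transition. */
static bool try_enter_resetting(struct ctrl *c)
{
    if (c->state != CTRL_LIVE)
        return false;
    c->state = CTRL_RESETTING;
    return true;
}

/* Recovery is queued only if the state change succeeds. */
static void error_recovery(struct ctrl *c)
{
    if (!try_enter_resetting(c))
        return;             /* recovery work is never queued */
    c->err_work_queued = true;
}

/* Guard against double completion: complete a request at most once. */
static bool complete_once(struct request *rq)
{
    if (rq->completions > 0)
        return false;
    rq->completions++;
    return true;
}

/* Sketch of a timeout handler: if nobody else will complete the request
 * (recovery was not queued), complete it here so the waiter is unblocked. */
static void timeout_handler(struct ctrl *c, struct request *rq)
{
    error_recovery(c);
    if (!c->err_work_queued)
        complete_once(rq);
}

/* Fire the timeout twice and report how often the request was completed. */
int simulate_timeout(enum ctrl_state initial)
{
    struct ctrl c = { .state = initial, .err_work_queued = false };
    struct request rq = { .completions = 0 };

    timeout_handler(&c, &rq);
    timeout_handler(&c, &rq);   /* a second firing must not complete again */
    assert(rq.completions <= 1);
    return rq.completions;
}
```

In this model, simulate_timeout(CTRL_DELETING) and simulate_timeout(CTRL_CONNECTING) complete the request exactly once from the timeout path, while simulate_timeout(CTRL_LIVE) leaves completion to the queued recovery work, which is not modelled here.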