From: Jianchao Wang
To: keith.busch@intel.com, axboe@fb.com, hch@lst.de, sagi@grimberg.me
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH V2 0/6]nvme-pci: fixes on nvme_timeout and nvme_dev_disable
Date: Mon, 5 Feb 2018 17:20:09 +0800
Message-Id: <1517822415-11710-1-git-send-email-jianchao.w.wang@oracle.com>

Hi Christoph, Keith and Sagi,

Please consider and comment on the following patchset. Your feedback
is much appreciated.

There is a complicated relationship between nvme_timeout and
nvme_dev_disable:
 - nvme_timeout has to invoke nvme_dev_disable to stop the controller
   from doing DMA accesses before it frees the request.
 - nvme_dev_disable has to depend on nvme_timeout to complete the adminq
   requests that set the HMB or delete sq/cq when the controller does
   not respond.
 - nvme_dev_disable races with nvme_timeout when it cancels the
   outstanding requests.
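
As an illustration only (this is not code from the patchset), the race
in the last point can be modeled in user space. The sketch below assumes
C11 atomics; fake_request, the REQ_* states and the two *_path_cancel
helpers are hypothetical names, not kernel APIs. It shows why two paths
that both cancel outstanding requests must agree on a single owner
before either of them completes a request:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request states; these names are not taken from the kernel. */
enum req_state { REQ_IN_FLIGHT, REQ_CLAIMED };

struct fake_request {
	_Atomic int state;	/* one of enum req_state */
	int completions;	/* how many times the request was completed */
};

static void fake_request_init(struct fake_request *rq)
{
	atomic_init(&rq->state, REQ_IN_FLIGHT);
	rq->completions = 0;
}

/*
 * Try to take ownership of an in-flight request. The compare-and-exchange
 * succeeds only for the first caller, so exactly one path wins even when
 * several paths race here.
 */
static bool fake_request_claim(struct fake_request *rq)
{
	int expected = REQ_IN_FLIGHT;

	return atomic_compare_exchange_strong(&rq->state, &expected,
					      REQ_CLAIMED);
}

/* Complete a request; only the path that claimed it may get here. */
static void fake_request_complete(struct fake_request *rq)
{
	assert(rq->completions == 0);	/* a double completion is a bug */
	rq->completions++;
}

/* Stand-in for the timeout path cancelling an outstanding request. */
static bool timeout_path_cancel(struct fake_request *rq)
{
	if (!fake_request_claim(rq))
		return false;	/* the disable path already owns it */
	fake_request_complete(rq);
	return true;
}

/* Stand-in for the disable path cancelling an outstanding request. */
static bool disable_path_cancel(struct fake_request *rq)
{
	if (!fake_request_claim(rq))
		return false;	/* the timeout path already owns it */
	fake_request_complete(rq);
	return true;
}
```

Whichever helper runs first wins the compare-and-exchange and completes
the request; the other one gets false and must leave the request alone.
Without such an ownership step both paths could complete the same
request. Patch 5 in this series implements the actual synchronization
and stops nvme_timeout from invoking nvme_dev_disable.
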

We have found some issues caused by this relationship; please refer to
the following links:
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015053.html
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015276.html
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015328.html

Even so, we cannot be sure that there are no other issues. The best way
to fix them is to break up the relationship between the two functions.
With this patchset, we can avoid invoking nvme_dev_disable from
nvme_timeout and eliminate the race between nvme_timeout and
nvme_dev_disable on outstanding requests.

Changes V1->V2:
 - free and disable pci things in nvme_pci_disable_ctrl_directly
 - change comment and add reviewed-by in 1st patch
 - re-sort patches
 - other misc changes

There are 6 patches:
The 1st ~ 3rd patches do some preparation for the 4th one.
The 4th fixes a bug found during testing.
The 5th avoids invoking nvme_dev_disable from nvme_timeout and
implements the synchronization between them. For more details, please
refer to the comment of that patch.
The 6th fixes a bug that appears after the 4th patch is introduced. It
lets nvme_delete_io_queues be woken up only by the completion path.

This patchset was tested with a debug patch for some days, and some bug
fixes have been made.
The patches are available in the following git branch:
https://github.com/jianchwa/linux-blcok.git nvme_fixes_V2

Jianchao Wang (6)
 0001-nvme-pci-quiesce-IO-queues-prior-to-disabling-device.patch
 0002-nvme-pci-fix-the-freeze-and-quiesce-for-shutdown-and.patch
 0003-blk-mq-make-blk_mq_rq_update_aborted_gstate-a-extern.patch
 0004-nvme-pci-suspend-queues-based-on-online_queues.patch
 0005-nvme-pci-break-up-nvme_timeout-and-nvme_dev_disable.patch
 0006-nvme-pci-discard-wait-timeout-when-delete-cq-sq.patch

diff stat:
 block/blk-mq.c          |   3 +-
 drivers/nvme/host/pci.c | 250 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------
 include/linux/blk-mq.h  |   1 +
 3 files changed, 188 insertions(+), 66 deletions(-)

Thanks
Jianchao