From: Jianchao Wang
To: keith.busch@intel.com, axboe@fb.com, hch@lst.de, sagi@grimberg.me
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH 0/6]nvme-pci: fixes on nvme_timeout and nvme_dev_disable
Date: Fri, 2 Feb 2018 15:00:43 +0800
Message-Id: <1517554849-7802-1-git-send-email-jianchao.w.wang@oracle.com>

Hi Christoph, Keith and Sagi,

Please consider and comment on the following patchset. It would be really appreciated.

There is a complicated relationship between nvme_timeout and nvme_dev_disable:
 - nvme_timeout has to invoke nvme_dev_disable to stop the controller from doing DMA before the request is freed.
 - nvme_dev_disable has to depend on nvme_timeout to complete the adminq requests that set the HMB (host memory buffer) or delete the SQs/CQs when the controller does not respond.
 - nvme_dev_disable races with nvme_timeout when it cancels the outstanding requests.
We have found some issues introduced by this relationship; please refer to the following links:
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015053.html
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015276.html
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015328.html
Even so, we cannot be sure there are no other issues. The best way to fix them is to break up the relationship between the two functions. With this patchset, nvme_dev_disable is no longer invoked by nvme_timeout, and the race between nvme_timeout and nvme_dev_disable on outstanding requests is eliminated.

There are 6 patches:
1st ~ 3rd patches do some preparation for the 4th one.
4th stops nvme_timeout from invoking nvme_dev_disable and implements the synchronization between them. For more details, please refer to the comment in this patch.
5th fixes a bug that appears once the 4th patch is applied. It makes nvme_delete_io_queues be woken up only by the completion path.
6th fixes a bug found during testing; it is not related to the 4th patch.

This patchset was tested with a debug patch for some days, and some bugfixes have been made. The debug patch and the other patches are available in the following git branch:
https://github.com/jianchwa/linux-blcok.git nvme_fixes_test

Jianchao Wang (6)
0001-nvme-pci-move-clearing-host-mem-behind-stopping-queu.patch
0002-nvme-pci-fix-the-freeze-and-quiesce-for-shutdown-and.patch
0003-blk-mq-make-blk_mq_rq_update_aborted_gstate-a-extern.patch
0004-nvme-pci-break-up-nvme_timeout-and-nvme_dev_disable.patch
0005-nvme-pci-discard-wait-timeout-when-delete-cq-sq.patch
0006-nvme-pci-suspend-queues-based-on-online_queues.patch

diff stat following:
 block/blk-mq.c          |   3 +-
 drivers/nvme/host/pci.c | 225 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------
 include/linux/blk-mq.h  |   1 +
 3 files changed, 169 insertions(+), 60 deletions(-)

Thanks
Jianchao