From: Jianchao Wang
To: keith.busch@intel.com, axboe@fb.com, hch@lst.de, sagi@grimberg.me, maxg@mellanox.com, james.smart@broadcom.com
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH V5 0/2] nvme-pci: fix the timeout case when reset is ongoing
Date: Thu, 18 Jan 2018 18:10:00 +0800
Message-Id: <1516270202-8051-1-git-send-email-jianchao.w.wang@oracle.com>

Hello

Please consider the following scenario.

nvme_reset_ctrl
  -> set state to RESETTING
  -> queue reset_work
  (scheduling)
nvme_reset_work
  -> nvme_dev_disable
    -> quiesce queues
    -> nvme_cancel_request
       on outstanding requests
-------------------------------_boundary_
  -> nvme initializing (issue request on adminq)

Before the _boundary_, we not only quiesce the queues, but also cancel
all the outstanding requests.

A request could expire when the ctrl state is RESETTING.
- If the timeout occurs before the _boundary_, the expired requests
  are from the previous work.
- Otherwise, the expired requests are from the controller initializing
  procedure, such as sending cq/sq create commands to adminq to set up
  io queues.

In the current implementation, nvme_timeout cannot identify the
_boundary_, so it only handles the second case above.

In fact, after Sagi's commit (nvme-rdma: fix concurrent reset and
reconnect), both nvme-fc/rdma have the following pattern:
RESETTING    - quiesce blk-mq queues, teardown and delete queues/
               connections, clear out outstanding IO requests...
RECONNECTING - establish new queues/connections and some other
               initializing things.

Introduce RECONNECTING to the nvme-pci transport to mark the same
phase. Then we get a coherent state definition among the nvme
pci/rdma/fc transports, and nvme_timeout can identify the _boundary_.

V5:
 - discard RESET_PREPARE and introduce RECONNECTING into nvme-pci
 - change the 1st patch's name and comment
 - other misc changes

V4:
 - rebase patches on Jens' for-next
 - let RESETTING equal to RECONNECTING in terms of work procedure
 - change the 1st patch's name and comment
 - other misc changes

V3:
 - fix wrong reference in loop.c
 - other misc changes

V2:
 - split NVME_CTRL_RESETTING into NVME_CTRL_RESET_PREPARE and
   NVME_CTRL_RESETTING. Introduce new patch based on this.
 - distinguish the requests based on the new state in nvme_timeout
 - change comments of patch

 drivers/nvme/host/core.c |  2 +-
 drivers/nvme/host/pci.c  | 43 ++++++++++++++++++++++++++++++++-----------
 2 files changed, 33 insertions(+), 12 deletions(-)

Thanks
Jianchao
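
For illustration only, the state split described in this cover letter can be modelled in a few lines of standalone C. This is a rough sketch, not the driver code: every name below (ctrl_state, expired_origin, reset_ctrl, cross_boundary, init_done, classify_expired) is hypothetical, and the real nvme_timeout does more than classify. It only shows how entering RECONNECTING at the _boundary_ would let a timeout handler tell the two cases apart.

```c
/* Illustrative model only: all names here are hypothetical, not the driver's. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_RECONNECTING };

enum expired_origin {
	EXPIRED_NORMAL_IO,     /* controller is live, ordinary timeout */
	EXPIRED_PREVIOUS_WORK, /* issued before the reset, not yet cancelled */
	EXPIRED_CTRL_INIT,     /* issued by the initializing procedure */
};

struct ctrl {
	enum ctrl_state state;
};

/* Models nvme_reset_ctrl: mark the controller as resetting. */
static void reset_ctrl(struct ctrl *c)
{
	c->state = CTRL_RESETTING;
}

/* Models the _boundary_: queues are quiesced and requests cancelled. */
static void cross_boundary(struct ctrl *c)
{
	c->state = CTRL_RECONNECTING;
}

/* Models the end of a successful initialization. */
static void init_done(struct ctrl *c)
{
	c->state = CTRL_LIVE;
}

/* Decide where an expired request came from, based only on the state. */
static enum expired_origin classify_expired(const struct ctrl *c)
{
	switch (c->state) {
	case CTRL_RESETTING:
		return EXPIRED_PREVIOUS_WORK;
	case CTRL_RECONNECTING:
		return EXPIRED_CTRL_INIT;
	case CTRL_LIVE:
		break;
	}
	return EXPIRED_NORMAL_IO;
}
```

With a single RESETTING state covering both phases, classify_expired would have no way to separate the second and third results, which is the gap this series addresses.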