From: Jianchao Wang <jianchao.w.wang@oracle.com>
To: keith.busch@intel.com, axboe@fb.com, hch@lst.de, sagi@grimberg.me
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH V4 2/5] nvme: add helper interface to flush in-flight requests
Date: Thu, 8 Mar 2018 14:19:28 +0800
Message-Id: <1520489971-31174-3-git-send-email-jianchao.w.wang@oracle.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1520489971-31174-1-git-send-email-jianchao.w.wang@oracle.com>
References: <1520489971-31174-1-git-send-email-jianchao.w.wang@oracle.com>

Currently, we use nvme_cancel_request to forcibly complete in-flight
requests. This has the following defects:

 - It is not safe against racing with the normal completion path.
   blk_mq_complete_request is safe to race with the timeout path,
   but not with another blk_mq_complete_request.
 - It cannot ensure that all the requests have been handled. The
   timeout path may have grabbed some expired requests, and
   blk_mq_complete_request cannot touch them.

Add two helper interfaces to flush in-flight requests more safely:

 - nvme_abort_requests_sync uses nvme_abort_req to time out all the
   in-flight requests and waits until the timeout work and the irq
   completion path have completed. For more details, please refer
   to the comment on this interface.
 - nvme_flush_aborted_requests completes the requests 'aborted' by
   nvme_abort_requests_sync. It is invoked after the controller has
   been disabled/shut down.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
---
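For context (not part of the patch itself): a minimal sketch of how a
transport driver's teardown path is expected to pair the two helpers.
The function name example_teardown and the disable step in the middle
are hypothetical placeholders, assuming the usual rule that no
completion can run once the controller has been disabled:

static void example_teardown(struct nvme_ctrl *ctrl)
{
	/*
	 * Force every in-flight request into the timeout path and
	 * wait for the timeout work and irq completion paths to
	 * drain. After this, aborted requests stay uncompleted.
	 */
	nvme_abort_requests_sync(ctrl);

	/* ... controller-specific disable/shutdown would go here ... */

	/*
	 * No completion can race with us now, so complete the
	 * 'aborted' requests: ioq requests are requeued with
	 * NVME_SC_ABORT_REQ, adminq requests are failed.
	 */
	nvme_flush_aborted_requests(ctrl);
}
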
 drivers/nvme/host/core.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  4 +-
 2 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7b8df47..e26759b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3567,6 +3567,102 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
 
+static void nvme_abort_req(struct request *req, void *data, bool reserved)
+{
+	if (!blk_mq_request_started(req))
+		return;
+
+	dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
+			    "Abort I/O %d", req->tag);
+
+	/* The timeout path needs to identify this flag and return
+	 * BLK_EH_NOT_HANDLED, then the request will not be completed.
+	 * We will defer the completion until after the controller is
+	 * disabled or shut down.
+	 */
+	set_bit(NVME_REQ_ABORTED, &nvme_req(req)->flags);
+	blk_abort_request(req);
+}
+
+/*
+ * This function ensures that all the in-flight requests on the
+ * controller are handled by the timeout path or irq completion path.
+ * It has to be paired with nvme_flush_aborted_requests, which is
+ * invoked after the controller has been disabled/shut down and
+ * completes the requests 'aborted' by nvme_abort_req.
+ *
+ * Note, the driver layer will not be quiescent before the controller
+ * is disabled, because the requests aborted by blk_abort_request
+ * are still active and the irq may fire at any time, but it cannot
+ * enter the completion path, because the request has been
+ * timed out.
+ */
+void nvme_abort_requests_sync(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	blk_mq_tagset_busy_iter(ctrl->tagset, nvme_abort_req, ctrl);
+	blk_mq_tagset_busy_iter(ctrl->admin_tagset, nvme_abort_req, ctrl);
+	/*
+	 * Ensure the timeout_work is queued, so there is no need to
+	 * sync the timer.
+	 */
+	kblockd_schedule_work(&ctrl->admin_q->timeout_work);
+
+	down_read(&ctrl->namespaces_rwsem);
+
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		kblockd_schedule_work(&ns->queue->timeout_work);
+
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		flush_work(&ns->queue->timeout_work);
+
+	up_read(&ctrl->namespaces_rwsem);
+	/* This ensures all the nvme irq completion paths have exited */
+	synchronize_sched();
+}
+EXPORT_SYMBOL_GPL(nvme_abort_requests_sync);
+
+static void nvme_comp_req(struct request *req, void *data, bool reserved)
+{
+	struct nvme_ctrl *ctrl = (struct nvme_ctrl *)data;
+
+	if (!test_bit(NVME_REQ_ABORTED, &nvme_req(req)->flags))
+		return;
+
+	WARN_ON(!blk_mq_request_started(req));
+
+	if (ctrl->tagset && ctrl->tagset->ops->complete) {
+		clear_bit(NVME_REQ_ABORTED, &nvme_req(req)->flags);
+		/*
+		 * We set the status to NVME_SC_ABORT_REQ, so an ioq
+		 * request will be requeued and an adminq one failed.
+		 */
+		nvme_req(req)->status = NVME_SC_ABORT_REQ;
+		/*
+		 * For an ioq request, blk_mq_requeue_request would be better
+		 * here. But the nvme code will still set up the cmd even if
+		 * RQF_DONTPREP is set, so we have to use .complete to free
+		 * the cmd and then requeue it.
+		 *
+		 * For an adminq request, invoking .complete directly misses
+		 * blk_mq_sched_completed_request, but this is ok because we
+		 * won't have an io scheduler for the adminq.
+		 */
+		ctrl->tagset->ops->complete(req);
+	}
+}
+
+/*
+ * Should be paired with nvme_abort_requests_sync.
+ */
+void nvme_flush_aborted_requests(struct nvme_ctrl *ctrl)
+{
+	blk_mq_tagset_busy_iter(ctrl->tagset, nvme_comp_req, ctrl);
+	blk_mq_tagset_busy_iter(ctrl->admin_tagset, nvme_comp_req, ctrl);
+}
+EXPORT_SYMBOL_GPL(nvme_flush_aborted_requests);
+
 int nvme_reinit_tagset(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set)
 {
 	if (!ctrl->ops->reinit_request)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 02097e8..3c71c73 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -104,6 +104,7 @@ struct nvme_request {
 
 enum {
 	NVME_REQ_CANCELLED = 0,
+	NVME_REQ_ABORTED,	/* cmd is aborted by nvme_abort_req */
 };
 
 static inline struct nvme_request *nvme_req(struct request *req)
@@ -381,7 +382,8 @@ void nvme_wait_freeze(struct nvme_ctrl *ctrl);
 void nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout);
 void nvme_start_freeze(struct nvme_ctrl *ctrl);
 int nvme_reinit_tagset(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set);
-
+void nvme_abort_requests_sync(struct nvme_ctrl *ctrl);
+void nvme_flush_aborted_requests(struct nvme_ctrl *ctrl);
 #define NVME_QID_ANY -1
 struct request *nvme_alloc_request(struct request_queue *q,
 		struct nvme_command *cmd, blk_mq_req_flags_t flags, int qid);
-- 
2.7.4