Subject: Re: [PATCH V4 3/5] nvme-pci: avoid nvme_dev_disable to be invoked in nvme_timeout
To: keith.busch@intel.com, axboe@fb.com, hch@lst.de, sagi@grimberg.me
Cc: linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org
References: <1520489971-31174-1-git-send-email-jianchao.w.wang@oracle.com> <1520489971-31174-4-git-send-email-jianchao.w.wang@oracle.com> <5e60e6bc-d0cc-9ea9-1a35-ed5031904da6@oracle.com>
From: "jianchao.wang" <jianchao.w.wang@oracle.com>
Message-ID: <200f5d7c-77f6-d90b-a409-08fad712139d@oracle.com>
Date: Tue, 13 Mar 2018 21:29:55 +0800
In-Reply-To: <5e60e6bc-d0cc-9ea9-1a35-ed5031904da6@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Hi Keith

Would you please take a look at this patch? I really need your
suggestions on it.

Sincerely
Jianchao

On 03/09/2018 10:01 AM, jianchao.wang wrote:
> Hi Keith
> 
> Can I have the honor of getting your comment on this patch?
> 
> Thanks in advance
> Jianchao
> 
> On 03/08/2018 02:19 PM, Jianchao Wang wrote:
>> nvme_dev_disable will issue commands on the adminq to clear the HMB
>> and delete the I/O cq/sqs, and maybe more in the future. When the
>> adminq does not respond, it has to depend on the timeout path.
>> However, nvme_timeout has to invoke nvme_dev_disable before
>> returning, so that the DMA mappings can be released safely. This
>> introduces a dangerous circular dependency. Moreover, the whole of
>> nvme_dev_disable runs under shutdown_lock, even while waiting for
>> commands, which makes things worse.
>>
>> To avoid this, this patch refactors nvme_timeout. The basic
>> principle is:
>> - When reset_work needs to be scheduled, hand the expired requests
>>   over to nvme_dev_disable. They will be completed after the
>>   controller is disabled/shut down.
>> - When a request from nvme_dev_disable or nvme_reset_work expires,
>>   disable the controller directly so that the request can be
>>   completed to wake up the waiter. nvme_pci_disable_ctrl_directly
>>   is introduced for this; it does not send commands on the adminq,
>>   and shutdown_lock is not needed here because
>>   nvme_abort_requests_sync in nvme_dev_disable synchronizes with
>>   nvme_timeout.
>>
>> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
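
To make the new decision flow concrete before diving into the diff, here
is a minimal user-space model of the action selection. This is a sketch
only: select_action, the ST_/ACT_ enum values, and main are illustrative
names; the real driver uses enum nvme_ctrl_state and per-request state.

#include <stdio.h>
#include <stdbool.h>

/* Illustrative controller states; the driver has more. */
enum ctrl_state { ST_CONNECTING, ST_RESETTING, ST_LIVE };
enum action { ACT_ABORT, ACT_RESET, ACT_DISABLE };

/*
 * Mirrors the action selection in the refactored nvme_timeout:
 * - I/O queue (qid != 0): try an abort first; escalate to reset if the
 *   request was already aborted or a reset is in progress.
 * - admin queue (qid == 0): disable directly when we are connecting or
 *   already draining in-flight requests, otherwise reset.
 */
static enum action select_action(int qid, enum ctrl_state state,
				 bool iod_aborted, bool inflight_flushed)
{
	if (qid)
		return (state == ST_RESETTING || iod_aborted) ?
			ACT_RESET : ACT_ABORT;
	return (state == ST_CONNECTING || inflight_flushed) ?
		ACT_DISABLE : ACT_RESET;
}

int main(void)
{
	/* Admin command expires while nvme_dev_disable is draining. */
	printf("%d\n", select_action(0, ST_LIVE, false, true));  /* ACT_DISABLE */
	/* First expiry of an I/O request on a live controller. */
	printf("%d\n", select_action(1, ST_LIVE, false, false)); /* ACT_ABORT */
	return 0;
}

The point of the split is that only the DISABLE leg touches the
controller synchronously; the ABORT and RESET legs both defer the real
work to another context.
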
>> ---
>>  drivers/nvme/host/pci.c | 199 ++++++++++++++++++++++++++++++++----------------
>>  1 file changed, 134 insertions(+), 65 deletions(-)
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index e186158..ce09057 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -70,6 +70,7 @@ struct nvme_queue;
>>  
>>  static void nvme_process_cq(struct nvme_queue *nvmeq);
>>  static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
>> +static void nvme_pci_disable_ctrl_directly(struct nvme_dev *dev);
>>  
>>  /*
>>   * Represents an NVM Express device. Each nvme_dev is a PCI function.
>> @@ -98,6 +99,7 @@ struct nvme_dev {
>>  	u32 cmbloc;
>>  	struct nvme_ctrl ctrl;
>>  	struct completion ioq_wait;
>> +	bool inflight_flushed;
>>  
>>  	/* shadow doorbell buffer support: */
>>  	u32 *dbbuf_dbs;
>> @@ -1180,73 +1182,13 @@ static void nvme_warn_reset(struct nvme_dev *dev, u32 csts)
>>  			 csts, result);
>>  }
>>  
>> -static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>> +static enum blk_eh_timer_return nvme_pci_abort_io_req(struct request *req)
>>  {
>>  	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>>  	struct nvme_queue *nvmeq = iod->nvmeq;
>>  	struct nvme_dev *dev = nvmeq->dev;
>> -	struct request *abort_req;
>>  	struct nvme_command cmd;
>> -	u32 csts = readl(dev->bar + NVME_REG_CSTS);
>> -
>> -	/*
>> -	 * Reset immediately if the controller is failed
>> -	 */
>> -	if (nvme_should_reset(dev, csts)) {
>> -		nvme_warn_reset(dev, csts);
>> -		nvme_dev_disable(dev, false);
>> -		nvme_reset_ctrl(&dev->ctrl);
>> -		return BLK_EH_HANDLED;
>> -	}
>> -
>> -	/*
>> -	 * Did we miss an interrupt?
>> -	 */
>> -	if (__nvme_poll(nvmeq, req->tag)) {
>> -		dev_warn(dev->ctrl.device,
>> -			 "I/O %d QID %d timeout, completion polled\n",
>> -			 req->tag, nvmeq->qid);
>> -		return BLK_EH_HANDLED;
>> -	}
>> -
>> -	/*
>> -	 * Shutdown immediately if controller times out while starting. The
>> -	 * reset work will see the pci device disabled when it gets the forced
>> -	 * cancellation error. All outstanding requests are completed on
>> -	 * shutdown, so we return BLK_EH_HANDLED.
>> -	 */
>> -	switch (dev->ctrl.state) {
>> -	case NVME_CTRL_CONNECTING:
>> -	case NVME_CTRL_RESETTING:
>> -		dev_warn(dev->ctrl.device,
>> -			 "I/O %d QID %d timeout, disable controller\n",
>> -			 req->tag, nvmeq->qid);
>> -		nvme_dev_disable(dev, false);
>> -		set_bit(NVME_REQ_CANCELLED, &nvme_req(req)->flags);
>> -		return BLK_EH_HANDLED;
>> -	default:
>> -		break;
>> -	}
>> -
>> -	/*
>> -	 * Shutdown the controller immediately and schedule a reset if the
>> -	 * command was already aborted once before and still hasn't been
>> -	 * returned to the driver, or if this is the admin queue.
>> -	 */
>> -	if (!nvmeq->qid || iod->aborted) {
>> -		dev_warn(dev->ctrl.device,
>> -			 "I/O %d QID %d timeout, reset controller\n",
>> -			 req->tag, nvmeq->qid);
>> -		nvme_dev_disable(dev, false);
>> -		nvme_reset_ctrl(&dev->ctrl);
>> -
>> -		/*
>> -		 * Mark the request as handled, since the inline shutdown
>> -		 * forces all outstanding requests to complete.
>> -		 */
>> -		set_bit(NVME_REQ_CANCELLED, &nvme_req(req)->flags);
>> -		return BLK_EH_HANDLED;
>> -	}
>>  
>>  	if (atomic_dec_return(&dev->ctrl.abort_limit) < 0) {
>>  		atomic_inc(&dev->ctrl.abort_limit);
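
nvme_pci_abort_io_req above keeps the driver's existing abort-budget
pattern: decrement the limit optimistically and roll back when it goes
negative. Here is a minimal C11 model of just that pattern; the initial
value of abort_limit and the name try_take_abort_slot are illustrative
(the driver sizes the budget from the controller's advertised abort
command limit).

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Outstanding-abort budget; the real value comes from the device. */
static atomic_int abort_limit = 4;

/*
 * Take a slot optimistically. If the budget went negative, give the
 * slot back and report failure so the caller escalates to a reset
 * instead of queueing yet another abort.
 */
static bool try_take_abort_slot(void)
{
	if (atomic_fetch_sub(&abort_limit, 1) - 1 < 0) {
		atomic_fetch_add(&abort_limit, 1);
		return false;
	}
	return true;
}

int main(void)
{
	for (int i = 0; i < 6; i++)
		printf("abort %d: %s\n", i,
		       try_take_abort_slot() ? "sent" : "budget exhausted");
	return 0;
}

atomic_fetch_sub returns the old value, so "old - 1 < 0" matches the
kernel's atomic_dec_return(...) < 0 test in the hunk above.
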
>> @@ -1282,6 +1224,105 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>>  	return BLK_EH_RESET_TIMER;
>>  }
>>  
>> +static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>> +{
>> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>> +	struct nvme_queue *nvmeq = iod->nvmeq;
>> +	struct nvme_dev *dev = nvmeq->dev;
>> +	u32 csts = readl(dev->bar + NVME_REG_CSTS);
>> +	enum {ABORT, RESET, DISABLE} action;
>> +	enum blk_eh_timer_return ret;
>> +	/*
>> +	 * This is for nvme_abort_req. Such a request will be completed
>> +	 * by nvme_flush_aborted_requests after the controller is
>> +	 * disabled/shut down.
>> +	 */
>> +	if (test_bit(NVME_REQ_ABORTED, &nvme_req(req)->flags))
>> +		return BLK_EH_NOT_HANDLED;
>> +
>> +	/*
>> +	 * Reset immediately if the controller has failed.
>> +	 * Defer the completion to nvme_flush_aborted_requests.
>> +	 */
>> +	if (nvme_should_reset(dev, csts)) {
>> +		nvme_warn_reset(dev, csts);
>> +		nvme_reset_ctrl(&dev->ctrl);
>> +		return BLK_EH_RESET_TIMER;
>> +	}
>> +
>> +	/*
>> +	 * Did we miss an interrupt?
>> +	 */
>> +	if (__nvme_poll(nvmeq, req->tag)) {
>> +		dev_warn(dev->ctrl.device,
>> +			 "I/O %d QID %d timeout, completion polled\n",
>> +			 req->tag, nvmeq->qid);
>> +		return BLK_EH_HANDLED;
>> +	}
>> +
>> +	if (nvmeq->qid) {
>> +		if (dev->ctrl.state == NVME_CTRL_RESETTING ||
>> +		    iod->aborted)
>> +			action = RESET;
>> +		else
>> +			action = ABORT;
>> +	} else {
>> +		/*
>> +		 * Disable immediately if the controller times out while
>> +		 * disabling/shutting down/starting. nvme_dev_disable/
>> +		 * nvme_reset_work will see the error.
>> +		 * Note: inflight_flushed is set in nvme_dev_disable when it
>> +		 * aborts all the inflight requests. The flag is needed
>> +		 * because there is no state representing the shutdown
>> +		 * procedure.
>> +		 */
>> +		if (dev->ctrl.state == NVME_CTRL_CONNECTING ||
>> +		    dev->inflight_flushed)
>> +			action = DISABLE;
>> +		else
>> +			action = RESET;
>> +	}
>> +
>> +	switch (action) {
>> +	case ABORT:
>> +		ret = nvme_pci_abort_io_req(req);
>> +		break;
>> +	case RESET:
>> +		dev_warn(dev->ctrl.device,
>> +			 "I/O %d QID %d timeout, reset controller\n",
>> +			 req->tag, nvmeq->qid);
>> +		set_bit(NVME_REQ_CANCELLED, &nvme_req(req)->flags);
>> +		nvme_reset_ctrl(&dev->ctrl);
>> +		/*
>> +		 * The reset work will take over this request. nvme_abort_req
>> +		 * employs blk_abort_request to force the request to be timed
>> +		 * out. So we need to return BLK_EH_RESET_TIMER so that
>> +		 * RQF_MQ_TIMEOUT_EXPIRED can be cleared.
>> +		 */
>> +		ret = BLK_EH_RESET_TIMER;
>> +		break;
>> +	case DISABLE:
>> +		/*
>> +		 * We disable the controller directly here so that we can
>> +		 * complete the request safely to wake up the nvme_dev_disable/
>> +		 * nvme_reset_work that is waiting on the adminq. We cannot
>> +		 * return BLK_EH_RESET_TIMER and depend on the error recovery
>> +		 * procedure, because it is itself waiting on the timeout path.
>> +		 */
>> +		dev_warn(dev->ctrl.device,
>> +			 "I/O %d QID %d timeout, disable controller\n",
>> +			 req->tag, nvmeq->qid);
>> +		nvme_pci_disable_ctrl_directly(dev);
>> +		set_bit(NVME_REQ_CANCELLED, &nvme_req(req)->flags);
>> +		ret = BLK_EH_HANDLED;
>> +		break;
>> +	default:
>> +		WARN_ON(1);
>> +		break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  static void nvme_free_queue(struct nvme_queue *nvmeq)
>>  {
>>  	dma_free_coherent(nvmeq->q_dmadev, CQ_SIZE(nvmeq->q_depth),
>> @@ -2169,6 +2210,33 @@ static void nvme_pci_disable(struct nvme_dev *dev)
>>  	}
>>  }
>>  
>> +/*
>> + * This is only invoked by nvme_timeout. shutdown_lock is not needed
>> + * here because nvme_abort_requests_sync in nvme_dev_disable will
>> + * synchronize with the timeout path.
>> + */
>> +static void nvme_pci_disable_ctrl_directly(struct nvme_dev *dev)
>> +{
>> +	int i;
>> +	struct pci_dev *pdev = to_pci_dev(dev->dev);
>> +	bool dead = true;
>> +
>> +	if (pci_is_enabled(pdev)) {
>> +		u32 csts = readl(dev->bar + NVME_REG_CSTS);
>> +
>> +		dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
>> +			  pdev->error_state != pci_channel_io_normal);
>> +
>> +		if (!dead)
>> +			nvme_disable_ctrl(&dev->ctrl, dev->ctrl.cap);
>> +	}
>> +
>> +	for (i = dev->ctrl.queue_count - 1; i >= 0; i--)
>> +		nvme_suspend_queue(&dev->queues[i]);
>> +
>> +	nvme_pci_disable(dev);
>> +}
>> +
>>  static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
>>  {
>>  	int i;
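
The "dead" test in nvme_pci_disable_ctrl_directly decides whether the
orderly CC.EN shutdown handshake is worth attempting at all. A
self-contained model of that predicate follows; ctrl_is_dead and the
pci_channel_ok parameter are illustrative names, while the bit positions
follow the NVMe CSTS register layout.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* CSTS register bits per the NVMe specification. */
#define CSTS_RDY (1u << 0)   /* controller ready */
#define CSTS_CFS (1u << 1)   /* controller fatal status */

/*
 * Skip the orderly disable when the controller reports a fatal status,
 * never became ready, or the PCI channel is offline; in those cases
 * writing CC would be pointless or unsafe.
 */
static bool ctrl_is_dead(uint32_t csts, bool pci_channel_ok)
{
	return (csts & CSTS_CFS) || !(csts & CSTS_RDY) || !pci_channel_ok;
}

int main(void)
{
	printf("live:    %d\n", ctrl_is_dead(CSTS_RDY, true));             /* 0 */
	printf("fatal:   %d\n", ctrl_is_dead(CSTS_RDY | CSTS_CFS, true));  /* 1 */
	printf("offline: %d\n", ctrl_is_dead(CSTS_RDY, false));            /* 1 */
	return 0;
}

Only when the controller is still answering does the function go through
nvme_disable_ctrl; either way it then suspends the queues and disables
the PCI device without ever taking shutdown_lock.
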
>> @@ -2205,6 +2273,8 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
>>  
>>  	}
>>  	nvme_stop_queues(&dev->ctrl);
>> +	nvme_abort_requests_sync(&dev->ctrl);
>> +	dev->inflight_flushed = true;
>>  
>>  	if (!dead) {
>>  		nvme_disable_io_queues(dev);
>> @@ -2215,9 +2285,8 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
>>  
>>  	nvme_pci_disable(dev);
>>  
>> -	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
>> -	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
>> -
>> +	nvme_flush_aborted_requests(&dev->ctrl);
>> +	dev->inflight_flushed = false;
>>  	/*
>>  	 * The driver will not be starting up queues again if shutting down so
>>  	 * must flush all entered requests to their failed completion to avoid
>> 
> 
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
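
Putting the pieces together: nvme_dev_disable now brackets the teardown.
It parks every in-flight request (nvme_abort_requests_sync), raises
inflight_flushed so admin-queue timeouts pick the DISABLE leg, and
completes the parked requests only after the hardware is quiesced
(nvme_flush_aborted_requests). Below is a rough end-to-end model of that
hand-over; all names and fields here are illustrative stand-ins for the
helpers the patch series introduces.

#include <stdbool.h>
#include <stdio.h>

enum eh_ret { EH_HANDLED, EH_NOT_HANDLED, EH_RESET_TIMER };

struct req {
	int tag;
	bool aborted;    /* models NVME_REQ_ABORTED */
	bool completed;
};

/* Timeout handler: a request already handed to the disable path is
 * left alone; the flush below will complete it after teardown. */
static enum eh_ret timeout(struct req *r)
{
	return r->aborted ? EH_NOT_HANDLED : EH_RESET_TIMER;
}

/* Models nvme_abort_requests_sync: park all in-flight requests. */
static void abort_requests_sync(struct req *reqs, int n)
{
	for (int i = 0; i < n; i++)
		reqs[i].aborted = true;
}

/* Models nvme_flush_aborted_requests: complete the parked requests
 * once the controller is safely disabled. */
static void flush_aborted_requests(struct req *reqs, int n)
{
	for (int i = 0; i < n; i++)
		if (reqs[i].aborted)
			reqs[i].completed = true;
}

int main(void)
{
	struct req reqs[2] = { { .tag = 1 }, { .tag = 2 } };

	abort_requests_sync(reqs, 2);      /* nvme_dev_disable, step 1 */
	timeout(&reqs[0]);                 /* expiry: EH_NOT_HANDLED   */
	/* ... queues suspended, PCI device disabled ... */
	flush_aborted_requests(reqs, 2);   /* nvme_dev_disable, step 2 */

	for (int i = 0; i < 2; i++)
		printf("tag %d completed: %d\n", reqs[i].tag, reqs[i].completed);
	return 0;
}

The property the model shows is the one the commit message relies on:
once a request is parked, the timeout path never completes it, so each
request is completed exactly once, and only after the device can no
longer touch its DMA mappings.
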