Subject: Re: [PATCH] nvme: Acknowledge completion queue on each iteration
From: Sinan Kaya <okaya@codeaurora.org>
To: Sagi Grimberg, linux-nvme@lists.infradead.org, timur@codeaurora.org
Cc: linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Keith Busch, Jens Axboe, Christoph Hellwig, linux-kernel@vger.kernel.org
Date: Wed, 19 Jul 2017 06:37:22 -0400
Message-ID: <933e9d49-ecfd-cc83-c116-29f97211480c@codeaurora.org>
In-Reply-To: <5595ca25-f616-c0f8-fb2c-241a951e8848@grimberg.me>
References: <1500330983-27501-1-git-send-email-okaya@codeaurora.org> <5595ca25-f616-c0f8-fb2c-241a951e8848@grimberg.me>

On 7/19/2017 5:20 AM, Sagi Grimberg wrote:
>> The code currently updates the completion queue doorbell only after
>> processing all completed events and sending their callbacks to the
>> block layer. This causes a performance drop when a lot of jobs are
>> queued towards the HW. Move the completion queue doorbell update into
>> each loop iteration instead, allowing new jobs to be queued by the HW.
>>
>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>> ---
>>  drivers/nvme/host/pci.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index d10d2f2..33d9b5b 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -810,13 +810,12 @@ static void nvme_process_cq(struct nvme_queue *nvmeq)
>>
>>  	while (nvme_read_cqe(nvmeq, &cqe)) {
>>  		nvme_handle_cqe(nvmeq, &cqe);
>> +		nvme_ring_cq_doorbell(nvmeq);
>>  		consumed++;
>>  	}
>>
>> -	if (consumed) {
>> -		nvme_ring_cq_doorbell(nvmeq);
>> +	if (consumed)
>>  		nvmeq->cqe_seen = 1;
>> -	}
>>  }
>
> Agree with Keith that this is definitely not the way to go, it
> adds mmio operations in the hot path with very little gain (if
> at all).

Understood. Different architectures may have different latencies when
accessing HW registers. It can be expensive on some platforms, as you
indicated, and this change would make it worse.

I'm doing a self-NACK as well.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
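
For readers outside the thread: the objection is that nvme_ring_cq_doorbell()
is an MMIO write, so moving it inside the completion loop turns one doorbell
write per interrupt into one write per completed entry. The sketch below is a
minimal user-space toy model of that difference, not NVMe driver code; the
toy_cq structure and toy_* helpers are invented for illustration, and the
doorbell write is only simulated with a volatile store and a counter.

/*
 * Toy model of the trade-off discussed above: ring the "doorbell"
 * once per completion entry versus once per batch of completions.
 * In real hardware each doorbell update is an MMIO write, which is
 * why doing it inside the loop adds cost to the hot path.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct toy_cq {
	uint16_t head;            /* consumer index                    */
	uint16_t depth;           /* queue depth (unused in the toy)   */
	uint16_t valid;           /* entries pretended to be completed */
	volatile uint16_t db;     /* stand-in for the MMIO doorbell    */
	unsigned long db_writes;  /* count of simulated MMIO writes    */
};

static bool toy_read_cqe(struct toy_cq *cq)
{
	/* pretend a completion entry is available while head < valid */
	return cq->head < cq->valid;
}

static void toy_ring_doorbell(struct toy_cq *cq)
{
	cq->db = cq->head;        /* models a writel() to the doorbell */
	cq->db_writes++;
}

static void toy_process_cq(struct toy_cq *cq, bool ring_per_entry)
{
	int consumed = 0;

	while (toy_read_cqe(cq)) {
		cq->head++;       /* "handle" the completion */
		if (ring_per_entry)
			toy_ring_doorbell(cq);    /* proposed: per iteration */
		consumed++;
	}

	if (consumed && !ring_per_entry)
		toy_ring_doorbell(cq);            /* current: once per batch */
}

int main(void)
{
	struct toy_cq batched = { .depth = 64, .valid = 32 };
	struct toy_cq per_entry = { .depth = 64, .valid = 32 };

	toy_process_cq(&batched, false);
	toy_process_cq(&per_entry, true);

	printf("batched doorbell  : %lu write(s)\n", batched.db_writes);
	printf("per-entry doorbell: %lu write(s)\n", per_entry.db_writes);
	return 0;
}

With 32 completions pending, the batched variant issues a single doorbell
write while the per-entry variant issues 32, which is the extra hot-path
MMIO cost Keith and Sagi object to.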