Subject: Re: [PATCH] nvme: default to 0 poll queues
To: Jens Axboe
Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg,
    linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
References: <20181209004953.GA11638@roeck-us.net>
From: Guenter Roeck
Message-ID: <4ad5653b-1cd4-a770-2290-ca032eeb7072@roeck-us.net>
Date: Sat, 8 Dec 2018 22:22:31 -0800

On 12/8/18 9:38 PM, Jens Axboe wrote:
> On 12/8/18 5:49 PM, Guenter Roeck wrote:
>> Hi,
>>
>> On Mon, Nov 19, 2018 at 08:18:24AM -0700, Jens Axboe wrote:
>>> We need a better way of configuring this, and given that polling is
>>> (still) a bit niche, let's default to using 0 poll queues. That way
>>> we'll have the same read/write/poll behavior as 4.20, and users that
>>> want to test/use polling are required to do manual configuration of the
>>> number of poll queues.
>>>
>>> Reviewed-by: Christoph Hellwig
>>> Signed-off-by: Jens Axboe
>>> ---
>>
>> This patch results in a boot stall when booting parisc (hppa) images
>> from nvme in qemu.
>>
>> ...
>> Fusion MPT SAS Host driver 3.04.20
>> rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
>> rcu:    (detected by 0, t=5252 jiffies, g=141, q=22)
>> rcu: All QSes seen, last rcu_sched kthread activity 5252 (-66742--71994), jiffies_till_next_fqs=1, root ->qsmask 0x0
>> kworker/u8:3    R running task        0    85      2 0x00000004
>> Workqueue: nvme-reset-wq nvme_reset_work
>> Backtrace:
>>  [<10190d20>] show_stack+0x28/0x38
>>  [<101dd1e0>] sched_show_task.part.3+0xc4/0x144
>>  [<101dd290>] sched_show_task+0x30/0x38
>>  [<10221e18>] rcu_check_callbacks+0x760/0x7a4
>>
>> rcu: rcu_sched kthread starved for 5252 jiffies! g141 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
>> rcu: RCU grace-period kthread stack dump:
>> rcu_sched       R running task        0    10      2 0x00000000
>> Backtrace:
>>  [<10995b1c>] __schedule+0x214/0x648
>>  [<10995f94>] schedule+0x44/0xa8
>>  [<1099a7c4>] schedule_timeout+0x114/0x1a0
>>  [<10220e70>] rcu_gp_kthread+0x744/0x968
>>  [<101d5438>] kthread+0x154/0x15c
>>  [<1019501c>] ret_from_kernel_thread+0x1c/0x24
>>
>> [ continued ]
>>
>> This is only seen in SMP configurations; non-SMP configurations are ok.
>> Reverting the patch fixes the problem. v4.20-rcX and earlier kernels
>> also boot without problems.
>>
>> For reference, here is the qemu command line. This is with qemu 3.0.
>>
>> qemu-system-hppa -kernel vmlinux -no-reboot \
>>	-snapshot \
>>	-device nvme,serial=foo,drive=d0 \
>>	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
>>	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0,115200 ' \
>>	-nographic -monitor null
>>
>> Please let me know if you need additional information.
>
> Hmm, I think the queue reduction case has a logic error. Actually there
> are two bugs:
>
> 1) Ensure we don't keep overwriting the queue count we ask for
> 2) Don't include poll_queues in the vectors we need
>
> Untested... And not super pretty. But does this work for you?

It solves the boot problem on parisc/hppa. I didn't test with any other
architectures. Should I run a complete test sequence?

Guenter

> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 7732c4979a4e..fe00e19493ae 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2083,7 +2083,7 @@ static void nvme_calc_io_queues(struct nvme_dev *dev, unsigned int nr_io_queues)
>  	}
>  }
>  
> -static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
> +static int nvme_setup_irqs(struct nvme_dev *dev, int irq_queues, int pqueues)
>  {
>  	struct pci_dev *pdev = to_pci_dev(dev->dev);
>  	int irq_sets[2];
> @@ -2100,7 +2100,8 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
>  	 * IRQ vector needs.
>  	 */
>  	do {
> -		nvme_calc_io_queues(dev, nr_io_queues);
> +		nvme_calc_io_queues(dev, irq_queues + pqueues);
> +		pqueues = dev->io_queues[HCTX_TYPE_POLL];
>  		irq_sets[0] = dev->io_queues[HCTX_TYPE_DEFAULT];
>  		irq_sets[1] = dev->io_queues[HCTX_TYPE_READ];
>  		if (!irq_sets[1])
> @@ -2111,11 +2112,11 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
>  		 * 1 + 1 queues, just ask for a single vector. We'll share
>  		 * that between the single IO queue and the admin queue.
>  		 */
> -		if (!(result < 0 && nr_io_queues == 1))
> -			nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
> +		if (!(result < 0 || irq_queues == 1))
> +			irq_queues = irq_sets[0] + irq_sets[1] + 1;
>  
> -		result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues,
> -				nr_io_queues,
> +		result = pci_alloc_irq_vectors_affinity(pdev, irq_queues,
> +				irq_queues,
>  				PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
>  
>  		/*
> @@ -2125,12 +2126,12 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
>  		 * likely does not. Back down to ask for just one vector.
>  		 */
>  		if (result == -ENOSPC) {
> -			nr_io_queues--;
> -			if (!nr_io_queues)
> +			irq_queues--;
> +			if (!irq_queues)
>  				return result;
>  			continue;
>  		} else if (result == -EINVAL) {
> -			nr_io_queues = 1;
> +			irq_queues = 1;
>  			continue;
>  		} else if (result <= 0)
>  			return -EIO;
> @@ -2144,7 +2145,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
>  {
>  	struct nvme_queue *adminq = &dev->queues[0];
>  	struct pci_dev *pdev = to_pci_dev(dev->dev);
> -	int result, nr_io_queues;
> +	int result, want_irqs, nr_io_queues, pqueues;
>  	unsigned long size;
>  
>  	nr_io_queues = max_io_queues();
> @@ -2185,7 +2186,20 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
>  	 */
>  	pci_free_irq_vectors(pdev);
>  
> -	result = nvme_setup_irqs(dev, nr_io_queues);
> +	/*
> +	 * If we don't get the number of IO queues we asked for, see if we
> +	 * need to adjust the number of poll queues down
> +	 */
> +	pqueues = poll_queues;
> +	if (!pqueues)
> +		want_irqs = nr_io_queues;
> +	else if (pqueues >= nr_io_queues) {
> +		want_irqs = 1;
> +		pqueues = nr_io_queues - 1;
> +	} else
> +		want_irqs = nr_io_queues - pqueues;
> +
> +	result = nvme_setup_irqs(dev, want_irqs, pqueues);
>  	if (result <= 0)
>  		return -EIO;
> 
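[Editor's note: the queue split added at the end of the patch is easy to
sanity-check in isolation. Below is a standalone sketch of just that
arithmetic; it is an illustration, not kernel code, and split_queues() is a
hypothetical helper mirroring the logic in nvme_setup_io_queues() above.
Poll queues never take an IRQ vector, and at least one IRQ-driven queue must
remain to share a vector with the admin queue.]

#include <stdio.h>

/*
 * Hypothetical helper mirroring the split in the patch: divide
 * nr_io_queues into IRQ-driven queues (want_irqs) and polled
 * queues (pqueues), keeping at least one IRQ-driven queue.
 */
static void split_queues(int nr_io_queues, int poll_queues,
			 int *want_irqs, int *pqueues)
{
	*pqueues = poll_queues;
	if (!*pqueues)
		*want_irqs = nr_io_queues;
	else if (*pqueues >= nr_io_queues) {
		/* Requested too many poll queues: clamp, keep one IRQ queue */
		*want_irqs = 1;
		*pqueues = nr_io_queues - 1;
	} else
		*want_irqs = nr_io_queues - *pqueues;
}

int main(void)
{
	int cases[][2] = { { 8, 0 }, { 8, 2 }, { 4, 4 }, { 1, 8 } };

	for (unsigned int i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
		int want_irqs, pqueues;

		split_queues(cases[i][0], cases[i][1], &want_irqs, &pqueues);
		printf("nr_io_queues=%d poll_queues=%d -> want_irqs=%d pqueues=%d\n",
		       cases[i][0], cases[i][1], want_irqs, pqueues);
	}
	return 0;
}

[With the patch applied, polling still has to be requested explicitly; going
by the poll_queues variable referenced in the diff, that would be the nvme
module parameter (e.g. nvme.poll_queues=2 on the kernel command line), though
that reading is inferred rather than stated in this thread.]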