Subject: Re: [PATCH] nvme: default to 0 poll queues
To: Guenter Roeck
Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg,
    linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
References: <20181209004953.GA11638@roeck-us.net>
In-Reply-To: <20181209004953.GA11638@roeck-us.net>
From: Jens Axboe
Date: Sat, 8 Dec 2018 22:38:47 -0700

On 12/8/18 5:49 PM, Guenter Roeck wrote:
> Hi,
>
> On Mon, Nov 19, 2018 at 08:18:24AM -0700, Jens Axboe wrote:
>> We need a better way of configuring this, and given that polling is
>> (still) a bit niche, let's default to using 0 poll queues. That way
>> we'll have the same read/write/poll behavior as 4.20, and users that
>> want to test/use polling are required to do manual configuration of
>> the number of poll queues.
>>
>> Reviewed-by: Christoph Hellwig
>> Signed-off-by: Jens Axboe
>> ---
>
> This patch results in a boot stall when booting parisc (hppa) images
> from nvme in qemu.
>
> ...
> Fusion MPT SAS Host driver 3.04.20
> rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> rcu:     (detected by 0, t=5252 jiffies, g=141, q=22)
> rcu: All QSes seen, last rcu_sched kthread activity 5252 (-66742--71994), jiffies_till_next_fqs=1, root ->qsmask 0x0
> kworker/u8:3    R running task        0    85      2 0x00000004
> Workqueue: nvme-reset-wq nvme_reset_work
> Backtrace:
>  [<10190d20>] show_stack+0x28/0x38
>  [<101dd1e0>] sched_show_task.part.3+0xc4/0x144
>  [<101dd290>] sched_show_task+0x30/0x38
>  [<10221e18>] rcu_check_callbacks+0x760/0x7a4
>
> rcu: rcu_sched kthread starved for 5252 jiffies! g141 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> rcu: RCU grace-period kthread stack dump:
> rcu_sched       R running task        0    10      2 0x00000000
> Backtrace:
>  [<10995b1c>] __schedule+0x214/0x648
>  [<10995f94>] schedule+0x44/0xa8
>  [<1099a7c4>] schedule_timeout+0x114/0x1a0
>  [<10220e70>] rcu_gp_kthread+0x744/0x968
>  [<101d5438>] kthread+0x154/0x15c
>  [<1019501c>] ret_from_kernel_thread+0x1c/0x24
>
> [ continued ]
>
> This is only seen in SMP configurations; non-SMP configurations are ok.
> Reverting the patch fixes the problem. v4.20-rcX and earlier kernels
> also boot without problems.
>
> For reference, here is the qemu command line. This is with qemu 3.0.
>
> qemu-system-hppa -kernel vmlinux -no-reboot \
>	-snapshot \
>	-device nvme,serial=foo,drive=d0 \
>	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
>	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0,115200 ' \
>	-nographic -monitor null
>
> Please let me know if you need additional information.

Hmm, I think the queue reduction case has a logic error. Actually there
are two bugs:

1) Ensure we don't keep overwriting the queue count we ask for
2) Don't include poll_queues in the vectors we need

Untested... And not super pretty. But does this work for you?

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 7732c4979a4e..fe00e19493ae 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2083,7 +2083,7 @@ static void nvme_calc_io_queues(struct nvme_dev *dev, unsigned int nr_io_queues)
 	}
 }
 
-static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
+static int nvme_setup_irqs(struct nvme_dev *dev, int irq_queues, int pqueues)
 {
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	int irq_sets[2];
@@ -2100,7 +2100,8 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 	 * IRQ vector needs.
 	 */
 	do {
-		nvme_calc_io_queues(dev, nr_io_queues);
+		nvme_calc_io_queues(dev, irq_queues + pqueues);
+		pqueues = dev->io_queues[HCTX_TYPE_POLL];
 		irq_sets[0] = dev->io_queues[HCTX_TYPE_DEFAULT];
 		irq_sets[1] = dev->io_queues[HCTX_TYPE_READ];
 		if (!irq_sets[1])
@@ -2111,11 +2112,11 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 		 * 1 + 1 queues, just ask for a single vector. We'll share
 		 * that between the single IO queue and the admin queue.
 		 */
-		if (!(result < 0 && nr_io_queues == 1))
-			nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
+		if (!(result < 0 || irq_queues == 1))
+			irq_queues = irq_sets[0] + irq_sets[1] + 1;
 
-		result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues,
-				nr_io_queues,
+		result = pci_alloc_irq_vectors_affinity(pdev, irq_queues,
+				irq_queues,
 				PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
 
 		/*
@@ -2125,12 +2126,12 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 		 * likely does not. Back down to ask for just one vector.
 		 */
 		if (result == -ENOSPC) {
-			nr_io_queues--;
-			if (!nr_io_queues)
+			irq_queues--;
+			if (!irq_queues)
 				return result;
 			continue;
 		} else if (result == -EINVAL) {
-			nr_io_queues = 1;
+			irq_queues = 1;
 			continue;
 		} else if (result <= 0)
 			return -EIO;
@@ -2144,7 +2145,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 {
 	struct nvme_queue *adminq = &dev->queues[0];
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
-	int result, nr_io_queues;
+	int result, want_irqs, nr_io_queues, pqueues;
 	unsigned long size;
 
 	nr_io_queues = max_io_queues();
@@ -2185,7 +2186,20 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	 */
 	pci_free_irq_vectors(pdev);
 
-	result = nvme_setup_irqs(dev, nr_io_queues);
+	/*
+	 * If we don't get the number of IO queues we asked for, see if we
+	 * need to adjust the number of poll queues down
+	 */
+	pqueues = poll_queues;
+	if (!pqueues)
+		want_irqs = nr_io_queues;
+	else if (pqueues >= nr_io_queues) {
+		want_irqs = 1;
+		pqueues = nr_io_queues - 1;
+	} else
+		want_irqs = nr_io_queues - pqueues;
+
+	result = nvme_setup_irqs(dev, want_irqs, pqueues);
 	if (result <= 0)
 		return -EIO;

-- 
Jens Axboe
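
[For readers of the archive: a minimal standalone sketch of the poll-queue
clamping rule the last hunk above applies before calling nvme_setup_irqs().
This is illustration only, not the in-tree nvme code; split_queues() is a
hypothetical helper name introduced just for this sketch.]

/*
 * Sketch of the queue-splitting rule: poll queues never consume IRQ
 * vectors, and at least one IRQ-driven I/O queue must remain.
 */
#include <stdio.h>

static void split_queues(int nr_io_queues, int poll_queues,
			 int *want_irqs, int *pqueues)
{
	*pqueues = poll_queues;
	if (!*pqueues) {
		/* No poll queues requested: every I/O queue needs a vector. */
		*want_irqs = nr_io_queues;
	} else if (*pqueues >= nr_io_queues) {
		/* Clamp so at least one IRQ-driven queue remains. */
		*want_irqs = 1;
		*pqueues = nr_io_queues - 1;
	} else {
		/* Poll queues are excluded from the vector count. */
		*want_irqs = nr_io_queues - *pqueues;
	}
}

int main(void)
{
	int want_irqs, pqueues;

	/* e.g. 4 possible I/O queues, 8 poll queues requested -> 1 and 3 */
	split_queues(4, 8, &want_irqs, &pqueues);
	printf("want_irqs=%d pqueues=%d\n", want_irqs, pqueues);
	return 0;
}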