Date: Sat, 16 Feb 2019 00:00:29 +0100 (CET)
From: Thomas Gleixner
To: Ming Lei
Cc: LKML, Christoph Hellwig, Bjorn Helgaas, Jens Axboe,
	linux-block@vger.kernel.org, Sagi Grimberg,
	linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org,
	Keith Busch, Marc Zyngier, Sumit Saxena, Kashyap Desai,
	Shivasharan Srikanteshwara
Subject: Re: [patch V5 4/8] nvme-pci: Simplify interrupt allocation
References: <20190214204755.819014197@linutronix.de>
	<20190214211759.699390983@linutronix.de>
	<20190214224136.GA18764@ming.t460p>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 15 Feb 2019, Thomas Gleixner wrote:
> On Fri, 15 Feb 2019, Ming Lei wrote:
> > > +	 * If only one interrupt is available, combine write and read
> > > +	 * queues. If 'write_queues' is set, ensure it leaves room for at
> > > +	 * least one read queue.
> > > +	 */
> > > +	if (nrirqs == 1)
> > > +		nr_read_queues = 0;
> > > +	else if (write_queues >= nrirqs)
> > > +		nr_read_queues = nrirqs - 1;
> > > +	else
> > > +		nr_read_queues = nrirqs - write_queues;
> > > +
> > > +	dev->io_queues[HCTX_TYPE_DEFAULT] = nrirqs - nr_read_queues;
> > > +	affd->set_size[HCTX_TYPE_DEFAULT] = nrirqs - nr_read_queues;
> > > +	dev->io_queues[HCTX_TYPE_READ] = nr_read_queues;
> > > +	affd->set_size[HCTX_TYPE_READ] = nr_read_queues;
> > > +	affd->nr_sets = nr_read_queues ? 2 : 1;
> > > }
> >
> > .calc_sets is called only if more than .pre_vectors is available,
> > then dev->io_queues[HCTX_TYPE_DEFAULT] may not be set in case of
> > (nvecs == affd->pre_vectors + affd->post_vectors).
>
> Hmm, good catch. The delta patch below should fix that, but I have to go
> through all the possible cases in pci_alloc_irq_vectors_affinity() once
> more with brain awake.

The existing logic in the driver is somewhat strange. If there is only a
single interrupt available, i.e. no separation of admin and I/O queue,
then dev->io_queues[HCTX_TYPE_DEFAULT] is still set to 1.

Now with the callback scheme (independent of my changes) that breaks in
various ways:

 1) irq_create_affinity_masks() bails out early:

	if (nvecs == affd->pre_vectors + affd->post_vectors)
		return NULL;

    So the callback won't be invoked and the last content of
    dev->io_queues is preserved. By chance this might end up with

	dev->io_queues[HCTX_TYPE_DEFAULT] = 1;
	dev->io_queues[HCTX_TYPE_READ] = 0;

    but that does not happen by design.

 2) pci_alloc_irq_vectors_affinity() has the following flow:

	__pci_enable_msix_range()
	   for (...)
	   {
	      __pci_enable_msix()
	   }

    If that fails because MSIX is not supported, then the affinity
    spreading code has not been called yet and neither has the callback.
    The code then goes on and tries MSI, which follows the same pattern.
    When that fails because MSI is not supported, the same situation
    arises and the code tries to allocate the legacy interrupt.

    With the initial initialization that case is covered, but not the
    following error case:

    Assume MSIX is enabled, but __pci_enable_msix() fails with -ENOMEM
    after having called irq_create_affinity_masks() and subsequently the
    callback with maxvecs. Then pci_alloc_irq_vectors_affinity() will try
    MSI and fail _BEFORE_ the callback is invoked. The next step is to
    install the legacy interrupt, but that invokes neither the affinity
    code nor the callback. At the end, dev->io_queues[*] is still
    initialized from the failed attempt to enable MSIX based on maxvecs.

    Result: inconsistent state ...

I have an idea how to fix that, but that's again a task for a brain
which is awake. Will take care of it tomorrow morning.

Thanks,

	tglx
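For reference, the splitting rule in the quoted hunk can be lifted into a
small user-space sketch (this is an illustration, not the kernel code; the
function name is made up here). It shows the three cases: a single vector
combines the write and read queues, an oversized 'write_queues' request
still reserves one vector for a read queue, and otherwise the remainder
after the write queues becomes the read set.

```c
#include <assert.h>

/*
 * Sketch of the nvme queue-split rule quoted above: given nrirqs
 * usable I/O vectors and a user-requested write_queues count, return
 * the number of dedicated read queues. The default (write) set then
 * gets nrirqs - nr_read_queues vectors.
 */
static unsigned int nr_read_queues(unsigned int nrirqs,
				   unsigned int write_queues)
{
	if (nrirqs == 1)
		return 0;		/* combine write and read queues */
	else if (write_queues >= nrirqs)
		return nrirqs - 1;	/* leave room for one read queue */
	else
		return nrirqs - write_queues;
}
```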
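The inconsistent-state hazard in case 2) can be modeled with a toy fallback
sequence (all names here are hypothetical stand-ins, not the real PCI code):
the spreading callback runs during the MSIX attempt, the attempt fails late,
and the MSI and legacy fallbacks never re-invoke the callback, so the
recorded queue state still reflects the failed MSIX attempt.

```c
#include <assert.h>

/* Stand-in for dev->io_queues[]: the vector count last spread for. */
static int recorded_vecs;

/* Stand-in for the driver's .calc_sets callback. */
static void calc_sets_cb(int nvecs)
{
	recorded_vecs = nvecs;
}

/* MSIX attempt: spreads (callback runs), then fails late with -ENOMEM. */
static int try_msix(int maxvecs)
{
	calc_sets_cb(maxvecs);
	return -12;	/* -ENOMEM, after the callback already ran */
}

/* MSI attempt: fails before the spreading code; no callback. */
static int try_msi(void)
{
	return -22;	/* MSI not usable */
}

/* Legacy fallback: one vector, and again no callback. */
static int try_legacy(void)
{
	return 1;
}

/* Toy model of the fallback chain in pci_alloc_irq_vectors_affinity(). */
static int alloc_vectors(int maxvecs)
{
	if (try_msix(maxvecs) > 0)
		return maxvecs;
	if (try_msi() > 0)
		return 1;
	return try_legacy();
}
```

Running the model with maxvecs = 8 ends up with one legacy vector allocated
while the recorded state was spread for 8 vectors: exactly the mismatch
described above.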