Date: Wed, 13 Feb 2019 22:41:55 +0100 (CET)
From: Thomas Gleixner
To: Keith Busch
cc: Bjorn Helgaas, Jens Axboe, Sagi Grimberg, linux-pci@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, Ming Lei,
    linux-block@vger.kernel.org, Christoph Hellwig
Subject: Re: [PATCH V3 1/5] genirq/affinity: don't mark 'affd' as const
In-Reply-To: <20190213213149.GB8027@localhost.localdomain>
References: <20190213105041.13537-1-ming.lei@redhat.com>
    <20190213105041.13537-2-ming.lei@redhat.com>
    <20190213150407.GB96272@google.com>
    <20190213213149.GB8027@localhost.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 13 Feb 2019, Keith Busch wrote:
> On Wed, Feb 13, 2019 at 09:56:36PM +0100, Thomas Gleixner wrote:
> > On Wed, 13 Feb 2019, Bjorn Helgaas wrote:
> > > On Wed, Feb 13, 2019 at 06:50:37PM +0800, Ming Lei wrote:
> > > > We have to ask driver to re-caculate set vectors after the whole IRQ
> > > > vectors are allocated later, and the result needs to be stored in 'affd'.
> > > > Also both the two interfaces are core APIs, which should be trusted.
> > >
> > > s/re-caculate/recalculate/
> > > s/stored in 'affd'/stored in '*affd'/
> > > s/both the two/both/
> > >
> > > This is a little confusing because you're talking about both "IRQ
> > > vectors" and these other "set vectors", which I think are different
> > > things. I assume the "set vectors" are cpumasks showing the affinity
> > > of the IRQ vectors with some CPUs?
> >
> > I think we should drop the whole vector wording completely.
> >
> > The driver does not care about vectors, it only cares about a block of
> > interrupt numbers. These numbers are kernel managed and the interrupts just
> > happen to have a CPU vector assigned at some point. Depending on the CPU
> > architecture the underlying mechanism might not even be named vector.
>
> Perhaps longer term we could move affinity mask creation from the irq
> subsystem into a more generic library. Interrupts aren't the only
> resource that want to spread across CPUs. For example, blk-mq has it's
> own implementation to for polled queues, so I think a non-irq specific
> implementation would be a nice addition to the kernel lib.

Agreed. There is nothing interrupt specific in that code aside from some
name choices.

Btw, while I have your attention: an issue related to that affinity logic
popped up recently.

The current implementation fails when:

	/*
	 * If there aren't any vectors left after applying the pre/post
	 * vectors don't bother with assigning affinity.
	 */
	if (nvecs == affd->pre_vectors + affd->post_vectors)
		return NULL;

The discussion arose that in this case the affinity sets are not allocated
and filled in for the pre/post vectors, but somehow the underlying device
still works and later on triggers the warning in the blk-mq code because
the MSI entries do not have affinity information attached.

Sure, we could make that work, but there are several issues:

  1) irq_create_affinity_masks() has another reason to return NULL:
     memory allocation fails.

  2) Does it make sense at all?

Right now the PCI allocator ignores the NULL return and proceeds without
setting any affinities. As a consequence nothing is managed and everything
happens to work.

But that it happens to work is more by chance than by design, and the
warning is bogus if this is an expected mode of operation.

We should address these points in some way.

Thanks,

	tglx
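
For illustration only, the ambiguity in point 1) can be modelled in a few lines of userspace C. This is a sketch, not the kernel code: struct affd_model, create_masks_model() and the simulate_oom flag are hypothetical names made up for this example, and only the pre_vectors/post_vectors check is taken from the snippet quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the affinity descriptor; only the two
 * fields used by the quoted check are modelled. */
struct affd_model {
	unsigned int pre_vectors;
	unsigned int post_vectors;
};

/*
 * Hypothetical model of the allocation path. It returns NULL in two
 * unrelated situations, mirroring the problem described in the mail:
 *   - nothing is left to spread after the pre/post vectors
 *   - the allocation failed (forced here via simulate_oom)
 */
static int *create_masks_model(unsigned int nvecs,
			       const struct affd_model *affd,
			       int simulate_oom)
{
	assert(affd != NULL);

	/* Same condition as the quoted kernel snippet. */
	if (nvecs == affd->pre_vectors + affd->post_vectors)
		return NULL;

	/* Stand-in for a failing memory allocation. */
	if (simulate_oom)
		return NULL;

	return calloc(nvecs, sizeof(int));
}
```

A caller that only looks at the return value sees NULL for both create_masks_model(2, &affd, 0) with one pre and one post vector, and create_masks_model(8, &affd, 1), so it cannot tell "nothing to spread" apart from "out of memory" without repeating the check itself.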