Message-ID: <55310050.7000003@arm.com>
Date: Fri, 17 Apr 2015 13:45:04 +0100
From: Marc Zyngier
Organization: ARM Ltd
To: Duc Dang
CC: Feng Kan, Arnd Bergmann, linux-pci@vger.kernel.org, Liviu Dudau,
 linux-kernel@vger.kernel.org, grant.likely@linaro.org, Tanmay Inamdar,
 Bjorn Helgaas, linux-arm, Loc Ho
Subject: Re: [PATCH v3 1/4] PCI: X-Gene: Add the APM X-Gene v1 PCIe MSI/MSIX termination driver

On 17/04/15 13:37, Duc Dang wrote:
> On Fri, Apr 17, 2015 at 3:17 AM, Marc Zyngier wrote:
>> On 17/04/15 11:00, Duc Dang wrote:
>>> On Wed, Apr 15, 2015 at 1:16 AM, Marc Zyngier wrote:
>>>> On Tue, 14 Apr 2015 19:20:19 +0100
>>>> Duc Dang wrote:
>>>>>
>>>>> On Sat, Apr 11, 2015 at 5:06 AM, Marc Zyngier wrote:
>>>>>> On 2015-04-11 00:42, Duc Dang wrote:
>>>>>>>
>>>>>>> On Fri, Apr 10, 2015 at 10:20 AM, Marc Zyngier wrote:
>>>>>>>>
>>>>>>>> On 09/04/15 18:05, Duc Dang wrote:
>>>>>>>>>
>>>>>>>>> The X-Gene v1 SoC supports a total of 2688 MSI/MSIX vectors
>>>>>>>>> coalesced into 16 HW IRQ lines.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Duc Dang
>>>>>>>>> Signed-off-by: Tanmay Inamdar
>>>>>>>>> ---
>>>>>>>>>  drivers/pci/host/Kconfig         |   6 +
>>>>>>>>>  drivers/pci/host/Makefile        |   1 +
>>>>>>>>>  drivers/pci/host/pci-xgene-msi.c | 407 +++++++++++++++++++++++++++++++++++++++
>>>>>>>>>  drivers/pci/host/pci-xgene.c     |  21 ++
>>>>>>>>>  4 files changed, 435 insertions(+)
>>>>>>>>>  create mode 100644 drivers/pci/host/pci-xgene-msi.c
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
>>>>>>>>> index 7b892a9..c9b61fa 100644
>>>>>>>>> --- a/drivers/pci/host/Kconfig
>>>>>>>>> +++ b/drivers/pci/host/Kconfig
>>>>>>>>> @@ -89,11 +89,17 @@ config PCI_XGENE
>>>>>>>>>  	depends on ARCH_XGENE
>>>>>>>>>  	depends on OF
>>>>>>>>>  	select PCIEPORTBUS
>>>>>>>>> +	select PCI_MSI_IRQ_DOMAIN if PCI_MSI
>>>>>>>>> +	select PCI_XGENE_MSI if PCI_MSI
>>>>>>>>>  	help
>>>>>>>>>  	  Say Y here if you want internal PCI support on APM X-Gene SoC.
>>>>>>>>>  	  There are 5 internal PCIe ports available. Each port is GEN3
>>>>>>>>>  	  capable and have varied lanes from x1 to x8.
>>>>>>>>>
>>>>>>>>> +config PCI_XGENE_MSI
>>>>>>>>> +	bool "X-Gene v1 PCIe MSI feature"
>>>>>>>>> +	depends on PCI_XGENE && PCI_MSI
>>>>>>>>> +
>>>>>>>>>  config PCI_LAYERSCAPE
>>>>>>>>>  	bool "Freescale Layerscape PCIe controller"
>>>>>>>>>  	depends on OF && ARM
>>>>>>>>> diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile
>>>>>>>>> index e61d91c..f39bde3 100644
>>>>>>>>> --- a/drivers/pci/host/Makefile
>>>>>>>>> +++ b/drivers/pci/host/Makefile
>>>>>>>>> @@ -11,5 +11,6 @@ obj-$(CONFIG_PCIE_SPEAR13XX) += pcie-spear13xx.o
>>>>>>>>>  obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone-dw.o pci-keystone.o
>>>>>>>>>  obj-$(CONFIG_PCIE_XILINX) += pcie-xilinx.o
>>>>>>>>>  obj-$(CONFIG_PCI_XGENE) += pci-xgene.o
>>>>>>>>> +obj-$(CONFIG_PCI_XGENE_MSI) += pci-xgene-msi.o
>>>>>>>>>  obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
>>>>>>>>>  obj-$(CONFIG_PCI_VERSATILE) += pci-versatile.o
>>>>>>>>> diff --git a/drivers/pci/host/pci-xgene-msi.c b/drivers/pci/host/pci-xgene-msi.c
>>>>>>>>> new file mode 100644
>>>>>>>>> index 0000000..4f0ff42
>>>>>>>>> --- /dev/null
>>>>>>>>> +++ b/drivers/pci/host/pci-xgene-msi.c
>>>>>>>>> @@ -0,0 +1,407 @@
>>>>>>>>> +/*
>>>>>>>>> + * APM X-Gene MSI Driver
>>>>>>>>> + *
>>>>>>>>> + * Copyright (c) 2014, Applied Micro Circuits Corporation
>>>>>>>>> + * Author: Tanmay Inamdar
>>>>>>>>> + *         Duc Dang
>>>>>>>>> + *
>>>>>>>>> + * This program is free software; you can redistribute it and/or modify it
>>>>>>>>> + * under the terms of the GNU General Public License as published by the
>>>>>>>>> + * Free Software Foundation; either version 2 of the License, or (at your
>>>>>>>>> + * option) any later version.
>>>>>>>>> + *
>>>>>>>>> + * This program is distributed in the hope that it will be useful,
>>>>>>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>>>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>>>>>>>> + * See the GNU General Public License for more details.
>>>>>>>>> + */
>>>>>>>>> +#include
>>>>>>>>> +#include
>>>>>>>>> +#include
>>>>>>>>> +#include
>>>>>>>>> +#include
>>>>>>>>> +#include
>>>>>>>>> +#include
>>>>>>>>> +
>>>>>>>>> +#define MSI_INDEX0	0x000000
>>>>>>>>> +#define MSI_INT0	0x800000
>>>>>>>>> +
>>>>>>>>> +struct xgene_msi_settings {
>>>>>>>>> +	u32	index_per_group;
>>>>>>>>> +	u32	irqs_per_index;
>>>>>>>>> +	u32	nr_msi_vec;
>>>>>>>>> +	u32	nr_hw_irqs;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +struct xgene_msi {
>>>>>>>>> +	struct device_node		*node;
>>>>>>>>> +	struct msi_controller		mchip;
>>>>>>>>> +	struct irq_domain		*domain;
>>>>>>>>> +	struct xgene_msi_settings	*settings;
>>>>>>>>> +	u32				msi_addr_lo;
>>>>>>>>> +	u32				msi_addr_hi;
>>>>>>>>
>>>>>>>> I'd rather see the mailbox address directly, and only do the
>>>>>>>> split when assigning it to the message (you seem to play all
>>>>>>>> kinds of tricks on the address anyway).
>>>>>>>
>>>>>>> msi_addr_lo and msi_addr_hi store the physical base address of
>>>>>>> the MSI controller registers. I will add a comment to clarify
>>>>>>> this.
>>>>>>
>>>>>> What I mean is that there is no point in keeping this around as a
>>>>>> pair of 32-bit variables. You'd better keep it as a single 64-bit
>>>>>> value, and do the split when assigning it to the MSI message.
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> These came from the device tree (which describes a 64-bit address
>>>>> as two 32-bit words).
>>>>
>>>> ... and converted to a resource as a 64-bit word, on which you apply
>>>> {upper,lower}_32_bits(). So much for DT...
>>>>
>>>>> If I store them this way, I don't need CPU cycles to do the split
>>>>> every time they are assigned to the MSI message. Please let me
>>>>> know what you think about it.
>>>>
>>>> This is getting absolutely silly.
>>>>
>>>> How many cycles does it take to execute "lsr x1, x0, #32" on X-Gene?
>>>> If it takes so long that it is considered to be a bottleneck, I
>>>> suggest you go and design a better CPU (hint: the answer is probably
>>>> 1 cycle absolutely everywhere).
>>>>
>>>> How often are you configuring MSIs in the face of what is happening
>>>> in the rest of the kernel? Almost never!
>>>>
>>>> So, given that "never" times 1 is still never, I'll consider that
>>>> readability of the code trumps it anytime (I can't believe we're
>>>> having that kind of conversation...).
>>>>
>>> I changed to use a u64 for msi_addr and split it when composing MSI
>>> messages. The change is in v4 of the patch set that I just posted.
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>>>> +static int xgene_msi_set_affinity(struct irq_data *irq_data,
>>>>>>>>> +				  const struct cpumask *mask, bool force)
>>>>>>>>> +{
>>>>>>>>> +	struct xgene_msi *msi = irq_data_get_irq_chip_data(irq_data);
>>>>>>>>> +	unsigned int gic_irq;
>>>>>>>>> +	int ret;
>>>>>>>>> +
>>>>>>>>> +	gic_irq = msi->msi_virqs[irq_data->hwirq % msi->settings->nr_hw_irqs];
>>>>>>>>> +	ret = irq_set_affinity(gic_irq, mask);
>>>>>>>>
>>>>>>>> Erm... This has the effect of moving *all* the MSIs hanging off
>>>>>>>> this interrupt to another CPU. I'm not sure that's an acceptable
>>>>>>>> effect... What if another MSI requires a different affinity?
>>>>>>>
>>>>>>> We have 16 'real' hardware IRQs. Each of these has multiple MSIs
>>>>>>> attached to it. So this will move all MSIs hanging off this
>>>>>>> interrupt to another CPU; and we don't support different affinity
>>>>>>> settings for different MSIs that are attached to the same
>>>>>>> hardware IRQ.
>>>>>>
>>>>>> Well, that's a significant departure from the expected behaviour.
>>>>>> In other words, I wonder how useful this is. Could you instead
>>>>>> reconfigure the MSI itself to hit the right CPU (assuming you
>>>>>> don't have more than 16 CPUs, and if that's only for XGene-1, this
>>>>>> will only be 8 at most)?
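To make the arithmetic behind this suggestion concrete: with 2688 vectors in 16 groups of 168, and the 16 underlying GIC IRQs spread statically over the 8 XGene-1 CPUs (2 groups per CPU), "moving" an MSI to a CPU amounts to re-targeting it to the same slot in a group owned by that CPU. The following is a rough standalone sketch of that arithmetic only; the names and the group-to-CPU mapping are made up for illustration and are not the driver's code:

```c
#include <assert.h>

/* Numbers taken from the thread; the identifiers are hypothetical. */
#define NR_MSI_VEC	2688
#define NR_HW_IRQS	16
#define NR_CPUS_X1	8
#define IRQS_PER_GROUP	(NR_MSI_VEC / NR_HW_IRQS)	/* 168 */
#define GROUPS_PER_CPU	(NR_HW_IRQS / NR_CPUS_X1)	/* 2   */

/* CPU that statically owns the GIC IRQ behind a given MSI vector. */
static unsigned int msi_vector_to_cpu(unsigned int hwirq)
{
	return (hwirq / IRQS_PER_GROUP) / GROUPS_PER_CPU;
}

/*
 * Re-target an MSI vector to an equivalent slot in a group owned by
 * the requested CPU; the new termination address/data for this vector
 * would then be written back to the device's config space.
 */
static unsigned int msi_vector_retarget(unsigned int hwirq, unsigned int cpu)
{
	unsigned int slot = hwirq % IRQS_PER_GROUP;

	return cpu * GROUPS_PER_CPU * IRQS_PER_GROUP + slot;
}
```

Under this static assignment each CPU owns 2 groups, i.e. the 336 freely movable MSIs per CPU mentioned below, and set_affinity keeps its usual per-interrupt semantics.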
>>>>>> This would reduce your number of possible MSIs, but at least the
>>>>>> semantics of set_affinity would be preserved.
>>>>>
>>>>> X-Gene-1 supports 2688 MSIs that are divided into 16 groups; each
>>>>> group has 168 MSIs that are mapped to 1 hardware GIC IRQ (so we
>>>>> have 16 hardware GIC IRQs for the 2688 MSIs).
>>>>
>>>> We've already established that.
>>>>
>>>>> Setting the affinity of a single MSI to deliver it to a target CPU
>>>>> will move all the MSIs mapped to the same GIC IRQ to that CPU as
>>>>> well. This is not standard behavior, but limiting the total number
>>>>> of MSIs will cause a lot of devices to fall back to INTx (with a
>>>>> huge performance penalty) or even fail to load their driver, as
>>>>> these devices request more than 16 MSIs during driver
>>>>> initialization.
>>>>
>>>> No, I think you got it wrong. If you have 168 MSIs per GIC IRQ, and
>>>> provided that you have 8 CPUs (XGene-1), you end up with 336 MSIs
>>>> per CPU (having statically assigned 2 IRQs per CPU).
>>>>
>>>> Assuming you adopt my scheme, you still have a grand total of 336
>>>> MSIs that can be freely moved around without breaking any userspace
>>>> expectations.
>>>>
>>> Thanks, Marc. This is a very good idea.
>>>
>>> But to move MSIs around, I need to change the MSI termination address
>>> and data and write them to the device configuration space. This may
>>> cause problems if the device fires an interrupt at the same time as I
>>> do the config write.
>>>
>>> What is your opinion here?
>>
>> There is an inherent race when changing the affinity of any interrupt,
>> whether that's an MSI or not. The kernel is perfectly prepared to
>> handle such a situation (to be honest, the kernel doesn't really
>> care).
>>
>> The write to the config space shouldn't be a concern. You will either
>> hit the old *or* the new CPU, but that race only exists during the
>> write itself (you can read back from the config space if you're
>> really paranoid).
>> By the time you return from this read/write, the device will be
>> reconfigured.
>>
>>>> I think that 336 MSIs is a fair number (nowhere near the 16 you
>>>> claim). Most platforms are doing quite well with that kind of
>>>> numbers. Also, you don't have to allocate all the MSIs a device can
>>>> possibly claim (up to 2048 MSI-X per device), as devices are all
>>>> perfectly capable of using fewer MSIs without having to fall back
>>>> to INTx.
>>>>
>>>>> I can document the limitation in the affinity setting of the
>>>>> X-Gene-1 MSI driver so as to hopefully not surprise people, and I
>>>>> hope to keep the total number of supported MSIs at 2688 so that we
>>>>> can support as many cards that require MSI/MSI-X as possible.
>>>>
>>>> I don't think this is a valid approach. This breaks userspace (think
>>>> of things like irqbalance), and breaks the SMP affinity model that
>>>> Linux uses. No amount of documentation is going to solve it, so I
>>>> think you just have to admit that the HW is mis-designed and do the
>>>> best you can to make it work like Linux expects it to work.
>>>>
>>>> The alternative would be to disable affinity setting altogether
>>>> instead of introducing these horrible side effects.
>>>>
>>> I have it disabled (set_affinity does nothing) in my v4 patch.
>>
>> It would be good if you could try the above approach. It shouldn't be
>> hard to write, and it would be a lot better than just ignoring the
>> problem.
>>
> Yes. I am working on this change.

In which case, I'll wait for a v5. No need to spend time on something
that is going to change anyway.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
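For reference, the split-on-compose that Marc asks for earlier in the thread (keep the MSI termination address as a single u64 and split it only when filling in the message) boils down to the following. This is a standalone sketch: the struct is a stand-in for the kernel's struct msi_msg, the helpers mimic the kernel's upper_32_bits()/lower_32_bits(), and the address value is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the kernel's upper_32_bits()/lower_32_bits() helpers. */
static inline uint32_t upper_32_bits(uint64_t v) { return (uint32_t)(v >> 32); }
static inline uint32_t lower_32_bits(uint64_t v) { return (uint32_t)v; }

/* Minimal stand-in for the kernel's struct msi_msg. */
struct msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/*
 * Keep the MSI termination address as one 64-bit value (e.g. straight
 * from the resource) and split it only when composing the message.
 */
static void compose_msi_msg(struct msi_msg *msg, uint64_t msi_addr,
			    uint32_t hwirq)
{
	msg->address_hi = upper_32_bits(msi_addr);
	msg->address_lo = lower_32_bits(msi_addr);
	msg->data = hwirq;
}
```

The split is a single shift and a truncation, which is the point of the "lsr x1, x0, #32" remark above: caching the two halves buys nothing.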