Subject: Re: PCI MSI issue for maxcpus=1
To: John Garry, Marc Zyngier
CC: Thomas Gleixner, chenxiang, Shameer Kolothum, "linux-kernel@vger.kernel.org", "liuqi (BA)", "David Decotigny"
From: Xiongfeng Wang
Message-ID: <645767eb-c5a5-cafa-eb1e-b8d999484ea8@huawei.com>
Date: Tue, 8 Mar 2022 11:57:33 +0800

Hi,

On 2022/3/7 21:48, John Garry wrote:
> Hi Marc,
>
>>
>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>> index 2bdfce5edafd..97e9eb9aecc6 100644
>> --- a/kernel/irq/msi.c
>> +++ b/kernel/irq/msi.c
>> @@ -823,6 +823,19 @@ static int msi_init_virq(struct irq_domain *domain, int
>> virq, unsigned int vflag
>>       if (!(vflags & VIRQ_ACTIVATE))
>>           return 0;
>> +    if (!(vflags & VIRQ_CAN_RESERVE)) {
>> +        /*
>> +         * If the interrupt is managed but no CPU is available
>> +         * to service it, shut it down until better times.
>> +         */
>> +        if (irqd_affinity_is_managed(irqd) &&
>> +            !cpumask_intersects(irq_data_get_affinity_mask(irqd),
>> +                    cpu_online_mask)) {
>> +            irqd_set_managed_shutdown(irqd);
>> +            return 0;
>> +        }
>> +    }
>> +
>>       ret = irq_domain_activate_irq(irqd, vflags & VIRQ_CAN_RESERVE);
>>       if (ret)
>>           return ret;
>>

I applied the above modification and added the kernel parameter 'maxcpus=1'. It
boots successfully on D06.

Then I removed 'maxcpus=1' and added 'nohz_full=5-127
isolcpus=nohz,domain,managed_irq,5-127'. The 'effective_affinity' of the kernel
managed irq is not correct: it is 0x20, i.e. CPU5, which is one of the isolated
CPUs.

[root@localhost wxf]# cat /proc/interrupts | grep 350
350: 0 0 0 0 0 522 (ignored info) 0 0 0 ITS-MSI 60882972 Edge hisi_sas_v3_hw cq
[root@localhost wxf]# cat /proc/irq/350/smp_affinity
00000000,00000000,00000000,000000ff
[root@localhost wxf]# cat /proc/irq/350/effective_affinity
00000000,00000000,00000000,00000020

Then I applied the following modification. Refer to
https://lore.kernel.org/all/87a6fl8jgb.wl-maz@kernel.org/
The 'effective_affinity' is correct now.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index eb0882d15366..0cea46bdaf99 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1620,7 +1620,7 @@ static int its_select_cpu(struct irq_data *d,
         cpu = cpumask_pick_least_loaded(d, tmpmask);
     } else {
-        cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
+        cpumask_copy(tmpmask, aff_mask);
         /* If we cannot cross sockets, limit the search to that node */
         if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&

Then I added both sets of kernel parameters:

nohz_full=1-127 isolcpus=nohz,domain,managed_irq,1-127 maxcpus=1

It crashed with the following message.
[ 51.813803][T21132] cma_alloc: 29 callbacks suppressed
[ 51.813809][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4 pages, ret: -12
[ 51.897537][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 8 pages, ret: -12
[ 52.014432][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4 pages, ret: -12
[ 52.067313][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 8 pages, ret: -12
[ 52.180011][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4 pages, ret: -12
[ 52.270846][ T0] Detected VIPT I-cache on CPU1
[ 52.275541][ T0] GICv3: CPU1: found redistributor 80100 region 1:0x00000000ae140000
[ 52.283425][ T0] GICv3: CPU1: using allocated LPI pending table @0x00000040808b0000
[ 52.291381][ T0] CPU1: Booted secondary processor 0x0000080100 [0x481fd010]
[ 52.432971][ T0] Detected VIPT I-cache on CPU101
[ 52.437914][ T0] GICv3: CPU101: found redistributor 390100 region 101:0x00002000aa240000
[ 52.446233][ T0] GICv3: CPU101: using allocated LPI pending table @0x0000004081170000
[ 52.[...] NULL pointer dereference at virtual address 00000000000000a0
[ 52.471539][T24563] Mem abort info:
[ 52.475011][T24563] ESR = 0x96000044
[ 52.478742][T24563] EC = 0x25: DABT (current EL), IL = 32 bits
[ 52.484721][T24563] SET = 0, FnV = 0
[ 52.488451][T24563] EA = 0, S1PTW = 0
[ 52.492269][T24563] FSC = 0x04: level 0 translation fault
[ 52.497815][T24563] Data abort info:
[ 52.501374][T24563] ISV = 0, ISS = 0x00000044
[ 52.505884][T24563] CM = 0, WnR = 1
[ 52.509530][T24563] [00000000000000a0] user address but active_mm is swapper
[ 52.516548][T24563] Internal error: Oops: 96000044 [#1] SMP
[ 52.522096][T24563] Modules linked in: ghash_ce sha2_ce sha256_arm64 sha1_ce sbsa_gwdt hns_roce_hw_v2 vfat fat ib_uverbs ib_core ipmi_ssif sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu hisi_uncore_pmu ip_tables xfs libcrc32c sd_mod realtek hclge nvme hisi_sas_v3_hw nvme_core hisi_sas_main t10_pi libsas ahci libahci hns3 scsi_transport_sas libata hnae3 i2c_designware_platform i2c_designware_core nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[ 52.567181][T24563] CPU: 101 PID: 24563 Comm: cpuhp/101 Not tainted 5.17.0-rc7+ #5
[ 52.574716][T24563] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDD, BIOS 1.79 08/21/2021
[ 52.583547][T24563] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 52.591170][T24563] pc : lpi_update_config+0xe0/0x300
[ 52.59620000ce6bb90 x28: 0000000000000000 x27: 0000000000000060
[ 52.613021][T24563] x26: ffff20800798b818 x25: 0000000000002781 x24: ffff80000962f460
[ 52.620815][T24563] x23: 0000000000000000 x22: 0000000000000060 x21: ffff80000962ec58
[ 52.628610][T24563] x20: ffff20800633b540 x19: ffff208007946e00 x18: 0000000000000000
[ 52.636404][T24563] x17: 3731313830343030 x16: 3030303078304020 x15: 0000000000000000
[ 52.644199][T24563] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 52.651993][T24563] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff80000867a99c
[ 52.659788][T24563] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff800008d3dda0
[ 52.667582][T24563] x5 : ffff800028e00000 x4 : 0000000000000000 x3 : ffff20be7f837780
[ 52.675376][T24563] x2 : 0000000000000001 x1 : 00000000000000a0 x0 : 0000000000000000
[ 52.683170][T24563] Call trace:
[ 52.686298][T24563] lpi_update_config+0xe0/0x300
[ 52.690982][T24563] its_unmask_irq+0x34/0x68
[ 52.695318][T24563] irq_chip_unmask_parent+0x20/0x28
[ 52.700349][T24563] its_unmask_msi_irq+0x24/0x30
[ 52.705032][T24563] unmask_irq.part.0+0x2c/0x48
[ 52.709630][T24563] irq_enable+0x70/0x80
[ 52.713623][T24563] __irq_startup+0x7c/0xa8
[ 52.717875][T24563] irq_startup+0x134/0x158
[ 52.722127][T24563] irq_affinity_online_cpu+0x1c0/0x210
[ 52.727415][T24563] cpuhp_invoke_callback+0x14c/0x590
[ 52.732533][T24563] cpuhp_thread_fun+0xd4/0x188
[ 52.737130][T24563] [...]
[ 52.749890][T24563] Code: f94002a0 8b000020 f9400400 91028001 (f9000039)
[ 52.756649][T24563] ---[ end trace 0000000000000000 ]---
[ 52.787287][T24563] Kernel panic - not syncing: Oops: Fatal exception
[ 52.793701][T24563] SMP: stopping secondary CPUs
[ 52.798309][T24563] Kernel Offset: 0xb0000 from 0xffff800008000000
[ 52.804462][T24563] PHYS_OFFSET: 0x0
[ 52.808021][T24563] CPU features: 0x00,00000803,46402c40
[ 52.813308][T24563] Memory Limit: none
[ 52.841424][T24563] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---

Then I added only the kernel parameter 'maxcpus=1'. It also crashed with the
same call trace.

Then I added the cpu_online_mask check as below. With both sets of kernel
parameters added, it no longer crashes.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index d25b7a864bbb..17c15d3b2784 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1624,7 +1624,10 @@ static int its_select_cpu(struct irq_data *d,
         cpu = cpumask_pick_least_loaded(d, tmpmask);
     } else {
-        cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
+        cpumask_and(tmpmask, aff_mask, cpu_online_mask);
+        if (cpumask_empty(tmpmask))
+            cpumask_and(tmpmask, irq_data_get_affinity_mask(d),
+                    cpu_online_mask);
         /* If we cannot cross sockets, limit the search to that node */
         if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&

Thanks,
Xiongfeng

>
> Yeah, that seems to solve the issue. I will test it a bit more.
>
> We need to check the isolcpus cmdline issue as well - wang xiongfeng, please
> assist here. I assume that this feature just never worked for arm64 since it was
> added.
>
>> With this in place, I get the following results (VM booted with 4
>> vcpus and maxcpus=1, the virtio device is using managed interrupts):
>>
>> root@debian:~# cat /proc/interrupts
>>             CPU0
>>   10:       2298     GICv3  27 Level     arch_timer
>>   12:         84     GICv3  33 Level     uart-pl011
>>   49:          0     GICv3  41 Edge      ACPI:Ged
>>   50:          0   ITS-MSI 16384 Edge      virtio0-config
>>   51:       2088   ITS-MSI 16385 Edge      virtio0-req.0
>>   52:          0   ITS-MSI 16386 Edge      virtio0-req.1
>>   53:          0   ITS-MSI 16387 Edge      virtio0-req.2
>>   54:          0   ITS-MSI 16388 Edge      virtio0-req.3
>>   55:      11641   ITS-MSI 32768 Edge      xhci_hcd
>>   56:          0   ITS-MSI 32769 Edge      xhci_hcd
>> IPI0:         0       Rescheduling interrupts
>> IPI1:         0       Function call interrupts
>> IPI2:         0       CPU stop interrupts
>> IPI3:         0       CPU stop (for crash dump) interrupts
>> IPI4:         0       Timer broadcast interrupts
>> IPI5:         0       IRQ work interrupts
>> IPI6:         0       CPU wake-up interrupts
>> Err:          0
>> root@debian:~# echo 1 >/sys/devices/system/cpu/cpu2/online
>> root@debian:~# cat /proc/interrupts
>>             CPU0       CPU2
>>   10:       2530         90     GICv3  27 Level     arch_timer
>>   12:        103          0     GICv3  33 Level     uart-pl011
>>   49:          0          0     GICv3  41 Edge      ACPI:Ged
>>   50:          0          0   ITS-MSI 16384 Edge      virtio0-config
>>   51:       2097          0   ITS-MSI 16385 Edge      virtio0-req.0
>>   52:          0          0   ITS-MSI 16386 Edge      virtio0-req.1
>>   53:          0         12   ITS-MSI 16387 Edge      virtio0-req.2
>>   54:          0          0   ITS-MSI 16388 Edge      virtio0-req.3
>>   55:      13487          0   ITS-MSI 32768 Edge      xhci_hcd
>>   56:          0          0   ITS-MSI 32769 Edge      xhci_hcd
>> IPI0:        38         45       Rescheduling interrupts
>> IPI1:         3          3       Function call interrupts
>> IPI2:         0          0       CPU stop interrupts
>> IPI3:         0          0       CPU stop (for crash dump) interrupts
>> IPI4:         0          0       Timer broadcast interrupts
>> IPI5:         0          0       IRQ work interrupts
>> IPI6:         0          0       CPU wake-up interrupts
>> Err:          0
>>
>
> Out of interest, is the virtio managed interrupts support just in your sandbox?
> You did mention earlier in the thread that you were considering adding this
> feature.
>
> Thanks,
> John
> .
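
For reference, the mask computation in the last its_select_cpu() hunk can be sketched in user space as below. This is only an illustration, not kernel code: toy_cpumask, toy_mask_range() and toy_candidates() are hypothetical names, a mask is modelled as a single 64-bit word (so only CPUs 0-63), and it assumes that aff_mask is the mask handed to its_select_cpu() by its caller, which may already be narrowed to housekeeping CPUs.

```c
#include <stdint.h>

/* Toy stand-in for struct cpumask: bit n set means CPU n is in the mask. */
typedef uint64_t toy_cpumask;

/* Mask with CPUs first..last (inclusive) set; requires first <= last <= 63. */
toy_cpumask toy_mask_range(unsigned int first, unsigned int last)
{
    toy_cpumask upto_last = (last == 63) ? ~0ULL : ((1ULL << (last + 1)) - 1);
    toy_cpumask below_first = (1ULL << first) - 1;

    return upto_last & ~below_first;
}

/*
 * Mirrors the two-step selection in the hunk:
 *   1. prefer the CPUs that are both in aff_mask and online;
 *   2. if that set is empty, fall back to the interrupt's own
 *      affinity mask restricted to online CPUs.
 * The result may still be empty if no CPU of either mask is online.
 */
toy_cpumask toy_candidates(toy_cpumask aff_mask, toy_cpumask irq_affinity,
                           toy_cpumask online)
{
    toy_cpumask tmp = aff_mask & online;

    if (tmp == 0)
        tmp = irq_affinity & online;

    return tmp;
}
```

With the irq 350 numbers (smp_affinity 0xff, i.e. CPUs 0-7) and an assumed aff_mask of CPUs 0-4, toy_candidates(toy_mask_range(0, 4), toy_mask_range(0, 7), toy_mask_range(0, 63)) evaluates to 0x1f, so CPU5 (0x20) is no longer a candidate. With only CPU0 online and a hypothetical aff_mask of CPUs 1-4, the first step yields an empty mask and the fallback returns 0x1.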