Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp1295660rwl; Fri, 24 Mar 2023 08:41:32 -0700 (PDT) X-Google-Smtp-Source: AKy350a1zNatJIAnx+VzYsDv8NRXe3tWZ8RsO5wr67yrCB47Mx2BgeZ/A29gIxOgIqWcHOIb7/T/ X-Received: by 2002:a17:906:eb16:b0:939:c411:f784 with SMTP id mb22-20020a170906eb1600b00939c411f784mr3089476ejb.56.1679672491818; Fri, 24 Mar 2023 08:41:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679672491; cv=none; d=google.com; s=arc-20160816; b=wW0vX9nhpu/rNjgTmM12VfbkNhw1B8n/z3y5wAfXYHY2dXZH8kAYO6gq4AeTEG3rR0 MciqpBD1dzV3BE0c5zou2ZlU1rYADyJEG7r+1/0cq4NWBLMF+W7hB3wGD0fRriR9+G+5 IU007TJOIq0Yyvw+jTH2oJOdhsgPPe7nCGLfPJ8b1vYclpN711V4NUHcl7u+2CquPMii jPUl7jp/W9TWEQc0sv9FyjAItzuMyZBCt1Tv7qBxZq9CdzinGFyhfGqgIKqaZL7v4bIa pxnd5C1f19ufBCHFqdRlruhQJjDwfgDT98ewHRu+o+uoIsO4W+mLFPQgLdd27qQtEXE0 zA4w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:message-id:date:references :in-reply-to:subject:cc:to:dkim-signature:dkim-signature:from; bh=EZsJSuXZHEuy+tOh8cCziDMVMaKb39iRLCNHtH20hgU=; b=wLDjWyrEbLOYOFS9QXIITyo2Yj8fJ4QtmR3Oeu2FC0ru5qpcGW209uUxpH1O3Vf4Sr WuaEqOPhTbWjXo6AlZUCOwwyR/omjo8JjAXlEvqoEKvolblHdNoO8d2hMY5aQ68UtVSn +T3jmWyPtnBnnfEiTd3Qmyh+x7JGpffiKKNeJZo9t2WFml0xGUPlFUdb1HS+ka+sPBZ9 BQxsXS2c0vtIndxOJBu82j3U9FbSDWFOWciSYMEJtuUQBA/EyAx3lzfTLSjTDdCLT16x i9bHt87TM1B8R3xwElTURReOpd3BNFBlYTYtHO0m4UYsX1YfYb4Q4M+Blq3rO8AOAVp1 RIPA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b="En//+wAX"; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=TLARnM6V; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id p5-20020a170906838500b009316736fd97si20761523ejx.405.2023.03.24.08.41.07; Fri, 24 Mar 2023 08:41:31 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b="En//+wAX"; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=TLARnM6V; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232013AbjCXPjq (ORCPT + 99 others); Fri, 24 Mar 2023 11:39:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231894AbjCXPjn (ORCPT ); Fri, 24 Mar 2023 11:39:43 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C49195FD7 for ; Fri, 24 Mar 2023 08:39:20 -0700 (PDT) From: Thomas Gleixner DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1679672348; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=EZsJSuXZHEuy+tOh8cCziDMVMaKb39iRLCNHtH20hgU=; b=En//+wAXaKPUNUYL/kpRcBj6SGry6M/vowv27rstAs6ykDgAw24DAMzCq7H34CfwMQs5hH Iv/QXhpSBVxl6pqYfPwMP1aHSmQRh1B0HkHym9y9ZCgvDcXZibS+gVyK5TA/kee5GkGUdN npu761ek6d9h3F1ZZ/PHW2GnOd+OSkSXHyiWBc116lsYegv8PrbNn9UKa32Bih3fG4dxzK 8ibdXOGAESnopDeq9IbzvO+83UR9mBUE9bX+7nY6U05RWnojqhafydZhPZd7Z5XDwcmC3P ordlzXzeeu5aJ4glW7y7pAMrzj3d38cph2vPQd+6mnQ9kHgaW1U/UEhRoVRI8A== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1679672348; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=EZsJSuXZHEuy+tOh8cCziDMVMaKb39iRLCNHtH20hgU=; b=TLARnM6VYXTc5Xy1KTQ71/y6rPbCRZRHqviwlTs4kYxnacSSR7pH0yjJeBTg51m5aswvC/ OAfUI5fx39isnhBg== To: Saurabh Singh Sengar , Borislav Petkov Cc: Saurabh Sengar , "mingo@redhat.com" , "dave.hansen@linux.intel.com" , "x86@kernel.org" , "hpa@zytor.com" , "johan+linaro@kernel.org" , "isaku.yamahata@intel.com" , "Michael Kelley (LINUX)" , "linux-kernel@vger.kernel.org" , "rahul.tanwar@linux.intel.com" , "andriy.shevchenko@intel.com" Subject: RE: [EXTERNAL] Re: [PATCH] x86/ioapic: Don't return 0 as valid virq In-Reply-To: References: <1677785686-2152-1-git-send-email-ssengar@linux.microsoft.com> <20230312204019.GBZA44s28AOAfAcRuy@fat_crate.local> <20230313111425.GDZA8Fkar9Z9BzSImD@fat_crate.local> Date: Fri, 24 Mar 2023 16:39:07 +0100 Message-ID: <87fs9u6twk.ffs@tglx> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-2.5 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Mar 14 2023 at 10:23, Saurabh Singh Sengar wrote: >> This should be added to your commit message: what guest VM is that and >> why should the kernel support it. > > Guest VM is a linux VM running as child partition on Hyper-V. Hyper-v Linux > documentation is in Documentation/virt/hyperv/. > > In commit I wanted to mention that any system which is not registering > IO-APIC will have this issue. But I am fine to mention specifically > about the issue I am facing. As part of your next comment, I have > explained the issue in detail if that is good, I can put that as > commit message. >> >> Why doesn't it need an IO-APIC and why does the current code need to be >> changed just for your guest VM? > > For Hyper-V Virtual Machines, few platforms don't have any devices to be > hooked to IO-APIC. Although it has Hyper-V based MSI over VMBus which > assigns interrupts to PCIe devices. In such platforms IO-APIC is not > registered which causes gsi_top value to remain at 0 and not get properly > assigned. Moreover, due to the inability to disable CONFIG_X86_IO_APIC > flag, the io-apic code still gets compiled. Thus, arch_dynirq_lower_bound > function in io_apic.c decides the lower bound of irq numbers based on gsi_top. > > Later when PCIe-MSI attempts to allocate interrupts, it gets 0 as the first > virq number because gsi_top is still 0. 0 being invalid virq is ignored by > MSI irq domain and results allocation of the same PCIe MSI twice. > > CPU0 CPU1 > 0: 2 0 Hyper-V PCIe MSI 1073741824-edge > 1: 69 0 Hyper-V PCIe MSI 1073741824-edge nvme0q0 > > To avoid this issue, if IO-APIC and gsi_top are not initialized, return the > hint value passed as 'from' value to arch_dynirq_lower_bound instead of 0. > This will also be identical to the behaviour of weak arch_dynirq_lower_bound > function defined in kernel/softirq.c. I find this mightly confusing. Something like this perhaps: Subject: x86/ioapic: Don't return 0 from arch_dynirq_lower_bound() arch_dynirq_lower_bound() is invoked by the core interrupt code to retrieve the lowest possible Linux interrupt number for dynamically allocated interrupts like MSI. The x86 implementation uses this to exclude the IO/APIC GSI space. This works correctly as long as there is an IO/APIC registered, but returns 0 if not. This has been observed in VMs where the BIOS does not advertise an IO/APIC. 0 is an invalid interrupt number except for the legacy timer interrupt on x86. The return value is unchecked in the core code, so it ends up to allocate interrupt number 0 which is subsequently considered to be invalid by the caller, e.g. the MSI allocation code. The function has already a check for 0 in the case that an IO/APIC is registered, but ioapic_dynirq_base is 0 in case of device tree setups. Consolidate this and zero check for both ioapic_dynirq_base and gsi_top, which is used in the case that no IO/APIC is registered. And then make the code to look like the below, which makes it very clear what this is about. Thanks, tglx --- --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -2477,17 +2477,22 @@ static int io_apic_get_redir_entries(int unsigned int arch_dynirq_lower_bound(unsigned int from) { + unsigned int ret; + /* * dmar_alloc_hwirq() may be called before setup_IO_APIC(), so use * gsi_top if ioapic_dynirq_base hasn't been initialized yet. */ - if (!ioapic_initialized) - return gsi_top; + ret = ioapic_dynirq_base ? : gsi_top; + /* - * For DT enabled machines ioapic_dynirq_base is irrelevant and not - * updated. So simply return @from if ioapic_dynirq_base == 0. + * For DT enabled machines ioapic_dynirq_base is irrelevant and + * always 0. gsi_top can be 0 if there is no IO/APIC registered. + * + * 0 is an invalid interrupt number for dynamic allocations. Return + * @from instead. */ - return ioapic_dynirq_base ? : from; + return ret ? : from; } #ifdef CONFIG_X86_32