Date: Fri, 14 Aug 2020 22:58:06 -0700
From: "Raj, Ashok"
To: Randy Dunlap
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, Sukumar Ghorai,
 Srikanth Nandamuri, Evan Green, Mathias Nyman, Bjorn Helgaas,
 stable@vger.kernel.org, Ashok Raj
Subject: Re: [PATCH] x86/hotplug: Silence APIC only after all irq's are migrated
Message-ID:
<20200815055806.GA3828@araj-mobl1.jf.intel.com>
References: <20200814213842.31151-1-ashok.raj@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Randy,

On Fri, Aug 14, 2020 at 04:25:32PM -0700, Randy Dunlap wrote:
> On 8/14/20 2:38 PM, Ashok Raj wrote:
> > When offlining CPU's, fixup_irqs() migrates all interrupts away from the
>
> 	CPUs,

Thanks for catching these. I'll make all the suggested changes in my next
rev once I get additional feedback from Thomas.

>
> > outgoing CPU to an online CPU. Its always possible the device sent an
>
> 	It's
>
> > interrupt to the previous CPU destination. Pending interrupt bit in IRR in
> > lapic identifies such interrupts. apic_soft_disable() will not capture any
>
> 	LAPIC
>
> > new interrupts in IRR. This causes interrupts from device to be lost during
> > cpu offline. The issue was found when explicitly setting MSI affinity to a
>
> 	CPU
>
> > CPU and immediately offlining it. It was simple to recreate with a USB
> > ethernet device and doing I/O to it while the CPU is offlined. Lost
> > interrupts happen even when Interrupt Remapping is enabled.
> >
> > Current code does apic_soft_disable() before migrating interrupts.
> >
> > native_cpu_disable()
> > {
> > 	...
> > 	apic_soft_disable();
> > 	cpu_disable_common();
> > 	  --> fixup_irqs(); // Too late to capture anything in IRR.
> > }
> >
> > Just fliping the above call sequence seems to hit the IRR checks
>
> 	flipping
>
> > and the lost interrupt is fixed for both legacy MSI and when
> > interrupt remapping is enabled.
> >
> >
> > Fixes: 60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead")
> > Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/
> > Signed-off-by: Ashok Raj
> >
> > To: linux-kernel@vger.kernel.org
> > To: Thomas Gleixner
> > Cc: Sukumar Ghorai
> > Cc: Srikanth Nandamuri
> > Cc: Evan Green
> > Cc: Mathias Nyman
> > Cc: Bjorn Helgaas
> > Cc: stable@vger.kernel.org
> > ---
> >  arch/x86/kernel/smpboot.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index ffbd9a3d78d8..278cc9f92f2f 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -1603,13 +1603,20 @@ int native_cpu_disable(void)
> >  	if (ret)
> >  		return ret;
> >
> > +	cpu_disable_common();
> >  	/*
> >  	 * Disable the local APIC. Otherwise IPI broadcasts will reach
> >  	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
> > -	 * messages.
> > +	 * messages. Its important to do apic_soft_disable() after
>
> 	It's
>
> > +	 * fixup_irqs(), because fixup_irqs() called from cpu_disable_common()
> > +	 * depends on IRR being set. After apic_soft_disable() CPU preserves
> > +	 * currently set IRR/ISR but new interrupts will not set IRR.
> > +	 * This causes interrupts sent to outgoing cpu before completion
>
> 	CPU
>
> > +	 * of irq migration to be lost. Check SDM Vol 3 "10.4.7.2 Local
>
> 	IRQ
>
> > +	 * APIC State after It Has been Software Disabled" section for more
> > +	 * details.
> >  	 */
> >  	apic_soft_disable();
> > -	cpu_disable_common();
> >
> >  	return 0;
> >  }
> >
>
> thanks.
> --
> ~Randy
>
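
For readers who want to see why the call order matters, here is a minimal
user-space sketch of the argument in the commit message. It is only a toy
model, not kernel code: every toy_* name is hypothetical, the IRR is reduced
to a 32-bit mask, and the IRR check in fixup_irqs() is approximated by simply
handing pending bits to the target CPU. ISR state and interrupt remapping are
ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy local APIC: only the state the ordering argument needs. */
struct toy_lapic {
	bool     soft_enabled;	/* stands in for the software-enable bit */
	uint32_t irr;		/* pending bits, one per toy vector (0..31) */
};

/*
 * A device raises `vector` at this CPU. Per the commit message, a
 * software-disabled APIC keeps the IRR bits it already has but does
 * not latch new ones. Returns true if the interrupt was latched.
 */
static bool toy_deliver(struct toy_lapic *apic, unsigned int vector)
{
	assert(vector < 32);
	if (!apic->soft_enabled)
		return false;		/* interrupt is silently dropped */
	apic->irr |= UINT32_C(1) << vector;
	return true;
}

static void toy_soft_disable(struct toy_lapic *apic)
{
	apic->soft_enabled = false;	/* existing IRR bits are preserved */
}

/*
 * Simplified stand-in for the IRR check in fixup_irqs(): every vector
 * still pending on the outgoing CPU is handed to the target CPU.
 * Returns the number of vectors forwarded.
 */
static unsigned int toy_fixup_irqs(struct toy_lapic *outgoing,
				   struct toy_lapic *target)
{
	unsigned int moved = 0;

	for (unsigned int v = 0; v < 32; v++) {
		uint32_t bit = UINT32_C(1) << v;

		if (outgoing->irr & bit) {
			outgoing->irr &= ~bit;
			target->irr |= bit;
			moved++;
		}
	}
	return moved;
}

/*
 * Offline a CPU while one interrupt is still aimed at its old
 * destination. disable_first == true models the current ordering,
 * false models the ordering proposed in the patch.
 * Returns the number of lost interrupts (0 or 1).
 */
static unsigned int toy_offline(bool disable_first)
{
	struct toy_lapic outgoing = { .soft_enabled = true, .irr = 0 };
	struct toy_lapic target   = { .soft_enabled = true, .irr = 0 };
	const unsigned int vector = 5;	/* arbitrary toy vector */

	if (disable_first) {
		toy_soft_disable(&outgoing);
		toy_deliver(&outgoing, vector);	/* not latched */
		toy_fixup_irqs(&outgoing, &target);
	} else {
		toy_deliver(&outgoing, vector);	/* latched in IRR */
		toy_fixup_irqs(&outgoing, &target);
		toy_soft_disable(&outgoing);
	}
	return (target.irr & (UINT32_C(1) << vector)) ? 0 : 1;
}
```

In this model toy_offline(true) reports one lost interrupt and
toy_offline(false) reports none, which mirrors the behaviour described for
the two orderings of apic_soft_disable() and cpu_disable_common().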