Date: Sat, 15 Aug 2020 12:53:45 -0700
From: "Raj, Ashok"
To: Randy Dunlap
Cc: Ashok Raj, linux-kernel@vger.kernel.org, tglx@linutronix.de, Sukumar Ghorai, Srikanth Nandamuri, Evan Green, Mathias Nyman, Bjorn Helgaas, stable@vger.kernel.org
Subject: Re: [PATCH] x86/hotplug: Silence APIC only after all irq's are migrated
Message-ID: <20200815195345.GA6022@araj-mobl1.jf.intel.com>
References: <20200814213842.31151-1-ashok.raj@intel.com>

Hi Randy,

For some unknown reason, my previous response is still showing as waiting to be delivered.

On Fri, Aug 14, 2020 at 04:25:32PM -0700, Randy Dunlap wrote:
> On 8/14/20 2:38 PM, Ashok Raj wrote:
> > When offlining CPU's, fixup_irqs() migrates all interrupts away from the
> 
> CPUs,

I'll fix all of these in the next rev. I'm just waiting to hear back from
Thomas in case he has additional comments, so I can fix those too and
resend as v2.

Cheers,
Ashok

> 
> > outgoing CPU to an online CPU. Its always possible the device sent an
> 
> It's
> 
> > interrupt to the previous CPU destination. Pending interrupt bit in IRR in
> > lapic identifies such interrupts. apic_soft_disable() will not capture any
> 
> LAPIC
> 
> > new interrupts in IRR. This causes interrupts from device to be lost during
> > cpu offline. The issue was found when explicitly setting MSI affinity to a
> 
> CPU
> 
> > CPU and immediately offlining it. It was simple to recreate with a USB
> > ethernet device and doing I/O to it while the CPU is offlined. Lost
> > interrupts happen even when Interrupt Remapping is enabled.
> > 
> > Current code does apic_soft_disable() before migrating interrupts.
> > 
> > native_cpu_disable()
> > {
> > 	...
> > 	apic_soft_disable();
> > 	cpu_disable_common();
> > 	  --> fixup_irqs(); // Too late to capture anything in IRR.
> > }
> > 
> > Just fliping the above call sequence seems to hit the IRR checks
> 
> flipping
> 
> > and the lost interrupt is fixed for both legacy MSI and when
> > interrupt remapping is enabled.
> > 
> > 
> > Fixes: 60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead")
> > Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/
> > Signed-off-by: Ashok Raj
> > 
> > To: linux-kernel@vger.kernel.org
> > To: Thomas Gleixner
> > Cc: Sukumar Ghorai
> > Cc: Srikanth Nandamuri
> > Cc: Evan Green
> > Cc: Mathias Nyman
> > Cc: Bjorn Helgaas
> > Cc: stable@vger.kernel.org
> > ---
> >  arch/x86/kernel/smpboot.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index ffbd9a3d78d8..278cc9f92f2f 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -1603,13 +1603,20 @@ int native_cpu_disable(void)
> >  	if (ret)
> >  		return ret;
> >  
> > +	cpu_disable_common();
> >  	/*
> >  	 * Disable the local APIC. Otherwise IPI broadcasts will reach
> >  	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
> > -	 * messages.
> > +	 * messages. Its important to do apic_soft_disable() after
> 
> It's
> 
> > +	 * fixup_irqs(), because fixup_irqs() called from cpu_disable_common()
> > +	 * depends on IRR being set. After apic_soft_disable() CPU preserves
> > +	 * currently set IRR/ISR but new interrupts will not set IRR.
> > +	 * This causes interrupts sent to outgoing cpu before completion
> 
> CPU
> 
> > +	 * of irq migration to be lost. Check SDM Vol 3 "10.4.7.2 Local
> 
> IRQ
> 
> > +	 * APIC State after It Has been Software Disabled" section for more
> > +	 * details.
> >  	 */
> >  	apic_soft_disable();
> > -	cpu_disable_common();
> >  
> >  	return 0;
> >  }
> > 
> 
> thanks.
> -- 
> ~Randy