From: Ashok Raj
To: linux-kernel@vger.kernel.org, tglx@linutronix.de
Cc: Ashok Raj, Sukumar Ghorai, Srikanth Nandamuri, Evan Green, Mathias Nyman, Bjorn Helgaas, stable@vger.kernel.org
Subject: [PATCH] x86/hotplug: Silence APIC only after all irq's are migrated
Date: Fri, 14 Aug 2020 14:38:42 -0700
Message-Id:
 <20200814213842.31151-1-ashok.raj@intel.com>

When offlining CPUs, fixup_irqs() migrates all interrupts away from the
outgoing CPU to an online CPU. It is always possible that the device sent
an interrupt to the previous CPU destination. The pending interrupt bit in
the local APIC's IRR identifies such interrupts. However, once
apic_soft_disable() has run, new interrupts are no longer captured in IRR,
so interrupts from the device can be lost while the CPU is being offlined.

The issue was found when explicitly setting MSI affinity to a CPU and
immediately offlining it. It was simple to recreate with a USB Ethernet
device by doing I/O to it while the CPU was being offlined. Lost interrupts
happen even when interrupt remapping is enabled.

The current code calls apic_soft_disable() before migrating interrupts:

native_cpu_disable()
{
	...
	apic_soft_disable();
	cpu_disable_common();
	  --> fixup_irqs(); // Too late to capture anything in IRR.
}

Just flipping the above call sequence lets fixup_irqs() hit its IRR checks,
and the lost interrupt is fixed both for legacy MSI and when interrupt
remapping is enabled.

Fixes: 60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead")
Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/
Signed-off-by: Ashok Raj
To: linux-kernel@vger.kernel.org
To: Thomas Gleixner
Cc: Sukumar Ghorai
Cc: Srikanth Nandamuri
Cc: Evan Green
Cc: Mathias Nyman
Cc: Bjorn Helgaas
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/smpboot.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ffbd9a3d78d8..278cc9f92f2f 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1603,13 +1603,20 @@ int native_cpu_disable(void)
 	if (ret)
 		return ret;
 
+	cpu_disable_common();
 	/*
 	 * Disable the local APIC. Otherwise IPI broadcasts will reach
 	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
-	 * messages.
+	 * messages. It's important to do apic_soft_disable() after
+	 * fixup_irqs(), because fixup_irqs() called from cpu_disable_common()
+	 * depends on IRR being set. After apic_soft_disable() the CPU
+	 * preserves currently set IRR/ISR bits, but new interrupts will not
+	 * set IRR. This causes interrupts sent to the outgoing CPU before
+	 * completion of IRQ migration to be lost. Check SDM Vol 3 "10.4.7.2
+	 * Local APIC State after It Has been Software Disabled" section for
+	 * more details.
 	 */
 	apic_soft_disable();
-	cpu_disable_common();
 
 	return 0;
 }
-- 
2.13.6