Subject: Re: [PATCH] x86/hotplug: Silence APIC only after all irq's are migrated
To: Ashok Raj, linux-kernel@vger.kernel.org, tglx@linutronix.de
Cc: Sukumar Ghorai, Srikanth Nandamuri, Evan Green, Mathias Nyman, Bjorn Helgaas, stable@vger.kernel.org
References: <20200814213842.31151-1-ashok.raj@intel.com>
From: Randy Dunlap
Date: Fri, 14 Aug 2020 16:25:32 -0700
In-Reply-To: <20200814213842.31151-1-ashok.raj@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 8/14/20 2:38 PM, Ashok Raj wrote:
> When offlining CPU's, fixup_irqs() migrates all interrupts away from the

                 CPUs,

> outgoing CPU to an online CPU. Its always possible the device sent an

                                 It's

> interrupt to the previous CPU destination. Pending interrupt bit in IRR in
> lapic identifies such interrupts. apic_soft_disable() will not capture any

  LAPIC

> new interrupts in IRR. This causes interrupts from device to be lost during
> cpu offline. The issue was found when explicitly setting MSI affinity to a

  CPU

> CPU and immediately offlining it. It was simple to recreate with a USB
> ethernet device and doing I/O to it while the CPU is offlined. Lost
> interrupts happen even when Interrupt Remapping is enabled.
>
> Current code does apic_soft_disable() before migrating interrupts.
>
> native_cpu_disable()
> {
>         ...
>         apic_soft_disable();
>         cpu_disable_common();
>           --> fixup_irqs(); // Too late to capture anything in IRR.
> }
>
> Just fliping the above call sequence seems to hit the IRR checks

       flipping

> and the lost interrupt is fixed for both legacy MSI and when
> interrupt remapping is enabled.
>
>
> Fixes: 60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead")
> Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/
> Signed-off-by: Ashok Raj
>
> To: linux-kernel@vger.kernel.org
> To: Thomas Gleixner
> Cc: Sukumar Ghorai
> Cc: Srikanth Nandamuri
> Cc: Evan Green
> Cc: Mathias Nyman
> Cc: Bjorn Helgaas
> Cc: stable@vger.kernel.org
> ---
>  arch/x86/kernel/smpboot.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ffbd9a3d78d8..278cc9f92f2f 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1603,13 +1603,20 @@ int native_cpu_disable(void)
>  	if (ret)
>  		return ret;
>  
> +	cpu_disable_common();
>  	/*
>  	 * Disable the local APIC. Otherwise IPI broadcasts will reach
>  	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
> -	 * messages.
> +	 * messages. Its important to do apic_soft_disable() after

                     It's

> +	 * fixup_irqs(), because fixup_irqs() called from cpu_disable_common()
> +	 * depends on IRR being set. After apic_soft_disable() CPU preserves
> +	 * currently set IRR/ISR but new interrupts will not set IRR.
> +	 * This causes interrupts sent to outgoing cpu before completion

                                                   CPU

> +	 * of irq migration to be lost. Check SDM Vol 3 "10.4.7.2 Local

              IRQ

> +	 * APIC State after It Has been Software Disabled" section for more
> +	 * details.
>  	 */
>  	apic_soft_disable();
> -	cpu_disable_common();
>  
>  	return 0;
>  }
>

thanks.
-- 
~Randy