From: Thomas Gleixner
To: Mathias Nyman, x86@kernel.org
Cc: linux-pci, LKML, Bjorn Helgaas, Evan Green, "Ghorai, Sukumar", "Amara, Madhusudanarao", "Nandamuri, Srikanth"
Subject: Re: MSI interrupt for xhci still lost on 5.6-rc6 after cpu hotplug
In-Reply-To: <806c51fa-992b-33ac-61a9-00a606f82edb@linux.intel.com>
References: <806c51fa-992b-33ac-61a9-00a606f82edb@linux.intel.com>
Date: Fri, 20 Mar 2020 10:52:59 +0100
Message-ID: <87d0974akk.fsf@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

Mathias,

Mathias Nyman writes:
> I can reproduce the lost MSI interrupt issue on 5.6-rc6, which includes
> the "Plug non-maskable MSI affinity race" patch.
>
> I can see this on a couple of platforms. I'm running a script that first
> generates a lot of usb traffic, and then in a busy loop sets irq affinity
> and turns cpus off and on:
>
> for i in 1 3 5 7; do
>     echo "1" > /sys/devices/system/cpu/cpu$i/online
> done
> echo "A" > "/proc/irq/*/smp_affinity"
> echo "A" > "/proc/irq/*/smp_affinity"
> echo "F" > "/proc/irq/*/smp_affinity"
> for i in 1 3 5 7; do
>     echo "0" > /sys/devices/system/cpu/cpu$i/online
> done
>
> trace snippet:
>
>      <idle>-0     [001] d.h.   129.676900: xhci_irq: xhci irq
>      <idle>-0     [001] d.h.   129.677507: xhci_irq: xhci irq
>      <idle>-0     [001] d.h.   129.677556: xhci_irq: xhci irq
>      <idle>-0     [001] d.h.   129.677647: xhci_irq: xhci irq
>       <...>-14    [001] d..1   129.679802: msi_set_affinity: direct update msi 122, vector 33 -> 33, apicid: 2 -> 6

Looks like a regular affinity setting in interrupt context, but I can't
make sense of the time stamps.

>      <idle>-0     [003] d.h.   129.682639: xhci_irq: xhci irq
>      <idle>-0     [003] d.h.   129.702380: xhci_irq: xhci irq
>      <idle>-0     [003] d.h.   129.702493: xhci_irq: xhci irq
> migration/3-24    [003] d..1   129.703150: msi_set_affinity: direct update msi 122, vector 33 -> 33, apicid: 6 -> 0

So this is a CPU offline operation and after that irq 122 is silent,
right?

> kworker/0:0-5     [000] d.h.   131.328790: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
> kworker/0:0-5     [000] d.h.   133.312704: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
> kworker/0:0-5     [000] d.h.   135.360786: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
>      <idle>-0     [000] d.h.   137.344694: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
> kworker/0:0-5     [000] d.h.   139.128679: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
> kworker/0:0-5     [000] d.h.   141.312686: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
> kworker/0:0-5     [000] d.h.   143.360703: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
> kworker/0:0-5     [000] d.h.   145.344791: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0

That kworker context looks fishy. Can you please enable stacktraces in
the tracer so I can see the call chains leading to this? OTOH, that's
irq 121, not 122. Anyway, more information is always useful. And please
add the patch below.

Thanks,

        tglx

8<---------------

--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -243,6 +243,7 @@ u64 arch_irq_stat(void)
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "IRQ failed to wake up RCU");
 
 	desc = __this_cpu_read(vector_irq[vector]);
+	trace_printk("vector: %u desc %lx\n", vector, (unsigned long) desc);
 	if (likely(!IS_ERR_OR_NULL(desc))) {
 		if (IS_ENABLED(CONFIG_X86_32))
 			handle_irq(desc, regs);
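
For the stacktrace request above: per-event stack traces can be switched
on through the ftrace option interface. A minimal sketch, assuming
tracefs is mounted at one of the usual locations (the exact mount point
and permissions depend on the test box):

```shell
# Sketch: turn on kernel stack traces for every trace event, so the
# call chain leading to each msi_set_affinity hit becomes visible.
# Assumes tracefs is mounted; falls back to the older debugfs path.
TRACEFS=/sys/kernel/tracing
[ -d "$TRACEFS" ] || TRACEFS=/sys/kernel/debug/tracing

if [ -w "$TRACEFS/trace_options" ]; then
    # "stacktrace" records a stack dump with every trace entry;
    # "nostacktrace" turns it back off.
    echo stacktrace > "$TRACEFS/trace_options"
else
    echo "tracefs not writable at $TRACEFS (run as root on the test machine)"
fi
```

The resulting trace then carries a `<stack trace>` block after each
event, which should show whether the kworker-context updates come in via
the interrupt path or somewhere else.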