Date: Thu, 7 May 2020 05:18:50 -0700
From: "Raj, Ashok"
To: Thomas Gleixner
Cc: "Raj, Ashok", Evan Green, Mathias Nyman, x86@kernel.org, linux-pci, LKML, Bjorn Helgaas, "Ghorai, Sukumar", "Amara, Madhusudanarao", "Nandamuri, Srikanth", Ashok Raj
Subject: Re: MSI interrupt for xhci still lost on 5.6-rc6 after cpu hotplug
Message-ID: <20200507121850.GB85463@otc-nc-03>
References: <20200501184326.GA17961@araj-mobl1.jf.intel.com> <878si6rx7f.fsf@nanos.tec.linutronix.de> <20200505201616.GA15481@otc-nc-03> <875zdarr4h.fsf@nanos.tec.linutronix.de>
In-Reply-To: <875zdarr4h.fsf@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Thomas,

We did a bit more tracing, and it looks like the IRR check is not
happening on the right CPU. See below.

On Tue, May 05, 2020 at 11:47:26PM +0200, Thomas Gleixner wrote:
> >
> > msi_set_affinity()
> > {
> >     ....
> >     unlock_vector_lock();
> >
> >     /*
> >      * Check whether the transition raced with a device interrupt and
> >      * is pending in the local APICs IRR. It is safe to do this outside
> >      * of vector lock as the irq_desc::lock of this interrupt is still
> >      * held and interrupts are disabled: The check is not accessing the
> >      * underlying vector store. It's just checking the local APIC's
> >      * IRR.
> >      */
> >     if (lapic_vector_set_in_irr(cfg->vector))
> >         irq_data_get_irq_chip(irqd)->irq_retrigger(irqd);
>
> No. This catches the transitional interrupt to the new vector on the
> original CPU, i.e. the one which is running that code.

Mathias added a trace to his xhci driver that fires when the ISR is
called. Below is the tail of my trace, showing the last two times the
xhci_irq ISR was called:

<idle>-0 [003] d.h. 200.277971: xhci_irq: xhci irq
<idle>-0 [003] d.h. 200.278052: xhci_irq: xhci irq

I am trying to follow your steps below with traces. Each trace message
mirrors the comment at the corresponding point in the source.

>
> Again the steps are:
>
> 1) Allocate new vector on new CPU

    /* Allocate a new target vector */
    ret = parent->chip->irq_set_affinity(parent, mask, force);

migration/3-24 [003] d..1 200.283012: msi_set_affinity: msi_set_affinity: quirk: 1: new vector allocated, new cpu = 0

>
> 2) Set new vector on original CPU

    /* Redirect it to the new vector on the local CPU temporarily */
    old_cfg.vector = cfg->vector;
    irq_msi_update_msg(irqd, &old_cfg);

migration/3-24 [003] d..1 200.283033: msi_set_affinity: msi_set_affinity: Redirect to new vector 33 on old cpu 6

>
> 3) Set new vector on new CPU

    /* Now transition it to the target CPU */
    irq_msi_update_msg(irqd, cfg);

migration/3-24 [003] d..1 200.283044: msi_set_affinity: msi_set_affinity: Transition to new target cpu 0 vector 33

    if (lapic_vector_set_in_irr(cfg->vector))
        irq_data_get_irq_chip(irqd)->irq_retrigger(irqd);

migration/3-24 [003] d..1 200.283046: msi_set_affinity: msi_set_affinity: Update Done [IRR 0]: irq 123 localsw: Nvec 33 Napic 0

>
> So we have 3 points where an interrupt can fire:
>
> A) Before #2
>
> B) After #2 and before #3
>
> C) After #3
>
> #A is hitting the old vector which is still valid on the old CPU and
> will be handled once interrupts are enabled with the correct irq
> descriptor - Normal operation (same as with maskable MSI)
>
> #B This must be checked in the IRR because there is no valid vector
> on the old CPU.

The IRR check seems to run on a random CPU (cpu 3 here) instead of
checking for the new vector 33 on the old cpu 6? This is the point
where, if we force the retrigger without the IRR check, things seem to
fix themselves.

>
> #C is handled on the new vector on the new CPU
>

Did we miss something?

Cheers,
Ashok
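
For readers following the thread, here is a minimal user-space sketch of the scenario the trace above appears to describe. It is a toy model, not kernel code: the per-CPU `irr` bitmap, `irr_set()`, `irr_test()`, `struct msi_route` and `simulate_migration()` are all hypothetical names introduced for illustration, and the CPU and vector numbers are simply the ones printed in the trace. It assumes the trace is read the way the mail reads it, i.e. the code runs on CPU 3 while the interrupt was temporarily routed to CPU 6.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS    8    /* hypothetical; enough for CPUs 0, 3 and 6 */
#define NR_VECTORS 256  /* size of the x86 vector space */

/* Toy model: one 256-bit IRR per CPU (not real APIC registers). */
static uint32_t irr[NR_CPUS][NR_VECTORS / 32];

/* Model a device MSI arriving at 'cpu' while it cannot be serviced yet. */
static void irr_set(unsigned int cpu, unsigned int vector)
{
    assert(cpu < NR_CPUS && vector < NR_VECTORS);
    irr[cpu][vector / 32] |= UINT32_C(1) << (vector % 32);
}

/* Report whether 'vector' is pending in the modeled IRR of 'cpu'. */
static bool irr_test(unsigned int cpu, unsigned int vector)
{
    assert(cpu < NR_CPUS && vector < NR_VECTORS);
    return (irr[cpu][vector / 32] >> (vector % 32)) & 1u;
}

/* Where the device's MSI message currently points (hypothetical type). */
struct msi_route {
    unsigned int cpu;
    unsigned int vector;
};

struct check_result {
    bool seen_on_running_cpu; /* what a local-only IRR check observes */
    bool seen_on_old_cpu;     /* what is actually pending on the old CPU */
};

/*
 * Replay steps 2 and 3 with an interrupt firing in window B
 * (after step 2, before step 3), then compare the two IRR views.
 */
static struct check_result simulate_migration(unsigned int running_cpu,
                                              unsigned int old_cpu,
                                              unsigned int new_cpu,
                                              unsigned int new_vector)
{
    struct msi_route route;
    struct check_result res;

    memset(irr, 0, sizeof(irr));

    /* Step 2: redirect to the new vector on the old CPU. */
    route.cpu = old_cpu;
    route.vector = new_vector;

    /* Window B: the device fires and the message follows the route. */
    irr_set(route.cpu, route.vector);

    /* Step 3: transition to the target CPU. */
    route.cpu = new_cpu;

    /* A local check only sees the IRR of the CPU executing the code. */
    res.seen_on_running_cpu = irr_test(running_cpu, new_vector);
    /* In this model the bit is pending on the old CPU instead. */
    res.seen_on_old_cpu = irr_test(old_cpu, new_vector);
    return res;
}
```

With the values from the trace, `simulate_migration(3, 6, 0, 33)` reports the vector as not pending on the running CPU but pending on the old CPU, which matches the "[IRR 0]" in the last trace line. When `running_cpu` equals `old_cpu`, both views agree, which is the case Thomas's description assumes.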