Date: Thu, 25 Jan 2018 09:54:21 +0100 (CET)
From: Thomas Gleixner
To: Lyude Paul
cc: "Ghannam, Yazen", "hpa@zytor.com", "keith.busch@intel.com", "mingo@kernel.org", "linux-kernel@vger.kernel.org", Borislav Petkov
Subject: Re: "irq/matrix: Spread interrupts on allocation" breaks nouveau in mainline kernel

On Wed, 24 Jan 2018, Lyude Paul wrote:
> Sorry about that!
> Let me clarify a little bit: this is a problem that shows up
> on mainline. Normally when we suspend the GPU in nouveau, we free the IRQs
> it's using before going into suspend
> (drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c:88), then reserve IRQs again
> on resume (drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c:134). Since this
> patch got pushed to mainline, the IRQ we get from request_irq() ends up having
> the same MSI vector as another device on the system:

It's not the same.

> nouveau:
>     parent:
>         domain:  VECTOR
>         hwirq:   0x2f
>         chip:    APIC
>         flags:   0x0
>         Vector:  35
>         Target:  1

Vector 35 on CPU1

> After resume and allocating the interrupt for nouveau again, we get a message
> from the kernel saying:
>
> [ 217.150787] do_IRQ: 1.35 No irq handler for vector

That's because there is a pending irq on the old vector for unknown reasons.

> As well, nouveau ends up getting no interrupts from the card and as a result
> fails to come back up:
>
> [ 219.153049] nouveau 0000:22:00.0: DRM: EVO timeout
> [ 220.226254] r8169 0000:1e:00.0 enp30s0: link up
> [ 221.153054] nouveau 0000:22:00.0: DRM: base-0: timeout
> [ 223.153528] nouveau 0000:22:00.0: DRM: base-0: timeout
>
> If we look through all of the other IRQ allocations, we'll find that now two
> devices have the MSI vector 35:
>
> nouveau:
>     parent:
>         domain:  VECTOR
>         hwirq:   0x2f
>         chip:    APIC
>         flags:   0x0
>         Vector:  35
>         Target:  1

Vector 35 on CPU1

> and the PCI bridge (00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD]
> Family 17h (Models 00h-0fh) PCIe GPP Bridge):
>
>     parent:
>         domain:  VECTOR
>         hwirq:   0x19
>         chip:    APIC
>         flags:   0x0
>         Vector:  35
>         Target:  0

Vector 35 on CPU0.

Same vector but different CPUs. So it's NOT the same thing. The real issue is
something completely different and the revert of this patch merely papers over
the underlying problem. I'm pretty sure that you can trigger this even with the
revert in place.
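To illustrate the distinction drawn above: on x86, interrupt vectors are allocated per CPU, so "do_IRQ: 1.35" refers to vector 35 on CPU 1, and vector 35 on CPU 0 is an unrelated slot. The following is a minimal user-space toy model, not kernel code; the table, the sizes and the function names (model_init, model_assign, model_free, model_dispatch) are hypothetical. It shows two devices using vector number 35 on different CPUs without conflict, and an interrupt arriving on an already freed (cpu, vector) slot producing a do_IRQ-style "No irq handler" message, which is the stale-pending-interrupt situation suggested above.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model, NOT kernel code: each CPU has its own vector table, so an
 * interrupt is identified by the (cpu, vector) pair, not the vector alone.
 * Sizes are illustrative assumptions. */
#define NR_CPUS_MODEL    4
#define NR_VECTORS_MODEL 256
#define NO_IRQ           (-1)

/* Hypothetical per-CPU table: vector_table[cpu][vector] = irq id. */
static int vector_table[NR_CPUS_MODEL][NR_VECTORS_MODEL];

/* Mark every (cpu, vector) slot as unused. */
static void model_init(void)
{
    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        for (int vec = 0; vec < NR_VECTORS_MODEL; vec++)
            vector_table[cpu][vec] = NO_IRQ;
}

/* Bind irq to (cpu, vector); returns 0 on success, -1 if the slot is taken. */
static int model_assign(int cpu, int vector, int irq)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    assert(vector >= 0 && vector < NR_VECTORS_MODEL);
    if (vector_table[cpu][vector] != NO_IRQ)
        return -1;
    vector_table[cpu][vector] = irq;
    return 0;
}

/* Release the slot, roughly what freeing the irq before suspend does. */
static void model_free(int cpu, int vector)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    assert(vector >= 0 && vector < NR_VECTORS_MODEL);
    vector_table[cpu][vector] = NO_IRQ;
}

/* Deliver an interrupt; prints a do_IRQ-style warning if nothing is bound. */
static int model_dispatch(int cpu, int vector)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    assert(vector >= 0 && vector < NR_VECTORS_MODEL);
    int irq = vector_table[cpu][vector];
    if (irq == NO_IRQ)
        printf("do_IRQ: %d.%d No irq handler for vector\n", cpu, vector);
    return irq;
}
```

For example, after model_init(), binding vector 35 on CPU 0 to the bridge and vector 35 on CPU 1 to nouveau both succeed (the ids 0x19 and 0x2f are borrowed from the hwirq fields in the dumps purely as labels). Calling model_free(1, 35) and then model_dispatch(1, 35) prints "do_IRQ: 1.35 No irq handler for vector" and returns NO_IRQ.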
Do the following before suspend:

  echo 2 >/proc/irq/$NOUVEAUIRQ/smp_affinity_list

Then do suspend/resume and you should end up with the same situation.

I can't tell from your dmesg, but I'm pretty confident that

> [ 217.150787] do_IRQ: 1.35 No irq handler for vector

happens _before_ the nouveau driver requests the irq again. Can you please add
some printk to the code in question to verify that?

Thanks,

  tglx
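Note that smp_affinity_list takes a CPU list rather than the hexadecimal bitmask used by smp_affinity, so writing 2 there requests CPU 2 only. The sketch below shows how such a list maps to a bitmask; cpulist_to_mask() is a hypothetical helper, not a kernel or libc API, and it assumes at most 64 CPUs and returns 0 for empty or malformed input.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a CPU list such as "2" or "0-3,6" (optionally
 * newline-terminated, as echo writes it) into a bitmask where bit n = CPU n.
 * Assumes CPU numbers < 64; returns 0 on empty or malformed input. */
static uint64_t cpulist_to_mask(const char *s)
{
    uint64_t mask = 0;

    assert(s != NULL);
    while (*s != '\0' && *s != '\n') {
        char *end;

        if (!isdigit((unsigned char)*s))
            return 0;
        unsigned long lo = strtoul(s, &end, 10);
        unsigned long hi = lo;
        s = end;
        if (*s == '-') {                    /* range such as "0-3" */
            s++;
            if (!isdigit((unsigned char)*s))
                return 0;
            hi = strtoul(s, &end, 10);
            s = end;
        }
        if (lo > hi || hi >= 64)
            return 0;
        for (unsigned long cpu = lo; cpu <= hi; cpu++)
            mask |= UINT64_C(1) << cpu;
        if (*s == ',') {
            s++;
            if (*s == '\0' || *s == '\n')   /* trailing comma */
                return 0;
        } else if (*s != '\0' && *s != '\n') {
            return 0;                       /* unexpected character */
        }
    }
    return mask;
}
```

With this helper, "2" yields 0x4 (CPU 2), whereas the same digit written to smp_affinity would be read as mask 0x2, i.e. CPU 1; "0-3,6" yields 0x4f.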