From: "Rafael J. Wysocki"
Date: Fri, 22 Nov 2019 12:54:26 +0100
Subject: Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
To: Karol Herbst
Cc: "Rafael J. Wysocki", Mika Westerberg, Bjorn Helgaas, LKML, Lyude Paul, "Rafael J . Wysocki", Linux PCI, Linux PM, dri-devel, nouveau, Dave Airlie, Mario Limonciello
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Nov 22, 2019 at 12:34 PM Karol Herbst wrote:
>
> On Fri, Nov 22, 2019 at 12:30 PM Rafael J. Wysocki wrote:
> >

[cut]

> > >
> the issue is not AML related at all as I am able to reproduce this
> issue without having to invoke any of that at all, I just need to poke
> into the PCI register directly to cut the power.

Since the register is not documented, you don't actually know what
exactly happens when it is written to.

You basically are saying something like "if I write a specific value
to an undocumented register, that makes things fail". And yes,
writing things to undocumented registers is likely to cause failure to
happen, in general.

The point is that the kernel will never write into this register by
itself.

> The register is not documented, but effectively what the AML code is writing to as well.

So that AML code is problematic. It expects the write to do something
useful, but that's not the case. Without the AML, the register would
not have been written to at all.

> Of course it might also be that the code I was testing it was doing
> things in a non conformant way and I just hit a different issue as
> well, but in the end I don't think that the AML code is the root cause
> of all of that.

If AML is not involved at all, things work.
You've just said so in another message in this thread, quoting
verbatim:

"yes. In my previous testing I was poking into the PCI registers of
the bridge controller and the GPU directly and that never caused any
issues as long as I limited it to putting the devices into D3hot."

You cannot claim a hardware bug just because a write to an
undocumented register from AML causes things to break.

First, that may be a bug in the AML (which is not unheard of).
Second, and that is more likely, the expectations of the AML code may
not be met at the time it is run.

Assuming the latter, the root cause is really that the kernel executes
the AML in a hardware configuration in which the expectations of that
AML are not met.

We are now trying to understand what those expectations may be and so
how to cause them to be met.

Your observation that the issue can be avoided if the GPU is not put
into D3hot by a PMCSR write is a step in that direction and it is a
good finding. The information from Mika based on the ASL analysis is
helpful too.

Let's not jump to premature conclusions too quickly, though.
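
For reference, the "PMCSR write" mentioned above amounts to changing the
two-bit PowerState field of the Power Management Control/Status register
in the device's PCI PM capability. The following is a minimal Python
sketch of that bit manipulation only, assuming the standard capability
layout (PMCSR at offset 4 of the capability, PowerState in bits 1:0).
The helper names are hypothetical and are not kernel APIs, and the
sketch does not touch real hardware or model what the platform's AML
does.

```python
# Layout of the PCI Power Management capability (standard, not kernel code).
PCI_PM_CTRL = 0x04               # PMCSR offset within the PM capability
PCI_PM_CTRL_STATE_MASK = 0x0003  # PowerState field, bits 1:0

# PowerState encodings.
D0, D1, D2, D3HOT = 0, 1, 2, 3


def power_state(pmcsr: int) -> int:
    """Return the PowerState field of a 16-bit PMCSR value."""
    return pmcsr & PCI_PM_CTRL_STATE_MASK


def pmcsr_with_state(pmcsr: int, state: int) -> int:
    """Return the PMCSR value with PowerState replaced by `state`.

    Hypothetical helper: only the two PowerState bits change, all other
    bits of the 16-bit register value are carried over unchanged.
    """
    if state not in (D0, D1, D2, D3HOT):
        raise ValueError("state must be D0, D1, D2 or D3HOT")
    return (pmcsr & ~PCI_PM_CTRL_STATE_MASK & 0xFFFF) | state


if __name__ == "__main__":
    current = 0x0008                      # example value, device in D0
    target = pmcsr_with_state(current, D3HOT)
    print(hex(target), power_state(target))
```

The value computed this way is what would be written back to the
register at capability offset `PCI_PM_CTRL` to request D3hot; skipping
that write is the case the observation above refers to.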