Date: Wed, 16 Oct 2019 17:03:33 -0500
From: Bjorn Helgaas
To: Karol Herbst
Cc: "Rafael J. Wysocki", LKML, Lyude Paul, Mika Westerberg, Linux PCI, Linux PM, dri-devel, nouveau, Linux ACPI Mailing List
Subject: Re: [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
Message-ID: <20191016220333.GA88523@google.com>

On Wed, Oct 16, 2019 at 11:48:22PM +0200, Karol Herbst wrote:
> On Wed, Oct 16, 2019 at 11:37 PM Bjorn Helgaas wrote:
> > On Wed, Oct 16, 2019 at 09:18:32PM +0200, Karol Herbst wrote:
> > > but setting the PCI_DEV_FLAGS_NO_D3 flag does prevent using the
> > > platform means of putting the device into D3cold, right? That's
> > > actually what should still happen, just the D3hot step should be
> > > skipped.
> >
> > If I understand correctly, when we put a device in D3cold on an ACPI
> > system, we do something like this:
> >
> >   pci_set_power_state(D3cold)
> >     if (PCI_DEV_FLAGS_NO_D3)
> >       return 0                                   <-- nothing at all if quirked
> >     pci_raw_set_power_state
> >       pci_write_config_word(PCI_PM_CTRL, D3hot)  <-- set to D3hot
> >     __pci_complete_power_transition(D3cold)
> >       pci_platform_power_transition(D3cold)
> >         platform_pci_set_power_state(D3cold)
> >           acpi_pci_set_power_state(D3cold)
> >             acpi_device_set_power(ACPI_STATE_D3_COLD)
> >               ...
> >                 acpi_evaluate_object("_OFF")     <-- set to D3cold
> >
> > I did not understand the connection with platform (ACPI) power
> > management from your patch. It sounds like you want this entire path
> > except that you want to skip the PCI_PM_CTRL write?
> >
>
> exactly. I am running with this workaround for a while now and never
> had any fails with it anymore. The GPU gets turned off correctly and I
> see the same power savings, just that the GPU can be powered on again.
>
> > That seems like something Rafael should weigh in on. I don't know
> > why we set the device to D3hot with PCI_PM_CTRL before using the ACPI
> > methods, and I don't know what the effect of skipping that is. It
> > seems a little messy to slice out this tiny piece from the middle, but
> > maybe it makes sense.
> >
>
> afaik when I was talking with others in the past about it, Windows is
> doing that before using ACPI calls, but maybe they have some similar
> workarounds for certain Intel bridges as well? I am sure it affects
> more than the one I am blacklisting here, but I rather want to check
> each device before blacklisting all Kaby Lake and Skylake bridges (as
> those are the ones where this issue can be observed).

From a quick look at the ACPI spec, I didn't see conditions like "OSPM
must put PCI devices in D3hot before executing _OFF". But obviously
there's *some* reason and I probably just missed it.

> Sadly we had no luck getting any information about such workaround out
> of Nvidia or Intel.

I'm not surprised; it doesn't seem like we really have the details
needed to get to a root cause yet. I think what we really need is a
PCIe analyzer trace to see what happens when the device "falls off the
bus".

Bjorn
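
For reference, the three behaviours discussed in this thread can be
modelled in a few lines of standalone C. This is only a sketch of the
control flow quoted above, not kernel code: the enum, the struct and
model_set_d3cold() are hypothetical names made up for illustration, and
QUIRK_SKIP_PM_CTRL stands in for the proposed workaround as I understand
it (keep the ACPI path, skip only the PCI_PM_CTRL write).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the D3cold path; none of these names exist in
 * the kernel. */
enum d3cold_quirk {
	QUIRK_NONE,          /* normal path: D3hot write, then ACPI _OFF */
	QUIRK_NO_D3,         /* models PCI_DEV_FLAGS_NO_D3: return early */
	QUIRK_SKIP_PM_CTRL   /* assumed workaround: skip only the D3hot write */
};

/* Records which of the two hardware-visible steps would run. */
struct d3cold_trace {
	bool wrote_pm_ctrl_d3hot;  /* pci_write_config_word(PCI_PM_CTRL, D3hot) */
	bool ran_acpi_off;         /* acpi_evaluate_object("_OFF") */
};

struct d3cold_trace model_set_d3cold(enum d3cold_quirk quirk)
{
	struct d3cold_trace t = { false, false };

	assert(quirk == QUIRK_NONE || quirk == QUIRK_NO_D3 ||
	       quirk == QUIRK_SKIP_PM_CTRL);

	/* With the existing flag, nothing at all happens. */
	if (quirk == QUIRK_NO_D3)
		return t;

	/* pci_raw_set_power_state() step: program D3hot via PCI_PM_CTRL. */
	if (quirk != QUIRK_SKIP_PM_CTRL)
		t.wrote_pm_ctrl_d3hot = true;

	/* Platform (ACPI) step: _OFF moves the device to D3cold. */
	t.ran_acpi_off = true;
	return t;
}
```

Calling model_set_d3cold() with each of the three enum values shows the
difference: only QUIRK_SKIP_PM_CTRL reaches _OFF without first writing
D3hot, which is the case whose side effects are still unknown.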