Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp3201994imm; Tue, 17 Jul 2018 00:17:57 -0700 (PDT) X-Google-Smtp-Source: AAOMgpczTgK7TzRGbhtttIZK/8fJwsVzu/YSvkkat7T3U/CjpncoWC2aX5hJbBeulTD61rxCt8fy X-Received: by 2002:a63:3444:: with SMTP id b65-v6mr464105pga.396.1531811877713; Tue, 17 Jul 2018 00:17:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1531811877; cv=none; d=google.com; s=arc-20160816; b=L1Ia3vIvcA4yEXIpwykVHpUSzs9utLAN5lcpdAJ2gHQY7pl6EK3LTWMMoVkAoPqFwK Gprs6OoMPJDCZ/5Ik8XzE9eWyci7gFBt8cjvdzAMAsjlMttsMrBE3yKAkOh7t+8SBKhV uepsl1aDsHrDz1Uadf8DMC9Q9310ddb3nBa9W4MU6T7ljl85Ncheu6jl8u4W65cbI67N qMR1kKOw2gC3vmiiG4hFcRpTLg/LVmKH+K+/1+L5KP6pLoRPrY5+82t6GLc4RqsvFvEa JL+KireoadAoCK1bRmCPYwEoVcgHkLsNmMqFgIEW6nmtK0qoD6fVTNdxfzxZtzFyyDAZ kLJQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=VF3kGwOYZWYZ/bRCDn1GFztPqiOaOUMxNwN7yfpoTZ0=; b=RVhpkQ5qskPVs3prlfPAD6ayM9vk1wS4FeDnXVS8a3Zg1rhWCGZtTkBCl/vtoB1Qid oX9jkiGmOX7ciw2l19gy3XbHIiAYLfYuQke95lAceahRDdHcJZRcJQfy6DC/z/lI5oyj cL3COvZaR3LnuHvD1DcjlMgJtIbRh3R22pRDrV0dQby7RjtqkpAwCk32c/fyr6semdnd 3Qc81+KH17j3w9hmh3lWafPEUOEZeBuM0n0gb7COyKRnSjbHyNHcQ2NQuZ2UG/3fqatA Uy732NPWUZIUwzW6doKFis+V49DRwe7N2ft7kFC5++zHm1XDNClqVd5VETL8pI6k1bmv TAfg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f17-v6si233377pgv.383.2018.07.17.00.17.42; Tue, 17 Jul 2018 00:17:57 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728983AbeGQHrx (ORCPT + 99 others); Tue, 17 Jul 2018 03:47:53 -0400 Received: from bmailout2.hostsharing.net ([83.223.90.240]:48591 "EHLO bmailout2.hostsharing.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728102AbeGQHrx (ORCPT ); Tue, 17 Jul 2018 03:47:53 -0400 Received: from h08.hostsharing.net (h08.hostsharing.net [83.223.95.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.hostsharing.net", Issuer "COMODO RSA Domain Validation Secure Server CA" (not verified)) by bmailout2.hostsharing.net (Postfix) with ESMTPS id 8906E280248C7; Tue, 17 Jul 2018 09:16:41 +0200 (CEST) Received: by h08.hostsharing.net (Postfix, from userid 100393) id 27F57E791; Tue, 17 Jul 2018 09:16:41 +0200 (CEST) Date: Tue, 17 Jul 2018 09:16:41 +0200 From: Lukas Wunner To: Lyude Paul Cc: nouveau@lists.freedesktop.org, David Airlie , linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Ben Skeggs , linux-pm@vger.kernel.org Subject: Re: [Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths Message-ID: <20180717071641.GA5411@wunner.de> References: <20180716235936.11268-1-lyude@redhat.com> <20180716235936.11268-2-lyude@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180716235936.11268-2-lyude@redhat.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [cc += linux-pm] Hi Lyude, First of all, thanks a lot for looking into this. On Mon, Jul 16, 2018 at 07:59:25PM -0400, Lyude Paul wrote: > In order to fix all of the spots that need to have runtime PM get/puts() > added, we need to ensure that it's possible for us to call > pm_runtime_get/put() in any context, regardless of how deep, since > almost all of the spots that are currently missing refs can potentially > get called in the runtime suspend/resume path. Otherwise, we'll try to > resume the GPU as we're trying to resume the GPU (and vice-versa) and > cause the kernel to deadlock. > > With this, it should be safe to call the pm runtime functions in any > context in nouveau with one condition: any point in the driver that > calls pm_runtime_get*() cannot hold any locks owned by nouveau that > would be acquired anywhere inside nouveau_pmops_runtime_resume(). > This includes modesetting locks, i2c bus locks, etc. [snip] > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c > @@ -835,6 +835,8 @@ nouveau_pmops_runtime_suspend(struct device *dev) > return -EBUSY; > } > > + dev->power.disable_depth++; > + I'm not sure if that variable is actually private to the PM core. Grepping through the tree I only find a single occurrence where it's accessed outside the PM core and that's in amdgpu. So this looks a little fishy TBH. It may make sense to cc such patches to linux-pm to get Rafael & other folks involved with the PM core to comment. Also, the disable_depth variable only exists if the kernel was compiled with CONFIG_PM enabled, but I can't find a "depends on PM" or something like that in nouveau's Kconfig. Actually, if PM is not selected, all the nouveau_pmops_*() functions should be #ifdef'ed away, but oddly there's no #ifdef CONFIG_PM anywhere in nouveau_drm.c. Anywayn, if I understand the commit message correctly, you're hitting a pm_runtime_get_sync() in a code path that itself is called during a pm_runtime_get_sync(). Could you include stack traces in the commit message? My gut feeling is that this patch masks a deeper issue, e.g. if the runtime_resume code path does in fact directly poll outputs, that would seem wrong. Runtime resume should merely make the card accessible, i.e. reinstate power if necessary, put into PCI_D0, restore registers, etc. Output polling should be scheduled asynchronously. Thanks, Lukas