Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp228001imm; Wed, 18 Jul 2018 00:48:23 -0700 (PDT) X-Google-Smtp-Source: AAOMgpfcFfHtBxrAVPlen9t3QRpQ1upFjMP7K0aM+DQrdT9Z6kMb2pRHnVx7ErAYur6GC/Sxy85x X-Received: by 2002:a17:902:9690:: with SMTP id n16-v6mr4864904plp.94.1531900103517; Wed, 18 Jul 2018 00:48:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1531900103; cv=none; d=google.com; s=arc-20160816; b=Q7Noy0Dx4TJLvhMi6eJEogX/X8Wv6L0u7LWr+f/vFCFLZwRTOA80YgzwbkPQbqEcfI FBE3JgupnA0F4lMx2utNb/l7UqTT41ucBRodVraO3KkuT+qWiXMuD4fVdKK3n4iP7aLe SuBnjd8ddzBT+rwsSNTmzD1sIYbnoxIKkKP92eLY14kJvNstMx+j4y4X3nNV2423HLJL U4BYW2DGXRl0lX7kEVJxEm5EvWiYQ4YxKX6unSU6lxZu43vD2/3p7o39s1a0r3zroHat 7Cm5we7Y9x+dhBq9+93p177AmZsxbeP1C8/5qeSgy8Mu62VzytC+i8/PLshj341NEfAc vr6w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=XabC85BHubMs6eKGNCtgXaEDJvZE+H1ZJ++TgEnknxU=; b=swGcXSkzkQeWXpfXaaQaBjAuG2sU24o2MPugSVVJ1B4j3mG53lwOThK2pj9/dnFYjv K/+vYNqAXrPzP54VaAFL/4Vv30QT14qTKatd00YXMRRvmfbiEjtLWqmNSchn74/kyaYx hYj2vNfKbZPVIq1t7lVeOf1rVO9zN+lWf7ZD5QXE08YUd+yN8N2HqR1JAYs6ACnVcDW0 WCkMGPeBUF2ft9KAHkoFAC6xVBo0IT9BkRANXXUFgnFe+Hw3KZGeP8N/IZj1ZBdzQjS4 nlf+rGuA77ljHpDiDaeNDYXghBqniE13VcG/VcPvjJ0f2Zq8a2y5rUA70Noze4Sp9QE2 FGBQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p9-v6si2743428pgk.645.2018.07.18.00.48.08; Wed, 18 Jul 2018 00:48:23 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728278AbeGRIXt (ORCPT + 99 others); Wed, 18 Jul 2018 04:23:49 -0400 Received: from bmailout2.hostsharing.net ([83.223.90.240]:60805 "EHLO bmailout2.hostsharing.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726110AbeGRIXt (ORCPT ); Wed, 18 Jul 2018 04:23:49 -0400 Received: from h08.hostsharing.net (h08.hostsharing.net [83.223.95.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.hostsharing.net", Issuer "COMODO RSA Domain Validation Secure Server CA" (not verified)) by bmailout2.hostsharing.net (Postfix) with ESMTPS id 5E2A82800A282; Wed, 18 Jul 2018 09:47:11 +0200 (CEST) Received: by h08.hostsharing.net (Postfix, from userid 100393) id EC41B6F3EF; Wed, 18 Jul 2018 09:47:10 +0200 (CEST) Date: Wed, 18 Jul 2018 09:47:10 +0200 From: Lukas Wunner To: Lyude Paul Cc: nouveau@lists.freedesktop.org, David Airlie , linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Ben Skeggs , linux-pm@vger.kernel.org Subject: Re: [Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths Message-ID: <20180718074710.GA16072@wunner.de> References: <20180716235936.11268-1-lyude@redhat.com> <20180716235936.11268-2-lyude@redhat.com> <20180717071641.GA5411@wunner.de> <20180717182041.GA18363@wunner.de> <20180717183211.GB18363@wunner.de> <2dbe75b1a83c025b9cddc229dbca9af6fb30111e.camel@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2dbe75b1a83c025b9cddc229dbca9af6fb30111e.camel@redhat.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 17, 2018 at 02:34:47PM -0400, Lyude Paul wrote: > On Tue, 2018-07-17 at 20:32 +0200, Lukas Wunner wrote: > > On Tue, Jul 17, 2018 at 02:24:31PM -0400, Lyude Paul wrote: > > > On Tue, 2018-07-17 at 20:20 +0200, Lukas Wunner wrote: > > > > Okay, the PCI device is suspending and the nvkm_i2c_aux_acquire() > > > > wants it in resumed state, so is waiting forever for the device to > > > > runtime suspend in order to resume it again immediately afterwards. > > > > > > > > The deadlock in the stack trace you've posted could be resolved using > > > > the technique I used in d61a5c106351 by adding the following to > > > > include/linux/pm_runtime.h: > > > > > > > > static inline bool pm_runtime_status_suspending(struct device *dev) > > > > { > > > > return dev->power.runtime_status == RPM_SUSPENDING; > > > > } > > > > > > > > static inline bool is_pm_work(struct device *dev) > > > > { > > > > struct work_struct *work = current_work(); > > > > > > > > return work && work->func == dev->power.work; > > > > } > > > > > > > > Then adding this to nvkm_i2c_aux_acquire(): > > > > > > > > struct device *dev = pad->i2c->subdev.device->dev; > > > > > > > > if (!(is_pm_work(dev) && pm_runtime_status_suspending(dev))) { > > > > ret = pm_runtime_get_sync(dev); > > > > if (ret < 0 && ret != -EACCES) > > > > return ret; > > > > } > > > > > > > > But here's the catch: This only works for an *async* runtime suspend. > > > > It doesn't work for pm_runtime_put_sync(), pm_runtime_suspend() etc, > > > > because then the runtime suspend is executed in the context of the caller, > > > > not in the context of dev->power.work. [snip] > > Something I'm curious about. This isn't the first time I've hit a > situation like this (see: the improper disable_depth fix I added into > amdgpu I now need to go and fix), which makes me wonder: is there > actually any reason Linux's runtime PM core doesn't just turn get/puts() > in the context of s/r callbacks into no-ops by default? So the PM core could save a pointer to the "current" task_struct in struct device before invoking the ->runtime_suspend or ->runtime_resume callback, and all subsequent rpm_resume() and rpm_suspend() calls could then become no-ops if "current" is equivalent to the saved pointer. (This is also how you could solve the deadlock you're dealing with for sync suspend.) For a recursive resume during a resume or a recursive suspend during a suspend, this might actually be fine. For a recursive suspend during a resume or a recursive resume during a suspend, things become murkier: How should the PM core know if the particular part of the device is still accessible when hitting a recursive resume during a suspend? Let's say a clock is needed for i2c. Then the recursive resume during a suspend may only become a no-op before that clock has been turned off. That's something only the device driver itself has knowledge about, because it implements the order in which subdevices of the GPU are turned off. Lukas