From: "Rafael J. Wysocki"
Date: Wed, 18 Jul 2018 09:38:41 +0200
Subject: Re: [Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
To: Lukas Wunner
Cc: Lyude Paul, nouveau@lists.freedesktop.org, David Airlie, Linux Kernel Mailing List, dri-devel, Ben Skeggs, Linux PM

On Tue, Jul 17, 2018 at 8:20 PM, Lukas Wunner wrote:
> On Tue, Jul 17, 2018 at 12:53:11PM -0400, Lyude Paul wrote:
>> On Tue, 2018-07-17 at 09:16 +0200, Lukas Wunner wrote:
>> > On Mon, Jul 16, 2018 at 07:59:25PM -0400, Lyude Paul wrote:
>> > > In order to fix all of the spots that need to have runtime PM get/puts()
>> > > added, we need to ensure that it's possible for us to call
>> > > pm_runtime_get/put() in any context, regardless of how deep, since
>> > > almost all of the spots that are currently missing refs can potentially
>> > > get called in the runtime suspend/resume path.
>> > > Otherwise, we'll try to
>> > > resume the GPU as we're trying to resume the GPU (and vice-versa) and
>> > > cause the kernel to deadlock.
>> > >
>> > > With this, it should be safe to call the pm runtime functions in any
>> > > context in nouveau with one condition: any point in the driver that
>> > > calls pm_runtime_get*() cannot hold any locks owned by nouveau that
>> > > would be acquired anywhere inside nouveau_pmops_runtime_resume().
>> > > This includes modesetting locks, i2c bus locks, etc.
>> >
>> > [snip]
>> > > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
>> > > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
>> > > @@ -835,6 +835,8 @@ nouveau_pmops_runtime_suspend(struct device *dev)
>> > >  		return -EBUSY;
>> > >  	}
>> > >
>> > > +	dev->power.disable_depth++;
>> > > +
>> >
>> > Anyway, if I understand the commit message correctly, you're hitting a
>> > pm_runtime_get_sync() in a code path that itself is called during a
>> > pm_runtime_get_sync(). Could you include stack traces in the commit
>> > message? My gut feeling is that this patch masks a deeper issue,
>> > e.g. if the runtime_resume code path does in fact directly poll outputs,
>> > that would seem wrong. Runtime resume should merely make the card
>> > accessible, i.e. reinstate power if necessary, put into PCI_D0,
>> > restore registers, etc. Output polling should be scheduled
>> > asynchronously.
>>
>> So: the reason that patch was added was mainly for the patches later in the
>> series that add guards around the i2c bus and aux bus, since both of those
>> require that the device be awake for it to work. Currently, the spot where it
>> would recurse is:
>
> Okay, the PCI device is suspending and the nvkm_i2c_aux_acquire()
> wants it in resumed state, so is waiting forever for the device to
> runtime suspend in order to resume it again immediately afterwards.
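A note on why the disable_depth bump in the quoted patch avoids the hang: as far as I can tell from drivers/base/power/runtime.c, rpm_resume() checks dev->power.disable_depth > 0 and returns -EACCES before it would wait for an in-flight RPM_SUSPENDING transition, while pm_runtime_get_sync() still takes the usage-count reference. That is also why the nvkm_i2c_aux_acquire() snippet suggested in this thread treats -EACCES as non-fatal. The sketch below is a userspace model of that control flow only, not kernel code; struct model_dev, model_get_sync(), model_aux_acquire() and model_runtime_suspend() are hypothetical names, and -EDEADLK stands in for the wait that would never finish.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of one device's runtime PM state. */
enum model_status { MODEL_ACTIVE, MODEL_SUSPENDING, MODEL_SUSPENDED };

struct model_dev {
	enum model_status status;
	int disable_depth;  /* plays the role of dev->power.disable_depth */
	int usage_count;
	bool would_block;   /* set where the real core would sleep forever */
};

/*
 * Simplified resume path: the disable_depth check comes before any wait
 * on an in-flight suspend, so a nested call fails fast with -EACCES.
 */
static int model_get_sync(struct model_dev *dev)
{
	dev->usage_count++;  /* the reference is taken even on failure */
	if (dev->disable_depth > 0)
		return -EACCES;
	if (dev->status == MODEL_SUSPENDING) {
		/* The real core would wait here for the suspend to finish. */
		dev->would_block = true;
		return -EDEADLK;
	}
	dev->status = MODEL_ACTIVE;
	return 0;
}

static void model_put(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Stand-in for an i2c/aux helper reached from the suspend callback. */
static int model_aux_acquire(struct model_dev *dev)
{
	int ret = model_get_sync(dev);

	if (ret < 0 && ret != -EACCES) {
		model_put(dev);  /* drop the reference taken above */
		return ret;
	}
	/* ... the hardware transfer would happen here ... */
	model_put(dev);
	return 0;
}

/* Runtime suspend callback, optionally with the patch's depth bump. */
static int model_runtime_suspend(struct model_dev *dev, bool bump_depth)
{
	int ret;

	dev->status = MODEL_SUSPENDING;
	if (bump_depth)
		dev->disable_depth++;
	ret = model_aux_acquire(dev);
	if (bump_depth)
		dev->disable_depth--;
	dev->status = ret ? MODEL_ACTIVE : MODEL_SUSPENDED;
	return ret;
}
```

Without the bump, model_runtime_suspend(&dev, false) ends in the simulated deadlock; with it, the nested call returns -EACCES and the suspend completes. The model only shows control flow: the helper proceeds while the device is mid-suspend, and nothing in it guarantees the hardware stays powered for the transfer, which echoes the concern in this thread that the device is not guaranteed to be kept awake until the i2c communication has finished.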
>
> The deadlock in the stack trace you've posted could be resolved using
> the technique I used in d61a5c106351 by adding the following to
> include/linux/pm_runtime.h:
>
> static inline bool pm_runtime_status_suspending(struct device *dev)
> {
> 	return dev->power.runtime_status == RPM_SUSPENDING;
> }
>
> static inline bool is_pm_work(struct device *dev)
> {
> 	struct work_struct *work = current_work();
>
> 	return work && work == &dev->power.work;
> }
>
> Then adding this to nvkm_i2c_aux_acquire():
>
> 	struct device *dev = pad->i2c->subdev.device->dev;
>
> 	if (!(is_pm_work(dev) && pm_runtime_status_suspending(dev))) {
> 		ret = pm_runtime_get_sync(dev);
> 		if (ret < 0 && ret != -EACCES)
> 			return ret;
> 	}
>
> But here's the catch: This only works for an *async* runtime suspend.
> It doesn't work for pm_runtime_put_sync(), pm_runtime_suspend() etc,
> because then the runtime suspend is executed in the context of the caller,
> not in the context of dev->power.work.
>
> So it's not a full solution, but hopefully something that gets you
> going. I'm not really familiar with the code paths leading to
> nvkm_i2c_aux_acquire() to come up with a full solution off the top
> of my head I'm afraid.
>
> Note, it's not sufficient to just check pm_runtime_status_suspending(dev)
> because if the runtime_suspend is carried out concurrently by something
> else, this will return true but it's not guaranteed that the device is
> actually kept awake until the i2c communication has been fully performed.

For the record, I don't quite like this approach as it seems to be
working around a broken dependency graph.

If you need to resume device A from within the runtime resume callback
of device B, then clearly B depends on A and there should be a link
between them.

That said, I do realize that it may be the path of least resistance,
but then I wonder if we can do better than this.

Thanks,
Rafael
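The dependency-graph point above can be illustrated with a toy model. In the kernel, the mechanism I believe is meant is a device link created with device_link_add() and the DL_FLAG_PM_RUNTIME flag, which, as I understand it, makes the runtime PM core resume the supplier before the consumer's runtime resume callback runs and keeps the supplier from suspending while the consumer is active. The C below does not use that API: struct node, node_link(), node_resume() and node_suspend() are hypothetical names, and the model assumes the graph is acyclic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUPPLIERS 4

/* Hypothetical node in a device dependency graph (userspace model). */
struct node {
	const char *name;
	bool active;
	size_t n_suppliers;
	struct node *suppliers[MAX_SUPPLIERS];
	int active_consumers;  /* active consumers holding this node awake */
	int resume_order;      /* when this node's resume callback ran */
};

static int resume_counter;

/* Record a "consumer depends on supplier" link. */
static void node_link(struct node *consumer, struct node *supplier)
{
	assert(consumer->n_suppliers < MAX_SUPPLIERS);
	consumer->suppliers[consumer->n_suppliers++] = supplier;
}

/*
 * Resume every supplier first, then the node itself, so the node's own
 * resume callback never has to wake anything up (assumes no cycles).
 */
static void node_resume(struct node *n)
{
	if (n->active)
		return;
	for (size_t i = 0; i < n->n_suppliers; i++) {
		node_resume(n->suppliers[i]);
		n->suppliers[i]->active_consumers++;
	}
	/* The "callback": all suppliers are already awake at this point. */
	for (size_t i = 0; i < n->n_suppliers; i++)
		assert(n->suppliers[i]->active);
	n->active = true;
	n->resume_order = ++resume_counter;
}

/* Refuse to suspend a node while any consumer still needs it. */
static bool node_suspend(struct node *n)
{
	if (!n->active || n->active_consumers > 0)
		return false;
	n->active = false;
	for (size_t i = 0; i < n->n_suppliers; i++)
		n->suppliers[i]->active_consumers--;
	return true;
}
```

With B linked as a consumer of A, node_resume(&B) runs A's resume before B's, and A cannot be suspended until B is. B's resume callback therefore never calls back into A's resume path, which is the recursion the patch is trying to make safe.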