Received: by 10.223.185.116 with SMTP id b49csp3640337wrg; Mon, 19 Feb 2018 03:36:30 -0800 (PST) X-Google-Smtp-Source: AH8x226La2RkDT8SX3sgwOJF6rvwi9ltdyNITd28I5VRF8yt1ShA7kra6ktHaLHiUcHRlt9c/00i X-Received: by 10.99.37.7 with SMTP id l7mr12164706pgl.311.1519040190052; Mon, 19 Feb 2018 03:36:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1519040189; cv=none; d=google.com; s=arc-20160816; b=DsJ6b+0ipiFpP39PkxDEXzZgQG1ewZ+nP8/OM3E45VsCo7McLVWRUbsfO3lCLeLrBk MhZn2xhLCKvIjiZ8/jMK/T1ihFsllG1aGud1kOxRNLIP+d/kLfAA6IQzBpxk7jaQ0VQW ZzINGgFBsM8u8Th2Fz0WhpVDPhO+yUQ1BrV6FTnNjgXnSXQitVWrZ9/cxqMzFi2YNNIr QbOuaHLdPb5SZvivPYPvpdkxpH6fYc8PtGCA75iEkhzT9pX/f4929/I77/XNqpIGnlWc WseI3h7oRpJlCE4O4WNjrb/WXiHSXy49M6sYVlJFBCt/SQKL55TfJMOJHwvn2OlySKhI z2Gg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:mail-followup-to :message-id:subject:cc:to:from:date:dkim-signature :arc-authentication-results; bh=kAQXDXU5gSQXguA9JNT+iHCSTWqgrIReYQiWmjhH6NY=; b=gzOkJtDfqERyVGJzpqsuq4Sc9DaSkDOSsxAZ4hFgcY+06OEC2TY3JpIU3UFwlkfUhi +z658/yKs3/TwgneVGg2dtzi5ZekqynT5MczT/0wszB5PMXANoWqWEV1ZThWYo6mQpmm DyDSgJ9Nblftpd3zxIcBU1/vElVbTZS/iSMf4pkb3/9WI8+w88C8trCmTe+9nZbSm+YV i/Xov5+ZUXniranp31lNr+1W61I0oi7W0wo07iqYhyZtU3Is2oOHY1iGksFd6n7hZiTS EUirKg0Nvod5fhb7M02xljTdsj+j0EYiiB7M8KlhF6hj7AnrMhBkUv5ME8yjmcJ300F4 m9KA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@ffwll.ch header.s=google header.b=Z9z3pU5x; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p8si7849606pgn.769.2018.02.19.03.36.15; Mon, 19 Feb 2018 03:36:29 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@ffwll.ch header.s=google header.b=Z9z3pU5x; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752651AbeBSLfB (ORCPT + 99 others); Mon, 19 Feb 2018 06:35:01 -0500 Received: from mail-wm0-f67.google.com ([74.125.82.67]:40000 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752528AbeBSLes (ORCPT ); Mon, 19 Feb 2018 06:34:48 -0500 Received: by mail-wm0-f67.google.com with SMTP id v10so14412447wmh.5 for ; Mon, 19 Feb 2018 03:34:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=sender:date:from:to:cc:subject:message-id:mail-followup-to :references:mime-version:content-disposition:in-reply-to:user-agent; bh=kAQXDXU5gSQXguA9JNT+iHCSTWqgrIReYQiWmjhH6NY=; b=Z9z3pU5xdTX/yO8lL4bcnKnocFpeoaeX6lbylSwpJ1bO66Dj0SrIrAe6/MyrVVWofh HwEAkKsVfSKStlzB1F6n47goHBpMqGCZbNnBqdzuBqSltSnVc+fsTcK25nv8PtoFBROo rBJu1/RWPNANB+qEl54GutsA21OhlUNK+YF/A= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :mail-followup-to:references:mime-version:content-disposition :in-reply-to:user-agent; bh=kAQXDXU5gSQXguA9JNT+iHCSTWqgrIReYQiWmjhH6NY=; b=afuCyPuuMSvWWZx6OHtvOcGJTfMFoTIDHa1/zLsTufXOgZSMoYdKeLvFWBD787QDxb AzMsx/YCSwvrnK94IHR9Trcj6ZOmoXESaoFJn4L6pWKPS2mKyC2jqvVn/BHiCUYJRwO0 crs29w6wGHQgicpnzoBPt9z3endKRF7z8ozM/8ONwp0xXsqJG6ygkd4Q0nIdsn6P9Omt 5ErirhGwQGnTH2tEiNzu7Jt0Trli1XGQR5UywafyMjvA7RGNOPLLWyBrdhFtiJOqwf6X L+91ueMTTaqC4Q2vGCbjyNn0sojqVkxS5UlP52JQtzaOR7NWuTrFi/idKWB/y1FmOTZN czAA== X-Gm-Message-State: APf1xPD2s3mSBX+cqlgOrWm3HT2uaSfyWnlQccKEg1VYb3pK4WSjaHYP DZsJ4c6bErpm2OFgKOpKs6gU4w== X-Received: by 10.80.164.13 with SMTP id u13mr19661731edb.282.1519040086626; Mon, 19 Feb 2018 03:34:46 -0800 (PST) Received: from phenom.ffwll.local (212-51-149-109.fiber7.init7.net. [212.51.149.109]) by smtp.gmail.com with ESMTPSA id d30sm3970918ede.51.2018.02.19.03.34.45 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 19 Feb 2018 03:34:45 -0800 (PST) Date: Mon, 19 Feb 2018 12:34:43 +0100 From: Daniel Vetter To: Lukas Wunner Cc: Tejun Heo , Lai Jiangshan , Alex Deucher , Dave Airlie , Ben Skeggs , Archit Taneja , Ismo Toijala , nouveau@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Liviu Dudau , linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Hans de Goede Subject: Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers Message-ID: <20180219113443.GU22199@phenom.ffwll.local> Mail-Followup-To: Lukas Wunner , Tejun Heo , Lai Jiangshan , Alex Deucher , Dave Airlie , Ben Skeggs , Archit Taneja , Ismo Toijala , nouveau@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Liviu Dudau , linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Hans de Goede References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Operating-System: Linux phenom 4.14.0-3-amd64 User-Agent: Mutt/1.9.3 (2018-01-21) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote: > Fix a deadlock on hybrid graphics laptops that's been present since 2013: > > DRM drivers poll connectors in 10 sec intervals. The poll worker is > stopped on ->runtime_suspend with cancel_delayed_work_sync(). However > the poll worker invokes the DRM drivers' ->detect callbacks, which call > pm_runtime_get_sync(). If the poll worker starts after runtime suspend > has begun, pm_runtime_get_sync() will wait for runtime suspend to finish > with the intention of runtime resuming the device afterwards. The result > is a circular wait between poll worker and autosuspend worker. Don't shut down the poll worker when runtime suspending, that' doesn't work. If you need the poll work, then that means waking up the gpu every few seconds. If you don't need it, then make sure the DRM_CONNECTOR_POLL flags are set correctly (you can update them at runtime, the poll worker will pick that up). That should fix the deadlock, and it's how we do it in i915 (where igt in CI totally hammers the runtime pm support, and it seems to hold up). And I guess we need to improve the poll worker docs about this. > I'm seeing this deadlock so often it's not funny anymore. I've also found > 3 nouveau bugzillas about it and 1 radeon bugzilla. (See patch [3/5] and > [4/5].) > > One theoretical solution would be to add a flag to the ->detect callback > to tell it that it's running in the poll worker's context. In that case > it doesn't need to call pm_runtime_get_sync() because the poll worker is > only enabled while runtime active and we know that ->runtime_suspend > waits for it to finish. But this approach would require touching every > single DRM driver's ->detect hook. Moreover the ->detect hook is called > from numerous other places, both in the DRM library as well as in each > driver, making it necessary to trace every possible call chain and check > if it's coming from the poll worker or not. I've tried to do that for > nouveau (see the call sites listed in the commit message of patch [3/5]) > and concluded that this approach is a nightmare to implement. > > Another reason for the infeasibility of this approach is that ->detect > is called from within the DRM library without driver involvement, e.g. > via DRM's sysfs interface. In those cases, pm_runtime_get_sync() would > have to be called by the DRM library, but the DRM library is supposed to > stay generic, wheras runtime PM is driver-specific. > > So instead, I've come up with this much simpler solution which gleans > from the current task's flags if it's a workqueue worker, retrieves the > work_struct and checks if it's the poll worker. All that's needed is > one small helper in the workqueue code and another small helper in the > DRM library. This solution lends itself nicely to stable-backporting. > > The patches for radeon and amdgpu are compile-tested only, I only have a > MacBook Pro with an Nvidia GK107 to test. To test the patches, add an > "msleep(12*1000);" at the top of the driver's ->runtime_suspend hook. > This ensures that the poll worker runs after ->runtime_suspend has begun. > Wait 12 sec after the GPU has begun runtime suspend, then check > /sys/bus/pci/devices/0000:01:00.0/power/runtime_status. Without this > series, the status will be stuck at "suspending" and you'll get hung task > errors in dmesg after a few minutes. > > i915, malidp and msm "solved" this issue by not stopping the poll worker > on runtime suspend. But this results in the GPU bouncing back and forth > between D0 and D3 continuously. That's a total no-go for GPUs which > runtime suspend to D3cold since every suspend/resume cycle costs a > significant amount of time and energy. (i915 and malidp do not seem > to acquire a runtime PM ref in the ->detect callbacks, which seems > questionable. msm however does and would also deadlock if it disabled > the poll worker on runtime suspend. cc += Archit, Liviu, intel-gfx) Well, userspace expects hotplug events, even when we runtime suspend stuff. Hence waking shit up with polling. Imo ignoring hotplug events is a pretty serious policy decision which is ok in the context of vga_switcheroo, but not really as an automatic thing. E.g. usb also wakes up if you plug something in, even with all the runtime pm stuff enabled. Same for sata and everything else. -Daniel > Please review. Thanks, > > Lukas > > Lukas Wunner (5): > workqueue: Allow retrieval of current task's work struct > drm: Allow determining if current task is output poll worker > drm/nouveau: Fix deadlock on runtime suspend > drm/radeon: Fix deadlock on runtime suspend > drm/amdgpu: Fix deadlock on runtime suspend > > drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 58 +++++++++++++------- > drivers/gpu/drm/drm_probe_helper.c | 14 +++++ > drivers/gpu/drm/nouveau/nouveau_connector.c | 18 +++++-- > drivers/gpu/drm/radeon/radeon_connectors.c | 74 +++++++++++++++++--------- > include/drm/drm_crtc_helper.h | 1 + > include/linux/workqueue.h | 1 + > kernel/workqueue.c | 16 ++++++ > 7 files changed, 132 insertions(+), 50 deletions(-) > > -- > 2.15.1 > > _______________________________________________ > Nouveau mailing list > Nouveau@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch