From: Lukas Wunner
Date: Sun, 11 Feb 2018 10:38:28 +0100
Subject: [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
To: Tejun Heo, Lai Jiangshan, Alex Deucher, Dave Airlie, Ben Skeggs
Cc: dri-devel@lists.freedesktop.org, Peter Wu, nouveau@lists.freedesktop.org,
    Lyude Paul, Hans de Goede, Pierre Moreau, linux-kernel@vger.kernel.org,
    Ismo Toijala,
    intel-gfx@lists.freedesktop.org, Liviu Dudau, Archit Taneja

Fix a deadlock on hybrid graphics laptops that's been present since 2013:

DRM drivers poll connectors in 10 sec intervals. The poll worker is stopped
on ->runtime_suspend with cancel_delayed_work_sync(). However, the poll
worker invokes the DRM drivers' ->detect callbacks, which call
pm_runtime_get_sync(). If the poll worker starts after runtime suspend has
begun, pm_runtime_get_sync() will wait for runtime suspend to finish with
the intention of runtime resuming the device afterwards. The result is a
circular wait between the poll worker and the autosuspend worker: each is
waiting for the other to finish.

I'm seeing this deadlock so often it's not funny anymore. I've also found
3 nouveau bugzillas about it and 1 radeon bugzilla. (See patches [3/5] and
[4/5].)

One theoretical solution would be to add a flag to the ->detect callback to
tell it that it's running in the poll worker's context. In that case it
doesn't need to call pm_runtime_get_sync(), because the poll worker is only
enabled while the device is runtime active and we know that
->runtime_suspend waits for it to finish. But this approach would require
touching every single DRM driver's ->detect hook. Moreover, the ->detect
hook is called from numerous other places, both in the DRM library and in
each driver, making it necessary to trace every possible call chain and
check whether it's coming from the poll worker. I've tried to do that for
nouveau (see the call sites listed in the commit message of patch [3/5])
and concluded that this approach is a nightmare to implement.

Another reason for the infeasibility of this approach is that ->detect is
called from within the DRM library without driver involvement, e.g. via
DRM's sysfs interface.
In those cases, pm_runtime_get_sync() would have to be called by the DRM
library, but the DRM library is supposed to stay generic, whereas runtime
PM is driver-specific.

So instead, I've come up with this much simpler solution, which gleans from
the current task's flags whether it's a workqueue worker, retrieves the
work_struct it is executing and checks whether that is the poll worker.
All that's needed is one small helper in the workqueue code and another
small helper in the DRM library. This solution lends itself nicely to
stable backporting.

The patches for radeon and amdgpu are compile-tested only; I only have a
MacBook Pro with an Nvidia GK107 to test.

To test the patches, add an "msleep(12*1000);" at the top of the driver's
->runtime_suspend hook. This ensures that the poll worker runs after
->runtime_suspend has begun. Wait 12 sec after the GPU has begun runtime
suspend, then check /sys/bus/pci/devices/0000:01:00.0/power/runtime_status.
Without this series, the status will be stuck at "suspending" and you'll
get hung task errors in dmesg after a few minutes.

i915, malidp and msm "solved" this issue by not stopping the poll worker on
runtime suspend. But this results in the GPU bouncing back and forth
between D0 and D3 continuously. That's a total no-go for GPUs which runtime
suspend to D3cold, since every suspend/resume cycle costs a significant
amount of time and energy. (i915 and malidp do not seem to acquire a
runtime PM ref in the ->detect callbacks, which looks questionable. msm,
however, does and would also deadlock if it disabled the poll worker on
runtime suspend. cc += Archit, Liviu, intel-gfx)

Please review.
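The idea of asking "which work item is the current task executing, and is
it the poll worker?" can be modeled in plain userspace C. This is a hedged
sketch, not the kernel code from patches [1/5] and [2/5]: struct work_item,
model_current_work(), model_run_work(), model_is_poll_worker() and
model_detect() are hypothetical stand-ins, a C11 thread-local pointer
replaces the kernel's task-flag and worker lookup, and a counter replaces
pm_runtime_get_sync().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct work_struct. */
struct work_item {
	void (*func)(struct work_item *work);
};

/* Work item the current thread is executing, NULL when not in a worker.
 * Models the per-worker "current work" bookkeeping of a workqueue. */
static _Thread_local struct work_item *current_work_item;

/* Model of a workqueue helper returning the current work item. */
static struct work_item *model_current_work(void)
{
	return current_work_item;
}

/* Model of a worker executing one item: record it as current, run it,
 * then clear the record again. */
static void model_run_work(struct work_item *work)
{
	assert(work != NULL && work->func != NULL);
	current_work_item = work;
	work->func(work);
	current_work_item = NULL;
}

static int detect_calls; /* times ->detect ran */
static int pm_gets;      /* times ->detect took a runtime PM reference */

static void output_poll_execute(struct work_item *work);

/* Model of the DRM helper: is the caller the output poll worker?
 * The poll work item is identified by its function pointer. */
static bool model_is_poll_worker(void)
{
	struct work_item *work = model_current_work();

	return work != NULL && work->func == output_poll_execute;
}

/* Model of a driver's ->detect callback. The poll worker only runs while
 * the device is runtime active and runtime suspend waits for it, so the
 * poll worker skips the runtime PM get that would close the cycle. */
static void model_detect(void)
{
	detect_calls++;
	if (!model_is_poll_worker())
		pm_gets++; /* stands in for pm_runtime_get_sync() */
}

/* The poll work function: probe connectors via ->detect. */
static void output_poll_execute(struct work_item *work)
{
	(void)work;
	model_detect();
}

/* Some other (hypothetical) work item that also ends up in ->detect. */
static void other_work_fn(struct work_item *work)
{
	(void)work;
	model_detect();
}
```

Running the poll work item through model_run_work() leaves pm_gets
unchanged, while the same ->detect path reached from any other work item,
or from a non-worker context such as a sysfs read, still takes a
reference. Matching on the work function pointer, rather than threading a
flag through every ->detect caller, keeps the change confined to one small
helper per layer.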
Thanks,

Lukas

Lukas Wunner (5):
  workqueue: Allow retrieval of current task's work struct
  drm: Allow determining if current task is output poll worker
  drm/nouveau: Fix deadlock on runtime suspend
  drm/radeon: Fix deadlock on runtime suspend
  drm/amdgpu: Fix deadlock on runtime suspend

 drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 58 +++++++++++++-------
 drivers/gpu/drm/drm_probe_helper.c             | 14 +++++
 drivers/gpu/drm/nouveau/nouveau_connector.c    | 18 +++++--
 drivers/gpu/drm/radeon/radeon_connectors.c     | 74 +++++++++++++++---------
 include/drm/drm_crtc_helper.h                  |  1 +
 include/linux/workqueue.h                      |  1 +
 kernel/workqueue.c                             | 16 ++++++
 7 files changed, 132 insertions(+), 50 deletions(-)

-- 
2.15.1