Received: by 10.223.185.116 with SMTP id b49csp1396997wrg; Sun, 11 Feb 2018 10:59:26 -0800 (PST) X-Google-Smtp-Source: AH8x225AujMHgcDzBjza4I+xX/6fz9DDK+iy63vTDZH5sil01wW0P8+svjoeyvqYsyxfEMd6ecGU X-Received: by 2002:a17:902:2c43:: with SMTP id m61-v6mr7726618plb.387.1518375566594; Sun, 11 Feb 2018 10:59:26 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1518375566; cv=none; d=google.com; s=arc-20160816; b=JjOd5s7L4owLGa0SeuR/+kDUl128JmSrXVxF2Yi18kd0aVk/1kL0UZJbDzq4lxXNrd 5F5iWwwhHSxhcGIxJGvGa6XGODFGCHG8fA1FibBCSRqLDIZKwUZImfQ0oZJZr3LrFMwN nJcuZJX1aoH4GGupoPU0RQSEe/TdirecjQrOpF4a3nNYpKyH8MVQmVp+DDZjBzEKj67r xr2L8MLfaEnI0X6tnjK6tpmVxS0zTF7fP94fXQ2qIUs8dgjlz8lI5gPG8BaF70l+gD4o jqeRIRgAdeQJrvPUbr6tfqEbZjLeaLlWV55QEM3g6vMgw6wBbuKe4DJuW/jX8x/TySyW WFeQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=5ysSi12Lko+aSM7Gj83u3uqGooOVTkBtWFUa2ZnsTkA=; b=cMDbk79okwdVBUzVP90pRvD9GA8XbM69oy5e6/Xs3gdytuoaJ0aHuTEyg/gsanQY67 hDNx3bfPu4IKUBRZPNblJElvQdaz7M4DbfzCOMr6nGu2PsVsSZthHDBDBHpIGGOjb1d1 WtVuHjhYsrbV+s2RFp/CZa9Ck9VcaRkCGNSlqXik1AT6VIHMce1F68DvyuW7qTjyXPpq cRRs2ustYA3vcJTHjZKbbZUxt+/ti2T7nx/TsbI/y5pIjByqPcvnqtDqtIUc6kGAKPxb bZvKJKmh8LLF4ps63n8ed4biefPOBn/tjAO8b0cyGSPNbGIrz/2L4T03/Dsp6TmMzv6V p3LQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@fireburn-co-uk.20150623.gappssmtp.com header.s=20150623 header.b=DIS4Wl2X; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id u6-v6si2121871pld.31.2018.02.11.10.59.11; Sun, 11 Feb 2018 10:59:26 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@fireburn-co-uk.20150623.gappssmtp.com header.s=20150623 header.b=DIS4Wl2X; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753650AbeBKS6f (ORCPT + 99 others); Sun, 11 Feb 2018 13:58:35 -0500 Received: from mail-qk0-f195.google.com ([209.85.220.195]:32804 "EHLO mail-qk0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753600AbeBKS6d (ORCPT ); Sun, 11 Feb 2018 13:58:33 -0500 Received: by mail-qk0-f195.google.com with SMTP id c82so16064460qka.0 for ; Sun, 11 Feb 2018 10:58:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fireburn-co-uk.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=5ysSi12Lko+aSM7Gj83u3uqGooOVTkBtWFUa2ZnsTkA=; b=DIS4Wl2Xk7Q2TKLw9dVAaw1LwAnOWLDqrJL6ZGuGOIMsgFJkK6F9fX2wTAE/Bx8Yz7 u2el5mXzitPGGR4vU+Jv+L9cbWSL52ZWKuhrPmS2N+9yryk8JRl2gUfP6phnv5VHUAv8 tCOEQOissgUVNTSKRPgn6nAZNgq90rugxoUMOn7zG5AQ0GNUJbhTPOBCbh/swSazGSU1 gyAxMv68dsLPa0yPy3lAK798dWnM+35Wxw7u8XUxsYmMWVXij5Zx0ZqbVvvfRMMTAk6q jhwIfpPHI7FpRMK+oS7U3CL97gu1JaLxXKn9AT/RVVKanBi+A277GcJ/r2p6D5eX5iNz LYfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=5ysSi12Lko+aSM7Gj83u3uqGooOVTkBtWFUa2ZnsTkA=; b=jA7Nvji0VAMn187b7AmlUGLPgL9lTlkHV8uwtIEmg7nf8N1eYY7HL5qgfwm62BNSQF pBkmYRitIN/bHoaSIxjckveifD54pAO7A8ZrFOJpt2nxIeCJh5/4xDe9OOOQx17adn+d Q9W7aUueb5llGun2gyNfbvgVoAUjjqwdS4cgbvMYD6W+f/xOVUwlzGI7aDEQ6WO6HkzH 1D0A7pc1uN5qtLlixfcxVrCE87vaiVS2nbrlYovtkMaAuK3R+lHs77vSD9wjZF0wFZu+ G+2geNTK18E9n8eOy2COLL008Z7qmtgKMZCf3XJ1WcQp7sP/W55RtLXRb/a6S5Om0/Le denw== X-Gm-Message-State: APf1xPAk41YX8My9uVlPh6qKyDTADw/JmF2OdVfBX66M4zuzdYfeP3oi vilQoijzV5hizbNP/f3gGrW45xDnANXoTeMhV5RiJw== X-Received: by 10.55.221.15 with SMTP id n15mr6058678qki.184.1518375512347; Sun, 11 Feb 2018 10:58:32 -0800 (PST) MIME-Version: 1.0 Received: by 10.12.214.81 with HTTP; Sun, 11 Feb 2018 10:58:11 -0800 (PST) In-Reply-To: References: From: Mike Lothian Date: Sun, 11 Feb 2018 18:58:11 +0000 Message-ID: Subject: Re: [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers To: Lukas Wunner Cc: Tejun Heo , Lai Jiangshan , Alex Deucher , Dave Airlie , Ben Skeggs , Ismo Toijala , nouveau@lists.freedesktop.org, Intel Graphics Development , Liviu Dudau , Linux Kernel Mailing List , Maling list - DRI developers , Hans de Goede , Peter Wu Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11 February 2018 at 09:38, Lukas Wunner wrote: > Fix a deadlock on hybrid graphics laptops that's been present since 2013: > > DRM drivers poll connectors in 10 sec intervals. The poll worker is > stopped on ->runtime_suspend with cancel_delayed_work_sync(). However > the poll worker invokes the DRM drivers' ->detect callbacks, which call > pm_runtime_get_sync(). If the poll worker starts after runtime suspend > has begun, pm_runtime_get_sync() will wait for runtime suspend to finish > with the intention of runtime resuming the device afterwards. The result > is a circular wait between poll worker and autosuspend worker. > > I'm seeing this deadlock so often it's not funny anymore. I've also found > 3 nouveau bugzillas about it and 1 radeon bugzilla. (See patch [3/5] and > [4/5].) > > One theoretical solution would be to add a flag to the ->detect callback > to tell it that it's running in the poll worker's context. In that case > it doesn't need to call pm_runtime_get_sync() because the poll worker is > only enabled while runtime active and we know that ->runtime_suspend > waits for it to finish. But this approach would require touching every > single DRM driver's ->detect hook. Moreover the ->detect hook is called > from numerous other places, both in the DRM library as well as in each > driver, making it necessary to trace every possible call chain and check > if it's coming from the poll worker or not. I've tried to do that for > nouveau (see the call sites listed in the commit message of patch [3/5]) > and concluded that this approach is a nightmare to implement. > > Another reason for the infeasibility of this approach is that ->detect > is called from within the DRM library without driver involvement, e.g. > via DRM's sysfs interface. In those cases, pm_runtime_get_sync() would > have to be called by the DRM library, but the DRM library is supposed to > stay generic, wheras runtime PM is driver-specific. > > So instead, I've come up with this much simpler solution which gleans > from the current task's flags if it's a workqueue worker, retrieves the > work_struct and checks if it's the poll worker. All that's needed is > one small helper in the workqueue code and another small helper in the > DRM library. This solution lends itself nicely to stable-backporting. > > The patches for radeon and amdgpu are compile-tested only, I only have a > MacBook Pro with an Nvidia GK107 to test. To test the patches, add an > "msleep(12*1000);" at the top of the driver's ->runtime_suspend hook. > This ensures that the poll worker runs after ->runtime_suspend has begun. > Wait 12 sec after the GPU has begun runtime suspend, then check > /sys/bus/pci/devices/0000:01:00.0/power/runtime_status. Without this > series, the status will be stuck at "suspending" and you'll get hung task > errors in dmesg after a few minutes. > > i915, malidp and msm "solved" this issue by not stopping the poll worker > on runtime suspend. But this results in the GPU bouncing back and forth > between D0 and D3 continuously. That's a total no-go for GPUs which > runtime suspend to D3cold since every suspend/resume cycle costs a > significant amount of time and energy. (i915 and malidp do not seem > to acquire a runtime PM ref in the ->detect callbacks, which seems > questionable. msm however does and would also deadlock if it disabled > the poll worker on runtime suspend. cc += Archit, Liviu, intel-gfx) > > Please review. Thanks, > > Lukas > > Lukas Wunner (5): > workqueue: Allow retrieval of current task's work struct > drm: Allow determining if current task is output poll worker > drm/nouveau: Fix deadlock on runtime suspend > drm/radeon: Fix deadlock on runtime suspend > drm/amdgpu: Fix deadlock on runtime suspend > > drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 58 +++++++++++++------- > drivers/gpu/drm/drm_probe_helper.c | 14 +++++ > drivers/gpu/drm/nouveau/nouveau_connector.c | 18 +++++-- > drivers/gpu/drm/radeon/radeon_connectors.c | 74 +++++++++++++++++--------- > include/drm/drm_crtc_helper.h | 1 + > include/linux/workqueue.h | 1 + > kernel/workqueue.c | 16 ++++++ > 7 files changed, 132 insertions(+), 50 deletions(-) > > -- > 2.15.1 > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel I wasn't quite sure where to add that msleep. I've tested the patches as is on top of agd5f's wip branch without ill effects I've had a radeon and now a amdgpu PRIME setup and don't believe I've ever seen this issue If you could pop a patch together for the msleep I'll give it a test on amdgpu Cheers Mike