Date: Fri, 3 Feb 2023 21:18:37 +0000
From: Matthias Kaehlcke
To: Dmitry Baryshkov
Cc: Abel Vesa, Bjorn Andersson, "Rafael J . Wysocki", Kevin Hilman,
    Ulf Hansson, Len Brown, Pavel Machek, Greg Kroah-Hartman, Andy Gross,
    Konrad Dybcio, linux-pm@vger.kernel.org, Linux Kernel Mailing List,
    linux-arm-msm@vger.kernel.org, Stephen Boyd, Doug Anderson
Subject: Re: [RFC PATCH v2 1/2] PM: domains: Skip disabling unused domains if provider has sync_state
References: <20230127104054.895129-1-abel.vesa@linaro.org> <3826e0e6-bb2b-409d-d1c3-ed361305bce3@linaro.org> <9b8af6b3-9ab5-12f8-5576-1a93c58a26c1@linaro.org>
In-Reply-To: <9b8af6b3-9ab5-12f8-5576-1a93c58a26c1@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Feb 03, 2023 at 10:00:27PM +0200, Dmitry Baryshkov wrote:
> On 03/02/2023 03:20, Matthias Kaehlcke wrote:
> > Hi Dmitry,
> >
> > On Thu, Feb 02, 2023 at 09:53:41PM +0200, Dmitry Baryshkov wrote:
> > > On 02/02/2023 20:24, Matthias Kaehlcke wrote:
> > > > Hi Abel,
> > > >
> > > > On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
> > > > > Currently, there are cases when a domain needs to remain enabled until
> > > > > the consumer driver probes. Sometimes such consumer drivers may be built
> > > > > as modules. Since the genpd_power_off_unused is called too early for
> > > > > such consumer driver modules to get a chance to probe, the domain, since
> > > > > it is unused, will get disabled. On the other hand, the best time for
> > > > > an unused domain to be disabled is on the provider's sync_state
> > > > > callback. So, if the provider has registered a sync_state callback,
> > > > > assume the unused domains for that provider will be disabled on its
> > > > > sync_state callback. Also provide a generic sync_state callback which
> > > > > disables all the domains unused for the provider that registers it.
> > > > >
> > > > > Signed-off-by: Abel Vesa
> > > > > ---
> > > > >
> > > > > This approach has been applied for unused clocks as well.
> > > > > With this patch merged in, all the providers that have sync_state
> > > > > callback registered will leave the domains enabled unless the provider's
> > > > > sync_state callback explicitly disables them. So those providers will
> > > > > need to add the disabling part to their sync_state callback. On the
> > > > > other hand, the platforms that have cases where domains need to remain
> > > > > enabled (even if unused) until the consumer driver probes, will be able,
> > > > > with this patch in, to run without the pd_ignore_unused kernel argument,
> > > > > which seems to be the case for most Qualcomm platforms, at this moment.
> > > >
> > > > I recently encountered a related issue on a Qualcomm platform with a
> > > > v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
> > > > highest corner until sync_state"). The issue involves a DT node with a
> > > > rpmhpd, the DT node is enabled, however the corresponding device driver
> > > > is not enabled in the kernel. In such a scenario the sync_state callback
> > > > is never called, because the genpd consumer never probes. As a result
> > > > the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
> > > > system suspend, which results in a substantially higher power consumption
> > > > in S3.
> > > >
> > > > I wonder if genpd (and some other frameworks) needs something like
> > > > regulator_init_complete(), which turns off unused regulators 30s after
> > > > system boot. That's conceptually similar to the current
> > > > genpd_power_off_unused(), but would provide time for modules being loaded.
> > >
> > > I think the overall goal is to move away from ad-hoc implementations like
> > > clk_disable_unused/genpd_power_off_unused/regulator_init_complete towards
> > > the sync_state.
> >
> > I generally agree with the goal of using common mechanisms whenever possible.
> >
> > > So inherently one either has to provide drivers for all devices in question
> > > or disable unused devices in DT.
> >
> > I don't think that's a great solution, it essentially hands the issue down to
> > the users or downstream maintainers of the kernel, who might not be aware that
> > there is an issue, nor know about the specifics of genpd (or interconnects and
> > clocks which have similar problems).
>
> The goal is to move the control down to individual drivers. Previously we
> had issues with clk_disable_unused() disabling mdss/mdp clocks incorrectly,
> which frequently led to broken display output. Other clock/genpd/regulator
> drivers might have other internal dependencies. Thus it is not really
> possible to handle resource shutdown in the common (framework) code.
>
> >
> > In general symptoms are probably subtle, like a (potentially substantially)
> > increased power consumption during system suspend. The issue might have been
> > introduced by an update to a newer kernel, which now includes a DT node for a
> > new SoC feature which wasn't supported by the 'old' kernel. It's common
> > practice to use the 'old' .config, at least as a starting point, which
> > obviously doesn't enable the new driver. That happened to me with [1] when
> > testing v6.1. It took me quite some time to track the 'culprit' commit down
> > and then some debugging to understand what's going on. Shortly after that I
> > ran into a related issue involving genpds when testing v6.2-rc, which again
> > took a non-trivial amount of time to track down (and I'm familiar with the SoC
> > platform and the general nature of the issue). I don't think it's reasonable
> > to expect every user/downstream maintainer of an impacted system to go through
> > this, one person at a time.
>
> I think it would be nice to have some way of 'sync_pending' debug available
> (compare this to debugfs/devices_deferred).

Most folks are probably not even aware that they have a 'sync_state' issue
and wouldn't look in debugfs, so I think this would have to be something
proactive, like a warning log that is enabled by default (possibly with the
option to disable it). Something in debugfs could be a nice complement.

> Note, we are trying to make sure that all supported drivers are enabled at
> least as modules (if possible). If we fail, please send a patch fixing the
> defconfig.

That's great, but not everybody uses the defconfig; it's just a default.

> > Maybe there could be a generic solution for drivers with a 'sync_state'
> > callback, e.g. the driver (or framework) could have a 'sync_state_timeout'
> > callback (or similar), which is called by the driver framework if 'sync_state'
> > wasn't called (for example) 30s after the device was probed. Then the provider
> > can power off or throttle unclaimed resources.
>
> I might be missing a point somewhere, but for me it looks like a logical
> solution. Please send a proposal.

I started working on a patch; I'll probably send it out next week if I don't
encounter any evident major issues.
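
To make the 'sync_state_timeout' idea above a bit more concrete, here is a rough userspace model in Python. It is not kernel code and not the patch mentioned above. Domain, Provider, tick() and the sync_state_timeout() method are hypothetical names made up for illustration, and the 30s value is simply the figure used as an example earlier in this thread.

```python
from dataclasses import dataclass

# Assumed value: the thread only mentions 30s "for example".
SYNC_STATE_TIMEOUT_S = 30.0


@dataclass
class Domain:
    name: str
    powered: bool = True    # assumed: left enabled at boot
    claimed: bool = False   # True once a consumer driver has probed


@dataclass
class Provider:
    domains: list
    probed_at: float        # time (in seconds) at which the provider probed
    synced: bool = False

    def _power_off_unclaimed(self):
        # Turn off every domain that no consumer has claimed.
        for domain in self.domains:
            if not domain.claimed:
                domain.powered = False

    def sync_state(self):
        # Normal path: all consumers have probed.
        self._power_off_unclaimed()
        self.synced = True

    def sync_state_timeout(self):
        # Hypothetical fallback: at least one consumer never probed.
        self._power_off_unclaimed()
        self.synced = True


def tick(provider, now, all_consumers_probed):
    """Model of the framework deciding which callback to invoke, if any."""
    if provider.synced:
        return "done"
    if all_consumers_probed:
        provider.sync_state()
        return "sync_state"
    if now - provider.probed_at >= SYNC_STATE_TIMEOUT_S:
        provider.sync_state_timeout()
        return "sync_state_timeout"
    return "waiting"


if __name__ == "__main__":
    gfx = Domain("gfx", claimed=True)
    camera = Domain("camera")          # DT node enabled, driver not built
    provider = Provider([gfx, camera], probed_at=0.0)
    for now in (10.0, 30.0, 31.0):
        print(now, tick(provider, now, all_consumers_probed=False))
    print({d.name: d.powered for d in provider.domains})
```

In this model a domain whose consumer never probes stays powered only until the timeout fires, while claimed domains are left alone. A real implementation would also have to decide whether a provider may power off or merely throttle unclaimed resources, which this sketch does not attempt to capture.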