Received: by 2002:a05:7412:419a:b0:f3:1519:9f41 with SMTP id i26csp4581295rdh; Wed, 29 Nov 2023 05:37:41 -0800 (PST) X-Google-Smtp-Source: AGHT+IEnv3VvCMbKsFTz+RxX7UVoRiY8rDfujU6bF0YNIi1nyhMv1FEL6ribBgt7ftaFK794wnFp X-Received: by 2002:a17:903:249:b0:1cf:b4bb:9bdc with SMTP id j9-20020a170903024900b001cfb4bb9bdcmr20373048plh.9.1701265060547; Wed, 29 Nov 2023 05:37:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701265060; cv=none; d=google.com; s=arc-20160816; b=ZX2FMfyC2dN5Qh6b5lcTmIw2wKl1Lok04IJEbVCSmJF1HFJmQQvSTeKW1pcIGEn8HJ 1iit/PLRpYkHDbzuK7xBEyKIndA5IMRtNX3hL+56PjfXNILrgNFs5qR7WUzvnnZH7WnM k5ze98VIagTTmQCex2491/m5V5ZnpNEJK2zDAWec4JyXDixRuvZibXUOzA4nR0HFrWR/ /7tWTbcfBWujqnPeiAmJuNINclKL4GpINZeP4P76LHx/abbeKa4ka6wqZhPtCjrdr95V rb9zkBrUkrGflVmJRwwmXC+jv086DGM9yBR25FNI2GnlhVi2tLhE2sKpeqzMN7EpwUqU cnPQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=0yqcjhh1R/Zt3kU3OlzcVUrZ9Tl4zJisa8W9XSF89Gw=; fh=QjtVMF+ZE1m1D8eULMcZmv1jf9vzms1Rm6RW1uI+Tpc=; b=FfUQn120lqw4hmHmDQPNBvEjpyalWsAtKqv1kX7pygoJTll0wPeZaa6L7KPjhhRCnj KxDlFz+LR0nSRNAV95zmrB1V9ID8qTx+WOgFxedNKS1hFUL1rkAPlLv4QDRl0Safy3P4 xp2SI8fPPBNiz+zrHQpSJkXvf04svY5DefS90PBvX1nnuWLTD6PO0dO6e1BO5jrWqMQ/ MoJSHhvSqdQj8dBG+1VxafxW2EgVgxcv38/Rhw83gNDMg09KrpzFB0hP/42D+j15z484 U97tDvVS4p3s4D7hKv8P4AASkKftIJzI7qG5ziIrL4gn0JJ8ApYg4eh+a6kvkTj/w/UW zJlg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@collabora.com header.s=mail header.b=TAAtLqws; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=collabora.com Return-Path: Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id c3-20020a170902d48300b001cfc2d024easi8810044plg.310.2023.11.29.05.37.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Nov 2023 05:37:40 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@collabora.com header.s=mail header.b=TAAtLqws; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=collabora.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id E421380309DC; Wed, 29 Nov 2023 05:37:38 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230439AbjK2Nh3 (ORCPT + 99 others); Wed, 29 Nov 2023 08:37:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230389AbjK2Nh2 (ORCPT ); Wed, 29 Nov 2023 08:37:28 -0500 Received: from madras.collabora.co.uk (madras.collabora.co.uk [46.235.227.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 316B6B2; Wed, 29 Nov 2023 05:37:33 -0800 (PST) Received: from [100.107.97.3] (cola.collaboradmins.com [195.201.22.229]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: kholk11) by madras.collabora.co.uk (Postfix) with ESMTPSA id 701F966072FC; Wed, 29 Nov 2023 13:37:31 +0000 (GMT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1701265052; bh=dEfUC3EV5oGDYk6B9a83HtTpQA6XtTs4iw4qDuoeFo0=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=TAAtLqwstEXbrYyYfk2b5VVO8HT4VC23mQFvamR4C5sils78TFWUOVTwEkWNtv19J EEsMEIJ50A/wYbaBAKXOg8NAbuDyGsapsQFm6dGhhXS9AnKOa5jYK62P6HUpmtbke8 Hwtlor0gxd+ffWSecdTQG5UDp8gklIEKWY/IRkJhSnYRb3mbky75IZ7Q7pFx1WN9J3 H0gk3PTITlMCBKl1H0/us+3tZmHMurCyExJfHEcy2hdc06VqRcIJQZlsdjduZ4Rq4F 2IDCri17FDhatc2A09qc+IREm0gfsDDxxboSvaSAVDZgoEHnkn7NRq/ayw3ScHzXCf bmwu+ZvsNazAA== Message-ID: <80cbe288-2960-4937-be47-56610946695d@collabora.com> Date: Wed, 29 Nov 2023 14:37:28 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] pmdomain: mediatek: fix race condition in power on/power off sequences Content-Language: en-US To: Eugen Hristev , linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, linux-pm@vger.kernel.org Cc: eballetbo@kernel.org, ulf.hansson@linaro.org, linux-kernel@vger.kernel.org, kernel@collabora.com References: <20231129113120.4907-1-eugen.hristev@collabora.com> From: AngeloGioacchino Del Regno In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Wed, 29 Nov 2023 05:37:39 -0800 (PST) Il 29/11/23 14:28, Eugen Hristev ha scritto: > On 11/29/23 14:52, AngeloGioacchino Del Regno wrote: >> Il 29/11/23 12:31, Eugen Hristev ha scritto: >>> It can happen that during the power off sequence for a power domain >>> another power on sequence is started, and it can lead to powering on and >>> off in the same time for the similar power domain. >>> This can happen if parallel probing occurs: one device starts probing, and >>> one power domain is probe deferred, this leads to all power domains being >>> rolled back and powered off, while in the same time another device starts >>> probing and requests powering on the same power domains or similar. >>> >>> This was encountered on MT8186, when the sequence is : >>> Power on SSUSB >>> Power on SSUSB_P1 >>> Power on DIS >>>     -> probe deferred >>> Power off DIS >>> Power off SSUSB_P1 >>> Power off SSUSB >>> >>> During the sequence of powering off SSUSB, some new similar sequence starts, >>> and during the power on of SSUSB, clocks are enabled. >>> In this case, powering off SSUSB fails from the first sequence, because >>> power off ACK bit check times out (as clocks are powered back on by the second >>> sequence). In consequence, powering it on also times out, and it leads to >>> the whole power domain in a bad state. >>> >>> To solve this issue, added a mutex that locks the whole power off/power on >>> sequence such that it would never happen that multiple sequences try to >>> enable or disable the same power domain in parallel. >>> >>> Fixes: 59b644b01cf4 ("soc: mediatek: Add MediaTek SCPSYS power domains") >>> Signed-off-by: Eugen Hristev >> >> I don't think that it's a race between genpd_power_on() and genpd_power_off() calls >> at all, because genpd *does* have locking after all... at least for probe and for >> parents of a power domain (and more anyway). >> >> As far as I remember, what happens when you start .probe()'ing a device is: >> platform_probe() -> dev_pm_domain_attach() -> genpd_dev_pm_attach() >> >> There, you end up with >> >>      if (power_on) { >>          genpd_lock(pd); >>          ret = genpd_power_on(pd, 0); >>          genpd_unlock(pd); >>      } >> >> ...but when you fail probing, you go with genpd_dev_pm_detach(), which then calls >> >>      /* Check if PM domain can be powered off after removing this device. */ >>      genpd_queue_power_off_work(pd); >> >> but even then, you end up being in a worker doing >> >>      genpd_lock(genpd); >>      genpd_power_off(genpd, false, 0); >>      genpd_unlock(genpd); >> >> ...so I don't understand why this mutex can resolve the situation here (also: are >> you really sure that the race is solved like that?) >> >> I'd say that this probably needs more justification and a trace of the actual >> situation here. >> >> Besides, if this really resolves the issue, I would prefer seeing variants of >> scpsys_power_{on,off}() functions, because we anyway don't need to lock mutexes >> during this driver's probe (add_subdomain calls scpsys_power_on()). >> In that case, `scpsys_power_on_unlocked()` would be an idea... but still, please >> analyze why your solution works, if it does, because I'm not convinced. > > What I see in my tests, is that a power on call for SSUSB domain happens while the > previous power off sequence did not yet complete, most likely while it's waiting in > readx_poll_timeout . This leads to inconsistency of the power domain, not getting > the ACKs next time a power on attempt occurs. > > I understand what you say about locks, but in this case the powering off is not > called by the genpd itself, but rather it's called by the rollback probe failed > mechanism : when the probing fails, scpsys_domain_cleanup() is called during the > same probing session. > Then it happens that probing begins again and previous cleanup is not yet > completed. I am not sure whether the lock is still held from the previous run, but > it's clearly not waiting for a lock to be released to be called again. > Sorry but I'm a bit lost now: is the problem about probe deferrals of the USB driver, or about probe deferrals of the mtk-pm-domains driver? scpsys_domain_cleanup() is only called upon scpsys_probe() failure. >> >>> --- >>>   drivers/pmdomain/mediatek/mtk-pm-domains.c | 24 +++++++++++++++++----- >>>   1 file changed, 19 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/pmdomain/mediatek/mtk-pm-domains.c >>> b/drivers/pmdomain/mediatek/mtk-pm-domains.c >>> index d5f0ee05c794..4f136b47e539 100644 >>> --- a/drivers/pmdomain/mediatek/mtk-pm-domains.c >>> +++ b/drivers/pmdomain/mediatek/mtk-pm-domains.c >>> @@ -9,6 +9,7 @@ >>>   #include >>>   #include >>>   #include >>> +#include >>>   #include >>>   #include >>>   #include >>> @@ -56,6 +57,7 @@ struct scpsys { >>>       struct device *dev; >>>       struct regmap *base; >>>       const struct scpsys_soc_data *soc_data; >>> +    struct mutex mutex; >>>       struct genpd_onecell_data pd_data; >>>       struct generic_pm_domain *domains[]; >>>   }; >>> @@ -238,9 +240,13 @@ static int scpsys_power_on(struct generic_pm_domain *genpd) >>>       bool tmp; >>>       int ret; >>> +    mutex_lock(&scpsys->mutex); >>> + >>>       ret = scpsys_regulator_enable(pd->supply); >>> -    if (ret) >>> +    if (ret) { >>> +        mutex_unlock(&scpsys->mutex); >>>           return ret; >>> +    } >>>       ret = clk_bulk_prepare_enable(pd->num_clks, pd->clks); >>>       if (ret) >>> @@ -291,6 +297,7 @@ static int scpsys_power_on(struct generic_pm_domain *genpd) >>>               goto err_enable_bus_protect; >>>       } >>> +    mutex_unlock(&scpsys->mutex); >>>       return 0; >>>   err_enable_bus_protect: >>> @@ -305,6 +312,7 @@ static int scpsys_power_on(struct generic_pm_domain *genpd) >>>       clk_bulk_disable_unprepare(pd->num_clks, pd->clks); >>>   err_reg: >>>       scpsys_regulator_disable(pd->supply); >>> +    mutex_unlock(&scpsys->mutex); >>>       return ret; >>>   } >>> @@ -315,13 +323,15 @@ static int scpsys_power_off(struct generic_pm_domain *genpd) >>>       bool tmp; >>>       int ret; >>> +    mutex_lock(&scpsys->mutex); >>> + >>>       ret = scpsys_bus_protect_enable(pd); >>>       if (ret < 0) >>> -        return ret; >>> +        goto err_mutex_unlock; >>>       ret = scpsys_sram_disable(pd); >>>       if (ret < 0) >>> -        return ret; >>> +        goto err_mutex_unlock; >>>       if (pd->data->ext_buck_iso_offs && MTK_SCPD_CAPS(pd, MTK_SCPD_EXT_BUCK_ISO)) >>>           regmap_set_bits(scpsys->base, pd->data->ext_buck_iso_offs, >>> @@ -340,13 +350,15 @@ static int scpsys_power_off(struct generic_pm_domain *genpd) >>>       ret = readx_poll_timeout(scpsys_domain_is_on, pd, tmp, !tmp, >>> MTK_POLL_DELAY_US, >>>                    MTK_POLL_TIMEOUT); >>>       if (ret < 0) >>> -        return ret; >>> +        goto err_mutex_unlock; >>>       clk_bulk_disable_unprepare(pd->num_clks, pd->clks); >>>       scpsys_regulator_disable(pd->supply); >>> -    return 0; >>> +err_mutex_unlock: >>> +    mutex_unlock(&scpsys->mutex); >>> +    return ret; >>>   } >>>   static struct >>> @@ -700,6 +712,8 @@ static int scpsys_probe(struct platform_device *pdev) >>>           return PTR_ERR(scpsys->base); >>>       } >>> +    mutex_init(&scpsys->mutex); >>> + >>>       ret = -ENODEV; >>>       for_each_available_child_of_node(np, node) { >>>           struct generic_pm_domain *domain; >>