Date: Wed, 29 Nov 2023 15:28:35 +0200
Subject: Re: [PATCH] pmdomain: mediatek: fix race condition in power on/power off sequences
From: Eugen Hristev
To: AngeloGioacchino Del Regno, linux-arm-kernel@lists.infradead.org,
    linux-mediatek@lists.infradead.org, linux-pm@vger.kernel.org
Cc: eballetbo@kernel.org, ulf.hansson@linaro.org, linux-kernel@vger.kernel.org,
    kernel@collabora.com
References: <20231129113120.4907-1-eugen.hristev@collabora.com>

On 11/29/23 14:52, AngeloGioacchino Del Regno wrote:
> On 29/11/23 12:31, Eugen Hristev wrote:
>> It can happen that during the power off sequence for a power domain
>> another power on sequence is started, and it can lead to powering on and
>> off in the same time for the similar power domain.
>> This can happen if parallel probing occurs: one device starts probing, and
>> one power domain is probe deferred, this leads to all power domains being
>> rolled back and powered off, while in the same time another device starts
>> probing and requests powering on the same power domains or similar.
>>
>> This was encountered on MT8186, when the sequence is :
>> Power on SSUSB
>> Power on SSUSB_P1
>> Power on DIS
>>     -> probe deferred
>> Power off DIS
>> Power off SSUSB_P1
>> Power off SSUSB
>>
>> During the sequence of powering off SSUSB, some new similar sequence starts,
>> and during the power on of SSUSB, clocks are enabled.
>> In this case, powering off SSUSB fails from the first sequence, because
>> power off ACK bit check times out (as clocks are powered back on by the
>> second sequence). In consequence, powering it on also times out, and it
>> leads to the whole power domain in a bad state.
>>
>> To solve this issue, added a mutex that locks the whole power off/power on
>> sequence such that it would never happen that multiple sequences try to
>> enable or disable the same power domain in parallel.
>>
>> Fixes: 59b644b01cf4 ("soc: mediatek: Add MediaTek SCPSYS power domains")
>> Signed-off-by: Eugen Hristev
>
> I don't think that it's a race between genpd_power_on() and
> genpd_power_off() calls at all, because genpd *does* have locking after
> all... at least for probe and for parents of a power domain (and more
> anyway).
>
> As far as I remember, what happens when you start .probe()'ing a device is:
> platform_probe() -> dev_pm_domain_attach() -> genpd_dev_pm_attach()
>
> There, you end up with
>
>     if (power_on) {
>         genpd_lock(pd);
>         ret = genpd_power_on(pd, 0);
>         genpd_unlock(pd);
>     }
>
> ...but when you fail probing, you go with genpd_dev_pm_detach(), which
> then calls
>
>     /* Check if PM domain can be powered off after removing this device. */
>     genpd_queue_power_off_work(pd);
>
> but even then, you end up being in a worker doing
>
>     genpd_lock(genpd);
>     genpd_power_off(genpd, false, 0);
>     genpd_unlock(genpd);
>
> ...so I don't understand why this mutex can resolve the situation here
> (also: are you really sure that the race is solved like that?)
>
> I'd say that this probably needs more justification and a trace of the
> actual situation here.
>
> Besides, if this really resolves the issue, I would prefer seeing variants
> of scpsys_power_{on,off}() functions, because we anyway don't need to lock
> mutexes during this driver's probe (add_subdomain calls scpsys_power_on()).
> In that case, `scpsys_power_on_unlocked()` would be an idea... but still,
> please analyze why your solution works, if it does, because I'm not
> convinced.

What I see in my tests is that a power on call for the SSUSB domain happens
while the previous power off sequence has not yet completed, most likely
while it is waiting in readx_poll_timeout(). This leaves the power domain in
an inconsistent state, and the ACKs are not received the next time a power on
is attempted.

I understand what you say about locks, but in this case the power off is not
called by genpd itself. It is called by the probe failure rollback mechanism:
when probing fails, scpsys_domain_cleanup() is called during the same probing
session. Probing then begins again before the previous cleanup has completed.
I am not sure whether the lock is still held from the previous run, but the
new call is clearly not waiting for a lock to be released.
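To make the interleaving easier to reason about, here is a minimal userspace
sketch of it, not driver code. It assumes POSIX threads, and every name in it
(struct fake_scpsys, fake_power_sequence(), count_overlaps()) is hypothetical.
The sleep only stands in for the time spent polling for the ACK bit. With the
mutex enabled, a second sequence cannot start while the first is still in
flight; with it disabled, the overlap will usually be observed, although
thread timing does not guarantee it.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for the controller state; not the driver's struct scpsys. */
struct fake_scpsys {
	pthread_mutex_t mutex;  /* plays the role of the proposed scpsys->mutex */
	bool use_mutex;         /* false models the code before the patch */
	atomic_int in_flight;   /* sequences currently between start and end */
	atomic_int overlaps;    /* sequences that started while another was running */
};

/* Models one power on or power off sequence on the shared domain. */
static void *fake_power_sequence(void *arg)
{
	struct fake_scpsys *s = arg;

	if (s->use_mutex)
		pthread_mutex_lock(&s->mutex);

	/* A sequence entering while another is in flight is the problem case. */
	if (atomic_fetch_add(&s->in_flight, 1) > 0)
		atomic_fetch_add(&s->overlaps, 1);

	usleep(20000);  /* stands in for the wait inside readx_poll_timeout() */

	atomic_fetch_sub(&s->in_flight, 1);

	if (s->use_mutex)
		pthread_mutex_unlock(&s->mutex);
	return NULL;
}

/* Runs a "power off" and a "power on" concurrently; returns observed overlaps. */
int count_overlaps(bool use_mutex)
{
	struct fake_scpsys s = { .use_mutex = use_mutex };
	pthread_t off, on;
	int rc;

	pthread_mutex_init(&s.mutex, NULL);

	rc = pthread_create(&off, NULL, fake_power_sequence, &s);
	assert(rc == 0);
	rc = pthread_create(&on, NULL, fake_power_sequence, &s);
	assert(rc == 0);
	(void)rc;

	pthread_join(off, NULL);
	pthread_join(on, NULL);
	pthread_mutex_destroy(&s.mutex);

	return atomic_load(&s.overlaps);
}
```

Calling count_overlaps(true) always returns 0, because the second thread
blocks until the first has finished. This only shows that a mutex serializes
the two sequences; it says nothing about why genpd's own locking does not
already do so on the cleanup path.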
>
> Cheers,
> Angelo
>
>> ---
>>   drivers/pmdomain/mediatek/mtk-pm-domains.c | 24 +++++++++++++++++-----
>>   1 file changed, 19 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/pmdomain/mediatek/mtk-pm-domains.c b/drivers/pmdomain/mediatek/mtk-pm-domains.c
>> index d5f0ee05c794..4f136b47e539 100644
>> --- a/drivers/pmdomain/mediatek/mtk-pm-domains.c
>> +++ b/drivers/pmdomain/mediatek/mtk-pm-domains.c
>> @@ -9,6 +9,7 @@
>>   #include
>>   #include
>>   #include
>> +#include
>>   #include
>>   #include
>>   #include
>> @@ -56,6 +57,7 @@ struct scpsys {
>>       struct device *dev;
>>       struct regmap *base;
>>       const struct scpsys_soc_data *soc_data;
>> +    struct mutex mutex;
>>       struct genpd_onecell_data pd_data;
>>       struct generic_pm_domain *domains[];
>>   };
>> @@ -238,9 +240,13 @@ static int scpsys_power_on(struct generic_pm_domain *genpd)
>>       bool tmp;
>>       int ret;
>> +    mutex_lock(&scpsys->mutex);
>> +
>>       ret = scpsys_regulator_enable(pd->supply);
>> -    if (ret)
>> +    if (ret) {
>> +        mutex_unlock(&scpsys->mutex);
>>           return ret;
>> +    }
>>       ret = clk_bulk_prepare_enable(pd->num_clks, pd->clks);
>>       if (ret)
>> @@ -291,6 +297,7 @@ static int scpsys_power_on(struct generic_pm_domain *genpd)
>>               goto err_enable_bus_protect;
>>       }
>> +    mutex_unlock(&scpsys->mutex);
>>       return 0;
>>   err_enable_bus_protect:
>> @@ -305,6 +312,7 @@ static int scpsys_power_on(struct generic_pm_domain *genpd)
>>       clk_bulk_disable_unprepare(pd->num_clks, pd->clks);
>>   err_reg:
>>       scpsys_regulator_disable(pd->supply);
>> +    mutex_unlock(&scpsys->mutex);
>>       return ret;
>>   }
>> @@ -315,13 +323,15 @@ static int scpsys_power_off(struct generic_pm_domain *genpd)
>>       bool tmp;
>>       int ret;
>> +    mutex_lock(&scpsys->mutex);
>> +
>>       ret = scpsys_bus_protect_enable(pd);
>>       if (ret < 0)
>> -        return ret;
>> +        goto err_mutex_unlock;
>>       ret = scpsys_sram_disable(pd);
>>       if (ret < 0)
>> -        return ret;
>> +        goto err_mutex_unlock;
>>       if (pd->data->ext_buck_iso_offs && MTK_SCPD_CAPS(pd, MTK_SCPD_EXT_BUCK_ISO))
>>           regmap_set_bits(scpsys->base, pd->data->ext_buck_iso_offs,
>> @@ -340,13 +350,15 @@ static int scpsys_power_off(struct generic_pm_domain *genpd)
>>       ret = readx_poll_timeout(scpsys_domain_is_on, pd, tmp, !tmp, MTK_POLL_DELAY_US,
>>                    MTK_POLL_TIMEOUT);
>>       if (ret < 0)
>> -        return ret;
>> +        goto err_mutex_unlock;
>>       clk_bulk_disable_unprepare(pd->num_clks, pd->clks);
>>       scpsys_regulator_disable(pd->supply);
>> -    return 0;
>> +err_mutex_unlock:
>> +    mutex_unlock(&scpsys->mutex);
>> +    return ret;
>>   }
>>   static struct
>> @@ -700,6 +712,8 @@ static int scpsys_probe(struct platform_device *pdev)
>>           return PTR_ERR(scpsys->base);
>>       }
>> +    mutex_init(&scpsys->mutex);
>> +
>>       ret = -ENODEV;
>>       for_each_available_child_of_node(np, node) {
>>           struct generic_pm_domain *domain;
>
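Regarding the `scpsys_power_on_unlocked()` idea raised in the review, the
split could look roughly like the userspace sketch below. It is only an
illustration of the locked/unlocked pattern under stated assumptions: it uses
a POSIX mutex instead of the kernel's struct mutex, and struct demo_scpsys,
demo_power_on_unlocked(), demo_power_on() and demo_probe() are hypothetical
names, not functions from mtk-pm-domains.c. The unlocked helper holds the
actual sequence, the locked wrapper gives a single lock/unlock site, and the
probe path calls the helper directly since nothing else can reach the domain
yet.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical controller; only the fields needed for the illustration. */
struct demo_scpsys {
	pthread_mutex_t mutex;  /* stands in for a per-controller kernel mutex */
	bool powered;           /* stands in for the real hardware state */
};

/*
 * Does the actual work. The caller must guarantee that no other sequence
 * can run concurrently, either by holding the mutex or by being in probe.
 */
static int demo_power_on_unlocked(struct demo_scpsys *s)
{
	s->powered = true;  /* placeholder for regulator, clock and ACK handling */
	return 0;
}

/* Runtime entry point: serialized, with one unlock site for all outcomes. */
int demo_power_on(struct demo_scpsys *s)
{
	int ret;

	pthread_mutex_lock(&s->mutex);
	ret = demo_power_on_unlocked(s);
	pthread_mutex_unlock(&s->mutex);
	return ret;
}

/* Probe-time path: initializes the mutex, then powers on without taking it. */
int demo_probe(struct demo_scpsys *s)
{
	pthread_mutex_init(&s->mutex, NULL);
	s->powered = false;
	return demo_power_on_unlocked(s);
}
```

Keeping the lock and unlock in one wrapper also avoids adding an unlock call
to every error path of the sequence itself.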