Message-ID: <70828854-8427-8ce1-1535-e14261fd122d@quicinc.com>
Date: Mon, 24 Oct 2022 11:17:01 +0800
From: "Aiqun(Maria) Yu"
To: Mathieu Poirier, Arnaud Pouliquen
Subject: Re: [PATCH v4] remoteproc: core: do pm relax when in RPROC_OFFLINE
References: <128dc161-8949-1146-bf8b-310aa33c06a8@quicinc.com>
 <1663312351-28476-1-git-send-email-quic_aiquny@quicinc.com>
 <20221012204344.GA1178915@p14s>
 <792f05fc-995e-9a87-ab7d-bee03f15bc79@quicinc.com>
 <20221013173442.GA1279972@p14s>
 <20221013180334.GB1279972@p14s>
 <8807a9a6-d93d-aef5-15f4-88648a6ecbe2@quicinc.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/22/2022 3:34 AM, Mathieu Poirier wrote:
> On Wed, 19 Oct 2022 at 23:52, Aiqun(Maria) Yu wrote:
>>
>> On 10/14/2022 2:03 AM, Mathieu Poirier wrote:
>>> On Thu, Oct 13, 2022 at 11:34:42AM -0600, Mathieu Poirier wrote:
>>>> On Thu, Oct 13, 2022 at 09:40:09AM +0800, Aiqun(Maria) Yu wrote:
>>>>> Hi Mathieu,
>>>>>
>>>>> On 10/13/2022 4:43 AM, Mathieu Poirier wrote:
>>>>>> Please add what has changed from one version to another, either in a
>>>>>> cover letter or after the "Signed-off-by". There are many examples of
>>>>>> how to do that on the mailing list.
>>>>>>
>>>>> Thanks for the information; I will take note of that for next time.
>>>>>
>>>>>> On Fri, Sep 16, 2022 at 03:12:31PM +0800, Maria Yu wrote:
>>>>>>> The RPROC_OFFLINE state indicates that no recovery process is in
>>>>>>> progress, so there is no chance to do the pm_relax. This is because
>>>>>>> when recovering from a crash, rproc->lock is held while the state
>>>>>>> moves from RPROC_CRASHED -> RPROC_OFFLINE -> RPROC_RUNNING, and only
>>>>>>> then is rproc->lock unlocked.
>>>>>>
>>>>>> You are correct - because the lock is held, rproc->state should be set
>>>>>> to RPROC_RUNNING when rproc_trigger_recovery() returns.
>>>>>> If that is not the case, then something went wrong.
>>>>>>
>>>>>> Function rproc_stop() sets rproc->state to RPROC_OFFLINE just before
>>>>>> returning, so we know the remote processor was stopped. Therefore if
>>>>>> rproc->state is set to RPROC_OFFLINE, something went wrong in either
>>>>>> request_firmware() or rproc_start(). Either way the remote processor
>>>>>> is offline and the system is probably in an unknown/unstable state.
>>>>>> As such I don't see how calling pm_relax() can help things along.
>>>>>>
>>>>> RPROC_OFFLINE is also possible when rproc_shutdown() was triggered and
>>>>> finished successfully.
>>>>> Even if it is a contention issue between multiple crashes in
>>>>> rproc_crash_handler_work(), and the last rproc_trigger_recovery()
>>>>> bailed out with only rproc->state == RPROC_OFFLINE, it is still worth
>>>>> doing the pm_relax() in pairs. The subsystem may still be recovered by
>>>>> the customer's next trigger of rproc_start(), and we can keep every
>>>>> error path clean with respect to pm resources.
>>>>>
>>>>>> I suggest spending time understanding what leads to the failure when
>>>>>> recovering from a crash and addressing that problem(s).
>>>>>>
>>>>> In the current case, the customer's information is that the issue
>>>>> happened when rproc_shutdown() was triggered at a similar time, so it
>>>>> is not an issue from an error path of rproc_trigger_recovery().
>>>>
>>>> That is a very important element to consider and should have been
>>>> mentioned from the beginning. What I see happening is the following:
>>>>
>>>> rproc_report_crash()
>>>>     pm_stay_awake()
>>>>     queue_work()                        // current thread is suspended
>>>>
>>>> rproc_shutdown()
>>>>     rproc_stop()
>>>>         rproc->state = RPROC_OFFLINE;
>>>>
>>>> rproc_crash_handler_work()
>>>>     if (rproc->state == RPROC_OFFLINE)
>>>>         return;                         // pm_relax() is not called
>>>>
>>>> The right way to fix this is to add a pm_relax() in rproc_shutdown()
>>>> and rproc_detach(), along with a very descriptive comment as to why it
>>>> is needed.
>>>
>>> Thinking about this further, there are more ramifications to consider.
>>> Please confirm the above scenario is what you are facing. I will advise
>>> on how to move forward if that is the case.
>>>
>> Not sure if the situation is clear or not, so I am resending the email.
>>
>> The above scenario is what the customer is facing: a crash happened
>> while shutdown was triggered at the same time.
>
> Unfortunately this is not enough detail to address a problem as
> complex as this one.
>
>> And the device cannot go to the suspend state after that.
>> The subsystem can still be started normally after this.
>
> If the code flow I pasted above reflects the problem at hand, the
> current patch will not be sufficient to address the issue. If Arnaud
> confirms my suspicions we will have to think about a better solution.
>

Hi Mathieu,

Could you please give more details on any side effects, other than the
power issue, of the current scenario? And why is the current patch not
sufficient?

Here is the current scenario in detail, with the rproc->lock information:

| subsystem crashed, interrupt issued | user triggers shutdown
| rproc_report_crash()                |
|   pm_stay_awake()                   |
|   queue_work()                      |
|                                     | rproc_shutdown()
|                                     |   mutex_lock(&rproc->lock);
|                                     |   rproc_stop()
| rproc_crash_handler_work()          |     rproc->state = RPROC_OFFLINE;
|                                     |   mutex_unlock(&rproc->lock);
| mutex_lock(&rproc->lock);           |
| if (rproc->state == RPROC_OFFLINE)  |
|   return; // pm_relax() not called  | rproc_boot()
| mutex_unlock(&rproc->lock);         |   mutex_lock(&rproc->lock);
|                                     |   rproc_start()
|                                     |   mutex_unlock(&rproc->lock);

>>
>>>>
>>>>
>>>>>> Thanks,
>>>>>> Mathieu
>>>>>>
>>>>>>
>>>>>>> When the state is in RPROC_OFFLINE, it means a separate request of
>>>>>>> rproc_stop() was done, and there is no need to hold the wakeup
>>>>>>> source in the crash handler to recover any more.
>>>>>>>
>>>>>>> Signed-off-by: Maria Yu
>>>>>>> ---
>>>>>>>  drivers/remoteproc/remoteproc_core.c | 11 +++++++++++
>>>>>>>  1 file changed, 11 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>>>>>>> index e5279ed9a8d7..6bc7b8b7d01e 100644
>>>>>>> --- a/drivers/remoteproc/remoteproc_core.c
>>>>>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>>>>>> @@ -1956,6 +1956,17 @@ static void rproc_crash_handler_work(struct work_struct *work)
>>>>>>>  	if (rproc->state == RPROC_CRASHED || rproc->state == RPROC_OFFLINE) {
>>>>>>>  		/* handle only the first crash detected */
>>>>>>>  		mutex_unlock(&rproc->lock);
>>>>>>> +		/*
>>>>>>> +		 * The RPROC_OFFLINE state indicates that no recovery
>>>>>>> +		 * process is in progress, so there is no chance to have
>>>>>>> +		 * pm_relax in place. When recovering from a crash,
>>>>>>> +		 * rproc->lock is held while the state moves from
>>>>>>> +		 * RPROC_CRASHED -> RPROC_OFFLINE -> RPROC_RUNNING, and
>>>>>>> +		 * only then is rproc->lock unlocked. RPROC_OFFLINE is
>>>>>>> +		 * only an intermediate state in the recovery process.
>>>>>>> +		 */
>>>>>>> +		if (rproc->state == RPROC_OFFLINE)
>>>>>>> +			pm_relax(rproc->dev.parent);
>>>>>>>  		return;
>>>>>>>  	}
>>>>>>> --
>>>>>>> 2.7.4
>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thx and BRs,
>>>>> Aiqun(Maria) Yu
>>
>>
>> --
>> Thx and BRs,
>> Aiqun(Maria) Yu

-- 
Thx and BRs,
Aiqun(Maria) Yu
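[Editor's note: Mathieu's suggested alternative earlier in the thread — releasing the wakeup source on the shutdown/detach paths instead of in the crash handler — would look roughly like the pseudocode sketch below. This is not an actual kernel change; the surrounding code is elided and the exact placement relative to rproc->lock would need review.]

```
void rproc_shutdown(struct rproc *rproc)
{
	/* ... */
	mutex_lock(&rproc->lock);
	rproc_stop(rproc, ...);        /* sets rproc->state = RPROC_OFFLINE */
	/*
	 * A crash reported just before this point leaves its
	 * pm_stay_awake() unbalanced, because rproc_crash_handler_work()
	 * will now bail out on RPROC_OFFLINE without calling pm_relax().
	 * Release the wakeup source here instead.
	 */
	pm_relax(rproc->dev.parent);
	mutex_unlock(&rproc->lock);
	/* ... */
}
```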