Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp1048418ybb; Wed, 25 Mar 2020 14:43:23 -0700 (PDT) X-Google-Smtp-Source: ADFU+vuSbUCIdwc8FCY7Qou/rDOzubx+VKySDAvDhb+3ndD41nQOZUHeosRv2U88RCUNjX2fzXb2 X-Received: by 2002:a9d:6b02:: with SMTP id g2mr4147148otp.340.1585172602954; Wed, 25 Mar 2020 14:43:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1585172602; cv=none; d=google.com; s=arc-20160816; b=ahtVxOab4bSrOe7GA5OTRiGjpjgQ98Y6kM1yYDGAMgpMnbd35DePvecpp3pbYhaR+w ajSwVjQO8uJciDYQZy4S/er/ceYTa7nvkR7eRCZJhfwWlYVWQJCl3NQMngP5OwbDU03/ snyzSLAfBzREyer+QIVF6urBIKUh1eIpaQrC2uUsZ/kfimBp/bWYJZp+4GUqRTIjD9St oOvSOjqSkeC/FW9QEOvmac/DFaPiiuKrwmfyS4UqYzqPF3w6M5nz6LF2qoW1JoQXkBtU gapJ1r1FYPrwoSUkz+Wup0WrccALTF6h7ZCpUECmt9LXEr0HiVoHtysfz80lHaNTh8WX Jwcw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=buU5ko9WYyQLxe4W65aArJ2mzLoua7+RSvDwL1OmNEg=; b=ojLWjR/84Lx+zHzAzVQsIL1i7r0D2LETbZCTqHLpOiIMFhP4RbN8Lb+ZKW4WGi+SDS OA8F1tgZdnURw6xqCIFfPt1HAnW+n2Ez93j38+Bz2osjJ3lUJNZeGfm7vn9zLjo73gvS L2Z07qxoOC942h0tDI9GLlAmU7k668JwpSgQceZITAwvox9WEtrz8wauiuoTQFYw3TMI U4IhJ2LxEEOnFBs0OdYDiq4kOwjvHfWF5JlFg4r76JG5ivXr0QQGku07ekJuHW4FFhR1 svjWqB+ymq3CzxOvfKBVoACaaFQ7WWVlZpSlt8TCw6Trii+/qEnwmeTfBFM/tB6JNlel uT3w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=ZxKKEZdm; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id w24si50253ooj.34.2020.03.25.14.43.08; Wed, 25 Mar 2020 14:43:22 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=ZxKKEZdm; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727466AbgCYVmY (ORCPT + 99 others); Wed, 25 Mar 2020 17:42:24 -0400 Received: from mail-io1-f67.google.com ([209.85.166.67]:41717 "EHLO mail-io1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726081AbgCYVmY (ORCPT ); Wed, 25 Mar 2020 17:42:24 -0400 Received: by mail-io1-f67.google.com with SMTP id y24so3918491ioa.8 for ; Wed, 25 Mar 2020 14:42:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=buU5ko9WYyQLxe4W65aArJ2mzLoua7+RSvDwL1OmNEg=; b=ZxKKEZdmuoacpoKH0XbpR+refRSDkns4jbDBtRk3JBWdJYvtlvzH6zeRmYxRLnwPbi T/4W8v5h5MF9MulUBGCkY2+MwNYQjo8TfX2xlOniViVAQb0jIWN+mAdHMumMG5HeUR7H zjNLM/E84YwbKbrbKsC9VXmfZbDz1PBdA+pJH83LVHFtMeDAfzI+/LahfbYioLaq2JEI 7PS3spkJriaL0il5nrd72TkAuqckgC3/sFB5nIsMQiuO+ad0T8cYZPG/frekt36Rtnm4 KiUUiigbVxzt/Ok/p0Vv1tttxwvpzs4lbm69D/UyZ+2o+1a14+oC3DEgDQrkasMx/SI2 x7XA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=buU5ko9WYyQLxe4W65aArJ2mzLoua7+RSvDwL1OmNEg=; b=a0fiep8bFj0NfMqkgoNcl9AMNsHqoW1G9aJuOs84o1M9jnlvfS/ecRsgDCTjpEVIA5 zX2coqYzYgc6BNo8Xyf149q4Pis4YG++LWQaKCi6TU7in8UrrOqMPYRZ52KV2fvi/lf+ wdJqT/SxHjk01GiiY9gQgpdLHS84M5f2dlmkO/sRamgYZnyARWmxLXUIciy35q1fXA/c VINCEBCCsxOOtA+h9fb7VZ9COPGlbbGmYf4L+d6AhRRgiewwz46LUysWqSGnWKXGCiB/ 6w4RQnWsW5/SaqVuaDUdcXLW27Hve2UAfV0WzGIKzug1FuiFHpWRBePpm1DxIQBMH12l 0XBA== X-Gm-Message-State: ANhLgQ1S0OcPgKRqg0kPzKQS4l1RpIDYhICPBwoljiHIBXb45Y9wcOGu GF4LP9dwxUE2/gckR/dAVtofHz7HqA5zZnY59R+exw== X-Received: by 2002:a02:8405:: with SMTP id k5mr5081184jah.52.1585172541443; Wed, 25 Mar 2020 14:42:21 -0700 (PDT) MIME-Version: 1.0 References: <1583924072-20648-1-git-send-email-loic.pallardy@st.com> <1583924072-20648-2-git-send-email-loic.pallardy@st.com> <20200325175746.GA6227@xps15> <9a089cba07f7454ea0fc0f2d09bd9bf0@SFHDAG7NODE2.st.com> In-Reply-To: <9a089cba07f7454ea0fc0f2d09bd9bf0@SFHDAG7NODE2.st.com> From: Mathieu Poirier Date: Wed, 25 Mar 2020 15:42:10 -0600 Message-ID: Subject: Re: [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed To: Loic PALLARDY Cc: "bjorn.andersson@linaro.org" , "ohad@wizery.com" , "linux-remoteproc@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Arnaud POULIQUEN , "benjamin.gaignard@linaro.org" , Fabien DESSENNE , "s-anna@ti.com" Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 25 Mar 2020 at 12:30, Loic PALLARDY wrote: > > Hi Mathieu, > > > -----Original Message----- > > From: Mathieu Poirier > > Sent: mercredi 25 mars 2020 18:58 > > To: Loic PALLARDY > > Cc: bjorn.andersson@linaro.org; ohad@wizery.com; linux- > > remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org; Arnaud > > POULIQUEN ; benjamin.gaignard@linaro.org; > > Fabien DESSENNE ; s-anna@ti.com > > Subject: Re: [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when > > rproc is crashed > > > > Hi Loic, > > > > On Wed, Mar 11, 2020 at 11:54:31AM +0100, Loic Pallardy wrote: > > > When remoteproc recovery is disabled and rproc crashed, user space > > > client has no way to reboot co-processor except by a complete platform > > > reboot. > > > Indeed rproc_shutdown() is called by sysfs state_store() only is rproc > > > state is RPROC_RUNNING. > > > > > > This patch offers the possibility to shutdown the co-processor if > > > it is in RPROC_CRASHED state and so to restart properly co-processor > > > from sysfs interface. > > > > If recovery is disabled on an rproc the platform likely intended to have a hard > > reboot and as such we should not be concerned about this case. > I disagree with your view. In fact, we can have a configuration for which > we don't want a silent recovery. Application layer can be involved to stop and > restart some services because it is the simplest way to resync with the coprocessor. > What's missing today is an event to notify user space application that coprocessor state > has changed. (even if we can rely on rpmsg services closure) I have a better understanding of the scenario now. > > > > > Where I think we have a problem, something that is asserted by looking at > > your 2 > > patches, is cases where rproc_trigger_recovery() fails. That leaves the > > system > > in a state where it can't be recovered, something the remoteproc core > > should not > > allow. > > > Right this is a second use case we faced when user space application which provided > firmware file crashed before coprocessor. In that case firmware file may be removed > from /lib/firmware directory and coprocessor recovery failed. > Application, when restarting, can't anymore control coprocessor. This is a very specific use case. It seems to me that fixing the problem with the availability of files under /lib/firmware is where the solution really lies. > > Regards, > Loic > > > > > > > Signed-off-by: Loic Pallardy > > > --- > > > drivers/remoteproc/remoteproc_core.c | 2 +- > > > drivers/remoteproc/remoteproc_sysfs.c | 2 +- > > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/remoteproc/remoteproc_core.c > > b/drivers/remoteproc/remoteproc_core.c > > > index 097f33e4f1f3..7ac87a75cd1b 100644 > > > --- a/drivers/remoteproc/remoteproc_core.c > > > +++ b/drivers/remoteproc/remoteproc_core.c > > > @@ -1812,7 +1812,7 @@ void rproc_shutdown(struct rproc *rproc) > > > if (!atomic_dec_and_test(&rproc->power)) > > > goto out; > > > > > > - ret = rproc_stop(rproc, false); > > > + ret = rproc_stop(rproc, rproc->state == RPROC_CRASHED); > > > > Please add a comment that explains how we can be in rproc_shutdown() > > when the > > processor has crashed and point to rproc_trigger_recovery(). See below for > > more > > details. > > > > > if (ret) { > > > atomic_inc(&rproc->power); > > > goto out; > > > diff --git a/drivers/remoteproc/remoteproc_sysfs.c > > b/drivers/remoteproc/remoteproc_sysfs.c > > > index 7f8536b73295..1029458a4678 100644 > > > --- a/drivers/remoteproc/remoteproc_sysfs.c > > > +++ b/drivers/remoteproc/remoteproc_sysfs.c > > > @@ -101,7 +101,7 @@ static ssize_t state_store(struct device *dev, > > > if (ret) > > > dev_err(&rproc->dev, "Boot failed: %d\n", ret); > > > } else if (sysfs_streq(buf, "stop")) { > > > - if (rproc->state != RPROC_RUNNING) > > > + if (rproc->state != RPROC_RUNNING && rproc->state != > > RPROC_CRASHED) > > > return -EINVAL; > > > > Wouldn't it be better to just prevent the MCU to stay in a crashed state > > (when > > recovery is not disabled)? > > > > I like what you did in the next patch where the state of the MCU is set to > > RPROC_CRASHED in case of failure, so that we keep. I also think the hunk > > above is correct. All that is left is to call rproc_shutdown() directly in > > rproc_trigger_recovery() when something goes wrong. I would also add a > > dev_err() so that users have a clue of what happened. > > > > That would leave the system in a stable state without having to add > > intelligence > > to state_store(). > It is a solution we debate internally. Should rproc_shutdown() called directly in > rproc_trigger_recovery() or not? If we go in such direction, that clearly simplify > coprocessor control as it will always be in a "stable" state. But that means user > will lost information that coprocessor crashed (mainly when recovery is disabled). > We just know that coprocessor is stopped but not why? Crashed or client action? > For debug purpose, it could be an issue from my pov. That is why I suggested to add a dev_err() so that users know recovery of the MCU has failed. Moreover I expect users to be aware of what is happening on their platform, i.e if application did not switch off the MCU and it is in the offline state, then it is fair to assume it crashed. > > Regards, > Loic > > > > Let me know that you think... > > > > Mathieu > > > > > > > > rproc_shutdown(rproc); > > > -- > > > 2.7.4 > > >