2022-07-05 01:25:43

by Peng Fan (OSS)

[permalink] [raw]
Subject: [PATCH V7 2/2] remoteproc: support attach recovery after rproc crash

From: Peng Fan <[email protected]>

Current logic only support main processor to stop/start the remote
processor after crash. However to SoC, such as i.MX8QM/QXP, the
remote processor could do attach recovery after crash and trigger watchdog
to reboot itself. It does not need main processor to load image, or
stop/start remote processor.

Introduce two functions: rproc_attach_recovery, rproc_boot_recovery
for the two cases. Boot recovery is as before, let main processor to
help recovery, while attach recovery is to recover itself without help.
To attach recovery, we only do detach and attach.

Acked-by: Arnaud Pouliquen <[email protected]>
Signed-off-by: Peng Fan <[email protected]>
---
drivers/remoteproc/remoteproc_core.c | 62 +++++++++++++++++++---------
1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index ed374c8bf14a..ef5b9310bc83 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1884,6 +1884,45 @@ static int __rproc_detach(struct rproc *rproc)
return 0;
}

+static int rproc_attach_recovery(struct rproc *rproc)
+{
+ int ret;
+
+ ret = __rproc_detach(rproc);
+ if (ret)
+ return ret;
+
+ return __rproc_attach(rproc);
+}
+
+static int rproc_boot_recovery(struct rproc *rproc)
+{
+ const struct firmware *firmware_p;
+ struct device *dev = &rproc->dev;
+ int ret;
+
+ ret = rproc_stop(rproc, true);
+ if (ret)
+ return ret;
+
+ /* generate coredump */
+ rproc->ops->coredump(rproc);
+
+ /* load firmware */
+ ret = request_firmware(&firmware_p, rproc->firmware, dev);
+ if (ret < 0) {
+ dev_err(dev, "request_firmware failed: %d\n", ret);
+ return ret;
+ }
+
+ /* boot the remote processor up again */
+ ret = rproc_start(rproc, firmware_p);
+
+ release_firmware(firmware_p);
+
+ return ret;
+}
+
/**
* rproc_trigger_recovery() - recover a remoteproc
* @rproc: the remote processor
@@ -1898,7 +1937,6 @@ static int __rproc_detach(struct rproc *rproc)
*/
int rproc_trigger_recovery(struct rproc *rproc)
{
- const struct firmware *firmware_p;
struct device *dev = &rproc->dev;
int ret;

@@ -1912,24 +1950,10 @@ int rproc_trigger_recovery(struct rproc *rproc)

dev_err(dev, "recovering %s\n", rproc->name);

- ret = rproc_stop(rproc, true);
- if (ret)
- goto unlock_mutex;
-
- /* generate coredump */
- rproc->ops->coredump(rproc);
-
- /* load firmware */
- ret = request_firmware(&firmware_p, rproc->firmware, dev);
- if (ret < 0) {
- dev_err(dev, "request_firmware failed: %d\n", ret);
- goto unlock_mutex;
- }
-
- /* boot the remote processor up again */
- ret = rproc_start(rproc, firmware_p);
-
- release_firmware(firmware_p);
+ if (rproc_has_feature(rproc, RPROC_FEAT_ATTACH_ON_RECOVERY))
+ ret = rproc_attach_recovery(rproc);
+ else
+ ret = rproc_boot_recovery(rproc);

unlock_mutex:
mutex_unlock(&rproc->lock);
--
2.25.1


2022-09-26 22:11:14

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH V7 2/2] remoteproc: support attach recovery after rproc crash

On Tue, Jul 05, 2022 at 09:15:27AM +0800, Peng Fan (OSS) wrote:
> From: Peng Fan <[email protected]>
>
> Current logic only support main processor to stop/start the remote
> processor after crash. However to SoC, such as i.MX8QM/QXP, the
> remote processor could do attach recovery after crash and trigger watchdog
> to reboot itself. It does not need main processor to load image, or
> stop/start remote processor.
>
> Introduce two functions: rproc_attach_recovery, rproc_boot_recovery
> for the two cases. Boot recovery is as before, let main processor to
> help recovery, while attach recovery is to recover itself without help.
> To attach recovery, we only do detach and attach.
>
> Acked-by: Arnaud Pouliquen <[email protected]>
> Signed-off-by: Peng Fan <[email protected]>
> ---
> drivers/remoteproc/remoteproc_core.c | 62 +++++++++++++++++++---------
> 1 file changed, 43 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index ed374c8bf14a..ef5b9310bc83 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1884,6 +1884,45 @@ static int __rproc_detach(struct rproc *rproc)
> return 0;
> }
>
> +static int rproc_attach_recovery(struct rproc *rproc)
> +{
> + int ret;
> +
> + ret = __rproc_detach(rproc);
> + if (ret)
> + return ret;

I thought there was a specific reason to _not_ call rproc->ops->coredump() for
processors that have been attached to but looking at the STM32 and IMX_DSP now, it
would seem logical to do so. Am I missing something?

And this set will need a rebase.

Thanks,
Mathieu

> +
> + return __rproc_attach(rproc);
> +}
> +
> +static int rproc_boot_recovery(struct rproc *rproc)
> +{
> + const struct firmware *firmware_p;
> + struct device *dev = &rproc->dev;
> + int ret;
> +
> + ret = rproc_stop(rproc, true);
> + if (ret)
> + return ret;
> +
> + /* generate coredump */
> + rproc->ops->coredump(rproc);
> +
> + /* load firmware */
> + ret = request_firmware(&firmware_p, rproc->firmware, dev);
> + if (ret < 0) {
> + dev_err(dev, "request_firmware failed: %d\n", ret);
> + return ret;
> + }
> +
> + /* boot the remote processor up again */
> + ret = rproc_start(rproc, firmware_p);
> +
> + release_firmware(firmware_p);
> +
> + return ret;
> +}
> +
> /**
> * rproc_trigger_recovery() - recover a remoteproc
> * @rproc: the remote processor
> @@ -1898,7 +1937,6 @@ static int __rproc_detach(struct rproc *rproc)
> */
> int rproc_trigger_recovery(struct rproc *rproc)
> {
> - const struct firmware *firmware_p;
> struct device *dev = &rproc->dev;
> int ret;
>
> @@ -1912,24 +1950,10 @@ int rproc_trigger_recovery(struct rproc *rproc)
>
> dev_err(dev, "recovering %s\n", rproc->name);
>
> - ret = rproc_stop(rproc, true);
> - if (ret)
> - goto unlock_mutex;
> -
> - /* generate coredump */
> - rproc->ops->coredump(rproc);
> -
> - /* load firmware */
> - ret = request_firmware(&firmware_p, rproc->firmware, dev);
> - if (ret < 0) {
> - dev_err(dev, "request_firmware failed: %d\n", ret);
> - goto unlock_mutex;
> - }
> -
> - /* boot the remote processor up again */
> - ret = rproc_start(rproc, firmware_p);
> -
> - release_firmware(firmware_p);
> + if (rproc_has_feature(rproc, RPROC_FEAT_ATTACH_ON_RECOVERY))
> + ret = rproc_attach_recovery(rproc);
> + else
> + ret = rproc_boot_recovery(rproc);
>
> unlock_mutex:
> mutex_unlock(&rproc->lock);
> --
> 2.25.1
>

2022-09-27 03:31:50

by Peng Fan

[permalink] [raw]
Subject: RE: [PATCH V7 2/2] remoteproc: support attach recovery after rproc crash

Hi Mathieu,

> Subject: Re: [PATCH V7 2/2] remoteproc: support attach recovery after rproc
> crash
>
> On Tue, Jul 05, 2022 at 09:15:27AM +0800, Peng Fan (OSS) wrote:
> > From: Peng Fan <[email protected]>
> >
> > Current logic only support main processor to stop/start the remote
> > processor after crash. However to SoC, such as i.MX8QM/QXP, the remote
> > processor could do attach recovery after crash and trigger watchdog to
> > reboot itself. It does not need main processor to load image, or
> > stop/start remote processor.
> >
> > Introduce two functions: rproc_attach_recovery, rproc_boot_recovery
> > for the two cases. Boot recovery is as before, let main processor to
> > help recovery, while attach recovery is to recover itself without help.
> > To attach recovery, we only do detach and attach.
> >
> > Acked-by: Arnaud Pouliquen <[email protected]>
> > Signed-off-by: Peng Fan <[email protected]>
> > ---
> > drivers/remoteproc/remoteproc_core.c | 62
> > +++++++++++++++++++---------
> > 1 file changed, 43 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_core.c
> > b/drivers/remoteproc/remoteproc_core.c
> > index ed374c8bf14a..ef5b9310bc83 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1884,6 +1884,45 @@ static int __rproc_detach(struct rproc *rproc)
> > return 0;
> > }
> >
> > +static int rproc_attach_recovery(struct rproc *rproc) {
> > + int ret;
> > +
> > + ret = __rproc_detach(rproc);
> > + if (ret)
> > + return ret;
>
> I thought there was a specific reason to _not_ call rproc->ops->coredump()
> for processors that have been attached to but looking at the STM32 and
> IMX_DSP now, it would seem logical to do so. Am I missing something?

ATTACH RECOVERY is to support recovery without help from Linux,

STM32 and IMX_DSP, both require linux to load image and start remote
core. So the two cases should not enable feature:
RPROC_FEAT_ATTACH_ON_RECOVERY

Also considering the recovery is out of linux control, actually when linux
start dump, remote core may already recovered.

>
> And this set will need a rebase.

I'll do the rebase and send V8 if the upper explanation could eliminate
your concern.

Thanks,
Peng.

>
> Thanks,
> Mathieu
>
> > +
> > + return __rproc_attach(rproc);
> > +}
> > +
> > +static int rproc_boot_recovery(struct rproc *rproc) {
> > + const struct firmware *firmware_p;
> > + struct device *dev = &rproc->dev;
> > + int ret;
> > +
> > + ret = rproc_stop(rproc, true);
> > + if (ret)
> > + return ret;
> > +
> > + /* generate coredump */
> > + rproc->ops->coredump(rproc);
> > +
> > + /* load firmware */
> > + ret = request_firmware(&firmware_p, rproc->firmware, dev);
> > + if (ret < 0) {
> > + dev_err(dev, "request_firmware failed: %d\n", ret);
> > + return ret;
> > + }
> > +
> > + /* boot the remote processor up again */
> > + ret = rproc_start(rproc, firmware_p);
> > +
> > + release_firmware(firmware_p);
> > +
> > + return ret;
> > +}
> > +
> > /**
> > * rproc_trigger_recovery() - recover a remoteproc
> > * @rproc: the remote processor
> > @@ -1898,7 +1937,6 @@ static int __rproc_detach(struct rproc *rproc)
> > */
> > int rproc_trigger_recovery(struct rproc *rproc) {
> > - const struct firmware *firmware_p;
> > struct device *dev = &rproc->dev;
> > int ret;
> >
> > @@ -1912,24 +1950,10 @@ int rproc_trigger_recovery(struct rproc
> > *rproc)
> >
> > dev_err(dev, "recovering %s\n", rproc->name);
> >
> > - ret = rproc_stop(rproc, true);
> > - if (ret)
> > - goto unlock_mutex;
> > -
> > - /* generate coredump */
> > - rproc->ops->coredump(rproc);
> > -
> > - /* load firmware */
> > - ret = request_firmware(&firmware_p, rproc->firmware, dev);
> > - if (ret < 0) {
> > - dev_err(dev, "request_firmware failed: %d\n", ret);
> > - goto unlock_mutex;
> > - }
> > -
> > - /* boot the remote processor up again */
> > - ret = rproc_start(rproc, firmware_p);
> > -
> > - release_firmware(firmware_p);
> > + if (rproc_has_feature(rproc, RPROC_FEAT_ATTACH_ON_RECOVERY))
> > + ret = rproc_attach_recovery(rproc);
> > + else
> > + ret = rproc_boot_recovery(rproc);
> >
> > unlock_mutex:
> > mutex_unlock(&rproc->lock);
> > --
> > 2.25.1
> >

2022-09-27 08:50:52

by Arnaud Pouliquen

[permalink] [raw]
Subject: Re: [PATCH V7 2/2] remoteproc: support attach recovery after rproc crash

Hi,

On 9/27/22 05:03, Peng Fan wrote:
> Hi Mathieu,
>
>> Subject: Re: [PATCH V7 2/2] remoteproc: support attach recovery after rproc
>> crash
>>
>> On Tue, Jul 05, 2022 at 09:15:27AM +0800, Peng Fan (OSS) wrote:
>>> From: Peng Fan <[email protected]>
>>>
>>> Current logic only support main processor to stop/start the remote
>>> processor after crash. However to SoC, such as i.MX8QM/QXP, the remote
>>> processor could do attach recovery after crash and trigger watchdog to
>>> reboot itself. It does not need main processor to load image, or
>>> stop/start remote processor.
>>>
>>> Introduce two functions: rproc_attach_recovery, rproc_boot_recovery
>>> for the two cases. Boot recovery is as before, let main processor to
>>> help recovery, while attach recovery is to recover itself without help.
>>> To attach recovery, we only do detach and attach.
>>>
>>> Acked-by: Arnaud Pouliquen <[email protected]>
>>> Signed-off-by: Peng Fan <[email protected]>
>>> ---
>>> drivers/remoteproc/remoteproc_core.c | 62
>>> +++++++++++++++++++---------
>>> 1 file changed, 43 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_core.c
>>> b/drivers/remoteproc/remoteproc_core.c
>>> index ed374c8bf14a..ef5b9310bc83 100644
>>> --- a/drivers/remoteproc/remoteproc_core.c
>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>> @@ -1884,6 +1884,45 @@ static int __rproc_detach(struct rproc *rproc)
>>> return 0;
>>> }
>>>
>>> +static int rproc_attach_recovery(struct rproc *rproc) {
>>> + int ret;
>>> +
>>> + ret = __rproc_detach(rproc);
>>> + if (ret)
>>> + return ret;
>>
>> I thought there was a specific reason to _not_ call rproc->ops->coredump()
>> for processors that have been attached to but looking at the STM32 and
>> IMX_DSP now, it would seem logical to do so. Am I missing something?
>
> ATTACH RECOVERY is to support recovery without help from Linux,
>
> STM32 and IMX_DSP, both require linux to load image and start remote
> core. So the two cases should not enable feature:
> RPROC_FEAT_ATTACH_ON_RECOVERY
>
> Also considering the recovery is out of linux control, actually when linux
> start dump, remote core may already recovered.

I asked myself the same question. Indeed how to manage a core dump if the
remote processor restarts autonomously.
The answer doesn't seem obvious because it seems to be platform specific.

For time being on STM32 we consider that the remoteproc memory can be corrupted
so we don't plan to enable the feature by default even if the hardware allows it.

If we implement it, I would see 2 use cases:
- no core dump, the remote processor restart autonomously (need to manage the
VIRTIO_CONFIG_S_NEEDS_RESET in resource table vdev for resynchronization)
- core dump and the Linux stm32 driver handle the reset of the remote
processor core to be able to perform the core dump (no firmware loading)

What about dealing with the coredump in a separate thread, based on a concrete
use case/need?

Regards,
Arnaud

>
>>
>> And this set will need a rebase.
>
> I'll do the rebase and send V8 if the upper explanation could eliminate
> your concern.
>
> Thanks,
> Peng.
>
>>
>> Thanks,
>> Mathieu
>>
>>> +
>>> + return __rproc_attach(rproc);
>>> +}
>>> +
>>> +static int rproc_boot_recovery(struct rproc *rproc) {
>>> + const struct firmware *firmware_p;
>>> + struct device *dev = &rproc->dev;
>>> + int ret;
>>> +
>>> + ret = rproc_stop(rproc, true);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + /* generate coredump */
>>> + rproc->ops->coredump(rproc);
>>> +
>>> + /* load firmware */
>>> + ret = request_firmware(&firmware_p, rproc->firmware, dev);
>>> + if (ret < 0) {
>>> + dev_err(dev, "request_firmware failed: %d\n", ret);
>>> + return ret;
>>> + }
>>> +
>>> + /* boot the remote processor up again */
>>> + ret = rproc_start(rproc, firmware_p);
>>> +
>>> + release_firmware(firmware_p);
>>> +
>>> + return ret;
>>> +}
>>> +
>>> /**
>>> * rproc_trigger_recovery() - recover a remoteproc
>>> * @rproc: the remote processor
>>> @@ -1898,7 +1937,6 @@ static int __rproc_detach(struct rproc *rproc)
>>> */
>>> int rproc_trigger_recovery(struct rproc *rproc) {
>>> - const struct firmware *firmware_p;
>>> struct device *dev = &rproc->dev;
>>> int ret;
>>>
>>> @@ -1912,24 +1950,10 @@ int rproc_trigger_recovery(struct rproc
>>> *rproc)
>>>
>>> dev_err(dev, "recovering %s\n", rproc->name);
>>>
>>> - ret = rproc_stop(rproc, true);
>>> - if (ret)
>>> - goto unlock_mutex;
>>> -
>>> - /* generate coredump */
>>> - rproc->ops->coredump(rproc);
>>> -
>>> - /* load firmware */
>>> - ret = request_firmware(&firmware_p, rproc->firmware, dev);
>>> - if (ret < 0) {
>>> - dev_err(dev, "request_firmware failed: %d\n", ret);
>>> - goto unlock_mutex;
>>> - }
>>> -
>>> - /* boot the remote processor up again */
>>> - ret = rproc_start(rproc, firmware_p);
>>> -
>>> - release_firmware(firmware_p);
>>> + if (rproc_has_feature(rproc, RPROC_FEAT_ATTACH_ON_RECOVERY))
>>> + ret = rproc_attach_recovery(rproc);
>>> + else
>>> + ret = rproc_boot_recovery(rproc);
>>>
>>> unlock_mutex:
>>> mutex_unlock(&rproc->lock);
>>> --
>>> 2.25.1
>>>

2022-09-27 18:21:59

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH V7 2/2] remoteproc: support attach recovery after rproc crash

On Tue, Sep 27, 2022 at 10:10:31AM +0200, Arnaud POULIQUEN wrote:
> Hi,
>
> On 9/27/22 05:03, Peng Fan wrote:
> > Hi Mathieu,
> >
> >> Subject: Re: [PATCH V7 2/2] remoteproc: support attach recovery after rproc
> >> crash
> >>
> >> On Tue, Jul 05, 2022 at 09:15:27AM +0800, Peng Fan (OSS) wrote:
> >>> From: Peng Fan <[email protected]>
> >>>
> >>> Current logic only support main processor to stop/start the remote
> >>> processor after crash. However to SoC, such as i.MX8QM/QXP, the remote
> >>> processor could do attach recovery after crash and trigger watchdog to
> >>> reboot itself. It does not need main processor to load image, or
> >>> stop/start remote processor.
> >>>
> >>> Introduce two functions: rproc_attach_recovery, rproc_boot_recovery
> >>> for the two cases. Boot recovery is as before, let main processor to
> >>> help recovery, while attach recovery is to recover itself without help.
> >>> To attach recovery, we only do detach and attach.
> >>>
> >>> Acked-by: Arnaud Pouliquen <[email protected]>
> >>> Signed-off-by: Peng Fan <[email protected]>
> >>> ---
> >>> drivers/remoteproc/remoteproc_core.c | 62
> >>> +++++++++++++++++++---------
> >>> 1 file changed, 43 insertions(+), 19 deletions(-)
> >>>
> >>> diff --git a/drivers/remoteproc/remoteproc_core.c
> >>> b/drivers/remoteproc/remoteproc_core.c
> >>> index ed374c8bf14a..ef5b9310bc83 100644
> >>> --- a/drivers/remoteproc/remoteproc_core.c
> >>> +++ b/drivers/remoteproc/remoteproc_core.c
> >>> @@ -1884,6 +1884,45 @@ static int __rproc_detach(struct rproc *rproc)
> >>> return 0;
> >>> }
> >>>
> >>> +static int rproc_attach_recovery(struct rproc *rproc) {
> >>> + int ret;
> >>> +
> >>> + ret = __rproc_detach(rproc);
> >>> + if (ret)
> >>> + return ret;
> >>
> >> I thought there was a specific reason to _not_ call rproc->ops->coredump()
> >> for processors that have been attached to but looking at the STM32 and
> >> IMX_DSP now, it would seem logical to do so. Am I missing something?
> >
> > ATTACH RECOVERY is to support recovery without help from Linux,
> >
> > STM32 and IMX_DSP, both require linux to load image and start remote
> > core. So the two cases should not enable feature:
> > RPROC_FEAT_ATTACH_ON_RECOVERY
> >
> > Also considering the recovery is out of linux control, actually when linux
> > start dump, remote core may already recovered.
>
> I asked myself the same question. Indeed how to manage a core dump if the
> remote processor restarts autonomously.
> The answer doesn't seem obvious because it seems to be platform specific.
>
> For time being on STM32 we consider that the remoteproc memory can be corrupted
> so we don't plan to enable the feature by default even if the hardware allows it.
>
> If we implement it, I would see 2 use cases:
> - no core dump, the remote processor restart autonomously (need to manage the
> VIRTIO_CONFIG_S_NEEDS_RESET in resource table vdev for resynchronization)
> - core dump and the Linux stm32 driver handle the reset of the remote
> processor core to be able to perform the core dump (no firmware loading)
>
> What about dealing with the coredump in a separate thread, based on a concrete
> use case/need?

Definitely, we can deal with that later.

Peng - please send me a rebase as quickly as possible.

>
> Regards,
> Arnaud
>
> >
> >>
> >> And this set will need a rebase.
> >
> > I'll do the rebase and send V8 if the upper explanation could eliminate
> > your concern.
> >
> > Thanks,
> > Peng.
> >
> >>
> >> Thanks,
> >> Mathieu
> >>
> >>> +
> >>> + return __rproc_attach(rproc);
> >>> +}
> >>> +
> >>> +static int rproc_boot_recovery(struct rproc *rproc) {
> >>> + const struct firmware *firmware_p;
> >>> + struct device *dev = &rproc->dev;
> >>> + int ret;
> >>> +
> >>> + ret = rproc_stop(rproc, true);
> >>> + if (ret)
> >>> + return ret;
> >>> +
> >>> + /* generate coredump */
> >>> + rproc->ops->coredump(rproc);
> >>> +
> >>> + /* load firmware */
> >>> + ret = request_firmware(&firmware_p, rproc->firmware, dev);
> >>> + if (ret < 0) {
> >>> + dev_err(dev, "request_firmware failed: %d\n", ret);
> >>> + return ret;
> >>> + }
> >>> +
> >>> + /* boot the remote processor up again */
> >>> + ret = rproc_start(rproc, firmware_p);
> >>> +
> >>> + release_firmware(firmware_p);
> >>> +
> >>> + return ret;
> >>> +}
> >>> +
> >>> /**
> >>> * rproc_trigger_recovery() - recover a remoteproc
> >>> * @rproc: the remote processor
> >>> @@ -1898,7 +1937,6 @@ static int __rproc_detach(struct rproc *rproc)
> >>> */
> >>> int rproc_trigger_recovery(struct rproc *rproc) {
> >>> - const struct firmware *firmware_p;
> >>> struct device *dev = &rproc->dev;
> >>> int ret;
> >>>
> >>> @@ -1912,24 +1950,10 @@ int rproc_trigger_recovery(struct rproc
> >>> *rproc)
> >>>
> >>> dev_err(dev, "recovering %s\n", rproc->name);
> >>>
> >>> - ret = rproc_stop(rproc, true);
> >>> - if (ret)
> >>> - goto unlock_mutex;
> >>> -
> >>> - /* generate coredump */
> >>> - rproc->ops->coredump(rproc);
> >>> -
> >>> - /* load firmware */
> >>> - ret = request_firmware(&firmware_p, rproc->firmware, dev);
> >>> - if (ret < 0) {
> >>> - dev_err(dev, "request_firmware failed: %d\n", ret);
> >>> - goto unlock_mutex;
> >>> - }
> >>> -
> >>> - /* boot the remote processor up again */
> >>> - ret = rproc_start(rproc, firmware_p);
> >>> -
> >>> - release_firmware(firmware_p);
> >>> + if (rproc_has_feature(rproc, RPROC_FEAT_ATTACH_ON_RECOVERY))
> >>> + ret = rproc_attach_recovery(rproc);
> >>> + else
> >>> + ret = rproc_boot_recovery(rproc);
> >>>
> >>> unlock_mutex:
> >>> mutex_unlock(&rproc->lock);
> >>> --
> >>> 2.25.1
> >>>

2022-09-28 07:24:25

by Peng Fan (OSS)

[permalink] [raw]
Subject: Re: [PATCH V7 2/2] remoteproc: support attach recovery after rproc crash



On 9/28/2022 1:44 AM, Mathieu Poirier wrote:
> On Tue, Sep 27, 2022 at 10:10:31AM +0200, Arnaud POULIQUEN wrote:
>> Hi,
>>
>> On 9/27/22 05:03, Peng Fan wrote:
>>> Hi Mathieu,
>>>
>>>> Subject: Re: [PATCH V7 2/2] remoteproc: support attach recovery after rproc
>>>> crash
>>>>
>>>> On Tue, Jul 05, 2022 at 09:15:27AM +0800, Peng Fan (OSS) wrote:
>>>>> From: Peng Fan <[email protected]>
>>>>>
>>>>> Current logic only support main processor to stop/start the remote
>>>>> processor after crash. However to SoC, such as i.MX8QM/QXP, the remote
>>>>> processor could do attach recovery after crash and trigger watchdog to
>>>>> reboot itself. It does not need main processor to load image, or
>>>>> stop/start remote processor.
>>>>>
>>>>> Introduce two functions: rproc_attach_recovery, rproc_boot_recovery
>>>>> for the two cases. Boot recovery is as before, let main processor to
>>>>> help recovery, while attach recovery is to recover itself without help.
>>>>> To attach recovery, we only do detach and attach.
>>>>>
>>>>> Acked-by: Arnaud Pouliquen <[email protected]>
>>>>> Signed-off-by: Peng Fan <[email protected]>
>>>>> ---
>>>>> drivers/remoteproc/remoteproc_core.c | 62
>>>>> +++++++++++++++++++---------
>>>>> 1 file changed, 43 insertions(+), 19 deletions(-)
>>>>>
>>>>> diff --git a/drivers/remoteproc/remoteproc_core.c
>>>>> b/drivers/remoteproc/remoteproc_core.c
>>>>> index ed374c8bf14a..ef5b9310bc83 100644
>>>>> --- a/drivers/remoteproc/remoteproc_core.c
>>>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>>>> @@ -1884,6 +1884,45 @@ static int __rproc_detach(struct rproc *rproc)
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +static int rproc_attach_recovery(struct rproc *rproc) {
>>>>> + int ret;
>>>>> +
>>>>> + ret = __rproc_detach(rproc);
>>>>> + if (ret)
>>>>> + return ret;
>>>>
>>>> I thought there was a specific reason to _not_ call rproc->ops->coredump()
>>>> for processors that have been attached to but looking at the STM32 and
>>>> IMX_DSP now, it would seem logical to do so. Am I missing something?
>>>
>>> ATTACH RECOVERY is to support recovery without help from Linux,
>>>
>>> STM32 and IMX_DSP, both require linux to load image and start remote
>>> core. So the two cases should not enable feature:
>>> RPROC_FEAT_ATTACH_ON_RECOVERY
>>>
>>> Also considering the recovery is out of linux control, actually when linux
>>> start dump, remote core may already recovered.
>>
>> I asked myself the same question. Indeed how to manage a core dump if the
>> remote processor restarts autonomously.
>> The answer doesn't seem obvious because it seems to be platform specific.
>>
>> For time being on STM32 we consider that the remoteproc memory can be corrupted
>> so we don't plan to enable the feature by default even if the hardware allows it.
>>
>> If we implement it, I would see 2 use cases:
>> - no core dump, the remote processor restart autonomously (need to manage the
>> VIRTIO_CONFIG_S_NEEDS_RESET in resource table vdev for resynchronization)
>> - core dump and the Linux stm32 driver handle the reset of the remote
>> processor core to be able to perform the core dump (no firmware loading)
>>
>> What about dealing with the coredump in a separate thread, based on a concrete
>> use case/need?
>
> Definitely, we can deal with that later.
>
> Peng - please send me a rebase as quickly as possible.

Mathieu,

Just send out V8 rebased on linux-next/master tag: next-20220927

Thanks,
Peng.
>
>>
>> Regards,
>> Arnaud
>>
>>>
>>>>
>>>> And this set will need a rebase.
>>>
>>> I'll do the rebase and send V8 if the upper explanation could eliminate
>>> your concern.
>>>
>>> Thanks,
>>> Peng.
>>>
>>>>
>>>> Thanks,
>>>> Mathieu
>>>>
>>>>> +
>>>>> + return __rproc_attach(rproc);
>>>>> +}
>>>>> +
>>>>> +static int rproc_boot_recovery(struct rproc *rproc) {
>>>>> + const struct firmware *firmware_p;
>>>>> + struct device *dev = &rproc->dev;
>>>>> + int ret;
>>>>> +
>>>>> + ret = rproc_stop(rproc, true);
>>>>> + if (ret)
>>>>> + return ret;
>>>>> +
>>>>> + /* generate coredump */
>>>>> + rproc->ops->coredump(rproc);
>>>>> +
>>>>> + /* load firmware */
>>>>> + ret = request_firmware(&firmware_p, rproc->firmware, dev);
>>>>> + if (ret < 0) {
>>>>> + dev_err(dev, "request_firmware failed: %d\n", ret);
>>>>> + return ret;
>>>>> + }
>>>>> +
>>>>> + /* boot the remote processor up again */
>>>>> + ret = rproc_start(rproc, firmware_p);
>>>>> +
>>>>> + release_firmware(firmware_p);
>>>>> +
>>>>> + return ret;
>>>>> +}
>>>>> +
>>>>> /**
>>>>> * rproc_trigger_recovery() - recover a remoteproc
>>>>> * @rproc: the remote processor
>>>>> @@ -1898,7 +1937,6 @@ static int __rproc_detach(struct rproc *rproc)
>>>>> */
>>>>> int rproc_trigger_recovery(struct rproc *rproc) {
>>>>> - const struct firmware *firmware_p;
>>>>> struct device *dev = &rproc->dev;
>>>>> int ret;
>>>>>
>>>>> @@ -1912,24 +1950,10 @@ int rproc_trigger_recovery(struct rproc
>>>>> *rproc)
>>>>>
>>>>> dev_err(dev, "recovering %s\n", rproc->name);
>>>>>
>>>>> - ret = rproc_stop(rproc, true);
>>>>> - if (ret)
>>>>> - goto unlock_mutex;
>>>>> -
>>>>> - /* generate coredump */
>>>>> - rproc->ops->coredump(rproc);
>>>>> -
>>>>> - /* load firmware */
>>>>> - ret = request_firmware(&firmware_p, rproc->firmware, dev);
>>>>> - if (ret < 0) {
>>>>> - dev_err(dev, "request_firmware failed: %d\n", ret);
>>>>> - goto unlock_mutex;
>>>>> - }
>>>>> -
>>>>> - /* boot the remote processor up again */
>>>>> - ret = rproc_start(rproc, firmware_p);
>>>>> -
>>>>> - release_firmware(firmware_p);
>>>>> + if (rproc_has_feature(rproc, RPROC_FEAT_ATTACH_ON_RECOVERY))
>>>>> + ret = rproc_attach_recovery(rproc);
>>>>> + else
>>>>> + ret = rproc_boot_recovery(rproc);
>>>>>
>>>>> unlock_mutex:
>>>>> mutex_unlock(&rproc->lock);
>>>>> --
>>>>> 2.25.1
>>>>>