LinuxLists.cc - [PATCH] remoteproc: block premature rproc booting

2012-05-22 11:51:27

Subject: [PATCH] remoteproc: block premature rproc booting

When an rproc instance is registered, remoteproc asynchronously
loads its firmware in order to tell which vdevs it supports.

Later on those vdevs are registered, and when probed, their drivers
usually trigger powering on of the remote processor.

OTOH, non-standard scenarios might involve early invocation of
rproc_boot even before the asynchronous fw loading has completed.

We're not sure we really want to support those scenarios, but right
now we do (e.g. via rproc_get_by_name), so let's simply fix this race
by blocking those premature rproc_boot() flows until the async fw
loading is completed.

Reported-and-tested-by: Sjur Brandeland <[email protected]>
Signed-off-by: Ohad Ben-Cohen <[email protected]>
---
drivers/remoteproc/remoteproc_core.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 40e2b2d..464ea4f 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1141,6 +1141,18 @@ int rproc_boot(struct rproc *rproc)

dev = rproc->dev;

+ /*
+ * if asynchronoush fw loading is underway, wait up to 65 secs
+ * (just a bit more than the firmware request's timeout)
+ */
+ ret = wait_for_completion_interruptible_timeout(
+ &rproc->firmware_loading_complete,
+ msecs_to_jiffies(65000));
+ if (ret <= 0) {
+ dev_err(dev, "async fw loading isn't complete: %d\n", ret);
+ return ret ? ret : -ETIMEDOUT;
+ }
+
ret = mutex_lock_interruptible(&rproc->lock);
if (ret) {
dev_err(dev, "can't lock rproc %s: %d\n", rproc->name, ret);
--
1.7.5.4

2012-05-24 09:15:57

by Stephen Boyd

[permalink] [raw]

Subject: Re: [PATCH] remoteproc: block premature rproc booting

On 5/22/2012 4:51 AM, Ohad Ben-Cohen wrote:
> When an rproc instance is registered, remoteproc asynchronously
> loads its firmware in order to tell which vdevs it supports.
>
> Later on those vdevs are registered, and when probed, their drivers
> usually trigger powering on of the remote processor.
>
> OTOH, non-standard scenarios might involve early invocation of
> rproc_boot even before the asynchronous fw loading has completed.
>
> We're not sure we really want to support those scenarios, but right
> now we do (e.g. via rproc_get_by_name), so let's simply fix this race
> by blocking those premature rproc_boot() flows until the async fw
> loading is completed.
>
> Reported-and-tested-by: Sjur Brandeland <[email protected]>
> Signed-off-by: Ohad Ben-Cohen <[email protected]>
> ---
> drivers/remoteproc/remoteproc_core.c | 12 ++++++++++++
> 1 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 40e2b2d..464ea4f 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1141,6 +1141,18 @@ int rproc_boot(struct rproc *rproc)
>
> dev = rproc->dev;
>
> + /*
> + * if asynchronoush fw loading is underway, wait up to 65 secs
> + * (just a bit more than the firmware request's timeout)
> + */
> + ret = wait_for_completion_interruptible_timeout(
> + &rproc->firmware_loading_complete,
> + msecs_to_jiffies(65000));

The request_firmware timeout is defaulted to 60 seconds but not
necessarily 60 if the user has changed the timeout in sysfs.

Why does this need to be a timeout at all? Presumably
request_firmware_nowait() in rproc_register() will timeout and complete
the firmware_loading_complete completion variable. Would it suffice to
have some new rproc->state like RPROC_UNKNOWN that we set in
rproc_register() before adding it to the list of rprocs? If we find the
firmware then we set it to RPROC_READY or RPROC_REGISTERED? Then we
wait_for_completion() and check the state, failing if it's still in the
unknown state.

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2012-05-24 20:11:57

by Ohad Ben Cohen

[permalink] [raw]

Subject: Re: [PATCH] remoteproc: block premature rproc booting

Hi Stephen,

On Thu, May 24, 2012 at 12:15 PM, Stephen Boyd <[email protected]> wrote:
> The request_firmware timeout is defaulted to 60 seconds but not
> necessarily 60 if the user has changed the timeout in sysfs.
..
> Why does this need to be a timeout at all? Presumably
> request_firmware_nowait() in rproc_register() will timeout and complete
> the firmware_loading_complete completion variable.

Very good points, thanks for pointing them out!

> Would it suffice to
> have some new rproc->state like RPROC_UNKNOWN that we set in
> rproc_register() before adding it to the list of rprocs? If we find the
> firmware then we set it to RPROC_READY or RPROC_REGISTERED? Then we
> wait_for_completion() and check the state, failing if it's still in the
> unknown state.

That makes me think - what if we'll add the rproc to the list only
after we find the firmware? This way we avoid this race completely.

Speaking of which, I was wondering whether you guys have some free
cycles to try remoteproc out.

The main reason we kept the get/put interface was to make it easier
for you guys to adopt it, but I've been re-thinking lately whether we
really want that interface. It's a problematic interface with
non-negligible maintenance burden, and the code will be greatly
simplified without it.

Even if you guys won't be adopting virtio (of any other
virtio-over-smd variant) - do you think you'll be able to adopt a
similar model with which remoteproc registers, according to the fw
capabilities, the relevant devices which then get bound to the
relevant higher-level drivers (virtio drivers, smd drivers, etc..) ?
That way these devices can point to the rproc context and we don't
need any get_by_name interface.

Thanks,
Ohad.

2012-06-04 21:23:56

by Stephen Boyd

[permalink] [raw]

Subject: Re: [PATCH] remoteproc: block premature rproc booting

On 05/24/12 13:11, Ohad Ben-Cohen wrote:
>
>> Would it suffice to
>> have some new rproc->state like RPROC_UNKNOWN that we set in
>> rproc_register() before adding it to the list of rprocs? If we find the
>> firmware then we set it to RPROC_READY or RPROC_REGISTERED? Then we
>> wait_for_completion() and check the state, failing if it's still in the
>> unknown state.
> That makes me think - what if we'll add the rproc to the list only
> after we find the firmware? This way we avoid this race completely.

I thought we wanted to allow both calls to proceed in parallel? If we
don't care about that then "announcing" it once the firmware is found
the first time sounds correct.

>
> Speaking of which, I was wondering whether you guys have some free
> cycles to try remoteproc out.

I want to try it out but I've not had enough free time to do so.

>
> The main reason we kept the get/put interface was to make it easier
> for you guys to adopt it, but I've been re-thinking lately whether we
> really want that interface. It's a problematic interface with
> non-negligible maintenance burden, and the code will be greatly
> simplified without it.

If nobody in the kernel is using it why keep it? If MSM needs we can add
it back when we move to rproc.

>
> Even if you guys won't be adopting virtio (of any other
> virtio-over-smd variant) - do you think you'll be able to adopt a
> similar model with which remoteproc registers, according to the fw
> capabilities, the relevant devices which then get bound to the
> relevant higher-level drivers (virtio drivers, smd drivers, etc..) ?
> That way these devices can point to the rproc context and we don't
> need any get_by_name interface.

I think we'll need to have some way to describe the resources in the
kernel when we register the rproc. We aren't going to be able to change
our firmware to have the resource section. It would probably just be one
device (the equivalent of the rpmsg device but for smd channels).
Everything else would build on top of the smd virtio device.

Getting to that point would require changing smd code to be more linux
device model friendly. We're exploring using virtio over smd (basically
virtqueues would replace smd APIs while we replace the vrings with smd
wire protocol). If that works out then we'll be able to attach the smd
virtio device to the remote proc via some in kernel resource list. Then
I imagine when the rproc registers we'll add the device directly. We
need to figure out some way to delay loading of the firmware at that
point. I guess it means we'll need to move both firmware requests to be
async and the first firmware request to get the resource table will be
skipped on MSM.

One sticking point has been the desire to shut down processors when
they're not in use and reclaim the memory. Also we would like to upgrade
the firmware without rebooting the phone. Do you have any plans for
that? It looks like the current design boots anything that is registered
and has a matching rpmsg driver. I suppose one solution is to use
modules but that looks like it disrupts other processors (I don't want
to rmmod all of rpmsg). We probably need some sort of shutdown/boot
mechanism that isn't driven entirely by the client drivers.

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2012-06-05 10:57:41

by Ohad Ben Cohen

[permalink] [raw]

Subject: Re: [PATCH] remoteproc: block premature rproc booting

On Tue, Jun 5, 2012 at 12:23 AM, Stephen Boyd <[email protected]> wrote:
> I thought we wanted to allow both calls to proceed in parallel? If we
> don't care about that

Yeah, I don't think we do.

> then "announcing" it once the firmware is found the first time sounds correct.

I agree. Though this patch may be moot very soon due to:

>> The main reason we kept the get/put interface was to make it easier
>> for you guys to adopt it, but I've been re-thinking lately whether we
>> really want that interface. It's a problematic interface with
>> non-negligible maintenance burden, and the code will be greatly
>> simplified without it.
>
> If nobody in the kernel is using it why keep it?

I was concerned that the non get/put interface might not suit
everyone, and I planned to wait for another user or two to show up
before I remove that interface.

Since MSM's PIL is based on a get/put interface, I actually hoped to
see if you guys can adopt the new interface before we ditch the
get/put one.

> If MSM needs we can add it back when we move to rproc.

Thanks - that's the kind of feedback I wanted to get.

> I think we'll need to have some way to describe the resources in the
> kernel when we register the rproc. We aren't going to be able to change
> our firmware to have the resource section.

What about using a separate file for the resource table ?

That should be very easy to support, and may make life easier for you
in the long term.

Resource tables tend to change in time, and hard coding it in the
kernel doesn't sound ideal (both in terms of development overhead, and
kernel-firmware backward and forward compatibility).

> Getting to that point would require changing smd code to be more linux
> device model friendly. We're exploring using virtio over smd (basically
> virtqueues would replace smd APIs while we replace the vrings with smd
> wire protocol).

That sounds good.

> If that works out then we'll be able to attach the smd
> virtio device to the remote proc via some in kernel resource list.

Please consider using an external file for the resource table. That
should give you more flexibility and less overhead.

> One sticking point has been the desire to shut down processors when
> they're not in use and reclaim the memory. Also we would like to upgrade
> the firmware without rebooting the phone. Do you have any plans for
> that? It looks like the current design boots anything that is registered
> and has a matching rpmsg driver. I suppose one solution is to use
> modules but that looks like it disrupts other processors (I don't want
> to rmmod all of rpmsg). We probably need some sort of shutdown/boot
> mechanism that isn't driven entirely by the client drivers.

Does the below work for you (sans the OMAP terminology ;) ?

root@omap4430-panda:/sys/bus/platform/drivers/omap-rproc# echo
omap-rproc.1 > unbind
[ 471.376556] remoteproc remoteproc0: releasing ipu_c0
root@omap4430-panda:/sys/bus/platform/drivers/omap-rproc# echo
omap-rproc.1 > bind
[ 478.219177] remoteproc remoteproc0: ipu_c0 is available
[ 478.224639] remoteproc remoteproc0: Note: remoteproc is still under
development and considered experimental.
[ 478.235015] remoteproc remoteproc0: THE BINARY FORMAT IS NOT YET
FINALIZED, and backward compatibility isn't yet guaranteed.
[ 478.325347] remoteproc remoteproc0: registered virtio0 (type 7)
[ 478.331848] remoteproc remoteproc0: registered virtio1 (type 3)

This way user space can unbind a specific remote processor (which will
also trigger unbinding the entire device hierarchy below it, i.e. all
rpmsg/virtio devices).

Thanks,
Ohad.

2012-06-06 03:25:07

by Stephen Boyd

[permalink] [raw]

Subject: Re: [PATCH] remoteproc: block premature rproc booting

On 06/05/12 03:57, Ohad Ben-Cohen wrote:
> What about using a separate file for the resource table ?
>
> That should be very easy to support, and may make life easier for you
> in the long term.
>
> Resource tables tend to change in time, and hard coding it in the
> kernel doesn't sound ideal (both in terms of development overhead, and
> kernel-firmware backward and forward compatibility).

Thanks. I'll look into that as that seems feasible.

> Does the below work for you (sans the OMAP terminology ;) ?
>
> root@omap4430-panda:/sys/bus/platform/drivers/omap-rproc# echo
> omap-rproc.1 > unbind
> [ 471.376556] remoteproc remoteproc0: releasing ipu_c0
> root@omap4430-panda:/sys/bus/platform/drivers/omap-rproc# echo
> omap-rproc.1 > bind
> [ 478.219177] remoteproc remoteproc0: ipu_c0 is available
> [ 478.224639] remoteproc remoteproc0: Note: remoteproc is still under
> development and considered experimental.
> [ 478.235015] remoteproc remoteproc0: THE BINARY FORMAT IS NOT YET
> FINALIZED, and backward compatibility isn't yet guaranteed.
> [ 478.325347] remoteproc remoteproc0: registered virtio0 (type 7)
> [ 478.331848] remoteproc remoteproc0: registered virtio1 (type 3)
>
> This way user space can unbind a specific remote processor (which will
> also trigger unbinding the entire device hierarchy below it, i.e. all
> rpmsg/virtio devices).

This is great! I finally see how bind/unbind is useful.

What if I don't want to boot the device at kernel start-up? Do I have to
make it a module then?

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2012-06-06 05:44:40

by Ohad Ben Cohen

[permalink] [raw]

Subject: Re: [PATCH] remoteproc: block premature rproc booting

On Wed, Jun 6, 2012 at 6:25 AM, Stephen Boyd <[email protected]> wrote:
> What if I don't want to boot the device at kernel start-up? Do I have to
> make it a module then?

Currently, yeah.

But we can change stuff if needed.

We took that design decision because it was simple and we didn't have
any reason not to do it, but I'm thinking we can let the users decide
about this using runtime pm (e.g. calling pm_runtime_get_sync with an
rpmsg device, which will then be propagated all the way up to the
remoteproc device and power it up).

2012-06-29 14:56:31

by Ohad Ben Cohen

[permalink] [raw]

Subject: Re: [PATCH] remoteproc: block premature rproc booting

Hello Preetham rao,

On Fri, Jun 29, 2012 at 2:13 PM, Preetham-rao K
<[email protected]> wrote:
> We have some questions ?
>
>
>
> Is remoteproc ?self governor ?the remote processor?
>
> When the time of crash on remote processor, will remoteproc take care of
> loading the firmware?

Yes. the low level driver just has to notify remoteproc that the crash
occurred, and the remoteproc framework will take care of the rest.

This is not upstream yet - but Fernando (cc'ed) already has the
patches and can probably share them with you if you're interested.

Ohad.

> Or user has to drive this using remoteproc API?
>
>
>
> Regards,
>
> Preetham rao