2021-08-05 02:51:41

by Rishabh Bhatnagar

[permalink] [raw]
Subject: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

Mailbox channels for the base protocol are setup during probe.
There can be a scenario where probe fails to acquire the base
protocol due to a timeout leading to cleaning up of all device
managed memory including the scmi_mailbox structure setup during
mailbox_chan_setup function.
[ 12.735104]arm-scmi soc:qcom,scmi: timed out in resp(caller: version_get+0x84/0x140)
[ 12.735224]arm-scmi soc:qcom,scmi: unable to communicate with SCMI
[ 12.735947]arm-scmi: probe of soc:qcom,scmi failed with error -110

Now when a message arrives at cpu slightly after the timeout, the mailbox
controller will try to call the rx_callback of the client and might end
up accessing freed memory.
[ 12.758363][ C0] Call trace:
[ 12.758367][ C0] rx_callback+0x24/0x160
[ 12.758372][ C0] mbox_chan_received_data+0x44/0x94
[ 12.758386][ C0] __handle_irq_event_percpu+0xd4/0x240
This patch frees the mailbox channels setup during probe and adds some more
error handling in case the probe fails.

Signed-off-by: Rishabh Bhatnagar <[email protected]>
---
drivers/firmware/arm_scmi/driver.c | 35 ++++++++++++++++++++++++-----------
1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index 9b2e8d4..ead3bd3 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -1390,6 +1390,21 @@ void scmi_protocol_device_unrequest(const struct scmi_device_id *id_table)
mutex_unlock(&scmi_requested_devices_mtx);
}

+static int scmi_cleanup_txrx_channels(struct scmi_info *info)
+{
+ int ret;
+ struct idr *idr = &info->tx_idr;
+
+ ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
+ idr_destroy(&info->tx_idr);
+
+ idr = &info->rx_idr;
+ ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
+ idr_destroy(&info->rx_idr);
+
+ return ret;
+}
+
static int scmi_probe(struct platform_device *pdev)
{
int ret;
@@ -1430,7 +1445,7 @@ static int scmi_probe(struct platform_device *pdev)

ret = scmi_xfer_info_init(info);
if (ret)
- return ret;
+ goto clear_txrx_setup;

if (scmi_notification_init(handle))
dev_err(dev, "SCMI Notifications NOT available.\n");
@@ -1443,7 +1458,7 @@ static int scmi_probe(struct platform_device *pdev)
ret = scmi_protocol_acquire(handle, SCMI_PROTOCOL_BASE);
if (ret) {
dev_err(dev, "unable to communicate with SCMI\n");
- return ret;
+ goto notification_exit;
}

mutex_lock(&scmi_list_mutex);
@@ -1482,6 +1497,12 @@ static int scmi_probe(struct platform_device *pdev)
}

return 0;
+
+notification_exit:
+ scmi_notification_exit(&info->handle);
+clear_txrx_setup:
+ scmi_cleanup_txrx_channels(info);
+ return ret;
}

void scmi_free_channel(struct scmi_chan_info *cinfo, struct idr *idr, int id)
@@ -1493,7 +1514,6 @@ static int scmi_remove(struct platform_device *pdev)
{
int ret = 0, id;
struct scmi_info *info = platform_get_drvdata(pdev);
- struct idr *idr = &info->tx_idr;
struct device_node *child;

mutex_lock(&scmi_list_mutex);
@@ -1517,14 +1537,7 @@ static int scmi_remove(struct platform_device *pdev)
idr_destroy(&info->active_protocols);

/* Safe to free channels since no more users */
- ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
- idr_destroy(&info->tx_idr);
-
- idr = &info->rx_idr;
- ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
- idr_destroy(&info->rx_idr);
-
- return ret;
+ return scmi_cleanup_txrx_channels(info);
}

static ssize_t protocol_version_show(struct device *dev,
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


2021-08-05 13:17:35

by Cristian Marussi

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

On Wed, Aug 04, 2021 at 02:19:59PM -0700, Rishabh Bhatnagar wrote:
> Mailbox channels for the base protocol are setup during probe.
> There can be a scenario where probe fails to acquire the base
> protocol due to a timeout leading to cleaning up of all device
> managed memory including the scmi_mailbox structure setup during
> mailbox_chan_setup function.
> [ 12.735104]arm-scmi soc:qcom,scmi: timed out in resp(caller: version_get+0x84/0x140)
> [ 12.735224]arm-scmi soc:qcom,scmi: unable to communicate with SCMI
> [ 12.735947]arm-scmi: probe of soc:qcom,scmi failed with error -110
>
> Now when a message arrives at cpu slightly after the timeout, the mailbox
> controller will try to call the rx_callback of the client and might end
> up accessing freed memory.
> [ 12.758363][ C0] Call trace:
> [ 12.758367][ C0] rx_callback+0x24/0x160
> [ 12.758372][ C0] mbox_chan_received_data+0x44/0x94
> [ 12.758386][ C0] __handle_irq_event_percpu+0xd4/0x240
> This patch frees the mailbox channels setup during probe and adds some more
> error handling in case the probe fails.
>
> Signed-off-by: Rishabh Bhatnagar <[email protected]>
> ---
> drivers/firmware/arm_scmi/driver.c | 35 ++++++++++++++++++++++++-----------
> 1 file changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
> index 9b2e8d4..ead3bd3 100644
> --- a/drivers/firmware/arm_scmi/driver.c
> +++ b/drivers/firmware/arm_scmi/driver.c
> @@ -1390,6 +1390,21 @@ void scmi_protocol_device_unrequest(const struct scmi_device_id *id_table)
> mutex_unlock(&scmi_requested_devices_mtx);
> }
>

Hi,

> +static int scmi_cleanup_txrx_channels(struct scmi_info *info)
> +{
> + int ret;
> + struct idr *idr = &info->tx_idr;
> +
> + ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
> + idr_destroy(&info->tx_idr);
> +
> + idr = &info->rx_idr;
> + ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
> + idr_destroy(&info->rx_idr);
> +
> + return ret;
> +}
> +
> static int scmi_probe(struct platform_device *pdev)
> {
> int ret;
> @@ -1430,7 +1445,7 @@ static int scmi_probe(struct platform_device *pdev)
>
> ret = scmi_xfer_info_init(info);
> if (ret)
> - return ret;
> + goto clear_txrx_setup;
>
> if (scmi_notification_init(handle))
> dev_err(dev, "SCMI Notifications NOT available.\n");
> @@ -1443,7 +1458,7 @@ static int scmi_probe(struct platform_device *pdev)
> ret = scmi_protocol_acquire(handle, SCMI_PROTOCOL_BASE);
> if (ret) {
> dev_err(dev, "unable to communicate with SCMI\n");
> - return ret;
> + goto notification_exit;
> }
>
> mutex_lock(&scmi_list_mutex);
> @@ -1482,6 +1497,12 @@ static int scmi_probe(struct platform_device *pdev)
> }
>
> return 0;
> +
> +notification_exit:
> + scmi_notification_exit(&info->handle);
> +clear_txrx_setup:
> + scmi_cleanup_txrx_channels(info);
> + return ret;
> }
>
> void scmi_free_channel(struct scmi_chan_info *cinfo, struct idr *idr, int id)
> @@ -1493,7 +1514,6 @@ static int scmi_remove(struct platform_device *pdev)
> {
> int ret = 0, id;
> struct scmi_info *info = platform_get_drvdata(pdev);
> - struct idr *idr = &info->tx_idr;
> struct device_node *child;
>
> mutex_lock(&scmi_list_mutex);
> @@ -1517,14 +1537,7 @@ static int scmi_remove(struct platform_device *pdev)
> idr_destroy(&info->active_protocols);
>
> /* Safe to free channels since no more users */
> - ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
> - idr_destroy(&info->tx_idr);
> -
> - idr = &info->rx_idr;
> - ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
> - idr_destroy(&info->rx_idr);
> -
> - return ret;
> + return scmi_cleanup_txrx_channels(info);
> }
>

Looks good to me.

Reviewed-by: Cristian Marussi <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
(Re-tested on for-next/scmi on top of virtio-scmi series)

Thanks,
Cristian

2021-08-09 05:10:28

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

On Wed, 4 Aug 2021 14:19:59 -0700, Rishabh Bhatnagar wrote:
> Mailbox channels for the base protocol are setup during probe.
> There can be a scenario where probe fails to acquire the base
> protocol due to a timeout leading to cleaning up of all device
> managed memory including the scmi_mailbox structure setup during
> mailbox_chan_setup function.
> [ 12.735104]arm-scmi soc:qcom,scmi: timed out in resp(caller: version_get+0x84/0x140)
> [ 12.735224]arm-scmi soc:qcom,scmi: unable to communicate with SCMI
> [ 12.735947]arm-scmi: probe of soc:qcom,scmi failed with error -110
>
> [...]

Applied to sudeep.holla/linux (for-next/scmi), thanks!

[1/1] firmware: arm_scmi: Free mailbox channels if probe fails
https://git.kernel.org/sudeep.holla/c/1e7cbfaa66

--
Regards,
Sudeep

2021-08-30 21:11:49

by Rishabh Bhatnagar

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

Hi Christian
There seems to be another issue here. The response from agent can be
delayed causing a timeout during base protocol acquire,
which leads to the probe failure. What I have observed is sometimes the
failure of probe and rx_callback (due to a delayed message)
happens at the same time on different cpus.
Because of this race, the device memory may be cleared while the
interrupt(rx_callback) is executing on another cpu.
How do you propose we solve this? Do you think it is better to take the
setting up of base and other protocols out of probe and
in some delayed work? That would imply the device memory is not released
until remove is called. Or should we add locking to
the interrupt handler(scmi_rx_callback) and the cleanup in probe to
avoid the race?

On 2021-08-05 03:54, Cristian Marussi wrote:
> On Wed, Aug 04, 2021 at 02:19:59PM -0700, Rishabh Bhatnagar wrote:
>> Mailbox channels for the base protocol are setup during probe.
>> There can be a scenario where probe fails to acquire the base
>> protocol due to a timeout leading to cleaning up of all device
>> managed memory including the scmi_mailbox structure setup during
>> mailbox_chan_setup function.
>> [ 12.735104]arm-scmi soc:qcom,scmi: timed out in resp(caller:
>> version_get+0x84/0x140)
>> [ 12.735224]arm-scmi soc:qcom,scmi: unable to communicate with SCMI
>> [ 12.735947]arm-scmi: probe of soc:qcom,scmi failed with error -110
>>
>> Now when a message arrives at cpu slightly after the timeout, the
>> mailbox
>> controller will try to call the rx_callback of the client and might
>> end
>> up accessing freed memory.
>> [ 12.758363][ C0] Call trace:
>> [ 12.758367][ C0] rx_callback+0x24/0x160
>> [ 12.758372][ C0] mbox_chan_received_data+0x44/0x94
>> [ 12.758386][ C0] __handle_irq_event_percpu+0xd4/0x240
>> This patch frees the mailbox channels setup during probe and adds some
>> more
>> error handling in case the probe fails.
>>
>> Signed-off-by: Rishabh Bhatnagar <[email protected]>
>> ---
>> drivers/firmware/arm_scmi/driver.c | 35
>> ++++++++++++++++++++++++-----------
>> 1 file changed, 24 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/firmware/arm_scmi/driver.c
>> b/drivers/firmware/arm_scmi/driver.c
>> index 9b2e8d4..ead3bd3 100644
>> --- a/drivers/firmware/arm_scmi/driver.c
>> +++ b/drivers/firmware/arm_scmi/driver.c
>> @@ -1390,6 +1390,21 @@ void scmi_protocol_device_unrequest(const
>> struct scmi_device_id *id_table)
>> mutex_unlock(&scmi_requested_devices_mtx);
>> }
>>
>
> Hi,
>
>> +static int scmi_cleanup_txrx_channels(struct scmi_info *info)
>> +{
>> + int ret;
>> + struct idr *idr = &info->tx_idr;
>> +
>> + ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> + idr_destroy(&info->tx_idr);
>> +
>> + idr = &info->rx_idr;
>> + ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> + idr_destroy(&info->rx_idr);
>> +
>> + return ret;
>> +}
>> +
>> static int scmi_probe(struct platform_device *pdev)
>> {
>> int ret;
>> @@ -1430,7 +1445,7 @@ static int scmi_probe(struct platform_device
>> *pdev)
>>
>> ret = scmi_xfer_info_init(info);
>> if (ret)
>> - return ret;
>> + goto clear_txrx_setup;
>>
>> if (scmi_notification_init(handle))
>> dev_err(dev, "SCMI Notifications NOT available.\n");
>> @@ -1443,7 +1458,7 @@ static int scmi_probe(struct platform_device
>> *pdev)
>> ret = scmi_protocol_acquire(handle, SCMI_PROTOCOL_BASE);
>> if (ret) {
>> dev_err(dev, "unable to communicate with SCMI\n");
>> - return ret;
>> + goto notification_exit;
>> }
>>
>> mutex_lock(&scmi_list_mutex);
>> @@ -1482,6 +1497,12 @@ static int scmi_probe(struct platform_device
>> *pdev)
>> }
>>
>> return 0;
>> +
>> +notification_exit:
>> + scmi_notification_exit(&info->handle);
>> +clear_txrx_setup:
>> + scmi_cleanup_txrx_channels(info);
>> + return ret;
>> }
>>
>> void scmi_free_channel(struct scmi_chan_info *cinfo, struct idr *idr,
>> int id)
>> @@ -1493,7 +1514,6 @@ static int scmi_remove(struct platform_device
>> *pdev)
>> {
>> int ret = 0, id;
>> struct scmi_info *info = platform_get_drvdata(pdev);
>> - struct idr *idr = &info->tx_idr;
>> struct device_node *child;
>>
>> mutex_lock(&scmi_list_mutex);
>> @@ -1517,14 +1537,7 @@ static int scmi_remove(struct platform_device
>> *pdev)
>> idr_destroy(&info->active_protocols);
>>
>> /* Safe to free channels since no more users */
>> - ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> - idr_destroy(&info->tx_idr);
>> -
>> - idr = &info->rx_idr;
>> - ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> - idr_destroy(&info->rx_idr);
>> -
>> - return ret;
>> + return scmi_cleanup_txrx_channels(info);
>> }
>>
>
> Looks good to me.
>
> Reviewed-by: Cristian Marussi <[email protected]>
> Tested-by: Cristian Marussi <[email protected]>
> (Re-tested on for-next/scmi on top of virtio-scmi series)
>
> Thanks,
> Cristian

2021-08-31 05:51:57

by Cristian Marussi

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

On Mon, Aug 30, 2021 at 02:09:37PM -0700, [email protected] wrote:
> Hi Christian

Hi Rishabh,

thanks for looking into this kind of bad interactions.

> There seems to be another issue here. The response from agent can be delayed
> causing a timeout during base protocol acquire,
> which leads to the probe failure. What I have observed is sometimes the
> failure of probe and rx_callback (due to a delayed message)
> happens at the same time on different cpus.
> Because of this race, the device memory may be cleared while the
> interrupt(rx_callback) is executing on another cpu.

You are right that concurrency was not handled properly in this kind of
context and moreover, if you think about it, even the case of out of
order reception of responses and delayed_responses (type2 SCMI messages)
for asynchronous SCMI commands was not handled properly.

> How do you propose we solve this? Do you think it is better to take the
> setting up of base and other protocols out of probe and
> in some delayed work? That would imply the device memory is not released
> until remove is called. Or should we add locking to
> the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
> the race?
>

These issues were more easily exposed by SCMI Virtio transport, so in
the series where I introduced scmi-virtio:

https://lore.kernel.org/linux-arm-kernel/[email protected]/

(which is now queued for v5.15 ... now on -next I think...finger crossed)

I took the chance to rectify a couple of other things in the SCMI core
in the initial commits.
As an example, in the above series

[PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and out-of-order messages

cares to add a refcount to xfers and some locking on xfers between TX
and RX path to avoid that a timed out xfer can vanish while the rx path
is concurrently working on it (as you said); moreover I handle the
condition (rare if not unplausible anyway) in which a transport delivers
out of order responses and delayed responses.

I tested this scenarios on some fake emulated SCMI Virtio transport
where I could play any sort of mess and tricks to stress this limit
conditions, but you're more than welcome to verify if the race you are
seeing on Base protocol time out is solved (as I would hope :D) by this
series of mine.

Let me know, any feedback is welcome.

Btw, in the series above there are also other minor changes, but there
is also another more radical change needed to ensure correctness and
protection against stale old messages which maybe could interest you
in general if you are looking into SCMI:

[PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically increasing tokens

Let me know if yo have other concerns.

Thanks
Cristian

2021-09-01 09:37:46

by Cristian Marussi

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote:
> On Mon, Aug 30, 2021 at 02:09:37PM -0700, [email protected] wrote:
> > Hi Christian
>
> Hi Rishabh,
>
> thanks for looking into this kind of bad interactions.
>
> > There seems to be another issue here. The response from agent can be delayed
> > causing a timeout during base protocol acquire,
> > which leads to the probe failure. What I have observed is sometimes the
> > failure of probe and rx_callback (due to a delayed message)
> > happens at the same time on different cpus.
> > Because of this race, the device memory may be cleared while the
> > interrupt(rx_callback) is executing on another cpu.
>
> You are right that concurrency was not handled properly in this kind of
> context and moreover, if you think about it, even the case of out of
> order reception of responses and delayed_responses (type2 SCMI messages)
> for asynchronous SCMI commands was not handled properly.
>
> > How do you propose we solve this? Do you think it is better to take the
> > setting up of base and other protocols out of probe and
> > in some delayed work? That would imply the device memory is not released
> > until remove is called. Or should we add locking to
> > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
> > the race?
> >
>
> These issues were more easily exposed by SCMI Virtio transport, so in
> the series where I introduced scmi-virtio:
>
> https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> (which is now queued for v5.15 ... now on -next I think...finger crossed)
>
> I took the chance to rectify a couple of other things in the SCMI core
> in the initial commits.
> As an example, in the above series
>
> [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and out-of-order messages
>
> cares to add a refcount to xfers and some locking on xfers between TX
> and RX path to avoid that a timed out xfer can vanish while the rx path
> is concurrently working on it (as you said); moreover I handle the
> condition (rare if not unplausible anyway) in which a transport delivers
> out of order responses and delayed responses.
>
> I tested this scenarios on some fake emulated SCMI Virtio transport
> where I could play any sort of mess and tricks to stress this limit
> conditions, but you're more than welcome to verify if the race you are
> seeing on Base protocol time out is solved (as I would hope :D) by this
> series of mine.
>
> Let me know, any feedback is welcome.
>
> Btw, in the series above there are also other minor changes, but there
> is also another more radical change needed to ensure correctness and
> protection against stale old messages which maybe could interest you
> in general if you are looking into SCMI:
>
> [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically increasing tokens
>
> Let me know if yo have other concerns.
>

Hi Rishabhb,

just a quick remark, thinking again about your fail @probe scenario above
I realized that while the concurrency patch I mentioned above could help on
races against vanishing xfers when late timed-out responses are delivered,
here we really are then also shutting down everything on failure, so there
could be further issues between a very late invokation of scmi_rx_callback
and the core devm_ helpers freeing the underlying xfer/cinfo/etc.. structs
used by scmi-rx-callback itself (maybe this was already what you meant and
I didn't get it,...sorry)

On the other side, I don't feel that delaying Base init to a deferred
worker is a viable solution since we need Base protocol init to be
initialized and we need to just give up if we cannot communicate with
the SCMI platform fw in such early stages. (Base protocol is really the
only mandatory proto is I remember correctly the spec)

Currenly I'm off and only glancing at mails but I'll have a thought about
these issues once back in a few weeks time.

Thanks,
Cristian

2021-11-01 16:38:58

by Rishabh Bhatnagar

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

On 2021-09-01 02:35, Cristian Marussi wrote:
> On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote:
>> On Mon, Aug 30, 2021 at 02:09:37PM -0700, [email protected]
>> wrote:
>> > Hi Christian
>>
>> Hi Rishabh,
>>
>> thanks for looking into this kind of bad interactions.
>>
>> > There seems to be another issue here. The response from agent can be delayed
>> > causing a timeout during base protocol acquire,
>> > which leads to the probe failure. What I have observed is sometimes the
>> > failure of probe and rx_callback (due to a delayed message)
>> > happens at the same time on different cpus.
>> > Because of this race, the device memory may be cleared while the
>> > interrupt(rx_callback) is executing on another cpu.
>>
>> You are right that concurrency was not handled properly in this kind
>> of
>> context and moreover, if you think about it, even the case of out of
>> order reception of responses and delayed_responses (type2 SCMI
>> messages)
>> for asynchronous SCMI commands was not handled properly.
>>
>> > How do you propose we solve this? Do you think it is better to take the
>> > setting up of base and other protocols out of probe and
>> > in some delayed work? That would imply the device memory is not released
>> > until remove is called. Or should we add locking to
>> > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
>> > the race?
>> >
>>
>> These issues were more easily exposed by SCMI Virtio transport, so in
>> the series where I introduced scmi-virtio:
>>
>> https://lore.kernel.org/linux-arm-kernel/[email protected]/
>>
>> (which is now queued for v5.15 ... now on -next I think...finger
>> crossed)
>>
>> I took the chance to rectify a couple of other things in the SCMI core
>> in the initial commits.
>> As an example, in the above series
>>
>> [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and
>> out-of-order messages
>>
>> cares to add a refcount to xfers and some locking on xfers between TX
>> and RX path to avoid that a timed out xfer can vanish while the rx
>> path
>> is concurrently working on it (as you said); moreover I handle the
>> condition (rare if not unplausible anyway) in which a transport
>> delivers
>> out of order responses and delayed responses.
>>
>> I tested this scenarios on some fake emulated SCMI Virtio transport
>> where I could play any sort of mess and tricks to stress this limit
>> conditions, but you're more than welcome to verify if the race you are
>> seeing on Base protocol time out is solved (as I would hope :D) by
>> this
>> series of mine.
>>
>> Let me know, any feedback is welcome.
>>
>> Btw, in the series above there are also other minor changes, but there
>> is also another more radical change needed to ensure correctness and
>> protection against stale old messages which maybe could interest you
>> in general if you are looking into SCMI:
>>
>> [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically
>> increasing tokens
>>
>> Let me know if yo have other concerns.
>>
>
> Hi Rishabhb,
>
> just a quick remark, thinking again about your fail @probe scenario
> above
> I realized that while the concurrency patch I mentioned above could
> help on
> races against vanishing xfers when late timed-out responses are
> delivered,
> here we really are then also shutting down everything on failure, so
> there
> could be further issues between a very late invokation of
> scmi_rx_callback
> and the core devm_ helpers freeing the underlying xfer/cinfo/etc..
> structs
> used by scmi-rx-callback itself (maybe this was already what you meant
> and
> I didn't get it,...sorry)
>
> On the other side, I don't feel that delaying Base init to a deferred
> worker is a viable solution since we need Base protocol init to be
> initialized and we need to just give up if we cannot communicate with
> the SCMI platform fw in such early stages. (Base protocol is really the
> only mandatory proto is I remember correctly the spec)
>
> Currenly I'm off and only glancing at mails but I'll have a thought
> about
> these issues once back in a few weeks time.
>
> Thanks,
> Cristian
>
Hi Cristian
I hope you enjoyed your vacation. Did you get a chance to look at the
issue stated above
and have some idea as to how to solve this?
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

2021-11-02 11:34:56

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

On Mon, Nov 01, 2021 at 09:35:42AM -0700, [email protected] wrote:
> On 2021-09-01 02:35, Cristian Marussi wrote:
> > On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote:
> > > On Mon, Aug 30, 2021 at 02:09:37PM -0700, [email protected]
> > > wrote:
> > > > Hi Christian
> > >
> > > Hi Rishabh,
> > >
> > > thanks for looking into this kind of bad interactions.
> > >
> > > > There seems to be another issue here. The response from agent can be delayed
> > > > causing a timeout during base protocol acquire,
> > > > which leads to the probe failure. What I have observed is sometimes the
> > > > failure of probe and rx_callback (due to a delayed message)
> > > > happens at the same time on different cpus.
> > > > Because of this race, the device memory may be cleared while the
> > > > interrupt(rx_callback) is executing on another cpu.
> > >
> > > You are right that concurrency was not handled properly in this kind
> > > of
> > > context and moreover, if you think about it, even the case of out of
> > > order reception of responses and delayed_responses (type2 SCMI
> > > messages)
> > > for asynchronous SCMI commands was not handled properly.
> > >
> > > > How do you propose we solve this? Do you think it is better to take the
> > > > setting up of base and other protocols out of probe and
> > > > in some delayed work? That would imply the device memory is not released
> > > > until remove is called. Or should we add locking to
> > > > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
> > > > the race?
> > > >
> > >
> > > These issues were more easily exposed by SCMI Virtio transport, so in
> > > the series where I introduced scmi-virtio:
> > >
> > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > >
> > > (which is now queued for v5.15 ... now on -next I think...finger
> > > crossed)
> > >
> > > I took the chance to rectify a couple of other things in the SCMI core
> > > in the initial commits.
> > > As an example, in the above series
> > >
> > > [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and
> > > out-of-order messages
> > >
> > > cares to add a refcount to xfers and some locking on xfers between TX
> > > and RX path to avoid that a timed out xfer can vanish while the rx
> > > path
> > > is concurrently working on it (as you said); moreover I handle the
> > > condition (rare if not unplausible anyway) in which a transport
> > > delivers
> > > out of order responses and delayed responses.
> > >
> > > I tested this scenarios on some fake emulated SCMI Virtio transport
> > > where I could play any sort of mess and tricks to stress this limit
> > > conditions, but you're more than welcome to verify if the race you are
> > > seeing on Base protocol time out is solved (as I would hope :D) by
> > > this
> > > series of mine.
> > >
> > > Let me know, any feedback is welcome.
> > >
> > > Btw, in the series above there are also other minor changes, but there
> > > is also another more radical change needed to ensure correctness and
> > > protection against stale old messages which maybe could interest you
> > > in general if you are looking into SCMI:
> > >
> > > [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically
> > > increasing tokens
> > >
> > > Let me know if yo have other concerns.
> > >
> >
> > Hi Rishabhb,
> >
> > just a quick remark, thinking again about your fail @probe scenario
> > above
> > I realized that while the concurrency patch I mentioned above could help
> > on
> > races against vanishing xfers when late timed-out responses are
> > delivered,
> > here we really are then also shutting down everything on failure, so
> > there
> > could be further issues between a very late invokation of
> > scmi_rx_callback
> > and the core devm_ helpers freeing the underlying xfer/cinfo/etc..
> > structs
> > used by scmi-rx-callback itself (maybe this was already what you meant
> > and
> > I didn't get it,...sorry)
> >
> > On the other side, I don't feel that delaying Base init to a deferred
> > worker is a viable solution since we need Base protocol init to be
> > initialized and we need to just give up if we cannot communicate with
> > the SCMI platform fw in such early stages. (Base protocol is really the
> > only mandatory proto is I remember correctly the spec)
> >
> > Currenly I'm off and only glancing at mails but I'll have a thought
> > about
> > these issues once back in a few weeks time.
> >
> > Thanks,
> > Cristian
> >
> Hi Cristian
> I hope you enjoyed your vacation. Did you get a chance to look at the issue
> stated above and have some idea as to how to solve this?

Do you still see the issue with v5.15 ? Can you please check if haven't
already done that ?

Also 30ms delay we have is huge IMO and we typically expect the communication
with remote processor or any entity that implements SCMI to happen in terms
of one or few ms tops.

If there is a race, we need to fix that but I am interested in knowing
why the default time of 30ms not sufficient ? Did increasing that helps
and is this timeout happening only for the initial commands(guessing the
SCMI firmware is not yet ready) or does it happen even during run-time ?

--
Regards,
Sudeep

2021-11-04 23:43:03

by Rishabh Bhatnagar

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

On 2021-11-02 04:32, Sudeep Holla wrote:
> On Mon, Nov 01, 2021 at 09:35:42AM -0700, [email protected]
> wrote:
>> On 2021-09-01 02:35, Cristian Marussi wrote:
>> > On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote:
>> > > On Mon, Aug 30, 2021 at 02:09:37PM -0700, [email protected]
>> > > wrote:
>> > > > Hi Christian
>> > >
>> > > Hi Rishabh,
>> > >
>> > > thanks for looking into this kind of bad interactions.
>> > >
>> > > > There seems to be another issue here. The response from agent can be delayed
>> > > > causing a timeout during base protocol acquire,
>> > > > which leads to the probe failure. What I have observed is sometimes the
>> > > > failure of probe and rx_callback (due to a delayed message)
>> > > > happens at the same time on different cpus.
>> > > > Because of this race, the device memory may be cleared while the
>> > > > interrupt(rx_callback) is executing on another cpu.
>> > >
>> > > You are right that concurrency was not handled properly in this kind
>> > > of
>> > > context and moreover, if you think about it, even the case of out of
>> > > order reception of responses and delayed_responses (type2 SCMI
>> > > messages)
>> > > for asynchronous SCMI commands was not handled properly.
>> > >
>> > > > How do you propose we solve this? Do you think it is better to take the
>> > > > setting up of base and other protocols out of probe and
>> > > > in some delayed work? That would imply the device memory is not released
>> > > > until remove is called. Or should we add locking to
>> > > > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
>> > > > the race?
>> > > >
>> > >
>> > > These issues were more easily exposed by SCMI Virtio transport, so in
>> > > the series where I introduced scmi-virtio:
>> > >
>> > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
>> > >
>> > > (which is now queued for v5.15 ... now on -next I think...finger
>> > > crossed)
>> > >
>> > > I took the chance to rectify a couple of other things in the SCMI core
>> > > in the initial commits.
>> > > As an example, in the above series
>> > >
>> > > [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and
>> > > out-of-order messages
>> > >
>> > > cares to add a refcount to xfers and some locking on xfers between TX
>> > > and RX path to avoid that a timed out xfer can vanish while the rx
>> > > path
>> > > is concurrently working on it (as you said); moreover I handle the
>> > > condition (rare if not unplausible anyway) in which a transport
>> > > delivers
>> > > out of order responses and delayed responses.
>> > >
>> > > I tested this scenarios on some fake emulated SCMI Virtio transport
>> > > where I could play any sort of mess and tricks to stress this limit
>> > > conditions, but you're more than welcome to verify if the race you are
>> > > seeing on Base protocol time out is solved (as I would hope :D) by
>> > > this
>> > > series of mine.
>> > >
>> > > Let me know, any feedback is welcome.
>> > >
>> > > Btw, in the series above there are also other minor changes, but there
>> > > is also another more radical change needed to ensure correctness and
>> > > protection against stale old messages which maybe could interest you
>> > > in general if you are looking into SCMI:
>> > >
>> > > [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically
>> > > increasing tokens
>> > >
>> > > Let me know if yo have other concerns.
>> > >
>> >
>> > Hi Rishabhb,
>> >
>> > just a quick remark, thinking again about your fail @probe scenario
>> > above
>> > I realized that while the concurrency patch I mentioned above could help
>> > on
>> > races against vanishing xfers when late timed-out responses are
>> > delivered,
>> > here we really are then also shutting down everything on failure, so
>> > there
>> > could be further issues between a very late invokation of
>> > scmi_rx_callback
>> > and the core devm_ helpers freeing the underlying xfer/cinfo/etc..
>> > structs
>> > used by scmi-rx-callback itself (maybe this was already what you meant
>> > and
>> > I didn't get it,...sorry)
>> >
>> > On the other side, I don't feel that delaying Base init to a deferred
>> > worker is a viable solution since we need Base protocol init to be
>> > initialized and we need to just give up if we cannot communicate with
>> > the SCMI platform fw in such early stages. (Base protocol is really the
>> > only mandatory proto is I remember correctly the spec)
>> >
>> > Currenly I'm off and only glancing at mails but I'll have a thought
>> > about
>> > these issues once back in a few weeks time.
>> >
>> > Thanks,
>> > Cristian
>> >
>> Hi Cristian
>> I hope you enjoyed your vacation. Did you get a chance to look at the
>> issue
>> stated above and have some idea as to how to solve this?
>
> Do you still see the issue with v5.15 ? Can you please check if haven't
> already done that ?
>
> Also 30ms delay we have is huge IMO and we typically expect the
> communication
> with remote processor or any entity that implements SCMI to happen in
> terms
> of one or few ms tops.
>
> If there is a race, we need to fix that but I am interested in knowing
> why the default time of 30ms not sufficient ? Did increasing that helps
> and is this timeout happening only for the initial commands(guessing
> the
> SCMI firmware is not yet ready) or does it happen even during run-time
> ?

Hi Sudeep
I haven't checked on 5.15 but after glancing at the code I believe we
should see the same issue.
I agree 30ms is a big enough value and should be something that remote
firmware should resolve. But
if remote firmware goes into a bad state and not functioning properly at
least kernel should not panic.

The issue we see here happens during scmi probe. The response from the
remote agent can be delayed
causing a timeout during base protocol acquire, which leads to the probe
failure.
What I have observed is sometimes the failure of probe and rx_callback
(due to a delayed message)
happens around the same time on different cpus. Because of this race,
the device memory may be cleared
while the interrupt(rx_callback) is executing on another cpu.

2021-11-05 09:45:01

by Cristian Marussi

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

On Thu, Nov 04, 2021 at 04:40:03PM -0700, [email protected] wrote:
> On 2021-11-02 04:32, Sudeep Holla wrote:
> > On Mon, Nov 01, 2021 at 09:35:42AM -0700, [email protected] wrote:
> > > On 2021-09-01 02:35, Cristian Marussi wrote:
> > > > On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote:
> > > > > On Mon, Aug 30, 2021 at 02:09:37PM -0700, [email protected]
> > > > > wrote:
> > > > > > Hi Christian
> > > > >
> > > > > Hi Rishabh,
> > > > >

Hi Rishabh,

apologies for the delay in coming back to you.
A few comments below.

> > > > > thanks for looking into this kind of bad interactions.
> > > > >
> > > > > > There seems to be another issue here. The response from agent can be delayed
> > > > > > causing a timeout during base protocol acquire,
> > > > > > which leads to the probe failure. What I have observed is sometimes the
> > > > > > failure of probe and rx_callback (due to a delayed message)
> > > > > > happens at the same time on different cpus.
> > > > > > Because of this race, the device memory may be cleared while the
> > > > > > interrupt(rx_callback) is executing on another cpu.
> > > > >
> > > > > You are right that concurrency was not handled properly in this kind
> > > > > of
> > > > > context and moreover, if you think about it, even the case of out of
> > > > > order reception of responses and delayed_responses (type2 SCMI
> > > > > messages)
> > > > > for asynchronous SCMI commands was not handled properly.
> > > > >
> > > > > > How do you propose we solve this? Do you think it is better to take the
> > > > > > setting up of base and other protocols out of probe and
> > > > > > in some delayed work? That would imply the device memory is not released
> > > > > > until remove is called. Or should we add locking to
> > > > > > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
> > > > > > the race?
> > > > > >
> > > > >
> > > > > These issues were more easily exposed by SCMI Virtio transport, so in
> > > > > the series where I introduced scmi-virtio:
> > > > >
> > > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > >
> > > > > (which is now queued for v5.15 ... now on -next I think...finger
> > > > > crossed)
> > > > >
> > > > > I took the chance to rectify a couple of other things in the SCMI core
> > > > > in the initial commits.
> > > > > As an example, in the above series
> > > > >
> > > > > [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and
> > > > > out-of-order messages
> > > > >
> > > > > cares to add a refcount to xfers and some locking on xfers between TX
> > > > > and RX path to avoid that a timed out xfer can vanish while the rx
> > > > > path
> > > > > is concurrently working on it (as you said); moreover I handle the
> > > > > condition (rare if not unplausible anyway) in which a transport
> > > > > delivers
> > > > > out of order responses and delayed responses.
> > > > >
> > > > > I tested this scenarios on some fake emulated SCMI Virtio transport
> > > > > where I could play any sort of mess and tricks to stress this limit
> > > > > conditions, but you're more than welcome to verify if the race you are
> > > > > seeing on Base protocol time out is solved (as I would hope :D) by
> > > > > this
> > > > > series of mine.
> > > > >
> > > > > Let me know, any feedback is welcome.
> > > > >
> > > > > Btw, in the series above there are also other minor changes, but there
> > > > > is also another more radical change needed to ensure correctness and
> > > > > protection against stale old messages which maybe could interest you
> > > > > in general if you are looking into SCMI:
> > > > >
> > > > > [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically
> > > > > increasing tokens
> > > > >
> > > > > Let me know if yo have other concerns.
> > > > >
> > > >
> > > > Hi Rishabhb,
> > > >
> > > > just a quick remark, thinking again about your fail @probe scenario
> > > > above
> > > > I realized that while the concurrency patch I mentioned above could help
> > > > on
> > > > races against vanishing xfers when late timed-out responses are
> > > > delivered,
> > > > here we really are then also shutting down everything on failure, so
> > > > there
> > > > could be further issues between a very late invokation of
> > > > scmi_rx_callback
> > > > and the core devm_ helpers freeing the underlying xfer/cinfo/etc..
> > > > structs
> > > > used by scmi-rx-callback itself (maybe this was already what you meant
> > > > and
> > > > I didn't get it,...sorry)
> > > >
> > > > On the other side, I don't feel that delaying Base init to a deferred
> > > > worker is a viable solution since we need Base protocol init to be
> > > > initialized and we need to just give up if we cannot communicate with
> > > > the SCMI platform fw in such early stages. (Base protocol is really the
> > > > only mandatory proto is I remember correctly the spec)
> > > >
> > > > Currenly I'm off and only glancing at mails but I'll have a thought
> > > > about
> > > > these issues once back in a few weeks time.
> > > >
> > > > Thanks,
> > > > Cristian
> > > >
> > > Hi Cristian
> > > I hope you enjoyed your vacation. Did you get a chance to look at
> > > the issue
> > > stated above and have some idea as to how to solve this?
> >
> > Do you still see the issue with v5.15 ? Can you please check if haven't
> > already done that ?
> >
> > Also 30ms delay we have is huge IMO and we typically expect the
> > communication
> > with remote processor or any entity that implements SCMI to happen in
> > terms
> > of one or few ms tops.
> >
> > If there is a race, we need to fix that but I am interested in knowing
> > why the default time of 30ms not sufficient ? Did increasing that helps
> > and is this timeout happening only for the initial commands(guessing the
> > SCMI firmware is not yet ready) or does it happen even during run-time ?
>
> Hi Sudeep
> I haven't checked on 5.15 but after glancing at the code I believe we should
> see the same issue.
> I agree 30ms is a big enough value and should be something that remote
> firmware should resolve. But
> if remote firmware goes into a bad state and not functioning properly at
> least kernel should not panic.
>
> The issue we see here happens during scmi probe. The response from the
> remote agent can be delayed
> causing a timeout during base protocol acquire, which leads to the probe
> failure.
> What I have observed is sometimes the failure of probe and rx_callback (due
> to a delayed message)
> happens around the same time on different cpus. Because of this race, the
> device memory may be cleared
> while the interrupt(rx_callback) is executing on another cpu.

So I was looking at the failure path you mentioned: a late concurrent
reply on Base protocol from the fw, during the probe, leads to an invocation
of scmi_rx_callback() on a different CPU while core data structs like
cinfo are being freed by the SCMI core on the probe failure path.
(v5.15-added SCMI concurrrency handling stuff I mentiond shuld help for
races regarding xfer but not for the cinfo stuff in this case ...)

We cannot defer Base proto init since we just wanna fail early while
probing if not even the Base protocol can work fine, and also because
Base protocol information are indeed needed for initial setup, so we
cannot juts proceed if we did not even got a Base reply on the number of
protos. (already said)

In my opinion, the proper way to address this kind of races at probe
failure should be to ensure that the transport you are using is properly
shut down completely before cleanup starts (same applies for a clean
remove), i.e. scmi_rx_callback should not even be possibly registered to
be called when the the final cleanup by the core is started (devm_ frees
I mean after scmi_probe exit failing...)

BUT indeed looking back at transport layers like mailbox and virtio, this
should be happening already, because the flow is like

scmi_probe()
{
...

clean_tx_rx_setup:
scmi_cleanup_txrx_channels()
....
--->>> ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
-
return ret;
}

.... only after this scmi_probe returns the core devm layer starts freeing devm_
allocated stuff like cinfo, AND the above per-transport specific .chan_free seems
to take care to 'deactivate/dregister' the scmi_rx_callback at the
transport layer:


e.g. MBOX transport
-------------------------
static int mailbox_chan_free(int id, void *p, void *data)
{
struct scmi_chan_info *cinfo = p;
struct scmi_mailbox *smbox = cinfo->transport_info;

if (smbox && !IS_ERR(smbox->chan)) {
mbox_free_channel(smbox->chan); <<< THIS MBOX CORE CALL DEACTIVATE
cinfo->transport_info = NULL;


e.g. VIRTIO Transport
-----------------------------
static int virtio_chan_free(int id, void *p, void *data)
{
unsigned long flags;
struct scmi_chan_info *cinfo = p;
struct scmi_vio_channel *vioch = cinfo->transport_info;

spin_lock_irqsave(&vioch->ready_lock, flags);
vioch->ready = false; <<<< THIS VIRTIO FLAG DEACTIVATE VIRTIO CBS INVOKCATION
spin_unlock_irqrestore(&vioch->ready_lock, flags);


... AND both of the above call are indeed also spinlocked heavily, so that
the 'deactivation' of the scmi_rx_callback should be visible properly; in
other words I would expect that after the above .chan_free() have
completed the scmi_rx_callback() cannot be called anymore, because the
transport itself will properly drop any so-late fw reply.

So I am now wondering, which transport are you using in your tests ?
since at least for the above 2 example it seems to me that your
race-on-probe failure condition should be already addressed by the
transport layer itself....or am I getting wrong the nature of the race ?

Thanks
Cristian

2021-11-05 17:49:53

by Rishabh Bhatnagar

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

On 2021-11-05 02:43, Cristian Marussi wrote:
> On Thu, Nov 04, 2021 at 04:40:03PM -0700, [email protected]
> wrote:
>> On 2021-11-02 04:32, Sudeep Holla wrote:
>> > On Mon, Nov 01, 2021 at 09:35:42AM -0700, [email protected] wrote:
>> > > On 2021-09-01 02:35, Cristian Marussi wrote:
>> > > > On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote:
>> > > > > On Mon, Aug 30, 2021 at 02:09:37PM -0700, [email protected]
>> > > > > wrote:
>> > > > > > Hi Christian
>> > > > >
>> > > > > Hi Rishabh,
>> > > > >
>
> Hi Rishabh,
>
> apologies for the delay in coming back to you.
> A few comments below.
>
>> > > > > thanks for looking into this kind of bad interactions.
>> > > > >
>> > > > > > There seems to be another issue here. The response from agent can be delayed
>> > > > > > causing a timeout during base protocol acquire,
>> > > > > > which leads to the probe failure. What I have observed is sometimes the
>> > > > > > failure of probe and rx_callback (due to a delayed message)
>> > > > > > happens at the same time on different cpus.
>> > > > > > Because of this race, the device memory may be cleared while the
>> > > > > > interrupt(rx_callback) is executing on another cpu.
>> > > > >
>> > > > > You are right that concurrency was not handled properly in this kind
>> > > > > of
>> > > > > context and moreover, if you think about it, even the case of out of
>> > > > > order reception of responses and delayed_responses (type2 SCMI
>> > > > > messages)
>> > > > > for asynchronous SCMI commands was not handled properly.
>> > > > >
>> > > > > > How do you propose we solve this? Do you think it is better to take the
>> > > > > > setting up of base and other protocols out of probe and
>> > > > > > in some delayed work? That would imply the device memory is not released
>> > > > > > until remove is called. Or should we add locking to
>> > > > > > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
>> > > > > > the race?
>> > > > > >
>> > > > >
>> > > > > These issues were more easily exposed by SCMI Virtio transport, so in
>> > > > > the series where I introduced scmi-virtio:
>> > > > >
>> > > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
>> > > > >
>> > > > > (which is now queued for v5.15 ... now on -next I think...finger
>> > > > > crossed)
>> > > > >
>> > > > > I took the chance to rectify a couple of other things in the SCMI core
>> > > > > in the initial commits.
>> > > > > As an example, in the above series
>> > > > >
>> > > > > [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and
>> > > > > out-of-order messages
>> > > > >
>> > > > > cares to add a refcount to xfers and some locking on xfers between TX
>> > > > > and RX path to avoid that a timed out xfer can vanish while the rx
>> > > > > path
>> > > > > is concurrently working on it (as you said); moreover I handle the
>> > > > > condition (rare if not unplausible anyway) in which a transport
>> > > > > delivers
>> > > > > out of order responses and delayed responses.
>> > > > >
>> > > > > I tested this scenarios on some fake emulated SCMI Virtio transport
>> > > > > where I could play any sort of mess and tricks to stress this limit
>> > > > > conditions, but you're more than welcome to verify if the race you are
>> > > > > seeing on Base protocol time out is solved (as I would hope :D) by
>> > > > > this
>> > > > > series of mine.
>> > > > >
>> > > > > Let me know, any feedback is welcome.
>> > > > >
>> > > > > Btw, in the series above there are also other minor changes, but there
>> > > > > is also another more radical change needed to ensure correctness and
>> > > > > protection against stale old messages which maybe could interest you
>> > > > > in general if you are looking into SCMI:
>> > > > >
>> > > > > [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically
>> > > > > increasing tokens
>> > > > >
>> > > > > Let me know if yo have other concerns.
>> > > > >
>> > > >
>> > > > Hi Rishabhb,
>> > > >
>> > > > just a quick remark, thinking again about your fail @probe scenario
>> > > > above
>> > > > I realized that while the concurrency patch I mentioned above could help
>> > > > on
>> > > > races against vanishing xfers when late timed-out responses are
>> > > > delivered,
>> > > > here we really are then also shutting down everything on failure, so
>> > > > there
>> > > > could be further issues between a very late invokation of
>> > > > scmi_rx_callback
>> > > > and the core devm_ helpers freeing the underlying xfer/cinfo/etc..
>> > > > structs
>> > > > used by scmi-rx-callback itself (maybe this was already what you meant
>> > > > and
>> > > > I didn't get it,...sorry)
>> > > >
>> > > > On the other side, I don't feel that delaying Base init to a deferred
>> > > > worker is a viable solution since we need Base protocol init to be
>> > > > initialized and we need to just give up if we cannot communicate with
>> > > > the SCMI platform fw in such early stages. (Base protocol is really the
>> > > > only mandatory proto is I remember correctly the spec)
>> > > >
>> > > > Currenly I'm off and only glancing at mails but I'll have a thought
>> > > > about
>> > > > these issues once back in a few weeks time.
>> > > >
>> > > > Thanks,
>> > > > Cristian
>> > > >
>> > > Hi Cristian
>> > > I hope you enjoyed your vacation. Did you get a chance to look at
>> > > the issue
>> > > stated above and have some idea as to how to solve this?
>> >
>> > Do you still see the issue with v5.15 ? Can you please check if haven't
>> > already done that ?
>> >
>> > Also 30ms delay we have is huge IMO and we typically expect the
>> > communication
>> > with remote processor or any entity that implements SCMI to happen in
>> > terms
>> > of one or few ms tops.
>> >
>> > If there is a race, we need to fix that but I am interested in knowing
>> > why the default time of 30ms not sufficient ? Did increasing that helps
>> > and is this timeout happening only for the initial commands(guessing the
>> > SCMI firmware is not yet ready) or does it happen even during run-time ?
>>
>> Hi Sudeep
>> I haven't checked on 5.15 but after glancing at the code I believe we
>> should
>> see the same issue.
>> I agree 30ms is a big enough value and should be something that remote
>> firmware should resolve. But
>> if remote firmware goes into a bad state and not functioning properly
>> at
>> least kernel should not panic.
>>
>> The issue we see here happens during scmi probe. The response from the
>> remote agent can be delayed
>> causing a timeout during base protocol acquire, which leads to the
>> probe
>> failure.
>> What I have observed is sometimes the failure of probe and rx_callback
>> (due
>> to a delayed message)
>> happens around the same time on different cpus. Because of this race,
>> the
>> device memory may be cleared
>> while the interrupt(rx_callback) is executing on another cpu.
>
> So I was looking at the failure path you mentioned: a late concurrent
> reply on Base protocol from the fw, during the probe, leads to an
> invocation
> of scmi_rx_callback() on a different CPU while core data structs like
> cinfo are being freed by the SCMI core on the probe failure path.
> (v5.15-added SCMI concurrrency handling stuff I mentiond shuld help for
> races regarding xfer but not for the cinfo stuff in this case ...)
>
> We cannot defer Base proto init since we just wanna fail early while
> probing if not even the Base protocol can work fine, and also because
> Base protocol information are indeed needed for initial setup, so we
> cannot juts proceed if we did not even got a Base reply on the number
> of
> protos. (already said)
>
> In my opinion, the proper way to address this kind of races at probe
> failure should be to ensure that the transport you are using is
> properly
> shut down completely before cleanup starts (same applies for a clean
> remove), i.e. scmi_rx_callback should not even be possibly registered
> to
> be called when the the final cleanup by the core is started (devm_
> frees
> I mean after scmi_probe exit failing...)
>
> BUT indeed looking back at transport layers like mailbox and virtio,
> this
> should be happening already, because the flow is like
>
> scmi_probe()
> {
> ...
>
> clean_tx_rx_setup:
> scmi_cleanup_txrx_channels()
> ....
> --->>> ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
> -
> return ret;
> }
>
> .... only after this scmi_probe returns the core devm layer starts
> freeing devm_
> allocated stuff like cinfo, AND the above per-transport specific
> .chan_free seems
> to take care to 'deactivate/dregister' the scmi_rx_callback at the
> transport layer:
>
>
> e.g. MBOX transport
> -------------------------
> static int mailbox_chan_free(int id, void *p, void *data)
> {
> struct scmi_chan_info *cinfo = p;
> struct scmi_mailbox *smbox = cinfo->transport_info;
>
> if (smbox && !IS_ERR(smbox->chan)) {
> mbox_free_channel(smbox->chan); <<< THIS MBOX CORE CALL DEACTIVATE
> cinfo->transport_info = NULL;
>
>
> e.g. VIRTIO Transport
> -----------------------------
> static int virtio_chan_free(int id, void *p, void *data)
> {
> unsigned long flags;
> struct scmi_chan_info *cinfo = p;
> struct scmi_vio_channel *vioch = cinfo->transport_info;
>
> spin_lock_irqsave(&vioch->ready_lock, flags);
> vioch->ready = false; <<<< THIS VIRTIO FLAG
> DEACTIVATE VIRTIO CBS INVOKCATION
> spin_unlock_irqrestore(&vioch->ready_lock, flags);
>
>
> ... AND both of the above call are indeed also spinlocked heavily, so
> that
> the 'deactivation' of the scmi_rx_callback should be visible properly;
> in
> other words I would expect that after the above .chan_free() have
> completed the scmi_rx_callback() cannot be called anymore, because the
> transport itself will properly drop any so-late fw reply.
>
> So I am now wondering, which transport are you using in your tests ?
> since at least for the above 2 example it seems to me that your
> race-on-probe failure condition should be already addressed by the
> transport layer itself....or am I getting wrong the nature of the race
> ?
>
> Thanks
> Cristian

Hi Cristian
You caught the scenario perfectly. But there is still a possibility of a
race. To be clear we use
the mbox transport. Let me explain in more detail.
Lets assume that the last command (base protocol acquire) kernel sent to
remote agent timed out.
This will lead to final cleanup before exiting probe like you mentioned.
Once cleanup is done(mailbox_chan_free)
no more responses from remote agent will acknowledged but if the
response comes in between the cleanup in probe
and the last command timing out we will see a race since the response
can come asynchronously. In this scenario cleanup
and scmi_rx_callback race with each other.
I believe to solve this we need to synchronize cleanup with
scmi_rx_callback. we can serialize these two paths
and exit early in rx_callback if cleanup has been completed.

Thanks
Rishabh

2021-11-07 18:26:08

by Cristian Marussi

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

On Fri, Nov 05, 2021 at 10:40:59AM -0700, [email protected] wrote:
> On 2021-11-05 02:43, Cristian Marussi wrote:
> > On Thu, Nov 04, 2021 at 04:40:03PM -0700, [email protected] wrote:
> > > On 2021-11-02 04:32, Sudeep Holla wrote:
> > > > On Mon, Nov 01, 2021 at 09:35:42AM -0700, [email protected] wrote:
> > > > > On 2021-09-01 02:35, Cristian Marussi wrote:
> > > > > > On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote:
> > > > > > > On Mon, Aug 30, 2021 at 02:09:37PM -0700, [email protected]
> > > > > > > wrote:
> > > > > > > > Hi Christian
> > > > > > >
> > > > > > > Hi Rishabh,
> > > > > > >
> >
> > Hi Rishabh,
> >

Hi Rishabh,

> > apologies for the delay in coming back to you.
> > A few comments below.
> >
> > > > > > > thanks for looking into this kind of bad interactions.
> > > > > > >
> > > > > > > > There seems to be another issue here. The response from agent can be delayed
> > > > > > > > causing a timeout during base protocol acquire,
> > > > > > > > which leads to the probe failure. What I have observed is sometimes the
> > > > > > > > failure of probe and rx_callback (due to a delayed message)
> > > > > > > > happens at the same time on different cpus.
> > > > > > > > Because of this race, the device memory may be cleared while the
> > > > > > > > interrupt(rx_callback) is executing on another cpu.
> > > > > > >
> > > > > > > You are right that concurrency was not handled properly in this kind
> > > > > > > of
> > > > > > > context and moreover, if you think about it, even the case of out of
> > > > > > > order reception of responses and delayed_responses (type2 SCMI
> > > > > > > messages)
> > > > > > > for asynchronous SCMI commands was not handled properly.
> > > > > > >
> > > > > > > > How do you propose we solve this? Do you think it is better to take the
> > > > > > > > setting up of base and other protocols out of probe and
> > > > > > > > in some delayed work? That would imply the device memory is not released
> > > > > > > > until remove is called. Or should we add locking to
> > > > > > > > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
> > > > > > > > the race?
> > > > > > > >
> > > > > > >
> > > > > > > These issues were more easily exposed by SCMI Virtio transport, so in
> > > > > > > the series where I introduced scmi-virtio:
> > > > > > >
> > > > > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > >
> > > > > > > (which is now queued for v5.15 ... now on -next I think...finger
> > > > > > > crossed)
> > > > > > >
> > > > > > > I took the chance to rectify a couple of other things in the SCMI core
> > > > > > > in the initial commits.
> > > > > > > As an example, in the above series
> > > > > > >
> > > > > > > [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and
> > > > > > > out-of-order messages
> > > > > > >
> > > > > > > cares to add a refcount to xfers and some locking on xfers between TX
> > > > > > > and RX path to avoid that a timed out xfer can vanish while the rx
> > > > > > > path
> > > > > > > is concurrently working on it (as you said); moreover I handle the
> > > > > > > condition (rare if not unplausible anyway) in which a transport
> > > > > > > delivers
> > > > > > > out of order responses and delayed responses.
> > > > > > >
> > > > > > > I tested this scenarios on some fake emulated SCMI Virtio transport
> > > > > > > where I could play any sort of mess and tricks to stress this limit
> > > > > > > conditions, but you're more than welcome to verify if the race you are
> > > > > > > seeing on Base protocol time out is solved (as I would hope :D) by
> > > > > > > this
> > > > > > > series of mine.
> > > > > > >
> > > > > > > Let me know, any feedback is welcome.
> > > > > > >
> > > > > > > Btw, in the series above there are also other minor changes, but there
> > > > > > > is also another more radical change needed to ensure correctness and
> > > > > > > protection against stale old messages which maybe could interest you
> > > > > > > in general if you are looking into SCMI:
> > > > > > >
> > > > > > > [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically
> > > > > > > increasing tokens
> > > > > > >
> > > > > > > Let me know if yo have other concerns.
> > > > > > >
> > > > > >
> > > > > > Hi Rishabhb,
> > > > > >
> > > > > > just a quick remark, thinking again about your fail @probe scenario
> > > > > > above
> > > > > > I realized that while the concurrency patch I mentioned above could help
> > > > > > on
> > > > > > races against vanishing xfers when late timed-out responses are
> > > > > > delivered,
> > > > > > here we really are then also shutting down everything on failure, so
> > > > > > there
> > > > > > could be further issues between a very late invokation of
> > > > > > scmi_rx_callback
> > > > > > and the core devm_ helpers freeing the underlying xfer/cinfo/etc..
> > > > > > structs
> > > > > > used by scmi-rx-callback itself (maybe this was already what you meant
> > > > > > and
> > > > > > I didn't get it,...sorry)
> > > > > >
> > > > > > On the other side, I don't feel that delaying Base init to a deferred
> > > > > > worker is a viable solution since we need Base protocol init to be
> > > > > > initialized and we need to just give up if we cannot communicate with
> > > > > > the SCMI platform fw in such early stages. (Base protocol is really the
> > > > > > only mandatory proto is I remember correctly the spec)
> > > > > >
> > > > > > Currenly I'm off and only glancing at mails but I'll have a thought
> > > > > > about
> > > > > > these issues once back in a few weeks time.
> > > > > >
> > > > > > Thanks,
> > > > > > Cristian
> > > > > >
> > > > > Hi Cristian
> > > > > I hope you enjoyed your vacation. Did you get a chance to look at
> > > > > the issue
> > > > > stated above and have some idea as to how to solve this?
> > > >
> > > > Do you still see the issue with v5.15 ? Can you please check if haven't
> > > > already done that ?
> > > >
> > > > Also 30ms delay we have is huge IMO and we typically expect the
> > > > communication
> > > > with remote processor or any entity that implements SCMI to happen in
> > > > terms
> > > > of one or few ms tops.
> > > >
> > > > If there is a race, we need to fix that but I am interested in knowing
> > > > why the default time of 30ms not sufficient ? Did increasing that helps
> > > > and is this timeout happening only for the initial commands(guessing the
> > > > SCMI firmware is not yet ready) or does it happen even during run-time ?
> > >
> > > Hi Sudeep
> > > I haven't checked on 5.15 but after glancing at the code I believe
> > > we should
> > > see the same issue.
> > > I agree 30ms is a big enough value and should be something that remote
> > > firmware should resolve. But
> > > if remote firmware goes into a bad state and not functioning
> > > properly at
> > > least kernel should not panic.
> > >
> > > The issue we see here happens during scmi probe. The response from the
> > > remote agent can be delayed
> > > causing a timeout during base protocol acquire, which leads to the
> > > probe
> > > failure.
> > > What I have observed is sometimes the failure of probe and
> > > rx_callback (due
> > > to a delayed message)
> > > happens around the same time on different cpus. Because of this
> > > race, the
> > > device memory may be cleared
> > > while the interrupt(rx_callback) is executing on another cpu.
> >
> > So I was looking at the failure path you mentioned: a late concurrent
> > reply on Base protocol from the fw, during the probe, leads to an
> > invocation
> > of scmi_rx_callback() on a different CPU while core data structs like
> > cinfo are being freed by the SCMI core on the probe failure path.
> > (v5.15-added SCMI concurrrency handling stuff I mentiond shuld help for
> > races regarding xfer but not for the cinfo stuff in this case ...)
> >
> > We cannot defer Base proto init since we just wanna fail early while
> > probing if not even the Base protocol can work fine, and also because
> > Base protocol information are indeed needed for initial setup, so we
> > cannot juts proceed if we did not even got a Base reply on the number of
> > protos. (already said)
> >
> > In my opinion, the proper way to address this kind of races at probe
> > failure should be to ensure that the transport you are using is properly
> > shut down completely before cleanup starts (same applies for a clean
> > remove), i.e. scmi_rx_callback should not even be possibly registered to
> > be called when the the final cleanup by the core is started (devm_ frees
> > I mean after scmi_probe exit failing...)
> >
> > BUT indeed looking back at transport layers like mailbox and virtio,
> > this
> > should be happening already, because the flow is like
> >
> > scmi_probe()
> > {
> > ...
> >
> > clean_tx_rx_setup:
> > scmi_cleanup_txrx_channels()
> > ....
> > --->>> ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
> > -
> > return ret;
> > }
> >
> > .... only after this scmi_probe returns the core devm layer starts
> > freeing devm_
> > allocated stuff like cinfo, AND the above per-transport specific
> > .chan_free seems
> > to take care to 'deactivate/dregister' the scmi_rx_callback at the
> > transport layer:
> >
> >
> > e.g. MBOX transport
> > -------------------------
> > static int mailbox_chan_free(int id, void *p, void *data)
> > {
> > struct scmi_chan_info *cinfo = p;
> > struct scmi_mailbox *smbox = cinfo->transport_info;
> >
> > if (smbox && !IS_ERR(smbox->chan)) {
> > mbox_free_channel(smbox->chan); <<< THIS MBOX CORE CALL DEACTIVATE
> > cinfo->transport_info = NULL;
> >
> >
> > e.g. VIRTIO Transport
> > -----------------------------
> > static int virtio_chan_free(int id, void *p, void *data)
> > {
> > unsigned long flags;
> > struct scmi_chan_info *cinfo = p;
> > struct scmi_vio_channel *vioch = cinfo->transport_info;
> >
> > spin_lock_irqsave(&vioch->ready_lock, flags);
> > vioch->ready = false; <<<< THIS VIRTIO FLAG
> > DEACTIVATE VIRTIO CBS INVOKCATION
> > spin_unlock_irqrestore(&vioch->ready_lock, flags);
> >
> >
> > ... AND both of the above call are indeed also spinlocked heavily, so
> > that
> > the 'deactivation' of the scmi_rx_callback should be visible properly;
> > in
> > other words I would expect that after the above .chan_free() have
> > completed the scmi_rx_callback() cannot be called anymore, because the
> > transport itself will properly drop any so-late fw reply.
> >
> > So I am now wondering, which transport are you using in your tests ?
> > since at least for the above 2 example it seems to me that your
> > race-on-probe failure condition should be already addressed by the
> > transport layer itself....or am I getting wrong the nature of the race ?
> >
> > Thanks
> > Cristian
>
> Hi Cristian
> You caught the scenario perfectly. But there is still a possibility of a
> race. To be clear we use
> the mbox transport. Let me explain in more detail.
> Lets assume that the last command (base protocol acquire) kernel sent to
> remote agent timed out.
> This will lead to final cleanup before exiting probe like you mentioned.
> Once cleanup is done(mailbox_chan_free)
> no more responses from remote agent will acknowledged but if the response
> comes in between the cleanup in probe
> and the last command timing out we will see a race since the response can
> come asynchronously. In this scenario cleanup
> and scmi_rx_callback race with each other.
> I believe to solve this we need to synchronize cleanup with
> scmi_rx_callback. we can serialize these two paths
> and exit early in rx_callback if cleanup has been completed.
>

Yes indeed, but my concern is also not to introduce to much contention
on the RX path (with irqsave spinlocking & friends), given that this racy
scenario has surely to be handled but it is also highly unlikely, so I don't
want to slow down all the rx path all the time.

So I tried something along this lines:

----8<------
diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h
index dea1bfbe1052..036f8ccff450 100644
--- a/drivers/firmware/arm_scmi/common.h
+++ b/drivers/firmware/arm_scmi/common.h
@@ -340,11 +340,13 @@ void scmi_protocol_release(const struct scmi_handle *handle, u8 protocol_id);
* channel
* @handle: Pointer to SCMI entity handle
* @transport_info: Transport layer related information
+ * @users: A refcount to track active users of this channel
*/
struct scmi_chan_info {
struct device *dev;
struct scmi_handle *handle;
void *transport_info;
+ refcount_t users;
};

/**
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index b406b3f78f46..5814ed3f444e 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -678,6 +678,16 @@ static void scmi_handle_response(struct scmi_chan_info *cinfo,
scmi_xfer_command_release(info, xfer);
}

+static inline bool scmi_acquire_channel(struct scmi_chan_info *cinfo)
+{
+ return refcount_inc_not_zero(&cinfo->users);
+}
+
+static inline void scmi_release_channel(struct scmi_chan_info *cinfo)
+{
+ return refcount_dec(&cinfo->users);
+}
+
/**
* scmi_rx_callback() - callback for receiving messages
*
@@ -695,6 +705,10 @@ void scmi_rx_callback(struct scmi_chan_info *cinfo, u32 msg_hdr, void *priv)
{
u8 msg_type = MSG_XTRACT_TYPE(msg_hdr);

+ /* Bail out if channel freed already */
+ if (!scmi_acquire_channel(cinfo))
+ return;
+
switch (msg_type) {
case MSG_TYPE_NOTIFICATION:
scmi_handle_notification(cinfo, msg_hdr, priv);
@@ -707,6 +721,8 @@ void scmi_rx_callback(struct scmi_chan_info *cinfo, u32 msg_hdr, void *priv)
WARN_ONCE(1, "received unknown msg_type:%d\n", msg_type);
break;
}
+
+ scmi_release_channel(cinfo);
}

/**
@@ -1506,10 +1522,27 @@ static int scmi_chan_setup(struct scmi_info *info, struct device *dev,
return ret;
}

+ refcount_set(&cinfo->users, 1);
cinfo->handle = &info->handle;
return 0;
}

+static int scmi_chan_free(int id, void *p, void *data)
+{
+ struct scmi_chan_info *cinfo = p;
+ struct scmi_info *info = handle_to_scmi_info(cinfo->handle);
+
+ if (refcount_dec_and_test(&cinfo->users)) {
+ info->desc->ops->chan_free(id, cinfo, data);
+ } else {
+ /* Stall till the ongoing rx_callback completes */
+ spin_until_cond(refcount_read(&cinfo->users) == 0);
+ info->desc->ops->chan_free(id, cinfo, data);
+ }
+
+ return 0;
+}
+
static inline int
scmi_txrx_setup(struct scmi_info *info, struct device *dev, int prot_id)
{
@@ -1792,11 +1825,11 @@ static int scmi_cleanup_txrx_channels(struct scmi_info *info)
int ret;
struct idr *idr = &info->tx_idr;

- ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
+ ret = idr_for_each(idr, scmi_chan_free, idr);
idr_destroy(&info->tx_idr);

idr = &info->rx_idr;
- ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
+ ret = idr_for_each(idr, scmi_chan_free, idr);
idr_destroy(&info->rx_idr);

return ret;

------8<-----

Can you give it a go on your setup ?

Beware it is not really tested on the racy error path (:P) and I could have
still missed something regarding synchro (and I expect an undesired refcount
warn on the scmi_release_channel too when the race is hit....but just to
experiment a bit for now and see if something like this could be enough while
avoiding further locking)

Thanks,
Cristian

2021-11-08 01:36:23

by Cristian Marussi

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

On Sun, Nov 07, 2021 at 10:34:07AM +0000, Cristian Marussi wrote:
> On Fri, Nov 05, 2021 at 10:40:59AM -0700, [email protected] wrote:
> > On 2021-11-05 02:43, Cristian Marussi wrote:
> > > On Thu, Nov 04, 2021 at 04:40:03PM -0700, [email protected] wrote:
> > > > On 2021-11-02 04:32, Sudeep Holla wrote:
> > > > > On Mon, Nov 01, 2021 at 09:35:42AM -0700, [email protected] wrote:
> > > > > > On 2021-09-01 02:35, Cristian Marussi wrote:
> > > > > > > On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote:
> > > > > > > > On Mon, Aug 30, 2021 at 02:09:37PM -0700, [email protected]
> > > > > > > > wrote:
> > > > > > > > > Hi Christian
> > > > > > > >
> > > > > > > > Hi Rishabh,
> > > > > > > >
> > >
> > > Hi Rishabh,
> > >
>
> Hi Rishabh,
>

Hi Rishabhb,

> > > apologies for the delay in coming back to you.
> > > A few comments below.
> > >
> > > > > > > > thanks for looking into this kind of bad interactions.
> > > > > > > >
> > > > > > > > > There seems to be another issue here. The response from agent can be delayed
> > > > > > > > > causing a timeout during base protocol acquire,
> > > > > > > > > which leads to the probe failure. What I have observed is sometimes the
> > > > > > > > > failure of probe and rx_callback (due to a delayed message)
> > > > > > > > > happens at the same time on different cpus.
> > > > > > > > > Because of this race, the device memory may be cleared while the
> > > > > > > > > interrupt(rx_callback) is executing on another cpu.
> > > > > > > >
> > > > > > > > You are right that concurrency was not handled properly in this kind
> > > > > > > > of
> > > > > > > > context and moreover, if you think about it, even the case of out of
> > > > > > > > order reception of responses and delayed_responses (type2 SCMI
> > > > > > > > messages)
> > > > > > > > for asynchronous SCMI commands was not handled properly.
> > > > > > > >
> > > > > > > > > How do you propose we solve this? Do you think it is better to take the
> > > > > > > > > setting up of base and other protocols out of probe and
> > > > > > > > > in some delayed work? That would imply the device memory is not released
> > > > > > > > > until remove is called. Or should we add locking to
> > > > > > > > > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
> > > > > > > > > the race?
> > > > > > > > >
> > > > > > > >
> > > > > > > > These issues were more easily exposed by SCMI Virtio transport, so in
> > > > > > > > the series where I introduced scmi-virtio:
> > > > > > > >
> > > > > > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > > >
> > > > > > > > (which is now queued for v5.15 ... now on -next I think...finger
> > > > > > > > crossed)
> > > > > > > >
> > > > > > > > I took the chance to rectify a couple of other things in the SCMI core
> > > > > > > > in the initial commits.
> > > > > > > > As an example, in the above series
> > > > > > > >
> > > > > > > > [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and
> > > > > > > > out-of-order messages
> > > > > > > >
> > > > > > > > cares to add a refcount to xfers and some locking on xfers between TX
> > > > > > > > and RX path to avoid that a timed out xfer can vanish while the rx
> > > > > > > > path
> > > > > > > > is concurrently working on it (as you said); moreover I handle the
> > > > > > > > condition (rare if not unplausible anyway) in which a transport
> > > > > > > > delivers
> > > > > > > > out of order responses and delayed responses.
> > > > > > > >
> > > > > > > > I tested this scenarios on some fake emulated SCMI Virtio transport
> > > > > > > > where I could play any sort of mess and tricks to stress this limit
> > > > > > > > conditions, but you're more than welcome to verify if the race you are
> > > > > > > > seeing on Base protocol time out is solved (as I would hope :D) by
> > > > > > > > this
> > > > > > > > series of mine.
> > > > > > > >
> > > > > > > > Let me know, any feedback is welcome.
> > > > > > > >
> > > > > > > > Btw, in the series above there are also other minor changes, but there
> > > > > > > > is also another more radical change needed to ensure correctness and
> > > > > > > > protection against stale old messages which maybe could interest you
> > > > > > > > in general if you are looking into SCMI:
> > > > > > > >
> > > > > > > > [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically
> > > > > > > > increasing tokens
> > > > > > > >
> > > > > > > > Let me know if yo have other concerns.
> > > > > > > >
> > > > > > >
> > > > > > > Hi Rishabhb,
> > > > > > >
> > > > > > > just a quick remark, thinking again about your fail @probe scenario
> > > > > > > above
> > > > > > > I realized that while the concurrency patch I mentioned above could help
> > > > > > > on
> > > > > > > races against vanishing xfers when late timed-out responses are
> > > > > > > delivered,
> > > > > > > here we really are then also shutting down everything on failure, so
> > > > > > > there
> > > > > > > could be further issues between a very late invokation of
> > > > > > > scmi_rx_callback
> > > > > > > and the core devm_ helpers freeing the underlying xfer/cinfo/etc..
> > > > > > > structs
> > > > > > > used by scmi-rx-callback itself (maybe this was already what you meant
> > > > > > > and
> > > > > > > I didn't get it,...sorry)
> > > > > > >
> > > > > > > On the other side, I don't feel that delaying Base init to a deferred
> > > > > > > worker is a viable solution since we need Base protocol init to be
> > > > > > > initialized and we need to just give up if we cannot communicate with
> > > > > > > the SCMI platform fw in such early stages. (Base protocol is really the
> > > > > > > only mandatory proto is I remember correctly the spec)
> > > > > > >
> > > > > > > Currenly I'm off and only glancing at mails but I'll have a thought
> > > > > > > about
> > > > > > > these issues once back in a few weeks time.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Cristian
> > > > > > >
> > > > > > Hi Cristian
> > > > > > I hope you enjoyed your vacation. Did you get a chance to look at
> > > > > > the issue
> > > > > > stated above and have some idea as to how to solve this?
> > > > >
> > > > > Do you still see the issue with v5.15 ? Can you please check if haven't
> > > > > already done that ?
> > > > >
> > > > > Also 30ms delay we have is huge IMO and we typically expect the
> > > > > communication
> > > > > with remote processor or any entity that implements SCMI to happen in
> > > > > terms
> > > > > of one or few ms tops.
> > > > >
> > > > > If there is a race, we need to fix that but I am interested in knowing
> > > > > why the default time of 30ms not sufficient ? Did increasing that helps
> > > > > and is this timeout happening only for the initial commands(guessing the
> > > > > SCMI firmware is not yet ready) or does it happen even during run-time ?
> > > >
> > > > Hi Sudeep
> > > > I haven't checked on 5.15 but after glancing at the code I believe
> > > > we should
> > > > see the same issue.
> > > > I agree 30ms is a big enough value and should be something that remote
> > > > firmware should resolve. But
> > > > if remote firmware goes into a bad state and not functioning
> > > > properly at
> > > > least kernel should not panic.
> > > >
> > > > The issue we see here happens during scmi probe. The response from the
> > > > remote agent can be delayed
> > > > causing a timeout during base protocol acquire, which leads to the
> > > > probe
> > > > failure.
> > > > What I have observed is sometimes the failure of probe and
> > > > rx_callback (due
> > > > to a delayed message)
> > > > happens around the same time on different cpus. Because of this
> > > > race, the
> > > > device memory may be cleared
> > > > while the interrupt(rx_callback) is executing on another cpu.
> > >
> > > So I was looking at the failure path you mentioned: a late concurrent
> > > reply on Base protocol from the fw, during the probe, leads to an
> > > invocation
> > > of scmi_rx_callback() on a different CPU while core data structs like
> > > cinfo are being freed by the SCMI core on the probe failure path.
> > > (v5.15-added SCMI concurrrency handling stuff I mentiond shuld help for
> > > races regarding xfer but not for the cinfo stuff in this case ...)
> > >
> > > We cannot defer Base proto init since we just wanna fail early while
> > > probing if not even the Base protocol can work fine, and also because
> > > Base protocol information are indeed needed for initial setup, so we
> > > cannot juts proceed if we did not even got a Base reply on the number of
> > > protos. (already said)
> > >
> > > In my opinion, the proper way to address this kind of races at probe
> > > failure should be to ensure that the transport you are using is properly
> > > shut down completely before cleanup starts (same applies for a clean
> > > remove), i.e. scmi_rx_callback should not even be possibly registered to
> > > be called when the the final cleanup by the core is started (devm_ frees
> > > I mean after scmi_probe exit failing...)
> > >
> > > BUT indeed looking back at transport layers like mailbox and virtio,
> > > this
> > > should be happening already, because the flow is like
> > >
> > > scmi_probe()
> > > {
> > > ...
> > >
> > > clean_tx_rx_setup:
> > > scmi_cleanup_txrx_channels()
> > > ....
> > > --->>> ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
> > > -
> > > return ret;
> > > }
> > >
> > > .... only after this scmi_probe returns the core devm layer starts
> > > freeing devm_
> > > allocated stuff like cinfo, AND the above per-transport specific
> > > .chan_free seems
> > > to take care to 'deactivate/dregister' the scmi_rx_callback at the
> > > transport layer:
> > >
> > >
> > > e.g. MBOX transport
> > > -------------------------
> > > static int mailbox_chan_free(int id, void *p, void *data)
> > > {
> > > struct scmi_chan_info *cinfo = p;
> > > struct scmi_mailbox *smbox = cinfo->transport_info;
> > >
> > > if (smbox && !IS_ERR(smbox->chan)) {
> > > mbox_free_channel(smbox->chan); <<< THIS MBOX CORE CALL DEACTIVATE
> > > cinfo->transport_info = NULL;
> > >
> > >
> > > e.g. VIRTIO Transport
> > > -----------------------------
> > > static int virtio_chan_free(int id, void *p, void *data)
> > > {
> > > unsigned long flags;
> > > struct scmi_chan_info *cinfo = p;
> > > struct scmi_vio_channel *vioch = cinfo->transport_info;
> > >
> > > spin_lock_irqsave(&vioch->ready_lock, flags);
> > > vioch->ready = false; <<<< THIS VIRTIO FLAG
> > > DEACTIVATE VIRTIO CBS INVOKCATION
> > > spin_unlock_irqrestore(&vioch->ready_lock, flags);
> > >
> > >
> > > ... AND both of the above call are indeed also spinlocked heavily, so
> > > that
> > > the 'deactivation' of the scmi_rx_callback should be visible properly;
> > > in
> > > other words I would expect that after the above .chan_free() have
> > > completed the scmi_rx_callback() cannot be called anymore, because the
> > > transport itself will properly drop any so-late fw reply.
> > >
> > > So I am now wondering, which transport are you using in your tests ?
> > > since at least for the above 2 example it seems to me that your
> > > race-on-probe failure condition should be already addressed by the
> > > transport layer itself....or am I getting wrong the nature of the race ?
> > >
> > > Thanks
> > > Cristian
> >
> > Hi Cristian
> > You caught the scenario perfectly. But there is still a possibility of a
> > race. To be clear we use
> > the mbox transport. Let me explain in more detail.
> > Lets assume that the last command (base protocol acquire) kernel sent to
> > remote agent timed out.
> > This will lead to final cleanup before exiting probe like you mentioned.
> > Once cleanup is done(mailbox_chan_free)
> > no more responses from remote agent will acknowledged but if the response
> > comes in between the cleanup in probe
> > and the last command timing out we will see a race since the response can
> > come asynchronously. In this scenario cleanup
> > and scmi_rx_callback race with each other.
> > I believe to solve this we need to synchronize cleanup with
> > scmi_rx_callback. we can serialize these two paths
> > and exit early in rx_callback if cleanup has been completed.
> >
>
> Yes indeed, but my concern is also not to introduce to much contention
> on the RX path (with irqsave spinlocking & friends), given that this racy
> scenario has surely to be handled but it is also highly unlikely, so I don't
> want to slow down all the rx path all the time.
>
> So I tried something along this lines:
>
> ----8<------
> diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h
> index dea1bfbe1052..036f8ccff450 100644
> --- a/drivers/firmware/arm_scmi/common.h
> +++ b/drivers/firmware/arm_scmi/common.h
> @@ -340,11 +340,13 @@ void scmi_protocol_release(const struct scmi_handle *handle, u8 protocol_id);
> * channel
> * @handle: Pointer to SCMI entity handle
> * @transport_info: Transport layer related information
> + * @users: A refcount to track active users of this channel
> */
> struct scmi_chan_info {
> struct device *dev;
> struct scmi_handle *handle;
> void *transport_info;
> + refcount_t users;
> };
>
> /**
> diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
> index b406b3f78f46..5814ed3f444e 100644
> --- a/drivers/firmware/arm_scmi/driver.c
> +++ b/drivers/firmware/arm_scmi/driver.c
> @@ -678,6 +678,16 @@ static void scmi_handle_response(struct scmi_chan_info *cinfo,
> scmi_xfer_command_release(info, xfer);
> }
>
> +static inline bool scmi_acquire_channel(struct scmi_chan_info *cinfo)
> +{
> + return refcount_inc_not_zero(&cinfo->users);
> +}
> +
> +static inline void scmi_release_channel(struct scmi_chan_info *cinfo)
> +{
> + return refcount_dec(&cinfo->users);
> +}
> +
> /**
> * scmi_rx_callback() - callback for receiving messages
> *
> @@ -695,6 +705,10 @@ void scmi_rx_callback(struct scmi_chan_info *cinfo, u32 msg_hdr, void *priv)
> {
> u8 msg_type = MSG_XTRACT_TYPE(msg_hdr);
>
> + /* Bail out if channel freed already */
> + if (!scmi_acquire_channel(cinfo))
> + return;
> +
> switch (msg_type) {
> case MSG_TYPE_NOTIFICATION:
> scmi_handle_notification(cinfo, msg_hdr, priv);
> @@ -707,6 +721,8 @@ void scmi_rx_callback(struct scmi_chan_info *cinfo, u32 msg_hdr, void *priv)
> WARN_ONCE(1, "received unknown msg_type:%d\n", msg_type);
> break;
> }
> +
> + scmi_release_channel(cinfo);
> }
>
> /**
> @@ -1506,10 +1522,27 @@ static int scmi_chan_setup(struct scmi_info *info, struct device *dev,
> return ret;
> }
>
> + refcount_set(&cinfo->users, 1);
> cinfo->handle = &info->handle;
> return 0;
> }
>
> +static int scmi_chan_free(int id, void *p, void *data)
> +{
> + struct scmi_chan_info *cinfo = p;
> + struct scmi_info *info = handle_to_scmi_info(cinfo->handle);
> +
> + if (refcount_dec_and_test(&cinfo->users)) {
> + info->desc->ops->chan_free(id, cinfo, data);
> + } else {
> + /* Stall till the ongoing rx_callback completes */
> + spin_until_cond(refcount_read(&cinfo->users) == 0);
> + info->desc->ops->chan_free(id, cinfo, data);
> + }
> +
> + return 0;
> +}
> +
> static inline int
> scmi_txrx_setup(struct scmi_info *info, struct device *dev, int prot_id)
> {
> @@ -1792,11 +1825,11 @@ static int scmi_cleanup_txrx_channels(struct scmi_info *info)
> int ret;
> struct idr *idr = &info->tx_idr;
>
> - ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
> + ret = idr_for_each(idr, scmi_chan_free, idr);
> idr_destroy(&info->tx_idr);
>
> idr = &info->rx_idr;
> - ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
> + ret = idr_for_each(idr, scmi_chan_free, idr);
> idr_destroy(&info->rx_idr);
>
> return ret;
>
> ------8<-----
>
> Can you give it a go on your setup ?
>
> Beware it is not really tested on the racy error path (:P) and I could have
> still missed something regarding synchro (and I expect an undesired refcount
> warn on the scmi_release_channel too when the race is hit....but just to
> experiment a bit for now and see if something like this could be enough while
> avoiding further locking)
> o

Looking back at this patch of mine, even though it could work for the racy issue at
hand, it is currently clearly completely broken on the regular unload/free
flow since cinfo structs can be re-used multiple times.
Sorry, please ignore this attempt, I'll rework in a more sensible way.

Thanks,
Cristian

2021-11-09 02:26:43

by Rishabh Bhatnagar

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

On 2021-11-07 10:22, Cristian Marussi wrote:
> On Sun, Nov 07, 2021 at 10:34:07AM +0000, Cristian Marussi wrote:
>> On Fri, Nov 05, 2021 at 10:40:59AM -0700, [email protected]
>> wrote:
>> > On 2021-11-05 02:43, Cristian Marussi wrote:
>> > > On Thu, Nov 04, 2021 at 04:40:03PM -0700, [email protected] wrote:
>> > > > On 2021-11-02 04:32, Sudeep Holla wrote:
>> > > > > On Mon, Nov 01, 2021 at 09:35:42AM -0700, [email protected] wrote:
>> > > > > > On 2021-09-01 02:35, Cristian Marussi wrote:
>> > > > > > > On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote:
>> > > > > > > > On Mon, Aug 30, 2021 at 02:09:37PM -0700, [email protected]
>> > > > > > > > wrote:
>> > > > > > > > > Hi Christian
>> > > > > > > >
>> > > > > > > > Hi Rishabh,
>> > > > > > > >
>> > >
>> > > Hi Rishabh,
>> > >
>>
>> Hi Rishabh,
>>
>
> Hi Rishabhb,
>
>> > > apologies for the delay in coming back to you.
>> > > A few comments below.
>> > >
>> > > > > > > > thanks for looking into this kind of bad interactions.
>> > > > > > > >
>> > > > > > > > > There seems to be another issue here. The response from agent can be delayed
>> > > > > > > > > causing a timeout during base protocol acquire,
>> > > > > > > > > which leads to the probe failure. What I have observed is sometimes the
>> > > > > > > > > failure of probe and rx_callback (due to a delayed message)
>> > > > > > > > > happens at the same time on different cpus.
>> > > > > > > > > Because of this race, the device memory may be cleared while the
>> > > > > > > > > interrupt(rx_callback) is executing on another cpu.
>> > > > > > > >
>> > > > > > > > You are right that concurrency was not handled properly in this kind
>> > > > > > > > of
>> > > > > > > > context and moreover, if you think about it, even the case of out of
>> > > > > > > > order reception of responses and delayed_responses (type2 SCMI
>> > > > > > > > messages)
>> > > > > > > > for asynchronous SCMI commands was not handled properly.
>> > > > > > > >
>> > > > > > > > > How do you propose we solve this? Do you think it is better to take the
>> > > > > > > > > setting up of base and other protocols out of probe and
>> > > > > > > > > in some delayed work? That would imply the device memory is not released
>> > > > > > > > > until remove is called. Or should we add locking to
>> > > > > > > > > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
>> > > > > > > > > the race?
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > These issues were more easily exposed by SCMI Virtio transport, so in
>> > > > > > > > the series where I introduced scmi-virtio:
>> > > > > > > >
>> > > > > > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
>> > > > > > > >
>> > > > > > > > (which is now queued for v5.15 ... now on -next I think...finger
>> > > > > > > > crossed)
>> > > > > > > >
>> > > > > > > > I took the chance to rectify a couple of other things in the SCMI core
>> > > > > > > > in the initial commits.
>> > > > > > > > As an example, in the above series
>> > > > > > > >
>> > > > > > > > [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and
>> > > > > > > > out-of-order messages
>> > > > > > > >
>> > > > > > > > cares to add a refcount to xfers and some locking on xfers between TX
>> > > > > > > > and RX path to avoid that a timed out xfer can vanish while the rx
>> > > > > > > > path
>> > > > > > > > is concurrently working on it (as you said); moreover I handle the
>> > > > > > > > condition (rare if not unplausible anyway) in which a transport
>> > > > > > > > delivers
>> > > > > > > > out of order responses and delayed responses.
>> > > > > > > >
>> > > > > > > > I tested this scenarios on some fake emulated SCMI Virtio transport
>> > > > > > > > where I could play any sort of mess and tricks to stress this limit
>> > > > > > > > conditions, but you're more than welcome to verify if the race you are
>> > > > > > > > seeing on Base protocol time out is solved (as I would hope :D) by
>> > > > > > > > this
>> > > > > > > > series of mine.
>> > > > > > > >
>> > > > > > > > Let me know, any feedback is welcome.
>> > > > > > > >
>> > > > > > > > Btw, in the series above there are also other minor changes, but there
>> > > > > > > > is also another more radical change needed to ensure correctness and
>> > > > > > > > protection against stale old messages which maybe could interest you
>> > > > > > > > in general if you are looking into SCMI:
>> > > > > > > >
>> > > > > > > > [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically
>> > > > > > > > increasing tokens
>> > > > > > > >
>> > > > > > > > Let me know if yo have other concerns.
>> > > > > > > >
>> > > > > > >
>> > > > > > > Hi Rishabhb,
>> > > > > > >
>> > > > > > > just a quick remark, thinking again about your fail @probe scenario
>> > > > > > > above
>> > > > > > > I realized that while the concurrency patch I mentioned above could help
>> > > > > > > on
>> > > > > > > races against vanishing xfers when late timed-out responses are
>> > > > > > > delivered,
>> > > > > > > here we really are then also shutting down everything on failure, so
>> > > > > > > there
>> > > > > > > could be further issues between a very late invokation of
>> > > > > > > scmi_rx_callback
>> > > > > > > and the core devm_ helpers freeing the underlying xfer/cinfo/etc..
>> > > > > > > structs
>> > > > > > > used by scmi-rx-callback itself (maybe this was already what you meant
>> > > > > > > and
>> > > > > > > I didn't get it,...sorry)
>> > > > > > >
>> > > > > > > On the other side, I don't feel that delaying Base init to a deferred
>> > > > > > > worker is a viable solution since we need Base protocol init to be
>> > > > > > > initialized and we need to just give up if we cannot communicate with
>> > > > > > > the SCMI platform fw in such early stages. (Base protocol is really the
>> > > > > > > only mandatory proto is I remember correctly the spec)
>> > > > > > >
>> > > > > > > Currenly I'm off and only glancing at mails but I'll have a thought
>> > > > > > > about
>> > > > > > > these issues once back in a few weeks time.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Cristian
>> > > > > > >
>> > > > > > Hi Cristian
>> > > > > > I hope you enjoyed your vacation. Did you get a chance to look at
>> > > > > > the issue
>> > > > > > stated above and have some idea as to how to solve this?
>> > > > >
>> > > > > Do you still see the issue with v5.15 ? Can you please check if haven't
>> > > > > already done that ?
>> > > > >
>> > > > > Also 30ms delay we have is huge IMO and we typically expect the
>> > > > > communication
>> > > > > with remote processor or any entity that implements SCMI to happen in
>> > > > > terms
>> > > > > of one or few ms tops.
>> > > > >
>> > > > > If there is a race, we need to fix that but I am interested in knowing
>> > > > > why the default time of 30ms not sufficient ? Did increasing that helps
>> > > > > and is this timeout happening only for the initial commands(guessing the
>> > > > > SCMI firmware is not yet ready) or does it happen even during run-time ?
>> > > >
>> > > > Hi Sudeep
>> > > > I haven't checked on 5.15 but after glancing at the code I believe
>> > > > we should
>> > > > see the same issue.
>> > > > I agree 30ms is a big enough value and should be something that remote
>> > > > firmware should resolve. But
>> > > > if remote firmware goes into a bad state and not functioning
>> > > > properly at
>> > > > least kernel should not panic.
>> > > >
>> > > > The issue we see here happens during scmi probe. The response from the
>> > > > remote agent can be delayed
>> > > > causing a timeout during base protocol acquire, which leads to the
>> > > > probe
>> > > > failure.
>> > > > What I have observed is sometimes the failure of probe and
>> > > > rx_callback (due
>> > > > to a delayed message)
>> > > > happens around the same time on different cpus. Because of this
>> > > > race, the
>> > > > device memory may be cleared
>> > > > while the interrupt(rx_callback) is executing on another cpu.
>> > >
>> > > So I was looking at the failure path you mentioned: a late concurrent
>> > > reply on Base protocol from the fw, during the probe, leads to an
>> > > invocation
>> > > of scmi_rx_callback() on a different CPU while core data structs like
>> > > cinfo are being freed by the SCMI core on the probe failure path.
>> > > (v5.15-added SCMI concurrrency handling stuff I mentiond shuld help for
>> > > races regarding xfer but not for the cinfo stuff in this case ...)
>> > >
>> > > We cannot defer Base proto init since we just wanna fail early while
>> > > probing if not even the Base protocol can work fine, and also because
>> > > Base protocol information are indeed needed for initial setup, so we
>> > > cannot juts proceed if we did not even got a Base reply on the number of
>> > > protos. (already said)
>> > >
>> > > In my opinion, the proper way to address this kind of races at probe
>> > > failure should be to ensure that the transport you are using is properly
>> > > shut down completely before cleanup starts (same applies for a clean
>> > > remove), i.e. scmi_rx_callback should not even be possibly registered to
>> > > be called when the the final cleanup by the core is started (devm_ frees
>> > > I mean after scmi_probe exit failing...)
>> > >
>> > > BUT indeed looking back at transport layers like mailbox and virtio,
>> > > this
>> > > should be happening already, because the flow is like
>> > >
>> > > scmi_probe()
>> > > {
>> > > ...
>> > >
>> > > clean_tx_rx_setup:
>> > > scmi_cleanup_txrx_channels()
>> > > ....
>> > > --->>> ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> > > -
>> > > return ret;
>> > > }
>> > >
>> > > .... only after this scmi_probe returns the core devm layer starts
>> > > freeing devm_
>> > > allocated stuff like cinfo, AND the above per-transport specific
>> > > .chan_free seems
>> > > to take care to 'deactivate/dregister' the scmi_rx_callback at the
>> > > transport layer:
>> > >
>> > >
>> > > e.g. MBOX transport
>> > > -------------------------
>> > > static int mailbox_chan_free(int id, void *p, void *data)
>> > > {
>> > > struct scmi_chan_info *cinfo = p;
>> > > struct scmi_mailbox *smbox = cinfo->transport_info;
>> > >
>> > > if (smbox && !IS_ERR(smbox->chan)) {
>> > > mbox_free_channel(smbox->chan); <<< THIS MBOX CORE CALL DEACTIVATE
>> > > cinfo->transport_info = NULL;
>> > >
>> > >
>> > > e.g. VIRTIO Transport
>> > > -----------------------------
>> > > static int virtio_chan_free(int id, void *p, void *data)
>> > > {
>> > > unsigned long flags;
>> > > struct scmi_chan_info *cinfo = p;
>> > > struct scmi_vio_channel *vioch = cinfo->transport_info;
>> > >
>> > > spin_lock_irqsave(&vioch->ready_lock, flags);
>> > > vioch->ready = false; <<<< THIS VIRTIO FLAG
>> > > DEACTIVATE VIRTIO CBS INVOKCATION
>> > > spin_unlock_irqrestore(&vioch->ready_lock, flags);
>> > >
>> > >
>> > > ... AND both of the above call are indeed also spinlocked heavily, so
>> > > that
>> > > the 'deactivation' of the scmi_rx_callback should be visible properly;
>> > > in
>> > > other words I would expect that after the above .chan_free() have
>> > > completed the scmi_rx_callback() cannot be called anymore, because the
>> > > transport itself will properly drop any so-late fw reply.
>> > >
>> > > So I am now wondering, which transport are you using in your tests ?
>> > > since at least for the above 2 example it seems to me that your
>> > > race-on-probe failure condition should be already addressed by the
>> > > transport layer itself....or am I getting wrong the nature of the race ?
>> > >
>> > > Thanks
>> > > Cristian
>> >
>> > Hi Cristian
>> > You caught the scenario perfectly. But there is still a possibility of a
>> > race. To be clear we use
>> > the mbox transport. Let me explain in more detail.
>> > Lets assume that the last command (base protocol acquire) kernel sent to
>> > remote agent timed out.
>> > This will lead to final cleanup before exiting probe like you mentioned.
>> > Once cleanup is done(mailbox_chan_free)
>> > no more responses from remote agent will acknowledged but if the response
>> > comes in between the cleanup in probe
>> > and the last command timing out we will see a race since the response can
>> > come asynchronously. In this scenario cleanup
>> > and scmi_rx_callback race with each other.
>> > I believe to solve this we need to synchronize cleanup with
>> > scmi_rx_callback. we can serialize these two paths
>> > and exit early in rx_callback if cleanup has been completed.
>> >
>>
>> Yes indeed, but my concern is also not to introduce to much contention
>> on the RX path (with irqsave spinlocking & friends), given that this
>> racy
>> scenario has surely to be handled but it is also highly unlikely, so I
>> don't
>> want to slow down all the rx path all the time.
>>
>> So I tried something along this lines:
>>
>> ----8<------
>> diff --git a/drivers/firmware/arm_scmi/common.h
>> b/drivers/firmware/arm_scmi/common.h
>> index dea1bfbe1052..036f8ccff450 100644
>> --- a/drivers/firmware/arm_scmi/common.h
>> +++ b/drivers/firmware/arm_scmi/common.h
>> @@ -340,11 +340,13 @@ void scmi_protocol_release(const struct
>> scmi_handle *handle, u8 protocol_id);
>> * channel
>> * @handle: Pointer to SCMI entity handle
>> * @transport_info: Transport layer related information
>> + * @users: A refcount to track active users of this channel
>> */
>> struct scmi_chan_info {
>> struct device *dev;
>> struct scmi_handle *handle;
>> void *transport_info;
>> + refcount_t users;
>> };
>>
>> /**
>> diff --git a/drivers/firmware/arm_scmi/driver.c
>> b/drivers/firmware/arm_scmi/driver.c
>> index b406b3f78f46..5814ed3f444e 100644
>> --- a/drivers/firmware/arm_scmi/driver.c
>> +++ b/drivers/firmware/arm_scmi/driver.c
>> @@ -678,6 +678,16 @@ static void scmi_handle_response(struct
>> scmi_chan_info *cinfo,
>> scmi_xfer_command_release(info, xfer);
>> }
>>
>> +static inline bool scmi_acquire_channel(struct scmi_chan_info *cinfo)
>> +{
>> + return refcount_inc_not_zero(&cinfo->users);
>> +}
>> +
>> +static inline void scmi_release_channel(struct scmi_chan_info *cinfo)
>> +{
>> + return refcount_dec(&cinfo->users);
>> +}
>> +
>> /**
>> * scmi_rx_callback() - callback for receiving messages
>> *
>> @@ -695,6 +705,10 @@ void scmi_rx_callback(struct scmi_chan_info
>> *cinfo, u32 msg_hdr, void *priv)
>> {
>> u8 msg_type = MSG_XTRACT_TYPE(msg_hdr);
>>
>> + /* Bail out if channel freed already */
>> + if (!scmi_acquire_channel(cinfo))
>> + return;
>> +
>> switch (msg_type) {
>> case MSG_TYPE_NOTIFICATION:
>> scmi_handle_notification(cinfo, msg_hdr, priv);
>> @@ -707,6 +721,8 @@ void scmi_rx_callback(struct scmi_chan_info
>> *cinfo, u32 msg_hdr, void *priv)
>> WARN_ONCE(1, "received unknown msg_type:%d\n", msg_type);
>> break;
>> }
>> +
>> + scmi_release_channel(cinfo);
>> }
>>
>> /**
>> @@ -1506,10 +1522,27 @@ static int scmi_chan_setup(struct scmi_info
>> *info, struct device *dev,
>> return ret;
>> }
>>
>> + refcount_set(&cinfo->users, 1);
>> cinfo->handle = &info->handle;
>> return 0;
>> }
>>
>> +static int scmi_chan_free(int id, void *p, void *data)
>> +{
>> + struct scmi_chan_info *cinfo = p;
>> + struct scmi_info *info = handle_to_scmi_info(cinfo->handle);
>> +
>> + if (refcount_dec_and_test(&cinfo->users)) {
>> + info->desc->ops->chan_free(id, cinfo, data);
>> + } else {
>> + /* Stall till the ongoing rx_callback completes */
>> + spin_until_cond(refcount_read(&cinfo->users) == 0);
>> + info->desc->ops->chan_free(id, cinfo, data);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static inline int
>> scmi_txrx_setup(struct scmi_info *info, struct device *dev, int
>> prot_id)
>> {
>> @@ -1792,11 +1825,11 @@ static int scmi_cleanup_txrx_channels(struct
>> scmi_info *info)
>> int ret;
>> struct idr *idr = &info->tx_idr;
>>
>> - ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> + ret = idr_for_each(idr, scmi_chan_free, idr);
>> idr_destroy(&info->tx_idr);
>>
>> idr = &info->rx_idr;
>> - ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> + ret = idr_for_each(idr, scmi_chan_free, idr);
>> idr_destroy(&info->rx_idr);
>>
>> return ret;
>>
>> ------8<-----
>>
>> Can you give it a go on your setup ?
>>
>> Beware it is not really tested on the racy error path (:P) and I could
>> have
>> still missed something regarding synchro (and I expect an undesired
>> refcount
>> warn on the scmi_release_channel too when the race is hit....but just
>> to
>> experiment a bit for now and see if something like this could be
>> enough while
>> avoiding further locking)
>> o
>
> Looking back at this patch of mine, even though it could work for the
> racy issue at
> hand, it is currently clearly completely broken on the regular
> unload/free
> flow since cinfo structs can be re-used multiple times.
> Sorry, please ignore this attempt, I'll rework in a more sensible way.
>
> Thanks,
> Cristian
Hi Cristian
Thanks for looking into this. One more issue I see with this is cinfo is
allocated
as part of device memory so it will be freed when probe exit path
completes. So we can't
really be using that to check the refcount I guess? IMO this should be a
global memory variable.
Thanks
Rishabh

2021-11-09 23:56:44

by Cristian Marussi

[permalink] [raw]
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

On Mon, Nov 08, 2021 at 09:58:57AM -0800, [email protected] wrote:
> On 2021-11-07 10:22, Cristian Marussi wrote:
> > On Sun, Nov 07, 2021 at 10:34:07AM +0000, Cristian Marussi wrote:
> > > On Fri, Nov 05, 2021 at 10:40:59AM -0700, [email protected]
> > > wrote:
> > > > On 2021-11-05 02:43, Cristian Marussi wrote:
> > > > > On Thu, Nov 04, 2021 at 04:40:03PM -0700, [email protected] wrote:
> > > > > > On 2021-11-02 04:32, Sudeep Holla wrote:
> > > > > > > On Mon, Nov 01, 2021 at 09:35:42AM -0700, [email protected] wrote:
> > > > > > > > On 2021-09-01 02:35, Cristian Marussi wrote:
> > > > > > > > > On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote:
> > > > > > > > > > On Mon, Aug 30, 2021 at 02:09:37PM -0700, [email protected]
> > > > > > > > > > wrote:
> > > > > > > > > > > Hi Christian
> > > > > > > > > >
> > > > > > > > > > Hi Rishabh,
> > > > > > > > > >
> > > > >
> > > > > Hi Rishabh,
> > > > >
> > >
> > > Hi Rishabh,
> > >
> >
> > Hi Rishabhb,
> >
> > > > > apologies for the delay in coming back to you.
> > > > > A few comments below.
> > > > >
> > > > > > > > > > thanks for looking into this kind of bad interactions.
> > > > > > > > > >
> > > > > > > > > > > There seems to be another issue here. The response from agent can be delayed
> > > > > > > > > > > causing a timeout during base protocol acquire,
> > > > > > > > > > > which leads to the probe failure. What I have observed is sometimes the
> > > > > > > > > > > failure of probe and rx_callback (due to a delayed message)
> > > > > > > > > > > happens at the same time on different cpus.
> > > > > > > > > > > Because of this race, the device memory may be cleared while the
> > > > > > > > > > > interrupt(rx_callback) is executing on another cpu.
> > > > > > > > > >
> > > > > > > > > > You are right that concurrency was not handled properly in this kind
> > > > > > > > > > of
> > > > > > > > > > context and moreover, if you think about it, even the case of out of
> > > > > > > > > > order reception of responses and delayed_responses (type2 SCMI
> > > > > > > > > > messages)
> > > > > > > > > > for asynchronous SCMI commands was not handled properly.
> > > > > > > > > >
> > > > > > > > > > > How do you propose we solve this? Do you think it is better to take the
> > > > > > > > > > > setting up of base and other protocols out of probe and
> > > > > > > > > > > in some delayed work? That would imply the device memory is not released
> > > > > > > > > > > until remove is called. Or should we add locking to
> > > > > > > > > > > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
> > > > > > > > > > > the race?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > These issues were more easily exposed by SCMI Virtio transport, so in
> > > > > > > > > > the series where I introduced scmi-virtio:
> > > > > > > > > >
> > > > > > > > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > > > > >
> > > > > > > > > > (which is now queued for v5.15 ... now on -next I think...finger
> > > > > > > > > > crossed)
> > > > > > > > > >
> > > > > > > > > > I took the chance to rectify a couple of other things in the SCMI core
> > > > > > > > > > in the initial commits.
> > > > > > > > > > As an example, in the above series
> > > > > > > > > >
> > > > > > > > > > [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and
> > > > > > > > > > out-of-order messages
> > > > > > > > > >
> > > > > > > > > > cares to add a refcount to xfers and some locking on xfers between TX
> > > > > > > > > > and RX path to avoid that a timed out xfer can vanish while the rx
> > > > > > > > > > path
> > > > > > > > > > is concurrently working on it (as you said); moreover I handle the
> > > > > > > > > > condition (rare if not unplausible anyway) in which a transport
> > > > > > > > > > delivers
> > > > > > > > > > out of order responses and delayed responses.
> > > > > > > > > >
> > > > > > > > > > I tested this scenarios on some fake emulated SCMI Virtio transport
> > > > > > > > > > where I could play any sort of mess and tricks to stress this limit
> > > > > > > > > > conditions, but you're more than welcome to verify if the race you are
> > > > > > > > > > seeing on Base protocol time out is solved (as I would hope :D) by
> > > > > > > > > > this
> > > > > > > > > > series of mine.
> > > > > > > > > >
> > > > > > > > > > Let me know, any feedback is welcome.
> > > > > > > > > >
> > > > > > > > > > Btw, in the series above there are also other minor changes, but there
> > > > > > > > > > is also another more radical change needed to ensure correctness and
> > > > > > > > > > protection against stale old messages which maybe could interest you
> > > > > > > > > > in general if you are looking into SCMI:
> > > > > > > > > >
> > > > > > > > > > [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically
> > > > > > > > > > increasing tokens
> > > > > > > > > >
> > > > > > > > > > Let me know if yo have other concerns.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Rishabhb,
> > > > > > > > >
> > > > > > > > > just a quick remark, thinking again about your fail @probe scenario
> > > > > > > > > above
> > > > > > > > > I realized that while the concurrency patch I mentioned above could help
> > > > > > > > > on
> > > > > > > > > races against vanishing xfers when late timed-out responses are
> > > > > > > > > delivered,
> > > > > > > > > here we really are then also shutting down everything on failure, so
> > > > > > > > > there
> > > > > > > > > could be further issues between a very late invokation of
> > > > > > > > > scmi_rx_callback
> > > > > > > > > and the core devm_ helpers freeing the underlying xfer/cinfo/etc..
> > > > > > > > > structs
> > > > > > > > > used by scmi-rx-callback itself (maybe this was already what you meant
> > > > > > > > > and
> > > > > > > > > I didn't get it,...sorry)
> > > > > > > > >
> > > > > > > > > On the other side, I don't feel that delaying Base init to a deferred
> > > > > > > > > worker is a viable solution since we need Base protocol init to be
> > > > > > > > > initialized and we need to just give up if we cannot communicate with
> > > > > > > > > the SCMI platform fw in such early stages. (Base protocol is really the
> > > > > > > > > only mandatory proto is I remember correctly the spec)
> > > > > > > > >
> > > > > > > > > Currenly I'm off and only glancing at mails but I'll have a thought
> > > > > > > > > about
> > > > > > > > > these issues once back in a few weeks time.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Cristian
> > > > > > > > >
> > > > > > > > Hi Cristian
> > > > > > > > I hope you enjoyed your vacation. Did you get a chance to look at
> > > > > > > > the issue
> > > > > > > > stated above and have some idea as to how to solve this?
> > > > > > >
> > > > > > > Do you still see the issue with v5.15 ? Can you please check if haven't
> > > > > > > already done that ?
> > > > > > >
> > > > > > > Also 30ms delay we have is huge IMO and we typically expect the
> > > > > > > communication
> > > > > > > with remote processor or any entity that implements SCMI to happen in
> > > > > > > terms
> > > > > > > of one or few ms tops.
> > > > > > >
> > > > > > > If there is a race, we need to fix that but I am interested in knowing
> > > > > > > why the default time of 30ms not sufficient ? Did increasing that helps
> > > > > > > and is this timeout happening only for the initial commands(guessing the
> > > > > > > SCMI firmware is not yet ready) or does it happen even during run-time ?
> > > > > >
> > > > > > Hi Sudeep
> > > > > > I haven't checked on 5.15 but after glancing at the code I believe
> > > > > > we should
> > > > > > see the same issue.
> > > > > > I agree 30ms is a big enough value and should be something that remote
> > > > > > firmware should resolve. But
> > > > > > if remote firmware goes into a bad state and not functioning
> > > > > > properly at
> > > > > > least kernel should not panic.
> > > > > >
> > > > > > The issue we see here happens during scmi probe. The response from the
> > > > > > remote agent can be delayed
> > > > > > causing a timeout during base protocol acquire, which leads to the
> > > > > > probe
> > > > > > failure.
> > > > > > What I have observed is sometimes the failure of probe and
> > > > > > rx_callback (due
> > > > > > to a delayed message)
> > > > > > happens around the same time on different cpus. Because of this
> > > > > > race, the
> > > > > > device memory may be cleared
> > > > > > while the interrupt(rx_callback) is executing on another cpu.
> > > > >
> > > > > So I was looking at the failure path you mentioned: a late concurrent
> > > > > reply on Base protocol from the fw, during the probe, leads to an
> > > > > invocation
> > > > > of scmi_rx_callback() on a different CPU while core data structs like
> > > > > cinfo are being freed by the SCMI core on the probe failure path.
> > > > > (v5.15-added SCMI concurrrency handling stuff I mentiond shuld help for
> > > > > races regarding xfer but not for the cinfo stuff in this case ...)
> > > > >
> > > > > We cannot defer Base proto init since we just wanna fail early while
> > > > > probing if not even the Base protocol can work fine, and also because
> > > > > Base protocol information are indeed needed for initial setup, so we
> > > > > cannot juts proceed if we did not even got a Base reply on the number of
> > > > > protos. (already said)
> > > > >
> > > > > In my opinion, the proper way to address this kind of races at probe
> > > > > failure should be to ensure that the transport you are using is properly
> > > > > shut down completely before cleanup starts (same applies for a clean
> > > > > remove), i.e. scmi_rx_callback should not even be possibly registered to
> > > > > be called when the the final cleanup by the core is started (devm_ frees
> > > > > I mean after scmi_probe exit failing...)
> > > > >
> > > > > BUT indeed looking back at transport layers like mailbox and virtio,
> > > > > this
> > > > > should be happening already, because the flow is like
> > > > >
> > > > > scmi_probe()
> > > > > {
> > > > > ...
> > > > >
> > > > > clean_tx_rx_setup:
> > > > > scmi_cleanup_txrx_channels()
> > > > > ....
> > > > > --->>> ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
> > > > > -
> > > > > return ret;
> > > > > }
> > > > >
> > > > > .... only after this scmi_probe returns the core devm layer starts
> > > > > freeing devm_
> > > > > allocated stuff like cinfo, AND the above per-transport specific
> > > > > .chan_free seems
> > > > > to take care to 'deactivate/dregister' the scmi_rx_callback at the
> > > > > transport layer:
> > > > >
> > > > >
> > > > > e.g. MBOX transport
> > > > > -------------------------
> > > > > static int mailbox_chan_free(int id, void *p, void *data)
> > > > > {
> > > > > struct scmi_chan_info *cinfo = p;
> > > > > struct scmi_mailbox *smbox = cinfo->transport_info;
> > > > >
> > > > > if (smbox && !IS_ERR(smbox->chan)) {
> > > > > mbox_free_channel(smbox->chan); <<< THIS MBOX CORE CALL DEACTIVATE
> > > > > cinfo->transport_info = NULL;
> > > > >
> > > > >
> > > > > e.g. VIRTIO Transport
> > > > > -----------------------------
> > > > > static int virtio_chan_free(int id, void *p, void *data)
> > > > > {
> > > > > unsigned long flags;
> > > > > struct scmi_chan_info *cinfo = p;
> > > > > struct scmi_vio_channel *vioch = cinfo->transport_info;
> > > > >
> > > > > spin_lock_irqsave(&vioch->ready_lock, flags);
> > > > > vioch->ready = false; <<<< THIS VIRTIO FLAG
> > > > > DEACTIVATE VIRTIO CBS INVOKCATION
> > > > > spin_unlock_irqrestore(&vioch->ready_lock, flags);
> > > > >
> > > > >
> > > > > ... AND both of the above call are indeed also spinlocked heavily, so
> > > > > that
> > > > > the 'deactivation' of the scmi_rx_callback should be visible properly;
> > > > > in
> > > > > other words I would expect that after the above .chan_free() have
> > > > > completed the scmi_rx_callback() cannot be called anymore, because the
> > > > > transport itself will properly drop any so-late fw reply.
> > > > >
> > > > > So I am now wondering, which transport are you using in your tests ?
> > > > > since at least for the above 2 example it seems to me that your
> > > > > race-on-probe failure condition should be already addressed by the
> > > > > transport layer itself....or am I getting wrong the nature of the race ?
> > > > >
> > > > > Thanks
> > > > > Cristian
> > > >
> > > > Hi Cristian
> > > > You caught the scenario perfectly. But there is still a possibility of a
> > > > race. To be clear we use
> > > > the mbox transport. Let me explain in more detail.
> > > > Lets assume that the last command (base protocol acquire) kernel sent to
> > > > remote agent timed out.
> > > > This will lead to final cleanup before exiting probe like you mentioned.
> > > > Once cleanup is done(mailbox_chan_free)
> > > > no more responses from remote agent will acknowledged but if the response
> > > > comes in between the cleanup in probe
> > > > and the last command timing out we will see a race since the response can
> > > > come asynchronously. In this scenario cleanup
> > > > and scmi_rx_callback race with each other.
> > > > I believe to solve this we need to synchronize cleanup with
> > > > scmi_rx_callback. we can serialize these two paths
> > > > and exit early in rx_callback if cleanup has been completed.
> > > >
> > >
> > > Yes indeed, but my concern is also not to introduce to much contention
> > > on the RX path (with irqsave spinlocking & friends), given that this
> > > racy
> > > scenario has surely to be handled but it is also highly unlikely, so
> > > I don't
> > > want to slow down all the rx path all the time.
> > >
> > > So I tried something along this lines:
> > >
> > > ----8<------
> > > diff --git a/drivers/firmware/arm_scmi/common.h
> > > b/drivers/firmware/arm_scmi/common.h
> > > index dea1bfbe1052..036f8ccff450 100644
> > > --- a/drivers/firmware/arm_scmi/common.h
> > > +++ b/drivers/firmware/arm_scmi/common.h
> > > @@ -340,11 +340,13 @@ void scmi_protocol_release(const struct
> > > scmi_handle *handle, u8 protocol_id);
> > > * channel
> > > * @handle: Pointer to SCMI entity handle
> > > * @transport_info: Transport layer related information
> > > + * @users: A refcount to track active users of this channel
> > > */
> > > struct scmi_chan_info {
> > > struct device *dev;
> > > struct scmi_handle *handle;
> > > void *transport_info;
> > > + refcount_t users;
> > > };
> > >
> > > /**
> > > diff --git a/drivers/firmware/arm_scmi/driver.c
> > > b/drivers/firmware/arm_scmi/driver.c
> > > index b406b3f78f46..5814ed3f444e 100644
> > > --- a/drivers/firmware/arm_scmi/driver.c
> > > +++ b/drivers/firmware/arm_scmi/driver.c
> > > @@ -678,6 +678,16 @@ static void scmi_handle_response(struct
> > > scmi_chan_info *cinfo,
> > > scmi_xfer_command_release(info, xfer);
> > > }
> > >
> > > +static inline bool scmi_acquire_channel(struct scmi_chan_info *cinfo)
> > > +{
> > > + return refcount_inc_not_zero(&cinfo->users);
> > > +}
> > > +
> > > +static inline void scmi_release_channel(struct scmi_chan_info *cinfo)
> > > +{
> > > + return refcount_dec(&cinfo->users);
> > > +}
> > > +
> > > /**
> > > * scmi_rx_callback() - callback for receiving messages
> > > *
> > > @@ -695,6 +705,10 @@ void scmi_rx_callback(struct scmi_chan_info
> > > *cinfo, u32 msg_hdr, void *priv)
> > > {
> > > u8 msg_type = MSG_XTRACT_TYPE(msg_hdr);
> > >
> > > + /* Bail out if channel freed already */
> > > + if (!scmi_acquire_channel(cinfo))
> > > + return;
> > > +
> > > switch (msg_type) {
> > > case MSG_TYPE_NOTIFICATION:
> > > scmi_handle_notification(cinfo, msg_hdr, priv);
> > > @@ -707,6 +721,8 @@ void scmi_rx_callback(struct scmi_chan_info
> > > *cinfo, u32 msg_hdr, void *priv)
> > > WARN_ONCE(1, "received unknown msg_type:%d\n", msg_type);
> > > break;
> > > }
> > > +
> > > + scmi_release_channel(cinfo);
> > > }
> > >
> > > /**
> > > @@ -1506,10 +1522,27 @@ static int scmi_chan_setup(struct scmi_info
> > > *info, struct device *dev,
> > > return ret;
> > > }
> > >
> > > + refcount_set(&cinfo->users, 1);
> > > cinfo->handle = &info->handle;
> > > return 0;
> > > }
> > >
> > > +static int scmi_chan_free(int id, void *p, void *data)
> > > +{
> > > + struct scmi_chan_info *cinfo = p;
> > > + struct scmi_info *info = handle_to_scmi_info(cinfo->handle);
> > > +
> > > + if (refcount_dec_and_test(&cinfo->users)) {
> > > + info->desc->ops->chan_free(id, cinfo, data);
> > > + } else {
> > > + /* Stall till the ongoing rx_callback completes */
> > > + spin_until_cond(refcount_read(&cinfo->users) == 0);
> > > + info->desc->ops->chan_free(id, cinfo, data);
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > static inline int
> > > scmi_txrx_setup(struct scmi_info *info, struct device *dev, int
> > > prot_id)
> > > {
> > > @@ -1792,11 +1825,11 @@ static int scmi_cleanup_txrx_channels(struct
> > > scmi_info *info)
> > > int ret;
> > > struct idr *idr = &info->tx_idr;
> > >
> > > - ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
> > > + ret = idr_for_each(idr, scmi_chan_free, idr);
> > > idr_destroy(&info->tx_idr);
> > >
> > > idr = &info->rx_idr;
> > > - ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
> > > + ret = idr_for_each(idr, scmi_chan_free, idr);
> > > idr_destroy(&info->rx_idr);
> > >
> > > return ret;
> > >
> > > ------8<-----
> > >
> > > Can you give it a go on your setup ?
> > >
> > > Beware it is not really tested on the racy error path (:P) and I
> > > could have
> > > still missed something regarding synchro (and I expect an undesired
> > > refcount
> > > warn on the scmi_release_channel too when the race is hit....but
> > > just to
> > > experiment a bit for now and see if something like this could be
> > > enough while
> > > avoiding further locking)
> > > o
> >
> > Looking back at this patch of mine, even though it could work for the
> > racy issue at
> > hand, it is currently clearly completely broken on the regular
> > unload/free
> > flow since cinfo structs can be re-used multiple times.
> > Sorry, please ignore this attempt, I'll rework in a more sensible way.
> >
> > Thanks,
> > Cristian
> Hi Cristian

Hi Rishabh,

> Thanks for looking into this. One more issue I see with this is cinfo is
> allocated
> as part of device memory so it will be freed when probe exit path completes.
> So we can't
> really be using that to check the refcount I guess? IMO this should be a
> global memory variable.

If you mean that is allocated using devm_ managed helpers, yes it is, but
that should not be a problem: my idea would be, on the error/cleanup path,
to inhibit any furrther call by deactivating the transport as it is now but
also stall the cleanup functions waiting for any other possiloy pending
already executing user to complete (if any user pending at all...like
scmi_rx_callback running concurrently on another core): we should not let
the probe complete until all active users are gone (with scmi_rcx_callback
bailing out early) or inhibited by deregistering the transport layer (as it
is now): because it is when the probe fail-completes that the devm_ helpers
(rightly) kick in and can free all: at that point the module could also
be unloaded as a whole...once probe() has finished with an error is game
over for us (am I wrong ?)

Moreover cinfo is the main data struct used by scmi_rx_callback, it is
mainly what we do NOT want to vanish too early on the error-path, i.e. we
are receiving something on a channel so it seems natural to me to refcount
it (I just have to figure out where and precisely how :P...)...consider also
that you can have a number of channels and also a number of server instances
(even though not really officially supported :D) so it'd be harder for me to use
a global refcount.

Thanks for your feedback testing and suggestions, I'll wrap my brain around this
soon-ish and get back to you for testing anyway ... (or if you want to propose something
alomng these lines -- with very limited or no spinlocking -- please by my guest :D)

Thanks
Cristian