From: Jassi Brar <[email protected]>
Currently scmi_do_xfer() submits a message to the mailbox API and waits
for an apparently very short time. This works if there are not many
messages already in the queue. However, if many clients share a
channel and/or each client submits many messages in a row, the
timeout proves too short and an error is returned even though the
mailbox is working fine given the load. The timeout expires while the
message is still in the API queue, awaiting its turn to ride the bus.

Fix this by increasing the timeout value enough (500ms?) so that it
fails only if there is an actual problem in the transmission (like a
lockup or crash).

[If we want to capture a situation where the remote didn't
respond within the expected latency, then the timeout should not
start here, but from the tx_prepare callback ... just before the
message physically gets on the channel]
Signed-off-by: Jassi Brar <[email protected]>
---
drivers/firmware/arm_scmi/driver.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index dbec767222e9..46ddafe7ffc0 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -303,7 +303,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
}
if (xfer->hdr.poll_completion) {
- ktime_t stop = ktime_add_ns(ktime_get(), SCMI_MAX_POLL_TO_NS);
+ ktime_t stop = ktime_add_ns(ktime_get(), 500 * 1000 * NSEC_PER_USEC);
spin_until_cond(scmi_xfer_done_no_timeout(cinfo, xfer, stop));
@@ -313,7 +313,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
ret = -ETIMEDOUT;
} else {
/* And we wait for the response. */
- timeout = msecs_to_jiffies(info->desc->max_rx_timeout_ms);
+ timeout = msecs_to_jiffies(500);
if (!wait_for_completion_timeout(&xfer->done, timeout)) {
dev_err(dev, "timed out in resp(caller: %pS)\n",
(void *)_RET_IP_);
--
2.25.1
On Sun, Jun 07, 2020 at 02:30:23PM -0500, [email protected] wrote:
> From: Jassi Brar <[email protected]>
>
> Currently scmi_do_xfer() submits a message to mailbox api and waits
> for an apparently very short time. This works if there are not many
> messages in the queue already. However, if many clients share a
> channel and/or each client submits many messages in a row, the
The recommendation in such scenarios is to use multiple channels.
> timeout value becomes too short and returns error even if the mailbox
> is working fine according to the load. The timeout occurs when the
> message is still in the api/queue awaiting its turn to ride the bus.
>
> Fix this by increasing the timeout value enough (500ms?) so that it
> fails only if there is an actual problem in the transmission (like a
> lockup or crash).
>
> [If we want to capture a situation when the remote didn't
> respond within expected latency, then the timeout should not
> start here, but from tx_prepare callback ... just before the
> message physically gets on the channel]
>
The bottleneck may not be in the remote. It may be the mailbox serialising
the requests even when they could be handled in parallel.
> Signed-off-by: Jassi Brar <[email protected]>
> ---
> drivers/firmware/arm_scmi/driver.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
> index dbec767222e9..46ddafe7ffc0 100644
> --- a/drivers/firmware/arm_scmi/driver.c
> +++ b/drivers/firmware/arm_scmi/driver.c
> @@ -303,7 +303,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
> }
>
> if (xfer->hdr.poll_completion) {
> - ktime_t stop = ktime_add_ns(ktime_get(), SCMI_MAX_POLL_TO_NS);
> + ktime_t stop = ktime_add_ns(ktime_get(), 500 * 1000 * NSEC_PER_USEC);
>
This is an unacceptable delay for schedutil fast_switch. So no for this one.
> spin_until_cond(scmi_xfer_done_no_timeout(cinfo, xfer, stop));
>
> @@ -313,7 +313,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
> ret = -ETIMEDOUT;
> } else {
> /* And we wait for the response. */
> - timeout = msecs_to_jiffies(info->desc->max_rx_timeout_ms);
> + timeout = msecs_to_jiffies(500);
In general, this hides issues in the remote. We are trying to move towards
at most 1ms per request, and with MBOX_QUEUE at 20, 20ms is more than
big enough. We have it set to 30ms now. 500ms is way too large and not
required IMO.
--
Regards,
Sudeep
On Wed, Jun 10, 2020 at 3:23 AM Sudeep Holla <[email protected]> wrote:
>
> On Sun, Jun 07, 2020 at 02:30:23PM -0500, [email protected] wrote:
> > From: Jassi Brar <[email protected]>
> >
> > Currently scmi_do_xfer() submits a message to mailbox api and waits
> > for an apparently very short time. This works if there are not many
> > messages in the queue already. However, if many clients share a
> > channel and/or each client submits many messages in a row, the
>
> The recommendation in such scenarios is to use multiple channel.
>
If SCMI is to be accepted as a standard (which I hope), it has to
support most kinds of controllers, but currently the implementation is
too myopic. It is only a matter of time before someone sees value in
reusing the firmware implementation (SCMI) but does not have an MHU-like
controller.
> > timeout value becomes too short and returns error even if the mailbox
> > is working fine according to the load. The timeout occurs when the
> > message is still in the api/queue awaiting its turn to ride the bus.
> >
> > Fix this by increasing the timeout value enough (500ms?) so that it
> > fails only if there is an actual problem in the transmission (like a
> > lockup or crash).
> >
> > [If we want to capture a situation when the remote didn't
> > respond within expected latency, then the timeout should not
> > start here, but from tx_prepare callback ... just before the
> > message physically gets on the channel]
> >
>
> The bottle neck may not be in the remote. It may be mailbox serialising
> the requests even when it can parallelise.
>
Your logs show that, in your test case, using 1 physical channel gives
better transfer rates (for those transfers that complete) than virtual
channels. The transfers that fail do so purely because of this short
timeout.
> >
> > if (xfer->hdr.poll_completion) {
> > - ktime_t stop = ktime_add_ns(ktime_get(), SCMI_MAX_POLL_TO_NS);
> > + ktime_t stop = ktime_add_ns(ktime_get(), 500 * 1000 * NSEC_PER_USEC);
> >
>
> This is unacceptable delay for schedutil fast_switch. So no for this one.
>
Increasing the timeout does not increase latency.
Also, scmi_do_xfer() cannot know whether it was reached from the fast_switch path.
If a platform has so many users over a channel that it cannot
guarantee low enough latency, then it must not set the
fast_switch_possible flag, which is optional for this reason.
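For illustration only (scmi_channel_latency_ok() below is a made-up helper,
not an existing SCMI API), the cpufreq backend for such a platform could
simply refrain from advertising fast switching:

#include <linux/cpufreq.h>

/* Made-up helper: whether the channel can meet fast_switch latency. */
static bool scmi_channel_latency_ok(void)
{
	return false;	/* a real check would query the transport/platform */
}

static int example_cpufreq_init(struct cpufreq_policy *policy)
{
	/* ...usual perf-domain and frequency-table setup elided... */

	/*
	 * Advertise fast switching only when the channel can complete a
	 * perf request within the scheduler's latency expectations.
	 */
	policy->fast_switch_possible = scmi_channel_latency_ok();

	return 0;
}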
> > @@ -313,7 +313,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
> > ret = -ETIMEDOUT;
> > } else {
> > /* And we wait for the response. */
> > - timeout = msecs_to_jiffies(info->desc->max_rx_timeout_ms);
> > + timeout = msecs_to_jiffies(500);
>
> In general, this hides issues in the remote.
>
If you want to uncover remote issues, start the timeout in
tx_prepare() because that is when the message is physically sent to
the remote.
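Roughly something like the sketch below - the struct and field names are
made up just to show the idea, not taken from the current driver:

#include <linux/ktime.h>
#include <linux/mailbox_client.h>

/* Made-up transfer record; the real driver keeps this in struct scmi_xfer. */
struct example_xfer {
	ktime_t sent_at;	/* hypothetical field: when the message hit the wire */
};

static void example_tx_prepare(struct mbox_client *cl, void *m)
{
	struct example_xfer *xfer = m;

	/*
	 * The mailbox is about to put this message on the channel:
	 * start the clock here, not at submission time.
	 */
	xfer->sent_at = ktime_get();
}

The waiter would then compare ktime_get() against xfer->sent_at plus the
expected remote latency, instead of against the moment the message was
queued.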
> We are trying to move towards
> tops 1ms for a request and with MBOX_QUEUE at 20, I see 20ms is more that
> big enough. We have it set to 30ms now. 500ms is way too large and not
> required IMO.
>
Again, increasing the timeout does not slow the system down. It is meant to
support a wider variety of platform setups.
Cheers!
On Wed, Jun 10, 2020 at 10:21:19AM -0500, Jassi Brar wrote:
> On Wed, Jun 10, 2020 at 3:23 AM Sudeep Holla <[email protected]> wrote:
> >
> > On Sun, Jun 07, 2020 at 02:30:23PM -0500, [email protected] wrote:
> > > From: Jassi Brar <[email protected]>
> > >
> > > Currently scmi_do_xfer() submits a message to mailbox api and waits
> > > for an apparently very short time. This works if there are not many
> > > messages in the queue already. However, if many clients share a
> > > channel and/or each client submits many messages in a row, the
> >
> > The recommendation in such scenarios is to use multiple channel.
> >
> If SCMI is to be accepted as a standard (which I hope), it has to
> support most kinds of controllers, but currently the implementation is
> too myopic. It is only a matter of time, when someone sees value in
> reusing firmware implementation (scmi) but does not have a MHU like
> controller.
>
It is being used with other transports like smc/hvc and virtio.
But I agree, this experiment made me realise we need to make a single
channel work by disabling certain features like fast_switch. I will
work on that and push a solution. Thanks for asking for the traces;
having stared at them for some time, I see some issues, but they are
orthogonal to this one. Fixing them won't solve the issue we are
discussing though.

But that said, that is not the solution for Juno/MHU. We can parallelise
there with multiple requests and we should do so.
> > > timeout value becomes too short and returns error even if the mailbox
> > > is working fine according to the load. The timeout occurs when the
> > > message is still in the api/queue awaiting its turn to ride the bus.
> > >
> > > Fix this by increasing the timeout value enough (500ms?) so that it
> > > fails only if there is an actual problem in the transmission (like a
> > > lockup or crash).
> > >
> > > [If we want to capture a situation when the remote didn't
> > > respond within expected latency, then the timeout should not
> > > start here, but from tx_prepare callback ... just before the
> > > message physically gets on the channel]
> > >
> >
> > The bottle neck may not be in the remote. It may be mailbox serialising
> > the requests even when it can parallelise.
> >
> Your logs show (in your test case), using 1 physical channel shows
> better transfer (those that complete) rates than virtual channels.
Indeed, that is expected. It is like comparing throughput with 1 vs 2 CPUs
under a multi-threaded load. The remote is now handling 2 requests at
a time, and it clearly gives DVFS priority, which will show up as
slightly higher latency for other requests like sensors.
> The transfers that fail are purely because of this short timeout.
>
> > >
> > > if (xfer->hdr.poll_completion) {
> > > - ktime_t stop = ktime_add_ns(ktime_get(), SCMI_MAX_POLL_TO_NS);
> > > + ktime_t stop = ktime_add_ns(ktime_get(), 500 * 1000 * NSEC_PER_USEC);
> > >
> >
> > This is unacceptable delay for schedutil fast_switch. So no for this one.
> >
> Increasing timeout does not increase latency.
Agreed, but in the worst case you may be stuck here for 500ms, which is not
acceptable. That's what I meant, not that the request will take 500ms.
Sorry if I was not clear on that earlier.
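For reference, the polling wait boils down to roughly the busy-wait below
(a sketch, with the completion check reduced to a callback), so the worst
case is the calling CPU spinning for the whole deadline:

#include <linux/ktime.h>
#include <linux/processor.h>

/*
 * Sketch of what spin_until_cond() amounts to here. done() stands in for
 * scmi_xfer_done_no_timeout(), which returns true once the transfer
 * completes or the deadline passes.
 */
static void example_poll(bool (*done)(ktime_t stop))
{
	ktime_t stop = ktime_add_ms(ktime_get(), 500);	/* proposed deadline */

	while (!done(stop))
		cpu_relax();	/* busy-waiting: worst case is 500ms of spinning */
}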
> Also scmi_xfer() can not know if it was reached from the fast_switch path.
>
> If a platform has many users over a channel such that it can not
> guarantee low enough latency, then it must not set the
> fast_switch_possible flag, which is optional for this reason.
>
Yes, that's what I am trying to explore, and that's what I meant above
when I mentioned I see some issues. I have hacked that up and checked: it
doesn't change much; the timeout still happens, but only under a heavier
load and not in the simpler use-case I showed in my traces. In short,
having multiple channels helps. And we have been rather fixated on Tx in
our discussions; there is more fun with Rx and its serialisation, as that
impacts the remote firmware too.
>
> > > @@ -313,7 +313,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
> > > ret = -ETIMEDOUT;
> > > } else {
> > > /* And we wait for the response. */
> > > - timeout = msecs_to_jiffies(info->desc->max_rx_timeout_ms);
> > > + timeout = msecs_to_jiffies(500);
> >
> > In general, this hides issues in the remote.
> >
> If you want to uncover remote issues, start the timeout in
> tx_prepare() because that is when the message is physically sent to
> the remote.
>
In that case we need to set it to 1ms, as I mentioned earlier. The current
timeout of 30ms is for MBOX_MAX_LEN=20, which gives more than 1ms for each
message, and that's what we are targeting. I see no point in just changing
the timeout since, as you already mentioned above, it does not change the
latency anyway.
> > We are trying to move towards
> > tops 1ms for a request and with MBOX_QUEUE at 20, I see 20ms is more that
> > big enough. We have it set to 30ms now. 500ms is way too large and not
> > required IMO.
> >
> Again, increasing timeout does not slow the system down. It is to
> support more variety of platform setups.
>
Agreed, and I have acknowledged that. 30ms was chosen based on experiments,
and we are also trying to achieve at most 1ms per message. If some platform
has a serious limitation, desc->max_rx_timeout_ms is configurable. We can
identify the platform and add specific timings for it. The same is true for
other parameters like max_len and max_msg_size. If the default is not
suitable, it can be changed.
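For illustration only - the second compatible string and its values are
made up, and the struct is a stand-in for the real one in
drivers/firmware/arm_scmi/common.h - a platform-specific descriptor would
look roughly like:

#include <linux/of.h>

/* Stand-in for struct scmi_desc; field names follow this thread and the diff. */
struct example_scmi_desc {
	int max_rx_timeout_ms;
	int max_msg;
	int max_msg_size;
};

static const struct example_scmi_desc generic_desc = {
	.max_rx_timeout_ms = 30,	/* the current default discussed above */
	.max_msg = 20,
	.max_msg_size = 128,		/* illustrative size */
};

/* Hypothetical platform that genuinely needs more headroom. */
static const struct example_scmi_desc slow_transport_desc = {
	.max_rx_timeout_ms = 100,
	.max_msg = 20,
	.max_msg_size = 128,
};

static const struct of_device_id example_scmi_of_match[] = {
	{ .compatible = "arm,scmi", .data = &generic_desc },
	{ .compatible = "vendor,slow-scmi", .data = &slow_transport_desc },	/* made-up */
	{ /* sentinel */ },
};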
--
Regards,
Sudeep
On Wed, Jun 10, 2020 at 10:56 AM Sudeep Holla <[email protected]> wrote:
[I admit you can write bigger posts than me, so I am not going to
write a passionate response to each of your paragraphs.
Let's keep it to the point.]
> > > > if (xfer->hdr.poll_completion) {
> > > > - ktime_t stop = ktime_add_ns(ktime_get(), SCMI_MAX_POLL_TO_NS);
> > > > + ktime_t stop = ktime_add_ns(ktime_get(), 500 * 1000 * NSEC_PER_USEC);
> > > >
> > >
> > > This is unacceptable delay for schedutil fast_switch. So no for this one.
> > >
> > Increasing timeout does not increase latency.
>
> Agreed, but worst case you may be stuck here for 500ms which is not
> acceptable.
>
Not acceptable to whom, you or the kernel? :) Especially now that you said
you are fixing SCMI's fast_switch implementation.

Even though I don't think 500ms would ruin our lives, ok, I will
make it 30ms - the same as you have in the 'else' block. And drop the other
change.
Thanks.
On Wed, Jun 10, 2020 at 09:45:55PM -0500, Jassi Brar wrote:
> On Wed, Jun 10, 2020 at 10:56 AM Sudeep Holla <[email protected]> wrote:
>
> [I admit you can write bigger posts than me, so I am not going to
> write a passionate response to each of your paragraphs.
> Let's keep it to the point.]
>
> > > > > if (xfer->hdr.poll_completion) {
> > > > > - ktime_t stop = ktime_add_ns(ktime_get(), SCMI_MAX_POLL_TO_NS);
> > > > > + ktime_t stop = ktime_add_ns(ktime_get(), 500 * 1000 * NSEC_PER_USEC);
> > > > >
> > > >
> > > > This is unacceptable delay for schedutil fast_switch. So no for this one.
> > > >
> > > Increasing timeout does not increase latency.
> >
> > Agreed, but worst case you may be stuck here for 500ms which is not
> > acceptable.
> >
> Not acceptable to who, you or the kernel? :) Now that you said you
> are fixing the scmi's fast_switch implementation.
>
Sorry, I meant disabling it for the single-channel implementation. I am not
saying we want that on Juno/MHU.
> Even though I don't think 500ms would ruin our lives, but ok, I will
> make it 30ms - same as you did in the 'else' block. And drop the other
> change.
I am fine with that if the cpufreq maintainers allow such a wait in the
fast_switch path, which runs in the scheduler's fast path.
--
Regards,
Sudeep
On Thu, Jun 11, 2020 at 3:40 AM Sudeep Holla <[email protected]> wrote:
> >
> > > > > > if (xfer->hdr.poll_completion) {
> > > > > > - ktime_t stop = ktime_add_ns(ktime_get(), SCMI_MAX_POLL_TO_NS);
> > > > > > + ktime_t stop = ktime_add_ns(ktime_get(), 500 * 1000 * NSEC_PER_USEC);
> > > > > >
> > > > >
> > > > > This is unacceptable delay for schedutil fast_switch. So no for this one.
> > > > >
> > > > Increasing timeout does not increase latency.
> > >
> > > Agreed, but worst case you may be stuck here for 500ms which is not
> > > acceptable.
> > >
> > Not acceptable to who, you or the kernel? :) Now that you said you
> > are fixing the scmi's fast_switch implementation.
> >
> Sorry, I meant to disable it for single channel implementation.
>
A single-channel platform may have only cpufreq as a user, or only
users that respond quickly (~3us). So maybe we leave the decision to
enable fast_switch entirely to the platform. But of course, that
discussion is for another thread.
> > Even though I don't think 500ms would ruin our lives, but ok, I will
> > make it 30ms - same as you did in the 'else' block. And drop the other
> > change.
>
> I am fine if cpufreq maintainers allow that in the fast switch path that
> happens in the fast path.
>
Thanks, let me respin the patch and include some cpufreq folks.
Cheers!