2023-01-11 14:32:23

by Conor Dooley

[permalink] [raw]
Subject: [PATCH v1 0/7] MPFS system controller/mailbox fixes

Hey Jassi, all,

Here are some fixes for the system controller on PolarFire SoC that I
ran into while implementing support for using the system controller to
re-program the FPGA. A few are just minor bits that I fixed in passing,
but the bulk of the patchset is changes to how the mailbox figures out
if a "service" has completed.

Prior to implementing this particular functionality, the services
requested from the system controller, via its mailbox interface, always
triggered an interrupt when the system controller was finished with
the service.

Unfortunately some of the services used to validate the FPGA images
before programming them do not trigger an interrupt if they fail.
For example, the service that checks whether an FPGA image is actually
a newer version than what is already programmed, does not trigger an
interrupt, unless the image is actually newer than the one currently
programmed. If it has an earlier version, no interrupt is triggered
and a status is set in the system controller's status register to
signify the reason for the failure.

In order to differentiate between the service succeeding & the system
controller being inoperative or otherwise unable to function, I had to
switch the controller to poll a busy bit in the system controller's
registers to see if it has completed a service.
This makes sense anyway, as the interrupt corresponds to "data ready"
rather than "tx done", so I have changed the mailbox controller driver
to do that & left the interrupt solely for signalling data ready.
It just so happened that all of the services that I had worked with and
tested up to this point were "infallible" & did not set a status, so the
particular code paths were never tested.

Jassi, the mailbox and soc patches depend on each other, as the change
in what the interrupt is used for requires changing the client driver's
behaviour too, as mbox_send_message() will now return when the system
controller is no longer busy rather than when the data is ready.
I'm happy to send the lot via the soc tree with your Ack and/or reivew,
if that also works you?

Secondly, I have a question about what to do if a service does fail, but
not due to a timeout - eg the above example where the "new" image for
the FPGA is actually older than the one that currently exists.
Ideally, if a service fails due to something other than the transaction
timing out, I would go and read the status registers to see what the
cause of failure was.
I could not find a function in the mailbox framework that allows the
client to request that sort of information from the client. Trying to
do something with the auxiliary bus, or exporting some function to a
device specific header seemed like a circumvention of the mailbox
framework.
Do you think it would be a good idea to implement something like
mbox_client_peek_status(struct mbox_chan *chan, void *data) to allow
clients to request this type of information?

It'd certainly allow me to report the actual errors to the drivers
implementing the service & make better decisions there about how to
proceed.
Perhaps I have missed some way of doing this kind of thing that (should
have been) staring me in the face!

Thanks,
Conor.

CC: Conor Dooley <[email protected]>
CC: Daire McNamara <[email protected]>
CC: Jassi Brar <[email protected]>
CC: [email protected]
CC: [email protected]

Conor Dooley (7):
mailbox: mpfs: fix an incorrect mask width
mailbox: mpfs: switch to txdone_poll
mailbox: mpfs: ditch a useless busy check
soc: microchip: mpfs: fix some horrible alignment
soc: microchip: mpfs: use a consistent completion timeout
soc: microchip: mpfs: simplify error handling in
mpfs_blocking_transaction()
soc: microchip: mpfs: handle timeouts and failed services differently

drivers/mailbox/mailbox-mpfs.c | 25 +++++++----
drivers/soc/microchip/mpfs-sys-controller.c | 48 +++++++++++++--------
2 files changed, 47 insertions(+), 26 deletions(-)


base-commit: 88603b6dc419445847923fcb7fe5080067a30f98
--
2.39.0


2023-01-11 14:32:43

by Conor Dooley

[permalink] [raw]
Subject: [PATCH v1 1/7] mailbox: mpfs: fix an incorrect mask width

The system controller registers on PolarFire SoC are 32 bits wide, so
16 + 16 as the first input to GENMASK_ULL() gives a 33 bit wide mask.
It probably should have been immediately obvious when it was pointed
out during review that the width required using GENMASK_ULL() - but I
scarcely knew what I was doing at the time and missed it.
The mistake ends up being moot as it is a mask after all, but it is
incorrect and should be fixed.

No functional change intended.

Signed-off-by: Conor Dooley <[email protected]>
---
drivers/mailbox/mailbox-mpfs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/mailbox/mailbox-mpfs.c b/drivers/mailbox/mailbox-mpfs.c
index 853901acaeec..d37560e91116 100644
--- a/drivers/mailbox/mailbox-mpfs.c
+++ b/drivers/mailbox/mailbox-mpfs.c
@@ -39,7 +39,7 @@
#define SCB_CTRL_NOTIFY_MASK BIT(SCB_CTRL_NOTIFY)

#define SCB_CTRL_POS (16)
-#define SCB_CTRL_MASK GENMASK_ULL(SCB_CTRL_POS + SCB_MASK_WIDTH, SCB_CTRL_POS)
+#define SCB_CTRL_MASK GENMASK(SCB_CTRL_POS + SCB_MASK_WIDTH - 1, SCB_CTRL_POS)

/* SCBCTRL service status register */

@@ -118,6 +118,7 @@ static int mpfs_mbox_send_data(struct mbox_chan *chan, void *data)
}

opt_sel = ((msg->mbox_offset << 7u) | (msg->cmd_opcode & 0x7fu));
+
tx_trigger = (opt_sel << SCB_CTRL_POS) & SCB_CTRL_MASK;
tx_trigger |= SCB_CTRL_REQ_MASK | SCB_STATUS_NOTIFY_MASK;
writel_relaxed(tx_trigger, mbox->ctrl_base + SERVICES_CR_OFFSET);
--
2.39.0

2023-01-11 14:34:41

by Conor Dooley

[permalink] [raw]
Subject: [PATCH v1 7/7] soc: microchip: mpfs: handle timeouts and failed services differently

The system controller will only deliver an interrupt if a service
succeeds. This leaves us in the unfortunate position with current code
where there is no way to differentiate between a legitimate timeout
where the service has not completed & where it has completed, but
failed.

mbox_send_message() has its own completion, and it will time out of the
system controller does not lower the busy flag. In this case, a timeout
has occurred and the error can be propagated back to the caller.

If the busy flag is lowered, but no interrupt has arrived to trigger the
rx callback, the service can be deemed to have failed. Report EBADMSG in
this case so that callers can differentiate.

Signed-off-by: Conor Dooley <[email protected]>
---
drivers/soc/microchip/mpfs-sys-controller.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/soc/microchip/mpfs-sys-controller.c b/drivers/soc/microchip/mpfs-sys-controller.c
index 4aadd05769d2..75422db75fe5 100644
--- a/drivers/soc/microchip/mpfs-sys-controller.c
+++ b/drivers/soc/microchip/mpfs-sys-controller.c
@@ -41,14 +41,25 @@ int mpfs_blocking_transaction(struct mpfs_sys_controller *sys_controller, struct
reinit_completion(&sys_controller->c);

ret = mbox_send_message(sys_controller->chan, msg);
- if (ret < 0)
+ if (ret < 0) {
+ dev_warn(sys_controller->client.dev,
+ "MPFS sys controller service timeout\n");
goto out;
+ }

ret = 0; /* mbox_send_message returns postitive integers on success */
+
+ /*
+ * Unfortunately, the system controller will only deliver an interrupt
+ * if a service succeeds. mbox_send_message() will block until the busy
+ * flag is gone. If the busy flag is gone but no interrupt has arrived
+ * to trigger the rx callback then the service can be deemed to have
+ * failed.
+ */
if (!wait_for_completion_timeout(&sys_controller->c, timeout)) {
- ret = -ETIMEDOUT;
+ ret = -EBADMSG;
dev_warn(sys_controller->client.dev,
- "MPFS sys controller transaction timeout\n");
+ "MPFS sys controller service failed\n");
}

out:
@@ -106,6 +117,7 @@ static int mpfs_sys_controller_probe(struct platform_device *pdev)
sys_controller->client.dev = dev;
sys_controller->client.rx_callback = rx_callback;
sys_controller->client.tx_block = 1U;
+ sys_controller->client.tx_tout = msecs_to_jiffies(MPFS_SYS_CTRL_TIMEOUT_MS);

sys_controller->chan = mbox_request_channel(&sys_controller->client, 0);
if (IS_ERR(sys_controller->chan)) {
--
2.39.0

2023-01-18 14:27:34

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v1 0/7] MPFS system controller/mailbox fixes

On Wed, Jan 11, 2023 at 01:45:06PM +0000, Conor Dooley wrote:
> Hey Jassi, all,
>
> Here are some fixes for the system controller on PolarFire SoC that I
> ran into while implementing support for using the system controller to
> re-program the FPGA. A few are just minor bits that I fixed in passing,
> but the bulk of the patchset is changes to how the mailbox figures out
> if a "service" has completed.
>
> Prior to implementing this particular functionality, the services
> requested from the system controller, via its mailbox interface, always
> triggered an interrupt when the system controller was finished with
> the service.
>
> Unfortunately some of the services used to validate the FPGA images
> before programming them do not trigger an interrupt if they fail.
> For example, the service that checks whether an FPGA image is actually
> a newer version than what is already programmed, does not trigger an
> interrupt, unless the image is actually newer than the one currently
> programmed. If it has an earlier version, no interrupt is triggered
> and a status is set in the system controller's status register to
> signify the reason for the failure.

I think, with how things currently are, the timeout is insufficient.
I've noticed some services taking longer (significantly) longer than what
I have provisioned for.

I'll try to find an upper bound and respin a v2. Questions below are
still valid either way!

Thanks,
Conor.

> In order to differentiate between the service succeeding & the system
> controller being inoperative or otherwise unable to function, I had to
> switch the controller to poll a busy bit in the system controller's
> registers to see if it has completed a service.
> This makes sense anyway, as the interrupt corresponds to "data ready"
> rather than "tx done", so I have changed the mailbox controller driver
> to do that & left the interrupt solely for signalling data ready.
> It just so happened that all of the services that I had worked with and
> tested up to this point were "infallible" & did not set a status, so the
> particular code paths were never tested.
>
> Jassi, the mailbox and soc patches depend on each other, as the change
> in what the interrupt is used for requires changing the client driver's
> behaviour too, as mbox_send_message() will now return when the system
> controller is no longer busy rather than when the data is ready.
> I'm happy to send the lot via the soc tree with your Ack and/or reivew,
> if that also works you?
>
> Secondly, I have a question about what to do if a service does fail, but
> not due to a timeout - eg the above example where the "new" image for
> the FPGA is actually older than the one that currently exists.
> Ideally, if a service fails due to something other than the transaction
> timing out, I would go and read the status registers to see what the
> cause of failure was.
> I could not find a function in the mailbox framework that allows the
> client to request that sort of information from the client. Trying to
> do something with the auxiliary bus, or exporting some function to a
> device specific header seemed like a circumvention of the mailbox
> framework.
> Do you think it would be a good idea to implement something like
> mbox_client_peek_status(struct mbox_chan *chan, void *data) to allow
> clients to request this type of information?
>
> It'd certainly allow me to report the actual errors to the drivers
> implementing the service & make better decisions there about how to
> proceed.
> Perhaps I have missed some way of doing this kind of thing that (should
> have been) staring me in the face!
>
> Thanks,
> Conor.
>
> CC: Conor Dooley <[email protected]>
> CC: Daire McNamara <[email protected]>
> CC: Jassi Brar <[email protected]>
> CC: [email protected]
> CC: [email protected]
>
> Conor Dooley (7):
> mailbox: mpfs: fix an incorrect mask width
> mailbox: mpfs: switch to txdone_poll
> mailbox: mpfs: ditch a useless busy check
> soc: microchip: mpfs: fix some horrible alignment
> soc: microchip: mpfs: use a consistent completion timeout
> soc: microchip: mpfs: simplify error handling in
> mpfs_blocking_transaction()
> soc: microchip: mpfs: handle timeouts and failed services differently
>
> drivers/mailbox/mailbox-mpfs.c | 25 +++++++----
> drivers/soc/microchip/mpfs-sys-controller.c | 48 +++++++++++++--------
> 2 files changed, 47 insertions(+), 26 deletions(-)
>
>
> base-commit: 88603b6dc419445847923fcb7fe5080067a30f98
> --
> 2.39.0
>


Attachments:
(No filename) (4.58 kB)
signature.asc (235.00 B)
Download all attachments

2023-01-21 17:31:24

by Jassi Brar

[permalink] [raw]
Subject: Re: [PATCH v1 0/7] MPFS system controller/mailbox fixes

On Wed, Jan 11, 2023 at 7:45 AM Conor Dooley <[email protected]> wrote:
>
> In order to differentiate between the service succeeding & the system
> controller being inoperative or otherwise unable to function, I had to
> switch the controller to poll a busy bit in the system controller's
> registers to see if it has completed a service.
> This makes sense anyway, as the interrupt corresponds to "data ready"
> rather than "tx done", so I have changed the mailbox controller driver
> to do that & left the interrupt solely for signalling data ready.
> It just so happened that all of the services that I had worked with and
> tested up to this point were "infallible" & did not set a status, so the
> particular code paths were never tested.
>
> Jassi, the mailbox and soc patches depend on each other, as the change
> in what the interrupt is used for requires changing the client driver's
> behaviour too, as mbox_send_message() will now return when the system
> controller is no longer busy rather than when the data is ready.
> I'm happy to send the lot via the soc tree with your Ack and/or reivew,
> if that also works you?
>
Ok, let me review them and get back to you.

> Secondly, I have a question about what to do if a service does fail, but
> not due to a timeout - eg the above example where the "new" image for
> the FPGA is actually older than the one that currently exists.
> Ideally, if a service fails due to something other than the transaction
> timing out, I would go and read the status registers to see what the
> cause of failure was.
> I could not find a function in the mailbox framework that allows the
> client to request that sort of information from the client. Trying to
> do something with the auxiliary bus, or exporting some function to a
> device specific header seemed like a circumvention of the mailbox
> framework.
> Do you think it would be a good idea to implement something like
> mbox_client_peek_status(struct mbox_chan *chan, void *data) to allow
> clients to request this type of information?
>
.last_tx_done() is supposed to make sure everything is ok.
If the expected status bit is "sometimes not set", that means that bit
is not the complete status. You have to check multiple registers to
detect if and what caused the failure.

Cheers.

2023-01-21 20:19:25

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v1 0/7] MPFS system controller/mailbox fixes

On Sat, Jan 21, 2023 at 10:01:41AM -0600, Jassi Brar wrote:
> On Wed, Jan 11, 2023 at 7:45 AM Conor Dooley <[email protected]> wrote:
> >
> > In order to differentiate between the service succeeding & the system
> > controller being inoperative or otherwise unable to function, I had to
> > switch the controller to poll a busy bit in the system controller's
> > registers to see if it has completed a service.
> > This makes sense anyway, as the interrupt corresponds to "data ready"
> > rather than "tx done", so I have changed the mailbox controller driver
> > to do that & left the interrupt solely for signalling data ready.
> > It just so happened that all of the services that I had worked with and
> > tested up to this point were "infallible" & did not set a status, so the
> > particular code paths were never tested.
> >
> > Jassi, the mailbox and soc patches depend on each other, as the change
> > in what the interrupt is used for requires changing the client driver's
> > behaviour too, as mbox_send_message() will now return when the system
> > controller is no longer busy rather than when the data is ready.
> > I'm happy to send the lot via the soc tree with your Ack and/or reivew,
> > if that also works you?
> >
> Ok, let me review them and get back to you.

FYI, I did sent a v2 on Friday:
https://lore.kernel.org/all/[email protected]/

The change is just a timeout duration though.

> > Secondly, I have a question about what to do if a service does fail, but
> > not due to a timeout - eg the above example where the "new" image for
> > the FPGA is actually older than the one that currently exists.
> > Ideally, if a service fails due to something other than the transaction
> > timing out, I would go and read the status registers to see what the
> > cause of failure was.
> > I could not find a function in the mailbox framework that allows the
> > client to request that sort of information from the client. Trying to
> > do something with the auxiliary bus, or exporting some function to a
> > device specific header seemed like a circumvention of the mailbox
> > framework.
> > Do you think it would be a good idea to implement something like
> > mbox_client_peek_status(struct mbox_chan *chan, void *data) to allow
> > clients to request this type of information?
> >
> .last_tx_done() is supposed to make sure everything is ok.

Hm, might've explained badly as I think you've misunderstood. Or (see
below) I might have mistakenly thought that last_tx_done() was only meant
to signify that tx was done.

Anyways, I'll try to clarify.
Some services don't set a status, but whether a status is, or isn't,
set has nothing to do with whether the service has completed.
One service that sets a status is "Authenticate Bitstream". This
service sets a status of 0x0 if the bitstream in question is okay _and_
something that the FPGA can be upgraded to. It returns a failure of 0x18
if the bitstream is valid _but_ is the same as that currently programmed.
(and of course a whole host of other possible errors in-between)

These statuses, and whether they are a bad outcome or not, is dependant
on the service and I don't think should be handled in the mailbox
controller driver.

> If the expected status bit is "sometimes not set", that means that bit
> is not the complete status.

If the "busy" bit goes low, then the transmission must be complete,
there should be no need to check other bits for *completion*, but...

> You have to check multiple registers to
> detect if and what caused the failure.

...maybe I have just misunderstood the role of .last_tx_done(). The
comment in mailbox-controller.h lead me to believe that it was used just
to check if it had been completed.

Am I allowed to use .last_tx_done() to pass information back to the
mailbox client? If I could, that'd certainly be a nice way to get the
information on whether the service failed etc.

Hopefully that, plus when you have a chance to look at the code, will
make what I am asking about a little clearer!

Thanks,
Conor.


Attachments:
(No filename) (4.06 kB)
signature.asc (235.00 B)
Download all attachments

2023-02-16 22:24:55

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v1 0/7] MPFS system controller/mailbox fixes

Hey Jassi,

On Sat, Jan 21, 2023 at 07:12:52PM +0000, Conor Dooley wrote:
> On Sat, Jan 21, 2023 at 10:01:41AM -0600, Jassi Brar wrote:
> > On Wed, Jan 11, 2023 at 7:45 AM Conor Dooley <[email protected]> wrote:
> > >
> > > In order to differentiate between the service succeeding & the system
> > > controller being inoperative or otherwise unable to function, I had to
> > > switch the controller to poll a busy bit in the system controller's
> > > registers to see if it has completed a service.
> > > This makes sense anyway, as the interrupt corresponds to "data ready"
> > > rather than "tx done", so I have changed the mailbox controller driver
> > > to do that & left the interrupt solely for signalling data ready.
> > > It just so happened that all of the services that I had worked with and
> > > tested up to this point were "infallible" & did not set a status, so the
> > > particular code paths were never tested.
> > >
> > > Jassi, the mailbox and soc patches depend on each other, as the change
> > > in what the interrupt is used for requires changing the client driver's
> > > behaviour too, as mbox_send_message() will now return when the system
> > > controller is no longer busy rather than when the data is ready.
> > > I'm happy to send the lot via the soc tree with your Ack and/or reivew,
> > > if that also works you?
> > >
> > Ok, let me review them and get back to you.
>
> FYI, I did sent a v2 on Friday:
> https://lore.kernel.org/all/[email protected]/
>
> The change is just a timeout duration though.
>
> > > Secondly, I have a question about what to do if a service does fail, but
> > > not due to a timeout - eg the above example where the "new" image for
> > > the FPGA is actually older than the one that currently exists.
> > > Ideally, if a service fails due to something other than the transaction
> > > timing out, I would go and read the status registers to see what the
> > > cause of failure was.
> > > I could not find a function in the mailbox framework that allows the
> > > client to request that sort of information from the client. Trying to
> > > do something with the auxiliary bus, or exporting some function to a
> > > device specific header seemed like a circumvention of the mailbox
> > > framework.
> > > Do you think it would be a good idea to implement something like
> > > mbox_client_peek_status(struct mbox_chan *chan, void *data) to allow
> > > clients to request this type of information?
> > >
> > .last_tx_done() is supposed to make sure everything is ok.
>
> Hm, might've explained badly as I think you've misunderstood. Or (see
> below) I might have mistakenly thought that last_tx_done() was only meant
> to signify that tx was done.
>
> Anyways, I'll try to clarify.
> Some services don't set a status, but whether a status is, or isn't,
> set has nothing to do with whether the service has completed.
> One service that sets a status is "Authenticate Bitstream". This
> service sets a status of 0x0 if the bitstream in question is okay _and_
> something that the FPGA can be upgraded to. It returns a failure of 0x18
> if the bitstream is valid _but_ is the same as that currently programmed.
> (and of course a whole host of other possible errors in-between)
>
> These statuses, and whether they are a bad outcome or not, is dependant
> on the service and I don't think should be handled in the mailbox
> controller driver.
>
> > If the expected status bit is "sometimes not set", that means that bit
> > is not the complete status.
>
> If the "busy" bit goes low, then the transmission must be complete,
> there should be no need to check other bits for *completion*, but...
>
> > You have to check multiple registers to
> > detect if and what caused the failure.
>
> ...maybe I have just misunderstood the role of .last_tx_done(). The
> comment in mailbox-controller.h lead me to believe that it was used just
> to check if it had been completed.
>
> Am I allowed to use .last_tx_done() to pass information back to the
> mailbox client? If I could, that'd certainly be a nice way to get the
> information on whether the service failed etc.
>
> Hopefully that, plus when you have a chance to look at the code, will
> make what I am asking about a little clearer!

Just wondering if you've had a chance to look at this again! I know it's
missed the merge window this time around but I would like to get this
behaviour fixed as other work depends on it.

Thanks,
Conor.


Attachments:
(No filename) (4.39 kB)
signature.asc (228.00 B)
Download all attachments

2023-02-17 04:04:37

by Jassi Brar

[permalink] [raw]
Subject: Re: [PATCH v1 0/7] MPFS system controller/mailbox fixes

On Thu, Feb 16, 2023 at 4:24 PM Conor Dooley <[email protected]> wrote:
> >
> > > > Secondly, I have a question about what to do if a service does fail, but
> > > > not due to a timeout - eg the above example where the "new" image for
> > > > the FPGA is actually older than the one that currently exists.
> > > > Ideally, if a service fails due to something other than the transaction
> > > > timing out, I would go and read the status registers to see what the
> > > > cause of failure was.
> > > > I could not find a function in the mailbox framework that allows the
> > > > client to request that sort of information from the client. Trying to
> > > > do something with the auxiliary bus, or exporting some function to a
> > > > device specific header seemed like a circumvention of the mailbox
> > > > framework.
> > > > Do you think it would be a good idea to implement something like
> > > > mbox_client_peek_status(struct mbox_chan *chan, void *data) to allow
> > > > clients to request this type of information?
> > > >
> > > .last_tx_done() is supposed to make sure everything is ok.
> >
> > Hm, might've explained badly as I think you've misunderstood. Or (see
> > below) I might have mistakenly thought that last_tx_done() was only meant
> > to signify that tx was done.
> >
> > Anyways, I'll try to clarify.
> > Some services don't set a status, but whether a status is, or isn't,
> > set has nothing to do with whether the service has completed.
> > One service that sets a status is "Authenticate Bitstream". This
> > service sets a status of 0x0 if the bitstream in question is okay _and_
> > something that the FPGA can be upgraded to. It returns a failure of 0x18
> > if the bitstream is valid _but_ is the same as that currently programmed.
> > (and of course a whole host of other possible errors in-between)
> >
> > These statuses, and whether they are a bad outcome or not, is dependant
> > on the service and I don't think should be handled in the mailbox
> > controller driver.
> >
> > > If the expected status bit is "sometimes not set", that means that bit
> > > is not the complete status.
> >
> > If the "busy" bit goes low, then the transmission must be complete,
> > there should be no need to check other bits for *completion*, but...
> >
> > > You have to check multiple registers to
> > > detect if and what caused the failure.
> >
> > ...maybe I have just misunderstood the role of .last_tx_done(). The
> > comment in mailbox-controller.h lead me to believe that it was used just
> > to check if it had been completed.
> >
> > Am I allowed to use .last_tx_done() to pass information back to the
> > mailbox client? If I could, that'd certainly be a nice way to get the
> > information on whether the service failed etc.
> >
> > Hopefully that, plus when you have a chance to look at the code, will
> > make what I am asking about a little clearer!
>
> Just wondering if you've had a chance to look at this again! I know it's
> missed the merge window this time around but I would like to get this
> behaviour fixed as other work depends on it.
>
My opinion about adding a new api just to accommodate remote f/w's
behaviour change across updates is still no.
last_tx_done() is more abstract than you think -- it has to play with
dozens of behaviors of remotes. So may just wrap your whatever logic,
of "tx is done", in that.

This query within the patchset threw me off -- I thought you needed
the new api for the patchset, so I didn't look further.
Looking at it now, I am ok with applying Patches 1,2 and 3. If you want.

cheers.

2023-02-17 07:34:57

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v1 0/7] MPFS system controller/mailbox fixes

On Thu, Feb 16, 2023 at 10:04:17PM -0600, Jassi Brar wrote:
> On Thu, Feb 16, 2023 at 4:24 PM Conor Dooley <[email protected]> wrote:
> > >
> > > > > Secondly, I have a question about what to do if a service does fail, but
> > > > > not due to a timeout - eg the above example where the "new" image for
> > > > > the FPGA is actually older than the one that currently exists.
> > > > > Ideally, if a service fails due to something other than the transaction
> > > > > timing out, I would go and read the status registers to see what the
> > > > > cause of failure was.
> > > > > I could not find a function in the mailbox framework that allows the
> > > > > client to request that sort of information from the client. Trying to
> > > > > do something with the auxiliary bus, or exporting some function to a
> > > > > device specific header seemed like a circumvention of the mailbox
> > > > > framework.
> > > > > Do you think it would be a good idea to implement something like
> > > > > mbox_client_peek_status(struct mbox_chan *chan, void *data) to allow
> > > > > clients to request this type of information?
> > > > >
> > > > .last_tx_done() is supposed to make sure everything is ok.
> > >
> > > Hm, might've explained badly as I think you've misunderstood. Or (see
> > > below) I might have mistakenly thought that last_tx_done() was only meant
> > > to signify that tx was done.
> > >
> > > Anyways, I'll try to clarify.
> > > Some services don't set a status, but whether a status is, or isn't,
> > > set has nothing to do with whether the service has completed.
> > > One service that sets a status is "Authenticate Bitstream". This
> > > service sets a status of 0x0 if the bitstream in question is okay _and_
> > > something that the FPGA can be upgraded to. It returns a failure of 0x18
> > > if the bitstream is valid _but_ is the same as that currently programmed.
> > > (and of course a whole host of other possible errors in-between)
> > >
> > > These statuses, and whether they are a bad outcome or not, is dependant
> > > on the service and I don't think should be handled in the mailbox
> > > controller driver.
> > >
> > > > If the expected status bit is "sometimes not set", that means that bit
> > > > is not the complete status.
> > >
> > > If the "busy" bit goes low, then the transmission must be complete,
> > > there should be no need to check other bits for *completion*, but...
> > >
> > > > You have to check multiple registers to
> > > > detect if and what caused the failure.
> > >
> > > ...maybe I have just misunderstood the role of .last_tx_done(). The
> > > comment in mailbox-controller.h lead me to believe that it was used just
> > > to check if it had been completed.
> > >
> > > Am I allowed to use .last_tx_done() to pass information back to the
> > > mailbox client? If I could, that'd certainly be a nice way to get the
> > > information on whether the service failed etc.
> > >
> > > Hopefully that, plus when you have a chance to look at the code, will
> > > make what I am asking about a little clearer!
> >
> > Just wondering if you've had a chance to look at this again! I know it's
> > missed the merge window this time around but I would like to get this
> > behaviour fixed as other work depends on it.
> >
> My opinion about adding a new api just to accommodate remote f/w's
> behaviour change across updates is still no.
> last_tx_done() is more abstract than you think -- it has to play with
> dozens of behaviors of remotes. So may just wrap your whatever logic,
> of "tx is done", in that.

Okay, that's fine. I'd rather not add a new API, so adding that to
last_tx_done() is fine by me! I just wasn't sure from your earlier reply
if that was okay.

> This query within the patchset threw me off -- I thought you needed
> the new api for the patchset, so I didn't look further.

Ahh, sorry. The query was about improving things to report an accurate
status rather than only timeouts. What's here *works* but is
sub-optimal.

> Looking at it now, I am ok with applying Patches 1,2 and 3. If you want.

Ehh, the whole thing needs to go together to avoid breaking behaviour,
so, given the merge window is next week, I'd rather rework it to report
the actual status from last_tx_done().

Thanks!


Attachments:
(No filename) (4.15 kB)
signature.asc (228.00 B)
Download all attachments