2012-02-18 21:35:56

by Chris Boot

[permalink] [raw]
Subject: [PATCH] firewire-sbp2: Initialise sbp2_orb->rcode for management ORBs

When sending ORBs the struct sbp2_orb->rcode field should be initialised
to -1 otherwise complete_transaction() assumes the request is successful
(RCODE_COMPLETE is 0). When sending managament ORBs, such as LOGIN or
LOGOUT, this was not done and so the initiator would wait for the
request to time out before trying again.

Without this, LOGINs are only retried when the management ORB times out,
rather than the initiator noticing an error occurred and retrying soon
after. For targets that advertise more than one LUN per unit, and can
only accept one management request at a time, this means LUNs are only
logged in one per timeout period.

Signed-off-by: Chris Boot <[email protected]>
Cc: Stefan Richter <[email protected]>
---
drivers/firewire/sbp2.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
index b12c6ba..7776c18 100644
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -572,6 +572,9 @@ static int sbp2_send_management_orb(struct sbp2_logical_unit *lu, int node_id,
if (dma_mapping_error(device->card->device, orb->response_bus))
goto fail_mapping_response;

+ /* Initialize rcode to something not RCODE_COMPLETE. */
+ orb->base.rcode = -1;
+
orb->request.response.high = 0;
orb->request.response.low = cpu_to_be32(orb->response_bus);

--
1.7.9


2012-05-19 10:30:10

by Stefan Richter

[permalink] [raw]
Subject: Re: [PATCH] firewire-sbp2: Initialise sbp2_orb->rcode for management ORBs

On Mar 04 Stefan Richter wrote:
> On Feb 18 Chris Boot wrote:
> > When sending ORBs the struct sbp2_orb->rcode field should be initialised
> > to -1 otherwise complete_transaction() assumes the request is successful
> > (RCODE_COMPLETE is 0). When sending managament ORBs, such as LOGIN or
> > LOGOUT, this was not done and so the initiator would wait for the
> > request to time out before trying again.
> >
> > Without this, LOGINs are only retried when the management ORB times out,
> > rather than the initiator noticing an error occurred and retrying soon
> > after. For targets that advertise more than one LUN per unit, and can
> > only accept one management request at a time, this means LUNs are only
> > logged in one per timeout period.
[...]
> I left this hanging in my inbox for too long, sorry...
>
> While I agree that the current initialization of orb->base.rcode with 0 is
> wrong, I don't think your change alone is sufficient:
>
> Consider the case that a login request to LU 0 causes the target to pull
> out the hardware behind that LU out of a powered-down state --- which may
> take a very long time --- and login requests to LU 1 would be aborted by
> the target with resp_conflict_error on any Management_Agent write
> request. Of course a reasonably clever target would accept login before
> full power-up, but you never now.
>
> We retry login 5 times in 0.2 seconds intervals, and this 1 s in total may
> not be enough.
[...]

Chris, I obviously haven't done anything about this potentially too short
retry period yet; it is still on my list.

Perhaps we should not count the number of retries but watch the time that
retries take. I.e. accumulate the time that each try takes; break out of
the retry loop after a maximum time; but reset the accumulated time at a
bus reset as a precaution for buses with many nodes coming online at
different times.
--
Stefan Richter
-=====-===-- -=-= =--==
http://arcgraph.de/sr/