Date: Sat, 19 May 2012 12:29:49 +0200
From: Stefan Richter <stefanr@s5r6.in-berlin.de>
To: Chris Boot <bootc@bootc.net>
Cc: linux1394-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] firewire-sbp2: Initialise sbp2_orb->rcode for
 management ORBs
Message-ID: <20120519122949.0024a909@stein>
In-Reply-To: <20120304134802.2ed6fbd6@stein>
References: <1329600949-55157-1-git-send-email-bootc@bootc.net>
	<20120304134802.2ed6fbd6@stein>

On Mar 04 Stefan Richter wrote:
> On Feb 18 Chris Boot wrote:
> > When sending ORBs the struct sbp2_orb->rcode field should be initialised
> > to -1 otherwise complete_transaction() assumes the request is successful
> > (RCODE_COMPLETE is 0). When sending management ORBs, such as LOGIN or
> > LOGOUT, this was not done and so the initiator would wait for the
> > request to time out before trying again.
> > 
> > Without this, LOGINs are only retried when the management ORB times out,
> > rather than the initiator noticing an error occurred and retrying soon
> > after. For targets that advertise more than one LUN per unit, and can
> > only accept one management request at a time, this means LUNs are only
> > logged in one per timeout period.
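A simplified user-space model of the sentinel logic described above. It assumes, as the patch description says, that complete_transaction() records the transaction's rcode only while orb->rcode is still -1. struct orb_model, orb_model_init() and model_complete_transaction() are hypothetical names and are not the sbp2 driver's code. The RCODE_* values are the IEEE 1394 response codes.

```c
#include <stdbool.h>

/* IEEE 1394 response codes (<linux/firewire-constants.h> uses the same values). */
#define RCODE_COMPLETE       0x0
#define RCODE_CONFLICT_ERROR 0x4

/* Hypothetical, heavily simplified stand-in for struct sbp2_orb. */
struct orb_model {
	int rcode;           /* -1 means "no response code recorded yet" */
	bool error_handled;  /* set when the error path runs */
};

static void orb_model_init(struct orb_model *orb, int initial_rcode)
{
	orb->rcode = initial_rcode;
	orb->error_handled = false;
}

/* Simplified model of how complete_transaction() treats orb->rcode. */
static void model_complete_transaction(struct orb_model *orb, int rcode)
{
	/* Record the transaction's rcode only if nothing has set one yet,
	 * e.g. a status write that already marked the ORB complete. */
	if (orb->rcode == -1)
		orb->rcode = rcode;

	/* Any rcode other than RCODE_COMPLETE takes the error path at once,
	 * so the caller can retry without waiting for the ORB timeout. */
	if (orb->rcode != RCODE_COMPLETE)
		orb->error_handled = true;
}
```

When the ORB is initialised to -1, a conflict error reaches the error path immediately. When it is initialised to 0, the same error is masked as RCODE_COMPLETE, and the initiator only notices at the management ORB timeout.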
[...]
> I left this hanging in my inbox for too long, sorry...
> 
> While I agree that the current initialization of orb->base.rcode with 0 is
> wrong, I don't think your change alone is sufficient:
> 
> Consider the case where a login request to LU 0 causes the target to pull
> the hardware behind that LU out of a powered-down state --- which may
> take a very long time --- and login requests to LU 1 would be aborted by
> the target with resp_conflict_error on any Management_Agent write
> request.  Of course a reasonably clever target would accept login before
> full power-up, but you never know.
> 
> We retry login 5 times at 0.2 s intervals, and this 1 s in total may
> not be enough.
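To make the arithmetic concrete, here is a sketch of a count-limited retry loop on a simulated clock. It assumes 5 attempts at 0, 0.2, 0.4, 0.6 and 0.8 s and ignores the time each attempt itself takes. The exact attempt count in the driver may differ. target_accepts_login() and login_with_count_limit() are hypothetical names, not driver functions.

```c
#include <stdbool.h>

/* Retry parameters as described above: 5 attempts, 0.2 s apart. */
#define LOGIN_RETRIES      5
#define RETRY_INTERVAL_MS  200u

/*
 * Hypothetical target model: it rejects login (e.g. with a conflict
 * error) until ready_at_ms milliseconds have elapsed, then accepts.
 */
static bool target_accepts_login(unsigned int now_ms, unsigned int ready_at_ms)
{
	return now_ms >= ready_at_ms;
}

/*
 * Count-limited retry loop on a simulated clock (no real sleeping).
 * Returns the 1-based attempt that succeeded, or 0 if all attempts failed.
 */
static int login_with_count_limit(unsigned int ready_at_ms)
{
	unsigned int now_ms = 0;

	for (int attempt = 1; attempt <= LOGIN_RETRIES; attempt++) {
		if (target_accepts_login(now_ms, ready_at_ms))
			return attempt;
		now_ms += RETRY_INTERVAL_MS;   /* wait before the next try */
	}
	return 0;
}
```

login_with_count_limit(800) succeeds on the fifth attempt. A target that needs 1000 ms or more, for example one waking its hardware from power-down, is never logged in. The retry window is fixed by the attempt count, not by how long the target actually needs.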
[...]

Chris, I obviously haven't done anything about this potentially too short
retry period yet; it is still on my list.

Perhaps we should not count the number of retries but limit the time that
retries take: accumulate the time that each try takes, break out of the
retry loop once a maximum total time is reached, but reset the accumulated
time at a bus reset as a precaution for buses with many nodes coming
online at different times.
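A sketch of that time-based approach, assuming a hypothetical 5 s budget (LOGIN_TIME_BUDGET_MS). All names are hypothetical. Each failed try is charged with the time it actually took. Slow tries, such as ones that run into the management ORB timeout, therefore use up the budget faster than quick rejections. A bus reset clears the accumulated time. A real kernel version would presumably measure time with jiffies and time_after() rather than the simulated counter used here.

```c
#include <stdbool.h>

/* Hypothetical budget: give up once failed tries have used this much time. */
#define LOGIN_TIME_BUDGET_MS 5000u

/* Hypothetical per-unit retry state. */
struct login_retry_state {
	unsigned int elapsed_ms;   /* time accumulated by failed tries */
};

/* Call at the start of a login sequence and after each bus reset. */
static void retry_budget_reset(struct login_retry_state *s)
{
	s->elapsed_ms = 0;
}

/*
 * Charge one failed try that took try_ms (including any wait before the
 * next try).  Returns true if another try is still allowed.
 */
static bool retry_budget_charge(struct login_retry_state *s, unsigned int try_ms)
{
	s->elapsed_ms += try_ms;
	return s->elapsed_ms < LOGIN_TIME_BUDGET_MS;
}

/*
 * Simulated login loop: the target accepts once ready_at_ms has passed,
 * every failed try costs try_ms, and bus_reset_at_try (0 = none) models a
 * bus reset arriving just before that try.  Returns the 1-based try that
 * succeeded, or 0 once the time budget is used up.
 */
static int simulate_login(unsigned int ready_at_ms, unsigned int try_ms,
			  int bus_reset_at_try)
{
	struct login_retry_state s;
	unsigned int now_ms = 0;

	retry_budget_reset(&s);
	for (int attempt = 1; ; attempt++) {
		if (attempt == bus_reset_at_try)
			retry_budget_reset(&s);
		if (now_ms >= ready_at_ms)
			return attempt;
		now_ms += try_ms;
		if (!retry_budget_charge(&s, try_ms))
			return 0;
	}
}
```

With 200 ms tries, simulate_login(3000, 200, 0) succeeds on try 16, and simulate_login(5000, 200, 0) gives up. With a bus reset before try 10, simulate_login(6000, 200, 10) still succeeds on try 31.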
-- 
Stefan Richter
-=====-===-- -=-= =--==
http://arcgraph.de/sr/