2024-02-06 23:39:01

by Takashi Sakamoto

[permalink] [raw]
Subject: Re: [PATCH v2] firewire: core: send bus reset promptly on gap count error

Hi,

Thanks for the patch. I applied it to for-linus branch and will send it
for v6.8-rc4 in this week. Thanks for your long patience in the review
process.

On Sun, Feb 04, 2024 at 09:21:14PM -0800, Adam Goldman wrote:
> If we are bus manager and the bus has inconsistent gap counts, send a
> bus reset immediately instead of trying to read the root node's config
> ROM first. Otherwise, we could spend a lot of time trying to read the
> config ROM but never succeeding.
>
> This eliminates a 50+ second delay before the FireWire bus is usable after
> a newly connected device is powered on in certain circumstances.
>
> The delay occurs if a gap count inconsistency occurs, we are not the root
> node, and we become bus manager. One scenario that causes this is with a TI
> XIO2213B OHCI, the first time a Sony DSR-25 is powered on after being
> connected to the FireWire cable. In this configuration, the Linux box will
> not receive the initial PHY configuration packet sent by the DSR-25 as IRM,
> resulting in the DSR-25 having a gap count of 44 while the Linux box has a
> gap count of 63.
>
> FireWire devices have a gap count parameter, which is set to 63 on power-up
> and can be changed with a PHY configuration packet. This determines the
> duration of the subaction and arbitration gaps. For reliable communication,
> all nodes on a FireWire bus must have the same gap count.
>
> A node may have zero or more of the following roles: root node, bus manager
> (BM), isochronous resource manager (IRM), and cycle master. Unless a root
> node was forced with a PHY configuration packet, any node might become root
> node after a bus reset. Only the root node can become cycle master. If the
> root node is not cycle master capable, the BM or IRM should force a change
> of root node.
>
> After a bus reset, each node sends a self-ID packet, which contains its
> current gap count. A single bus reset does not change the gap count, but
> two bus resets in a row will set the gap count to 63. Because a consistent
> gap count is required for reliable communication, IEEE 1394a-2000 requires
> that the bus manager generate a bus reset if it detects that the gap count
> is inconsistent.
>
> When the gap count is inconsistent, build_tree() will notice this after the
> self identification process. It will set card->gap_count to the invalid
> value 0. If we become bus master, this will force bm_work() to send a bus
> reset when it performs gap count optimization.
>
> After a bus reset, there is no bus manager. We will almost always try to
> become bus manager. Once we become bus manager, we will first determine
> whether the root node is cycle master capable. Then, we will determine if
> the gap count should be changed. If either the root node or the gap count
> should be changed, we will generate a bus reset.
>
> To determine if the root node is cycle master capable, we read its
> configuration ROM. bm_work() will wait until we have finished trying to
> read the configuration ROM.
>
> However, an inconsistent gap count can make this take a long time.
> read_config_rom() will read the first few quadlets from the config ROM. Due
> to the gap count inconsistency, eventually one of the reads will time out.
> When read_config_rom() fails, fw_device_init() calls it again until
> MAX_RETRIES is reached. This takes 50+ seconds.
>
> Once we give up trying to read the configuration ROM, bm_work() will wake
> up, assume that the root node is not cycle master capable, and do a bus
> reset. Hopefully, this will resolve the gap count inconsistency.
>
> This change makes bm_work() check for an inconsistent gap count before
> waiting for the root node's configuration ROM. If the gap count is
> inconsistent, bm_work() will immediately do a bus reset. This eliminates
> the 50+ second delay and rapidly brings the bus to a working state.
>
> I considered that if the gap count is inconsistent, a PHY configuration
> packet might not be successful, so it could be desirable to skip the PHY
> configuration packet before the bus reset in this case. However, IEEE
> 1394a-2000 and IEEE 1394-2008 say that the bus manager may transmit a PHY
> configuration packet before a bus reset when correcting a gap count error.
> Since the standard endorses this, I decided it's safe to retain the PHY
> configuration packet transmission.
>
> Normally, after a topology change, we will reset the bus a maximum of 5
> times to change the root node and perform gap count optimization. However,
> if there is a gap count inconsistency, we must always generate a bus reset.
> Otherwise the gap count inconsistency will persist and communication will
> be unreliable. For that reason, if there is a gap count inconstency, we
> generate a bus reset even if we already reached the 5 reset limit.
>
> Signed-off-by: Adam Goldman <[email protected]>
> Link: https://sourceforge.net/p/linux1394/mailman/message/58727806/
> ---


Takashi Sakamoto