2024-02-28 00:42:15

by Takashi Sakamoto

Subject: Re: [PATCH] firewire: core: use long bus reset on gap count error

Hi Adam,

Thanks for your effort and the patch. I would like to send it upstream,
though I found some nitpicks.

On Mon, Feb 26, 2024 at 12:12:42AM -0800, Adam Goldman wrote:
> From: Adam Goldman <[email protected]>
>
> When resetting the bus after a gap count error, use a long rather than
> short bus reset.
>
> IEEE 1394-1995 uses only long bus resets. IEEE 1394a adds the option of
> short bus resets. When video or audio transmission is in progress and a
> device is hot-plugged elsewhere on the bus, the resulting bus reset can
> cause video frame drops or audio dropouts. Short bus resets reduce or
> eliminate this problem. Accordingly, short bus resets are almost always
> preferred.
>
> However, on a mixed 1394/1394a bus, a short bus reset can trigger an
> immediate additional bus reset. This double bus reset can be interpreted
> differently by different nodes on the bus, resulting in an inconsistent gap
> count after the bus reset. An inconsistent gap count will cause another bus
> reset, leading to a neverending bus reset loop. This only happens for some
> bus topologies, not for all mixed 1394/1394a buses.
>
> By instead sending a long bus reset after a gap count inconsistency, we
> avoid the doubled bus reset, restoring the bus to normal operation.
>
> Signed-off-by: Adam Goldman <[email protected]>
> Link: https://sourceforge.net/p/linux1394/mailman/message/58741624/
> ---
>
> --- linux-6.8-rc1.orig/drivers/firewire/core-card.c 2024-01-21 14:11:32.000000000 -0800
> +++ linux-6.8-rc1/drivers/firewire/core-card.c 2024-02-12 01:16:15.000000000 -0800
> @@ -484,7 +484,17 @@
>  		fw_notice(card, "phy config: new root=%x, gap_count=%d\n",
>  			  new_root_id, gap_count);
>  		fw_send_phy_config(card, new_root_id, generation, gap_count);
> -		reset_bus(card, true);
> +		/*
> +		 * Where possible, use a short bus reset to minimize
> +		 * disruption to isochronous transfers. But in the event
> +		 * of a gap count inconsistency, use a long bus reset. On
> +		 * a mixed 1394/1394a bus, a short bus reset can get
> +		 * doubled. Some nodes may treat this as one bus reset and
> +		 * others may treat it as two, causing a gap count
> +		 * inconsistency again. Using a long bus reset prevents
> +		 * this.
> +		 */
> +		reset_bus(card, card->gap_count != 0);
>  		/* Will allocate broadcast channel after the reset. */
>  		goto out;
>  	}

In your report, you referred to the section of the 1394 specification that
describes how a mixed 1394/1394a bus responds differently to a reset
(8.4.6.2). I think it would be preferable to add the section number to the
code comment.
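
To make the effect of the one-line change concrete, here is a minimal
userspace sketch of the decision it encodes. It is not the kernel code:
struct card_model, use_short_reset() and self_check() are hypothetical
names for illustration, and the sketch assumes (as the patch's comment
implies) that a gap_count of 0 marks an inconsistency and that the second
argument of reset_bus() selects a short reset when true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one field of struct fw_card used here. */
struct card_model {
	int gap_count;	/* assumed: 0 means the gap counts were inconsistent */
};

/*
 * Mirrors the expression passed as the second argument of reset_bus()
 * in the patch: short reset when the gap count is consistent (non-zero),
 * long reset after a gap count inconsistency (zero).
 */
static bool use_short_reset(const struct card_model *card)
{
	return card->gap_count != 0;
}

/* Small usage example covering both cases. */
static void self_check(void)
{
	struct card_model consistent = { .gap_count = 63 };
	struct card_model inconsistent = { .gap_count = 0 };

	assert(use_short_reset(&consistent));	/* normal case: short reset */
	assert(!use_short_reset(&inconsistent));	/* error case: long reset */
}
```

Before the patch the argument was the constant true, so the short reset
was used even in the inconsistent case.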

Additionally, for your investigation, you added a debug print to get the
timing of bus reset scheduling. I think it is useful for this kind of issue.
May I ask you to write another patch to add it? In my opinion, a bus with
mixed versions of 1394 PHYs has more quirks, and the debug print would help
to investigate them further.

I'm sorry that I cannot be of more help with your work. I have some IEEE
1394 hardware for consumer audio equipment, but most of it is relatively
new and already supports 1394a...


Thanks

Takashi Sakamoto


2024-03-20 09:38:19

by Adam Goldman

Subject: Re: [PATCH] firewire: core: use long bus reset on gap count error

Hi Takashi,

On Wed, Feb 28, 2024 at 09:41:44AM +0900, Takashi Sakamoto wrote:
> Additionally, for your investigation, you added a debug print to get the
> timing of bus reset scheduling. I think it is useful for this kind of issue.
> May I ask you to write another patch to add it? In my opinion, a bus with
> mixed versions of 1394 PHYs has more quirks, and the debug print would help
> to investigate them further.

I'm sorry for my delay in preparing a patch.

I've submitted a patch to linux1394-devel to log when we schedule or
initiate a bus reset. This is enabled with a new parameter to the
firewire-core module. It provides logging similar to the debug print I
used to investigate the reset loop.

Also, there is already logging for bus reset interrupts in
firewire-ohci. This logs all bus resets and does not indicate whether we
initiated the reset or some other node on the bus initiated it. However,
the logging in firewire-ohci always froze my computer when I enabled it.
I've submitted a separate patch to fix the firewire-ohci logging.

I believe both forms of logging can be useful. firewire-ohci logs all
bus resets, but it doesn't tell where the resets came from. firewire-core
only logs bus resets we initiate.

I also considered adding an option to firewire-ohci to log PHY register
access. This would include writes to IBR and ISBR, so it would log when
we initiate resets. However, this logging would be more complicated to
add, so I didn't do it.
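
As a rough illustration of why logging PHY register writes would reveal
locally initiated resets, here is a minimal userspace sketch. It is not
firewire-ohci code. The function and macro names are hypothetical, and the
register numbers and bit masks (IBR as 0x40 in PHY register 1, ISBR as
0x40 in PHY register 5) are assumptions that should be checked against the
1394a PHY register map.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed PHY register layout; verify against the specification. */
#define PHY_REG_IBR	1	/* register holding the long-reset bit */
#define PHY_REG_ISBR	5	/* register holding the short-reset bit */
#define PHY_BIT_IBR	0x40
#define PHY_BIT_ISBR	0x40

enum reset_kind { RESET_NONE, RESET_LONG, RESET_SHORT };

/* Decide whether a PHY register write initiates a bus reset. */
static enum reset_kind classify_phy_write(int reg, int set_bits)
{
	if (reg == PHY_REG_IBR && (set_bits & PHY_BIT_IBR))
		return RESET_LONG;
	if (reg == PHY_REG_ISBR && (set_bits & PHY_BIT_ISBR))
		return RESET_SHORT;
	return RESET_NONE;
}

/* Hypothetical log line for one PHY register write. */
static void log_phy_write(int reg, int set_bits)
{
	static const char *const names[] = {
		"no reset", "long reset (IBR)", "short reset (ISBR)",
	};

	printf("phy write: reg=%d set=0x%02x -> %s\n",
	       reg, set_bits, names[classify_phy_write(reg, set_bits)]);
}
```

A log built this way would record only resets we initiate, since resets
from other nodes do not involve a local PHY register write.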

-- Adam

2024-03-21 13:25:32

by Takashi Sakamoto

Subject: Re: [PATCH] firewire: core: use long bus reset on gap count error

Hi Adam,

Thanks for the patches to improve the subsystem.

Unfortunately for you, we are now in the merge window for the v6.9
kernel, so I will not apply any changes other than those going to
Linus. I'd like you to wait until next week, sorry.

However, on the topic of logging PHY register access, I have an idea: to
use the Linux kernel tracepoints framework[1]. Programming with the provided
macros is cumbersome, and the output is available only through the relevant
tools[2], but I think it would be helpful in this case.

[1] https://docs.kernel.org/trace/tracepoints.html
[2] https://docs.kernel.org/trace/tracepoint-analysis.html

On Wed, Mar 20, 2024 at 02:38:05AM -0700, Adam Goldman wrote:
> Hi Takashi,
>
> On Wed, Feb 28, 2024 at 09:41:44AM +0900, Takashi Sakamoto wrote:
> > Additionally, for your investigation, you added a debug print to get the
> > timing of bus reset scheduling. I think it is useful for this kind of issue.
> > May I ask you to write another patch to add it? In my opinion, a bus with
> > mixed versions of 1394 PHYs has more quirks, and the debug print would help
> > to investigate them further.
>
> I'm sorry for my delay in preparing a patch.
>
> I've submitted a patch to linux1394-devel to log when we schedule or
> initiate a bus reset. This is enabled with a new parameter to the
> firewire-core module. It provides logging similar to the debug print I
> used to investigate the reset loop.
>
> Also, there is already logging for bus reset interrupts in
> firewire-ohci. This logs all bus resets and does not indicate whether we
> initiated the reset or some other node on the bus initiated it. However,
> the logging in firewire-ohci always froze my computer when I enabled it.
> I've submitted a separate patch to fix the firewire-ohci logging.
>
> I believe both forms of logging can be useful. firewire-ohci logs all
> bus resets, but it doesn't tell where the resets came from. firewire-core
> only logs bus resets we initiate.
>
> I also considered adding an option to firewire-ohci to log PHY register
> access. This would include writes to IBR and ISBR, so it would log when
> we initiate resets. However, this logging would be more complicated to
> add, so I didn't do it.
>
> -- Adam


Thanks

Takashi Sakamoto