Hi,
On Wed, Mar 20, 2024 at 02:15:19AM -0700, Adam Goldman wrote:
> In the FireWire OHCI interrupt handler, if a bus reset interrupt has
> occurred, mask bus reset interrupts until bus_reset_work has serviced and
> cleared the interrupt.
>
> Normally, we always leave bus reset interrupts masked. We infer the bus
> reset from the self-ID interrupt that happens shortly thereafter. A
> scenario where we unmask bus reset interrupts was introduced in 2008 in
> a007bb857e0b26f5d8b73c2ff90782d9c0972620: If
> OHCI_PARAM_DEBUG_BUSRESETS (8) is set in the debug parameter bitmask, we
> will unmask bus reset interrupts so we can log them.
>
> irq_handler logs the bus reset interrupt. However, we can't clear the bus
> reset event flag in irq_handler, because we won't service the event until
> later. irq_handler exits with the event flag still set. If the
> corresponding interrupt is still unmasked, the first bus reset will
> usually freeze the system due to irq_handler being called again each
> time it exits. This freeze can be reproduced by loading firewire_ohci
> with "modprobe firewire_ohci debug=-1" (to enable all debugging output).
> Apparently there are also some cases where bus_reset_work will get called
> soon enough to clear the event, and operation will continue normally.
>
> This freeze was first reported a few months after a007bb85 was committed,
> but until now it was never fixed. The debug level could safely be set
> to -1 through sysfs after the module was loaded, but this would be
> ineffectual in logging bus reset interrupts since they were only
> unmasked during initialization.
>
> irq_handler will now leave the event flag set but mask bus reset
> interrupts, so irq_handler won't be called again and there will be no
> freeze. If OHCI_PARAM_DEBUG_BUSRESETS is enabled, bus_reset_work will
> unmask the interrupt after servicing the event, so future interrupts
> will be caught as desired.
>
> As a side effect to this change, OHCI_PARAM_DEBUG_BUSRESETS can now be
> enabled through sysfs in addition to during initial module loading.
> However, when enabled through sysfs, logging of bus reset interrupts will
> be effective only starting with the second bus reset, after
> bus_reset_work has executed.
>
> Signed-off-by: Adam Goldman <[email protected]>
> ---
>
> --- linux-6.8-rc1.orig/drivers/firewire/ohci.c 2024-01-21 14:11:32.000000000 -0800
> +++ linux-6.8-rc1/drivers/firewire/ohci.c 2024-03-12 01:15:10.000000000 -0700
> @@ -2060,6 +2060,8 @@ static void bus_reset_work(struct work_s
>
> ohci->generation = generation;
> reg_write(ohci, OHCI1394_IntEventClear, OHCI1394_busReset);
> + if (param_debug & OHCI_PARAM_DEBUG_BUSRESETS)
> + reg_write(ohci, OHCI1394_IntMaskSet, OHCI1394_busReset);
>
> if (ohci->quirks & QUIRK_RESET_PACKET)
> ohci->request_generation = generation;
> @@ -2125,12 +2127,14 @@ static irqreturn_t irq_handler(int irq,
> return IRQ_NONE;
>
> /*
> - * busReset and postedWriteErr must not be cleared yet
> + * busReset and postedWriteErr events must not be cleared yet
> * (OHCI 1.1 clauses 7.2.3.2 and 13.2.8.1)
> */
> reg_write(ohci, OHCI1394_IntEventClear,
> event & ~(OHCI1394_busReset | OHCI1394_postedWriteErr));
> log_irqs(ohci, event);
> + if (event & OHCI1394_busReset)
> + reg_write(ohci, OHCI1394_IntMaskClear, OHCI1394_busReset);
>
> if (event & OHCI1394_selfIDComplete)
> queue_work(selfid_workqueue, &ohci->bus_reset_work);
Thanks for the patch. I pushed topic branch[1] for it, since I'm
considering about whether to send it to stable and longterm releases.
I had realized that the debug=8 for firewire-ohci module provides tons
of logs triggering by the irq handler, since the irq for bus reset is
not unmasked, so I rely on selfID events when debugging bus-reset. I have
few objections to the change.
My concern is how much invasive it is. The unmasking is kept until
bus_reset_work() is executed in the workqueue. When considering about
the delay of workqueue (since it is a kind of schedulable task), many
irq events for bus reset is potentially skipped from the logging. Of
course, it is the aim of change.
Let me take more time to evaluate the change, but I'm willing to send it
to upstream until -rc3 or -rc4, at least, if receiving no objections
from the others.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git/log/?h=v6.9-firewire-mask-bus-reset-event-during-handling
Thanks
Takashi Sakamoto
Hi Adam,
On Mon, Mar 25, 2024 at 09:58:28AM +0900, Takashi Sakamoto wrote:
> Hi,
>
> On Wed, Mar 20, 2024 at 02:15:19AM -0700, Adam Goldman wrote:
> > In the FireWire OHCI interrupt handler, if a bus reset interrupt has
> > occurred, mask bus reset interrupts until bus_reset_work has serviced and
> > cleared the interrupt.
> >
> > Normally, we always leave bus reset interrupts masked. We infer the bus
> > reset from the self-ID interrupt that happens shortly thereafter. A
> > scenario where we unmask bus reset interrupts was introduced in 2008 in
> > a007bb857e0b26f5d8b73c2ff90782d9c0972620: If
> > OHCI_PARAM_DEBUG_BUSRESETS (8) is set in the debug parameter bitmask, we
> > will unmask bus reset interrupts so we can log them.
> >
> > irq_handler logs the bus reset interrupt. However, we can't clear the bus
> > reset event flag in irq_handler, because we won't service the event until
> > later. irq_handler exits with the event flag still set. If the
> > corresponding interrupt is still unmasked, the first bus reset will
> > usually freeze the system due to irq_handler being called again each
> > time it exits. This freeze can be reproduced by loading firewire_ohci
> > with "modprobe firewire_ohci debug=-1" (to enable all debugging output).
> > Apparently there are also some cases where bus_reset_work will get called
> > soon enough to clear the event, and operation will continue normally.
> >
> > This freeze was first reported a few months after a007bb85 was committed,
> > but until now it was never fixed. The debug level could safely be set
> > to -1 through sysfs after the module was loaded, but this would be
> > ineffectual in logging bus reset interrupts since they were only
> > unmasked during initialization.
> >
> > irq_handler will now leave the event flag set but mask bus reset
> > interrupts, so irq_handler won't be called again and there will be no
> > freeze. If OHCI_PARAM_DEBUG_BUSRESETS is enabled, bus_reset_work will
> > unmask the interrupt after servicing the event, so future interrupts
> > will be caught as desired.
> >
> > As a side effect to this change, OHCI_PARAM_DEBUG_BUSRESETS can now be
> > enabled through sysfs in addition to during initial module loading.
> > However, when enabled through sysfs, logging of bus reset interrupts will
> > be effective only starting with the second bus reset, after
> > bus_reset_work has executed.
> >
> > Signed-off-by: Adam Goldman <[email protected]>
> > ---
> >
> > --- linux-6.8-rc1.orig/drivers/firewire/ohci.c 2024-01-21 14:11:32.000000000 -0800
> > +++ linux-6.8-rc1/drivers/firewire/ohci.c 2024-03-12 01:15:10.000000000 -0700
> > @@ -2060,6 +2060,8 @@ static void bus_reset_work(struct work_s
> >
> > ohci->generation = generation;
> > reg_write(ohci, OHCI1394_IntEventClear, OHCI1394_busReset);
> > + if (param_debug & OHCI_PARAM_DEBUG_BUSRESETS)
> > + reg_write(ohci, OHCI1394_IntMaskSet, OHCI1394_busReset);
> >
> > if (ohci->quirks & QUIRK_RESET_PACKET)
> > ohci->request_generation = generation;
> > @@ -2125,12 +2127,14 @@ static irqreturn_t irq_handler(int irq,
> > return IRQ_NONE;
> >
> > /*
> > - * busReset and postedWriteErr must not be cleared yet
> > + * busReset and postedWriteErr events must not be cleared yet
> > * (OHCI 1.1 clauses 7.2.3.2 and 13.2.8.1)
> > */
> > reg_write(ohci, OHCI1394_IntEventClear,
> > event & ~(OHCI1394_busReset | OHCI1394_postedWriteErr));
> > log_irqs(ohci, event);
> > + if (event & OHCI1394_busReset)
> > + reg_write(ohci, OHCI1394_IntMaskClear, OHCI1394_busReset);
> >
> > if (event & OHCI1394_selfIDComplete)
> > queue_work(selfid_workqueue, &ohci->bus_reset_work);
>
> Thanks for the patch. I pushed topic branch[1] for it, since I'm
> considering about whether to send it to stable and longterm releases.
>
> I had realized that the debug=8 for firewire-ohci module provides tons
> of logs triggering by the irq handler, since the irq for bus reset is
> not unmasked, so I rely on selfID events when debugging bus-reset. I have
> few objections to the change.
>
> My concern is how much invasive it is. The unmasking is kept until
> bus_reset_work() is executed in the workqueue. When considering about
> the delay of workqueue (since it is a kind of schedulable task), many
> irq events for bus reset is potentially skipped from the logging. Of
> course, it is the aim of change.
>
> Let me take more time to evaluate the change, but I'm willing to send it
> to upstream until -rc3 or -rc4, at least, if receiving no objections
> from the others.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git/log/?h=v6.9-firewire-mask-bus-reset-event-during-handling
As long as I tested, the patch is enough good to suppress the spurious
IRQ event. I'll send it to mainline in this weekend.
I sent an additional patch[1] to handle the bus-reset event at the first
time. I'd like you to review and test it as well, especially under your
environment in which 1394:1995 and 1394a phys exist.
[1] https://lore.kernel.org/lkml/[email protected]/
Thanks
Takashi Sakamoto
Hi Takashi,
On Mon, Apr 01, 2024 at 09:18:00PM +0900, Takashi Sakamoto wrote:
> I sent an additional patch[1] to handle the bus-reset event at the first
> time. I'd like you to review and test it as well, especially under your
> environment in which 1394:1995 and 1394a phys exist.
>
> [1] https://lore.kernel.org/lkml/[email protected]/
I'm sorry for another very late reply.
Now that we eliminated the IRQ storm, it makes sense to always enable
the bus-reset interrupt at startup. I tested your patch with various
devices, with a FW800 repeater, with a FW400 hub, without a hub, etc.
Everything works OK. However, I only tested with XIO2213B OHCI.
-- Adam
Hi Adam,
On Wed, May 01, 2024 at 04:39:23AM -0700, Adam Goldman wrote:
> Hi Takashi,
>
> On Mon, Apr 01, 2024 at 09:18:00PM +0900, Takashi Sakamoto wrote:
> > I sent an additional patch[1] to handle the bus-reset event at the first
> > time. I'd like you to review and test it as well, especially under your
> > environment in which 1394:1995 and 1394a phys exist.
> >
> > [1] https://lore.kernel.org/lkml/[email protected]/
>
> I'm sorry for another very late reply.
>
> Now that we eliminated the IRQ storm, it makes sense to always enable
> the bus-reset interrupt at startup. I tested your patch with various
> devices, with a FW800 repeater, with a FW400 hub, without a hub, etc.
> Everything works OK. However, I only tested with XIO2213B OHCI.
>
> -- Adam
Thanks for your test. The content of patch is equivalent to the first
one in the candidate series[1], so I appended Tested-by tag when
applying to for-next branch.
The for-next branch includes the commits to provide the following
tracepoints events:
* firewire:async_request_outbound_initiate
* firewire:async_request_outbound_complete
* firewire:async_response_inbound
* firewire:async_request_inbound
* firewire:async_response_outbound_initiate
* firewire:async_response_outbound_complete
* firewire:async_phy_outbound_initiate
* firewire:async_phy_outbound_complete
* firewire:async_phy_inbound
* firewire:bus_reset_initiate
* firewire:bus_reset_schedule
* firewire:bus_reset_postpone
* firewire:bus_reset_handle
All of them are used to trace the action of firewire core function,
instead of 1394 OHCI driver. I think they are helpful to debug the kind
of issue which we handled for v6.8 kernel.
[1] https://lore.kernel.org/lkml/[email protected]/
Thanks
Takashi Sakamoto