2024-03-25 11:20:37

by Takashi Sakamoto

[permalink] [raw]
Subject: Re: [PATCH] firewire: ohci: mask bus reset interrupts between ISR and bottom half

Hi,

On Wed, Mar 20, 2024 at 02:15:19AM -0700, Adam Goldman wrote:
> In the FireWire OHCI interrupt handler, if a bus reset interrupt has
> occurred, mask bus reset interrupts until bus_reset_work has serviced and
> cleared the interrupt.
>
> Normally, we always leave bus reset interrupts masked. We infer the bus
> reset from the self-ID interrupt that happens shortly thereafter. A
> scenario where we unmask bus reset interrupts was introduced in 2008 in
> a007bb857e0b26f5d8b73c2ff90782d9c0972620: If
> OHCI_PARAM_DEBUG_BUSRESETS (8) is set in the debug parameter bitmask, we
> will unmask bus reset interrupts so we can log them.
>
> irq_handler logs the bus reset interrupt. However, we can't clear the bus
> reset event flag in irq_handler, because we won't service the event until
> later. irq_handler exits with the event flag still set. If the
> corresponding interrupt is still unmasked, the first bus reset will
> usually freeze the system due to irq_handler being called again each
> time it exits. This freeze can be reproduced by loading firewire_ohci
> with "modprobe firewire_ohci debug=-1" (to enable all debugging output).
> Apparently there are also some cases where bus_reset_work will get called
> soon enough to clear the event, and operation will continue normally.
>
> This freeze was first reported a few months after a007bb85 was committed,
> but until now it was never fixed. The debug level could safely be set
> to -1 through sysfs after the module was loaded, but this would be
> ineffectual in logging bus reset interrupts since they were only
> unmasked during initialization.
>
> irq_handler will now leave the event flag set but mask bus reset
> interrupts, so irq_handler won't be called again and there will be no
> freeze. If OHCI_PARAM_DEBUG_BUSRESETS is enabled, bus_reset_work will
> unmask the interrupt after servicing the event, so future interrupts
> will be caught as desired.
>
> As a side effect to this change, OHCI_PARAM_DEBUG_BUSRESETS can now be
> enabled through sysfs in addition to during initial module loading.
> However, when enabled through sysfs, logging of bus reset interrupts will
> be effective only starting with the second bus reset, after
> bus_reset_work has executed.
>
> Signed-off-by: Adam Goldman <[email protected]>
> ---
>
> --- linux-6.8-rc1.orig/drivers/firewire/ohci.c 2024-01-21 14:11:32.000000000 -0800
> +++ linux-6.8-rc1/drivers/firewire/ohci.c 2024-03-12 01:15:10.000000000 -0700
> @@ -2060,6 +2060,8 @@ static void bus_reset_work(struct work_s
>
> ohci->generation = generation;
> reg_write(ohci, OHCI1394_IntEventClear, OHCI1394_busReset);
> + if (param_debug & OHCI_PARAM_DEBUG_BUSRESETS)
> + reg_write(ohci, OHCI1394_IntMaskSet, OHCI1394_busReset);
>
> if (ohci->quirks & QUIRK_RESET_PACKET)
> ohci->request_generation = generation;
> @@ -2125,12 +2127,14 @@ static irqreturn_t irq_handler(int irq,
> return IRQ_NONE;
>
> /*
> - * busReset and postedWriteErr must not be cleared yet
> + * busReset and postedWriteErr events must not be cleared yet
> * (OHCI 1.1 clauses 7.2.3.2 and 13.2.8.1)
> */
> reg_write(ohci, OHCI1394_IntEventClear,
> event & ~(OHCI1394_busReset | OHCI1394_postedWriteErr));
> log_irqs(ohci, event);
> + if (event & OHCI1394_busReset)
> + reg_write(ohci, OHCI1394_IntMaskClear, OHCI1394_busReset);
>
> if (event & OHCI1394_selfIDComplete)
> queue_work(selfid_workqueue, &ohci->bus_reset_work);

Thanks for the patch. I pushed topic branch[1] for it, since I'm
considering about whether to send it to stable and longterm releases.

I had realized that the debug=8 for firewire-ohci module provides tons
of logs triggering by the irq handler, since the irq for bus reset is
not unmasked, so I rely on selfID events when debugging bus-reset. I have
few objections to the change.

My concern is how much invasive it is. The unmasking is kept until
bus_reset_work() is executed in the workqueue. When considering about
the delay of workqueue (since it is a kind of schedulable task), many
irq events for bus reset is potentially skipped from the logging. Of
course, it is the aim of change.

Let me take more time to evaluate the change, but I'm willing to send it
to upstream until -rc3 or -rc4, at least, if receiving no objections
from the others.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git/log/?h=v6.9-firewire-mask-bus-reset-event-during-handling


Thanks

Takashi Sakamoto


2024-04-01 12:18:14

by Takashi Sakamoto

[permalink] [raw]
Subject: Re: [PATCH] firewire: ohci: mask bus reset interrupts between ISR and bottom half

Hi Adam,

On Mon, Mar 25, 2024 at 09:58:28AM +0900, Takashi Sakamoto wrote:
> Hi,
>
> On Wed, Mar 20, 2024 at 02:15:19AM -0700, Adam Goldman wrote:
> > In the FireWire OHCI interrupt handler, if a bus reset interrupt has
> > occurred, mask bus reset interrupts until bus_reset_work has serviced and
> > cleared the interrupt.
> >
> > Normally, we always leave bus reset interrupts masked. We infer the bus
> > reset from the self-ID interrupt that happens shortly thereafter. A
> > scenario where we unmask bus reset interrupts was introduced in 2008 in
> > a007bb857e0b26f5d8b73c2ff90782d9c0972620: If
> > OHCI_PARAM_DEBUG_BUSRESETS (8) is set in the debug parameter bitmask, we
> > will unmask bus reset interrupts so we can log them.
> >
> > irq_handler logs the bus reset interrupt. However, we can't clear the bus
> > reset event flag in irq_handler, because we won't service the event until
> > later. irq_handler exits with the event flag still set. If the
> > corresponding interrupt is still unmasked, the first bus reset will
> > usually freeze the system due to irq_handler being called again each
> > time it exits. This freeze can be reproduced by loading firewire_ohci
> > with "modprobe firewire_ohci debug=-1" (to enable all debugging output).
> > Apparently there are also some cases where bus_reset_work will get called
> > soon enough to clear the event, and operation will continue normally.
> >
> > This freeze was first reported a few months after a007bb85 was committed,
> > but until now it was never fixed. The debug level could safely be set
> > to -1 through sysfs after the module was loaded, but this would be
> > ineffectual in logging bus reset interrupts since they were only
> > unmasked during initialization.
> >
> > irq_handler will now leave the event flag set but mask bus reset
> > interrupts, so irq_handler won't be called again and there will be no
> > freeze. If OHCI_PARAM_DEBUG_BUSRESETS is enabled, bus_reset_work will
> > unmask the interrupt after servicing the event, so future interrupts
> > will be caught as desired.
> >
> > As a side effect to this change, OHCI_PARAM_DEBUG_BUSRESETS can now be
> > enabled through sysfs in addition to during initial module loading.
> > However, when enabled through sysfs, logging of bus reset interrupts will
> > be effective only starting with the second bus reset, after
> > bus_reset_work has executed.
> >
> > Signed-off-by: Adam Goldman <[email protected]>
> > ---
> >
> > --- linux-6.8-rc1.orig/drivers/firewire/ohci.c 2024-01-21 14:11:32.000000000 -0800
> > +++ linux-6.8-rc1/drivers/firewire/ohci.c 2024-03-12 01:15:10.000000000 -0700
> > @@ -2060,6 +2060,8 @@ static void bus_reset_work(struct work_s
> >
> > ohci->generation = generation;
> > reg_write(ohci, OHCI1394_IntEventClear, OHCI1394_busReset);
> > + if (param_debug & OHCI_PARAM_DEBUG_BUSRESETS)
> > + reg_write(ohci, OHCI1394_IntMaskSet, OHCI1394_busReset);
> >
> > if (ohci->quirks & QUIRK_RESET_PACKET)
> > ohci->request_generation = generation;
> > @@ -2125,12 +2127,14 @@ static irqreturn_t irq_handler(int irq,
> > return IRQ_NONE;
> >
> > /*
> > - * busReset and postedWriteErr must not be cleared yet
> > + * busReset and postedWriteErr events must not be cleared yet
> > * (OHCI 1.1 clauses 7.2.3.2 and 13.2.8.1)
> > */
> > reg_write(ohci, OHCI1394_IntEventClear,
> > event & ~(OHCI1394_busReset | OHCI1394_postedWriteErr));
> > log_irqs(ohci, event);
> > + if (event & OHCI1394_busReset)
> > + reg_write(ohci, OHCI1394_IntMaskClear, OHCI1394_busReset);
> >
> > if (event & OHCI1394_selfIDComplete)
> > queue_work(selfid_workqueue, &ohci->bus_reset_work);
>
> Thanks for the patch. I pushed topic branch[1] for it, since I'm
> considering about whether to send it to stable and longterm releases.
>
> I had realized that the debug=8 for firewire-ohci module provides tons
> of logs triggering by the irq handler, since the irq for bus reset is
> not unmasked, so I rely on selfID events when debugging bus-reset. I have
> few objections to the change.
>
> My concern is how much invasive it is. The unmasking is kept until
> bus_reset_work() is executed in the workqueue. When considering about
> the delay of workqueue (since it is a kind of schedulable task), many
> irq events for bus reset is potentially skipped from the logging. Of
> course, it is the aim of change.
>
> Let me take more time to evaluate the change, but I'm willing to send it
> to upstream until -rc3 or -rc4, at least, if receiving no objections
> from the others.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git/log/?h=v6.9-firewire-mask-bus-reset-event-during-handling

As long as I tested, the patch is enough good to suppress the spurious
IRQ event. I'll send it to mainline in this weekend.

I sent an additional patch[1] to handle the bus-reset event at the first
time. I'd like you to review and test it as well, especially under your
environment in which 1394:1995 and 1394a phys exist.

[1] https://lore.kernel.org/lkml/[email protected]/


Thanks

Takashi Sakamoto