Per the DMA API docs, when using dma_map_single(), a DMA_TO_DEVICE sync
must be done after software's last modification of the memory region,
before the region is handed off to the device and can be safely read.
Such a sync is currently missing from
firewire_ohci:at_context_queue_packet().
With this patch applied, the panic is gone, at least on my setup, where I
could previously reproduce a panic in handle_at_packet() reliably within
seconds by simply dd'ing from two drives on different controllers.
See http://bugzilla.kernel.org/show_bug.cgi?id=9617
Signed-off-by: Jarod Wilson <[email protected]>
---
drivers/firewire/fw-ohci.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/drivers/firewire/fw-ohci.c b/drivers/firewire/fw-ohci.c
index 081a434..fc45868 100644
--- a/drivers/firewire/fw-ohci.c
+++ b/drivers/firewire/fw-ohci.c
@@ -780,6 +780,10 @@ at_context_queue_packet(struct context *ctx, struct fw_packet *packet)
 
 	context_append(ctx, d, z, 4 - z);
 
+	/* Sync the DMA buffer up for the device to read from */
+	dma_sync_single_for_device(ohci->card.device, payload_bus,
+				   packet->payload_length, DMA_TO_DEVICE);
+
 	/* If the context isn't already running, start it up. */
 	reg = reg_read(ctx->ohci, CONTROL_SET(ctx->regs));
 	if ((reg & CONTEXT_RUN) == 0)
--
Jarod Wilson
[email protected]
On Wed, 2008-03-12 at 17:43 -0400, Jarod Wilson wrote:
> diff --git a/drivers/firewire/fw-ohci.c b/drivers/firewire/fw-ohci.c
> index 081a434..fc45868 100644
> --- a/drivers/firewire/fw-ohci.c
> +++ b/drivers/firewire/fw-ohci.c
> @@ -780,6 +780,10 @@ at_context_queue_packet(struct context *ctx, struct fw_packet *packet)
>
> 	context_append(ctx, d, z, 4 - z);
>
> +	/* Sync the DMA buffer up for the device to read from */
> +	dma_sync_single_for_device(ohci->card.device, payload_bus,
> +				   packet->payload_length, DMA_TO_DEVICE);
> +
> 	/* If the context isn't already running, start it up. */
> 	reg = reg_read(ctx->ohci, CONTROL_SET(ctx->regs));
> 	if ((reg & CONTEXT_RUN) == 0)
>
Two things:
1. The dma_sync should probably be done before the context_append,
because the controller could in theory start reading the data as soon as
context_append is called, right?
2. As an optimization, we should attempt to allocate the payload in the
lower 32 bits of the physical address space, to prevent extra memory
copies on x86_64.
I think this can be done by adding GFP_DMA32 to kmalloc where the
payload was allocated in fw-cdev.c. There might be other places where
we would benefit from GFP_DMA32 also. Of course, these optimizations
are probably better saved for another patch.
-David
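
To make the bounce-buffer concern in point 2 concrete, here is a minimal
userspace sketch, not kernel code. It assumes the controller can only
address 32 bits, so any buffer that reaches above that limit has to be
copied through a bounce buffer. model_fits_dma32() is a hypothetical
helper invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: returns true if the
 * buffer [phys, phys + len) lies entirely within the low 32 bits of
 * the physical address space, i.e. a 32-bit-only device could DMA it
 * directly without a bounce copy.
 */
static bool model_fits_dma32(uint64_t phys, uint64_t len)
{
	const uint64_t mask = UINT64_C(0xffffffff);

	if (phys > mask)
		return false;
	if (len == 0)
		return true;
	/* Written this way so that phys + len cannot overflow. */
	return len - 1 <= mask - phys;
}
```

A buffer that merely starts below the limit is not enough; one that
straddles the boundary would still need bouncing, which is why the check
covers the last byte as well.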
Jarod Wilson wrote:
> At least on my setup, where I could within seconds reliably reproduce a panic
> in handle_at_packet() by simply dd'ing from two drives on different controllers,
> the panic is gone.
>
> See http://bugzilla.kernel.org/show_bug.cgi?id=9617
Alas the panic from comment #10 is still there, i.e. instant crash when
plugging in an LSI based CD-RW (shortly after SCSI inquiry) --- but only
if CONFIG_DEBUG_PAGEALLOC=y.
Jarod, did your crashes happen with CONFIG_DEBUG_PAGEALLOC=n?
> --- a/drivers/firewire/fw-ohci.c
> +++ b/drivers/firewire/fw-ohci.c
> @@ -780,6 +780,10 @@ at_context_queue_packet(struct context *ctx, struct fw_packet *packet)
>
> 	context_append(ctx, d, z, 4 - z);
>
> +	/* Sync the DMA buffer up for the device to read from */
> +	dma_sync_single_for_device(ohci->card.device, payload_bus,
> +				   packet->payload_length, DMA_TO_DEVICE);
> +
> 	/* If the context isn't already running, start it up. */
> 	reg = reg_read(ctx->ohci, CONTROL_SET(ctx->regs));
> 	if ((reg & CONTEXT_RUN) == 0)
>
The dma_sync_single_ call should be conditional on
packet->payload_length > 0. You would have noticed that if my patch
"firewire: fw-ohci: shut up false compiler warning on PPC32" hadn't
masked the corresponding compiler warning, which would be genuine
after your patch. And, as David wrote, the call should come before
context_append.
However, we actually don't need it at all.
The dma_map_single(...) already syncs the payload for the device, and we
don't access the payload after that anymore. So this patch shouldn't do
anything, except that it inserts a call which happens to have barrier
characteristics on some platforms.
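
The ownership rule behind this can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the kernel DMA
API: every model_* name is hypothetical, and the "device" is modelled as
reading from a private bounce copy, as it would on a swiotlb setup. The
mapping step already publishes the buffer to the device; an explicit sync
only matters if the CPU writes to the buffer again after mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these are kernel functions. */
enum { MODEL_BUF_SIZE = 64 };

struct model_mapping {
	const unsigned char *cpu_buf;         /* memory the driver writes */
	unsigned char bounce[MODEL_BUF_SIZE]; /* what the "device" reads  */
	size_t len;
};

/* Stand-in for mapping with DMA_TO_DEVICE: includes the initial sync. */
static void model_map_to_device(struct model_mapping *m,
				const unsigned char *buf, size_t len)
{
	assert(len <= MODEL_BUF_SIZE);
	m->cpu_buf = buf;
	m->len = len;
	memcpy(m->bounce, buf, len);
}

/* Stand-in for a later sync: republishes CPU writes made after mapping. */
static void model_sync_for_device(struct model_mapping *m)
{
	memcpy(m->bounce, m->cpu_buf, m->len);
}

/* The byte the modelled device would fetch at offset i. */
static unsigned char model_device_read(const struct model_mapping *m, size_t i)
{
	assert(i < m->len);
	return m->bounce[i];
}
```

In this model, a payload that is filled in, mapped, and then left alone is
fully visible to the device without any further sync, which is the
situation described above for at_context_queue_packet().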
What we rather have to check is:
- Are we really writing into the context program in the order that we
need to? This includes ordering WRT MMIO writes.
- Are we writing the branch address atomically? (No, we don't enforce
an atomic access at the moment, although it is very likely that the
compiler uses an atomic access.)
(We have to expect that the controller reads a descriptor while we write
into it.)
- Is there a use-after-free problem somewhere?
(A pattern in the original report and in a crash that you mentioned
looked like use of freed memory: "Faulting instruction address:
0x6b6b6b68" in comment #1.)
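
As an illustration of the first two questions, the following userspace
sketch uses C11 atomics to show the "fill in the new descriptor first,
then publish the branch address with one ordered store" pattern. It rests
on several assumptions: struct model_descriptor is a simplified,
hypothetical layout and not the one in fw-ohci.c, the low four bits of the
branch address are assumed to carry the Z value, and C11 atomics only model
ordering between CPU threads. Real driver code would need the kernel's own
barriers and a naturally aligned 32-bit store instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor; not the layout from fw-ohci.c. */
struct model_descriptor {
	uint32_t data_address;
	uint16_t req_count;
	_Atomic uint32_t branch_address; /* 0 means "end of program" */
};

/*
 * Link prev to the next descriptor block with a single release store.
 * Everything written to the next block before this call is ordered
 * before the branch address becomes visible, and the reader can never
 * observe a half-written address.
 */
static void model_publish_branch(struct model_descriptor *prev,
				 uint32_t next_bus, uint32_t z)
{
	uint32_t value = (next_bus & ~UINT32_C(0xf)) | (z & UINT32_C(0xf));

	atomic_store_explicit(&prev->branch_address, value,
			      memory_order_release);
}

/* What the modelled controller would see when it follows the branch. */
static uint32_t model_fetch_branch(struct model_descriptor *d)
{
	return atomic_load_explicit(&d->branch_address,
				    memory_order_acquire);
}
```

The point of the single store is that a reader polling the previous
descriptor sees either the old value (0) or the complete new one, never a
mix of the two.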
--
Stefan Richter
-=====-==--- --== -==--
http://arcgraph.de/sr/
David Moore wrote:
> 1. The dma_sync should probably be done before the context_append
> because the controller could in theory start reading the data as soon as
> context_append is called right?
Yes, but its syncing is superfluous anyway, as mentioned in the other
post. *If* it is doing something for Jarod's setup, then only because
of side effects like memory barrier properties.
> 2. As an optimization, we should attempt to allocate the payload in the
> lower 32-bits of physical memory, to prevent extra memcopies on x86_64.
> I think this can be done by adding GFP_DMA32 to kmalloc where the
> payload was allocated in fw-cdev.c. There might be other places where
> we would benefit from GFP_DMA32 also. Of course, these optimizations
> are probably better saved for another patch.
I think so too. GFP_DMA32 would be appropriate on machines with
"software IOMMU" alias swiotlb.
Does GFP_DMA32 have adverse effects on machines with a real IOMMU?
--
Stefan Richter
-=====-==--- --== -==--
http://arcgraph.de/sr/
On Wednesday 12 March 2008 07:16:43 pm Stefan Richter wrote:
> Jarod Wilson wrote:
> > At least on my setup, where I could within seconds reliably reproduce a
> > panic in handle_at_packet() by simply dd'ing from two drives on different
> > controllers, the panic is gone.
> >
> > See http://bugzilla.kernel.org/show_bug.cgi?id=9617
>
> Alas the panic from comment #10 is still there, i.e. instant crash when
> plugging in an LSI based CD-RW (shortly after SCSI inquiry) --- but only
> if CONFIG_DEBUG_PAGEALLOC=y.
>
> Jarod, did your crashes happen with CONFIG_DEBUG_PAGEALLOC=n?
No, they're with it turned on (and still on w/this change where it doesn't
panic). If I run with CONFIG_DEBUG_PAGEALLOC=n, no panic.
> > --- a/drivers/firewire/fw-ohci.c
> > +++ b/drivers/firewire/fw-ohci.c
> > @@ -780,6 +780,10 @@ at_context_queue_packet(struct context *ctx, struct fw_packet *packet)
> >
> > 	context_append(ctx, d, z, 4 - z);
> >
> > +	/* Sync the DMA buffer up for the device to read from */
> > +	dma_sync_single_for_device(ohci->card.device, payload_bus,
> > +				   packet->payload_length, DMA_TO_DEVICE);
> > +
> > 	/* If the context isn't already running, start it up. */
> > 	reg = reg_read(ctx->ohci, CONTROL_SET(ctx->regs));
> > 	if ((reg & CONTEXT_RUN) == 0)
>
> The dma_sync_single_ call should be conditional for
> packet->payload_length > 0. You would have noticed that if my patch
> "firewire: fw-ohci: shut up false compiler warning on PPC32" wouldn't
> have shadowed the corresponding compiler warning, which would be for
> real after your patch. And, as David wrote, the call should come before
> context_append.
>
> However, we actually don't need it at all.
>
> The dma_map_single(...) already syncs the payload for the device, and we
> don't access the payload after that anymore.
D'oh, got overzealous reading dma docs, missed the fact that we didn't
actually touch it on the host side, so nothing to actually sync...
> So this patch shouldn't do
> anything, except that it inserts a call which happens to have barrier
> characteristics on some platforms.
...but got lucky in that it actually helps this particular setup (x86_64
kernel, dual quad-core opteron, 8G RAM, 3 FireWire controllers). Hrm.
> What we rather have to check is:
>
> - Are we really writing into the context program the order that we
> need to? This includes ordering WRT MMIO writes.
>
> - Are we writing the branch address atomically? (No, we don't enforce
> an atomic access at the moment, although it is very likely that the
> compiler uses an atomic access.)
>
> (We have to expect that the controller reads a descriptor while we write
> into it.)
More investigative fun for tomorrow...
> - Is there a use-after-free problem somewhere?
>
> (A pattern in the original report and in a crash that you mentioned
> looked like use of freed memory: "Faulting instruction address:
> 0x6b6b6b68" in comment #1.)
Seems both of the ones that looked like slab poison were on PowerPC. I'll have
to spin up a ppc kernel on the old powerbook and poke around more.
--
Jarod Wilson
[email protected]
Jarod Wilson wrote:
> On Wednesday 12 March 2008 07:16:43 pm Stefan Richter wrote:
>> Jarod Wilson wrote:
>>> See http://bugzilla.kernel.org/show_bug.cgi?id=9617
>> Alas the panic from comment #10 is still there, i.e. instant crash when
>> plugging in an LSI based CD-RW (shortly after SCSI inquiry) --- but only
>> if CONFIG_DEBUG_PAGEALLOC=y.
>>
>> Jarod, did your crashes happen with CONFIG_DEBUG_PAGEALLOC=n?
>
> No, they're with it turned on (and still on w/this change where it doesn't
> panic). If I run with CONFIG_DEBUG_PAGEALLOC=n, no panic.
So then that's like "my" panic.
The other thing that I see here is that it only happens with my two LSI
based devices (both CD-RWs), always quickly after SCSI inquiry.
I shall test with my Prolific based DVD-RW to find out whether it is
about some requests that are sent to CD/DVD-RWs or about the split
transactions that I get from the LSI bridges. (See bugzilla.)
>> So this patch shouldn't do
>> anything, except that it inserts a call which happens to have barrier
>> characteristics on some platforms.
...and potentially delays execution.
> ...but got lucky in that it actually helps this particular setup (x86_64
> kernel, dual quad-core opteron, 8G RAM, 3 FireWire controllers). Hrm.
Unless you or I spot the real solution earlier, you could also try
replacing your dma_sync_ with mb() and with mdelay() respectively to see
what aspect of the dma_sync_ is fixing your setup. Also move the mb()
to other interesting places of the involved code.
--
Stefan Richter
-=====-==--- --== -==-=
http://arcgraph.de/sr/
On Thursday 13 March 2008 04:49:03 am Stefan Richter wrote:
> >> So this patch shouldn't do
> >> anything, except that it inserts a call which happens to have barrier
> >> characteristics on some platforms.
>
> ...and potentially delays execution.
>
> > ...but got lucky in that it actually helps this particular setup (x86_64
> > kernel, dual quad-core opteron, 8G RAM, 3 FireWire controllers). Hrm.
>
> Unless you or I spot the real solution earlier, you could also try
> replacing your dma_sync_ with mb() and with mdelay() respectively to see
> what aspect of the dma_sync_ is fixing your setup. Also move the mb()
> to other interesting places of the involved code.
So it would seem I screwed up something in my testing: the panic stopped
reproducing once I added '[PATCH] firewire: fw-ohci: use dma_alloc_coherent
for ar_buffer' to the mix. As best I can tell now, the panic was actually
resolved by that patch, since backing out the sync altogether now works
just as well as keeping the sync or replacing it with an mb() or an
mdelay().
This sort of makes some degree of sense, since this is another x86_64 system
w/>= 4GB of RAM. It would appear possible that at some higher layer, where AT
and AR transactions are coordinated with one another, we were getting an AT
packet into a bad state due to its corresponding AR traffic being stuck in a
non-coherent buffer we couldn't read. But this is only semi-informed
speculation...
--
Jarod Wilson
[email protected]