Hi Matthew,
I've been trying for some time to use a rootfs (ext2) on a USB memory
stick on ARM platforms but without any success. The USB HCD driver is
ISP1760 which doesn't use DMA.
ARM has a Harvard cache architecture and what I get is incoherency
between the I and D caches. The CPU I'm using (ARM11MPCore) has PIPT
caches with D-cache lines allocation on write.
Basically, when user space tries to execute from a new page, it faults
and the data is requested via the VFS layer, SCSI block device and USB
mass storage from the ISP1760 driver. The page is then mapped into user
space and update_mmu_cache() called.
However, since the driver is PIO, the data copied from the USB device
into RAM gets stuck in the D-cache. On the above page requesting path
there is no call to flush_dcache_page() to handle D-cache maintenance
(for DMA drivers, that's handled by the DMA API).
Since the USB mass storage code has the information about the USB driver
capabilities (DMA or PIO), it looks like the best place to call
flush_dcache_page(). But I got lost in the SCSI emulation and all my
attempts failed to get a working rootfs.
Adding flush_dcache_page() higher up in mpage_end_io_read() solves the
problem but that's not the correct fix as it has wider implications and
it's not needed for DMA-capable devices.
Thanks.
--
Catalin
Am Freitag, 29. Januar 2010 15:34:15 schrieb Catalin Marinas:
> Basically, when user space tries to execute from a new page, it faults
> and the data is requested via the VFS layer, SCSI block device and USB
> mass storage from the ISP1760 driver. The page is then mapped into user
> space and update_mmu_cache() called.
>
> However, since the driver is PIO, the data copied from the USB device
> into RAM gets stuck in the D-cache. On the above page requesting path
> there is no call to flush_dcache_page() to handle D-cache maintenance
> (for DMA drivers, that's handled by the DMA API).
>
> Since the USB mass storage code has the information about the USB driver
> capabilities (DMA or PIO), it looks like the best place to call
> flush_dcache_page(). But I got lost in the SCSI emulation and all my
> attempts failed to get a working rootfs.
No, that would be a very bad place in the layering to do this.
The problem would happen with ub and storage. It might also
happen with any other driver.
Please add this to the HCD driver.
Regards
Oliver
2010/1/29 Catalin Marinas <[email protected]>:
> Hi Matthew,
>
> I've been trying for some time to use a rootfs (ext2) on a USB memory
> stick on ARM platforms but without any success. The USB HCD driver is
> ISP1760 which doesn't use DMA.
>
> ARM has a Harvard cache architecture and what I get is incoherency
> between the I and D caches. The CPU I'm using (ARM11MPCore) has PIPT
> caches with D-cache lines allocation on write.
>
> Basically, when user space tries to execute from a new page, it faults
> and the data is requested via the VFS layer, SCSI block device and USB
> mass storage from the ISP1760 driver. The page is then mapped into user
> space and update_mmu_cache() called.
>
> However, since the driver is PIO, the data copied from the USB device
> into RAM gets stuck in the D-cache. On the above page requesting path
> there is no call to flush_dcache_page() to handle D-cache maintenance
> (for DMA drivers, that's handled by the DMA API).
>
> Since the USB mass storage code has the information about the USB driver
Sorry, I am a little confused that usb mass storage has what information
about DMA or PIO of low level usb transfer?
Thanks,
--
Lei Ming
On Fri, 2010-01-29 at 16:23 +0000, Ming Lei wrote:
> 2010/1/29 Catalin Marinas <[email protected]>:
> > I've been trying for some time to use a rootfs (ext2) on a USB memory
> > stick on ARM platforms but without any success. The USB HCD driver is
> > ISP1760 which doesn't use DMA.
> >
> > ARM has a Harvard cache architecture and what I get is incoherency
> > between the I and D caches. The CPU I'm using (ARM11MPCore) has PIPT
> > caches with D-cache lines allocation on write.
> >
> > Basically, when user space tries to execute from a new page, it faults
> > and the data is requested via the VFS layer, SCSI block device and USB
> > mass storage from the ISP1760 driver. The page is then mapped into user
> > space and update_mmu_cache() called.
> >
> > However, since the driver is PIO, the data copied from the USB device
> > into RAM gets stuck in the D-cache. On the above page requesting path
> > there is no call to flush_dcache_page() to handle D-cache maintenance
> > (for DMA drivers, that's handled by the DMA API).
> >
> > Since the USB mass storage code has the information about the USB driver
>
> Sorry, I am a little confused that usb mass storage has what information
> about DMA or PIO of low level usb transfer?
I was thinking about checking dev->bus->controller->dma_mask which the
code (though not the storage one) seems to imply that if the dma_mask is
0, the HCD driver is only capable of PIO.
That would be a more general solution rather than going through each HCD
driver since my understanding is that flush_dcache_page() is only needed
together with the mass storage support.
--
Catalin
Am Freitag, 29. Januar 2010 17:34:03 schrieb Catalin Marinas:
> I was thinking about checking dev->bus->controller->dma_mask which the
> code (though not the storage one) seems to imply that if the dma_mask is
> 0, the HCD driver is only capable of PIO.
That a HCD is capable of DMA need not imply that DMA is used for every
transfer.
> That would be a more general solution rather than going through each HCD
> driver since my understanding is that flush_dcache_page() is only needed
> together with the mass storage support.
What about ub, nfs or nbd over a USB<->ethernet converter?
This, I am afraid is best solved at the HCD or glue layer.
Regards
Oliver
On Fri, 2010-01-29 at 16:41 +0000, Oliver Neukum wrote:
> Am Freitag, 29. Januar 2010 17:34:03 schrieb Catalin Marinas:
>
> > I was thinking about checking dev->bus->controller->dma_mask which the
> > code (though not the storage one) seems to imply that if the dma_mask is
> > 0, the HCD driver is only capable of PIO.
>
> That a HCD is capable of DMA need not imply that DMA is used for every
> transfer.
Actually the DMA drivers are safe in this respect only if the transfer
happens directly to a page cache page that may be (later) mapped into
user space. I'm not familiar with the USB drivers to fully understand
the data flow, so any help would be appreciated.
> > That would be a more general solution rather than going through each HCD
> > driver since my understanding is that flush_dcache_page() is only needed
> > together with the mass storage support.
>
> What about ub, nfs or nbd over a USB<->ethernet converter?
> This, I am afraid is best solved at the HCD or glue layer.
NFS handles the cache flushing itself, so in this case there is no need
to duplicate the cache flushing at the HCD level. AFAICT, the HCD driver
may be used in several cases and it's only the storage case (via either
ub, mass storage etc.) that requires cache flushing. Is there a way to
differentiate between these at the HCD driver level?
Regarding nbd, is there any copying happening between the HCD driver
receiving the network packet from the USB-ethernet converter and the nbd
bio_vec buffers (most likely during the TCP/IP stack flow)? In this case
it would be for the nbd driver (doesn't seem to be the case now) to
flush the D-cache as the HCD flushing is not necessary as long as it
doesn't write directly to the page cache page.
The ub case is similar to the USB mass storage one, so they could both
benefit from flushing at the HCD driver level. But is this possible
without duplicating the flushing in the nfs case?
Regards.
--
Catalin
Hello.
Catalin Marinas wrote:
>>> I've been trying for some time to use a rootfs (ext2) on a USB memory
>>> stick on ARM platforms but without any success. The USB HCD driver is
>>> ISP1760 which doesn't use DMA.
>>>
>>> ARM has a Harvard cache architecture and what I get is incoherency
>>> between the I and D caches. The CPU I'm using (ARM11MPCore) has PIPT
>>> caches with D-cache lines allocation on write.
>>>
>>> Basically, when user space tries to execute from a new page, it faults
>>> and the data is requested via the VFS layer, SCSI block device and USB
>>> mass storage from the ISP1760 driver. The page is then mapped into user
>>> space and update_mmu_cache() called.
>>>
>>> However, since the driver is PIO, the data copied from the USB device
>>> into RAM gets stuck in the D-cache. On the above page requesting path
>>> there is no call to flush_dcache_page() to handle D-cache maintenance
>>> (for DMA drivers, that's handled by the DMA API).
>>>
>>> Since the USB mass storage code has the information about the USB driver
>>>
>> Sorry, I am a little confused that usb mass storage has what information
>> about DMA or PIO of low level usb transfer?
>>
>
> I was thinking about checking dev->bus->controller->dma_mask which the
> code (though not the storage one) seems to imply that if the dma_mask is
> 0, the HCD driver is only capable of PIO.
>
> That would be a more general solution rather than going through each HCD
> driver since my understanding is that flush_dcache_page() is only needed
> together with the mass storage support.
Note that DMA capable driver can be doing some transfers in PIO mode
or falling back to PIO mode if DMA mode transfer is unsuccessful (the
musb driver is an example of the latter and if the DMA rewrite patches
will get accepted, it'll do short transfers in PIO mode).
MBR, Sergei
On Fri, Jan 29, 2010 at 08:51:47PM +0300, Sergei Shtylyov wrote:
> Catalin Marinas wrote:
>
> >That would be a more general solution rather than going through each HCD
> >driver since my understanding is that flush_dcache_page() is only needed
> >together with the mass storage support.
>
> Note that DMA capable driver can be doing some transfers in PIO mode
> or falling back to PIO mode if DMA mode transfer is unsuccessful (the
> musb driver is an example of the latter and if the DMA rewrite patches
> will get accepted, it'll do short transfers in PIO mode).
Given that an HCD can choose, on the fly, if it's using DMA or PIO, the HCD
driver is the only place to reasonably put any cache-synchronization code.
That said, what do the other SCSI HCDs do? I'm guessing the question gets
kinda muddy there, since the other SCSI HCDs all talk directly to some
piece of hardware, and thus are responsible for the cache management
themselves.
Based on that, one could argue that ub and usb-storage should be doing
this.
HOWEVER, I firmly believe that the cache-management functions belong with
the driver that actually talks to the low-level hardware, as that's the
only place where you can be 100% certain of what cache operations are
needed. After all, I think someone is working on a USB-over-IP transport,
and trying to manage cache at the usb-storage level in that scenario is
just silly.
So, let's put this in the HCD drivers and be done with it.
Matt
--
Matthew Dharm Home: [email protected]
Maintainer, Linux USB Mass Storage Driver
I see you've been reading alt.sex.chubby.sheep voraciously.
-- Tanya
User Friendly, 11/24/97
On Fri, Jan 29, 2010 at 10:54:34AM -0800, Matthew Dharm wrote:
> On Fri, Jan 29, 2010 at 08:51:47PM +0300, Sergei Shtylyov wrote:
> > Catalin Marinas wrote:
> >
> > >That would be a more general solution rather than going through each HCD
> > >driver since my understanding is that flush_dcache_page() is only needed
> > >together with the mass storage support.
> >
> > Note that DMA capable driver can be doing some transfers in PIO mode
> > or falling back to PIO mode if DMA mode transfer is unsuccessful (the
> > musb driver is an example of the latter and if the DMA rewrite patches
> > will get accepted, it'll do short transfers in PIO mode).
>
> Given that an HCD can choose, on the fly, if it's using DMA or PIO, the HCD
> driver is the only place to reasonably put any cache-synchronization code.
>
> That said, what do the other SCSI HCDs do? I'm guessing the question gets
> kinda muddy there, since the other SCSI HCDs all talk directly to some
> piece of hardware, and thus are responsible for the cache management
> themselves.
>
> Based on that, one could argue that ub and usb-storage should be doing
> this.
>
> HOWEVER, I firmly believe that the cache-management functions belong with
> the driver that actually talks to the low-level hardware, as that's the
> only place where you can be 100% certain of what cache operations are
> needed. After all, I think someone is working on a USB-over-IP transport,
> and trying to manage cache at the usb-storage level in that scenario is
> just silly.
>
> So, let's put this in the HCD drivers and be done with it.
I agree, that's the place to fix this issue.
thanks,
greg k-h
On Fri, 2010-01-29 at 18:54 +0000, Matthew Dharm wrote:
> On Fri, Jan 29, 2010 at 08:51:47PM +0300, Sergei Shtylyov wrote:
> > Catalin Marinas wrote:
> >
> > >That would be a more general solution rather than going through each HCD
> > >driver since my understanding is that flush_dcache_page() is only needed
> > >together with the mass storage support.
> >
> > Note that DMA capable driver can be doing some transfers in PIO mode
> > or falling back to PIO mode if DMA mode transfer is unsuccessful (the
> > musb driver is an example of the latter and if the DMA rewrite patches
> > will get accepted, it'll do short transfers in PIO mode).
>
> Given that an HCD can choose, on the fly, if it's using DMA or PIO, the HCD
> driver is the only place to reasonably put any cache-synchronization code.
>
> That said, what do the other SCSI HCDs do? I'm guessing the question gets
> kinda muddy there, since the other SCSI HCDs all talk directly to some
> piece of hardware, and thus are responsible for the cache management
> themselves.
>
> Based on that, one could argue that ub and usb-storage should be doing
> this.
>
> HOWEVER, I firmly believe that the cache-management functions belong with
> the driver that actually talks to the low-level hardware, as that's the
> only place where you can be 100% certain of what cache operations are
> needed. After all, I think someone is working on a USB-over-IP transport,
> and trying to manage cache at the usb-storage level in that scenario is
> just silly.
>
> So, let's put this in the HCD drivers and be done with it.
Doing this (flush_dcache_page) in the HCD driver (ISP1760) solves my
problem. I'll post a patch and also cc the driver maintainer.
Thanks.
--
Catalin
On Fri, 2010-01-29 at 18:54 +0000, Matthew Dharm wrote:
> HOWEVER, I firmly believe that the cache-management functions belong with
> the driver that actually talks to the low-level hardware, as that's the
> only place where you can be 100% certain of what cache operations are
> needed. After all, I think someone is working on a USB-over-IP transport,
> and trying to manage cache at the usb-storage level in that scenario is
> just silly.
>
> So, let's put this in the HCD drivers and be done with it.
The patch below is what fixes the I-D cache incoherency issues on ARM. I
don't particularly like the solution but it seems to be the only one
available.
IMHO, Linux should have functions similar to the DMA API but for PIO
drivers (e.g. pio_map_single/pio_unmap_single) that non-coherent
architectures can define, otherwise being no-ops. Any thoughts?
Thanks.
isp1760: Flush the D-cache for the pipe-in transfer buffers
From: Catalin Marinas <[email protected]>
When the HDC driver writes the data to the transfer buffers it pollutes
the D-cache (unlike DMA drivers where the device writes the data). If
the corresponding pages get mapped into user space, there are no
additional cache flushing operations performed and this causes random
user space faults on architectures with separate I and D caches
(Harvard) or those with aliasing D-cache.
Signed-off-by: Catalin Marinas <[email protected]>
Cc: Matthew Dharm <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Sebastian Siewior <[email protected]>
---
drivers/usb/host/isp1760-hcd.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/drivers/usb/host/isp1760-hcd.c b/drivers/usb/host/isp1760-hcd.c
index 27b8f7c..4d3eeee 100644
--- a/drivers/usb/host/isp1760-hcd.c
+++ b/drivers/usb/host/isp1760-hcd.c
@@ -18,6 +18,8 @@
#include <linux/uaccess.h>
#include <linux/io.h>
#include <asm/unaligned.h>
+#include <asm/cacheflush.h>
+#include <asm/memory.h>
#include "../core/hcd.h"
#include "isp1760-hcd.h"
@@ -904,6 +906,14 @@ __acquires(priv->lock)
status = 0;
}
+ if (usb_pipein(urb->pipe) && usb_pipetype(urb->pipe) == PIPE_BULK) {
+ void *ptr;
+ for (ptr = urb->transfer_buffer;
+ ptr < urb->transfer_buffer + urb->transfer_buffer_length;
+ ptr += PAGE_SIZE)
+ flush_dcache_page(virt_to_page(ptr));
+ }
+
/* complete() can reenter this HCD */
usb_hcd_unlink_urb_from_ep(priv_to_hcd(priv), urb);
spin_unlock(&priv->lock);
--
Catalin
On Mon, 1 Feb 2010, Catalin Marinas wrote:
> On Fri, 2010-01-29 at 18:54 +0000, Matthew Dharm wrote:
> > HOWEVER, I firmly believe that the cache-management functions belong with
> > the driver that actually talks to the low-level hardware, as that's the
> > only place where you can be 100% certain of what cache operations are
> > needed. After all, I think someone is working on a USB-over-IP transport,
> > and trying to manage cache at the usb-storage level in that scenario is
> > just silly.
> >
> > So, let's put this in the HCD drivers and be done with it.
>
> The patch below is what fixes the I-D cache incoherency issues on ARM. I
> don't particularly like the solution but it seems to be the only one
> available.
>
> IMHO, Linux should have functions similar to the DMA API but for PIO
> drivers (e.g. pio_map_single/pio_unmap_single) that non-coherent
> architectures can define, otherwise being no-ops. Any thoughts?
You should bring this up on the linux-arm-kernel mailing list and CC:
the ARM maintainer. They are the ones most directly affected.
Alan Stern
[CC Takashi]
On Mon, Feb 01, 2010 at 05:29:14PM +0000, Catalin Marinas wrote:
> On Fri, 2010-01-29 at 18:54 +0000, Matthew Dharm wrote:
> > HOWEVER, I firmly believe that the cache-management functions belong with
> > the driver that actually talks to the low-level hardware, as that's the
> > only place where you can be 100% certain of what cache operations are
> > needed. After all, I think someone is working on a USB-over-IP transport,
> > and trying to manage cache at the usb-storage level in that scenario is
> > just silly.
> >
> > So, let's put this in the HCD drivers and be done with it.
>
> The patch below is what fixes the I-D cache incoherency issues on ARM. I
> don't particularly like the solution but it seems to be the only one
> available.
Thanks very much for working on this amazingly large problem!
I took some time to add your patch to ehci-q.c / ohci-q.c
(for my *hci-ssb.c ASUS WL-500gP v2), on my now _heavily_ patched-up 2.6.31.9,
but _UNFORTUNATELY_ it kept locking up the same way as always when stopping
playback despite being damn sure this time that this patch could have
the potential to finally fix it ;)
(I had to replace memory.h with page.h on my arch though, to fix the build)
This is on MIPSEL (not one of my many ARM devices, unfortunately ;),
with usb-audio, and the madplay process crashes in __bzero(),
which strongly indicates cache coherency issues (other subsequent backtraces
have lots of mmap and vma listed, see also my "snd_usb_audio OOPS on MIPSEL -
is that the mmap issue?").
Next thing I'll do is fire up gdb and get a good backtrace of the
__bzero() address to find out which page handling in mpd exactly
is hampered with crashes. This is now ~ the third patch that I applied
on-the-go and that didn't help, so it's probably time to do
some earnest analysis on what's really going on locally.
Note that usb-storage itself does work on this platform though.
Rather annoying to be so close (sound works) yet so far away,
especially after all that USB host trouble I already had.
Thanks a lot again,
Andreas Mohr
On Mon, Feb 01, 2010 at 03:14:04PM -0500, Alan Stern wrote:
> On Mon, 1 Feb 2010, Catalin Marinas wrote:
>
> > On Fri, 2010-01-29 at 18:54 +0000, Matthew Dharm wrote:
> > > HOWEVER, I firmly believe that the cache-management functions belong with
> > > the driver that actually talks to the low-level hardware, as that's the
> > > only place where you can be 100% certain of what cache operations are
> > > needed. After all, I think someone is working on a USB-over-IP transport,
> > > and trying to manage cache at the usb-storage level in that scenario is
> > > just silly.
> > >
> > > So, let's put this in the HCD drivers and be done with it.
> >
> > The patch below is what fixes the I-D cache incoherency issues on ARM. I
> > don't particularly like the solution but it seems to be the only one
> > available.
> >
> > IMHO, Linux should have functions similar to the DMA API but for PIO
> > drivers (e.g. pio_map_single/pio_unmap_single) that non-coherent
> > architectures can define, otherwise being no-ops. Any thoughts?
>
> You should bring this up on the linux-arm-kernel mailing list and CC:
> the ARM maintainer. They are the ones most directly affected.
>
No, this belongs on linux-arch, as it's something that impacts a lot of
people besides ARM.
On Mon, Feb 01, 2010 at 05:29:14PM +0000, Catalin Marinas wrote:
> On Fri, 2010-01-29 at 18:54 +0000, Matthew Dharm wrote:
> > HOWEVER, I firmly believe that the cache-management functions belong with
> > the driver that actually talks to the low-level hardware, as that's the
> > only place where you can be 100% certain of what cache operations are
> > needed. After all, I think someone is working on a USB-over-IP transport,
> > and trying to manage cache at the usb-storage level in that scenario is
> > just silly.
> >
> > So, let's put this in the HCD drivers and be done with it.
>
> The patch below is what fixes the I-D cache incoherency issues on ARM. I
> don't particularly like the solution but it seems to be the only one
> available.
>
> IMHO, Linux should have functions similar to the DMA API but for PIO
> drivers (e.g. pio_map_single/pio_unmap_single) that non-coherent
> architectures can define, otherwise being no-ops. Any thoughts?
>
I would certainly be in favour of such a thing, particularly since on SH
we often find ourselves with coherent PIO and non-coherent MMIO.
This is however something that should be prototyped and submitted to
linux-arch for discussion.
> diff --git a/drivers/usb/host/isp1760-hcd.c b/drivers/usb/host/isp1760-hcd.c
> index 27b8f7c..4d3eeee 100644
> --- a/drivers/usb/host/isp1760-hcd.c
> +++ b/drivers/usb/host/isp1760-hcd.c
> @@ -18,6 +18,8 @@
> #include <linux/uaccess.h>
> #include <linux/io.h>
> #include <asm/unaligned.h>
> +#include <asm/cacheflush.h>
> +#include <asm/memory.h>
>
asm/memory.h isn't a portable header. If you are including it for
virt_to_page(), linux/io.h should already bring that in via asm/io.h.
If arm doesn't bring in virt_to_page() through its asm/io.h, then fix the
headers there please.
FWIW I used the same fix you came up with on r8a66597_hcd and it fixed up
crashes we were seeing there, too.
Am Montag, 1. Februar 2010 23:30:01 schrieb Andreas Mohr:
> I took some time to add your patch to ehci-q.c / ohci-q.c
> (for my *hci-ssb.c ASUS WL-500gP v2), on my now heavily patched-up 2.6.31.9,
> but UNFORTUNATELY it kept locking up the same way as always when stopping
> playback despite being damn sure this time that this patch could have
> the potential to finally fix it ;)
> (I had to replace memory.h with page.h on my arch though, to fix the build)
A moment please. You are using ehci and ohci. Both are using dma.
Why does this issue arise?
Regards
Oliver
* Catalin Marinas | 2010-02-01 17:29:14 [+0000]:
>> So, let's put this in the HCD drivers and be done with it.
That is the correct place. MMC -hcd drivers for instance are doing this
way.
>The patch below is what fixes the I-D cache incoherency issues on ARM. I
>don't particularly like the solution but it seems to be the only one
>available.
The PIO-MMC drivers walk through a scatter list via sg_miter_start() and
friends. Those helpers take care of this automaticly.
>IMHO, Linux should have functions similar to the DMA API but for PIO
>drivers (e.g. pio_map_single/pio_unmap_single) that non-coherent
>architectures can define, otherwise being no-ops. Any thoughts?
What is wrong with flush_dcache_page() ? And I think linux-arch is the
appropriate place.
>isp1760: Flush the D-cache for the pipe-in transfer buffers
>
>From: Catalin Marinas <[email protected]>
>
>When the HDC driver writes the data to the transfer buffers it pollutes
>the D-cache (unlike DMA drivers where the device writes the data). If
>the corresponding pages get mapped into user space, there are no
>additional cache flushing operations performed and this causes random
>user space faults on architectures with separate I and D caches
>(Harvard) or those with aliasing D-cache.
>
>Signed-off-by: Catalin Marinas <[email protected]>
>Cc: Matthew Dharm <[email protected]>
>Cc: Greg KH <[email protected]>
>Cc: Sebastian Siewior <[email protected]>
>---
> drivers/usb/host/isp1760-hcd.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
>diff --git a/drivers/usb/host/isp1760-hcd.c b/drivers/usb/host/isp1760-hcd.c
>index 27b8f7c..4d3eeee 100644
>--- a/drivers/usb/host/isp1760-hcd.c
>+++ b/drivers/usb/host/isp1760-hcd.c
>@@ -18,6 +18,8 @@
> #include <linux/uaccess.h>
> #include <linux/io.h>
> #include <asm/unaligned.h>
>+#include <asm/cacheflush.h>
>+#include <asm/memory.h>
I'm fine with the patch generally but I don't like the asm headers.
cacheflush.h is available on most architectures as far as I can see it but
memory.h is only available on arm. So you break the build on !arm and
therefore I NAK this.
> #include "../core/hcd.h"
> #include "isp1760-hcd.h"
>@@ -904,6 +906,14 @@ __acquires(priv->lock)
> status = 0;
> }
>
>+ if (usb_pipein(urb->pipe) && usb_pipetype(urb->pipe) == PIPE_BULK) {
>+ void *ptr;
>+ for (ptr = urb->transfer_buffer;
>+ ptr < urb->transfer_buffer + urb->transfer_buffer_length;
>+ ptr += PAGE_SIZE)
>+ flush_dcache_page(virt_to_page(ptr));
>+ }
>+
> /* complete() can reenter this HCD */
> usb_hcd_unlink_urb_from_ep(priv_to_hcd(priv), urb);
> spin_unlock(&priv->lock);
>
Sebastian
On Tuesday 02 February 2010 07:58:42 Oliver Neukum wrote:
> Am Montag, 1. Februar 2010 23:30:01 schrieb Andreas Mohr:
> > I took some time to add your patch to ehci-q.c / ohci-q.c
> > (for my *hci-ssb.c ASUS WL-500gP v2), on my now heavily patched-up
> > 2.6.31.9, but UNFORTUNATELY it kept locking up the same way as always
> > when stopping playback despite being damn sure this time that this patch
> > could have the potential to finally fix it ;)
> > (I had to replace memory.h with page.h on my arch though, to fix the
> > build)
>
> A moment please. You are using ehci and ohci. Both are using dma.
> Why does this issue arise?
Because the BCM4710 CPU core is know to have cache problems and we have been
trying to workaround this, your problem Andreas is imho a different one.
--
Regards, Florian
On Tue, 2010-02-02 at 04:24 +0000, Paul Mundt wrote:
> On Mon, Feb 01, 2010 at 03:14:04PM -0500, Alan Stern wrote:
> > On Mon, 1 Feb 2010, Catalin Marinas wrote:
> >
> > > On Fri, 2010-01-29 at 18:54 +0000, Matthew Dharm wrote:
> > > > HOWEVER, I firmly believe that the cache-management functions belong with
> > > > the driver that actually talks to the low-level hardware, as that's the
> > > > only place where you can be 100% certain of what cache operations are
> > > > needed. After all, I think someone is working on a USB-over-IP transport,
> > > > and trying to manage cache at the usb-storage level in that scenario is
> > > > just silly.
> > > >
> > > > So, let's put this in the HCD drivers and be done with it.
> > >
> > > The patch below is what fixes the I-D cache incoherency issues on ARM. I
> > > don't particularly like the solution but it seems to be the only one
> > > available.
> > >
> > > IMHO, Linux should have functions similar to the DMA API but for PIO
> > > drivers (e.g. pio_map_single/pio_unmap_single) that non-coherent
> > > architectures can define, otherwise being no-ops. Any thoughts?
> >
> > You should bring this up on the linux-arm-kernel mailing list and CC:
> > the ARM maintainer. They are the ones most directly affected.
>
> No, this belongs on linux-arch, as it's something that impacts a lot of
> people besides ARM.
I agree. I'll try to come up with a proposal and post it there.
BTW, this was already raised on the ARM Linux lists and people there are
aware of these problems. Their suggestion was to take it to LKML.
--
Catalin
On Tue, 2010-02-02 at 06:39 +0000, Paul Mundt wrote:
> On Mon, Feb 01, 2010 at 05:29:14PM +0000, Catalin Marinas wrote:
> > diff --git a/drivers/usb/host/isp1760-hcd.c b/drivers/usb/host/isp1760-hcd.c
> > index 27b8f7c..4d3eeee 100644
> > --- a/drivers/usb/host/isp1760-hcd.c
> > +++ b/drivers/usb/host/isp1760-hcd.c
> > @@ -18,6 +18,8 @@
> > #include <linux/uaccess.h>
> > #include <linux/io.h>
> > #include <asm/unaligned.h>
> > +#include <asm/cacheflush.h>
> > +#include <asm/memory.h>
>
> asm/memory.h isn't a portable header. If you are including it for
> virt_to_page(), linux/io.h should already bring that in via asm/io.h.
> If arm doesn't bring in virt_to_page() through its asm/io.h, then fix the
> headers there please.
In the ARM case, yes, it brings virt_to_page() but I'm not sure that's
the case for the other architectures. I think a better header is
linux/mm.h which already uses this function in virt_to_head_page().
--
Catalin
On Tue, 2010-02-02 at 09:11 +0000, Sebastian Andrzej Siewior wrote:
> * Catalin Marinas | 2010-02-01 17:29:14 [+0000]:
> >> So, let's put this in the HCD drivers and be done with it.
>
> That is the correct place. MMC -hcd drivers for instance are doing this
> way.
>
> >The patch below is what fixes the I-D cache incoherency issues on ARM. I
> >don't particularly like the solution but it seems to be the only one
> >available.
>
> The PIO-MMC drivers walk through a scatter list via sg_miter_start() and
> friends. Those helpers take care of this automaticly.
>
> >IMHO, Linux should have functions similar to the DMA API but for PIO
> >drivers (e.g. pio_map_single/pio_unmap_single) that non-coherent
> >architectures can define, otherwise being no-ops. Any thoughts?
>
> What is wrong with flush_dcache_page() ?
In this particular case, it's too many lines to do the virt_to_page for
the transfer buffer since the HCD driver doesn't have access to the
individual pages (via something like urb->sg). A better solution would
be to move such loop in a flush_dcache_range() function to make it
easier for drivers.
Apart from that, flush_dcache_page() doesn't have any data flow
information. Optimisations could be done on ARM if we know that the
kernel only intends to read from a page (no flushing necessary with a
non-aliasing D-cache).
> And I think linux-arch is the appropriate place.
For changes to the cache flushing API, yes, that's the right place. I'll
get there with a patch.
>
> >diff --git a/drivers/usb/host/isp1760-hcd.c b/drivers/usb/host/isp1760-hcd.c
> >index 27b8f7c..4d3eeee 100644
> >--- a/drivers/usb/host/isp1760-hcd.c
> >+++ b/drivers/usb/host/isp1760-hcd.c
> >@@ -18,6 +18,8 @@
> > #include <linux/uaccess.h>
> > #include <linux/io.h>
> > #include <asm/unaligned.h>
> >+#include <asm/cacheflush.h>
> >+#include <asm/memory.h>
>
> I'm fine with the patch generally but I don't like the asm headers.
> cacheflush.h is available on most architectures as far as I can see it but
> memory.h is only available on arm. So you break the build on !arm and
> therefore I NAK this.
Yes, that was already pointed out. I'll post a revised patch (until we
maybe get a better API for such things but that's for linux-arch).
Thanks.
--
Catalin
On Tue, Feb 02, 2010 at 11:05:39AM +0000, Catalin Marinas wrote:
> On Tue, 2010-02-02 at 06:39 +0000, Paul Mundt wrote:
> > On Mon, Feb 01, 2010 at 05:29:14PM +0000, Catalin Marinas wrote:
> > > diff --git a/drivers/usb/host/isp1760-hcd.c b/drivers/usb/host/isp1760-hcd.c
> > > index 27b8f7c..4d3eeee 100644
> > > --- a/drivers/usb/host/isp1760-hcd.c
> > > +++ b/drivers/usb/host/isp1760-hcd.c
> > > @@ -18,6 +18,8 @@
> > > #include <linux/uaccess.h>
> > > #include <linux/io.h>
> > > #include <asm/unaligned.h>
> > > +#include <asm/cacheflush.h>
> > > +#include <asm/memory.h>
> >
> > asm/memory.h isn't a portable header. If you are including it for
> > virt_to_page(), linux/io.h should already bring that in via asm/io.h.
> > If arm doesn't bring in virt_to_page() through its asm/io.h, then fix the
> > headers there please.
>
> In the ARM case, yes, it brings virt_to_page() but I'm not sure that's
> the case for the other architectures. I think a better header is
> linux/mm.h which already uses this function in virt_to_head_page().
>
For some reason I was thinking virt_to_phys() instead of virt_to_page()
when I wrote that, so just ignore me. linux/mm.h is obviously fine.
Am Montag, 1. Februar 2010 18:29:14 schrieb Catalin Marinas:
> + if (usb_pipein(urb->pipe) && usb_pipetype(urb->pipe) == PIPE_BULK) {
> + void *ptr;
> + for (ptr = urb->transfer_buffer;
> + ptr < urb->transfer_buffer + urb->transfer_buffer_length;
> + ptr += PAGE_SIZE)
> + flush_dcache_page(virt_to_page(ptr));
Is it correct to limit this to BULK pipes?
Regards
Oliver
On Tue, 2010-02-02 at 11:48 +0000, Oliver Neukum wrote:
> Am Montag, 1. Februar 2010 18:29:14 schrieb Catalin Marinas:
> > + if (usb_pipein(urb->pipe) && usb_pipetype(urb->pipe) == PIPE_BULK) {
> > + void *ptr;
> > + for (ptr = urb->transfer_buffer;
> > + ptr < urb->transfer_buffer + urb->transfer_buffer_length;
> > + ptr += PAGE_SIZE)
> > + flush_dcache_page(virt_to_page(ptr));
>
> Is it correct to limit this to BULK pipes?
I'm not entirely sure. The flush_dcache_page() should only be called for
pages that may be mapped into user space (page cache pages). We don't
need this for control buffers. It was my impression that what's coming
from the mass storage layer intended for page cache pages has the
PIPE_BULK type (I may be wrong though).
--
Catalin
Am Dienstag, 2. Februar 2010 13:01:12 schrieb Catalin Marinas:
> On Tue, 2010-02-02 at 11:48 +0000, Oliver Neukum wrote:
> > Am Montag, 1. Februar 2010 18:29:14 schrieb Catalin Marinas:
> > > + if (usb_pipein(urb->pipe) && usb_pipetype(urb->pipe) == PIPE_BULK) {
> > > + void *ptr;
> > > + for (ptr = urb->transfer_buffer;
> > > + ptr < urb->transfer_buffer + urb->transfer_buffer_length;
> > > + ptr += PAGE_SIZE)
> > > + flush_dcache_page(virt_to_page(ptr));
> >
> > Is it correct to limit this to BULK pipes?
>
> I'm not entirely sure. The flush_dcache_page() should only be called for
> pages that may be mapped into user space (page cache pages). We don't
> need this for control buffers. It was my impression that what's coming
> from the mass storage layer intended for page cache pages has the
> PIPE_BULK type (I may be wrong though).
For storage that is correct. But what about other sources of pages,
for example iSCSI?
Regards
Oliver
Hi,
On Tue, Feb 02, 2010 at 01:07:56PM +0100, Oliver Neukum wrote:
> Am Dienstag, 2. Februar 2010 13:01:12 schrieb Catalin Marinas:
> > On Tue, 2010-02-02 at 11:48 +0000, Oliver Neukum wrote:
> > > Am Montag, 1. Februar 2010 18:29:14 schrieb Catalin Marinas:
> > > Is it correct to limit this to BULK pipes?
> >
> > I'm not entirely sure. The flush_dcache_page() should only be called for
> > pages that may be mapped into user space (page cache pages). We don't
> > need this for control buffers. It was my impression that what's coming
> > from the mass storage layer intended for page cache pages has the
> > PIPE_BULK type (I may be wrong though).
>
> For storage that is correct. But what about other sources of pages,
> for example iSCSI?
Or... usb-audio? I should have verified that it is using bulk endpoints
(and thus the patch applies to my case).
usb-audio probably uses isochronous transfers, thus that would be
an obvious reason why the patch didn't work for me.
(with some other reason possibly being BCM4710 issues, of course)
Andreas Mohr
On Tue, 2010-02-02 at 12:07 +0000, Oliver Neukum wrote:
> Am Dienstag, 2. Februar 2010 13:01:12 schrieb Catalin Marinas:
> > On Tue, 2010-02-02 at 11:48 +0000, Oliver Neukum wrote:
> > > Am Montag, 1. Februar 2010 18:29:14 schrieb Catalin Marinas:
> > > > + if (usb_pipein(urb->pipe) && usb_pipetype(urb->pipe) == PIPE_BULK) {
> > > > + void *ptr;
> > > > + for (ptr = urb->transfer_buffer;
> > > > + ptr < urb->transfer_buffer + urb->transfer_buffer_length;
> > > > + ptr += PAGE_SIZE)
> > > > + flush_dcache_page(virt_to_page(ptr));
> > >
> > > Is it correct to limit this to BULK pipes?
> >
> > I'm not entirely sure. The flush_dcache_page() should only be called for
> > pages that may be mapped into user space (page cache pages). We don't
> > need this for control buffers. It was my impression that what's coming
> > from the mass storage layer intended for page cache pages has the
> > PIPE_BULK type (I may be wrong though).
>
> For storage that is correct. But what about other sources of pages,
> for example iSCSI?
In the iSCSI case, does the HCD driver write directly to a page cache
page? Or it just fills in network packets that are copied to page cache
pages by the iSCSI code (sorry, I'm not familiar with this part of the
kernel). If the latter, the cache flushing in the HCD driver would not
help and it needs to be done in the iSCSI code.
--
Catalin
Am Dienstag, 2. Februar 2010 13:39:35 schrieb Catalin Marinas:
> > For storage that is correct. But what about other sources of pages,
> > for example iSCSI?
>
> In the iSCSI case, does the HCD driver write directly to a page cache
> page? Or it just fills in network packets that are copied to page cache
> pages by the iSCSI code (sorry, I'm not familiar with this part of the
> kernel). If the latter, the cache flushing in the HCD driver would not
> help and it needs to be done in the iSCSI code.
As far as I can tell iSCSI does a private copy. But I don't know how
many methods to transfer code pages over USB exist. I'd say the
conservative solution is to flush for everything but control transfers.
Regards
Oliver
2010/2/2 Catalin Marinas <[email protected]>:
> In the iSCSI case, does the HCD driver write directly to a page cache
> page? Or it just fills in network packets that are copied to page cache
> pages by the iSCSI code (sorry, I'm not familiar with this part of the
> kernel). If the latter, the cache flushing in the HCD driver would not
> help and it needs to be done in the iSCSI code.
So we should flush dcache page in the place where the user mapped
page is copied to, instead of low level driver which does not do such
thing always.
--
Lei Ming
On Tue, 2010-02-02 at 13:08 +0000, Oliver Neukum wrote:
> Am Dienstag, 2. Februar 2010 13:39:35 schrieb Catalin Marinas:
> > > For storage that is correct. But what about other sources of pages,
> > > for example iSCSI?
> >
> > In the iSCSI case, does the HCD driver write directly to a page cache
> > page? Or it just fills in network packets that are copied to page cache
> > pages by the iSCSI code (sorry, I'm not familiar with this part of the
> > kernel). If the latter, the cache flushing in the HCD driver would not
> > help and it needs to be done in the iSCSI code.
>
> As far as I can tell iSCSI does a private copy. But I don't know how
> many methods to transfer code pages over USB exist. I'd say the
> conservative solution is to flush for everything but control transfers.
flush_dcache_page() is on many architectures implemented lazily so that
if the page isn't mapped in user space no flushing takes place. It's
mainly the cost of virt_to_page() which I suspect is slightly higher
with sparsemem enabled.
--
Catalin
On Tue, 2010-02-02 at 13:36 +0000, Ming Lei wrote:
> 2010/2/2 Catalin Marinas <[email protected]>:
>
> > In the iSCSI case, does the HCD driver write directly to a page cache
> > page? Or it just fills in network packets that are copied to page cache
> > pages by the iSCSI code (sorry, I'm not familiar with this part of the
> > kernel). If the latter, the cache flushing in the HCD driver would not
> > help and it needs to be done in the iSCSI code.
>
> So we should flush dcache page in the place where the user mapped
> page is copied to, instead of low level driver which does not do such
> thing always.
Or both if you can't be sure whether the driver copies directly to a
page cache page.
--
Catalin
Andreas Mohr wrote:
> On Tue, Feb 02, 2010 at 01:07:56PM +0100, Oliver Neukum wrote:
> > Am Dienstag, 2. Februar 2010 13:01:12 schrieb Catalin Marinas:
> > > On Tue, 2010-02-02 at 11:48 +0000, Oliver Neukum wrote:
> > > > Am Montag, 1. Februar 2010 18:29:14 schrieb Catalin Marinas:
> > > > Is it correct to limit this to BULK pipes?
> > >
> > > I'm not entirely sure. The flush_dcache_page() should only be called for
> > > pages that may be mapped into user space (page cache pages). We don't
> > > need this for control buffers. It was my impression that what's coming
> > > from the mass storage layer intended for page cache pages has the
> > > PIPE_BULK type (I may be wrong though).
> >
> > For storage that is correct. But what about other sources of pages,
> > for example iSCSI?
>
> Or... usb-audio? I should have verified that it is using bulk endpoints
> (and thus the patch applies to my case).
> usb-audio probably uses isochronous transfers, thus that would be
> an obvious reason why the patch didn't work for me.
snd-usb-audio indeed uses isochronous transfers, but those buffers are
never mapped into user space. The intermediate vmalloc()ed buffer is,
however, and there was a bugfix for this recently. Do you have these
patches in your tree?
http://git.kernel.org/?p=linux/kernel/git/tiwai/sound-2.6.git;a=commit;h=3e879d7bac705be4813a0ec9560cbe31db4b269f
http://git.kernel.org/?p=linux/kernel/git/tiwai/sound-2.6.git;a=commit;h=c32d977b8157bf67cdf47729ce7dd054a26eb534
Best regards,
Clemens
Am Dienstag, 2. Februar 2010 15:42:49 schrieb Clemens Ladisch:
> > Or... usb-audio? I should have verified that it is using bulk endpoints
> > (and thus the patch applies to my case).
> > usb-audio probably uses isochronous transfers, thus that would be
> > an obvious reason why the patch didn't work for me.
>
> snd-usb-audio indeed uses isochronous transfers, but those buffers are
> never mapped into user space. The intermediate vmalloc()ed buffer is,
> however, and there was a bugfix for this recently. Do you have these
> patches in your tree?
Now that I think about it, several video drivers do map it to user space.
Regards
Oliver
[added another __bzero coherency crash victim, see
http://lkml.org/lkml/2008/6/9/14 ]
On Tue, Feb 02, 2010 at 03:52:19PM +0100, Oliver Neukum wrote:
> Am Dienstag, 2. Februar 2010 15:42:49 schrieb Clemens Ladisch:
> > > Or... usb-audio? I should have verified that it is using bulk endpoints
> > > (and thus the patch applies to my case).
> > > usb-audio probably uses isochronous transfers, thus that would be
> > > an obvious reason why the patch didn't work for me.
> >
> > snd-usb-audio indeed uses isochronous transfers, but those buffers are
> > never mapped into user space. The intermediate vmalloc()ed buffer is,
> > however, and there was a bugfix for this recently. Do you have these
> > patches in your tree?
>
> Now that I think about it, several video drivers do map it to user space.
OK, then the urb loop needs to also handle isochronous pipes,
and IMHO we should have a generic helper for this instead of open-coding
it, since it probably needs to be done in a couple affected HCDs
(and, most importantly, only on affected architectures - which the helper
could handle transparently).
Clemens: no, both of these patches haven't been applied (yet!!),
many thanks for the notification!
Will apply both patches and the isochronous addition, hopefully that
improves things (will be painful to check which of these things managed to
fix it - in case it does! -, though). Nope, will apply step by step,
both patches, then isochronous as a last resort.
Andreas Mohr
On Tue, 2010-02-02 at 15:10 +0000, Andreas Mohr wrote:
> [added another __bzero coherency crash victim, see
> http://lkml.org/lkml/2008/6/9/14 ]
>
> On Tue, Feb 02, 2010 at 03:52:19PM +0100, Oliver Neukum wrote:
> > Am Dienstag, 2. Februar 2010 15:42:49 schrieb Clemens Ladisch:
> > > > Or... usb-audio? I should have verified that it is using bulk endpoints
> > > > (and thus the patch applies to my case).
> > > > usb-audio probably uses isochronous transfers, thus that would be
> > > > an obvious reason why the patch didn't work for me.
> > >
> > > snd-usb-audio indeed uses isochronous transfers, but those buffers are
> > > never mapped into user space. The intermediate vmalloc()ed buffer is,
> > > however, and there was a bugfix for this recently. Do you have these
> > > patches in your tree?
> >
> > Now that I think about it, several video drivers do map it to user space.
>
> OK, then the urb loop needs to also handle isochronous pipes,
> and IMHO we should have a generic helper for this instead of open-coding
> it, since it probably needs to be done in a couple affected HCDs
> (and, most importantly, only on affected architectures - which the helper
> could handle transparently).
I'm planning to send a proposal to linux-arch for a flush_dcache_range()
function.
--
Catalin
On Tue, 2 Feb 2010, Oliver Neukum wrote:
> Am Dienstag, 2. Februar 2010 13:39:35 schrieb Catalin Marinas:
> > > For storage that is correct. But what about other sources of pages,
> > > for example iSCSI?
> >
> > In the iSCSI case, does the HCD driver write directly to a page cache
> > page? Or it just fills in network packets that are copied to page cache
> > pages by the iSCSI code (sorry, I'm not familiar with this part of the
> > kernel). If the latter, the cache flushing in the HCD driver would not
> > help and it needs to be done in the iSCSI code.
>
> As far as I can tell iSCSI does a private copy. But I don't know how
> many methods to transfer code pages over USB exist. I'd say the
> conservative solution is to flush for everything but control transfers.
This doesn't make any sense. Nobody would ever use isochronous
transfers to store data into a code page because isochronous is
unreliable. (Audio isn't a counterexample -- audio data may be mapped
to userspace, but only to data pages, not code pages. And the problem
here is to maintain consistency between the D and I caches.)
In principle interrupt transfers could be used, but it is most
unlikely. They are intended for bounded-latency transfers, not
transfers of potentially large amounts of data.
The only transfer type that makes sense to worry about is bulk.
Alan Stern
On Tue, 2010-02-02 at 17:11 +0000, Alan Stern wrote:
> On Tue, 2 Feb 2010, Oliver Neukum wrote:
>
> > Am Dienstag, 2. Februar 2010 13:39:35 schrieb Catalin Marinas:
> > > > For storage that is correct. But what about other sources of pages,
> > > > for example iSCSI?
> > >
> > > In the iSCSI case, does the HCD driver write directly to a page cache
> > > page? Or it just fills in network packets that are copied to page cache
> > > pages by the iSCSI code (sorry, I'm not familiar with this part of the
> > > kernel). If the latter, the cache flushing in the HCD driver would not
> > > help and it needs to be done in the iSCSI code.
> >
> > As far as I can tell iSCSI does a private copy. But I don't know how
> > many methods to transfer code pages over USB exist. I'd say the
> > conservative solution is to flush for everything but control transfers.
>
> This doesn't make any sense. Nobody would ever use isochronous
> transfers to store data into a code page because isochronous is
> unreliable. (Audio isn't a counterexample -- audio data may be mapped
> to userspace, but only to data pages, not code pages. And the problem
> here is to maintain consistency between the D and I caches.)
My issues is with both I-D coherency and D-cache aliasing caused by
pages mapped in both user and kernel space (with different colours). The
flush_dcache_page() call should target both cases.
--
Catalin
On Tue, Feb 02, 2010 at 03:42:49PM +0100, Clemens Ladisch wrote:
> Andreas Mohr wrote:
> > Or... usb-audio? I should have verified that it is using bulk endpoints
> > (and thus the patch applies to my case).
> > usb-audio probably uses isochronous transfers, thus that would be
> > an obvious reason why the patch didn't work for me.
>
> snd-usb-audio indeed uses isochronous transfers, but those buffers are
> never mapped into user space. The intermediate vmalloc()ed buffer is,
> however, and there was a bugfix for this recently. Do you have these
> patches in your tree?
> http://git.kernel.org/?p=linux/kernel/git/tiwai/sound-2.6.git;a=commit;h=3e879d7bac705be4813a0ec9560cbe31db4b269f
> http://git.kernel.org/?p=linux/kernel/git/tiwai/sound-2.6.git;a=commit;h=c32d977b8157bf67cdf47729ce7dd054a26eb534
OK, I've now added both patches to my quilt series (and pushed everything),
rebuilt, reflashed image and copied modules, and it still bombs
just the very same way.
And this also with Catalins latest patch version (the one using != PIPE_CONTROL
to hit iso transfers etc. as well).
So it seems I still haven't got to the core of the issue despite all these
rather different patch attempts.
I'm afraid if it turns out that keeping open the sound device manually
via another process manages to workaround it, then I'll simply
give it all up completely and live with the current semi-satisfying situation
on my custom 2.6.31.9 build.
Any further ideas or patches that I could try?
(I might investigate the issue myself in a serious way sometime later,
but don't count on it)
Thanks!
Andreas Mohr
netconsole log (some previous crashes were at __bzero, now it was two times
at __copy_user - maybe the patches changed something for real?):
Instruction bus error, epc == 80004dd8, ra == 80000018
Oops[#1]:
Cpu 0
$ 0 : 00000000 1000d000 00000000 00000000
$ 4 : 7f9e6be8 81ee7ec4 00000004 00000000
$ 8 : 00000000 00000000 00000000 81fac000
$12 : 4b688a80 80340000 81d6e868 00000400
$16 : 81ee7f00 00000000 7f9e6be8 00000001
$20 : 81ee7eb8 00000000 7f9e6c9c 7f9eb320
$24 : 00000000 2b565ed0
$28 : 81ee6000 81ee7ea8 7f9f6c98 80000018
Hi : 00000000
Lo : 00000000
epc : 80004dd8 __copy_user+0xd4/0x2bc
Not tainted
ra : 80000018 0x80000018
Status: 1000d003 KERNEL EXL IE
Cause : 00800018
PrId : 00029029 (Broadcom BCM3302)
Modules linked in: snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm evdev snd_timer snd_page_alloc snd_usb_lib snd_rawmidi usbhid snd_seq_device snd_hwdep hid snd input_core soundcore ipv6 arc4 ecb cryptomgr aead pcompress crypto_blkcipher crypto_hash crypto_algapi b43 mac80211 cfg80211
Process mpd (pid: 1310, threadinfo=81ee6000, task=81d92838, tls=00000000)
Stack : 00000000 800afd4c 81ee56f8 00000219 00000000 00000000 00000000 00000000
ffffffff 3b043616 81ee7f00 7f9e6be8 00000000 00000000 00000000 800b1370
7f9e6ca0 81ee5680 000182fc 00493a83 81ee7f00 8009f6b0 00000219 03b1daad
00000000 00002710 00000000 00000000 7f9f6ca0 7f9e6ca0 7f9ec528 00000127
00000000 800031f0 00000000 2ae49060 7f9f75a8 2ae49060 7f9e6be8 7f9f5bd8
...
Call Trace:
[<80004dd8>] __copy_user+0xd4/0x2bc
Code: 8ca80000 24a50004 24c6fffc <ac880000> 1706fffb 24840004 10c00040 00864821 240a0020
Disabling lock debugging due to kernel taint
Instruction bus error, epc == 80096fa0, ra == 80000018
Oops[#2]:
Cpu 0
$ 0 : 00000000 1000d000 c0156064 00000064
$ 4 : 00000032 803b5514 00000032 81fa8000
$ 8 : 8037f840 00080000 81040000 00000003
$12 : 00000010 8037f840 00000004 00000000
$16 : 00000032 81fa8d54 2ab55000 00398f45
$20 : 2ab56000 0064d613 00000000 00000000
$24 : 00000000 80018ff0
$28 : 81ee6000 81ee7c58 00000000 80000018
Hi : 00000000
Lo : 00000000
epc : 80096fa0 swap_info_get+0x74/0xfc
Tainted: G D
ra : 80000018 0x80000018
Status: 1000d003 KERNEL EXL IE
Cause : 00800018
PrId : 00029029 (Broadcom BCM3302)
Modules linked in: snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm evdev snd_timer snd_page_alloc snd_usb_lib snd_rawmidi usbhid snd_seq_device snd_hwdep hid snd input_core soundcore ipv6 arc4 ecb cryptomgr aead pcompress crypto_blkcipher crypto_hash crypto_algapi b43 mac80211 cfg80211
Process mpd (pid: 1310, threadinfo=81ee6000, task=81d92838, tls=00000000)
Stack : ffffffff 800736a0 00000004 81d6e870 80350fd0 80099498 81d929cc 81036e40
81fa8bfc 2aaff000 00064000 81fa8d54 2ab55000 80088a30 8033dab8 80029d78
803402d4 00000006 2ab55fff 8033d9c0 81eaae60 81f0c2a8 81f0c2a8 2ab56000
00000000 00000001 80c4c0bc 81eaae60 8037f840 81eaae94 81d92838 00000000
00000001 7f9eb320 7f9f6c98 8008db1c 81ee7d00 80c4ce90 00000000 ffffffff
...
Call Trace:
[<80096fa0>] swap_info_get+0x74/0xfc
[<80099498>] free_swap_and_cache+0x1c/0x218
[<80088a30>] unmap_vmas+0x418/0x63c
[<8008db1c>] exit_mmap+0xb8/0x148
[<8002e3c4>] mmput+0xc0/0x1d8
[<800333e8>] exit_mm+0x260/0x298
[<800357cc>] do_exit+0x1cc/0x688
[<80014658>] nmi_exception_handler+0x0/0x34
Code: 00041840 8ca20020 00431021 <94440000> 1480001d 8fbf0014 3c048030 3c05802c 24a5f280
Fixing recursive fault but reboot is needed!
Instruction bus error, epc == 80004dd8, ra == 80000018
Oops[#3]:
Cpu 0
$ 0 : 00000000 1000d000 00000000 00000000
$ 4 : 7fcd81b0 81c0ddd0 00000000 1000d001
$ 8 : 00000000 00000000 00000000 806f8000
$12 : 4b688a84 7f9f7f18 81d6e868 00000000
$16 : 00000004 00000000 81c0ddc0 81c0ddc0
$20 : 7fcd81b0 00000000 81c0ddcc 00000000
$24 : 00000000 2b565ed0
$28 : 81c0c000 81c0dd98 00000001 80000018
Hi : 0000007d
Lo : eb254400
epc : 80004dd8 __copy_user+0xd4/0x2bc
Tainted: G D
ra : 80000018 0x80000018
Status: 1000d003 KERNEL EXL IE
Cause : 00800018
PrId : 00029029 (Broadcom BCM3302)
Modules linked in: snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm evdev snd_timer snd_page_alloc snd_usb_lib snd_rawmidi usbhid snd_seq_device snd_hwdep hid snd input_core soundcore ipv6 arc4 ecb cryptomgr aead pcompress crypto_blkcipher crypto_hash crypto_algapi b43 mac80211 cfg80211
Process init (pid: 1, threadinfo=81c0c000, task=81c08480, tls=00000000)
Stack : 00000000 81c0dda8 81c0df00 80350fb0 81c0ddc0 81c0ddc4 81c0ddc8 81c0ddcc
81c0ddd0 81c0ddd4 00000400 00000000 00000000 00000000 00000000 00000000
00000000 00000000 81d98000 ffffff9c 81c0dea8 0044a234 7fcd8598 8009b864
00000001 81c0dea8 00000001 81d98000 ffffff9c 800a2d18 00000003 00000002
00000003 00000003 0000000d 00000000 00000000 00000000 000000c9 00001180
...
Call Trace:
[<80004dd8>] __copy_user+0xd4/0x2bc
Code: 8ca80000 24a50004 24c6fffc <ac880000> 1706fffb 24840004 10c00040 00864821 240a0020
kobject: 'ep_01' (81e52f10): kobject_uevent_env
kobject: 'ep_01' (81e52f10): kobject_uevent_env: filter function caused the event to drop!
Kernel panic - not syncing: Attempted to kill init!
Hi,
On Tue, Feb 02, 2010 at 05:20:11PM +0000, Catalin Marinas wrote:
> My issues is with both I-D coherency and D-cache aliasing caused by
> pages mapped in both user and kernel space (with different colours). The
> flush_dcache_page() call should target both cases.
Yup, it does, and quite successfully at that (aka "at that point in time we
having nothing any more to worry about, everything dealt with" ;-)
usbcore: registered new interface driver ums-datafab
hub 2-1:1.0: state 7 ports 2 chg 0002 evt 0000
kobject: 'ums-freecom' (81de0a80): kobject_add_internal: parent: 'drivers', set: 'drivers'
hub 2-1:1.0: port 1, status 0101, change 0000, 12 Mb/s
kobject: 'ums-freecom' (81de0a80): kobject_uevent_env
kobject: 'ums-freecom' (81de0a80): fill_kobj_path: path = '/bus/usb/drivers/ums-freecom'
usbcore: registered new interface driver ums-freecom
kobject: 'ums-jumpshot' (81de0c80): kobject_add_internal: parent: 'drivers', set: 'drivers'
CPU 0 Unable to handle kernel paging request at virtual address 0000041c, epc == 800171e8, ra == 801da5dc
Oops[#1]:
Cpu 0
$ 0 : 00000000 10008000 803b0000 00010000
$ 4 : 00000408 8143bc60 0043bc60 00000001
$ 8 : 81dd7124 81dd7190 00000004 00000000
$12 : 0000003b 80380000 00000002 f2d9b780
$16 : a1de4020 803b0000 8037f840 81de7f00
$20 : 00000000 81dd7080 80000000 00000000
$24 : 00000000 80016bb8
$28 : 81c0c000 81c0da98 a1dd414c 801da5dc
Hi : 00000008
Lo : 00000000
epc : 800171e8 __flush_dcache_page+0x38/0x120
Not tainted
ra : 801da5dc ehci_urb_done+0x178/0x1dc
Status: 10008002 KERNEL EXL
Cause : 00805008
BadVA : 0000041c
PrId : 00029029 (Broadcom BCM3302)
Modules linked in:
Process swapper (pid: 1, threadinfo=81c0c000, task=81c08480, tls=00000000)
Stack : 81dd7080 00000001 10009000 8033dab8 a1dd8120 a1dd4114 ffffff6a ffffff6a
81de7f00 a1dd414c a1dd4100 801db39c 05b8d800 00000000 00000018 803a0000
803a0000 0000054c 00000001 00000000 a1dd8180 81dd7080 00000000 a1dd4100
00000000 81c0dbb8 00000000 80318d24 81dd7158 81dd7080 81dda004 801deb38
81dd7158 8004f984 01f63104 0000003c 81c0dc78 8033feb8 00000008 00000042
...
Call Trace:
[<800171e8>] __flush_dcache_page+0x38/0x120
[<801da5dc>] ehci_urb_done+0x178/0x1dc
[<801db39c>] qh_completions+0x484/0x554
[<801deb38>] ehci_work+0x438/0xb68
[<801df2bc>] ehci_watchdog+0x54/0x94
[<8003d3ec>] run_timer_softirq+0x1b0/0x268
[<80037fbc>] __do_softirq+0xb8/0x174
[<800380d4>] do_softirq+0x5c/0x98
[<80038244>] irq_exit+0x40/0x88
[<8000e12c>] plat_irq_dispatch+0x60/0x178
[<80001444>] ret_from_irq+0x0/0x4
[<80031de8>] vprintk+0x36c/0x3bc
[<8000a48c>] printk+0x24/0x30
[<80151918>] kobject_add_internal+0x124/0x254
[<80151f80>] kobject_init_and_add+0x40/0x58
[<8018e854>] bus_add_driver+0xdc/0x2b4
[<801902c8>] driver_register+0xe0/0x19c
[<801ce000>] usb_register_driver+0x84/0x118
[<8000d640>] do_one_initcall+0x70/0x1f4
[<80354334>] kernel_init+0xd0/0x140
[<8000fb4c>] kernel_thread_helper+0x10/0x18
Code: 00000000 10800029 3c02803b <8c820014> 14400026 3c02803b 8c83001c 2482001c 14620021
Disabling lock debugging due to kernel taint
Kernel panic - not syncing: Fatal exception in interrupt
Any ideas? To my uncaring mind this would look like __flush_dcache_page()
not being quite so happy with a NULL pointer that it is being served
(although I haven't managed to precisely investigate yet where the
dereferencing offset 0000041c is coming from).
Yes, crash is reproducible (three times on boot already, although some bootup
does make it successfully).
My ehci-q.c has:
if (usb_pipein(urb->pipe) && usb_pipetype(urb->pipe) != PIPE_CONTROL) {
void *ptr;
for (ptr = urb->transfer_buffer;
ptr < urb->transfer_buffer + urb->transfer_buffer_length;
ptr += PAGE_SIZE)
flush_dcache_page(virt_to_page(ptr));
}
Hmm, OTOH this code seems to postulate that urb->transfer_buffer_length
is that 0x41c from above...
(IOW the code is simply missing an urb->transfer_buffer NULL check)
OTOH there would also be the question whether flush_dcache_page() should
have caught the NULL pointer input...
And then there's the question whether urb->transfer_buffer is allowed to end
up as NULL anyway...
BTW, trying to keep open /dev/dsp by another app when closing the playback app
does not prevent the audio OOPS.
Been seeing a nano-tiny wee bit too many crashes these days,
Andreas Mohr
On Tue, 2 Feb 2010, Andreas Mohr wrote:
> Any ideas? To my uncaring mind this would look like __flush_dcache_page()
> not being quite so happy with a NULL pointer that it is being served
> (although I haven't managed to precisely investigate yet where the
> dereferencing offset 0000041c is coming from).
>
> Yes, crash is reproducible (three times on boot already, although some bootup
> does make it successfully).
>
> My ehci-q.c has:
>
> if (usb_pipein(urb->pipe) && usb_pipetype(urb->pipe) != PIPE_CONTROL) {
> void *ptr;
> for (ptr = urb->transfer_buffer;
> ptr < urb->transfer_buffer + urb->transfer_buffer_length;
> ptr += PAGE_SIZE)
> flush_dcache_page(virt_to_page(ptr));
> }
>
> Hmm, OTOH this code seems to postulate that urb->transfer_buffer_length
> is that 0x41c from above...
> (IOW the code is simply missing an urb->transfer_buffer NULL check)
> OTOH there would also be the question whether flush_dcache_page() should
> have caught the NULL pointer input...
> And then there's the question whether urb->transfer_buffer is allowed to end
> up as NULL anyway...
Have you looked at the code in qh_urb_transaction() in ehci-q.c
involving this_sg_len and buf? It's quite possible that
urb->transfer_buffer is a NULL pointer and that the actual buffer is
not a contiguous set of pages -- but only if DMA is used.
Alan Stern
> Apart from that, flush_dcache_page() doesn't have any data flow
> information. Optimisations could be done on ARM if we know that the
> kernel only intends to read from a page (no flushing necessary with a
> non-aliasing D-cache).
Already done in flush_dcache_page(). If possible (uniprocessor), it just
flags the page as PG_dcache_dirty, and defers the actual flush operation
until it's mapped somewhere else (either a virtual alias or executable).
See Documentation/cachetlb.txt. (Really, all PIO drivers should
be calling flush_dcache_page.)
On Wed, Feb 03, 2010 at 06:56:44PM -0500, George Spelvin wrote:
> > Apart from that, flush_dcache_page() doesn't have any data flow
> > information. Optimisations could be done on ARM if we know that the
> > kernel only intends to read from a page (no flushing necessary with a
> > non-aliasing D-cache).
>
> Already done in flush_dcache_page(). If possible (uniprocessor), it just
> flags the page as PG_dcache_dirty, and defers the actual flush operation
> until it's mapped somewhere else (either a virtual alias or executable).
>
Try reading the thread again, as you seem to have missed the point
completely. The issue isn't with lazy dcache writeback, the issue is that
flush_dcache_page() is a bit of a sledgehammer for cases when directional
information is available. The DMA mapping operations conversely are aware
of data flow and optimize accordingly.
Additionally, with something like a flush_dcache_range() it's possible
to optimize for large ranges as opposed to page-at-a-time looping for
anything that needs to flag PG_dcache_dirty on a bulk group of pages.
Hi!
> > So, let's put this in the HCD drivers and be done with it.
>
> The patch below is what fixes the I-D cache incoherency issues on ARM. I
> don't particularly like the solution but it seems to be the only one
> available.
Really? It looks like arm should just flush the caches when mapping
executable page to the userspace.... you can't expect all the drivers
to be modified like that...
Plus it does unneccessary flushes on x86, etc...
> @@ -904,6 +906,14 @@ __acquires(priv->lock)
> status = 0;
> }
>
> + if (usb_pipein(urb->pipe) && usb_pipetype(urb->pipe) == PIPE_BULK) {
> + void *ptr;
> + for (ptr = urb->transfer_buffer;
> + ptr < urb->transfer_buffer + urb->transfer_buffer_length;
> + ptr += PAGE_SIZE)
> + flush_dcache_page(virt_to_page(ptr));
> + }
> +
> /* complete() can reenter this HCD */
> usb_hcd_unlink_urb_from_ep(priv_to_hcd(priv), urb);
> spin_unlock(&priv->lock);
>
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Tue 2010-02-02 12:11:25, Alan Stern wrote:
> On Tue, 2 Feb 2010, Oliver Neukum wrote:
>
> > Am Dienstag, 2. Februar 2010 13:39:35 schrieb Catalin Marinas:
> > > > For storage that is correct. But what about other sources of pages,
> > > > for example iSCSI?
> > >
> > > In the iSCSI case, does the HCD driver write directly to a page cache
> > > page? Or it just fills in network packets that are copied to page cache
> > > pages by the iSCSI code (sorry, I'm not familiar with this part of the
> > > kernel). If the latter, the cache flushing in the HCD driver would not
> > > help and it needs to be done in the iSCSI code.
> >
> > As far as I can tell iSCSI does a private copy. But I don't know how
> > many methods to transfer code pages over USB exist. I'd say the
> > conservative solution is to flush for everything but control transfers.
>
> This doesn't make any sense. Nobody would ever use isochronous
> transfers to store data into a code page because isochronous is
> unreliable. (Audio isn't a counterexample -- audio data may be
Why not?
Use isochronous transfer to load data, verify it is okay, exec it.
Or maybe someone is doing crashme testing with usb audio as random
generator :-).
Sure, unlikely, but...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Hi,
On Mon, Feb 08, 2010 at 07:55:19AM +0100, Pavel Machek wrote:
> Plus it does unneccessary flushes on x86, etc...
Noticed that as well, there should be an arch-obeying helper for this.
On my MIPSEL, I had urb->transfer_buffer NULL ptr crashes
(I think that was expected in case of a certain DMA setup, Alan said).
However, even with NULL check added I still had:
hub 2-1.1:1.0: state 7 ports 7 chg 0000 evt 0010
Unhandled kernel unaligned access[#1]:
Cpu 0
$ 0 : 00000000 fffffffd 803b0000 00010000
$ 4 : 08002042 8143bfe0 0043bfe0 0000000d
$ 8 : 00000001 3b9aca00 c4653600 00000000
$12 : 00000049 3b9aca00 81dbc868 00000000
$16 : a1e00000 803b0000 8037f840 81dfaa80
$20 : 00000000 81dd5080 80000000 00000000
$24 : 00000000 80015a64
$28 : 8033a000 8033bc10 a1dd83cc 801da5e4
Hi : 00000000
Lo : 00000000
epc : 800171e8 __flush_dcache_page+0x38/0x120
Not tainted
ra : 801da5e4 ehci_urb_done+0x180/0x1e4
Status: 10009002 KERNEL EXL
Cause : 00800010
BadVA : 08002056
PrId : 00029029 (Broadcom BCM3302)
Modules linked in:
Process swapper (pid: 0, threadinfo=8033a000, task=8033c000, tls=00000000)
Stack : 00000000 00000000 81e04980 801c80ac a1dd9060 a1dd8394 ffffff6a ffffff6a
81dfaa80 a1dd83cc a1dd8380 801db3a4 803a6a28 80068e9c 000003f8 00003fc0
a1dd81cc 801dea58 00000001 00000000 a1dd9360 81dd5080 a1dd8380 10009001
a1dd83cc 81dd5158 00000000 80318d44 81dd5158 00000001 00010031 801de8f4
81dd5158 8033bce0 803a76a0 803a0000 8033d860 8004f924 00000219 00000043
...
Call Trace:
[<800171e8>] __flush_dcache_page+0x38/0x120
[<801da5e4>] ehci_urb_done+0x180/0x1e4
[<801db3a4>] qh_completions+0x484/0x554
[<801de8f4>] ehci_work+0x1ec/0xb68
[<801e2598>] ehci_irq+0x360/0x3a4
[<801c7cf8>] usb_hcd_irq+0x64/0x15c
[<80066d58>] handle_IRQ_event+0x90/0x280
[<80068e80>] handle_percpu_irq+0x48/0x9c
[<8000e228>] plat_irq_dispatch+0x15c/0x178
[<80001444>] ret_from_irq+0x0/0x4
[<80001680>] r4k_wait+0x20/0x40
[<8000fe34>] cpu_idle+0x30/0x60
[<80354a34>] start_kernel+0x338/0x350
Code: 00000000 10800029 3c02803b <8c820014> 14400026 3c02803b 8c83001c 2482001c 14620021
Disabling lock debugging due to kernel taint
Kernel panic - not syncing: Fatal exception in interrupt
Seems like BadVA : 08002056 really isn't as aligned (offset 0x6) as it should be.
I've given up on this now BTW, I'll wait until the dust has settled (i.e. some nice improvements
have found their way to the kernel) and retry in some months with a much newer kernel version
(currently patched-up 2.6.31.9) whether something remains to be fixed.
I'll work on more productive things such as submitting some waiting patches.
Andreas Mohr
Hi,
On Mon, 2010-02-08 at 06:55 +0000, Pavel Machek wrote:
> > > So, let's put this in the HCD drivers and be done with it.
> >
> > The patch below is what fixes the I-D cache incoherency issues on ARM. I
> > don't particularly like the solution but it seems to be the only one
> > available.
>
> Really? It looks like arm should just flush the caches when mapping
> executable page to the userspace.... you can't expect all the drivers
> to be modified like that...
We could of course flush the caches every time we get a page fault but
that's far from optimal, especially since DMA-capable drivers to do not
pollute the D-cache and don't need this extra flushing. Note that the
recent ARM processors have PIPT caches but separate for I and D and it's
the PIO drivers that pollute the D-cache.
The kernel API provides flush_dcache_page() to be called every time the
kernel writes to a page cache page. This is further optimised for
working in pair with update_mmu_cache() to delay the flushing until the
actual page is mapped into user space and this latter function is called
(which in general is not a cache maintenance function).
The problem with some PIO drivers and a filesystems like ext2 is that
there is no call to flush_dcache_page() when getting data into a page
cache page. Since the page isn't marked as dirty (PG_arch_1), a
subsequent call to update_mmu_cache() as a result of a page fault
doesn't flush the caches.
There is a flush_icache_page() function called from __do_fault(),
however, Documentation/cachetlb.txt states that all the functionality of
this function can be implemented in flush_dcache_page() and
update_mmu_cache(), hence this function is a no-op.
Please suggest a better solution that does not involve modifying generic
Linux code.
> Plus it does unneccessary flushes on x86, etc...
On x86, it should indeed be conditionally compiled based on
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE.
Regards.
--
Catalin
On 02/08/10 10:51, Somebody in the thread at some point said:
> We could of course flush the caches every time we get a page fault but
> that's far from optimal, especially since DMA-capable drivers to do not
> pollute the D-cache and don't need this extra flushing. Note that the
> recent ARM processors have PIPT caches but separate for I and D and it's
> the PIO drivers that pollute the D-cache.
Just noting that AFAIK iMX31 USB and MMC drivers both are PIO at the
moment, for lack of any platform DMA support of its unusual DMA engine.
-Andy
On Mon, 2010-02-08 at 07:33 +0000, Andreas Mohr wrote:
> On Mon, Feb 08, 2010 at 07:55:19AM +0100, Pavel Machek wrote:
> > Plus it does unneccessary flushes on x86, etc...
>
> Noticed that as well, there should be an arch-obeying helper for this.
>
>
> On my MIPSEL, I had urb->transfer_buffer NULL ptr crashes
> (I think that was expected in case of a certain DMA setup, Alan said).
>
> However, even with NULL check added I still had:
>
> hub 2-1.1:1.0: state 7 ports 7 chg 0000 evt 0010
> Unhandled kernel unaligned access[#1]:
Just to avoid confusion - that's a similar patch applied to a different
driver. The ISP1760 HCD driver works fine with my patch (transfer_buffer
never seems to be NULL with latest mainline). I can't comment on the
ehci-q.c driver (it looks like it has some support for DMA while my
patch only applies to PIO drivers where transfer_buffer should be set).
--
Catalin
> Hi,
>
> On Mon, 2010-02-08 at 06:55 +0000, Pavel Machek wrote:
> > > > So, let's put this in the HCD drivers and be done with it.
> > >
> > > The patch below is what fixes the I-D cache incoherency issues on ARM. I
> > > don't particularly like the solution but it seems to be the only one
> > > available.
> >
> > Really? It looks like arm should just flush the caches when mapping
> > executable page to the userspace.... you can't expect all the drivers
> > to be modified like that...
>
> We could of course flush the caches every time we get a page fault but
> that's far from optimal, especially since DMA-capable drivers to do
> not
Maybe far for optimal, but it is something that should be done,
_first_. Correctness is more important than performance, and you can't
expect all drivers to behave like you want them.
Then you can add optimalizations not to do the flushes on drivers you
audited and where you care...
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Mon, 2010-02-08 at 10:52 +0000, Pavel Machek wrote:
> > On Mon, 2010-02-08 at 06:55 +0000, Pavel Machek wrote:
> > > > > So, let's put this in the HCD drivers and be done with it.
> > > >
> > > > The patch below is what fixes the I-D cache incoherency issues on ARM. I
> > > > don't particularly like the solution but it seems to be the only one
> > > > available.
> > >
> > > Really? It looks like arm should just flush the caches when mapping
> > > executable page to the userspace.... you can't expect all the drivers
> > > to be modified like that...
> >
> > We could of course flush the caches every time we get a page fault but
> > that's far from optimal, especially since DMA-capable drivers to do
> > not
>
> Maybe far for optimal, but it is something that should be done,
> _first_. Correctness is more important than performance, and you can't
> expect all drivers to behave like you want them.
I wouldn't call heavy cache flushing "correctness". We could as well
disable the caches so that it is fully coherent.
The arch code follows an API defined in cachetlb.txt but the PIO drivers
don't (some do, like mmci.c). It may be inconvenient to call
flush_dcache_page() in the driver, hence I started a discussion on
linux-arch on a PIO mapping API that x86 or other fully coherent
architectures can leave it as no-ops.
> Then you can add optimalizations not to do the flushes on drivers you
> audited and where you care...
Sorry but that's not really feasible (unless I don't fully understand
what you mean) - if we do the cache flushing on the fault handling path
in the arch code, there is no way for the arch code to know what driver
is doing, unless we make this conditionally compiled with something like
CONFIG_ARCH_NEEDS_HEAVY_FLUSHING.
--
Catalin
> -----Original Message-----
> From: [email protected] [mailto:linux-arm-kernel-
> [email protected]] On Behalf Of Catalin Marinas
> Sent: Monday, February 08, 2010 4:58 PM
> To: Pavel Machek
> Cc: Matthew Dharm; Sergei Shtylyov; Ming Lei; Sebastian Siewior; [email protected]; linux-
> kernel; Greg KH; linux-arm-kernel
> Subject: Re: USB mass storage and ARM cache coherency
>
> On Mon, 2010-02-08 at 10:52 +0000, Pavel Machek wrote:
> > > On Mon, 2010-02-08 at 06:55 +0000, Pavel Machek wrote:
> > > > > > So, let's put this in the HCD drivers and be done with it.
> > > > >
> > > > > The patch below is what fixes the I-D cache incoherency issues on ARM. I
> > > > > don't particularly like the solution but it seems to be the only one
> > > > > available.
> > > >
> > > > Really? It looks like arm should just flush the caches when mapping
> > > > executable page to the userspace.... you can't expect all the drivers
> > > > to be modified like that...
> > >
> > > We could of course flush the caches every time we get a page fault but
> > > that's far from optimal, especially since DMA-capable drivers to do
> > > not
> >
> > Maybe far for optimal, but it is something that should be done,
> > _first_. Correctness is more important than performance, and you can't
> > expect all drivers to behave like you want them.
>
> I wouldn't call heavy cache flushing "correctness". We could as well
> disable the caches so that it is fully coherent.
>
> The arch code follows an API defined in cachetlb.txt but the PIO drivers
> don't (some do, like mmci.c). It may be inconvenient to call
> flush_dcache_page() in the driver, hence I started a discussion on
> linux-arch on a PIO mapping API that x86 or other fully coherent
> architectures can leave it as no-ops.
>
> > Then you can add optimalizations not to do the flushes on drivers you
> > audited and where you care...
>
> Sorry but that's not really feasible (unless I don't fully understand
> what you mean) - if we do the cache flushing on the fault handling path
> in the arch code, there is no way for the arch code to know what driver
> is doing, unless we make this conditionally compiled with something like
> CONFIG_ARCH_NEEDS_HEAVY_FLUSHING.
Continuing on the USB issue w.r.t cache coherency, the usb host
code is violating the buffer ownership rules of streaming APIs from
dma and non-dma transfers point if view.
We have a below temporary patch to get around the issue and probably it
needs to be fixed in the right way in the stack because some controllers
may not have PIO option even for control transfers. (e.g. Synopsis EHCI
controller)
From: Maulik Mankad <[email protected]>
USB: Avoid DMA map/unmap of control transfer buffers.
This patch avoids the DMA mapping of buffers for control
transfers.
Signed-off-by: Maulik Mankad <[email protected]>
---
Index: omap4_integration/drivers/usb/core/hcd.c
===================================================================
--- omap4_integration.orig/drivers/usb/core/hcd.c
+++ omap4_integration/drivers/usb/core/hcd.c
@@ -1274,6 +1274,10 @@ static int map_urb_for_dma(struct usb_hc
if (is_root_hub(urb->dev))
return 0;
+ if (usb_endpoint_xfer_control(&urb->ep->desc))
+ urb->transfer_flags = URB_NO_SETUP_DMA_MAP |
+ URB_NO_TRANSFER_DMA_MAP;
+
if (usb_endpoint_xfer_control(&urb->ep->desc)
&& !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP)) {
if (hcd->self.uses_dma) {
Am Dienstag, 16. Februar 2010 08:57:53 schrieb Shilimkar, Santosh:
> Continuing on the USB issue w.r.t cache coherency, the usb host
> code is violating the buffer ownership rules of streaming APIs from
> dma and non-dma transfers point if view.
>
> We have a below temporary patch to get around the issue and probably it
> needs to be fixed in the right way in the stack because some controllers
> may not have PIO option even for control transfers. (e.g. Synopsis EHCI
> controller)
This seems wrong to me. Buffers for control transfers may be transfered
by DMA, so the caches must be flushed on architectures whose caches
are not coherent with respect to DMA.
Would you care to elaborate on the exact nature of the bug you are fixing?
Regards
Oliver
On Tue, Feb 16, 2010 at 01:27:53PM +0530, Shilimkar, Santosh wrote:
> Continuing on the USB issue w.r.t cache coherency, the usb host
> code is violating the buffer ownership rules of streaming APIs from
> dma and non-dma transfers point if view.
>
> We have a below temporary patch to get around the issue and probably it
> needs to be fixed in the right way in the stack because some controllers
> may not have PIO option even for control transfers. (e.g. Synopsis EHCI
> controller)
if (usb_endpoint_xfer_control(&urb->ep->desc)
&& !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP)) {
if (hcd->self.uses_dma) { <=================
urb->setup_dma = dma_map_single(
hcd->self.controller,
urb->setup_packet,
sizeof(struct usb_ctrlrequest),
DMA_TO_DEVICE);
struct usb_hcd *usb_create_hcd (const struct hc_driver *driver,
struct device *dev, const char *bus_name)
{
...
hcd->self.uses_dma = (dev->dma_mask != NULL);
Is it easier to make sure that PIO devices don't have dev->dma_mask set?
Russell King - ARM Linux wrote:
> On Tue, Feb 16, 2010 at 01:27:53PM +0530, Shilimkar, Santosh wrote:
> > Continuing on the USB issue w.r.t cache coherency, the usb host
> > code is violating the buffer ownership rules of streaming APIs from
> > dma and non-dma transfers point if view.
> >
> > We have a below temporary patch to get around the issue and probably it
> > needs to be fixed in the right way in the stack because some controllers
> > may not have PIO option even for control transfers. (e.g. Synopsis EHCI
> > controller)
>
> if (usb_endpoint_xfer_control(&urb->ep->desc)
> && !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP)) {
> if (hcd->self.uses_dma) { <=================
> urb->setup_dma = dma_map_single(
> hcd->self.controller,
> urb->setup_packet,
> sizeof(struct usb_ctrlrequest),
> DMA_TO_DEVICE);
>
> struct usb_hcd *usb_create_hcd (const struct hc_driver *driver,
> struct device *dev, const char *bus_name)
> {
> ...
> hcd->self.uses_dma = (dev->dma_mask != NULL);
>
> Is it easier to make sure that PIO devices don't have dev->dma_mask set?
Not really. For instance, in the case of the DMA engine in the MUSB
controller in OMAP3, we can only use DMA with endpoints other than
EP0, and EP0 is what is used for control transfers.
It's not PIO for all the endpoints or DMA for all of them.
- Anand
> -----Original Message-----
> From: Oliver Neukum [mailto:[email protected]]
> Sent: Tuesday, February 16, 2010 1:53 PM
> To: Shilimkar, Santosh
> Cc: Catalin Marinas; Pavel Machek; Greg KH; Russell King - ARM Linux; Matthew Dharm; Sergei Shtylyov;
> Ming Lei; Sebastian Siewior; [email protected]; linux-kernel; linux-arm-kernel; Mankad,
> Maulik Ojas
> Subject: Re: USB mass storage and ARM cache coherency
>
> Am Dienstag, 16. Februar 2010 08:57:53 schrieb Shilimkar, Santosh:
> > Continuing on the USB issue w.r.t cache coherency, the usb host
> > code is violating the buffer ownership rules of streaming APIs from
> > dma and non-dma transfers point if view.
> >
> > We have a below temporary patch to get around the issue and probably it
> > needs to be fixed in the right way in the stack because some controllers
> > may not have PIO option even for control transfers. (e.g. Synopsis EHCI
> > controller)
>
> This seems wrong to me. Buffers for control transfers may be transfered
> by DMA, so the caches must be flushed on architectures whose caches
> are not coherent with respect to DMA.
Indeed and that's what I mentioned in the comment. But we shouldn't have dma
cache maintenance operations done for the buffers which would use pio based transfer.
> Would you care to elaborate on the exact nature of the bug you are fixing?
On the OMAP4 (ARM cortex-a9) platform, the enumeration fails because control
transfer buffers are corrupted. On our platform, we use PIO mode for control
transfers and DMA for bulk transfers.
The current stack performs dma cache maintenance even for the PIO transfers
which leads to the corruption issue. The control buffers are handled by CPU
and they already coherent from CPU point of view.
Regards,
Santosh
Am Dienstag, 16. Februar 2010 09:55:55 schrieb Shilimkar, Santosh:
> > This seems wrong to me. Buffers for control transfers may be transfered
> > by DMA, so the caches must be flushed on architectures whose caches
> > are not coherent with respect to DMA.
> Indeed and that's what I mentioned in the comment. But we shouldn't have dma
> cache maintenance operations done for the buffers which would use pio based transfer.
Given that the generic layer can't know which buffers will be used for DMA
that would require a callback into the hcd driver.
> > Would you care to elaborate on the exact nature of the bug you are fixing?
> On the OMAP4 (ARM cortex-a9) platform, the enumeration fails because control
> transfer buffers are corrupted. On our platform, we use PIO mode for control
> transfers and DMA for bulk transfers.
>
> The current stack performs dma cache maintenance even for the PIO transfers
> which leads to the corruption issue. The control buffers are handled by CPU
> and they already coherent from CPU point of view.
How does the mapping corrupt buffers? It might impact performance, but why
do you see corruption?
Regards
Oliver
On Tue, Feb 16, 2010 at 10:07:20AM +0100, Oliver Neukum wrote:
> Am Dienstag, 16. Februar 2010 09:55:55 schrieb Shilimkar, Santosh:
> > > Would you care to elaborate on the exact nature of the bug you are fixing?
> > On the OMAP4 (ARM cortex-a9) platform, the enumeration fails because control
> > transfer buffers are corrupted. On our platform, we use PIO mode for control
> > transfers and DMA for bulk transfers.
> >
> > The current stack performs dma cache maintenance even for the PIO transfers
> > which leads to the corruption issue. The control buffers are handled by CPU
> > and they already coherent from CPU point of view.
>
> How does the mapping corrupt buffers? It might impact performance, but why
> do you see corruption?
On map, buffers are cleaned if they're being used for DMA_TO_DEVICE and
DMA_BIDIRECTIONAL, or invalidated in the case of DMA_FROM_DEVICE.
However, because ARM CPUs can now speculatively prefetch, just leaving it
at that results in corruption of buffers used for DMA. So we have to
invalidate DMA_FROM_DEVICE and DMA_BIDIRECTIONAL buffers on unmap to
ensure coherency with DMA operations.
If the CPU writes to a DMA_FROM_DEVICE buffer between map and unmap, the
writes can sit in the cache, and on unmap, they will be discarded.
Cleaning the cache on unmap is not an option; that too can lead to DMA
buffer corruption in the DMA case.
USB and associated host driver must abide by the DMA API buffer
ownership rules otherwise the result will be data corruption; either
that or USB/host driver people need to have a discussion with the
DMA API authors to remove this sensible "restriction".
Am Dienstag, 16. Februar 2010 10:39:46 schrieb Russell King - ARM Linux:
> However, because ARM CPUs can now speculatively prefetch, just leaving it
> at that results in corruption of buffers used for DMA. So we have to
> invalidate DMA_FROM_DEVICE and DMA_BIDIRECTIONAL buffers on unmap to
> ensure coherency with DMA operations.
>
> If the CPU writes to a DMA_FROM_DEVICE buffer between map and unmap, the
> writes can sit in the cache, and on unmap, they will be discarded.
>
> Cleaning the cache on unmap is not an option; that too can lead to DMA
> buffer corruption in the DMA case.
I am afraid for these controllers the controller driver must be responsible
for all DMA and cache issues. Indicating the exact requirements to the
upper layer would be a battle already lost.
so the safe choice is not to set has_dma and the generic layer will leave
the issue to the lower level.
Regards
Oliver
> -----Original Message-----
> From: Oliver Neukum [mailto:[email protected]]
> Sent: Tuesday, February 16, 2010 7:03 PM
> To: Russell King - ARM Linux
> Cc: Shilimkar, Santosh; Catalin Marinas; Pavel Machek; Greg KH; Matthew Dharm; Sergei Shtylyov; Ming
> Lei; Sebastian Siewior; [email protected]; linux-kernel; linux-arm-kernel; Mankad, Maulik
> Ojas
> Subject: Re: USB mass storage and ARM cache coherency
>
> Am Dienstag, 16. Februar 2010 10:39:46 schrieb Russell King - ARM Linux:
> > However, because ARM CPUs can now speculatively prefetch, just leaving it
> > at that results in corruption of buffers used for DMA. So we have to
> > invalidate DMA_FROM_DEVICE and DMA_BIDIRECTIONAL buffers on unmap to
> > ensure coherency with DMA operations.
> >
> > If the CPU writes to a DMA_FROM_DEVICE buffer between map and unmap, the
> > writes can sit in the cache, and on unmap, they will be discarded.
> >
> > Cleaning the cache on unmap is not an option; that too can lead to DMA
> > buffer corruption in the DMA case.
>
> I am afraid for these controllers the controller driver must be responsible
> for all DMA and cache issues. Indicating the exact requirements to the
> upper layer would be a battle already lost.
> so the safe choice is not to set has_dma and the generic layer will leave
> the issue to the lower level.
This means don't use dma at all which will almost kill the performance.
Regards,
Santosh
Am Dienstag, 16. Februar 2010 14:40:45 schrieb Shilimkar, Santosh:
> > > If the CPU writes to a DMA_FROM_DEVICE buffer between map and unmap, the
> > > writes can sit in the cache, and on unmap, they will be discarded.
> > >
> > > Cleaning the cache on unmap is not an option; that too can lead to DMA
> > > buffer corruption in the DMA case.
> >
> > I am afraid for these controllers the controller driver must be responsible
> > for all DMA and cache issues. Indicating the exact requirements to the
> > upper layer would be a battle already lost.
> > so the safe choice is not to set has_dma and the generic layer will leave
> > the issue to the lower level.
> This means don't use dma at all which will almost kill the performance.
Why would you be unable to map a buffer in the hcd driver when you know
that you'll use DMA?
Regards
Oliver
> -----Original Message-----
> From: Oliver Neukum [mailto:[email protected]]
> Sent: Tuesday, February 16, 2010 7:17 PM
> To: Shilimkar, Santosh
> Cc: Russell King - ARM Linux; Catalin Marinas; Pavel Machek; Greg KH; Matthew Dharm; Sergei Shtylyov;
> Ming Lei; Sebastian Siewior; [email protected]; linux-kernel; linux-arm-kernel; Mankad,
> Maulik Ojas
> Subject: Re: USB mass storage and ARM cache coherency
>
> Am Dienstag, 16. Februar 2010 14:40:45 schrieb Shilimkar, Santosh:
> > > > If the CPU writes to a DMA_FROM_DEVICE buffer between map and unmap, the
> > > > writes can sit in the cache, and on unmap, they will be discarded.
> > > >
> > > > Cleaning the cache on unmap is not an option; that too can lead to DMA
> > > > buffer corruption in the DMA case.
> > >
> > > I am afraid for these controllers the controller driver must be responsible
> > > for all DMA and cache issues. Indicating the exact requirements to the
> > > upper layer would be a battle already lost.
> > > so the safe choice is not to set has_dma and the generic layer will leave
> > > the issue to the lower level.
> > This means don't use dma at all which will almost kill the performance.
>
> Why would you be unable to map a buffer in the hcd driver when you know
> that you'll use DMA?
Probably it can be. The USB stack has the dma maintenance code at common
place for all controllers and hence we were just trying to see if there is
way to handle that way.
We shall check this possibility
Regards,
Santosh
Am Dienstag, 16. Februar 2010 15:12:45 schrieb Shilimkar, Santosh:
> > > > I am afraid for these controllers the controller driver must be responsible
> > > > for all DMA and cache issues. Indicating the exact requirements to the
> > > > upper layer would be a battle already lost.
> > > > so the safe choice is not to set has_dma and the generic layer will leave
> > > > the issue to the lower level.
> > > This means don't use dma at all which will almost kill the performance.
> >
> > Why would you be unable to map a buffer in the hcd driver when you know
> > that you'll use DMA?
> Probably it can be. The USB stack has the dma maintenance code at common
> place for all controllers and hence we were just trying to see if there is
> way to handle that way.
This is true. If you can find a clean way to describe your requirements
to the generic layer, that would be better. The problem is that we must
not end up with a dozen flags.
Your original patch however kills ehci, ohci and uhci on some architectures.
Regards
Oliver
> -----Original Message-----
> From: Oliver Neukum [mailto:[email protected]]
> Sent: Tuesday, February 16, 2010 7:53 PM
> To: Shilimkar, Santosh
> Cc: Russell King - ARM Linux; Catalin Marinas; Pavel Machek; Greg KH; Matthew Dharm; Sergei Shtylyov;
> Ming Lei; Sebastian Siewior; [email protected]; linux-kernel; linux-arm-kernel; Mankad,
> Maulik Ojas
> Subject: Re: USB mass storage and ARM cache coherency
>
> Am Dienstag, 16. Februar 2010 15:12:45 schrieb Shilimkar, Santosh:
> > > > > I am afraid for these controllers the controller driver must be responsible
> > > > > for all DMA and cache issues. Indicating the exact requirements to the
> > > > > upper layer would be a battle already lost.
> > > > > so the safe choice is not to set has_dma and the generic layer will leave
> > > > > the issue to the lower level.
> > > > This means don't use dma at all which will almost kill the performance.
> > >
> > > Why would you be unable to map a buffer in the hcd driver when you know
> > > that you'll use DMA?
> > Probably it can be. The USB stack has the dma maintenance code at common
> > place for all controllers and hence we were just trying to see if there is
> > way to handle that way.
>
> This is true. If you can find a clean way to describe your requirements
> to the generic layer, that would be better. The problem is that we must
> not end up with a dozen flags.
Agree
> Your original patch however kills ehci, ohci and uhci on some architectures.
Well the patch was making _ONLY_ control transfers use PIO and rest of
the transfer would still use dma. So not sure how much performance impact would
be because of that.
Another issue with that patch is there are few controllers which can't do PIO
at all and hence the patch would broke those controllers.
So we need a clean way to handle it as you described.
Regards,
Santosh
On Tue, 16 Feb 2010, Shilimkar, Santosh wrote:
> > Your original patch however kills ehci, ohci and uhci on some architectures.
> Well the patch was making _ONLY_ control transfers use PIO and rest of
> the transfer would still use dma. So not sure how much performance impact would
> be because of that.
> Another issue with that patch is there are few controllers which can't do PIO
> at all and hence the patch would broke those controllers.
More than "a few"! None of the EHCI, OHCI, or UHCI controllers used in
Intel-compatible desktop and laptop systems can do PIO. That's what
Oliver meant.
Alan Stern
2010/2/16 Shilimkar, Santosh <[email protected]>:
>> > We have a below temporary patch to get around the issue and probably it
>> > needs to be fixed in the right way in the stack because some controllers
>> > may not have PIO option even for control transfers. (e.g. Synopsis EHCI
>> > controller)
Your temporary patch only removes dma map and umap for setup buffer in
control transfer.
>>
>> This seems wrong to me. Buffers for control transfers may be transfered
>> by DMA, so the caches must be flushed on architectures whose caches
>> are not coherent with respect to DMA.
> Indeed and that's what I mentioned in the comment. But we shouldn't have dma
> cache maintenance operations done for the buffers which would use pio based transfer.
>> Would you care to elaborate on the exact nature of the bug you are fixing?
> On the OMAP4 (ARM cortex-a9) platform, the enumeration fails because control
> transfer buffers are corrupted. On our platform, we use PIO mode for control
> transfers and DMA for bulk transfers.
I don't know you mean you use PIO mode for seup buffer only or whole control
transfer(setup sent, data in or data out). If you mean do not use DMA
for setup sent, data in or data out in a control transfer, your
temporary patch maybe is not enough, right?
>
> The current stack performs dma cache maintenance even for the PIO transfers
> which leads to the corruption issue. The control buffers are handled by CPU
> and they already coherent from CPU point of view.
--
Lei Ming
> -----Original Message-----
> From: Oliver Neukum [mailto:[email protected]]
> Sent: Tuesday, February 16, 2010 7:53 PM
> To: Shilimkar, Santosh
> Cc: Russell King - ARM Linux; Catalin Marinas; Pavel Machek; Greg KH; Matthew Dharm; Sergei Shtylyov;
> Ming Lei; Sebastian Siewior; [email protected]; linux-kernel; linux-arm-kernel; Mankad,
> Maulik Ojas
> Subject: Re: USB mass storage and ARM cache coherency
>
> Am Dienstag, 16. Februar 2010 15:12:45 schrieb Shilimkar, Santosh:
> > > > > I am afraid for these controllers the controller driver must be responsible
> > > > > for all DMA and cache issues. Indicating the exact requirements to the
> > > > > upper layer would be a battle already lost.
> > > > > so the safe choice is not to set has_dma and the generic layer will leave
> > > > > the issue to the lower level.
> > > > This means don't use dma at all which will almost kill the performance.
> > >
> > > Why would you be unable to map a buffer in the hcd driver when you know
> > > that you'll use DMA?
> > Probably it can be. The USB stack has the dma maintenance code at common
> > place for all controllers and hence we were just trying to see if there is
> > way to handle that way.
>
> This is true. If you can find a clean way to describe your requirements
> to the generic layer, that would be better. The problem is that we must
> not end up with a dozen flags.
>
> Your original patch however kills ehci, ohci and uhci on some architectures.
How about below approach? Controller driver can set
"uses_pio_for_control" if it can't do dma for control transfer.
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 80995ef..e3eae02 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1276,7 +1276,7 @@ static int map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
if (usb_endpoint_xfer_control(&urb->ep->desc)
&& !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP)) {
- if (hcd->self.uses_dma) {
+ if (hcd->self.uses_dma && !hcd->self.uses_pio_for_control) {
urb->setup_dma = dma_map_single(
hcd->self.controller,
urb->setup_packet,
@@ -1335,7 +1335,7 @@ static void unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb)
if (usb_endpoint_xfer_control(&urb->ep->desc)
&& !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP)) {
- if (hcd->self.uses_dma)
+ if (hcd->self.uses_dma && !hcd->self.uses_pio_for_control)
dma_unmap_single(hcd->self.controller, urb->setup_dma,
sizeof(struct usb_ctrlrequest),
DMA_TO_DEVICE);
diff --git a/include/linux/usb.h b/include/linux/usb.h
index d7ace1b..ba5b0a2 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -329,6 +329,9 @@ struct usb_bus {
int busnum; /* Bus number (in order of reg) */
const char *bus_name; /* stable id (PCI slot_name etc) */
u8 uses_dma; /* Does the host controller use DMA? */
+ u8 uses_pio_for_control; /* Does the host controller use PIO
+ * for control tansfers?
+ */
u8 otg_port; /* 0, or number of OTG/HNP port */
unsigned is_b_host:1; /* true during some HNP roleswitches */
unsigned b_hnp_enable:1; /* OTG: did A-Host enable HNP? */
Regards,
Santosh
On Tue, 2010-02-16 at 09:22 +0100, Oliver Neukum wrote:
> This seems wrong to me. Buffers for control transfers may be
> transfered
> by DMA, so the caches must be flushed on architectures whose caches
> are not coherent with respect to DMA.
>
> Would you care to elaborate on the exact nature of the bug you are
> fixing?
I missed part of this thread, so forgive me if I'm a bit off here, but
if the problem is indeed I$/D$ cache coherency vs. PIO transfers, then
this is a long solved issue on other archs such as ppc (and I _think_
sparc).
The way we do it, at least on powerpc which is PIPT, is to keep track on
a per-page basis, whether a given page is clean for execution using
PG_arch1 bit. This bit is cleared when a new page is popped into the
page cache, and we clear it from flush_dcache_page() iirc (you may want
to dbl check I don't have the code at hand right now, or rather, I do
but I'm to lazy to look right now :-)
Any page with that not set is mapped into userspace with execute
permission disabled. We do the flush and set PG_arch1 on the first exec
fault to that page.
Cheers,
Ben.
Am Mittwoch, 17. Februar 2010 09:55:08 schrieb Shilimkar, Santosh:
> > Your original patch however kills ehci, ohci and uhci on some architectures.
>
> How about below approach? Controller driver can set
> "uses_pio_for_control" if it can't do dma for control transfer.
>
> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> index 80995ef..e3eae02 100644
> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c
> @@ -1276,7 +1276,7 @@ static int map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
>
> if (usb_endpoint_xfer_control(&urb->ep->desc)
> && !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP)) {
> - if (hcd->self.uses_dma) {
> + if (hcd->self.uses_dma && !hcd->self.uses_pio_for_control) {
It is not elegant to describe exceptions. It would be better, if you split up
the flag into two flags, called uses_dma_for_ordinary_transfers and
uses_dma_for control_transfers. Doing so also makes sure you look at
all hcd drivers ;-)
And the tests become straightforward. And please add a detailed comment
to explain why this differentiation is needed on ARM.
Regards
Oliver
Am Mittwoch, 17. Februar 2010 10:05:43 schrieb Benjamin Herrenschmidt:
> > Would you care to elaborate on the exact nature of the bug you are
> > fixing?
>
> I missed part of this thread, so forgive me if I'm a bit off here, but
> if the problem is indeed I$/D$ cache coherency vs. PIO transfers, then
> this is a long solved issue on other archs such as ppc (and I think
> sparc).
We should have changed the subject line.
There's a second problem. It turns out that on ARM
mapping for DMA must not be done if PIO will be used. Some HCDs
use PIO for some transfers but DMA for others. The generic layer
must learn about this.
Regards
Oliver
> -----Original Message-----
> From: Oliver Neukum [mailto:[email protected]]
> Sent: Wednesday, February 17, 2010 2:41 PM
> To: Shilimkar, Santosh
> Cc: Russell King - ARM Linux; Catalin Marinas; Pavel Machek; Greg KH; Matthew Dharm; Sergei Shtylyov;
> Ming Lei; Sebastian Siewior; [email protected]; linux-kernel; linux-arm-kernel; Mankad,
> Maulik Ojas; Gadiyar, Anand
> Subject: Re: USB mass storage and ARM cache coherency
>
> Am Mittwoch, 17. Februar 2010 09:55:08 schrieb Shilimkar, Santosh:
> > > Your original patch however kills ehci, ohci and uhci on some architectures.
> >
> > How about below approach? Controller driver can set
> > "uses_pio_for_control" if it can't do dma for control transfer.
> >
> > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> > index 80995ef..e3eae02 100644
> > --- a/drivers/usb/core/hcd.c
> > +++ b/drivers/usb/core/hcd.c
> > @@ -1276,7 +1276,7 @@ static int map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
> >
> > if (usb_endpoint_xfer_control(&urb->ep->desc)
> > && !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP)) {
> > - if (hcd->self.uses_dma) {
> > + if (hcd->self.uses_dma && !hcd->self.uses_pio_for_control) {
>
> It is not elegant to describe exceptions. It would be better, if you split up
> the flag into two flags, called uses_dma_for_ordinary_transfers and
> uses_dma_for control_transfers. Doing so also makes sure you look at
> all hcd drivers ;-)
>
Good point. Negative checks are any way not elegant
> And the tests become straightforward. And please add a detailed comment
> to explain why this differentiation is needed on ARM.
OK. I shall create a patch with description about the problem.
Thanks for feedback!!
Regards,
Santosh
On Wed, 2010-02-17 at 10:15 +0100, Oliver Neukum wrote:
> We should have changed the subject line.
>
> There's a second problem. It turns out that on ARM
> mapping for DMA must not be done if PIO will be used. Some HCDs
> use PIO for some transfers but DMA for others. The generic layer
> must learn about this.
Ah, that makes a lot of sense and the same problem would happen on
any non-DMA coherent architecture, including some embedded ppc's.
I can see why the dma unmap would invalidate the dcache and blow
away the PIO.
What bugs me here is that the dma_map_* operation should always
be done at the lowest level, ie, the actual HCD driver, and thus
it should be up to the HCD to decide whether to dma_map or not
depending on whether it's going to do DMA or not. I haven't
scrutinized USB lately but if that isn't the case and the dma_map_*
operations are done behind your back by the USB core then that needs to
be changed in a way or another, or hooked at least.
Cheers,
Ben.
On Mon, Feb 08, 2010 at 11:03:14AM +0100, Andy Green wrote:
> On 02/08/10 10:51, Somebody in the thread at some point said:
>
>> We could of course flush the caches every time we get a page fault but
>> that's far from optimal, especially since DMA-capable drivers to do not
>> pollute the D-cache and don't need this extra flushing. Note that the
>> recent ARM processors have PIPT caches but separate for I and D and it's
>> the PIO drivers that pollute the D-cache.
>
> Just noting that AFAIK iMX31 USB and MMC drivers both are PIO at the
> moment, for lack of any platform DMA support of its unusual DMA engine.
The EHCI module has its own DMA engine and has nothing to do with the
SDMA engine.
Sascha
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
On 02/17/10 10:50, Somebody in the thread at some point said:
> On Mon, Feb 08, 2010 at 11:03:14AM +0100, Andy Green wrote:
>> On 02/08/10 10:51, Somebody in the thread at some point said:
>>
>>> We could of course flush the caches every time we get a page fault but
>>> that's far from optimal, especially since DMA-capable drivers to do not
>>> pollute the D-cache and don't need this extra flushing. Note that the
>>> recent ARM processors have PIPT caches but separate for I and D and it's
>>> the PIO drivers that pollute the D-cache.
>>
>> Just noting that AFAIK iMX31 USB and MMC drivers both are PIO at the
>> moment, for lack of any platform DMA support of its unusual DMA engine.
>
> The EHCI module has its own DMA engine and has nothing to do with the
> SDMA engine.
You're right, my mistake. iMX31 MMC is PIO due to no SDMA support though.
-Andy
On Wed, Feb 17, 2010 at 08:05:43PM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2010-02-16 at 09:22 +0100, Oliver Neukum wrote:
> > This seems wrong to me. Buffers for control transfers may be
> > transfered
> > by DMA, so the caches must be flushed on architectures whose caches
> > are not coherent with respect to DMA.
> >
> > Would you care to elaborate on the exact nature of the bug you are
> > fixing?
>
> I missed part of this thread, so forgive me if I'm a bit off here, but
> if the problem is indeed I$/D$ cache coherency vs. PIO transfers, then
> this is a long solved issue on other archs such as ppc (and I _think_
> sparc).
Nope. It's to do with mapping a buffer for DMA, and then doing PIO
reads/writes to it.
With speculative prefetches, you have to deal with cache coherency with
hardware DMA on DMA unmap. If you've written to the buffer in violation
of the DMA API buffer ownership rules, then your writes get thrown away
resulting in immediate data corruption.
On Wed, 2010-02-17 at 09:55 +0000, Russell King - ARM Linux wrote:
> Nope. It's to do with mapping a buffer for DMA, and then doing PIO
> reads/writes to it.
>
> With speculative prefetches, you have to deal with cache coherency with
> hardware DMA on DMA unmap. If you've written to the buffer in violation
> of the DMA API buffer ownership rules, then your writes get thrown away
> resulting in immediate data corruption.
Right, and this exact same problem will bite some embedded powerpc
too I suppose :-)
Hrm... actually not :-) We don't do the invalidate at unmap time
today because we know 44x have such a broken prefetcher that we disable
it ... interesting considering that there are machines around that
do non-coherent DMA with 750's style chips who -do- have a prefetcher...
damn, we have a bug :-)
In any case, same problem here.
See my reply to Oliver. Basically, the problem boils down to the
dma_map/unmap being done at the wrong layer. The driver should
simply not do these if it's going to do PIO over that range.
Cheers,
Ben.
Am Mittwoch, 17. Februar 2010 10:40:09 schrieb Benjamin Herrenschmidt:
> What bugs me here is that the dma_map_* operation should always
> be done at the lowest level, ie, the actual HCD driver, and thus
> it should be up to the HCD to decide whether to dma_map or not
> depending on whether it's going to do DMA or not. I haven't
> scrutinized USB lately but if that isn't the case and the dma_map_*
> operations are done behind your back by the USB core then that needs to
> be changed in a way or another, or hooked at least.
No problem here. USB core does the mapping only if the low-level driver
so requests. The only exception is in usb_buffer_alloc(), but that boils
down to dma_alloc_coherent()
Regards
Oliver
On Wed, 2010-02-17 at 11:09 +0100, Oliver Neukum wrote:
>
> No problem here. USB core does the mapping only if the low-level driver
> so requests. The only exception is in usb_buffer_alloc(), but that boils
> down to dma_alloc_coherent()
Allright, so why do we need to "fix" anything ? Or is the whole thread
moot ? :-)
It's pretty clear that between dma_map* and subsequent unmap, the memory
is owned by the device and must not be touched by the CPU. If that is
violated, then we have a driver bug.
Cheers,
Ben.
Am Mittwoch, 17. Februar 2010 11:18:01 schrieb Benjamin Herrenschmidt:
> > No problem here. USB core does the mapping only if the low-level driver
> > so requests. The only exception is in usb_buffer_alloc(), but that boils
> > down to dma_alloc_coherent()
>
> Allright, so why do we need to "fix" anything ? Or is the whole thread
> moot ? :-)
The request a low-level driver does is all or nothing. Either DMA
issues have to be handled by that driver alone, or a finer-grained
description of the DMA requirements is needed. A fix using the latter
approach is being worked on.
Regards
Oliver
On Wed, 2010-02-17 at 11:23 +0100, Oliver Neukum wrote:
>
> The request a low-level driver does is all or nothing. Either DMA
> issues have to be handled by that driver alone, or a finer-grained
> description of the DMA requirements is needed. A fix using the latter
> approach is being worked on.
Well, that's what I'm trying to understand.
IE. It's a pretty strong rule ... don't do CPU accesses between dma_map
and unmap. So it's all in driver land at that stage. I'm not sure how
the DMA requirements get into the picture here. IE. That rule is
globally true. It's not going to hurt just non-coherent archs, it's
going to hurt anybody using swiotlb too... So I don't see you need more
info about the DMA requirements, but maybe I did miss something :-)
Cheers,
Ben.
Russell King - ARM Linux wrote:
> On Tue, Feb 16, 2010 at 10:07:20AM +0100, Oliver Neukum wrote:
> > Am Dienstag, 16. Februar 2010 09:55:55 schrieb Shilimkar, Santosh:
> > > > Would you care to elaborate on the exact nature of the bug you are fixing?
> > > On the OMAP4 (ARM cortex-a9) platform, the enumeration fails because control
> > > transfer buffers are corrupted. On our platform, we use PIO mode for control
> > > transfers and DMA for bulk transfers.
> > >
> > > The current stack performs dma cache maintenance even for the PIO transfers
> > > which leads to the corruption issue. The control buffers are handled by CPU
> > > and they already coherent from CPU point of view.
> >
> > How does the mapping corrupt buffers? It might impact performance, but why
> > do you see corruption?
>
> On map, buffers are cleaned if they're being used for DMA_TO_DEVICE and
> DMA_BIDIRECTIONAL, or invalidated in the case of DMA_FROM_DEVICE.
>
> However, because ARM CPUs can now speculatively prefetch, just leaving it
> at that results in corruption of buffers used for DMA. So we have to
> invalidate DMA_FROM_DEVICE and DMA_BIDIRECTIONAL buffers on unmap to
> ensure coherency with DMA operations.
>
> If the CPU writes to a DMA_FROM_DEVICE buffer between map and unmap, the
> writes can sit in the cache, and on unmap, they will be discarded.
>
> Cleaning the cache on unmap is not an option; that too can lead to DMA
> buffer corruption in the DMA case.
Provided the buffers are cleaned on map for
DMA_TO_DEVICE/DMA_BIDIRECTIONAL, I don't see how cleaning on unmap for
DMA_FROM_DEVICE/DMA_BIDIRECTIONAL can cause corruption. The only way
to get dirty cache lines while mapped is if the CPU did PIO to them.
If it was real DMA, the second clean should be a no-op. (Assume it's
all one or the other).
Can you explain why cleanining the cache on unmap (as well as map, in
DMA_BIDIRECTIONAL case) is not an option? Just curious, because I
don't see what would go wrong.
> USB and associated host driver must abide by the DMA API buffer
> ownership rules otherwise the result will be data corruption; either
> that or USB/host driver people need to have a discussion with the
> DMA API authors to remove this sensible "restriction".
Just in case my question gives the wrong impression, I agree that the
DMA API must be followed. Additional flushes/cleans are not good for
performance either.
-- Jamie
On Wed, 2010-02-17 at 20:05 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2010-02-16 at 09:22 +0100, Oliver Neukum wrote:
> > This seems wrong to me. Buffers for control transfers may be
> > transfered
> > by DMA, so the caches must be flushed on architectures whose caches
> > are not coherent with respect to DMA.
> >
> > Would you care to elaborate on the exact nature of the bug you are
> > fixing?
>
> I missed part of this thread, so forgive me if I'm a bit off here, but
> if the problem is indeed I$/D$ cache coherency vs. PIO transfers, then
> this is a long solved issue on other archs such as ppc (and I _think_
> sparc).
The thread I started was indeed regarding I/D cache coherency and PIO.
But it diverged into DMA issues a few days ago (should have been a new
thread).
> The way we do it, at least on powerpc which is PIPT, is to keep track on
> a per-page basis, whether a given page is clean for execution using
> PG_arch1 bit. This bit is cleared when a new page is popped into the
> page cache, and we clear it from flush_dcache_page() iirc (you may want
> to dbl check I don't have the code at hand right now, or rather, I do
> but I'm to lazy to look right now :-)
We do the same on ARM. The problem with most (all) HCD drivers that do
PIO is that they copy the data to the transfer buffer but there is no
call in this driver to flush_dcache_page(). The upper mass storage or
filesystem layers don't call this function either, so there isn't
anything that would set the PG_arch1 bit.
--
Catalin
On Wed, 2010-02-17 at 20:05 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2010-02-16 at 09:22 +0100, Oliver Neukum wrote:
> > This seems wrong to me. Buffers for control transfers may be
> > transfered
> > by DMA, so the caches must be flushed on architectures whose caches
> > are not coherent with respect to DMA.
> >=20
> > Would you care to elaborate on the exact nature of the bug you are
> > fixing?
>=20
> I missed part of this thread, so forgive me if I'm a bit off here, but
> if the problem is indeed I$/D$ cache coherency vs. PIO transfers, then
> this is a long solved issue on other archs such as ppc (and I _think_
> sparc).
The thread I started was indeed regarding I/D cache coherency and PIO.
But it diverged into DMA issues a few days ago (should have been a new
thread).
> The way we do it, at least on powerpc which is PIPT, is to keep track on
> a per-page basis, whether a given page is clean for execution using
> PG_arch1 bit. This bit is cleared when a new page is popped into the
> page cache, and we clear it from flush_dcache_page() iirc (you may want
> to dbl check I don't have the code at hand right now, or rather, I do
> but I'm to lazy to look right now :-)
We do the same on ARM. The problem with most (all) HCD drivers that do
PIO is that they copy the data to the transfer buffer but there is no
call in this driver to flush_dcache_page(). The upper mass storage or
filesystem layers don't call this function either, so there isn't
anything that would set the PG_arch1 bit.
--=20
Catalin
--
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Wed, 2010-02-17 at 20:05 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2010-02-16 at 09:22 +0100, Oliver Neukum wrote:
> > This seems wrong to me. Buffers for control transfers may be
> > transfered
> > by DMA, so the caches must be flushed on architectures whose caches
> > are not coherent with respect to DMA.
> >=20
> > Would you care to elaborate on the exact nature of the bug you are
> > fixing?
>=20
> I missed part of this thread, so forgive me if I'm a bit off here, but
> if the problem is indeed I$/D$ cache coherency vs. PIO transfers, then
> this is a long solved issue on other archs such as ppc (and I _think_
> sparc).
The thread I started was indeed regarding I/D cache coherency and PIO.
But it diverged into DMA issues a few days ago (should have been a new
thread).
> The way we do it, at least on powerpc which is PIPT, is to keep track on
> a per-page basis, whether a given page is clean for execution using
> PG_arch1 bit. This bit is cleared when a new page is popped into the
> page cache, and we clear it from flush_dcache_page() iirc (you may want
> to dbl check I don't have the code at hand right now, or rather, I do
> but I'm to lazy to look right now :-)
We do the same on ARM. The problem with most (all) HCD drivers that do
PIO is that they copy the data to the transfer buffer but there is no
call in this driver to flush_dcache_page(). The upper mass storage or
filesystem layers don't call this function either, so there isn't
anything that would set the PG_arch1 bit.
--=20
Catalin
--
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Wed, 2010-02-17 at 20:05 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2010-02-16 at 09:22 +0100, Oliver Neukum wrote:
> > This seems wrong to me. Buffers for control transfers may be
> > transfered
> > by DMA, so the caches must be flushed on architectures whose caches
> > are not coherent with respect to DMA.
> >=20
> > Would you care to elaborate on the exact nature of the bug you are
> > fixing?
>=20
> I missed part of this thread, so forgive me if I'm a bit off here, but
> if the problem is indeed I$/D$ cache coherency vs. PIO transfers, then
> this is a long solved issue on other archs such as ppc (and I _think_
> sparc).
The thread I started was indeed regarding I/D cache coherency and PIO.
But it diverged into DMA issues a few days ago (should have been a new
thread).
> The way we do it, at least on powerpc which is PIPT, is to keep track on
> a per-page basis, whether a given page is clean for execution using
> PG_arch1 bit. This bit is cleared when a new page is popped into the
> page cache, and we clear it from flush_dcache_page() iirc (you may want
> to dbl check I don't have the code at hand right now, or rather, I do
> but I'm to lazy to look right now :-)
We do the same on ARM. The problem with most (all) HCD drivers that do
PIO is that they copy the data to the transfer buffer but there is no
call in this driver to flush_dcache_page(). The upper mass storage or
filesystem layers don't call this function either, so there isn't
anything that would set the PG_arch1 bit.
--=20
Catalin
--
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Wed, 2010-02-17 at 20:05 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2010-02-16 at 09:22 +0100, Oliver Neukum wrote:
> > This seems wrong to me. Buffers for control transfers may be
> > transfered
> > by DMA, so the caches must be flushed on architectures whose caches
> > are not coherent with respect to DMA.
> >=20
> > Would you care to elaborate on the exact nature of the bug you are
> > fixing?
>=20
> I missed part of this thread, so forgive me if I'm a bit off here, but
> if the problem is indeed I$/D$ cache coherency vs. PIO transfers, then
> this is a long solved issue on other archs such as ppc (and I _think_
> sparc).
The thread I started was indeed regarding I/D cache coherency and PIO.
But it diverged into DMA issues a few days ago (should have been a new
thread).
> The way we do it, at least on powerpc which is PIPT, is to keep track on
> a per-page basis, whether a given page is clean for execution using
> PG_arch1 bit. This bit is cleared when a new page is popped into the
> page cache, and we clear it from flush_dcache_page() iirc (you may want
> to dbl check I don't have the code at hand right now, or rather, I do
> but I'm to lazy to look right now :-)
We do the same on ARM. The problem with most (all) HCD drivers that do
PIO is that they copy the data to the transfer buffer but there is no
call in this driver to flush_dcache_page(). The upper mass storage or
filesystem layers don't call this function either, so there isn't
anything that would set the PG_arch1 bit.
--=20
Catalin
--
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
SORRY - one more message to apologise for the multiple reposts (and
automatically appended legal disclaimer). I've been moved to Exchange
2007 and trying to use Evolution + Exchange-MAPI. It looks like it went
terribly wrong.
Catalin
On Wed, 2010-02-17 at 15:40 +0000, Catalin Marinas wrote:
> On Wed, 2010-02-17 at 20:05 +1100, Benjamin Herrenschmidt wrote:
> > On Tue, 2010-02-16 at 09:22 +0100, Oliver Neukum wrote:
> > > This seems wrong to me. Buffers for control transfers may be
> > > transfered
> > > by DMA, so the caches must be flushed on architectures whose caches
> > > are not coherent with respect to DMA.
> > >=20
> > > Would you care to elaborate on the exact nature of the bug you are
> > > fixing?
> >=20
> > I missed part of this thread, so forgive me if I'm a bit off here, but
> > if the problem is indeed I$/D$ cache coherency vs. PIO transfers, then
> > this is a long solved issue on other archs such as ppc (and I _think_
> > sparc).
>
> The thread I started was indeed regarding I/D cache coherency and PIO.
> But it diverged into DMA issues a few days ago (should have been a new
> thread).
>
> > The way we do it, at least on powerpc which is PIPT, is to keep track on
> > a per-page basis, whether a given page is clean for execution using
> > PG_arch1 bit. This bit is cleared when a new page is popped into the
> > page cache, and we clear it from flush_dcache_page() iirc (you may want
> > to dbl check I don't have the code at hand right now, or rather, I do
> > but I'm to lazy to look right now :-)
>
> We do the same on ARM. The problem with most (all) HCD drivers that do
> PIO is that they copy the data to the transfer buffer but there is no
> call in this driver to flush_dcache_page(). The upper mass storage or
> filesystem layers don't call this function either, so there isn't
> anything that would set the PG_arch1 bit.
>
> --=20
> Catalin
> --
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose
> the contents to any other person, use it for any purpose, or store or
> copy the information in any medium. Thank you.
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
SORRY - one more message to apologise for the multiple reposts (and
automatically appended legal disclaimer). I've been moved to Exchange
2007 and trying to use Evolution + Exchange-MAPI. It looks like it went
terribly wrong.
Catalin
On Wed, 2010-02-17 at 15:40 +0000, Catalin Marinas wrote:
> On Wed, 2010-02-17 at 20:05 +1100, Benjamin Herrenschmidt wrote:
> > On Tue, 2010-02-16 at 09:22 +0100, Oliver Neukum wrote:
> > > This seems wrong to me. Buffers for control transfers may be
> > > transfered
> > > by DMA, so the caches must be flushed on architectures whose caches
> > > are not coherent with respect to DMA.
> > >=3D20
> > > Would you care to elaborate on the exact nature of the bug you are
> > > fixing?
> >=3D20
> > I missed part of this thread, so forgive me if I'm a bit off here, but
> > if the problem is indeed I$/D$ cache coherency vs. PIO transfers, then
> > this is a long solved issue on other archs such as ppc (and I _think_
> > sparc).
>=20
> The thread I started was indeed regarding I/D cache coherency and PIO.
> But it diverged into DMA issues a few days ago (should have been a new
> thread).
>=20
> > The way we do it, at least on powerpc which is PIPT, is to keep track o=
n
> > a per-page basis, whether a given page is clean for execution using
> > PG_arch1 bit. This bit is cleared when a new page is popped into the
> > page cache, and we clear it from flush_dcache_page() iirc (you may want
> > to dbl check I don't have the code at hand right now, or rather, I do
> > but I'm to lazy to look right now :-)
>=20
> We do the same on ARM. The problem with most (all) HCD drivers that do
> PIO is that they copy the data to the transfer buffer but there is no
> call in this driver to flush_dcache_page(). The upper mass storage or
> filesystem layers don't call this function either, so there isn't
> anything that would set the PG_arch1 bit.
>=20
> --=3D20
> Catalin
> --=20
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose
> the contents to any other person, use it for any purpose, or store or
> copy the information in any medium. Thank you.
>=20
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
--
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Wed, 17 Feb 2010, Shilimkar, Santosh wrote:
> How about below approach? Controller driver can set
> "uses_pio_for_control" if it can't do dma for control transfer.
>
> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> index 80995ef..e3eae02 100644
> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c
> @@ -1276,7 +1276,7 @@ static int map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
>
> if (usb_endpoint_xfer_control(&urb->ep->desc)
> && !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP)) {
> - if (hcd->self.uses_dma) {
> + if (hcd->self.uses_dma && !hcd->self.uses_pio_for_control) {
> urb->setup_dma = dma_map_single(
> hcd->self.controller,
> urb->setup_packet,
> @@ -1335,7 +1335,7 @@ static void unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb)
>
> if (usb_endpoint_xfer_control(&urb->ep->desc)
> && !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP)) {
> - if (hcd->self.uses_dma)
> + if (hcd->self.uses_dma && !hcd->self.uses_pio_for_control)
> dma_unmap_single(hcd->self.controller, urb->setup_dma,
> sizeof(struct usb_ctrlrequest),
> DMA_TO_DEVICE);
> diff --git a/include/linux/usb.h b/include/linux/usb.h
> index d7ace1b..ba5b0a2 100644
> --- a/include/linux/usb.h
> +++ b/include/linux/usb.h
> @@ -329,6 +329,9 @@ struct usb_bus {
> int busnum; /* Bus number (in order of reg) */
> const char *bus_name; /* stable id (PCI slot_name etc) */
> u8 uses_dma; /* Does the host controller use DMA? */
> + u8 uses_pio_for_control; /* Does the host controller use PIO
> + * for control tansfers?
> + */
> u8 otg_port; /* 0, or number of OTG/HNP port */
> unsigned is_b_host:1; /* true during some HNP roleswitches */
> unsigned b_hnp_enable:1; /* OTG: did A-Host enable HNP? */
Why do you skip mapping the setup packet but not the data packet?
Alan Stern
On Wed, Feb 17, 2010 at 12:02:21PM -0500, Alan Stern wrote:
> Why do you skip mapping the setup packet but not the data packet?
This is something of a FAQ in this thread. Here are the responses to
similar questions yesterday:
"Gadiyar, Anand" <[email protected]> said:
> Not really. For instance, in the case of the DMA engine in the MUSB
> controller in OMAP3, we can only use DMA with endpoints other than
> EP0, and EP0 is what is used for control transfers.
>
> It's not PIO for all the endpoints or DMA for all of them.
"Shilimkar, Santosh" <[email protected]> said:
> On the OMAP4 (ARM cortex-a9) platform, the enumeration fails because control
> transfer buffers are corrupted. On our platform, we use PIO mode for control
> transfers and DMA for bulk transfers.
>
> The current stack performs dma cache maintenance even for the PIO transfers
> which leads to the corruption issue. The control buffers are handled by CPU
> and they already coherent from CPU point of view.
Alan Stern wrote:
> On Wed, 17 Feb 2010, Shilimkar, Santosh wrote:
>
> > How about below approach? Controller driver can set
> > "uses_pio_for_control" if it can't do dma for control transfer.
> >
> > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> > index 80995ef..e3eae02 100644
> > --- a/drivers/usb/core/hcd.c
> > +++ b/drivers/usb/core/hcd.c
> > @@ -1276,7 +1276,7 @@ static int map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
> >
> > if (usb_endpoint_xfer_control(&urb->ep->desc)
> > && !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP)) {
> > - if (hcd->self.uses_dma) {
> > + if (hcd->self.uses_dma && !hcd->self.uses_pio_for_control) {
> > urb->setup_dma = dma_map_single(
> > hcd->self.controller,
> > urb->setup_packet,
> > @@ -1335,7 +1335,7 @@ static void unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb)
> >
> > if (usb_endpoint_xfer_control(&urb->ep->desc)
> > && !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP)) {
> > - if (hcd->self.uses_dma)
> > + if (hcd->self.uses_dma && !hcd->self.uses_pio_for_control)
> > dma_unmap_single(hcd->self.controller, urb->setup_dma,
> > sizeof(struct usb_ctrlrequest),
> > DMA_TO_DEVICE);
> > diff --git a/include/linux/usb.h b/include/linux/usb.h
> > index d7ace1b..ba5b0a2 100644
> > --- a/include/linux/usb.h
> > +++ b/include/linux/usb.h
> > @@ -329,6 +329,9 @@ struct usb_bus {
> > int busnum; /* Bus number (in order of reg) */
> > const char *bus_name; /* stable id (PCI slot_name etc) */
> > u8 uses_dma; /* Does the host controller use DMA? */
> > + u8 uses_pio_for_control; /* Does the host controller use PIO
> > + * for control tansfers?
> > + */
> > u8 otg_port; /* 0, or number of OTG/HNP port */
> > unsigned is_b_host:1; /* true during some HNP roleswitches */
> > unsigned b_hnp_enable:1; /* OTG: did A-Host enable HNP? */
>
> Why do you skip mapping the setup packet but not the data packet?
>
I think that's oversight. For this controller, we need to skip mapping
all buffers used to do transfers on EP0, which is all control transfers.
Will fix in the next version of the patch.
- Anand
On Wed, 2010-02-17 at 15:27 +0000, Catalin Marinas wrote:
> We do the same on ARM. The problem with most (all) HCD drivers that do
> PIO is that they copy the data to the transfer buffer but there is no
> call in this driver to flush_dcache_page(). The upper mass storage or
> filesystem layers don't call this function either, so there isn't
> anything that would set the PG_arch1 bit.
Actually, clear it :-)
I suppose that's one thing that needs to be fixed in the drivers.
Cheers,
Ben.
On Thu, Feb 18, 2010 at 07:37:00AM +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2010-02-17 at 15:27 +0000, Catalin Marinas wrote:
> > We do the same on ARM. The problem with most (all) HCD drivers that do
> > PIO is that they copy the data to the transfer buffer but there is no
> > call in this driver to flush_dcache_page(). The upper mass storage or
> > filesystem layers don't call this function either, so there isn't
> > anything that would set the PG_arch1 bit.
>
> Actually, clear it :-)
>
> I suppose that's one thing that needs to be fixed in the drivers.
No, because that'd probably bugger up the Sparc64 method of delaying
flush_dcache_page.
This method works as follows:
- a page cache page is allocated - this has PG_arch_1 clear.
- IO happens on it and it's placed into the page cache. PG_arch_1 is
still clear.
- someone calls read()/write() which accesses the page. The generic
file IO layers call flush_dcache_page() in response to read()/write()
fs calls. flush_dcache_page() spots that the page is not yet mapped
into userspace, and sets PG_arch_1 to mark the fact that the kernel
mapping is dirty.
- when someone maps the page, we check PG_arch_1 in update_mmu_cache.
If PG_arch_1 is set, we flush the kernel mapping.
Clearly, if we go around having drivers clearing PG_arch_1, this is going
to break horribly.
On Wed, 2010-02-17 at 20:44 +0000, Russell King - ARM Linux wrote:
> No, because that'd probably bugger up the Sparc64 method of delaying
> flush_dcache_page.
>
> This method works as follows:
>
> - a page cache page is allocated - this has PG_arch_1 clear.
>
> - IO happens on it and it's placed into the page cache. PG_arch_1 is
> still clear.
>
> - someone calls read()/write() which accesses the page. The generic
> file IO layers call flush_dcache_page() in response to
> read()/write()
> fs calls. flush_dcache_page() spots that the page is not yet mapped
> into userspace, and sets PG_arch_1 to mark the fact that the kernel
> mapping is dirty.
>
> - when someone maps the page, we check PG_arch_1 in update_mmu_cache.
> If PG_arch_1 is set, we flush the kernel mapping.
>
> Clearly, if we go around having drivers clearing PG_arch_1, this is
> going to break horribly.
Ok, you do things very differently than us on ppc then. We clear
PG_arch_1 in flush_dcache_page(), and we set it when the page has been
cache cleaned for execution.
We assume that anybody that dirties a page in the kernel will call
flush_dcache_page() which removes our PG_arch_1 bit thus marking the
page "dirty".
Note that from experience, doing the check & flushes in
update_mmu_cache() is racy on SMP. At least for I$/D$, we have the case
where processor one does set_pte followed by update_mmu_cache(). The
later isn't done yet but processor 2 sees the PTE now and starts using
it, cache hasn't been fully flushed yet. You may avoid that race in some
ways, but on ppc, I've stopped using that.
I now do things directly in set_pte_at(). In fact, that's why I want
your patch to change update_mmu_cache() to take a PTE pointer :-) Since
my set_pte_at() can now remove the _PAGE_EXEC bit, I need
update_mmu_cache() to re-read the PTE before it updates the hash table
or TLB.
Cheers,
Ben.
Am Mittwoch, 17. Februar 2010 21:30:24 schrieb Gadiyar, Anand:
> > Why do you skip mapping the setup packet but not the data packet?
> >
>
> I think that's oversight. For this controller, we need to skip mapping
> all buffers used to do transfers on EP0, which is all control transfers.
One thing more. Do you have an issue with EP 0 only or all control
endpoints? EP 0 must be control, but devices are within spec if they
have multiple control endpoints provided EP 0 is control.
Regards
Oliver
Oliver Neukum wrote:
> Am Mittwoch, 17. Februar 2010 21:30:24 schrieb Gadiyar, Anand:
> > > Why do you skip mapping the setup packet but not the data packet?
> > >
> >
> > I think that's oversight. For this controller, we need to skip mapping
> > all buffers used to do transfers on EP0, which is all control transfers.
>
> One thing more. Do you have an issue with EP 0 only or all control
> endpoints? EP 0 must be control, but devices are within spec if they
> have multiple control endpoints provided EP 0 is control.
Sorry for the confusion. The issue is not with EP 0 of devices
connected to the controller; the problem is with EP 0 on the host
controller itself.
The controller in question is the MUSB OTG controller present in
OMAPs, Davinci chips, and some Blackfins. The MUSB HCD driver is
written such that it carries out all control transfers on EP 0 of
the controller. All bulk transfers are carried out on other hardware
endpoints.
(This is the same "hardware endpoint" that is used in when the MUSB
is used in gadget mode.)
I'm not really sure why EP0 was chosen for control transfers, or
if there is a restriction that we *need* to use it. Let me study
the docs some more.
The problem is that with the driver code as written today, we use
EP 0 for all control transfers, and the DMA engine cannot do DMA
to this endpoint's FIFO.
- Anand
On Wed, 2010-02-17 at 22:31 +0000, Benjamin Herrenschmidt wrote:
> On Wed, 2010-02-17 at 20:44 +0000, Russell King - ARM Linux wrote:
> > No, because that'd probably bugger up the Sparc64 method of delaying
> > flush_dcache_page.
> >
> > This method works as follows:
> >
> > - a page cache page is allocated - this has PG_arch_1 clear.
> >
> > - IO happens on it and it's placed into the page cache. PG_arch_1 is
> > still clear.
> >
> > - someone calls read()/write() which accesses the page. The generic
> > file IO layers call flush_dcache_page() in response to
> > read()/write()
> > fs calls. flush_dcache_page() spots that the page is not yet mapped
> > into userspace, and sets PG_arch_1 to mark the fact that the kernel
> > mapping is dirty.
> >
> > - when someone maps the page, we check PG_arch_1 in update_mmu_cache.
> > If PG_arch_1 is set, we flush the kernel mapping.
> >
> > Clearly, if we go around having drivers clearing PG_arch_1, this is
> > going to break horribly.
>
> Ok, you do things very differently than us on ppc then. We clear
> PG_arch_1 in flush_dcache_page(), and we set it when the page has been
> cache cleaned for execution.
For this perspective it's not that different, just that we use the
negated PG_arch_1.
> We assume that anybody that dirties a page in the kernel will call
> flush_dcache_page() which removes our PG_arch_1 bit thus marking the
> page "dirty".
This assumption is not valid with some drivers like USB HCD doing PIO.
But, yes, that's how it should be done.
> Note that from experience, doing the check & flushes in
> update_mmu_cache() is racy on SMP. At least for I$/D$, we have the case
> where processor one does set_pte followed by update_mmu_cache(). The
> later isn't done yet but processor 2 sees the PTE now and starts using
> it, cache hasn't been fully flushed yet. You may avoid that race in some
> ways, but on ppc, I've stopped using that.
I think that's possible on ARM too. Having two threads on different
CPUs, one thread triggers a prefetch abort (instruction page fault) on
CPU0 but the second thread on CPU1 may branch into this page after
set_pte() (hence not fault) but before update_mmu_cache() doing the
flush.
On ARM11MPCore we flush the caches in flush_dcache_page() because the
cache maintenance operations weren't visible to the other CPUs.
Cortex-A9 broadcasts the cache operations in hardware so we can use lazy
flushing but with the race you pointed out.
Using set_pte_at() for delayed flushing may be a better option for ARM
as well (and maybe Documentation/cachetlb.txt updated).
Thanks.
--
Catalin
On Fri, 2010-02-19 at 17:15 +0000, Catalin Marinas wrote:
> On Wed, 2010-02-17 at 22:31 +0000, Benjamin Herrenschmidt wrote:
> > On Wed, 2010-02-17 at 20:44 +0000, Russell King - ARM Linux wrote:
> > > No, because that'd probably bugger up the Sparc64 method of delaying
> > > flush_dcache_page.
> > >
> > > This method works as follows:
> > >
> > > - a page cache page is allocated - this has PG_arch_1 clear.
> > >
> > > - IO happens on it and it's placed into the page cache. PG_arch_1 is
> > > still clear.
> > >
> > > - someone calls read()/write() which accesses the page. The generic
> > > file IO layers call flush_dcache_page() in response to
> > > read()/write()
> > > fs calls. flush_dcache_page() spots that the page is not yet mapped
> > > into userspace, and sets PG_arch_1 to mark the fact that the kernel
> > > mapping is dirty.
> > >
> > > - when someone maps the page, we check PG_arch_1 in update_mmu_cache.
> > > If PG_arch_1 is set, we flush the kernel mapping.
> > >
> > > Clearly, if we go around having drivers clearing PG_arch_1, this is
> > > going to break horribly.
> >
> > Ok, you do things very differently than us on ppc then. We clear
> > PG_arch_1 in flush_dcache_page(), and we set it when the page has been
> > cache cleaned for execution.
>
> For this perspective it's not that different, just that we use the
> negated PG_arch_1.
I got your point now (after reading the replies on linux-arch :)).
So PPC assumes that if PG_arch_1 is clear (the default), the page wasn't
cleaned. If there is no call to flush_dcache_page() but the page gets
mapped to user space, update_mmu_cache() (or set_pte_at()) would simply
assume that the page was dirtied, flush the caches and set this bit.
We could easily do this on ARM as well and assume that the page is dirty
if !PG_arch_1. But it only partially solves the problem (only for
faulted-in pages).
If a page is already mapped in user space, flush_dcache_page() on ARM
does the flushing rather than deferring it to update_mmu_cache(). The
PIO HCD drivers, however, don't call flush_dcache_page(). Is it possible
that the HCD could transfer data into a page cache page already mapped
in user space? My understanding is that the scenario above is possible.
--
Catalin
Am Freitag, 19. Februar 2010 18:36:51 schrieb Catalin Marinas:
> If a page is already mapped in user space, flush_dcache_page() on ARM
> does the flushing rather than deferring it to update_mmu_cache(). The
> PIO HCD drivers, however, don't call flush_dcache_page(). Is it possible
> that the HCD could transfer data into a page cache page already mapped
> in user space? My understanding is that the scenario above is possible.
Yes, video drivers do that.
Regards
Oliver
On Tue, 16 Feb 2010 14:21:48 +0530
"Gadiyar, Anand" <[email protected]> wrote:
> > hcd->self.uses_dma = (dev->dma_mask != NULL);
> >
> > Is it easier to make sure that PIO devices don't have dev->dma_mask set?
>
> Not really. For instance, in the case of the DMA engine in the MUSB
> controller in OMAP3, we can only use DMA with endpoints other than
> EP0, and EP0 is what is used for control transfers.
>
> It's not PIO for all the endpoints or DMA for all of them.
The HC driver does not have to be 100% truthful here. If the system
is not HIGHMEM, HCD can easily set uses_dma to false yet use DMA
by mapping buffers itself, without relying on the quoted code.
On a HIGHMEM system, block layer will bounce-buffer data in such case.
Hopefuly not a problem for ARM?
All network stack drivers work that way, BTW.
-- Pete
On Fri, 2010-02-19 at 17:15 +0000, Catalin Marinas wrote:
> > Ok, you do things very differently than us on ppc then. We clear
> > PG_arch_1 in flush_dcache_page(), and we set it when the page has
> been
> > cache cleaned for execution.
>
> For this perspective it's not that different, just that we use the
> negated PG_arch_1.
Right, though you default as "clean" while we default as "dirty".
> > We assume that anybody that dirties a page in the kernel will call
> > flush_dcache_page() which removes our PG_arch_1 bit thus marking the
> > page "dirty".
>
> This assumption is not valid with some drivers like USB HCD doing PIO.
> But, yes, that's how it should be done.
So we go back to the fix should be done at the individual drivers level.
If it's going to write into the page cache, it needs to whack the bits.
Now there's of course the question as to whether you really only want to
do that for a PIO access and not for a DMA access, I think on power, we
don't really discriminate that much (since in any case our icache still
needs flushing). Maybe it would be useful to separate the I$ and D$ bits
but I'm not sure I can be bothered.
> > Note that from experience, doing the check & flushes in
> > update_mmu_cache() is racy on SMP. At least for I$/D$, we have the
> case
> > where processor one does set_pte followed by update_mmu_cache(). The
> > later isn't done yet but processor 2 sees the PTE now and starts
> using
> > it, cache hasn't been fully flushed yet. You may avoid that race in
> some
> > ways, but on ppc, I've stopped using that.
>
> I think that's possible on ARM too. Having two threads on different
> CPUs, one thread triggers a prefetch abort (instruction page fault) on
> CPU0 but the second thread on CPU1 may branch into this page after
> set_pte() (hence not fault) but before update_mmu_cache() doing the
> flush.
>
> On ARM11MPCore we flush the caches in flush_dcache_page() because the
> cache maintenance operations weren't visible to the other CPUs.
I'm not even sure that's going to be 100% correct. Don't you also need
to flush the remote icaches when you are dealing with instructions (such
as swap) anyways ?
I've had some discussions in the past with Russell and others around the
problem of non-broadcast cache ops on ARM SMP since that's also hurting
you hard with dma mappings.
Can you issue IPIs as FIQs if needed (from my old ARM knowledge, FIQs
are still on even in local_irq_save() blocks right ? I haven't touched
low level ARM for years tho, I may have forgotten things).
In this case, you should probably use the same bits as A9 and simply
make them use FIQs on 11MP to make the other cores flush as well.
> Cortex-A9 broadcasts the cache operations in hardware so we can use
> lazy flushing but with the race you pointed out.
Right.
> Using set_pte_at() for delayed flushing may be a better option for ARM
> as well (and maybe Documentation/cachetlb.txt updated).
Cheers,
Ben.
On Fri, 2010-02-19 at 17:36 +0000, Catalin Marinas wrote:
>
> If a page is already mapped in user space, flush_dcache_page() on ARM
> does the flushing rather than deferring it to update_mmu_cache().
This is for D-cache aliases on VIVT right ? Or are you still talking
about I/D coherency on PIPT ARMs ? Because the later should not matter
for already mapped userspace pages in the sense that if user space
explicitely read() onto a page, it's up to userspace to cache clean that
page before executing from it in my book :-)
> The PIO HCD drivers, however, don't call flush_dcache_page(). Is it possible
> that the HCD could transfer data into a page cache page already mapped
> in user space? My understanding is that the scenario above is possible.
It is but I'm not confident the responsibility for doing that cleanup
is at the HCD level. That would impact a lot of HCD activities that
don't need such flushing since the use of the page is purely in-kernel.
Though I suppose that could be optimized out in most case using the page
use count.
But I still wonder whether it should be pushed down to the actual
interface drivers, that's always been the case I believe. In fact, in
the case of block ops, it's generally done at the BIO or even file
system layer right ?
Cheers,
Ben.
On Fri, 2010-02-19 at 21:53 +0100, Oliver Neukum wrote:
> Am Freitag, 19. Februar 2010 18:36:51 schrieb Catalin Marinas:
> > If a page is already mapped in user space, flush_dcache_page() on ARM
> > does the flushing rather than deferring it to update_mmu_cache(). The
> > PIO HCD drivers, however, don't call flush_dcache_page(). Is it possible
> > that the HCD could transfer data into a page cache page already mapped
> > in user space? My understanding is that the scenario above is possible.
>
> Yes, video drivers do that.
In which case it would be up to the video driver to call
flush_dcache_page() (though if it's v4l you are talking about, maybe it
might make sense to push it into the v4l layer itself).
Cheers,
Ben.
Am Mittwoch, 24. Februar 2010 03:48:09 schrieb Benjamin Herrenschmidt:
> On Fri, 2010-02-19 at 21:53 +0100, Oliver Neukum wrote:
> > Am Freitag, 19. Februar 2010 18:36:51 schrieb Catalin Marinas:
> > > If a page is already mapped in user space, flush_dcache_page() on ARM
> > > does the flushing rather than deferring it to update_mmu_cache(). The
> > > PIO HCD drivers, however, don't call flush_dcache_page(). Is it possible
> > > that the HCD could transfer data into a page cache page already mapped
> > > in user space? My understanding is that the scenario above is possible.
> >
> > Yes, video drivers do that.
>
> In which case it would be up to the video driver to call
> flush_dcache_page() (though if it's v4l you are talking about, maybe it
> might make sense to push it into the v4l layer itself).
I don't know. The issue seems quite complex. It would seem better to
centralize it as far as practical. Do you have a wrapper drivers could
call?
Regards
Oliver
On Wed, 24 Feb 2010, Benjamin Herrenschmidt wrote:
> > The PIO HCD drivers, however, don't call flush_dcache_page(). Is it possible
> > that the HCD could transfer data into a page cache page already mapped
> > in user space? My understanding is that the scenario above is possible.
>
> It is but I'm not confident the responsibility for doing that cleanup
> is at the HCD level. That would impact a lot of HCD activities that
> don't need such flushing since the use of the page is purely in-kernel.
That's right. The HCD merely puts data wherever it's told to. It
doesn't know whether the destination is in the page cache, in
userspace, or anywhere else. The same is true for usb-storage.
Alan Stern
On Wed, 2010-02-24 at 08:16 +0100, Oliver Neukum wrote:
> I don't know. The issue seems quite complex. It would seem better to
> centralize it as far as practical. Do you have a wrapper drivers could
> call?
flush_dcache_page() ? :-)
Now, the subsystem might be the one to know whether something is mapped
into userspace or not (v4l in our case) in which case a wrapper could be
created.
Cheers,
Ben.
On Wed, 2010-02-24 at 11:19 -0500, Alan Stern wrote:
> > It is but I'm not confident the responsibility for doing that
> cleanup
> > is at the HCD level. That would impact a lot of HCD activities that
> > don't need such flushing since the use of the page is purely
> in-kernel.
>
> That's right. The HCD merely puts data wherever it's told to. It
> doesn't know whether the destination is in the page cache, in
> userspace, or anywhere else. The same is true for usb-storage.
I'm surprised that usb-storage has an issue here. It shouldn't afaik,
since it's just a SCSI driver (or not anymore ?) and the BIO or
filesystems handle things there no ? I haven't seen a single call to
flush_dcache_page() in any of drivers/scsi, drivers/ata or drivers/ide
when I looked...
Cheers,
Ben.
On Thu, 25 Feb 2010, Benjamin Herrenschmidt wrote:
> I'm surprised that usb-storage has an issue here. It shouldn't afaik,
> since it's just a SCSI driver (or not anymore ?)
It still is. There's also the ub driver, which is a non-SCSI block
device driver for some of the same devices handled by usb-storage.
> and the BIO or
> filesystems handle things there no ? I haven't seen a single call to
> flush_dcache_page() in any of drivers/scsi, drivers/ata or drivers/ide
> when I looked...
There is no real issue; it's just that the problem was first noted in
connection with usb-storage reading in executable pages, so Catalin's
initial post was oriented toward modifying usb-storage.
The main issue here is that the same host controller will use PIO
sometimes and DMA sometimes, depending on the details of the transfer.
The USB core didn't expect this and consequently we violated the rules
for DMA mapping. The question is: If the core is fixed so that the
rules aren't violated, will everything work correctly?
Alan Stern
Am Mittwoch, 24. Februar 2010 22:12:34 schrieb Benjamin Herrenschmidt:
> On Wed, 2010-02-24 at 08:16 +0100, Oliver Neukum wrote:
> > I don't know. The issue seems quite complex. It would seem better to
> > centralize it as far as practical. Do you have a wrapper drivers could
> > call?
>
> flush_dcache_page() ? :-)
Will this do anything on arches that don't need it?
Secondly, can we have a wrapper that you can pass a pointer and an
offset?
> Now, the subsystem might be the one to know whether something is mapped
> into userspace or not (v4l in our case) in which case a wrapper could be
> created.
If possible, I'd like to centralize this. Drivers are likely to get this wrong.
Regards
Oliver
On Wed, 2010-02-24 at 16:50 -0500, Alan Stern wrote:
> The main issue here is that the same host controller will use PIO
> sometimes and DMA sometimes, depending on the details of the
> transfer.
> The USB core didn't expect this and consequently we violated the rules
> for DMA mapping. The question is: If the core is fixed so that the
> rules aren't violated, will everything work correctly?
As long as the only issue is that one (ie, doing PIO while dma-map'ed),
then yes, I'd say things should work. If not, then there is -another-
problem to be fixed :-)
Cheers,
Ben.
On Thu, 2010-02-25 at 04:48 +0100, Oliver Neukum wrote:
> Am Mittwoch, 24. Februar 2010 22:12:34 schrieb Benjamin Herrenschmidt:
> > On Wed, 2010-02-24 at 08:16 +0100, Oliver Neukum wrote:
> > > I don't know. The issue seems quite complex. It would seem better to
> > > centralize it as far as practical. Do you have a wrapper drivers could
> > > call?
> >
> > flush_dcache_page() ? :-)
>
> Will this do anything on arches that don't need it?
No, it's going to be an empty inline:
arch/x86/include/asm/cacheflush.h:static inline void flush_dcache_page(struct page *page) { }
> Secondly, can we have a wrapper that you can pass a pointer and an
> offset?
I'm sure you can make one :-) Use virt_to_page() though that will not
work for vmap/vmalloc space of course.
> > Now, the subsystem might be the one to know whether something is mapped
> > into userspace or not (v4l in our case) in which case a wrapper could be
> > created.
>
> If possible, I'd like to centralize this. Drivers are likely to get this wrong.
Right. In the case of v4l, it's probably something that should go into
the subsystem. IE. That's how it works for block too, it's done at the
BIO and/or filesystem layer (though individual filesystems do have their
hand in the pudding).
Cheers,
Ben.
> Regards
> Oliver
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Thu, 2010-02-25 at 08:12 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2010-02-24 at 08:16 +0100, Oliver Neukum wrote:
> > I don't know. The issue seems quite complex. It would seem better to
> > centralize it as far as practical. Do you have a wrapper drivers could
> > call?
>
> flush_dcache_page() ? :-)
Actually, that can be wrong depending on the implementation. The
problem is incoherency of the kernel page (dirty) with respect to user
space aliases (clean). What has to happen on parisc is that the kernel
alias needs flushing. We can guarantee the userspace aliases to be
clean (and not moved in). We wouldn't want to incur the expense of
flushing the user space pages as well.
> Now, the subsystem might be the one to know whether something is mapped
> into userspace or not (v4l in our case) in which case a wrapper could be
> created.
Right, so it's the responsibility of the API used by the subsystem.
Thus Caitlin's pio_kmap seems the right one ... I don't understand what
the additional problems are.
James
On Wed, 2010-02-24 at 21:13 +0000, Benjamin Herrenschmidt wrote:
> On Wed, 2010-02-24 at 11:19 -0500, Alan Stern wrote:
> > > It is but I'm not confident the responsibility for doing that cleanup
> > > is at the HCD level. That would impact a lot of HCD activities that
> > > don't need such flushing since the use of the page is purely in-kernel.
> >
> > That's right. The HCD merely puts data wherever it's told to. It
> > doesn't know whether the destination is in the page cache, in
> > userspace, or anywhere else. The same is true for usb-storage.
>
> I'm surprised that usb-storage has an issue here. It shouldn't afaik,
> since it's just a SCSI driver (or not anymore ?) and the BIO or
> filesystems handle things there no ? I haven't seen a single call to
> flush_dcache_page() in any of drivers/scsi, drivers/ata or drivers/ide
> when I looked...
The BIO or filesystem code don't call flush_dcache_page() either (well
some do like cramfs or jffs but they decompress the data received from
the block device).
--
Catalin
On Wed, 2010-02-24 at 02:47 +0000, Benjamin Herrenschmidt wrote:
> On Fri, 2010-02-19 at 17:36 +0000, Catalin Marinas wrote:
> >
> > If a page is already mapped in user space, flush_dcache_page() on ARM
> > does the flushing rather than deferring it to update_mmu_cache().
>
> This is for D-cache aliases on VIVT right ? Or are you still talking
> about I/D coherency on PIPT ARMs ? Because the later should not matter
> for already mapped userspace pages in the sense that if user space
> explicitely read() onto a page, it's up to userspace to cache clean that
> page before executing from it in my book :-)
I was still thinking about PIPT I/D coherency. The read() case you
mention is pretty clear, no need or the kernel to ensure coherency
(especially since writing is done via copy_to_user rather than to the
page cache page).
For mmap'ed pages (and present in the page cache), is it guaranteed that
the HCD driver won't write to it once it has been mapped into user
space? If that's the case, it may solve the problem by just reversing
the meaning of PG_arch_1 on ARM and assume that a newly allocated page
has dirty D-cache by default.
> > The PIO HCD drivers, however, don't call flush_dcache_page(). Is it possible
> > that the HCD could transfer data into a page cache page already mapped
> > in user space? My understanding is that the scenario above is possible.
>
> It is but I'm not confident the responsibility for doing that cleanup
> is at the HCD level. That would impact a lot of HCD activities that
> don't need such flushing since the use of the page is purely in-kernel.
>
> Though I suppose that could be optimized out in most case using the page
> use count.
>
> But I still wonder whether it should be pushed down to the actual
> interface drivers, that's always been the case I believe. In fact, in
> the case of block ops, it's generally done at the BIO or even file
> system layer right ?
The filesystem layer does it only if it needs to touch the data written
by the block device (e.g. cramfs, jffs). Some block devices call
flush_dcache_page (like mmci.c) while some others don't (and those that
use DMA actually don't since the DMA API handles the flushing).
--
Catalin
On Wed, 2010-02-24 at 02:39 +0000, Benjamin Herrenschmidt wrote:
> On Fri, 2010-02-19 at 17:15 +0000, Catalin Marinas wrote:
> > > We assume that anybody that dirties a page in the kernel will call
> > > flush_dcache_page() which removes our PG_arch_1 bit thus marking the
> > > page "dirty".
> >
> > This assumption is not valid with some drivers like USB HCD doing PIO.
> > But, yes, that's how it should be done.
>
> So we go back to the fix should be done at the individual drivers level.
> If it's going to write into the page cache, it needs to whack the bits.
>
> Now there's of course the question as to whether you really only want to
> do that for a PIO access and not for a DMA access, I think on power, we
> don't really discriminate that much (since in any case our icache still
> needs flushing). Maybe it would be useful to separate the I$ and D$ bits
> but I'm not sure I can be bothered.
On ARM, update_mmu_cache() invalidates the I-cache (if VM_EXEC)
independent of whether the D-cache was dirty (since we can get
speculative fetches into the I-cache before it was even mapped).
> > > Note that from experience, doing the check & flushes in
> > > update_mmu_cache() is racy on SMP. At least for I$/D$, we have the case
> > > where processor one does set_pte followed by update_mmu_cache(). The
> > > later isn't done yet but processor 2 sees the PTE now and starts using
> > > it, cache hasn't been fully flushed yet. You may avoid that race in some
> > > ways, but on ppc, I've stopped using that.
> >
> > I think that's possible on ARM too. Having two threads on different
> > CPUs, one thread triggers a prefetch abort (instruction page fault) on
> > CPU0 but the second thread on CPU1 may branch into this page after
> > set_pte() (hence not fault) but before update_mmu_cache() doing the
> > flush.
> >
> > On ARM11MPCore we flush the caches in flush_dcache_page() because the
> > cache maintenance operations weren't visible to the other CPUs.
>
> I'm not even sure that's going to be 100% correct. Don't you also need
> to flush the remote icaches when you are dealing with instructions (such
> as swap) anyways ?
I don't think we tried swap but for pages that have been mapped for the
first time, the I-cache would be clean. At mm switching, if a thread
migrates to a new CPU we invalidate the cache at that point.
> I've had some discussions in the past with Russell and others around the
> problem of non-broadcast cache ops on ARM SMP since that's also hurting
> you hard with dma mappings.
>
> Can you issue IPIs as FIQs if needed (from my old ARM knowledge, FIQs
> are still on even in local_irq_save() blocks right ? I haven't touched
> low level ARM for years tho, I may have forgotten things).
I have a patch for using IPIs via IRQ from the DMA API functions but,
while it works, it can deadlock with some drivers (complex situation).
Note that the patch added a specific IPI implementation which can cope
with interrupts being disabled (unlike the generic one).
My latest solution - http://bit.ly/apJv3O - is to use dummy
read-for-ownership or write-for-ownership accesses in the DMA cache
flushing functions to force cache line migration from the other CPUs.
Our current benchmarks only show around 10% disc throughput penalty
compared to the normal SMP case (compared to the UP case the penalty is
bigger but that's due to other things).
--
Catalin
On Fri, 26 Feb 2010, Catalin Marinas wrote:
> For mmap'ed pages (and present in the page cache), is it guaranteed that
> the HCD driver won't write to it once it has been mapped into user
> space? If that's the case, it may solve the problem by just reversing
> the meaning of PG_arch_1 on ARM and assume that a newly allocated page
> has dirty D-cache by default.
Nothing is guaranteed. The HCD will write to wherever it is asked. If
a driver does input to an mmap'ed page, the HCD won't even know that
the page is mmap'ed.
Alan Stern
On Fri, Feb 26, 2010 at 04:25:21PM +0000, Catalin Marinas wrote:
> For mmap'ed pages (and present in the page cache), is it guaranteed that
> the HCD driver won't write to it once it has been mapped into user
> space? If that's the case, it may solve the problem by just reversing
> the meaning of PG_arch_1 on ARM and assume that a newly allocated page
> has dirty D-cache by default.
I guess we could also set PG_arch_1 in the DMA API as well, to avoid the
unnecessary D cache flushing when clean pages get mapped into userspace.
On Fri, 2010-02-26 at 16:25 +0000, Catalin Marinas wrote:
>
> For mmap'ed pages (and present in the page cache), is it guaranteed that
> the HCD driver won't write to it once it has been mapped into user
> space? If that's the case, it may solve the problem by just reversing
> the meaning of PG_arch_1 on ARM and assume that a newly allocated page
> has dirty D-cache by default.
Well, I don't see why the HCD would write to it unless it's swapped out,
and thus unmapped or read() to or similar.
> The filesystem layer does it only if it needs to touch the data written
> by the block device (e.g. cramfs, jffs). Some block devices call
> flush_dcache_page (like mmci.c) while some others don't (and those that
> use DMA actually don't since the DMA API handles the flushing).
Hrm, the DMA API certainly doesn't handle the I$/D$ coherency on
powerpc.. I'm afraid that whole cache handling stuff is totally
inconsistent since different archs have different expectations here.
Maybe we need to revisit things in that area, though it might require to
be done properly to have not one but two bits in struct page to
separately track the D$ and I$ state ...
Cheers,
Ben.
On Sat, Feb 27, 2010 at 08:40:29AM +1100, Benjamin Herrenschmidt wrote:
> Hrm, the DMA API certainly doesn't handle the I$/D$ coherency on
> powerpc.. I'm afraid that whole cache handling stuff is totally
> inconsistent since different archs have different expectations here.
It doesn't on ARM either.
> On ARM, update_mmu_cache() invalidates the I-cache (if VM_EXEC)
> independent of whether the D-cache was dirty (since we can get
> speculative fetches into the I-cache before it was even mapped).
We can get those speculative fetches too on power.
However, we only do the invalidate when PG_arch_1 is clear to avoid
doing it multiple time for a page that was already "cleaned". But it
seems that might not be that a good idea if indeed flush_dcache_page()
is not called for DMA transfers in most cases.
(In addition there is the race I mentioned with update_mmu_cache on SMP)
> > > > Note that from experience, doing the check & flushes in
> > > > update_mmu_cache() is racy on SMP. At least for I$/D$, we have the case
> > > > where processor one does set_pte followed by update_mmu_cache(). The
> > > > later isn't done yet but processor 2 sees the PTE now and starts using
> > > > it, cache hasn't been fully flushed yet. You may avoid that race in some
> > > > ways, but on ppc, I've stopped using that.
> > >
> > > I think that's possible on ARM too. Having two threads on different
> > > CPUs, one thread triggers a prefetch abort (instruction page fault) on
> > > CPU0 but the second thread on CPU1 may branch into this page after
> > > set_pte() (hence not fault) but before update_mmu_cache() doing the
> > > flush.
> > >
> > > On ARM11MPCore we flush the caches in flush_dcache_page() because the
> > > cache maintenance operations weren't visible to the other CPUs.
> >
> > I'm not even sure that's going to be 100% correct. Don't you also need
> > to flush the remote icaches when you are dealing with instructions (such
> > as swap) anyways ?
>
> I don't think we tried swap but for pages that have been mapped for the
> first time, the I-cache would be clean.
>
> At mm switching, if a thread
> migrates to a new CPU we invalidate the cache at that point.
That sounds fragile. What about a multithread app with one thread on
each core hitting the pages at the same time ? Sounds racy to me...
> > I've had some discussions in the past with Russell and others around the
> > problem of non-broadcast cache ops on ARM SMP since that's also hurting
> > you hard with dma mappings.
> >
> > Can you issue IPIs as FIQs if needed (from my old ARM knowledge, FIQs
> > are still on even in local_irq_save() blocks right ? I haven't touched
> > low level ARM for years tho, I may have forgotten things).
>
> I have a patch for using IPIs via IRQ from the DMA API functions but,
> while it works, it can deadlock with some drivers (complex situation).
> Note that the patch added a specific IPI implementation which can cope
> with interrupts being disabled (unlike the generic one).
It will deadlock if you use normal IRQs. I don't see a good way around
that other than using a higher-level type of IRQs. I though ARM has
something like that (FIQs ?). Can you use those guys for IPIs ?
> My latest solution - http://bit.ly/apJv3O - is to use dummy
> read-for-ownership or write-for-ownership accesses in the DMA cache
> flushing functions to force cache line migration from the other CPUs.
That might do, but won't help for the icache, will it ?
> Our current benchmarks only show around 10% disc throughput penalty
> compared to the normal SMP case (compared to the UP case the penalty is
> bigger but that's due to other things).
Cheers,
Ben.
On Fri, 2010-02-26 at 11:52 -0500, Alan Stern wrote:
> > For mmap'ed pages (and present in the page cache), is it guaranteed that
> > the HCD driver won't write to it once it has been mapped into user
> > space? If that's the case, it may solve the problem by just reversing
> > the meaning of PG_arch_1 on ARM and assume that a newly allocated page
> > has dirty D-cache by default.
>
> Nothing is guaranteed. The HCD will write to wherever it is asked. If
> a driver does input to an mmap'ed page, the HCD won't even know that
> the page is mmap'ed.
Right but that won't happen unless somebody explicitely caused that
input to happen, typically, a userspace read(). I$/D$ coherency isn't
implicit in that case.
The question is more when the kernel itself moves a page in/out from
underneath the application (mmap'ed executable pages). One it's mapped
in, it won't be written to by the HCD unless something explicitely does
something to cause that write. If it's swapped out and back in, it will
have been unmapped.
Cheers,
Ben.
On Fri, 2010-02-26 at 16:00 +0000, Catalin Marinas wrote:
> > I'm surprised that usb-storage has an issue here. It shouldn't
> afaik,
> > since it's just a SCSI driver (or not anymore ?) and the BIO or
> > filesystems handle things there no ? I haven't seen a single call to
> > flush_dcache_page() in any of drivers/scsi, drivers/ata or
> drivers/ide
> > when I looked...
>
> The BIO or filesystem code don't call flush_dcache_page() either (well
> some do like cramfs or jffs but they decompress the data received from
> the block device).
That's weird... that would mean that all existing PIO IDE or SCSI is
broken etc... Including I$/D$ cache coherency on powerpc and more. That
surprises me :-)
On an older kernel tree here:
$ grep -r flush_dcache_page fs | wc -l
118
So maybe that's where things need fixing ?
Cheers,
Ben.
On Sat, Feb 27, 2010 at 08:49:40AM +1100, Benjamin Herrenschmidt wrote:
> It will deadlock if you use normal IRQs. I don't see a good way around
> that other than using a higher-level type of IRQs. I though ARM has
> something like that (FIQs ?). Can you use those guys for IPIs ?
If the hardware did support using FIQs for IPIs, this would not be
desirable because then it takes it away from the SoC folk to do what
they will with it.
In the past, it's been used as a fast CPU-driven "DMA" interface -
some SoCs have been wired up in such a way that's the only use
available for the FIQ.
The other problem we'd encounter using FIQs for IPIs is that some IPIs
need to take locks - and in order to make that safe, we'd either need
another class of locks which disable IRQs and FIQs together, or we'd
need to disable FIQs everywhere we disable IRQs - at which point FIQs
become utterly pointless.
(There only differences between FIQ and IRQ are:
- on simultaneous raising of both, the FIQ will be called before the IRQ.
- each has its own (single) vector.
- invocation of FIQ masks IRQ.
What I'm saying is that what gives FIQ an advantage for SoC people is
that it's bare bones light weight and therefore extremely fast - as soon
as you load it up with additional complexity, it becomes less useful.)
On Fri, 2010-02-26 at 21:00 +0000, Russell King - ARM Linux wrote:
> On Fri, Feb 26, 2010 at 04:25:21PM +0000, Catalin Marinas wrote:
> > For mmap'ed pages (and present in the page cache), is it guaranteed that
> > the HCD driver won't write to it once it has been mapped into user
> > space? If that's the case, it may solve the problem by just reversing
> > the meaning of PG_arch_1 on ARM and assume that a newly allocated page
> > has dirty D-cache by default.
>
> I guess we could also set PG_arch_1 in the DMA API as well, to avoid the
> unnecessary D cache flushing when clean pages get mapped into userspace.
That's an interesting thought for us too. When doing I$/D$ coherency, we
have to fist flush the D$ and then invalidate the I$. If we could keep
track of D$ and I$ separately, we could avoid the first step in many
cases, including the DMA API trick you mentioned.
I wonder if it's time to get a PG_arch_2 :-)
Cheers,
Ben.
On Fri, 2010-02-26 at 21:49 +0000, Russell King - ARM Linux wrote:
> On Sat, Feb 27, 2010 at 08:40:29AM +1100, Benjamin Herrenschmidt wrote:
> > Hrm, the DMA API certainly doesn't handle the I$/D$ coherency on
> > powerpc.. I'm afraid that whole cache handling stuff is totally
> > inconsistent since different archs have different expectations here.
>
> It doesn't on ARM either.
Ok, pfiew :-)
So far, my understanding with I$/D$ is that we only care in a few cases
which is executing of an mmap'ed piece of executable that is -not- being
written to, and swap.
I -think- that in both cases, the page cache always pops up a new page
with PG_arch_1 clear before the driver gets to either DMA or PIO to it
when faulted the first time around, before any PTE is inserted.
So the current approach on powerpc with I$/D$ should work fine, and it
-might- make sense to use a similar one on PIPT ARM, provided we don't
have expectations of the I$/D$ coherency being maintained on
-subsequent- writes (PIO or DMA either) to such a page by the same
program transparently by the kernel.
There's two potential problems with the approach, and maybe more that I
have missed though. One is the case of a networked filesystem where the
executable pages are modified remotely. However, I would expect such a
program to invalidate the PTE mappings before making the change visible,
so we -do- get a chance to re-flush provided something clears PG_arch_1.
Then, there's In the case of a multithread app, where one thread does
the cache flush and another thread then executes, the earlier ARMs
without broadcast ops have a potential problem there. In fact, some
variant of PowerPC 440 have the same problem and some people are
(ab)using those for SMP setups I'm being told.
For that case, I see two options. One is a big hammer but would make
existing code work to "most" extent: Don't allow a page to be both
writable and executable. Ping-pong the page permission lazily and flush
when transitioning from write to exec.
That means using a spare bit for Linux _PAGE_RW separate from your real
RW bit I suppose, since you have HW loaded PTEs (on 440 it's easier
since we SW load, we can do the fixup there, though it has a perf impact
obviously).
Another option would be to make some syscall mandatory to "sync" caches
which could then do IPIs or whatever else is needed. But that would
require changing existing userspace code.
On Fri, 2010-02-26 at 22:03 +0000, Russell King - ARM Linux wrote:
> On Sat, Feb 27, 2010 at 08:49:40AM +1100, Benjamin Herrenschmidt wrote:
> > It will deadlock if you use normal IRQs. I don't see a good way around
> > that other than using a higher-level type of IRQs. I though ARM has
> > something like that (FIQs ?). Can you use those guys for IPIs ?
>
> If the hardware did support using FIQs for IPIs, this would not be
> desirable because then it takes it away from the SoC folk to do what
> they will with it.
>
> In the past, it's been used as a fast CPU-driven "DMA" interface -
> some SoCs have been wired up in such a way that's the only use
> available for the FIQ.
This is an issue indeed.
> The other problem we'd encounter using FIQs for IPIs is that some IPIs
> need to take locks - and in order to make that safe, we'd either need
> another class of locks which disable IRQs and FIQs together, or we'd
> need to disable FIQs everywhere we disable IRQs - at which point FIQs
> become utterly pointless.
That's solvable easily :-) I mentioned having potentially to deal with a
similar problem with people using PowerPC 440 for SMP (doesn't broadcast
cache ops either). 440 has critical interrupts, which are akin to FIQs.
The trick here is that you don't use -only- critical interrupts for
IPIs. You use normal interrupts for all the current IPI types. You -add-
a fast one using critical interrupts specifically for cache ops, with a
very fast asm only path.
This works for us because masking interrupts doesn't mask critical
interrupts (it's a separate mask bit in our MSR). If that isn't the case
with FIQs then the whole idea is moot.
> (There only differences between FIQ and IRQ are:
> - on simultaneous raising of both, the FIQ will be called before the IRQ.
> - each has its own (single) vector.
> - invocation of FIQ masks IRQ.
>
> What I'm saying is that what gives FIQ an advantage for SoC people is
> that it's bare bones light weight and therefore extremely fast - as soon
> as you load it up with additional complexity, it becomes less useful.)
I understand.
Then Catalin idea of tricking the cache with load and stores would work
for the D$ side of thing. The I$ side of thing probably still needs IPIs
though, and you might need to use non-blocking async SMP call function
for that if you're going to do it from set_pte_at() instead of
update_mmu_cache() since the later is racy. In any case, it's a lot less
of a deadlock nest than the D$ side which needs to be dealt with in the
DMA ops, called below layers of driver and subsystem locks.
Note: Somebody at ARM needs to be severely beaten up for coming up with
that SMP scheme without broadcast cache ops and not also mandating some
kind FIQ IPI scheme that isn't masked with normal interrupts :-)
Cheers,
Ben.
On Sun, 2010-02-28 at 11:14 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2010-02-26 at 21:00 +0000, Russell King - ARM Linux wrote:
> > On Fri, Feb 26, 2010 at 04:25:21PM +0000, Catalin Marinas wrote:
> > > For mmap'ed pages (and present in the page cache), is it guaranteed that
> > > the HCD driver won't write to it once it has been mapped into user
> > > space? If that's the case, it may solve the problem by just reversing
> > > the meaning of PG_arch_1 on ARM and assume that a newly allocated page
> > > has dirty D-cache by default.
> >
> > I guess we could also set PG_arch_1 in the DMA API as well, to avoid the
> > unnecessary D cache flushing when clean pages get mapped into userspace.
>
> That's an interesting thought for us too. When doing I$/D$ coherency, we
> have to fist flush the D$ and then invalidate the I$. If we could keep
> track of D$ and I$ separately, we could avoid the first step in many
> cases, including the DMA API trick you mentioned.
>
> I wonder if it's time to get a PG_arch_2 :-)
Sorry to be a bit late to the party (on holiday), but I/D coherency is
supposed to be taken care of using flush_cache_page in the memory
mapping routines. On parisc, at least, we don't use any PG_arch flags
to help. The way it's supposed to work is that I is invalidated on
mapping or remapping, so the I/O code only needs to worry about flushing
D. The guarantee we pass to userland is that any page we do I/O to has
a clean D cache before it goes back to userspace. Thus if userspace
executes the page, the I cache gets its first movein there. There is an
underlying assumption to all of this: The CPU won't speculatively move
in I cache until the page is executed, so we can rely on the
flush_cache_page in the mapping to keep the I cache invalidated until
we're ready to execute. The other fundamental assumption is that if
userspace needs to modify an executable region (say for dynamic linking)
it has to take care of reinvalidating the I cache itself ... although it
can do this by remapping the region to alter the flags (i.e W no X then
X no W).
But the point of all of this is that I cache invalidation doesn't appear
anywhere in the I/O path ... so if we're getting I/D incoherency,
there's some problem in the mm code (or there's a missing arch
assumption ... like I cache gets moved in more aggressively than we
expect). Parisc is very sensitive to I/D incoherency, so we'd notice if
there were a serious generic problem here.
James
> There's two potential problems with the approach, and maybe more that I
> have missed though. One is the case of a networked filesystem where the
> executable pages are modified remotely. However, I would expect such a
> program to invalidate the PTE mappings before making the change visible,
> so we -do- get a chance to re-flush provided something clears PG_arch_1.
>
> Then, there's In the case of a multithread app, where one thread does
> the cache flush and another thread then executes, the earlier ARMs
> without broadcast ops have a potential problem there. In fact, some
> variant of PowerPC 440 have the same problem and some people are
> (ab)using those for SMP setups I'm being told.
>
> For that case, I see two options. One is a big hammer but would make
> existing code work to "most" extent: Don't allow a page to be both
> writable and executable. Ping-pong the page permission lazily and flush
> when transitioning from write to exec.
>
> That means using a spare bit for Linux _PAGE_RW separate from your real
> RW bit I suppose, since you have HW loaded PTEs (on 440 it's easier
> since we SW load, we can do the fixup there, though it has a perf impact
> obviously).
>
> Another option would be to make some syscall mandatory to "sync" caches
> which could then do IPIs or whatever else is needed. But that would
> require changing existing userspace code.
Or you could do first option by default, and add mmap flag that says
that application is responsible for cross-cpu cache flushes...?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Fri, 2010-02-26 at 21:49 +0000, Benjamin Herrenschmidt wrote:
> > > > On ARM11MPCore we flush the caches in flush_dcache_page() because the
> > > > cache maintenance operations weren't visible to the other CPUs.
> > >
> > > I'm not even sure that's going to be 100% correct. Don't you also need
> > > to flush the remote icaches when you are dealing with instructions (such
> > > as swap) anyways ?
> >
> > I don't think we tried swap but for pages that have been mapped for the
> > first time, the I-cache would be clean.
> >
> > At mm switching, if a thread
> > migrates to a new CPU we invalidate the cache at that point.
>
> That sounds fragile. What about a multithread app with one thread on
> each core hitting the pages at the same time ? Sounds racy to me...
Interestingly, until commit 826cbdaff29 (< 2 years ago), we didn't have
any I-cache flushing in update_mmu_cache() and it was working fine. I
added it for correctness reasons rather than to fix something. My theory
is that it was working because a page cache page tends to keep the same
physical address, especially if we don't swap pages, and a 16KB PIPT
cache cannot hold enough lines to show any issues (lines are replaced
frequently).
I suspect that's one of the reasons why only invalidating the whole
I-cache when switching the mm to a new CPU seems to suffice. Once we
enable some form of swapping, it may show the problem.
--
Catalin
On Fri, 2010-02-26 at 22:03 +0000, Russell King - ARM Linux wrote:
> On Sat, Feb 27, 2010 at 08:49:40AM +1100, Benjamin Herrenschmidt wrote:
> > It will deadlock if you use normal IRQs. I don't see a good way around
> > that other than using a higher-level type of IRQs. I though ARM has
> > something like that (FIQs ?). Can you use those guys for IPIs ?
[...]
> The other problem we'd encounter using FIQs for IPIs is that some IPIs
> need to take locks - and in order to make that safe, we'd either need
> another class of locks which disable IRQs and FIQs together, or we'd
> need to disable FIQs everywhere we disable IRQs - at which point FIQs
> become utterly pointless.
You could use the FIQ only for the DMA cache maintenance operations and
not as a generic IPI mechanism. But the hardware needs to be modified.
--
Catalin