On Mon, Apr 20, 2015 at 11:53:47AM +0100, David Vrabel wrote:
> On 20/04/15 11:48, Chen Baozi wrote:
> > Make sure that xen_swiotlb_init allocates buffers that are DMA capable.
> >
> > Signed-off-by: Chen Baozi <[email protected]>
> > ---
> > drivers/xen/swiotlb-xen.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index 810ad41..7345afd 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -235,7 +235,8 @@ retry:
> > #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
> > #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
> >         while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
> > -               xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN, order);
> > +               xen_io_tlb_start = (void *)__get_free_pages(
> > +                               __GFP_NOWARN|__GFP_DMA, order);
>
> I think this breaks x86 where __GFP_DMA means below 16 MB. Perhaps you
> mean __GFP_DMA32?
__GFP_DMA32 doesn't help on arm64...
I guess there might be conflicts about the meaning of __GFP_DMA between x86 and
arm?
Cheers,
Baozi.
On 20/04/15 12:07, Chen Baozi wrote:
> On Mon, Apr 20, 2015 at 11:53:47AM +0100, David Vrabel wrote:
>> On 20/04/15 11:48, Chen Baozi wrote:
>>> Make sure that xen_swiotlb_init allocates buffers that are DMA capable.
>>>
>>> Signed-off-by: Chen Baozi <[email protected]>
>>> ---
>>> drivers/xen/swiotlb-xen.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
>>> index 810ad41..7345afd 100644
>>> --- a/drivers/xen/swiotlb-xen.c
>>> +++ b/drivers/xen/swiotlb-xen.c
>>> @@ -235,7 +235,8 @@ retry:
>>> #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
>>> #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
>>>         while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
>>> -               xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN, order);
>>> +               xen_io_tlb_start = (void *)__get_free_pages(
>>> +                               __GFP_NOWARN|__GFP_DMA, order);
>>
>> I think this breaks x86 where __GFP_DMA means below 16 MB. Perhaps you
>> mean __GFP_DMA32?
>
> __GFP_DMA32 doesn't help on arm64...
>
> I guess there might be conflicts about the meaning of __GFP_DMA between x86 and
> arm?
Yes.
This is also conceptually wrong since it doesn't matter where the pages
are in PFN space, but where they are in bus address (MFN) space (which
is what the subsequent hypercall is required to sort out).
David
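(For reference: the zone split behind these GFP flags differed per architecture at the time, which is the mismatch discussed above. The helper below is only an illustrative sketch; its name is invented and it appears in no patch in this thread.)

/*
 * Illustrative sketch only, circa 2015: on x86, ZONE_DMA is the first
 * 16MB and ZONE_DMA32 the first 4GB; on arm64, ZONE_DMA covers the first
 * 4GB of RAM and there is no ZONE_DMA32.
 */
#include <linux/gfp.h>

static inline gfp_t gfp_below_4g(void)
{
#if defined(CONFIG_ARM64)
        return __GFP_DMA;       /* arm64: ZONE_DMA == first 4GB */
#elif defined(CONFIG_X86_64)
        return __GFP_DMA32;     /* x86_64: ZONE_DMA32 == first 4GB */
#else
        return 0;               /* no zone restriction */
#endif
}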
On Mon, 20 Apr 2015, David Vrabel wrote:
> On 20/04/15 12:07, Chen Baozi wrote:
> > On Mon, Apr 20, 2015 at 11:53:47AM +0100, David Vrabel wrote:
> >> On 20/04/15 11:48, Chen Baozi wrote:
> >>> Make sure that xen_swiotlb_init allocates buffers that are DMA capable.
> >>>
> >>> Signed-off-by: Chen Baozi <[email protected]>
> >>> ---
> >>> drivers/xen/swiotlb-xen.c | 3 ++-
> >>> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> >>> index 810ad41..7345afd 100644
> >>> --- a/drivers/xen/swiotlb-xen.c
> >>> +++ b/drivers/xen/swiotlb-xen.c
> >>> @@ -235,7 +235,8 @@ retry:
> >>> #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
> >>> #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
> >>>         while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
> >>> -               xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN, order);
> >>> +               xen_io_tlb_start = (void *)__get_free_pages(
> >>> +                               __GFP_NOWARN|__GFP_DMA, order);
> >>
> >> I think this breaks x86 where __GFP_DMA means below 16 MB. Perhaps you
> >> mean __GFP_DMA32?
> >
> > __GFP_DMA32 doesn't help on arm64...
> >
> > I guess there might be conflicts about the meaning of __GFP_DMA between x86 and
> > arm?
>
> Yes.
This should definitely be done only on ARM and ARM64, as on x86 PVH
assumes the presence of an IOMMU. We need an ifdef.
Also we need to figure out a way to try without GFP_DMA in case no ram
under 4g is available at all, as some arm64 platforms don't have any. Of
course in those cases we don't need to worry about devices and their dma
masks. Maybe we could use memblock for that? Something like:
struct memblock_region *reg;
gfp_t flags = __GFP_NOWARN;

#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
for_each_memblock(memory, reg) {
        unsigned long start = memblock_region_memory_base_pfn(reg);

        if (start < 4G) {
                flags |= __GFP_DMA;
                break;
        }
}
#endif

[...]

xen_io_tlb_start = (void *)__get_free_pages(flags, order);
> This is also conceptually wrong since it doesn't matter where the pages
> are in PFN space, but where they are in bus address (MFN) space (which
> is what the subsequent hypercall is required to sort out).
Actually on ARM dom0 is mapped 1:1, so it is the same thing.
On Mon, 2015-04-20 at 18:54 +0100, Stefano Stabellini wrote:
> This should definitely be done only on ARM and ARM64, as on x86 PVH
> assumes the presence of an IOMMU. We need an ifdef.
>
> Also we need to figure out a way to try without GFP_DMA in case no ram
> under 4g is available at all, as some arm64 platforms don't have any. Of
> course in those cases we don't need to worry about devices and their dma
> masks. Maybe we could use memblock for that?
It's pretty ugly, but I've not got any better ideas.
It would perhaps be less ugly as an arch-specific
get_me_a_swiotlb_region type function, with the bare __get_free_pages as
the generic fallback.
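(For illustration only, not from this thread: a sketch of that hook shape. The hook name is invented; the patch Stefano posts further down implements the same idea as xen_get_swiotlb_free_pages in asm/xen/page.h.)

#include <linux/gfp.h>

/* Generic fallback, used wherever the arch does not override the hook: */
static inline unsigned long xen_swiotlb_alloc_pages(unsigned int order)
{
        return __get_free_pages(__GFP_NOWARN, order);
}

/*
 * An ARM/ARM64 override would keep the same prototype but walk memblock
 * and add __GFP_DMA only when some RAM sits below 4GB, so the call site
 * in swiotlb-xen.c stays arch-agnostic:
 *
 *      xen_io_tlb_start = (void *)xen_swiotlb_alloc_pages(order);
 */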
> Something like:
>
> struct memblock_region *reg;
> gfp_t flags = __GFP_NOWARN;
>
> #if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> for_each_memblock(memory, reg) {
>         unsigned long start = memblock_region_memory_base_pfn(reg);
>
>         if (start < 4G) {
>                 flags |= __GFP_DMA;
>                 break;
>         }
> }
> #endif
>
> [...]
>
> xen_io_tlb_start = (void *)__get_free_pages(flags, order);
>
> > This is also conceptually wrong since it doesn't matter where the pages
> > are in PFN space, but where they are in bus address (MFN) space (which
> > is what the subsequent hypercall is required to sort out).
>
> Actually on ARM dom0 is mapped 1:1, so it is the same thing.
On a system with a fully functional SMMU dom0 may not be 1:1 mapped, but
I think that dom0 can still treat that as 1:1 mapped for these purposes,
since the SMMU will provide that illusion.
Dumb question, and this might affect PVH too, if you have an IOMMU and a
device with a limited DMA range, I suppose you need to provide DMA
addresses in the <4G for the input to the IOMMU (i.e. PFN) and not the
output (i.e. MFN) space (since the device only sees PFNs).
So even for x86 PVH isn't something required here to ensure that the
swiotlb has suitable pages under 4GB in PFN space too?
(On ARM PFN==IPA and MFN==PA)
Second dumb question, on x86 PVH or ARM with an SMMU, would we even hit
the Xen swiotlb code, or would we want to arrange to go via the native
swiotlb paths? I initially thought the latter, but do e.g. grant
mappings still need some special handling?
Ian.
On Tue, 21 Apr 2015, Ian Campbell wrote:
> On Mon, 2015-04-20 at 18:54 +0100, Stefano Stabellini wrote:
> > This should definitely be done only on ARM and ARM64, as on x86 PVH
> > assumes the presence of an IOMMU. We need an ifdef.
> >
> > Also we need to figure out a way to try without GFP_DMA in case no ram
> > under 4g is available at all, as some arm64 platforms don't have any. Of
> > course in those cases we don't need to worry about devices and their dma
> > masks. Maybe we could use memblock for that?
>
> It's pretty ugly, but I've not got any better ideas.
>
> It would perhaps be less ugly as an arch-specific
> get_me_a_swiotlb_region type function, with the bare __get_free_pages as
> the generic fallback.
We could do that, but even open code like this isn't too bad: it might
be ugly but at least is very obvious.
> > Something like:
> >
> > struct memblock_region *reg;
> > gfp_t flags = __GFP_NOWARN;
> >
> > #if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> > for_each_memblock(memory, reg) {
> >         unsigned long start = memblock_region_memory_base_pfn(reg);
> >
> >         if (start < 4G) {
> >                 flags |= __GFP_DMA;
> >                 break;
> >         }
> > }
> > #endif
> >
> > [...]
> >
> > xen_io_tlb_start = (void *)__get_free_pages(flags, order);
> >
> > > This is also conceptually wrong since it doesn't matter where the pages
> > > are in PFN space, but where they are in bus address (MFN) space (which
> > > is what the subsequent hypercall is required to sort out).
> >
> > Actually on ARM dom0 is mapped 1:1, so it is the same thing.
>
> On a system with a fully functional SMMU dom0 may not be 1:1 mapped, but
> I think that dom0 can still treat that as 1:1 mapped for these purposes,
> since the SMMU will provide that illusion.
>
> Dumb question, and this might affect PVH too, if you have an IOMMU and a
> device with a limited DMA range, I suppose you need to provide DMA
> addresses in the <4G for the input to the IOMMU (i.e. PFN) and not the
> output (i.e. MFN) space (since the device only sees PFNs).
I think you mean "for the input to the device (PFN)", but I presume the
same.
> So even for x86 PVH isn't something required here to ensure that the
> swiotlb has suitable pages under 4GB in PFN space too?
>
> (On ARM PFN==IPA and MFN==PA)
I guess that is true. PVH people, any thoughts?
> Second dumb question, on x86 PVH or ARM with an SMMU, would we even hit
> the Xen swiotlb code, or would we want to arrange to go via the native
> swiotlb paths? I initially thought the latter, but do e.g. grant
> mappings still need some special handling?
In both cases I think it would be simpler to just go through the normal
swiotlb.
FYI at the moment there is no knowledge about SMMU availability on ARM.
The code is still missing.
On Tue, 2015-04-21 at 11:36 +0100, Stefano Stabellini wrote:
> On Tue, 21 Apr 2015, Ian Campbell wrote:
> > On Mon, 2015-04-20 at 18:54 +0100, Stefano Stabellini wrote:
> > > This should definitely be done only on ARM and ARM64, as on x86 PVH
> > > assumes the presence of an IOMMU. We need an ifdef.
> > >
> > > Also we need to figure out a way to try without GFP_DMA in case no ram
> > > under 4g is available at all, as some arm64 platforms don't have any. Of
> > > course in those cases we don't need to worry about devices and their dma
> > > masks. Maybe we could use memblock for that?
> >
> > It's pretty ugly, but I've not got any better ideas.
> >
> > It would perhaps be less ugly as an arch-specific
> > get_me_a_swiotlb_region type function, with the bare __get_free_pages as
> > the generic fallback.
>
> We could do that, but even open code like this isn't too bad: it might
> be ugly but at least is very obvious.
>
>
> > > Something like:
> > >
> > > struct memblock_region *reg;
> > > gfp_t flags = __GFP_NOWARN;
> > >
> > > #if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> > > for_each_memblock(memory, reg) {
> > >         unsigned long start = memblock_region_memory_base_pfn(reg);
> > >
> > >         if (start < 4G) {
> > >                 flags |= __GFP_DMA;
> > >                 break;
> > >         }
> > > }
> > > #endif
> > >
> > > [...]
> > >
> > > xen_io_tlb_start = (void *)__get_free_pages(flags, order);
> > >
> > > > This is also conceptually wrong since it doesn't matter where the pages
> > > > are in PFN space, but where they are in bus address (MFN) space (which
> > > > is what the subsequent hypercall is required to sort out).
> > >
> > > Actually on ARM dom0 is mapped 1:1, so it is the same thing.
> >
> > On a system with a fully functional SMMU dom0 may not be 1:1 mapped, but
> > I think that dom0 can still treat that as 1:1 mapped for these purposes,
> > since the SMMU will provide that illusion.
> >
> > Dumb question, and this might affect PVH too, if you have an IOMMU and a
> > device with a limited DMA range, I suppose you need to provide DMA
> > addresses in the <4G for the input to the IOMMU (i.e. PFN) and not the
> > output (i.e. MFN) space (since the device only sees PFNs).
>
> I think you mean "for the input to the device (PFN)", but I presume the
> same.
Yes, either the numbers the driver programs into the device, or the ones
the device itself puts on the bus (i.e. inputs to the IOMMU). They are
in the same address space.
> > So even for x86 PVH isn't something required here to ensure that the
> > swiotlb has suitable pages under 4GB in PFN space too?
> >
> > (On ARM PFN==IPA and MFN==PA)
>
> I guess that is true. PVH people, any thoughts?
>
>
> > Second dumb question, on x86 PVH or ARM with an SMMU, would we even hit
> > the Xen swiotlb code, or would we want to arrange to go via the native
> > swiotlb paths? I initially thought the latter, but do e.g. grant
> > mappings still need some special handling?
>
> In both cases I think it would be simpler to just go through the normal
> swiotlb.
No special grant table magic in this case == great!
> FYI at the moment there is no knowledge about SMMU availability on ARM.
> The code is still missing.
Right, I just want to avoid painting ourselves into a corner before that
stuff lands.
On Tue, 21 Apr 2015, Stefano Stabellini wrote:
> On Tue, 21 Apr 2015, Ian Campbell wrote:
> > On Mon, 2015-04-20 at 18:54 +0100, Stefano Stabellini wrote:
> > > This should definitely be done only on ARM and ARM64, as on x86 PVH
> > > assumes the presence of an IOMMU. We need an ifdef.
> > >
> > > Also we need to figure out a way to try without GFP_DMA in case no ram
> > > under 4g is available at all, as some arm64 platforms don't have any. Of
> > > course in those cases we don't need to worry about devices and their dma
> > > masks. Maybe we could use memblock for that?
> >
> > It's pretty ugly, but I've not got any better ideas.
> >
> > It would perhaps be less ugly as an arch-specific
> > get_me_a_swiotlb_region type function, with the bare __get_free_pages as
> > the generic fallback.
>
> We could do that, but even open code like this isn't too bad: it might
> be ugly but at least is very obvious.
Chen,
could you please try the patch below in your repro scenario?
I have only build tested it.
---
xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
From: Chen Baozi <[email protected]>
Make sure that xen_swiotlb_init allocates buffers that are DMA capable
when at least one memblock is available below 4G. Otherwise we assume
that all devices on the SoC can cope with >4G addresses.
No functional changes on x86.
Signed-off-by: Chen Baozi <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 2f7e6ff..0b579b2 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -110,5 +110,6 @@ static inline bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
bool xen_arch_need_swiotlb(struct device *dev,
                           unsigned long pfn,
                           unsigned long mfn);
+unsigned long xen_get_swiotlb_free_pages(unsigned int order);
#endif /* _ASM_ARM_XEN_PAGE_H */
diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 793551d..4983250 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -4,6 +4,7 @@
#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/export.h>
+#include <linux/memblock.h>
#include <linux/of_address.h>
#include <linux/slab.h>
#include <linux/types.h>
@@ -21,6 +22,20 @@
#include <asm/xen/hypercall.h>
#include <asm/xen/interface.h>
+unsigned long xen_get_swiotlb_free_pages(unsigned int order)
+{
+       struct memblock_region *reg;
+       gfp_t flags = __GFP_NOWARN;
+
+       for_each_memblock(memory, reg) {
+               if (reg->base < (phys_addr_t)0xffffffff) {
+                       flags |= __GFP_DMA;
+                       break;
+               }
+       }
+       return __get_free_pages(flags, order);
+}
+
enum dma_cache_op {
        DMA_UNMAP,
        DMA_MAP,
diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 358dcd3..c44a5d5 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -269,4 +269,9 @@ static inline bool xen_arch_need_swiotlb(struct device *dev,
        return false;
}
+static inline unsigned long xen_get_swiotlb_free_pages(unsigned int order)
+{
+       return __get_free_pages(__GFP_NOWARN, order);
+}
+
#endif /* _ASM_X86_XEN_PAGE_H */
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 810ad41..4c54932 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -235,7 +235,7 @@ retry:
#define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
#define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
        while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
-               xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN, order);
+               xen_io_tlb_start = (void *)xen_get_swiotlb_free_pages(order);
                if (xen_io_tlb_start)
                        break;
                order--;
On 21/04/15 12:36, Stefano Stabellini wrote:
> On Tue, 21 Apr 2015, Ian Campbell wrote:
>> On Mon, 2015-04-20 at 18:54 +0100, Stefano Stabellini wrote:
>>> This should definitely be done only on ARM and ARM64, as on x86 PVH
>>> assumes the presence of an IOMMU. We need an ifdef.
>>>
>>> Also we need to figure out a way to try without GFP_DMA in case no ram
>>> under 4g is available at all, as some arm64 platforms don't have any. Of
>>> course in those cases we don't need to worry about devices and their dma
>>> masks. Maybe we could use memblock for that?
>>
>> It's pretty ugly, but I've not got any better ideas.
>>
>> It would perhaps be less ugly as an arch-specific
>> get_me_a_swiotlb_region type function, with the bare __get_free_pages as
>> the generic fallback.
>
> We could do that, but even open code like this isn't too bad: it might
> be ugly but at least is very obvious.
>
>
>>> Something like:
>>>
>>> struct memblock_region *reg;
>>> gfp_t flags = __GFP_NOWARN;
>>>
>>> #if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>>> for_each_memblock(memory, reg) {
>>>         unsigned long start = memblock_region_memory_base_pfn(reg);
>>>
>>>         if (start < 4G) {
>>>                 flags |= __GFP_DMA;
>>>                 break;
>>>         }
>>> }
>>> #endif
>>>
>>> [...]
>>>
>>> xen_io_tlb_start = (void *)__get_free_pages(flags, order);
>>>
>>>> This is also conceptually wrong since it doesn't matter where the pages
>>>> are in PFN space, but where they are in bus address (MFN) space (which
>>>> is what the subsequent hypercall is required to sort out).
>>>
>>> Actually on ARM dom0 is mapped 1:1, so it is the same thing.
>>
>> On a system with a fully functional SMMU dom0 may not be 1:1 mapped, but
>> I think that dom0 can still treat that as 1:1 mapped for these purposes,
>> since the SMMU will provide that illusion.
>>
>> Dumb question, and this might affect PVH too, if you have an IOMMU and a
>> device with a limited DMA range, I suppose you need to provide DMA
>> addresses in the <4G for the input to the IOMMU (i.e. PFN) and not the
>> output (i.e. MFN) space (since the device only sees PFNs).
>
> I think you mean "for the input to the device (PFN)", but I presume the
> same.
>
>
>> So even for x86 PVH isn't something required here to ensure that the
>> swiotlb has suitable pages under 4GB in PFN space too?
>>
>> (On ARM PFN==IPA and MFN==PA)
>
> I guess that is true. PVH people, any thoughts?
AFAIK Linux PVH uses the native swiotlb, and FreeBSD does the same. I
expect the device to be given PFNs (not MFNs) below the 4GB range, or else
the design is broken because there's no way the guest can figure out
whether the MFN behind a PFN is below 4GB.
Roger.
On Tue, Apr 21, 2015 at 12:11:01PM +0100, Stefano Stabellini wrote:
> Chen,
> could you please try the patch below in your repro scenario?
> I have only build tested it.
>
> ---
>
> xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
>
> From: Chen Baozi <[email protected]>
>
> Make sure that xen_swiotlb_init allocates buffers that are DMA capable
> when at least one memblock is available below 4G. Otherwise we assume
> that all devices on the SoC can cope with >4G addresses.
>
> No functional changes on x86.
>
> Signed-off-by: Chen Baozi <[email protected]>
> Signed-off-by: Stefano Stabellini <[email protected]>
Tested-by: Chen Baozi <[email protected]>
Cheers,
Baozi