Hi all,
I am writing this email to ask for your advice.
On architectures where dma addresses are different from physical
addresses, it can be difficult to retrieve the physical address of a
page from its dma address.
Specifically this is the case for Xen on arm and arm64 but I think that
other architectures might have the same issue.
Knowing the physical address is necessary to be able to issue any
required cache maintenance operations when unmap_page,
sync_single_for_cpu and sync_single_for_device are called.
Adding a struct page* parameter to unmap_page, sync_single_for_cpu and
sync_single_for_device would make Linux dma handling on Xen on arm and
arm64 much easier and quicker.
I think that other drivers have similar problems, such as the Intel
IOMMU driver having to call find_iova and walking down an rbtree to get
the physical address in its implementation of unmap_page.
Callers have the struct page* in their hands already from the previous
map_page call so it shouldn't be an issue for them. A problem does
exist however: there are about 280 callers of dma_unmap_page and
pci_unmap_page. We have even more callers of the dma_sync_single_for_*
functions.
Is such a change even conceivable? How would one go about it?
I think that Xen would not be the only one to gain from it, but I would
like to have a confirmation from others: given the magnitude of the
changes involved I would actually prefer to avoid them unless multiple
drivers/archs/subsystems could really benefit from them.
Cheers,
Stefano
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index d5d3881..158a765 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -31,8 +31,9 @@ struct dma_map_ops {
                                unsigned long offset, size_t size,
                                enum dma_data_direction dir,
                                struct dma_attrs *attrs);
-        void (*unmap_page)(struct device *dev, dma_addr_t dma_handle,
-                           size_t size, enum dma_data_direction dir,
+        void (*unmap_page)(struct device *dev, struct page *page,
+                           dma_addr_t dma_handle, size_t size,
+                           enum dma_data_direction dir,
                            struct dma_attrs *attrs);
         int (*map_sg)(struct device *dev, struct scatterlist *sg,
                       int nents, enum dma_data_direction dir,
@@ -41,10 +42,10 @@ struct dma_map_ops {
                          struct scatterlist *sg, int nents,
                          enum dma_data_direction dir,
                          struct dma_attrs *attrs);
-        void (*sync_single_for_cpu)(struct device *dev,
+        void (*sync_single_for_cpu)(struct device *dev, struct page *page,
                                     dma_addr_t dma_handle, size_t size,
                                     enum dma_data_direction dir);
-        void (*sync_single_for_device)(struct device *dev,
+        void (*sync_single_for_device)(struct device *dev, struct page *page,
                                        dma_addr_t dma_handle, size_t size,
                                        enum dma_data_direction dir);
         void (*sync_sg_for_cpu)(struct device *dev,
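
To make the intent of the diff above more concrete, here is a minimal
userspace sketch of a backend callback using the proposed unmap_page
signature. It is not kernel code: struct page, struct device, the
toy_* helpers and the cache log are stand-ins invented for this
illustration, the attrs argument is omitted, and the sketch assumes
that the offset within the page is the same in the dma address and in
the physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;
typedef uint64_t phys_addr_t;

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Userspace stand-ins for kernel types; only what the sketch needs. */
struct page { uint64_t pfn; };
struct device { const char *name; };
enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Records the last range that "cache maintenance" was issued on. */
struct toy_cache_log { phys_addr_t start; size_t len; };
static struct toy_cache_log last_maintenance;

static phys_addr_t toy_page_to_phys(const struct page *page)
{
    return (phys_addr_t)page->pfn << PAGE_SHIFT;
}

/* Hypothetical stand-in for a cache operation by physical address. */
static void toy_cache_maintain(phys_addr_t start, size_t len)
{
    last_maintenance.start = start;
    last_maintenance.len = len;
}

/* Callback shaped like the proposed unmap_page (attrs omitted). */
static void toy_unmap_page(struct device *dev, struct page *page,
                           dma_addr_t dma_handle, size_t size,
                           enum dma_data_direction dir)
{
    (void)dev;
    assert(page != NULL);

    /*
     * Assumption: the in-page offset survives the translation, so it
     * can be taken from the low bits of dma_handle. No reverse lookup
     * from dma address to physical address is needed.
     */
    phys_addr_t phys = toy_page_to_phys(page) +
                       (dma_handle & (PAGE_SIZE - 1));

    /* Toy policy: skip maintenance if data only went to the device. */
    if (dir != DMA_TO_DEVICE)
        toy_cache_maintain(phys, size);
}
```

With pfn 0x1234 and a dma address ending in 0x040, the callback ends up
operating on physical address 0x1234040 without consulting any
translation structure.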
On 17/11/14 14:11, Stefano Stabellini wrote:
> Hi all,
> I am writing this email to ask for your advice.
>
> On architectures where dma addresses are different from physical
> addresses, it can be difficult to retrieve the physical address of a
> page from its dma address.
>
> Specifically this is the case for Xen on arm and arm64 but I think that
> other architectures might have the same issue.
>
> Knowing the physical address is necessary to be able to issue any
> required cache maintenance operations when unmap_page,
> sync_single_for_cpu and sync_single_for_device are called.
>
> Adding a struct page* parameter to unmap_page, sync_single_for_cpu and
> sync_single_for_device would make Linux dma handling on Xen on arm and
> arm64 much easier and quicker.
Using an opaque handle instead of struct page * would be more beneficial
for the Intel IOMMU driver, e.g.:

    typedef dma_addr_t dma_handle_t;

    dma_handle_t dma_map_single(struct device *dev,
                                void *va, size_t size,
                                enum dma_data_direction dir);
    void dma_unmap_single(struct device *dev,
                          dma_handle_t handle, size_t size,
                          enum dma_data_direction dir);

etc.

Drivers would then use:

    dma_addr_t dma_addr(dma_handle_t handle);

to obtain the bus address from the handle.
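
One possible reading of the handle idea, sketched in userspace C. Note
the assumption: the example above typedefs the handle to dma_addr_t,
whereas this sketch makes it a small struct so a backend can stash
whatever it needs at map time. The struct layout, TOY_BUS_OFFSET and
the toy_* functions are hypothetical, and the dev and dir arguments are
left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;
typedef uint64_t phys_addr_t;

/* Hypothetical opaque handle: drivers only touch it via dma_addr(). */
typedef struct {
    dma_addr_t  bus;   /* address the device is programmed with */
    phys_addr_t phys;  /* backend-private, saved at map time */
} dma_handle_t;

/* Accessor drivers would use to obtain the bus address. */
static dma_addr_t dma_addr(dma_handle_t handle)
{
    return handle.bus;
}

/* Toy translation; a real backend would not use a fixed offset. */
#define TOY_BUS_OFFSET 0x40000000ULL

static dma_handle_t toy_dma_map_single(void *va, size_t size)
{
    dma_handle_t h;

    assert(va != NULL);
    (void)size;
    /* The sketch pretends virtual == physical in userspace. */
    h.phys = (phys_addr_t)(uintptr_t)va;
    h.bus = h.phys + TOY_BUS_OFFSET;
    return h;
}

/* Returns the physical address unmap would do its maintenance on. */
static phys_addr_t toy_dma_unmap_single(dma_handle_t handle, size_t size)
{
    (void)size;
    return handle.phys;  /* no reverse lookup from the bus address */
}
```

The trade-off in this variant is that every driver has to store a
handle that is larger than a plain dma_addr_t.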
> I think that other drivers have similar problems, such as the Intel
> IOMMU driver having to call find_iova and walking down an rbtree to get
> the physical address in its implementation of unmap_page.
>
> Callers have the struct page* in their hands already from the previous
> map_page call so it shouldn't be an issue for them. A problem does
> exist however: there are about 280 callers of dma_unmap_page and
> pci_unmap_page. We have even more callers of the dma_sync_single_for_*
> functions.
You will also need to fix dma_unmap_single() and pci_unmap_single()
(another 1000+ callers).
You may need to consider a parallel set of map/unmap API calls that
return/accept a handle, and then converting drivers one-by-one as
required, instead of trying to convert every single driver at once.
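
A rough userspace sketch of how such a parallel entry point could sit
next to the existing one. All names here (toy_dma_ops,
unmap_page_hint, toy_dma_unmap_page_hint, the backend_* callbacks) are
made up for illustration, and the sketch passes a struct page * as in
the original mail rather than a handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;
struct page { uint64_t pfn; };

/* Hypothetical ops table: legacy callback mandatory, new one optional. */
struct toy_dma_ops {
    void (*unmap_page)(dma_addr_t dma, size_t size);
    void (*unmap_page_hint)(struct page *page, dma_addr_t dma, size_t size);
};

struct device { const struct toy_dma_ops *ops; };

/* Existing entry point: unchanged, so unconverted drivers keep working. */
static void toy_dma_unmap_page(struct device *dev, dma_addr_t dma, size_t size)
{
    dev->ops->unmap_page(dma, size);
}

/* Parallel entry point: uses the new callback if the backend has one. */
static void toy_dma_unmap_page_hint(struct device *dev, struct page *page,
                                    dma_addr_t dma, size_t size)
{
    assert(page != NULL);
    if (dev->ops->unmap_page_hint)
        dev->ops->unmap_page_hint(page, dma, size);
    else
        dev->ops->unmap_page(dma, size);  /* fall back to the old path */
}

/* Two toy backend callbacks that only count how often they run. */
static unsigned int slow_calls, fast_calls;

static void backend_unmap_lookup(dma_addr_t dma, size_t size)
{
    (void)dma; (void)size;
    slow_calls++;  /* would have to look up the physical address */
}

static void backend_unmap_hint(struct page *page, dma_addr_t dma, size_t size)
{
    (void)page; (void)dma; (void)size;
    fast_calls++;  /* physical address available from the page */
}

static const struct toy_dma_ops legacy_only_ops = {
    .unmap_page = backend_unmap_lookup,
};

static const struct toy_dma_ops converted_ops = {
    .unmap_page      = backend_unmap_lookup,
    .unmap_page_hint = backend_unmap_hint,
};
```

Drivers calling the old function are untouched, converted drivers get
the faster path where the backend offers it, and a converted driver on
an unconverted backend still works through the fallback.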
David
On Mon, 17 Nov 2014, Stefano Stabellini wrote:
> Hi all,
> I am writing this email to ask for your advice.
>
> On architectures where dma addresses are different from physical
> addresses, it can be difficult to retrieve the physical address of a
> page from its dma address.
>
> Specifically this is the case for Xen on arm and arm64 but I think that
> other architectures might have the same issue.
>
> Knowing the physical address is necessary to be able to issue any
> required cache maintenance operations when unmap_page,
> sync_single_for_cpu and sync_single_for_device are called.
>
> Adding a struct page* parameter to unmap_page, sync_single_for_cpu and
> sync_single_for_device would make Linux dma handling on Xen on arm and
> arm64 much easier and quicker.
>
> I think that other drivers have similar problems, such as the Intel
> IOMMU driver having to call find_iova and walking down an rbtree to get
> the physical address in its implementation of unmap_page.
>
> Callers have the struct page* in their hands already from the previous
> map_page call so it shouldn't be an issue for them. A problem does
> exist however: there are about 280 callers of dma_unmap_page and
> pci_unmap_page. We have even more callers of the dma_sync_single_for_*
> functions.
>
>
>
> Is such a change even conceivable? How would one go about it?
>
> I think that Xen would not be the only one to gain from it, but I would
> like to have a confirmation from others: given the magnitude of the
> changes involved I would actually prefer to avoid them unless multiple
> drivers/archs/subsystems could really benefit from them.
Given the lack of interest from the community, I am going to drop this
idea.
On Fri, Nov 21 2014 at 03:48:33 AM, Stefano Stabellini wrote:
> On Mon, 17 Nov 2014, Stefano Stabellini wrote:
>> Hi all,
>> I am writing this email to ask for your advice.
>>
>> On architectures where dma addresses are different from physical
>> addresses, it can be difficult to retrieve the physical address of a
>> page from its dma address.
>>
>> Specifically this is the case for Xen on arm and arm64 but I think that
>> other architectures might have the same issue.
>>
>> Knowing the physical address is necessary to be able to issue any
>> required cache maintenance operations when unmap_page,
>> sync_single_for_cpu and sync_single_for_device are called.
>>
>> Adding a struct page* parameter to unmap_page, sync_single_for_cpu and
>> sync_single_for_device would make Linux dma handling on Xen on arm and
>> arm64 much easier and quicker.
>>
>> I think that other drivers have similar problems, such as the Intel
>> IOMMU driver having to call find_iova and walking down an rbtree to get
>> the physical address in its implementation of unmap_page.
>>
>> Callers have the struct page* in their hands already from the previous
>> map_page call so it shouldn't be an issue for them. A problem does
>> exist however: there are about 280 callers of dma_unmap_page and
>> pci_unmap_page. We have even more callers of the dma_sync_single_for_*
>> functions.
>>
>>
>>
>> Is such a change even conceivable? How would one go about it?
>>
>> I think that Xen would not be the only one to gain from it, but I would
>> like to have a confirmation from others: given the magnitude of the
>> changes involved I would actually prefer to avoid them unless multiple
>> drivers/archs/subsystems could really benefit from them.
>
> Given the lack of interest from the community, I am going to drop this
> idea.
Actually it sounds like the right API design to me. As a bonus it
should help performance a bit as well. For example, the current
implementations of dma_sync_single_for_{cpu,device} and dma_unmap_page
on ARM, when using the IOMMU mapper
(arm_iommu_sync_single_for_{cpu,device}, arm_iommu_unmap_page), all call
iommu_iova_to_phys, which generally results in a page table walk or a
hardware register write/poll/read.
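
As a rough illustration of what a software page table walk costs, here
is a toy two-level translation in userspace C. This is not how any
particular IOMMU driver implements iommu_iova_to_phys; the table
layout, the 30-bit IOVA space, the toy_* names and the table_reads
counter are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t dma_addr_t;
typedef uint64_t phys_addr_t;

#define PAGE_SHIFT    12
#define PAGE_SIZE     (1ULL << PAGE_SHIFT)
#define LEVEL_BITS    9
#define LEVEL_ENTRIES (1u << LEVEL_BITS)
#define LEVEL_MASK    (LEVEL_ENTRIES - 1)
#define IOVA_BITS     (PAGE_SHIFT + 2 * LEVEL_BITS)

/* Leaf table: page-aligned physical addresses, 0 means unmapped. */
struct toy_l2 { phys_addr_t pte[LEVEL_ENTRIES]; };

/* Toy domain with two table levels covering a 30-bit IOVA space. */
struct toy_domain {
    struct toy_l2 *l1[LEVEL_ENTRIES];
    unsigned long table_reads;  /* table entries read by translations */
};

static unsigned int l1_index(dma_addr_t iova)
{
    return (unsigned int)((iova >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK);
}

static unsigned int l2_index(dma_addr_t iova)
{
    return (unsigned int)((iova >> PAGE_SHIFT) & LEVEL_MASK);
}

/* Install one page mapping; returns 0 on success, -1 on allocation failure. */
static int toy_iommu_map(struct toy_domain *d, dma_addr_t iova, phys_addr_t phys)
{
    struct toy_l2 **slot;

    assert((iova >> IOVA_BITS) == 0);
    slot = &d->l1[l1_index(iova)];
    if (!*slot) {
        *slot = calloc(1, sizeof(**slot));
        if (!*slot)
            return -1;
    }
    (*slot)->pte[l2_index(iova)] = phys & ~(PAGE_SIZE - 1);
    return 0;
}

/* Software walk: up to two dependent reads; returns 0 if unmapped. */
static phys_addr_t toy_iova_to_phys(struct toy_domain *d, dma_addr_t iova)
{
    struct toy_l2 *l2;
    phys_addr_t pte;

    if ((iova >> IOVA_BITS) != 0)
        return 0;
    l2 = d->l1[l1_index(iova)];
    d->table_reads++;
    if (!l2)
        return 0;
    pte = l2->pte[l2_index(iova)];
    d->table_reads++;
    if (!pte)
        return 0;
    return pte | (iova & (PAGE_SIZE - 1));
}
```

Every unmap or sync that has to translate this way pays for the
dependent reads counted in table_reads; a callback that is handed the
struct page could skip the walk altogether.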
The problem, as you mentioned, is that there are a ton of callers of the
existing APIs. I think David Vrabel had a good suggestion for dealing
with this:
On Mon, Nov 17 2014 at 06:43:46 AM, David Vrabel wrote:
> You may need to consider a parallel set of map/unmap API calls that
> return/accept a handle, and then converting drivers one-by-one as
> required, instead of trying to convert every single driver at once.
However, I'm not sure whether the costs of having a parallel set of APIs
outweigh the benefits of a cleaner API and a slight performance boost...
But I hope the idea isn't completely abandoned without some profiling or
other evidence of its benefits (e.g. patches showing how drivers could
be simplified with the new APIs).
-Mitch
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project