2017-04-25 09:58:17

by Sunil Kovvuri

Subject: [PATCH v2] iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed

From: Sunil Goutham <[email protected]>

For software-initiated address translation, when the domain type is
IOMMU_DOMAIN_IDENTITY, i.e. the SMMU is bypassed, mimic the HW behavior,
i.e. return the IOVA itself as the translated address.

This patch is an extension to Will Deacon's patchset
"Implement SMMU passthrough using the default domain".

Signed-off-by: Sunil Goutham <[email protected]>
---

V2:
- As per Will's suggestion, applied the fix to the SMMUv3 driver as well.

 drivers/iommu/arm-smmu-v3.c | 3 +++
 drivers/iommu/arm-smmu.c    | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 05b4592..d412bdd 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1714,6 +1714,9 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;

+	if (domain->type == IOMMU_DOMAIN_IDENTITY)
+		return iova;
+
 	if (!ops)
 		return 0;

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index bfab4f7..81088cd 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1459,6 +1459,9 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct io_pgtable_ops *ops= smmu_domain->pgtbl_ops;

+	if (domain->type == IOMMU_DOMAIN_IDENTITY)
+		return iova;
+
 	if (!ops)
 		return 0;

--
2.7.4
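
For illustration only (not part of the patch): a minimal sketch of what a
caller of the IOMMU API now sees for a device whose default domain is
passthrough. iommu_get_domain_for_dev() and iommu_iova_to_phys() are the
existing core API; the helper below is made up.

#include <linux/bug.h>
#include <linux/iommu.h>

/* Sketch: with a passthrough (IOMMU_DOMAIN_IDENTITY) default domain the
 * SMMU performs no translation, and after this patch iova_to_phys reports
 * the same thing in software: the IOVA comes back unchanged instead of 0. */
static void example_check_identity(struct device *dev, dma_addr_t iova)
{
	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);

	if (dom && dom->type == IOMMU_DOMAIN_IDENTITY)
		WARN_ON(iommu_iova_to_phys(dom, iova) != (phys_addr_t)iova);
}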


2017-04-26 09:26:49

by Sunil Kovvuri

Subject: Re: [PATCH v2] iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed

On Tue, Apr 25, 2017 at 3:27 PM, <[email protected]> wrote:
> From: Sunil Goutham <[email protected]>
>
> For software initiated address translation, when domain type is
> IOMMU_DOMAIN_IDENTITY i.e SMMU is bypassed, mimic HW behavior
> i.e return the same IOVA as translated address.
>
> This patch is an extension to Will Deacon's patchset
> "Implement SMMU passthrough using the default domain".
>
> Signed-off-by: Sunil Goutham <[email protected]>
> ---
>
> V2
> - As per Will's suggestion applied fix to SMMUv3 driver as well.
>
> drivers/iommu/arm-smmu-v3.c | 3 +++
> drivers/iommu/arm-smmu.c | 3 +++
> 2 files changed, 6 insertions(+)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 05b4592..d412bdd 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1714,6 +1714,9 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>
> + if (domain->type == IOMMU_DOMAIN_IDENTITY)
> + return iova;
> +
> if (!ops)
> return 0;
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index bfab4f7..81088cd 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1459,6 +1459,9 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> struct io_pgtable_ops *ops= smmu_domain->pgtbl_ops;
>
> + if (domain->type == IOMMU_DOMAIN_IDENTITY)
> + return iova;
> +
> if (!ops)
> return 0;
>
> --
> 2.7.4
>

Will,

If you are okay with the patch, could you please ACK it?

Thanks,
Sunil.

2017-04-26 10:01:59

by Will Deacon

Subject: Re: [PATCH v2] iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed

Hi Sunil,

On Tue, Apr 25, 2017 at 03:27:52PM +0530, [email protected] wrote:
> From: Sunil Goutham <[email protected]>
>
> For software initiated address translation, when domain type is
> IOMMU_DOMAIN_IDENTITY i.e SMMU is bypassed, mimic HW behavior
> i.e return the same IOVA as translated address.
>
> This patch is an extension to Will Deacon's patchset
> "Implement SMMU passthrough using the default domain".
>
> Signed-off-by: Sunil Goutham <[email protected]>
> ---
>
> V2
> - As per Will's suggestion applied fix to SMMUv3 driver as well.

This follows what the AMD driver does, so:

Acked-by: Will Deacon <[email protected]>

but I still think that having drivers/net/ethernet/cavium/thunder/nicvf_queues.c
poke around with the physical address to get at the struct pages underlying
a DMA buffer is really dodgy. Is there no way this can be avoided, perhaps
by tracking the pages some other way (although I don't understand why you're
having to mess with the page reference counts to start with)?

At least, I think you should be checking the domain type in
nicvf_iova_to_phys, which clearly expects a DMA domain if one exists at all.
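
(A minimal sketch of the kind of check meant here, assuming the driver
keeps the domain it resolved via iommu_get_domain_for_dev() in
nic->iommu_domain; the names loosely follow the nicvf driver and are
illustrative rather than the exact code.)

static u64 nicvf_iova_to_phys(struct nicvf *nic, dma_addr_t dma_addr)
{
	struct iommu_domain *dom = nic->iommu_domain;

	/* No IOMMU, or a passthrough/identity default domain: the address
	 * handed back by the device is already a physical address. */
	if (!dom || dom->type != IOMMU_DOMAIN_DMA)
		return dma_addr;

	return iommu_iova_to_phys(dom, dma_addr);
}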

Joerg: sorry, this is another one for you to pick up if possible.

Cheers,

Will

> drivers/iommu/arm-smmu-v3.c | 3 +++
> drivers/iommu/arm-smmu.c | 3 +++
> 2 files changed, 6 insertions(+)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 05b4592..d412bdd 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1714,6 +1714,9 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>
> + if (domain->type == IOMMU_DOMAIN_IDENTITY)
> + return iova;
> +
> if (!ops)
> return 0;
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index bfab4f7..81088cd 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1459,6 +1459,9 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> struct io_pgtable_ops *ops= smmu_domain->pgtbl_ops;
>
> + if (domain->type == IOMMU_DOMAIN_IDENTITY)
> + return iova;
> +
> if (!ops)
> return 0;
>
> --
> 2.7.4
>

2017-04-26 10:31:21

by Joerg Roedel

Subject: Re: [PATCH v2] iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed

On Wed, Apr 26, 2017 at 11:01:50AM +0100, Will Deacon wrote:
> Joerg: sorry, this is another one for you to pick up if possible.

Applied.

2017-04-26 10:43:43

by Sunil Kovvuri

Subject: Re: [PATCH v2] iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed

On Wed, Apr 26, 2017 at 3:31 PM, Will Deacon <[email protected]> wrote:
> Hi Sunil,
>
> On Tue, Apr 25, 2017 at 03:27:52PM +0530, [email protected] wrote:
>> From: Sunil Goutham <[email protected]>
>>
>> For software initiated address translation, when domain type is
>> IOMMU_DOMAIN_IDENTITY i.e SMMU is bypassed, mimic HW behavior
>> i.e return the same IOVA as translated address.
>>
>> This patch is an extension to Will Deacon's patchset
>> "Implement SMMU passthrough using the default domain".
>>
>> Signed-off-by: Sunil Goutham <[email protected]>
>> ---
>>
>> V2
>> - As per Will's suggestion applied fix to SMMUv3 driver as well.
>
> This follows what the AMD driver does, so:
>
> Acked-by: Will Deacon <[email protected]>

Thanks,

>
> but I still think that having drivers/net/ethernet/cavium/thunder/nicvf_queues.c
> poke around with the physical address to get at the struct pages underlying
> a DMA buffer is really dodgy.

To be precise, the driver is not dealing with page structures. Just like
any other NIC driver, it needs to know the virtual address of the buffer
into which a packet has been DMA'ed, so that an SKB can be framed and
handed over to the network stack. For the reasons mentioned below, it's
not possible for this driver to maintain a list of DMA-address-to-virtual-
address mappings. Hence, using the IOMMU API, the DMA address is
translated to a physical address and finally to a virtual address. I don't
see anything dodgy here.

> Is there no way this can be avoided, perhaps by tracking the pages some other way

I have explained that in the commit message:
--
Also VNIC doesn't have a separate receive buffer ring per receive
queue, so there is no 1:1 descriptor index matching between CQE_RX
and the index in buffer ring from where a buffer has been used for
DMA'ing. Unlike other NICs, here it's not possible to maintain dma
address to virt address mappings within the driver. This leaves us
no other choice but to use IOMMU's IOVA address conversion API to
get buffer's virtual address which can be given to network stack
for processing.
--

>(although I don't understand why you're having to mess with the page reference
>counts to start with)?
I'm not sure why you say it's a mess; adjusting page reference counts is
quite common if you check other NIC drivers. On ARM64 especially, when
using 64KB pages, if we have only one packet buffer per page then we
would have to set aside a whole lot of memory, which sometimes is not
possible on embedded platforms. Hence there are multiple packet buffers
per page, and the page reference count is adjusted accordingly.
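
(A rough sketch of that pattern, with made-up names rather than the
actual nicvf code: one page, e.g. 64KB, is carved into several packet
buffers, and the page's reference count is raised so that each buffer can
later be released on its own with put_page().)

#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/mm.h>

static int fill_bufs_from_page(void **bufs, int nbufs, unsigned int buf_len)
{
	struct page *page;
	int i;

	if (nbufs < 1 || (size_t)nbufs * buf_len > PAGE_SIZE)
		return -EINVAL;

	page = alloc_page(GFP_KERNEL);
	if (!page)
		return -ENOMEM;

	/* alloc_page() gives one reference; take one more per additional
	 * buffer so every carved-out buffer effectively owns a reference
	 * and can later be freed with put_page(virt_to_page(buf)). */
	page_ref_add(page, nbufs - 1);

	for (i = 0; i < nbufs; i++)
		bufs[i] = page_address(page) + i * buf_len;

	return 0;
}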

>
> At least, I think you should be checking the domain type in
> nicvf_iova_to_phys, which clearly expects a DMA domain if one exists at all.

Probably, but I don't think the network maintainers would be okay with
it, since such details should be hidden from a network driver's point of
view. The reverse argument could be made that a NIC driver shouldn't even
have to check whether a domain is set at all.

Thanks,
Sunil.

>
> Joerg: sorry, this is another one for you to pick up if possible.
>
> Cheers,
>
> Will
>
>> drivers/iommu/arm-smmu-v3.c | 3 +++
>> drivers/iommu/arm-smmu.c | 3 +++
>> 2 files changed, 6 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 05b4592..d412bdd 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -1714,6 +1714,9 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
>> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>
>> + if (domain->type == IOMMU_DOMAIN_IDENTITY)
>> + return iova;
>> +
>> if (!ops)
>> return 0;
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index bfab4f7..81088cd 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -1459,6 +1459,9 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
>> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> struct io_pgtable_ops *ops= smmu_domain->pgtbl_ops;
>>
>> + if (domain->type == IOMMU_DOMAIN_IDENTITY)
>> + return iova;
>> +
>> if (!ops)
>> return 0;
>>
>> --
>> 2.7.4
>>

2017-04-26 11:37:01

by Will Deacon

Subject: Re: [PATCH v2] iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed

On Wed, Apr 26, 2017 at 04:13:29PM +0530, Sunil Kovvuri wrote:
> On Wed, Apr 26, 2017 at 3:31 PM, Will Deacon <[email protected]> wrote:
> > Hi Sunil,
> >
> > On Tue, Apr 25, 2017 at 03:27:52PM +0530, [email protected] wrote:
> >> From: Sunil Goutham <[email protected]>
> >>
> >> For software initiated address translation, when domain type is
> >> IOMMU_DOMAIN_IDENTITY i.e SMMU is bypassed, mimic HW behavior
> >> i.e return the same IOVA as translated address.
> >>
> >> This patch is an extension to Will Deacon's patchset
> >> "Implement SMMU passthrough using the default domain".
> >>
> >> Signed-off-by: Sunil Goutham <[email protected]>
> >> ---
> >>
> >> V2
> >> - As per Will's suggestion applied fix to SMMUv3 driver as well.
> >
> > This follows what the AMD driver does, so:
> >
> > Acked-by: Will Deacon <[email protected]>
>
> Thanks,
>
> >
> > but I still think that having drivers/net/ethernet/cavium/thunder/nicvf_queues.c
> > poke around with the physical address to get at the struct pages underlying
> > a DMA buffer is really dodgy.
>
> Driver is not dealing with page structures to be precise, just like
> for any other NIC device, driver needs to know the virtual address
> of the packet to where it's DMA'ed, so that SKB if framed and
> handed over to network stack. Due to reasons mentioned below,
> in this driver it's not possible to maintain a list of DMA addresses to
> Virtual address mappings. Hence using IOMMU API, DMA address
> is translated to physical address and finally to virtual address. I don't
> see anything dodgy here.

It's dodgy because you're the only NIC driver using iommu_iova_to_phys
directly and, afaict, the driver could just stash either the struct page
or the virtual address at the point of allocation.

> > Is there no way this can be avoided, perhaps by tracking the pages some other way
>
> I have explained that in the commit message
> --
> Also VNIC doesn't have a seperate receive buffer ring per receive
> queue, so there is no 1:1 descriptor index matching between CQE_RX
> and the index in buffer ring from where a buffer has been used for
> DMA'ing. Unlike other NICs, here it's not possible to maintain dma
> address to virt address mappings within the driver. This leaves us
> no other choice but to use IOMMU's IOVA address conversion API to
> get buffer's virtual address which can be given to network stack
> for processing.
> --
>
> >(although I don't understand why you're having to mess with the page reference
> >counts to start with)?
> Not sure why you say it's a mess, adjusting page reference counts is quite
> common if you check other NIC drivers. On ARM64 especially when using
> 64KB pages, if we have only one packet buffer for each page then we
> will have to set aside a whole lot of memory which sometimes is not possible
> on embedded platforms. Hence multiple pkt buffers per page, and page reference
> is set accordingly.

I wasn't saying that was a mess, I was just saying that I didn't understand
why you mess (verb) with the page reference counts (my ignorance of the
network layer). The code that I think is a mess is:

phys_addr = nicvf_iova_to_phys(nic, buf_addr);
[...]
put_page(virt_to_page(phys_to_virt(phys_addr)));

because:

(a) You have the information you need at allocation time, but you've
failed to record that and are trying to use the IOMMU API to
reconstruct the CPU virtual address

(b) When there isn't an IOMMU present, you assume that bus addresses ==
physical addresses

(c) You assume that the DMA buffer is mapped in the linear mapping

that's probably all true for ThunderX/arm64, but it's generally not portable
or reliable code. If you could get a handle to the struct page that you
allocated in the first place, then you could use page_address to get its
virtual address instead of having to go via the physical address.
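
(A minimal sketch of that alternative, again with made-up names: if the
struct page is recorded next to each buffer when it is allocated,
page_address() recovers the CPU address directly, with no IOVA-to-physical
translation and no assumption that bus addresses equal physical
addresses.)

#include <linux/dma-mapping.h>
#include <linux/mm.h>

struct rx_buf {
	struct page	*page;		/* recorded at allocation time */
	unsigned int	offset;		/* offset of this buffer in the page */
	dma_addr_t	dma_addr;	/* address handed to the device */
};

static void *rx_buf_virt(const struct rx_buf *buf)
{
	/* struct page -> kernel virtual address, no IOMMU API involved */
	return page_address(buf->page) + buf->offset;
}

static void rx_buf_put(struct rx_buf *buf)
{
	put_page(buf->page);		/* drop this buffer's page reference */
}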

Will

2017-04-26 12:03:21

by Sunil Kovvuri

Subject: Re: [PATCH v2] iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed

On Wed, Apr 26, 2017 at 5:06 PM, Will Deacon <[email protected]> wrote:
> On Wed, Apr 26, 2017 at 04:13:29PM +0530, Sunil Kovvuri wrote:
>> On Wed, Apr 26, 2017 at 3:31 PM, Will Deacon <[email protected]> wrote:
>> > Hi Sunil,
>> >
>> > On Tue, Apr 25, 2017 at 03:27:52PM +0530, [email protected] wrote:
>> >> From: Sunil Goutham <[email protected]>
>> >>
>> >> For software initiated address translation, when domain type is
>> >> IOMMU_DOMAIN_IDENTITY i.e SMMU is bypassed, mimic HW behavior
>> >> i.e return the same IOVA as translated address.
>> >>
>> >> This patch is an extension to Will Deacon's patchset
>> >> "Implement SMMU passthrough using the default domain".
>> >>
>> >> Signed-off-by: Sunil Goutham <[email protected]>
>> >> ---
>> >>
>> >> V2
>> >> - As per Will's suggestion applied fix to SMMUv3 driver as well.
>> >
>> > This follows what the AMD driver does, so:
>> >
>> > Acked-by: Will Deacon <[email protected]>
>>
>> Thanks,
>>
>> >
>> > but I still think that having drivers/net/ethernet/cavium/thunder/nicvf_queues.c
>> > poke around with the physical address to get at the struct pages underlying
>> > a DMA buffer is really dodgy.
>>
>> Driver is not dealing with page structures to be precise, just like
>> for any other NIC device, driver needs to know the virtual address
>> of the packet to where it's DMA'ed, so that SKB if framed and
>> handed over to network stack. Due to reasons mentioned below,
>> in this driver it's not possible to maintain a list of DMA addresses to
>> Virtual address mappings. Hence using IOMMU API, DMA address
>> is translated to physical address and finally to virtual address. I don't
>> see anything dodgy here.
>
> It's dodgy because you're the only NIC driver using iommu_iova_to_phys
> directly and, afaict, the driver could just stash either the struct page
> or the virtual address at the point of allocation.

Well, the driver needs to be written around how the HW actually
functions, even if that means using an API that other drivers haven't
needed before.

>
>> > Is there no way this can be avoided, perhaps by tracking the pages some other way
>>
>> I have explained that in the commit message
>> --
>> Also VNIC doesn't have a seperate receive buffer ring per receive
>> queue, so there is no 1:1 descriptor index matching between CQE_RX
>> and the index in buffer ring from where a buffer has been used for
>> DMA'ing. Unlike other NICs, here it's not possible to maintain dma
>> address to virt address mappings within the driver. This leaves us
>> no other choice but to use IOMMU's IOVA address conversion API to
>> get buffer's virtual address which can be given to network stack
>> for processing.
>> --
>>
>> >(although I don't understand why you're having to mess with the page reference
>> >counts to start with)?
>> Not sure why you say it's a mess, adjusting page reference counts is quite
>> common if you check other NIC drivers. On ARM64 especially when using
>> 64KB pages, if we have only one packet buffer for each page then we
>> will have to set aside a whole lot of memory which sometimes is not possible
>> on embedded platforms. Hence multiple pkt buffers per page, and page reference
>> is set accordingly.
>
> I wasn't saying that was a mess, I was just saying that I didn't understand
> why you mess (verb) with the page reference counts (my ignorance of the
> network layer). The code that I think is a mess is:
>
> phys_addr = nicvf_iova_to_phys(nic, buf_addr);
> [...]
> put_page(virt_to_page(phys_to_virt(phys_addr)));

Even if it were possible to record that info in this driver, the page
reference would still need to be released to free the page; otherwise the
page is never freed.

>
> because:
>
> (a) You have the information you need at allocation time, but you've
> failed to record that and are trying to use the IOMMU API to
> reconstruct the CPU virtual address

That's exactly what I explained in the commit message, i.e. why I cannot
record that info at allocation time. Also, the HW reports the address of
the buffer (IOVA or physical) into which it has DMA'ed the packet, not an
index into the buffer ring. There is a single buffer ring for 8 receive
queues, so there is no way to map the DMA address seen at a receive queue
back to info recorded in the buffer ring.

Everything you suggest is possible, and it is exactly what I would have
done if the HW gave me an index into the buffer ring instead of the
DMA'ed address; then I wouldn't have been hit so hard by all the
bottlenecks in the ARM IOMMU infrastructure.

Thanks,
Sunil.

>
> (b) When there isn't an IOMMU present, you assume that bus addresses ==
> physical addresses
>
> (c) You assume that the DMA buffer is mapped in the linear mapping
>
> that's probably all true for ThunderX/arm64, but it's generally not portable
> or reliable code. If you could get a handle to the struct page that you
> allocated in the first place, then you could use page_address to get its
> virtual address instead of having to go via the physical address.
>
> Will