The s390 iommu can only allow DMA transactions between the zPCI device
entries start_dma and end_dma.
Let's declare the regions before start_dma and after end_dma as
reserved regions using the appropriate callback in iommu_ops.
The reserved region may later be retrieved from sysfs or from
the vfio iommu internal interface.
This seems to me related with the work Shameer has started on
vfio_iommu_type1 so I add Alex and Shameer to the CC list.
Pierre Morel (1):
iommu/s390: Declare s390 iommu reserved regions
drivers/iommu/s390-iommu.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
--
2.7.4
The s390 iommu can only allow DMA transactions between the zPCI device
entries start_dma and end_dma.
Let's declare the regions before start_dma and after end_dma as
reserved regions using the appropriate callback in iommu_ops.
Signed-off-by: Pierre Morel <[email protected]>
---
drivers/iommu/s390-iommu.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 22d4db3..5ca91a1 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -363,6 +363,33 @@ void zpci_destroy_iommu(struct zpci_dev *zdev)
iommu_device_sysfs_remove(&zdev->iommu_dev);
}
+static void s390_get_resv_regions(struct device *dev, struct list_head *head)
+{
+ struct iommu_resv_region *region;
+ struct zpci_dev *zdev = to_pci_dev(dev)->sysdata;
+
+ region = iommu_alloc_resv_region(0, zdev->start_dma,
+ 0, IOMMU_RESV_RESERVED);
+ if (!region)
+ return;
+ list_add_tail(®ion->list, head);
+
+ region = iommu_alloc_resv_region(zdev->end_dma + 1,
+ ~0UL - zdev->end_dma,
+ 0, IOMMU_RESV_RESERVED);
+ if (!region)
+ return;
+ list_add_tail(®ion->list, head);
+}
+
+static void s390_put_resv_regions(struct device *dev, struct list_head *head)
+{
+ struct iommu_resv_region *entry, *next;
+
+ list_for_each_entry_safe(entry, next, head, list)
+ kfree(entry);
+}
+
static const struct iommu_ops s390_iommu_ops = {
.capable = s390_iommu_capable,
.domain_alloc = s390_domain_alloc,
@@ -376,6 +403,8 @@ static const struct iommu_ops s390_iommu_ops = {
.remove_device = s390_iommu_remove_device,
.device_group = generic_device_group,
.pgsize_bitmap = S390_IOMMU_PGSIZES,
+ .get_resv_regions = s390_get_resv_regions,
+ .put_resv_regions = s390_put_resv_regions,
};
static int __init s390_iommu_init(void)
--
2.7.4
On Tue, 15 Jan 2019 18:37:30 +0100
Pierre Morel <[email protected]> wrote:
> The s390 iommu can only allow DMA transactions between the zPCI device
> entries start_dma and end_dma.
>
> Let's declare the regions before start_dma and after end_dma as
> reserved regions using the appropriate callback in iommu_ops.
>
> Signed-off-by: Pierre Morel <[email protected]>
> ---
> drivers/iommu/s390-iommu.c | 29 +++++++++++++++++++++++++++++
> 1 file changed, 29 insertions(+)
>
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index 22d4db3..5ca91a1 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -363,6 +363,33 @@ void zpci_destroy_iommu(struct zpci_dev *zdev)
> iommu_device_sysfs_remove(&zdev->iommu_dev);
> }
>
> +static void s390_get_resv_regions(struct device *dev, struct list_head *head)
> +{
> + struct iommu_resv_region *region;
> + struct zpci_dev *zdev = to_pci_dev(dev)->sysdata;
> +
> + region = iommu_alloc_resv_region(0, zdev->start_dma,
> + 0, IOMMU_RESV_RESERVED);
> + if (!region)
> + return;
> + list_add_tail(®ion->list, head);
> +
> + region = iommu_alloc_resv_region(zdev->end_dma + 1,
> + ~0UL - zdev->end_dma,
> + 0, IOMMU_RESV_RESERVED);
Can you guarantee that start_dma will never be 0 and end_dma never ~0UL,
even with future HW?
In any of these cases, your code would reserve strange ranges, and sysfs
would report broken reserved ranges.
Maybe add a check for start_dma > 0 and end_dma < ULONG_MAX?
> + if (!region)
> + return;
> + list_add_tail(®ion->list, head);
> +}
> +
> +static void s390_put_resv_regions(struct device *dev, struct list_head *head)
> +{
> + struct iommu_resv_region *entry, *next;
> +
> + list_for_each_entry_safe(entry, next, head, list)
> + kfree(entry);
> +}
It looks very wrong that there is no matching list_del() for the previous
list_add_tail(). However, it seems to be done like this everywhere else,
and the calling functions (currently) only use temporary list_heads as
far as I can see, so I guess it should be OK (for now).
Still, a list_del() would be nice :-)
> +
> static const struct iommu_ops s390_iommu_ops = {
> .capable = s390_iommu_capable,
> .domain_alloc = s390_domain_alloc,
> @@ -376,6 +403,8 @@ static const struct iommu_ops s390_iommu_ops = {
> .remove_device = s390_iommu_remove_device,
> .device_group = generic_device_group,
> .pgsize_bitmap = S390_IOMMU_PGSIZES,
> + .get_resv_regions = s390_get_resv_regions,
> + .put_resv_regions = s390_put_resv_regions,
> };
>
> static int __init s390_iommu_init(void)
With the start/end_dma issue addressed (if necessary):
Acked-by: Gerald Schaefer <[email protected]>
On 15/01/2019 20:33, Gerald Schaefer wrote:
> On Tue, 15 Jan 2019 18:37:30 +0100
> Pierre Morel <[email protected]> wrote:
>
>> The s390 iommu can only allow DMA transactions between the zPCI device
>> entries start_dma and end_dma.
>>
>> Let's declare the regions before start_dma and after end_dma as
>> reserved regions using the appropriate callback in iommu_ops.
>>
>> Signed-off-by: Pierre Morel <[email protected]>
>> ---
>> drivers/iommu/s390-iommu.c | 29 +++++++++++++++++++++++++++++
>> 1 file changed, 29 insertions(+)
>>
>> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
>> index 22d4db3..5ca91a1 100644
>> --- a/drivers/iommu/s390-iommu.c
>> +++ b/drivers/iommu/s390-iommu.c
>> @@ -363,6 +363,33 @@ void zpci_destroy_iommu(struct zpci_dev *zdev)
>> iommu_device_sysfs_remove(&zdev->iommu_dev);
>> }
>>
>> +static void s390_get_resv_regions(struct device *dev, struct list_head *head)
>> +{
>> + struct iommu_resv_region *region;
>> + struct zpci_dev *zdev = to_pci_dev(dev)->sysdata;
>> +
>> + region = iommu_alloc_resv_region(0, zdev->start_dma,
>> + 0, IOMMU_RESV_RESERVED);
>> + if (!region)
>> + return;
>> + list_add_tail(®ion->list, head);
>> +
>> + region = iommu_alloc_resv_region(zdev->end_dma + 1,
>> + ~0UL - zdev->end_dma,
>> + 0, IOMMU_RESV_RESERVED);
>
> Can you guarantee that start_dma will never be 0 and end_dma never ~0UL,
> even with future HW?
>
> In any of these cases, your code would reserve strange ranges, and sysfs
> would report broken reserved ranges.
>
> Maybe add a check for start_dma > 0 and end_dma < ULONG_MAX?
Yes, thanks.
>
>> + if (!region)
>> + return;
>> + list_add_tail(®ion->list, head);
>> +}
>> +
>> +static void s390_put_resv_regions(struct device *dev, struct list_head *head)
>> +{
>> + struct iommu_resv_region *entry, *next;
>> +
>> + list_for_each_entry_safe(entry, next, head, list)
>> + kfree(entry);
>> +}
>
> It looks very wrong that there is no matching list_del() for the previous
> list_add_tail(). However, it seems to be done like this everywhere else,
> and the calling functions (currently) only use temporary list_heads as
> far as I can see, so I guess it should be OK (for now).
>
> Still, a list_del() would be nice :-)
hum.
right.
>
>> +
>> static const struct iommu_ops s390_iommu_ops = {
>> .capable = s390_iommu_capable,
>> .domain_alloc = s390_domain_alloc,
>> @@ -376,6 +403,8 @@ static const struct iommu_ops s390_iommu_ops = {
>> .remove_device = s390_iommu_remove_device,
>> .device_group = generic_device_group,
>> .pgsize_bitmap = S390_IOMMU_PGSIZES,
>> + .get_resv_regions = s390_get_resv_regions,
>> + .put_resv_regions = s390_put_resv_regions,
>> };
>>
>> static int __init s390_iommu_init(void)
>
> With the start/end_dma issue addressed (if necessary):
> Acked-by: Gerald Schaefer <[email protected]>
>
Thanks.
Regards,
Pierre
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
Hi Pierre,
> -----Original Message-----
> From: Pierre Morel [mailto:[email protected]]
> Sent: 15 January 2019 17:37
> To: [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; Shameerali Kolothum Thodi
> <[email protected]>; [email protected]
> Subject: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions
>
> The s390 iommu can only allow DMA transactions between the zPCI device
> entries start_dma and end_dma.
>
> Let's declare the regions before start_dma and after end_dma as
> reserved regions using the appropriate callback in iommu_ops.
>
> The reserved region may later be retrieved from sysfs or from
> the vfio iommu internal interface.
Just in case you are planning to use the sysfs interface to retrieve the valid
regions, and intend to use that in Qemu vfio path, please see the discussion
here [1] (If you haven't seen this already)
Thanks,
Shameer
[1] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03651.html
> This seems to me related with the work Shameer has started on
> vfio_iommu_type1 so I add Alex and Shameer to the CC list.
>
> Pierre Morel (1):
> iommu/s390: Declare s390 iommu reserved regions
>
> drivers/iommu/s390-iommu.c | 29 +++++++++++++++++++++++++++++
> 1 file changed, 29 insertions(+)
>
> --
> 2.7.4
On 15/01/2019 17:37, Pierre Morel wrote:
> The s390 iommu can only allow DMA transactions between the zPCI device
> entries start_dma and end_dma.
>
> Let's declare the regions before start_dma and after end_dma as
> reserved regions using the appropriate callback in iommu_ops.
>
> The reserved region may later be retrieved from sysfs or from
> the vfio iommu internal interface.
For this particular case, I think the best solution is to give VFIO the
ability to directly interrogate the domain geometry (which s390 appears
to set correctly already). The idea of reserved regions was really for
'unexpected' holes inside the usable address space - using them to also
describe places that are entirely outside that address space rather
confuses things IMO.
Furthermore, even if we *did* end up going down the route of actively
reserving addresses beyond the usable aperture, it doesn't seem sensible
for individual drivers to do it themselves when the core API already
describes the relevant information generically.
Robin.
>
> This seems to me related with the work Shameer has started on
> vfio_iommu_type1 so I add Alex and Shameer to the CC list.
>
>
> Pierre Morel (1):
> iommu/s390: Declare s390 iommu reserved regions
>
> drivers/iommu/s390-iommu.c | 29 +++++++++++++++++++++++++++++
> 1 file changed, 29 insertions(+)
>
On 17/01/2019 10:23, Shameerali Kolothum Thodi wrote:
> Hi Pierre,
>
>> -----Original Message-----
>> From: Pierre Morel [mailto:[email protected]]
>> Sent: 15 January 2019 17:37
>> To: [email protected]
>> Cc: [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; Shameerali Kolothum Thodi
>> <[email protected]>; [email protected]
>> Subject: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions
>>
>> The s390 iommu can only allow DMA transactions between the zPCI device
>> entries start_dma and end_dma.
>>
>> Let's declare the regions before start_dma and after end_dma as
>> reserved regions using the appropriate callback in iommu_ops.
>>
>> The reserved region may later be retrieved from sysfs or from
>> the vfio iommu internal interface.
>
> Just in case you are planning to use the sysfs interface to retrieve the valid
> regions, and intend to use that in Qemu vfio path, please see the discussion
> here [1] (If you haven't seen this already)
>
> Thanks,
> Shameer
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03651.html
>
>> This seems to me related with the work Shameer has started on
>> vfio_iommu_type1 so I add Alex and Shameer to the CC list.
>>
>> Pierre Morel (1):
>> iommu/s390: Declare s390 iommu reserved regions
>>
>> drivers/iommu/s390-iommu.c | 29 +++++++++++++++++++++++++++++
>> 1 file changed, 29 insertions(+)
>>
>> --
>> 2.7.4
>
Thanks Shameer,
Interesting discussion indeed.
AFAIK the patch series you are working on will provide a way to retrieve
the reserved region through the VFIO IOMMU interface, using capabilities
in the VFIO_IOMMU_GET_INFO.
Before this patch, the iommu_type1 was not able to retrieve the
forbidden region from the s390_iommu.
See this patch is a contribution, so that these regions will appear in
the reserved list when the VFIO_IOMM_GET_INFO will be able to report the
information.
I am expecting to be able to to retrieve this information from the
VFIO_IOMMU_GET_INFO syscall as soon as it is available.
Regards,
Pierre
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 17/01/2019 14:02, Robin Murphy wrote:
> On 15/01/2019 17:37, Pierre Morel wrote:
>> The s390 iommu can only allow DMA transactions between the zPCI device
>> entries start_dma and end_dma.
>>
>> Let's declare the regions before start_dma and after end_dma as
>> reserved regions using the appropriate callback in iommu_ops.
>>
>> The reserved region may later be retrieved from sysfs or from
>> the vfio iommu internal interface.
>
> For this particular case, I think the best solution is to give VFIO the
> ability to directly interrogate the domain geometry (which s390 appears
> to set correctly already). The idea of reserved regions was really for
> 'unexpected' holes inside the usable address space - using them to also
> describe places that are entirely outside that address space rather
> confuses things IMO.
>
> Furthermore, even if we *did* end up going down the route of actively
> reserving addresses beyond the usable aperture, it doesn't seem sensible
> for individual drivers to do it themselves when the core API already
> describes the relevant information generically.
>
> Robin.
Robin,
I already posted a patch retrieving the geometry through
VFIO_IOMMU_GET_INFO using a specific capability for the geometry [1],
and AFAIU, Alex did not agree with this.
What is different in what you propose?
@Alex: I was hoping that this patch goes in your direction. What do you
think?
Thanks,
Pierre
[1]: https://lore.kernel.org/patchwork/patch/1030369/
>
>>
>> This seems to me related with the work Shameer has started on
>> vfio_iommu_type1 so I add Alex and Shameer to the CC list.
>>
>>
>> Pierre Morel (1):
>> iommu/s390: Declare s390 iommu reserved regions
>>
>> drivers/iommu/s390-iommu.c | 29 +++++++++++++++++++++++++++++
>> 1 file changed, 29 insertions(+)
>>
>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
Hi Pierre,
On 18/01/2019 13:29, Pierre Morel wrote:
> On 17/01/2019 14:02, Robin Murphy wrote:
>> On 15/01/2019 17:37, Pierre Morel wrote:
>>> The s390 iommu can only allow DMA transactions between the zPCI device
>>> entries start_dma and end_dma.
>>>
>>> Let's declare the regions before start_dma and after end_dma as
>>> reserved regions using the appropriate callback in iommu_ops.
>>>
>>> The reserved region may later be retrieved from sysfs or from
>>> the vfio iommu internal interface.
>>
>> For this particular case, I think the best solution is to give VFIO
>> the ability to directly interrogate the domain geometry (which s390
>> appears to set correctly already). The idea of reserved regions was
>> really for 'unexpected' holes inside the usable address space - using
>> them to also describe places that are entirely outside that address
>> space rather confuses things IMO.
>>
>> Furthermore, even if we *did* end up going down the route of actively
>> reserving addresses beyond the usable aperture, it doesn't seem
>> sensible for individual drivers to do it themselves when the core API
>> already describes the relevant information generically.
>>
>> Robin.
>
> Robin,
>
> I already posted a patch retrieving the geometry through
> VFIO_IOMMU_GET_INFO using a specific capability for the geometry [1],
> and AFAIU, Alex did not agree with this.
On arm we also need to report the IOMMU geometry to userspace (max IOVA
size in particular). Shameer has been working on a solution [2] that
presents a unified view of both geometry and reserved regions into the
VFIO_IOMMU_GET_INFO call, and I think we should go with that. If I
understand correctly it's currently blocked on the RMRR problem and
we're waiting for Jacob or Ashok to take a look at it, as Kevin pinged
them on thread [1]?
[2] https://lkml.org/lkml/2018/4/18/293
Thanks,
Jean
>
> What is different in what you propose?
>
> @Alex: I was hoping that this patch goes in your direction. What do you
> think?
>
> Thanks,
> Pierre
>
> [1]: https://lore.kernel.org/patchwork/patch/1030369/
>
>>
>>>
>>> This seems to me related with the work Shameer has started on
>>> vfio_iommu_type1 so I add Alex and Shameer to the CC list.
>>>
>>>
>>> Pierre Morel (1):
>>> iommu/s390: Declare s390 iommu reserved regions
>>>
>>> drivers/iommu/s390-iommu.c | 29 +++++++++++++++++++++++++++++
>>> 1 file changed, 29 insertions(+)
>>>
>>
>
>
On 18/01/2019 14:51, Jean-Philippe Brucker wrote:
> Hi Pierre,
>
> On 18/01/2019 13:29, Pierre Morel wrote:
>> On 17/01/2019 14:02, Robin Murphy wrote:
>>> On 15/01/2019 17:37, Pierre Morel wrote:
>>>> The s390 iommu can only allow DMA transactions between the zPCI device
>>>> entries start_dma and end_dma.
>>>>
...
>>
>> I already posted a patch retrieving the geometry through
>> VFIO_IOMMU_GET_INFO using a specific capability for the geometry [1],
>> and AFAIU, Alex did not agree with this.
>
> On arm we also need to report the IOMMU geometry to userspace (max IOVA
> size in particular). Shameer has been working on a solution [2] that
> presents a unified view of both geometry and reserved regions into the
> VFIO_IOMMU_GET_INFO call, and I think we should go with that. If I
> understand correctly it's currently blocked on the RMRR problem and
> we're waiting for Jacob or Ashok to take a look at it, as Kevin pinged
> them on thread [1]?
>
> [2] https://lkml.org/lkml/2018/4/18/293
>
> Thanks,
> Jean
>
Hi Jean,
I hopped that this proposition went in the same direction based on the
following assumptions:
- The goal of the get_resv_region is defined in iommu.h as:
-----
* @get_resv_regions: Request list of reserved regions for a device
-----
- A iommu reserve region is a region which should not be mapped.
Isn't it exactly what happens outside the aperture?
Shouldn't it be reported by the iommu reserved region?
- If we use VFIO and want to get all reserved region we will have the
VFIO_IOMMU_GET_INFO call provided by Shameer and it can get all reserved
regions depending from the iommu driver itself at once by calling the
get_reserved_region callback instead of having to merge them with the
aperture.
- If there are other reserved region, depending on the system
configuration and not on the IOMMU itself, the VFIO_IOMMU_GET_INFO call
will have to merge them with the region gotten from the iommu driver.
- If we do not use QEMU nor VFIO at all, AFAIU, the standard way to
retrieve the reserved regions associated with a device is to call the
get_reserved_region callback from the associated iommu.
Please tell me were I am wrong.
Regards,
Pierre
>>
>> What is different in what you propose?
>>
>> @Alex: I was hoping that this patch goes in your direction. What do you
>> think?
>>
>> Thanks,
>> Pierre
>>
>> [1]: https://lore.kernel.org/patchwork/patch/1030369/
>>
>>>
>>>>
>>>> This seems to me related with the work Shameer has started on
>>>> vfio_iommu_type1 so I add Alex and Shameer to the CC list.
>>>>
>>>>
>>>> Pierre Morel (1):
>>>> iommu/s390: Declare s390 iommu reserved regions
>>>>
>>>> drivers/iommu/s390-iommu.c | 29 +++++++++++++++++++++++++++++
>>>> 1 file changed, 29 insertions(+)
>>>>
>>>
>>
>>
>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 18/01/2019 13:29, Pierre Morel wrote:
> On 17/01/2019 14:02, Robin Murphy wrote:
>> On 15/01/2019 17:37, Pierre Morel wrote:
>>> The s390 iommu can only allow DMA transactions between the zPCI device
>>> entries start_dma and end_dma.
>>>
>>> Let's declare the regions before start_dma and after end_dma as
>>> reserved regions using the appropriate callback in iommu_ops.
>>>
>>> The reserved region may later be retrieved from sysfs or from
>>> the vfio iommu internal interface.
>>
>> For this particular case, I think the best solution is to give VFIO
>> the ability to directly interrogate the domain geometry (which s390
>> appears to set correctly already). The idea of reserved regions was
>> really for 'unexpected' holes inside the usable address space - using
>> them to also describe places that are entirely outside that address
>> space rather confuses things IMO.
>>
>> Furthermore, even if we *did* end up going down the route of actively
>> reserving addresses beyond the usable aperture, it doesn't seem
>> sensible for individual drivers to do it themselves when the core API
>> already describes the relevant information generically.
>>
>> Robin.
>
> Robin,
>
> I already posted a patch retrieving the geometry through
> VFIO_IOMMU_GET_INFO using a specific capability for the geometry [1],
> and AFAIU, Alex did not agree with this.
>
> What is different in what you propose?
I didn't mean to imply that aperture and reserved regions are mutually
exclusive, just that they are conceptually distinct things, i.e. there
is a fundamental difference between "address which could in theory be
mapped but wouldn't work as expected" and "address which is physically
impossible to map at all".
Admittedly I hadn't closely followed all of the previous discussions,
and Alex has a fair point - for VFIO users who will mostly care about
checking whether two address maps are compatible, it probably is more
useful to just describe a single list of usable regions, rather than the
absolute bounds plus a list of unusable holes within them. That still
doesn't give us any need to conflate things throughout the kernel
internals, though - the typical usage there is to size an IOVA allocator
or page table based on the aperture, then carve out any necessary
reservations. In that context, having to be aware of and handle
'impossible' reservations outside the aperture just invites bugs and
adds complexity that would be better avoided.
Robin.
>
> @Alex: I was hoping that this patch goes in your direction. What do you
> think?
>
> Thanks,
> Pierre
>
> [1]: https://lore.kernel.org/patchwork/patch/1030369/
>
>>
>>>
>>> This seems to me related with the work Shameer has started on
>>> vfio_iommu_type1 so I add Alex and Shameer to the CC list.
>>>
>>>
>>> Pierre Morel (1):
>>> iommu/s390: Declare s390 iommu reserved regions
>>>
>>> drivers/iommu/s390-iommu.c | 29 +++++++++++++++++++++++++++++
>>> 1 file changed, 29 insertions(+)
>>>
>>
>
>
On 21/01/2019 11:51, Pierre Morel wrote:
> On 18/01/2019 14:51, Jean-Philippe Brucker wrote:
>> Hi Pierre,
>>
>> On 18/01/2019 13:29, Pierre Morel wrote:
>>> On 17/01/2019 14:02, Robin Murphy wrote:
>>>> On 15/01/2019 17:37, Pierre Morel wrote:
>>>>> The s390 iommu can only allow DMA transactions between the zPCI device
>>>>> entries start_dma and end_dma.
>>>>>
>
> ...
>
>>>
>>> I already posted a patch retrieving the geometry through
>>> VFIO_IOMMU_GET_INFO using a specific capability for the geometry [1],
>>> and AFAIU, Alex did not agree with this.
>>
>> On arm we also need to report the IOMMU geometry to userspace (max IOVA
>> size in particular). Shameer has been working on a solution [2] that
>> presents a unified view of both geometry and reserved regions into the
>> VFIO_IOMMU_GET_INFO call, and I think we should go with that. If I
>> understand correctly it's currently blocked on the RMRR problem and
>> we're waiting for Jacob or Ashok to take a look at it, as Kevin pinged
>> them on thread [1]?
>>
>> [2] https://lkml.org/lkml/2018/4/18/293
>>
>> Thanks,
>> Jean
>>
>
> Hi Jean,
>
> I hopped that this proposition went in the same direction based on the
> following assumptions:
>
>
> - The goal of the get_resv_region is defined in iommu.h as:
> -----
> * @get_resv_regions: Request list of reserved regions for a device
> -----
>
> - A iommu reserve region is a region which should not be mapped.
> Isn't it exactly what happens outside the aperture?
> Shouldn't it be reported by the iommu reserved region?
>
> - If we use VFIO and want to get all reserved region we will have the
> VFIO_IOMMU_GET_INFO call provided by Shameer and it can get all reserved
> regions depending from the iommu driver itself at once by calling the
> get_reserved_region callback instead of having to merge them with the
> aperture.
>
> - If there are other reserved region, depending on the system
> configuration and not on the IOMMU itself, the VFIO_IOMMU_GET_INFO call
> will have to merge them with the region gotten from the iommu driver.
I guess that would only happen if VFIO wanted to reserve IOVA regions
for its own use. But I don't see how that relates to the aperture
> - If we do not use QEMU nor VFIO at all, AFAIU, the standard way to
> retrieve the reserved regions associated with a device is to call the
> get_reserved_region callback from the associated iommu.
>
> Please tell me were I am wrong.
Those are good points in my opinion (assuming the new reserved regions
for aperture is done in the core API)
To be reliable though, userspace can't get the aperture from sysfs
reserved_regions, it will have to get it from VFIO. If a new application
expects the aperture to be described in reserved_regions but is running
under an old kernel, it will just assume a 64-bit aperture by mistake.
Unless we introduce a new IOMMU_RESV_* type to distinguish the aperture?
Another thing to note is that we're currently adding nested support to
VFIO for Arm and x86 archs. It requires providing low-level and
sometimes arch-specific information about the IOMMU to userspace,
information that is easier to describe using sysfs (e.g.
/sys/class/iommu/<iommu>/*) than a VFIO ioctl. Among them is the input
address size of the IOMMU, so we'll end up with redundant information in
sysfs on x86 and Arm sides, but that's probably harmless.
Thanks,
Jean
On Mon, 21 Jan 2019 12:51:14 +0100
Pierre Morel <[email protected]> wrote:
> On 18/01/2019 14:51, Jean-Philippe Brucker wrote:
> > Hi Pierre,
> >
> > On 18/01/2019 13:29, Pierre Morel wrote:
> >> On 17/01/2019 14:02, Robin Murphy wrote:
> >>> On 15/01/2019 17:37, Pierre Morel wrote:
> >>>> The s390 iommu can only allow DMA transactions between the zPCI device
> >>>> entries start_dma and end_dma.
> >>>>
>
> ...
>
> >>
> >> I already posted a patch retrieving the geometry through
> >> VFIO_IOMMU_GET_INFO using a specific capability for the geometry [1],
> >> and AFAIU, Alex did not agree with this.
> >
> > On arm we also need to report the IOMMU geometry to userspace (max IOVA
> > size in particular). Shameer has been working on a solution [2] that
> > presents a unified view of both geometry and reserved regions into the
> > VFIO_IOMMU_GET_INFO call, and I think we should go with that. If I
> > understand correctly it's currently blocked on the RMRR problem and
> > we're waiting for Jacob or Ashok to take a look at it, as Kevin pinged
> > them on thread [1]?
> >
> > [2] https://lkml.org/lkml/2018/4/18/293
> >
> > Thanks,
> > Jean
> >
>
> Hi Jean,
>
> I hopped that this proposition went in the same direction based on the
> following assumptions:
>
>
> - The goal of the get_resv_region is defined in iommu.h as:
> -----
> * @get_resv_regions: Request list of reserved regions for a device
> -----
>
> - A iommu reserve region is a region which should not be mapped.
> Isn't it exactly what happens outside the aperture?
> Shouldn't it be reported by the iommu reserved region?
>
> - If we use VFIO and want to get all reserved region we will have the
> VFIO_IOMMU_GET_INFO call provided by Shameer and it can get all reserved
> regions depending from the iommu driver itself at once by calling the
> get_reserved_region callback instead of having to merge them with the
> aperture.
>
> - If there are other reserved region, depending on the system
> configuration and not on the IOMMU itself, the VFIO_IOMMU_GET_INFO call
> will have to merge them with the region gotten from the iommu driver.
>
> - If we do not use QEMU nor VFIO at all, AFAIU, the standard way to
> retrieve the reserved regions associated with a device is to call the
> get_reserved_region callback from the associated iommu.
>
> Please tell me were I am wrong.
The existing proposal by Shameer exposes an IOVA list, which includes
ranges that the user _can_ map through the IOMMU, therefore this
original patch is unnecessary, the IOMMU geometry is already the
starting point of the IOVA list, creating the original node, which is
split as necessary to account for the reserved regions. To me,
presenting a user interface that exposes ranges that _cannot_ be mapped
is a strange interface. If we started from a position of what
information do we want to provide to the user and how will they consume
it, would we present a list of reserved ranges? I think the only
argument for the negative ranges is that we already have them in the
kernel, but I don't see that it necessarily makes them the right
solution for userspace.
> >> What is different in what you propose?
> >>
> >> @Alex: I was hoping that this patch goes in your direction. What do you
> >> think?
I think it's unnecessary given that in Shameer's proposal:
- We start from the IOMMU exposed geometry
- We present a list of usable IOVA ranges, not a list of reserved
ranges
Thanks,
Alex