On 30/11/2017 19:30, Alex Williamson wrote:
> On Thu, 30 Nov 2017 16:11:35 +0100
> Pierre Morel <[email protected]> wrote:
>
>> On 30/11/2017 15:08, Alex Williamson wrote:
>>> On Thu, 30 Nov 2017 12:34:38 +0100
>>> Pierre Morel <[email protected]> wrote:
>>>
>>>> When userland VFIO defines a new IOMMU for a guest it may
>>>> want to specify to the guest the physical limits of
>>>> the underlying host IOMMU to avoid access to forbidden
>>>> memory ranges.
>>>>
>>>> Currently, the vfio_iommu_type1 driver does not report this
>>>> information to userland.
>>>>
>>>> Let's extend the vfio_iommu_type1_info structure reported
>>>> by the ioctl VFIO_IOMMU_GET_INFO command to report the
>>>> IOMMU limits as new uint64_t entries aperture_start and
>>>> aperture_end.
>>>>
>>>> Let's also extend the flags bit map to add a flag specifying
>>>> if this extension of the info structure is reported or not.
>>>>
>>>> Signed-off-by: Pierre Morel <[email protected]>
>>>> ---
>>>> drivers/vfio/vfio_iommu_type1.c | 42 +++++++++++++++++++++++++++++++++++++++++
>>>> include/uapi/linux/vfio.h | 3 +++
>>>> 2 files changed, 45 insertions(+)
>>>>
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>> index 8549cb1..7da5fe0 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -1526,6 +1526,40 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>>>> return ret;
>>>> }
>>>>
>>>> +/**
>>>> + * vfio_get_aperture - report minimal aperture of a vfio_iommu
>>>> + * @iommu: the current vfio_iommu
>>>> + * @start: a pointer to the aperture start
>>>> + * @end : a pointer to the aperture end
>>>> + *
>>>> + * This function iterate on the domains using the given vfio_iommu
>>>> + * and restrict the aperture to the minimal aperture common
>>>> + * to all domains sharing this vfio_iommu.
>>>> + */
>>>> +static void vfio_get_aperture(struct vfio_iommu *iommu, uint64_t *start,
>>>> + uint64_t *end)
>>>> +{
>>>> + struct iommu_domain_geometry geometry;
>>>> + struct vfio_domain *domain;
>>>> +
>>>> + *start = 0;
>>>> + *end = U64_MAX;
>>>> +
>>>> + mutex_lock(&iommu->lock);
>>>> + /* loop on all domains using this vfio_iommu */
>>>> + list_for_each_entry(domain, &iommu->domain_list, next) {
>>>> + iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
>>>> + &geometry);
>>>> + if (geometry.force_aperture) {
>>>> + if (geometry.aperture_start > *start)
>>>> + *start = geometry.aperture_start;
>>>> + if (geometry.aperture_end < *end)
>>>> + *end = geometry.aperture_end;
>>>> + }
>>>> + }
>>>> + mutex_unlock(&iommu->lock);
>>>> +}
>>>> +
>>>> static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>> unsigned int cmd, unsigned long arg)
>>>> {
>>>> @@ -1560,6 +1594,14 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>>
>>>> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>>>>
>>>> + minsz = min_t(size_t, info.argsz, sizeof(info));
>>>> + if (minsz >= offsetofend(struct vfio_iommu_type1_info,
>>>> + aperture_end)) {
>>>> + info.flags |= VFIO_IOMMU_INFO_APERTURE;
>>>> + vfio_get_aperture(iommu, &info.aperture_start,
>>>> + &info.aperture_end);
>>>> + }
>>>> +
>>>> return copy_to_user((void __user *)arg, &info, minsz) ?
>>>> -EFAULT : 0;
>>>>
>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>> index 0fb25fb..780d909 100644
>>>> --- a/include/uapi/linux/vfio.h
>>>> +++ b/include/uapi/linux/vfio.h
>>>> @@ -519,6 +519,9 @@ struct vfio_iommu_type1_info {
>>>> __u32 flags;
>>>> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
>>>> __u64 iova_pgsizes; /* Bitmap of supported page sizes */
>>>> +#define VFIO_IOMMU_INFO_APERTURE (1 << 1) /* supported aperture info */
>>>> + __u64 aperture_start; /* start of DMA aperture */
>>>> + __u64 aperture_end; /* end of DMA aperture */
>>>> };
>>>>
>>>> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>>>
>>> This only supports the most simple topology, even x86 cannot claim to
>>> have a single contiguous aperture, it's typically bisected by an MSI
>>> window. I think we need an API that supports one or more apertures
>>> out of the box. Also as Eric indicates, a capability is probably the
>>> better option for creating a flexible structure. Thanks,
>>>
>>> Alex
>>>
>>
>>
>> Yes, I understand that a capability here is a must, I will follow this way.
>>
>> For having multiple aperture and MSI protection, I understood it was
>> done using windows and reserved regions.
>> Can you point me to my error?
>
> See the thread from Huawei, I don't think that's a solved problem:
>
> https://lists.gnu.org/archive/html/qemu-arm/2017-11/msg00237.html
>
> If you want sysfs to be consumed separately by the user and fed into
> new QEMU command line options for creating a VM layout, perhaps that's
> sufficient, but I think the vfio api for the iommu should encompass
> describing available ranges of mappable iova space without cobbling
> together arbitrary info from sysfs. Thanks,
>
> Alex
>
Hi Alex,
I resume to see if I understood you well:
We may have physical IOMMUs with a more complex access that can not be
specified by only defining the start and end of a read/write region.
Windows can be used to reserve regions for the VM but it is not what we
want. What we want is to know what the host can offer which is a mix of
aperture and windows.
To report this we can use capabilities in a positive way, describing
what the host offers not what it can not provide.
To achieve this we have to use two interfaces:
- VFIO user interface with VFIO_IOMMU_GET_INFO and capabilities
- Physical IOMMU interface with both geometry and window iommu_ops
callbacks.
If it is sufficiently near from what you thought I will provide a new
version in this direction.
Regards,
Pierre
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On Fri, 1 Dec 2017 10:38:07 +0100
Pierre Morel <[email protected]> wrote:
> On 30/11/2017 19:30, Alex Williamson wrote:
> > On Thu, 30 Nov 2017 16:11:35 +0100
> > Pierre Morel <[email protected]> wrote:
> >
> >> On 30/11/2017 15:08, Alex Williamson wrote:
> >>> On Thu, 30 Nov 2017 12:34:38 +0100
> >>> Pierre Morel <[email protected]> wrote:
> >>>
> >>>> When userland VFIO defines a new IOMMU for a guest it may
> >>>> want to specify to the guest the physical limits of
> >>>> the underlying host IOMMU to avoid access to forbidden
> >>>> memory ranges.
> >>>>
> >>>> Currently, the vfio_iommu_type1 driver does not report this
> >>>> information to userland.
> >>>>
> >>>> Let's extend the vfio_iommu_type1_info structure reported
> >>>> by the ioctl VFIO_IOMMU_GET_INFO command to report the
> >>>> IOMMU limits as new uint64_t entries aperture_start and
> >>>> aperture_end.
> >>>>
> >>>> Let's also extend the flags bit map to add a flag specifying
> >>>> if this extension of the info structure is reported or not.
> >>>>
> >>>> Signed-off-by: Pierre Morel <[email protected]>
> >>>> ---
> >>>> drivers/vfio/vfio_iommu_type1.c | 42 +++++++++++++++++++++++++++++++++++++++++
> >>>> include/uapi/linux/vfio.h | 3 +++
> >>>> 2 files changed, 45 insertions(+)
> >>>>
> >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>>> index 8549cb1..7da5fe0 100644
> >>>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>>> @@ -1526,6 +1526,40 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> >>>> return ret;
> >>>> }
> >>>>
> >>>> +/**
> >>>> + * vfio_get_aperture - report minimal aperture of a vfio_iommu
> >>>> + * @iommu: the current vfio_iommu
> >>>> + * @start: a pointer to the aperture start
> >>>> + * @end : a pointer to the aperture end
> >>>> + *
> >>>> + * This function iterate on the domains using the given vfio_iommu
> >>>> + * and restrict the aperture to the minimal aperture common
> >>>> + * to all domains sharing this vfio_iommu.
> >>>> + */
> >>>> +static void vfio_get_aperture(struct vfio_iommu *iommu, uint64_t *start,
> >>>> + uint64_t *end)
> >>>> +{
> >>>> + struct iommu_domain_geometry geometry;
> >>>> + struct vfio_domain *domain;
> >>>> +
> >>>> + *start = 0;
> >>>> + *end = U64_MAX;
> >>>> +
> >>>> + mutex_lock(&iommu->lock);
> >>>> + /* loop on all domains using this vfio_iommu */
> >>>> + list_for_each_entry(domain, &iommu->domain_list, next) {
> >>>> + iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
> >>>> + &geometry);
> >>>> + if (geometry.force_aperture) {
> >>>> + if (geometry.aperture_start > *start)
> >>>> + *start = geometry.aperture_start;
> >>>> + if (geometry.aperture_end < *end)
> >>>> + *end = geometry.aperture_end;
> >>>> + }
> >>>> + }
> >>>> + mutex_unlock(&iommu->lock);
> >>>> +}
> >>>> +
> >>>> static long vfio_iommu_type1_ioctl(void *iommu_data,
> >>>> unsigned int cmd, unsigned long arg)
> >>>> {
> >>>> @@ -1560,6 +1594,14 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> >>>>
> >>>> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
> >>>>
> >>>> + minsz = min_t(size_t, info.argsz, sizeof(info));
> >>>> + if (minsz >= offsetofend(struct vfio_iommu_type1_info,
> >>>> + aperture_end)) {
> >>>> + info.flags |= VFIO_IOMMU_INFO_APERTURE;
> >>>> + vfio_get_aperture(iommu, &info.aperture_start,
> >>>> + &info.aperture_end);
> >>>> + }
> >>>> +
> >>>> return copy_to_user((void __user *)arg, &info, minsz) ?
> >>>> -EFAULT : 0;
> >>>>
> >>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>>> index 0fb25fb..780d909 100644
> >>>> --- a/include/uapi/linux/vfio.h
> >>>> +++ b/include/uapi/linux/vfio.h
> >>>> @@ -519,6 +519,9 @@ struct vfio_iommu_type1_info {
> >>>> __u32 flags;
> >>>> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
> >>>> __u64 iova_pgsizes; /* Bitmap of supported page sizes */
> >>>> +#define VFIO_IOMMU_INFO_APERTURE (1 << 1) /* supported aperture info */
> >>>> + __u64 aperture_start; /* start of DMA aperture */
> >>>> + __u64 aperture_end; /* end of DMA aperture */
> >>>> };
> >>>>
> >>>> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> >>>
> >>> This only supports the most simple topology, even x86 cannot claim to
> >>> have a single contiguous aperture, it's typically bisected by an MSI
> >>> window. I think we need an API that supports one or more apertures
> >>> out of the box. Also as Eric indicates, a capability is probably the
> >>> better option for creating a flexible structure. Thanks,
> >>>
> >>> Alex
> >>>
> >>
> >>
> >> Yes, I understand that a capability here is a must, I will follow this way.
> >>
> >> For having multiple aperture and MSI protection, I understood it was
> >> done using windows and reserved regions.
> >> Can you point me to my error?
> >
> > See the thread from Huawei, I don't think that's a solved problem:
> >
> > https://lists.gnu.org/archive/html/qemu-arm/2017-11/msg00237.html
> >
> > If you want sysfs to be consumed separately by the user and fed into
> > new QEMU command line options for creating a VM layout, perhaps that's
> > sufficient, but I think the vfio api for the iommu should encompass
> > describing available ranges of mappable iova space without cobbling
> > together arbitrary info from sysfs. Thanks,
> >
> > Alex
> >
>
> Hi Alex,
>
> I resume to see if I understood you well:
>
> We may have physical IOMMUs with a more complex access that can not be
> specified by only defining the start and end of a read/write region.
>
> Windows can be used to reserve regions for the VM but it is not what we
> want. What we want is to know what the host can offer which is a mix of
> aperture and windows.
>
> To report this we can use capabilities in a positive way, describing
> what the host offers not what it can not provide.
>
> To achieve this we have to use two interfaces:
> - VFIO user interface with VFIO_IOMMU_GET_INFO and capabilities
> - Physical IOMMU interface with both geometry and window iommu_ops
> callbacks.
>
> If it is sufficiently near from what you thought I will provide a new
> version in this direction.
I believe so. VFIO would construct a set of mappable iova
regions/windows using information provided via the IOMMU API via
iommu_ops and expose this via a new capability supporting multiple such
regions via the VFIO_IOMMU_GET_INFO ioctl. This ioctl would be
extended to support capabilities in the same way we've done so for
other vfio ioctls. Thanks,
Alex
On 01/12/2017 17:22, Alex Williamson wrote:
> On Fri, 1 Dec 2017 10:38:07 +0100
> Pierre Morel <[email protected]> wrote:
>
>> On 30/11/2017 19:30, Alex Williamson wrote:
>>> On Thu, 30 Nov 2017 16:11:35 +0100
>>> Pierre Morel <[email protected]> wrote:
>>>
>>>> On 30/11/2017 15:08, Alex Williamson wrote:
>>>>> On Thu, 30 Nov 2017 12:34:38 +0100
>>>>> Pierre Morel <[email protected]> wrote:
>>>>>
>>>>>> When userland VFIO defines a new IOMMU for a guest it may
>>>>>> want to specify to the guest the physical limits of
>>>>>> the underlying host IOMMU to avoid access to forbidden
>>>>>> memory ranges.
>>>>>>
>>>>>> Currently, the vfio_iommu_type1 driver does not report this
>>>>>> information to userland.
>>>>>>
>>>>>> Let's extend the vfio_iommu_type1_info structure reported
>>>>>> by the ioctl VFIO_IOMMU_GET_INFO command to report the
>>>>>> IOMMU limits as new uint64_t entries aperture_start and
>>>>>> aperture_end.
>>>>>>
>>>>>> Let's also extend the flags bit map to add a flag specifying
>>>>>> if this extension of the info structure is reported or not.
>>>>>>
>>>>>> Signed-off-by: Pierre Morel <[email protected]>
>>>>>> ---
>>>>>> drivers/vfio/vfio_iommu_type1.c | 42 +++++++++++++++++++++++++++++++++++++++++
>>>>>> include/uapi/linux/vfio.h | 3 +++
>>>>>> 2 files changed, 45 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>>>> index 8549cb1..7da5fe0 100644
>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>> @@ -1526,6 +1526,40 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>>>>>> return ret;
>>>>>> }
>>>>>>
>>>>>> +/**
>>>>>> + * vfio_get_aperture - report minimal aperture of a vfio_iommu
>>>>>> + * @iommu: the current vfio_iommu
>>>>>> + * @start: a pointer to the aperture start
>>>>>> + * @end : a pointer to the aperture end
>>>>>> + *
>>>>>> + * This function iterate on the domains using the given vfio_iommu
>>>>>> + * and restrict the aperture to the minimal aperture common
>>>>>> + * to all domains sharing this vfio_iommu.
>>>>>> + */
>>>>>> +static void vfio_get_aperture(struct vfio_iommu *iommu, uint64_t *start,
>>>>>> + uint64_t *end)
>>>>>> +{
>>>>>> + struct iommu_domain_geometry geometry;
>>>>>> + struct vfio_domain *domain;
>>>>>> +
>>>>>> + *start = 0;
>>>>>> + *end = U64_MAX;
>>>>>> +
>>>>>> + mutex_lock(&iommu->lock);
>>>>>> + /* loop on all domains using this vfio_iommu */
>>>>>> + list_for_each_entry(domain, &iommu->domain_list, next) {
>>>>>> + iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
>>>>>> + &geometry);
>>>>>> + if (geometry.force_aperture) {
>>>>>> + if (geometry.aperture_start > *start)
>>>>>> + *start = geometry.aperture_start;
>>>>>> + if (geometry.aperture_end < *end)
>>>>>> + *end = geometry.aperture_end;
>>>>>> + }
>>>>>> + }
>>>>>> + mutex_unlock(&iommu->lock);
>>>>>> +}
>>>>>> +
>>>>>> static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>>>> unsigned int cmd, unsigned long arg)
>>>>>> {
>>>>>> @@ -1560,6 +1594,14 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>>>>
>>>>>> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>>>>>>
>>>>>> + minsz = min_t(size_t, info.argsz, sizeof(info));
>>>>>> + if (minsz >= offsetofend(struct vfio_iommu_type1_info,
>>>>>> + aperture_end)) {
>>>>>> + info.flags |= VFIO_IOMMU_INFO_APERTURE;
>>>>>> + vfio_get_aperture(iommu, &info.aperture_start,
>>>>>> + &info.aperture_end);
>>>>>> + }
>>>>>> +
>>>>>> return copy_to_user((void __user *)arg, &info, minsz) ?
>>>>>> -EFAULT : 0;
>>>>>>
>>>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>>>> index 0fb25fb..780d909 100644
>>>>>> --- a/include/uapi/linux/vfio.h
>>>>>> +++ b/include/uapi/linux/vfio.h
>>>>>> @@ -519,6 +519,9 @@ struct vfio_iommu_type1_info {
>>>>>> __u32 flags;
>>>>>> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
>>>>>> __u64 iova_pgsizes; /* Bitmap of supported page sizes */
>>>>>> +#define VFIO_IOMMU_INFO_APERTURE (1 << 1) /* supported aperture info */
>>>>>> + __u64 aperture_start; /* start of DMA aperture */
>>>>>> + __u64 aperture_end; /* end of DMA aperture */
>>>>>> };
>>>>>>
>>>>>> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>>>>>
>>>>> This only supports the most simple topology, even x86 cannot claim to
>>>>> have a single contiguous aperture, it's typically bisected by an MSI
>>>>> window. I think we need an API that supports one or more apertures
>>>>> out of the box. Also as Eric indicates, a capability is probably the
>>>>> better option for creating a flexible structure. Thanks,
>>>>>
>>>>> Alex
>>>>>
>>>>
>>>>
>>>> Yes, I understand that a capability here is a must, I will follow this way.
>>>>
>>>> For having multiple aperture and MSI protection, I understood it was
>>>> done using windows and reserved regions.
>>>> Can you point me to my error?
>>>
>>> See the thread from Huawei, I don't think that's a solved problem:
>>>
>>> https://lists.gnu.org/archive/html/qemu-arm/2017-11/msg00237.html
>>>
>>> If you want sysfs to be consumed separately by the user and fed into
>>> new QEMU command line options for creating a VM layout, perhaps that's
>>> sufficient, but I think the vfio api for the iommu should encompass
>>> describing available ranges of mappable iova space without cobbling
>>> together arbitrary info from sysfs. Thanks,
>>>
>>> Alex
>>>
>>
>> Hi Alex,
>>
>> I resume to see if I understood you well:
>>
>> We may have physical IOMMUs with a more complex access that can not be
>> specified by only defining the start and end of a read/write region.
>>
>> Windows can be used to reserve regions for the VM but it is not what we
>> want. What we want is to know what the host can offer which is a mix of
>> aperture and windows.
>>
>> To report this we can use capabilities in a positive way, describing
>> what the host offers not what it can not provide.
>>
>> To achieve this we have to use two interfaces:
>> - VFIO user interface with VFIO_IOMMU_GET_INFO and capabilities
>> - Physical IOMMU interface with both geometry and window iommu_ops
>> callbacks.
>>
>> If it is sufficiently near from what you thought I will provide a new
>> version in this direction.
>
> I believe so. VFIO would construct a set of mappable iova
> regions/windows using information provided via the IOMMU API via
> iommu_ops and expose this via a new capability supporting multiple such
> regions via the VFIO_IOMMU_GET_INFO ioctl. This ioctl would be
> extended to support capabilities in the same way we've done so for
> other vfio ioctls. Thanks,
>
> Alex
>
Hi Alex,
Thanks, I go this way.
Pierre
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany