2020-03-22 12:27:34

by Yi Liu

[permalink] [raw]
Subject: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

From: Liu Yi L <[email protected]>

VFIO exposes IOMMU nesting translation (a.k.a dual stage translation)
capability to userspace. Thus applications like QEMU could support
vIOMMU with hardware's nesting translation capability for pass-through
devices. Before setting up nesting translation for pass-through devices,
QEMU and other applications need to learn the supported 1st-lvl/stage-1
translation structure format like page table format.

Take vSVA (virtual Shared Virtual Addressing) as an example, to support
vSVA for pass-through devices, QEMU setup nesting translation for pass-
through devices. The guest page table are configured to host as 1st-lvl/
stage-1 page table. Therefore, guest format should be compatible with
host side.

This patch reports the supported 1st-lvl/stage-1 page table format on the
current platform to userspace. QEMU and other alike applications should
use this format info when trying to setup IOMMU nesting translation on
host IOMMU.

Cc: Kevin Tian <[email protected]>
CC: Jacob Pan <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 56 +++++++++++++++++++++++++++++++++++++++++
include/uapi/linux/vfio.h | 1 +
2 files changed, 57 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9aa2a67..82a9e0b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
return ret;
}

+static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
+ u32 *stage1_format)
+{
+ struct vfio_domain *domain;
+ u32 format = 0, tmp_format = 0;
+ int ret;
+
+ mutex_lock(&iommu->lock);
+ if (list_empty(&iommu->domain_list)) {
+ mutex_unlock(&iommu->lock);
+ return -EINVAL;
+ }
+
+ list_for_each_entry(domain, &iommu->domain_list, next) {
+ if (iommu_domain_get_attr(domain->domain,
+ DOMAIN_ATTR_PASID_FORMAT, &format)) {
+ ret = -EINVAL;
+ format = 0;
+ goto out_unlock;
+ }
+ /*
+ * format is always non-zero (the first format is
+ * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For
+ * the reason of potential different backed IOMMU
+ * formats, here we expect to have identical formats
+ * in the domain list, no mixed formats support.
+ * return -EINVAL to fail the attempt of setup
+ * VFIO_TYPE1_NESTING_IOMMU if non-identical formats
+ * are detected.
+ */
+ if (tmp_format && tmp_format != format) {
+ ret = -EINVAL;
+ format = 0;
+ goto out_unlock;
+ }
+
+ tmp_format = format;
+ }
+ ret = 0;
+
+out_unlock:
+ if (format)
+ *stage1_format = format;
+ mutex_unlock(&iommu->lock);
+ return ret;
+}
+
static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
struct vfio_info_cap *caps)
{
struct vfio_info_cap_header *header;
struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
+ u32 formats = 0;
+ int ret;
+
+ ret = vfio_iommu_get_stage1_format(iommu, &formats);
+ if (ret) {
+ pr_warn("Failed to get stage-1 format\n");
+ return ret;
+ }

header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
@@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
/* nesting iommu type supports PASID requests (alloc/free) */
nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
}
+ nesting_cap->stage1_formats = formats;

return 0;
}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index ed9881d..ebeaf3e 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting {
struct vfio_info_cap_header header;
#define VFIO_IOMMU_PASID_REQS (1 << 0)
__u32 nesting_capabilities;
+ __u32 stage1_formats;
};

#define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
--
2.7.4


2020-03-22 16:46:59

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

Hi Yi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vfio/next]
[also build test ERROR on v5.6-rc6 next-20200320]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Liu-Yi-L/vfio-expose-virtual-Shared-Virtual-Addressing-to-VMs/20200322-213259
base: https://github.com/awilliam/linux-vfio.git next
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=9.2.0 make.cross ARCH=arm64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

All errors (new ones prefixed by >>):

drivers/vfio/vfio_iommu_type1.c: In function 'vfio_iommu_get_stage1_format':
>> drivers/vfio/vfio_iommu_type1.c:2273:4: error: 'DOMAIN_ATTR_PASID_FORMAT' undeclared (first use in this function)
2273 | DOMAIN_ATTR_PASID_FORMAT, &format)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~
drivers/vfio/vfio_iommu_type1.c:2273:4: note: each undeclared identifier is reported only once for each function it appears in
drivers/vfio/vfio_iommu_type1.c: In function 'vfio_iommu_type1_ioctl':
drivers/vfio/vfio_iommu_type1.c:2355:11: error: implicit declaration of function 'iommu_get_uapi_version' [-Werror=implicit-function-declaration]
2355 | return iommu_get_uapi_version();
| ^~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors

vim +/DOMAIN_ATTR_PASID_FORMAT +2273 drivers/vfio/vfio_iommu_type1.c

2257
2258 static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
2259 u32 *stage1_format)
2260 {
2261 struct vfio_domain *domain;
2262 u32 format = 0, tmp_format = 0;
2263 int ret;
2264
2265 mutex_lock(&iommu->lock);
2266 if (list_empty(&iommu->domain_list)) {
2267 mutex_unlock(&iommu->lock);
2268 return -EINVAL;
2269 }
2270
2271 list_for_each_entry(domain, &iommu->domain_list, next) {
2272 if (iommu_domain_get_attr(domain->domain,
> 2273 DOMAIN_ATTR_PASID_FORMAT, &format)) {
2274 ret = -EINVAL;
2275 format = 0;
2276 goto out_unlock;
2277 }
2278 /*
2279 * format is always non-zero (the first format is
2280 * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For
2281 * the reason of potential different backed IOMMU
2282 * formats, here we expect to have identical formats
2283 * in the domain list, no mixed formats support.
2284 * return -EINVAL to fail the attempt of setup
2285 * VFIO_TYPE1_NESTING_IOMMU if non-identical formats
2286 * are detected.
2287 */
2288 if (tmp_format && tmp_format != format) {
2289 ret = -EINVAL;
2290 format = 0;
2291 goto out_unlock;
2292 }
2293
2294 tmp_format = format;
2295 }
2296 ret = 0;
2297
2298 out_unlock:
2299 if (format)
2300 *stage1_format = format;
2301 mutex_unlock(&iommu->lock);
2302 return ret;
2303 }
2304

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (3.47 kB)
.config.gz (45.38 kB)
Download all attachments

2020-03-30 12:14:43

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

> From: Liu, Yi L <[email protected]>
> Sent: Sunday, March 22, 2020 8:32 PM
>
> From: Liu Yi L <[email protected]>
>
> VFIO exposes IOMMU nesting translation (a.k.a dual stage translation)
> capability to userspace. Thus applications like QEMU could support
> vIOMMU with hardware's nesting translation capability for pass-through
> devices. Before setting up nesting translation for pass-through devices,
> QEMU and other applications need to learn the supported 1st-lvl/stage-1
> translation structure format like page table format.
>
> Take vSVA (virtual Shared Virtual Addressing) as an example, to support
> vSVA for pass-through devices, QEMU setup nesting translation for pass-
> through devices. The guest page table are configured to host as 1st-lvl/
> stage-1 page table. Therefore, guest format should be compatible with
> host side.
>
> This patch reports the supported 1st-lvl/stage-1 page table format on the
> current platform to userspace. QEMU and other alike applications should
> use this format info when trying to setup IOMMU nesting translation on
> host IOMMU.
>
> Cc: Kevin Tian <[email protected]>
> CC: Jacob Pan <[email protected]>
> Cc: Alex Williamson <[email protected]>
> Cc: Eric Auger <[email protected]>
> Cc: Jean-Philippe Brucker <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> ---
> drivers/vfio/vfio_iommu_type1.c | 56
> +++++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 1 +
> 2 files changed, 57 insertions(+)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c
> index 9aa2a67..82a9e0b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct
> vfio_iommu *iommu,
> return ret;
> }
>
> +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
> + u32 *stage1_format)
> +{
> + struct vfio_domain *domain;
> + u32 format = 0, tmp_format = 0;
> + int ret;
> +
> + mutex_lock(&iommu->lock);
> + if (list_empty(&iommu->domain_list)) {
> + mutex_unlock(&iommu->lock);
> + return -EINVAL;
> + }
> +
> + list_for_each_entry(domain, &iommu->domain_list, next) {
> + if (iommu_domain_get_attr(domain->domain,
> + DOMAIN_ATTR_PASID_FORMAT, &format)) {
> + ret = -EINVAL;
> + format = 0;
> + goto out_unlock;
> + }
> + /*
> + * format is always non-zero (the first format is
> + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For
> + * the reason of potential different backed IOMMU
> + * formats, here we expect to have identical formats
> + * in the domain list, no mixed formats support.
> + * return -EINVAL to fail the attempt of setup
> + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats
> + * are detected.
> + */
> + if (tmp_format && tmp_format != format) {
> + ret = -EINVAL;
> + format = 0;
> + goto out_unlock;
> + }
> +
> + tmp_format = format;
> + }

this path is invoked only in VFIO_IOMMU_GET_INFO path. If we don't
want to assume the status quo that one container holds only one
device w/ vIOMMU (the prerequisite for vSVA), looks we also need
check the format compatibility when attaching a new group to this
container?

> + ret = 0;
> +
> +out_unlock:
> + if (format)
> + *stage1_format = format;
> + mutex_unlock(&iommu->lock);
> + return ret;
> +}
> +
> static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> struct vfio_info_cap *caps)
> {
> struct vfio_info_cap_header *header;
> struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> + u32 formats = 0;
> + int ret;
> +
> + ret = vfio_iommu_get_stage1_format(iommu, &formats);
> + if (ret) {
> + pr_warn("Failed to get stage-1 format\n");
> + return ret;
> + }
>
> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING,
> 1);
> @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct
> vfio_iommu *iommu,
> /* nesting iommu type supports PASID requests (alloc/free)
> */
> nesting_cap->nesting_capabilities |=
> VFIO_IOMMU_PASID_REQS;
> }
> + nesting_cap->stage1_formats = formats;
>
> return 0;
> }
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index ed9881d..ebeaf3e 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting {
> struct vfio_info_cap_header header;
> #define VFIO_IOMMU_PASID_REQS (1 << 0)
> __u32 nesting_capabilities;
> + __u32 stage1_formats;

do you plan to support multiple formats? If not, use singular name.

> };
>
> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> --
> 2.7.4

2020-04-01 07:39:46

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

> From: Tian, Kevin <[email protected]>
> Sent: Monday, March 30, 2020 7:49 PM
> To: Liu, Yi L <[email protected]>; [email protected];
> Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
>
> > From: Liu, Yi L <[email protected]>
> > Sent: Sunday, March 22, 2020 8:32 PM
> >
> > From: Liu Yi L <[email protected]>
> >
> > VFIO exposes IOMMU nesting translation (a.k.a dual stage translation)
> > capability to userspace. Thus applications like QEMU could support
> > vIOMMU with hardware's nesting translation capability for pass-through
> > devices. Before setting up nesting translation for pass-through
> > devices, QEMU and other applications need to learn the supported
> > 1st-lvl/stage-1 translation structure format like page table format.
> >
> > Take vSVA (virtual Shared Virtual Addressing) as an example, to
> > support vSVA for pass-through devices, QEMU setup nesting translation
> > for pass- through devices. The guest page table are configured to host
> > as 1st-lvl/
> > stage-1 page table. Therefore, guest format should be compatible with
> > host side.
> >
> > This patch reports the supported 1st-lvl/stage-1 page table format on
> > the current platform to userspace. QEMU and other alike applications
> > should use this format info when trying to setup IOMMU nesting
> > translation on host IOMMU.
> >
> > Cc: Kevin Tian <[email protected]>
> > CC: Jacob Pan <[email protected]>
> > Cc: Alex Williamson <[email protected]>
> > Cc: Eric Auger <[email protected]>
> > Cc: Jean-Philippe Brucker <[email protected]>
> > Signed-off-by: Liu Yi L <[email protected]>
> > ---
> > drivers/vfio/vfio_iommu_type1.c | 56
> > +++++++++++++++++++++++++++++++++++++++++
> > include/uapi/linux/vfio.h | 1 +
> > 2 files changed, 57 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index 9aa2a67..82a9e0b 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct
> > vfio_iommu *iommu,
> > return ret;
> > }
> >
> > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
> > + u32 *stage1_format)
> > +{
> > + struct vfio_domain *domain;
> > + u32 format = 0, tmp_format = 0;
> > + int ret;
> > +
> > + mutex_lock(&iommu->lock);
> > + if (list_empty(&iommu->domain_list)) {
> > + mutex_unlock(&iommu->lock);
> > + return -EINVAL;
> > + }
> > +
> > + list_for_each_entry(domain, &iommu->domain_list, next) {
> > + if (iommu_domain_get_attr(domain->domain,
> > + DOMAIN_ATTR_PASID_FORMAT, &format)) {
> > + ret = -EINVAL;
> > + format = 0;
> > + goto out_unlock;
> > + }
> > + /*
> > + * format is always non-zero (the first format is
> > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For
> > + * the reason of potential different backed IOMMU
> > + * formats, here we expect to have identical formats
> > + * in the domain list, no mixed formats support.
> > + * return -EINVAL to fail the attempt of setup
> > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats
> > + * are detected.
> > + */
> > + if (tmp_format && tmp_format != format) {
> > + ret = -EINVAL;
> > + format = 0;
> > + goto out_unlock;
> > + }
> > +
> > + tmp_format = format;
> > + }
>
> this path is invoked only in VFIO_IOMMU_GET_INFO path. If we don't want to
> assume the status quo that one container holds only one device w/ vIOMMU
> (the prerequisite for vSVA), looks we also need check the format
> compatibility when attaching a new group to this container?

right. if attaching to a nesting type container (vfio_iommu.nesting bit
indicates it), it should check if it is compabile with prior domains in
the domain list. But if it is the first one attached to this container,
it's fine. is it good?

> > + ret = 0;
> > +
> > +out_unlock:
> > + if (format)
> > + *stage1_format = format;
> > + mutex_unlock(&iommu->lock);
> > + return ret;
> > +}
> > +
> > static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> > struct vfio_info_cap *caps)
> > {
> > struct vfio_info_cap_header *header;
> > struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> > + u32 formats = 0;
> > + int ret;
> > +
> > + ret = vfio_iommu_get_stage1_format(iommu, &formats);
> > + if (ret) {
> > + pr_warn("Failed to get stage-1 format\n");
> > + return ret;
> > + }
> >
> > header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> > VFIO_IOMMU_TYPE1_INFO_CAP_NESTING,
> > 1);
> > @@ -2254,6 +2309,7 @@ static int
> > vfio_iommu_info_add_nesting_cap(struct
> > vfio_iommu *iommu,
> > /* nesting iommu type supports PASID requests (alloc/free) */
> > nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
> > }
> > + nesting_cap->stage1_formats = formats;
> >
> > return 0;
> > }
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index ed9881d..ebeaf3e 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting {
> > struct vfio_info_cap_header header;
> > #define VFIO_IOMMU_PASID_REQS (1 << 0)
> > __u32 nesting_capabilities;
> > + __u32 stage1_formats;
>
> do you plan to support multiple formats? If not, use singular name.

I do have such plan. e.g. it may be helpful when one day a platform can
support multiple formats.

Regards,
Yi Liu

2020-04-01 07:58:31

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

> From: Liu, Yi L <[email protected]>
> Sent: Wednesday, April 1, 2020 3:38 PM
>
> > From: Tian, Kevin <[email protected]>
> > Sent: Monday, March 30, 2020 7:49 PM
> > To: Liu, Yi L <[email protected]>; [email protected];
> > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> > userspace
> >
> > > From: Liu, Yi L <[email protected]>
> > > Sent: Sunday, March 22, 2020 8:32 PM
> > >
> > > From: Liu Yi L <[email protected]>
> > >
> > > VFIO exposes IOMMU nesting translation (a.k.a dual stage translation)
> > > capability to userspace. Thus applications like QEMU could support
> > > vIOMMU with hardware's nesting translation capability for pass-through
> > > devices. Before setting up nesting translation for pass-through
> > > devices, QEMU and other applications need to learn the supported
> > > 1st-lvl/stage-1 translation structure format like page table format.
> > >
> > > Take vSVA (virtual Shared Virtual Addressing) as an example, to
> > > support vSVA for pass-through devices, QEMU setup nesting translation
> > > for pass- through devices. The guest page table are configured to host
> > > as 1st-lvl/
> > > stage-1 page table. Therefore, guest format should be compatible with
> > > host side.
> > >
> > > This patch reports the supported 1st-lvl/stage-1 page table format on
> > > the current platform to userspace. QEMU and other alike applications
> > > should use this format info when trying to setup IOMMU nesting
> > > translation on host IOMMU.
> > >
> > > Cc: Kevin Tian <[email protected]>
> > > CC: Jacob Pan <[email protected]>
> > > Cc: Alex Williamson <[email protected]>
> > > Cc: Eric Auger <[email protected]>
> > > Cc: Jean-Philippe Brucker <[email protected]>
> > > Signed-off-by: Liu Yi L <[email protected]>
> > > ---
> > > drivers/vfio/vfio_iommu_type1.c | 56
> > > +++++++++++++++++++++++++++++++++++++++++
> > > include/uapi/linux/vfio.h | 1 +
> > > 2 files changed, 57 insertions(+)
> > >
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > b/drivers/vfio/vfio_iommu_type1.c index 9aa2a67..82a9e0b 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -2234,11 +2234,66 @@ static int
> vfio_iommu_type1_pasid_free(struct
> > > vfio_iommu *iommu,
> > > return ret;
> > > }
> > >
> > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
> > > + u32 *stage1_format)
> > > +{
> > > + struct vfio_domain *domain;
> > > + u32 format = 0, tmp_format = 0;
> > > + int ret;
> > > +
> > > + mutex_lock(&iommu->lock);
> > > + if (list_empty(&iommu->domain_list)) {
> > > + mutex_unlock(&iommu->lock);
> > > + return -EINVAL;
> > > + }
> > > +
> > > + list_for_each_entry(domain, &iommu->domain_list, next) {
> > > + if (iommu_domain_get_attr(domain->domain,
> > > + DOMAIN_ATTR_PASID_FORMAT, &format)) {
> > > + ret = -EINVAL;
> > > + format = 0;
> > > + goto out_unlock;
> > > + }
> > > + /*
> > > + * format is always non-zero (the first format is
> > > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For
> > > + * the reason of potential different backed IOMMU
> > > + * formats, here we expect to have identical formats
> > > + * in the domain list, no mixed formats support.
> > > + * return -EINVAL to fail the attempt of setup
> > > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats
> > > + * are detected.
> > > + */
> > > + if (tmp_format && tmp_format != format) {
> > > + ret = -EINVAL;
> > > + format = 0;
> > > + goto out_unlock;
> > > + }
> > > +
> > > + tmp_format = format;
> > > + }
> >
> > this path is invoked only in VFIO_IOMMU_GET_INFO path. If we don't want
> to
> > assume the status quo that one container holds only one device w/
> vIOMMU
> > (the prerequisite for vSVA), looks we also need check the format
> > compatibility when attaching a new group to this container?
>
> right. if attaching to a nesting type container (vfio_iommu.nesting bit
> indicates it), it should check if it is compabile with prior domains in
> the domain list. But if it is the first one attached to this container,
> it's fine. is it good?

yes, but my point is whether we should check the format compatibility
in the attach path...

>
> > > + ret = 0;
> > > +
> > > +out_unlock:
> > > + if (format)
> > > + *stage1_format = format;
> > > + mutex_unlock(&iommu->lock);
> > > + return ret;
> > > +}
> > > +
> > > static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> > > struct vfio_info_cap *caps)
> > > {
> > > struct vfio_info_cap_header *header;
> > > struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> > > + u32 formats = 0;
> > > + int ret;
> > > +
> > > + ret = vfio_iommu_get_stage1_format(iommu, &formats);
> > > + if (ret) {
> > > + pr_warn("Failed to get stage-1 format\n");
> > > + return ret;
> > > + }
> > >
> > > header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> > > VFIO_IOMMU_TYPE1_INFO_CAP_NESTING,
> > > 1);
> > > @@ -2254,6 +2309,7 @@ static int
> > > vfio_iommu_info_add_nesting_cap(struct
> > > vfio_iommu *iommu,
> > > /* nesting iommu type supports PASID requests (alloc/free)
> */
> > > nesting_cap->nesting_capabilities |=
> VFIO_IOMMU_PASID_REQS;
> > > }
> > > + nesting_cap->stage1_formats = formats;
> > >
> > > return 0;
> > > }
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index ed9881d..ebeaf3e 100644
> > > --- a/include/uapi/linux/vfio.h
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting {
> > > struct vfio_info_cap_header header;
> > > #define VFIO_IOMMU_PASID_REQS (1 << 0)
> > > __u32 nesting_capabilities;
> > > + __u32 stage1_formats;
> >
> > do you plan to support multiple formats? If not, use singular name.
>
> I do have such plan. e.g. it may be helpful when one day a platform can
> support multiple formats.
>
> Regards,
> Yi Liu

2020-04-01 08:07:43

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

> From: Tian, Kevin <[email protected]>
> Sent: Wednesday, April 1, 2020 3:56 PM
> To: Liu, Yi L <[email protected]>; [email protected];
> Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
>
> > From: Liu, Yi L <[email protected]>
> > Sent: Wednesday, April 1, 2020 3:38 PM
> >
> > > From: Tian, Kevin <[email protected]>
> > > Sent: Monday, March 30, 2020 7:49 PM
> > > To: Liu, Yi L <[email protected]>; [email protected];
> > > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1
> > > format to userspace
> > >
> > > > From: Liu, Yi L <[email protected]>
> > > > Sent: Sunday, March 22, 2020 8:32 PM
> > > >
> > > > From: Liu Yi L <[email protected]>
> > > >
> > > > VFIO exposes IOMMU nesting translation (a.k.a dual stage
> > > > translation) capability to userspace. Thus applications like QEMU
> > > > could support vIOMMU with hardware's nesting translation
> > > > capability for pass-through devices. Before setting up nesting
> > > > translation for pass-through devices, QEMU and other applications
> > > > need to learn the supported
> > > > 1st-lvl/stage-1 translation structure format like page table format.
> > > >
> > > > Take vSVA (virtual Shared Virtual Addressing) as an example, to
> > > > support vSVA for pass-through devices, QEMU setup nesting
> > > > translation for pass- through devices. The guest page table are
> > > > configured to host as 1st-lvl/
> > > > stage-1 page table. Therefore, guest format should be compatible
> > > > with host side.
> > > >
> > > > This patch reports the supported 1st-lvl/stage-1 page table format
> > > > on the current platform to userspace. QEMU and other alike
> > > > applications should use this format info when trying to setup
> > > > IOMMU nesting translation on host IOMMU.
> > > >
> > > > Cc: Kevin Tian <[email protected]>
> > > > CC: Jacob Pan <[email protected]>
> > > > Cc: Alex Williamson <[email protected]>
> > > > Cc: Eric Auger <[email protected]>
> > > > Cc: Jean-Philippe Brucker <[email protected]>
> > > > Signed-off-by: Liu Yi L <[email protected]>
> > > > ---
> > > > drivers/vfio/vfio_iommu_type1.c | 56
> > > > +++++++++++++++++++++++++++++++++++++++++
> > > > include/uapi/linux/vfio.h | 1 +
> > > > 2 files changed, 57 insertions(+)
> > > >
> > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > b/drivers/vfio/vfio_iommu_type1.c index 9aa2a67..82a9e0b 100644
> > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > @@ -2234,11 +2234,66 @@ static int
> > vfio_iommu_type1_pasid_free(struct
> > > > vfio_iommu *iommu,
> > > > return ret;
> > > > }
> > > >
> > > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
> > > > + u32 *stage1_format)
> > > > +{
> > > > + struct vfio_domain *domain;
> > > > + u32 format = 0, tmp_format = 0;
> > > > + int ret;
> > > > +
> > > > + mutex_lock(&iommu->lock);
> > > > + if (list_empty(&iommu->domain_list)) {
> > > > + mutex_unlock(&iommu->lock);
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > + list_for_each_entry(domain, &iommu->domain_list, next) {
> > > > + if (iommu_domain_get_attr(domain->domain,
> > > > + DOMAIN_ATTR_PASID_FORMAT, &format)) {
> > > > + ret = -EINVAL;
> > > > + format = 0;
> > > > + goto out_unlock;
> > > > + }
> > > > + /*
> > > > + * format is always non-zero (the first format is
> > > > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For
> > > > + * the reason of potential different backed IOMMU
> > > > + * formats, here we expect to have identical formats
> > > > + * in the domain list, no mixed formats support.
> > > > + * return -EINVAL to fail the attempt of setup
> > > > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats
> > > > + * are detected.
> > > > + */
> > > > + if (tmp_format && tmp_format != format) {
> > > > + ret = -EINVAL;
> > > > + format = 0;
> > > > + goto out_unlock;
> > > > + }
> > > > +
> > > > + tmp_format = format;
> > > > + }
> > >
> > > this path is invoked only in VFIO_IOMMU_GET_INFO path. If we don't
> > > want
> > to
> > > assume the status quo that one container holds only one device w/
> > vIOMMU
> > > (the prerequisite for vSVA), looks we also need check the format
> > > compatibility when attaching a new group to this container?
> >
> > right. if attaching to a nesting type container (vfio_iommu.nesting
> > bit indicates it), it should check if it is compabile with prior
> > domains in the domain list. But if it is the first one attached to
> > this container, it's fine. is it good?
>
> yes, but my point is whether we should check the format compatibility
> in the attach path...

I guess so. Assume a device has been attached to a container, and
userspace has fetched the nesting cap info. e.g. QEMU will have a
per-container structure to store the nesting info. And then attach
another device from a separate group, if its backend iommu supports
different formats, then it will be a problem. If userspace reads the
nesting cap info again, it will get a different value. It may affect
the prior attched device. If userspace doesn't refresh the nesting
info by re-fetch, then the newly added device may use a format which
its backend iommu doesn't support.

Although, the vendor specific iommu driver should ensure all devices
are backed by iommu units w/ same capability (e.g. format). But it
would better to have a check in vfio side all the same. how about your
opinion so far?:-)

Regards,
Yi Liu

2020-04-01 08:09:31

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

> From: Liu, Yi L <[email protected]>
> Sent: Wednesday, April 1, 2020 4:07 PM
>
> > From: Tian, Kevin <[email protected]>
> > Sent: Wednesday, April 1, 2020 3:56 PM
> > To: Liu, Yi L <[email protected]>; [email protected];
> > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> > userspace
> >
> > > From: Liu, Yi L <[email protected]>
> > > Sent: Wednesday, April 1, 2020 3:38 PM
> > >
> > > > From: Tian, Kevin <[email protected]>
> > > > Sent: Monday, March 30, 2020 7:49 PM
> > > > To: Liu, Yi L <[email protected]>; [email protected];
> > > > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1
> > > > format to userspace
> > > >
> > > > > From: Liu, Yi L <[email protected]>
> > > > > Sent: Sunday, March 22, 2020 8:32 PM
> > > > >
> > > > > From: Liu Yi L <[email protected]>
> > > > >
> > > > > VFIO exposes IOMMU nesting translation (a.k.a dual stage
> > > > > translation) capability to userspace. Thus applications like QEMU
> > > > > could support vIOMMU with hardware's nesting translation
> > > > > capability for pass-through devices. Before setting up nesting
> > > > > translation for pass-through devices, QEMU and other applications
> > > > > need to learn the supported
> > > > > 1st-lvl/stage-1 translation structure format like page table format.
> > > > >
> > > > > Take vSVA (virtual Shared Virtual Addressing) as an example, to
> > > > > support vSVA for pass-through devices, QEMU setup nesting
> > > > > translation for pass- through devices. The guest page table are
> > > > > configured to host as 1st-lvl/
> > > > > stage-1 page table. Therefore, guest format should be compatible
> > > > > with host side.
> > > > >
> > > > > This patch reports the supported 1st-lvl/stage-1 page table format
> > > > > on the current platform to userspace. QEMU and other alike
> > > > > applications should use this format info when trying to setup
> > > > > IOMMU nesting translation on host IOMMU.
> > > > >
> > > > > Cc: Kevin Tian <[email protected]>
> > > > > CC: Jacob Pan <[email protected]>
> > > > > Cc: Alex Williamson <[email protected]>
> > > > > Cc: Eric Auger <[email protected]>
> > > > > Cc: Jean-Philippe Brucker <[email protected]>
> > > > > Signed-off-by: Liu Yi L <[email protected]>
> > > > > ---
> > > > > drivers/vfio/vfio_iommu_type1.c | 56
> > > > > +++++++++++++++++++++++++++++++++++++++++
> > > > > include/uapi/linux/vfio.h | 1 +
> > > > > 2 files changed, 57 insertions(+)
> > > > >
> > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > > b/drivers/vfio/vfio_iommu_type1.c index 9aa2a67..82a9e0b 100644
> > > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > > @@ -2234,11 +2234,66 @@ static int
> > > vfio_iommu_type1_pasid_free(struct
> > > > > vfio_iommu *iommu,
> > > > > return ret;
> > > > > }
> > > > >
> > > > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
> > > > > + u32 *stage1_format)
> > > > > +{
> > > > > + struct vfio_domain *domain;
> > > > > + u32 format = 0, tmp_format = 0;
> > > > > + int ret;
> > > > > +
> > > > > + mutex_lock(&iommu->lock);
> > > > > + if (list_empty(&iommu->domain_list)) {
> > > > > + mutex_unlock(&iommu->lock);
> > > > > + return -EINVAL;
> > > > > + }
> > > > > +
> > > > > + list_for_each_entry(domain, &iommu->domain_list, next) {
> > > > > + if (iommu_domain_get_attr(domain->domain,
> > > > > + DOMAIN_ATTR_PASID_FORMAT, &format)) {
> > > > > + ret = -EINVAL;
> > > > > + format = 0;
> > > > > + goto out_unlock;
> > > > > + }
> > > > > + /*
> > > > > + * format is always non-zero (the first format is
> > > > > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1).
> For
> > > > > + * the reason of potential different backed IOMMU
> > > > > + * formats, here we expect to have identical formats
> > > > > + * in the domain list, no mixed formats support.
> > > > > + * return -EINVAL to fail the attempt of setup
> > > > > + * VFIO_TYPE1_NESTING_IOMMU if non-identical
> formats
> > > > > + * are detected.
> > > > > + */
> > > > > + if (tmp_format && tmp_format != format) {
> > > > > + ret = -EINVAL;
> > > > > + format = 0;
> > > > > + goto out_unlock;
> > > > > + }
> > > > > +
> > > > > + tmp_format = format;
> > > > > + }
> > > >
> > > > this path is invoked only in VFIO_IOMMU_GET_INFO path. If we don't
> > > > want
> > > to
> > > > assume the status quo that one container holds only one device w/
> > > vIOMMU
> > > > (the prerequisite for vSVA), looks we also need check the format
> > > > compatibility when attaching a new group to this container?
> > >
> > > right. if attaching to a nesting type container (vfio_iommu.nesting
> > > bit indicates it), it should check if it is compabile with prior
> > > domains in the domain list. But if it is the first one attached to
> > > this container, it's fine. is it good?
> >
> > yes, but my point is whether we should check the format compatibility
> > in the attach path...
>
> I guess so. Assume a device has been attached to a container, and
> userspace has fetched the nesting cap info. e.g. QEMU will have a
> per-container structure to store the nesting info. And then attach
> another device from a separate group, if its backend iommu supports
> different formats, then it will be a problem. If userspace reads the
> nesting cap info again, it will get a different value. It may affect
> the prior attched device. If userspace doesn't refresh the nesting
> info by re-fetch, then the newly added device may use a format which
> its backend iommu doesn't support.
>
> Although, the vendor specific iommu driver should ensure all devices
> are backed by iommu units w/ same capability (e.g. format). But it
> would better to have a check in vfio side all the same. how about your
> opinion so far?:-)
>

I think so.

2020-04-01 08:09:57

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

> From: Tian, Kevin <[email protected]>
> Sent: Wednesday, April 1, 2020 4:09 PM
> Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
>
> > From: Liu, Yi L <[email protected]>
> > Sent: Wednesday, April 1, 2020 4:07 PM
> >
> > > From: Tian, Kevin <[email protected]>
> > > Sent: Wednesday, April 1, 2020 3:56 PM
> > > To: Liu, Yi L <[email protected]>; [email protected];
> > > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1
> > > format to userspace
> > >
> > > > From: Liu, Yi L <[email protected]>
> > > > Sent: Wednesday, April 1, 2020 3:38 PM
> > > >
> > > > > From: Tian, Kevin <[email protected]>
> > > > > Sent: Monday, March 30, 2020 7:49 PM
> > > > > To: Liu, Yi L <[email protected]>; [email protected];
> > > > > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1
> > > > > format to userspace
> > > > >
> > > > > > From: Liu, Yi L <[email protected]>
> > > > > > Sent: Sunday, March 22, 2020 8:32 PM
> > > > > >
> > > > > > From: Liu Yi L <[email protected]>
> > > > > >
> > > > > > VFIO exposes IOMMU nesting translation (a.k.a dual stage
> > > > > > translation) capability to userspace. Thus applications like
> > > > > > QEMU could support vIOMMU with hardware's nesting translation
> > > > > > capability for pass-through devices. Before setting up nesting
> > > > > > translation for pass-through devices, QEMU and other
> > > > > > applications need to learn the supported
> > > > > > 1st-lvl/stage-1 translation structure format like page table format.
> > > > > >
> > > > > > Take vSVA (virtual Shared Virtual Addressing) as an example,
> > > > > > to support vSVA for pass-through devices, QEMU setup nesting
> > > > > > translation for pass- through devices. The guest page table
> > > > > > are configured to host as 1st-lvl/
> > > > > > stage-1 page table. Therefore, guest format should be
> > > > > > compatible with host side.
> > > > > >
> > > > > > This patch reports the supported 1st-lvl/stage-1 page table
> > > > > > format on the current platform to userspace. QEMU and other
> > > > > > alike applications should use this format info when trying to
> > > > > > setup IOMMU nesting translation on host IOMMU.
> > > > > >
> > > > > > Cc: Kevin Tian <[email protected]>
> > > > > > CC: Jacob Pan <[email protected]>
> > > > > > Cc: Alex Williamson <[email protected]>
> > > > > > Cc: Eric Auger <[email protected]>
> > > > > > Cc: Jean-Philippe Brucker <[email protected]>
> > > > > > Signed-off-by: Liu Yi L <[email protected]>
> > > > > > ---
> > > > > > drivers/vfio/vfio_iommu_type1.c | 56
> > > > > > +++++++++++++++++++++++++++++++++++++++++
> > > > > > include/uapi/linux/vfio.h | 1 +
> > > > > > 2 files changed, 57 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > > > b/drivers/vfio/vfio_iommu_type1.c index 9aa2a67..82a9e0b
> > > > > > 100644
> > > > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > > > @@ -2234,11 +2234,66 @@ static int
> > > > vfio_iommu_type1_pasid_free(struct
> > > > > > vfio_iommu *iommu,
> > > > > > return ret;
> > > > > > }
> > > > > >
> > > > > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
> > > > > > + u32 *stage1_format)
> > > > > > +{
> > > > > > + struct vfio_domain *domain;
> > > > > > + u32 format = 0, tmp_format = 0;
> > > > > > + int ret;
> > > > > > +
> > > > > > + mutex_lock(&iommu->lock);
> > > > > > + if (list_empty(&iommu->domain_list)) {
> > > > > > + mutex_unlock(&iommu->lock);
> > > > > > + return -EINVAL;
> > > > > > + }
> > > > > > +
> > > > > > + list_for_each_entry(domain, &iommu->domain_list, next) {
> > > > > > + if (iommu_domain_get_attr(domain->domain,
> > > > > > + DOMAIN_ATTR_PASID_FORMAT, &format)) {
> > > > > > + ret = -EINVAL;
> > > > > > + format = 0;
> > > > > > + goto out_unlock;
> > > > > > + }
> > > > > > + /*
> > > > > > + * format is always non-zero (the first format is
> > > > > > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1).
> > For
> > > > > > + * the reason of potential different backed IOMMU
> > > > > > + * formats, here we expect to have identical formats
> > > > > > + * in the domain list, no mixed formats support.
> > > > > > + * return -EINVAL to fail the attempt of setup
> > > > > > + * VFIO_TYPE1_NESTING_IOMMU if non-identical
> > formats
> > > > > > + * are detected.
> > > > > > + */
> > > > > > + if (tmp_format && tmp_format != format) {
> > > > > > + ret = -EINVAL;
> > > > > > + format = 0;
> > > > > > + goto out_unlock;
> > > > > > + }
> > > > > > +
> > > > > > + tmp_format = format;
> > > > > > + }
> > > > >
> > > > > this path is invoked only in VFIO_IOMMU_GET_INFO path. If we
> > > > > don't want
> > > > to
> > > > > assume the status quo that one container holds only one device
> > > > > w/
> > > > vIOMMU
> > > > > (the prerequisite for vSVA), looks we also need check the format
> > > > > compatibility when attaching a new group to this container?
> > > >
> > > > right. if attaching to a nesting type container
> > > > (vfio_iommu.nesting bit indicates it), it should check if it is
> > > > compabile with prior domains in the domain list. But if it is the
> > > > first one attached to this container, it's fine. is it good?
> > >
> > > yes, but my point is whether we should check the format
> > > compatibility in the attach path...
> >
> > I guess so. Assume a device has been attached to a container, and
> > userspace has fetched the nesting cap info. e.g. QEMU will have a
> > per-container structure to store the nesting info. And then attach
> > another device from a separate group, if its backend iommu supports
> > different formats, then it will be a problem. If userspace reads the
> > nesting cap info again, it will get a different value. It may affect
> > the prior attched device. If userspace doesn't refresh the nesting
> > info by re-fetch, then the newly added device may use a format which
> > its backend iommu doesn't support.
> >
> > Although, the vendor specific iommu driver should ensure all devices
> > are backed by iommu units w/ same capability (e.g. format). But it
> > would better to have a check in vfio side all the same. how about your
> > opinion so far?:-)
> >
>
> I think so.

Thanks, :-)

2020-04-01 08:51:50

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

Hi Yi,
On 3/22/20 1:32 PM, Liu, Yi L wrote:
> From: Liu Yi L <[email protected]>
>
> VFIO exposes IOMMU nesting translation (a.k.a dual stage translation)
> capability to userspace. Thus applications like QEMU could support
> vIOMMU with hardware's nesting translation capability for pass-through
> devices. Before setting up nesting translation for pass-through devices,
> QEMU and other applications need to learn the supported 1st-lvl/stage-1
> translation structure format like page table format.
>
> Take vSVA (virtual Shared Virtual Addressing) as an example, to support
> vSVA for pass-through devices, QEMU setup nesting translation for pass-
> through devices. The guest page table are configured to host as 1st-lvl/
> stage-1 page table. Therefore, guest format should be compatible with
> host side.
>
> This patch reports the supported 1st-lvl/stage-1 page table format on the
> current platform to userspace. QEMU and other alike applications should
> use this format info when trying to setup IOMMU nesting translation on
> host IOMMU.
>
> Cc: Kevin Tian <[email protected]>
> CC: Jacob Pan <[email protected]>
> Cc: Alex Williamson <[email protected]>
> Cc: Eric Auger <[email protected]>
> Cc: Jean-Philippe Brucker <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> ---
> drivers/vfio/vfio_iommu_type1.c | 56 +++++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 1 +
> 2 files changed, 57 insertions(+)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 9aa2a67..82a9e0b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
> return ret;
> }
>
> +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
> + u32 *stage1_format)
vfio_pasid_format() to be homogeneous with vfio_pgsize_bitmap() which
does the same kind of enumeration of the vfio_iommu domains
> +{
> + struct vfio_domain *domain;
> + u32 format = 0, tmp_format = 0;
> + int ret;
ret = -EINVAL;
> +
> + mutex_lock(&iommu->lock);
> + if (list_empty(&iommu->domain_list)) {
goto out_unlock;
> + mutex_unlock(&iommu->lock);
> + return -EINVAL;
> + }
> +
> + list_for_each_entry(domain, &iommu->domain_list, next) {
> + if (iommu_domain_get_attr(domain->domain,
> + DOMAIN_ATTR_PASID_FORMAT, &format)) {
I can find DOMAIN_ATTR_PASID_FORMAT in Jacob's v9 but not in v10
> + ret = -EINVAL;
could be removed
> + format = 0;
> + goto out_unlock;
> + }
> + /*
> + * format is always non-zero (the first format is
> + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For
> + * the reason of potential different backed IOMMU
> + * formats, here we expect to have identical formats
> + * in the domain list, no mixed formats support.
> + * return -EINVAL to fail the attempt of setup
> + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats
> + * are detected.
> + */
> + if (tmp_format && tmp_format != format) {
> + ret = -EINVAL;
could be removed
> + format = 0;
> + goto out_unlock;
> + }
> +
> + tmp_format = format;
> + }
> + ret = 0;
> +
> +out_unlock:
> + if (format)
if (!ret) ? then you can remove the format = 0 in case of error.
> + *stage1_format = format;
> + mutex_unlock(&iommu->lock);
> + return ret;
> +}
> +
> static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> struct vfio_info_cap *caps)
> {
> struct vfio_info_cap_header *header;
> struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> + u32 formats = 0;
> + int ret;
> +
> + ret = vfio_iommu_get_stage1_format(iommu, &formats);
> + if (ret) {
> + pr_warn("Failed to get stage-1 format\n");
trace triggered by userspace to be removed?
> + return ret;
> + }
>
> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> /* nesting iommu type supports PASID requests (alloc/free) */
> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
What is the meaning for ARM?
> }
> + nesting_cap->stage1_formats = formats;
as spotted by Kevin, since a single format is supported, rename
>
> return 0;
> }
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index ed9881d..ebeaf3e 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting {
> struct vfio_info_cap_header header;
> #define VFIO_IOMMU_PASID_REQS (1 << 0)
> __u32 nesting_capabilities;
> + __u32 stage1_formats;
> };
>
> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>
Thanks

Eric

2020-04-01 13:04:34

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

Hi Yi,

On 4/1/20 2:51 PM, Liu, Yi L wrote:
> Hi Eric,
>
>> From: Auger Eric <[email protected]>
>> Sent: Wednesday, April 1, 2020 4:51 PM
>> To: Liu, Yi L <[email protected]>; [email protected]
>> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
>> userspace
>>
>> Hi Yi,
>> On 3/22/20 1:32 PM, Liu, Yi L wrote:
>>> From: Liu Yi L <[email protected]>
>>>
>>> VFIO exposes IOMMU nesting translation (a.k.a dual stage translation)
>>> capability to userspace. Thus applications like QEMU could support
>>> vIOMMU with hardware's nesting translation capability for pass-through
>>> devices. Before setting up nesting translation for pass-through devices,
>>> QEMU and other applications need to learn the supported 1st-lvl/stage-1
>>> translation structure format like page table format.
>>>
>>> Take vSVA (virtual Shared Virtual Addressing) as an example, to support
>>> vSVA for pass-through devices, QEMU setup nesting translation for pass-
>>> through devices. The guest page table are configured to host as 1st-lvl/
>>> stage-1 page table. Therefore, guest format should be compatible with
>>> host side.
>>>
>>> This patch reports the supported 1st-lvl/stage-1 page table format on the
>>> current platform to userspace. QEMU and other alike applications should
>>> use this format info when trying to setup IOMMU nesting translation on
>>> host IOMMU.
>>>
>>> Cc: Kevin Tian <[email protected]>
>>> CC: Jacob Pan <[email protected]>
>>> Cc: Alex Williamson <[email protected]>
>>> Cc: Eric Auger <[email protected]>
>>> Cc: Jean-Philippe Brucker <[email protected]>
>>> Signed-off-by: Liu Yi L <[email protected]>
>>> ---
>>> drivers/vfio/vfio_iommu_type1.c | 56
>> +++++++++++++++++++++++++++++++++++++++++
>>> include/uapi/linux/vfio.h | 1 +
>>> 2 files changed, 57 insertions(+)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index 9aa2a67..82a9e0b 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct
>> vfio_iommu *iommu,
>>> return ret;
>>> }
>>>
>>> +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
>>> + u32 *stage1_format)
>> vfio_pasid_format() to be homogeneous with vfio_pgsize_bitmap() which
>> does the same kind of enumeration of the vfio_iommu domains
>
> yes, similar.
>
>>> +{
>>> + struct vfio_domain *domain;
>>> + u32 format = 0, tmp_format = 0;
>>> + int ret;
>> ret = -EINVAL;
>
> got it.
>
>>> +
>>> + mutex_lock(&iommu->lock);
>>> + if (list_empty(&iommu->domain_list)) {
>> goto out_unlock;
>
> right.
>>> + mutex_unlock(&iommu->lock);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + list_for_each_entry(domain, &iommu->domain_list, next) {
>>> + if (iommu_domain_get_attr(domain->domain,
>>> + DOMAIN_ATTR_PASID_FORMAT, &format)) {
>> I can find DOMAIN_ATTR_PASID_FORMAT in Jacob's v9 but not in v10
>
> oops, I guess he somehow missed. you may find it in below link.
>
> https://github.com/luxis1999/linux-vsva/commit/bf14b11a12f74d58ad3ee626a5d891de395082eb
>
>>> + ret = -EINVAL;
>> could be removed
>
> sure.
>
>>> + format = 0;
>>> + goto out_unlock;
>>> + }
>>> + /*
>>> + * format is always non-zero (the first format is
>>> + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For
>>> + * the reason of potential different backed IOMMU
>>> + * formats, here we expect to have identical formats
>>> + * in the domain list, no mixed formats support.
>>> + * return -EINVAL to fail the attempt of setup
>>> + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats
>>> + * are detected.
>>> + */
>>> + if (tmp_format && tmp_format != format) {
>>> + ret = -EINVAL;
>> could be removed
>
> got it.
>
>>> + format = 0;
>>> + goto out_unlock;
>>> + }
>>> +
>>> + tmp_format = format;
>>> + }
>>> + ret = 0;
>>> +
>>> +out_unlock:
>>> + if (format)
>> if (!ret) ? then you can remove the format = 0 in case of error.
>
> oh, yes.
>
>>> + *stage1_format = format;
>>> + mutex_unlock(&iommu->lock);
>>> + return ret;
>>> +}
>>> +
>>> static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
>>> struct vfio_info_cap *caps)
>>> {
>>> struct vfio_info_cap_header *header;
>>> struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
>>> + u32 formats = 0;
>>> + int ret;
>>> +
>>> + ret = vfio_iommu_get_stage1_format(iommu, &formats);
>>> + if (ret) {
>>> + pr_warn("Failed to get stage-1 format\n");
>> trace triggered by userspace to be removed?
>
> sure.
>
>>> + return ret;
>>> + }
>>>
>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
>>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
>>> @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct
>> vfio_iommu *iommu,
>>> /* nesting iommu type supports PASID requests (alloc/free) */
>>> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
>> What is the meaning for ARM?
>
> I think it's just a software capability exposed to userspace, on
> userspace side, it has a choice to use it or not. :-) The reason
> define it and report it in cap nesting is that I'd like to make
> the pasid alloc/free be available just for IOMMU with type
> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not
> good for ARM. We can find a proper way to report the availability.

Well it is more a question for jean-Philippe. Do we have a system wide
PASID allocation on ARM?

Thanks

Eric
>
>>> }
>>> + nesting_cap->stage1_formats = formats;
>> as spotted by Kevin, since a single format is supported, rename
>
> ok, I was believing it may be possible on ARM or so. :-) will
> rename it.
>
> I'll refine the patch per your above comments.
>
> Regards,
> Yi Liu
>

2020-04-01 13:35:58

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

Hi Eric,

> From: Auger Eric <[email protected]>
> Sent: Wednesday, April 1, 2020 4:51 PM
> To: Liu, Yi L <[email protected]>; [email protected]
> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
>
> Hi Yi,
> On 3/22/20 1:32 PM, Liu, Yi L wrote:
> > From: Liu Yi L <[email protected]>
> >
> > VFIO exposes IOMMU nesting translation (a.k.a dual stage translation)
> > capability to userspace. Thus applications like QEMU could support
> > vIOMMU with hardware's nesting translation capability for pass-through
> > devices. Before setting up nesting translation for pass-through devices,
> > QEMU and other applications need to learn the supported 1st-lvl/stage-1
> > translation structure format like page table format.
> >
> > Take vSVA (virtual Shared Virtual Addressing) as an example, to support
> > vSVA for pass-through devices, QEMU setup nesting translation for pass-
> > through devices. The guest page table are configured to host as 1st-lvl/
> > stage-1 page table. Therefore, guest format should be compatible with
> > host side.
> >
> > This patch reports the supported 1st-lvl/stage-1 page table format on the
> > current platform to userspace. QEMU and other alike applications should
> > use this format info when trying to setup IOMMU nesting translation on
> > host IOMMU.
> >
> > Cc: Kevin Tian <[email protected]>
> > CC: Jacob Pan <[email protected]>
> > Cc: Alex Williamson <[email protected]>
> > Cc: Eric Auger <[email protected]>
> > Cc: Jean-Philippe Brucker <[email protected]>
> > Signed-off-by: Liu Yi L <[email protected]>
> > ---
> > drivers/vfio/vfio_iommu_type1.c | 56
> +++++++++++++++++++++++++++++++++++++++++
> > include/uapi/linux/vfio.h | 1 +
> > 2 files changed, 57 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 9aa2a67..82a9e0b 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct
> vfio_iommu *iommu,
> > return ret;
> > }
> >
> > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
> > + u32 *stage1_format)
> vfio_pasid_format() to be homogeneous with vfio_pgsize_bitmap() which
> does the same kind of enumeration of the vfio_iommu domains

yes, similar.

> > +{
> > + struct vfio_domain *domain;
> > + u32 format = 0, tmp_format = 0;
> > + int ret;
> ret = -EINVAL;

got it.

> > +
> > + mutex_lock(&iommu->lock);
> > + if (list_empty(&iommu->domain_list)) {
> goto out_unlock;

right.
> > + mutex_unlock(&iommu->lock);
> > + return -EINVAL;
> > + }
> > +
> > + list_for_each_entry(domain, &iommu->domain_list, next) {
> > + if (iommu_domain_get_attr(domain->domain,
> > + DOMAIN_ATTR_PASID_FORMAT, &format)) {
> I can find DOMAIN_ATTR_PASID_FORMAT in Jacob's v9 but not in v10

oops, I guess he somehow missed. you may find it in below link.

https://github.com/luxis1999/linux-vsva/commit/bf14b11a12f74d58ad3ee626a5d891de395082eb

> > + ret = -EINVAL;
> could be removed

sure.

> > + format = 0;
> > + goto out_unlock;
> > + }
> > + /*
> > + * format is always non-zero (the first format is
> > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For
> > + * the reason of potential different backed IOMMU
> > + * formats, here we expect to have identical formats
> > + * in the domain list, no mixed formats support.
> > + * return -EINVAL to fail the attempt of setup
> > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats
> > + * are detected.
> > + */
> > + if (tmp_format && tmp_format != format) {
> > + ret = -EINVAL;
> could be removed

got it.

> > + format = 0;
> > + goto out_unlock;
> > + }
> > +
> > + tmp_format = format;
> > + }
> > + ret = 0;
> > +
> > +out_unlock:
> > + if (format)
> if (!ret) ? then you can remove the format = 0 in case of error.

oh, yes.

> > + *stage1_format = format;
> > + mutex_unlock(&iommu->lock);
> > + return ret;
> > +}
> > +
> > static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> > struct vfio_info_cap *caps)
> > {
> > struct vfio_info_cap_header *header;
> > struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> > + u32 formats = 0;
> > + int ret;
> > +
> > + ret = vfio_iommu_get_stage1_format(iommu, &formats);
> > + if (ret) {
> > + pr_warn("Failed to get stage-1 format\n");
> trace triggered by userspace to be removed?

sure.

> > + return ret;
> > + }
> >
> > header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> > VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> > @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct
> vfio_iommu *iommu,
> > /* nesting iommu type supports PASID requests (alloc/free) */
> > nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
> What is the meaning for ARM?

I think it's just a software capability exposed to userspace, on
userspace side, it has a choice to use it or not. :-) The reason
define it and report it in cap nesting is that I'd like to make
the pasid alloc/free be available just for IOMMU with type
VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not
good for ARM. We can find a proper way to report the availability.

> > }
> > + nesting_cap->stage1_formats = formats;
> as spotted by Kevin, since a single format is supported, rename

ok, I was believing it may be possible on ARM or so. :-) will
rename it.

I'll refine the patch per your above comments.

Regards,
Yi Liu

2020-04-02 19:47:41

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

On Sun, 22 Mar 2020 05:32:02 -0700
"Liu, Yi L" <[email protected]> wrote:

> From: Liu Yi L <[email protected]>
>
> VFIO exposes IOMMU nesting translation (a.k.a dual stage translation)
> capability to userspace. Thus applications like QEMU could support
> vIOMMU with hardware's nesting translation capability for pass-through
> devices. Before setting up nesting translation for pass-through devices,
> QEMU and other applications need to learn the supported 1st-lvl/stage-1
> translation structure format like page table format.
>
> Take vSVA (virtual Shared Virtual Addressing) as an example, to support
> vSVA for pass-through devices, QEMU setup nesting translation for pass-
> through devices. The guest page table are configured to host as 1st-lvl/
> stage-1 page table. Therefore, guest format should be compatible with
> host side.
>
> This patch reports the supported 1st-lvl/stage-1 page table format on the
> current platform to userspace. QEMU and other alike applications should
> use this format info when trying to setup IOMMU nesting translation on
> host IOMMU.
>
> Cc: Kevin Tian <[email protected]>
> CC: Jacob Pan <[email protected]>
> Cc: Alex Williamson <[email protected]>
> Cc: Eric Auger <[email protected]>
> Cc: Jean-Philippe Brucker <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> ---
> drivers/vfio/vfio_iommu_type1.c | 56 +++++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 1 +
> 2 files changed, 57 insertions(+)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 9aa2a67..82a9e0b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
> return ret;
> }
>
> +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
> + u32 *stage1_format)
> +{
> + struct vfio_domain *domain;
> + u32 format = 0, tmp_format = 0;
> + int ret;
> +
> + mutex_lock(&iommu->lock);
> + if (list_empty(&iommu->domain_list)) {
> + mutex_unlock(&iommu->lock);
> + return -EINVAL;
> + }
> +
> + list_for_each_entry(domain, &iommu->domain_list, next) {
> + if (iommu_domain_get_attr(domain->domain,
> + DOMAIN_ATTR_PASID_FORMAT, &format)) {
> + ret = -EINVAL;
> + format = 0;
> + goto out_unlock;
> + }
> + /*
> + * format is always non-zero (the first format is
> + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For
> + * the reason of potential different backed IOMMU
> + * formats, here we expect to have identical formats
> + * in the domain list, no mixed formats support.
> + * return -EINVAL to fail the attempt of setup
> + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats
> + * are detected.
> + */
> + if (tmp_format && tmp_format != format) {
> + ret = -EINVAL;
> + format = 0;
> + goto out_unlock;
> + }
> +
> + tmp_format = format;
> + }
> + ret = 0;
> +
> +out_unlock:
> + if (format)
> + *stage1_format = format;
> + mutex_unlock(&iommu->lock);
> + return ret;
> +}
> +
> static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> struct vfio_info_cap *caps)
> {
> struct vfio_info_cap_header *header;
> struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> + u32 formats = 0;
> + int ret;
> +
> + ret = vfio_iommu_get_stage1_format(iommu, &formats);
> + if (ret) {
> + pr_warn("Failed to get stage-1 format\n");
> + return ret;

Looks like this generates a warning and causes the iommu_get_info ioctl
to fail if the hardware doesn't support the pasid format attribute, or
the domain list is empty. This breaks users on existing hardware.

> + }
>
> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> /* nesting iommu type supports PASID requests (alloc/free) */
> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
> }
> + nesting_cap->stage1_formats = formats;
>
> return 0;
> }
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index ed9881d..ebeaf3e 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting {
> struct vfio_info_cap_header header;
> #define VFIO_IOMMU_PASID_REQS (1 << 0)
> __u32 nesting_capabilities;
> + __u32 stage1_formats;
> };
>
> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)

2020-04-03 08:24:33

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
> >>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> >>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> >>> @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct
> >> vfio_iommu *iommu,
> >>> /* nesting iommu type supports PASID requests (alloc/free) */
> >>> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
> >> What is the meaning for ARM?
> >
> > I think it's just a software capability exposed to userspace, on
> > userspace side, it has a choice to use it or not. :-) The reason
> > define it and report it in cap nesting is that I'd like to make
> > the pasid alloc/free be available just for IOMMU with type
> > VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not
> > good for ARM. We can find a proper way to report the availability.
>
> Well it is more a question for jean-Philippe. Do we have a system wide
> PASID allocation on ARM?

We don't, the PASID spaces are per-VM on Arm, so this function should
consult the IOMMU driver before setting flags. As you said on patch 3,
nested doesn't necessarily imply PASID support. The SMMUv2 does not
support PASID but does support nesting stages 1 and 2 for the IOVA space.
SMMUv3 support of PASID depends on HW capabilities. So I think this needs
to be finer grained:

Does the container support:
* VFIO_IOMMU_PASID_REQUEST?
-> Yes for VT-d 3
-> No for Arm SMMU
* VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
-> Yes for VT-d 3
-> Sometimes for SMMUv2
-> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to
PASID tables being in GPA space.)
* VFIO_IOMMU_BIND_PASID_TABLE?
-> No for VT-d
-> Sometimes for SMMUv3

Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.


> >>> + nesting_cap->stage1_formats = formats;
> >> as spotted by Kevin, since a single format is supported, rename
> >
> > ok, I was believing it may be possible on ARM or so. :-) will
> > rename it.

Yes I don't think an u32 is going to cut it for Arm :( We need to describe
all sorts of capabilities for page and PASID tables (granules, GPA size,
ASID/PASID size, HW access/dirty, etc etc.) Just saying "Arm stage-1
format" wouldn't mean much. I guess we could have a secondary vendor
capability for these?

Thanks,
Jean

2020-04-03 12:35:24

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

Hi Alex,

> From: Alex Williamson <[email protected]>
> Sent: Friday, April 3, 2020 3:20 AM
> To: Liu, Yi L <[email protected]>
> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
>
> On Sun, 22 Mar 2020 05:32:02 -0700
> "Liu, Yi L" <[email protected]> wrote:
>
> > From: Liu Yi L <[email protected]>
> >
> > VFIO exposes IOMMU nesting translation (a.k.a dual stage translation)
> > capability to userspace. Thus applications like QEMU could support
> > vIOMMU with hardware's nesting translation capability for pass-through
> > devices. Before setting up nesting translation for pass-through devices,
> > QEMU and other applications need to learn the supported 1st-lvl/stage-1
> > translation structure format like page table format.
> >
> > Take vSVA (virtual Shared Virtual Addressing) as an example, to support
> > vSVA for pass-through devices, QEMU setup nesting translation for pass-
> > through devices. The guest page table are configured to host as 1st-lvl/
> > stage-1 page table. Therefore, guest format should be compatible with
> > host side.
> >
> > This patch reports the supported 1st-lvl/stage-1 page table format on the
> > current platform to userspace. QEMU and other alike applications should
> > use this format info when trying to setup IOMMU nesting translation on
> > host IOMMU.
> >
> > Cc: Kevin Tian <[email protected]>
> > CC: Jacob Pan <[email protected]>
> > Cc: Alex Williamson <[email protected]>
> > Cc: Eric Auger <[email protected]>
> > Cc: Jean-Philippe Brucker <[email protected]>
> > Signed-off-by: Liu Yi L <[email protected]>
> > ---
> > drivers/vfio/vfio_iommu_type1.c | 56
> +++++++++++++++++++++++++++++++++++++++++
> > include/uapi/linux/vfio.h | 1 +
> > 2 files changed, 57 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 9aa2a67..82a9e0b 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct
> vfio_iommu *iommu,
> > return ret;
> > }
> >
> > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
> > + u32 *stage1_format)
> > +{
> > + struct vfio_domain *domain;
> > + u32 format = 0, tmp_format = 0;
> > + int ret;
> > +
> > + mutex_lock(&iommu->lock);
> > + if (list_empty(&iommu->domain_list)) {
> > + mutex_unlock(&iommu->lock);
> > + return -EINVAL;
> > + }
> > +
> > + list_for_each_entry(domain, &iommu->domain_list, next) {
> > + if (iommu_domain_get_attr(domain->domain,
> > + DOMAIN_ATTR_PASID_FORMAT, &format)) {
> > + ret = -EINVAL;
> > + format = 0;
> > + goto out_unlock;
> > + }
> > + /*
> > + * format is always non-zero (the first format is
> > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For
> > + * the reason of potential different backed IOMMU
> > + * formats, here we expect to have identical formats
> > + * in the domain list, no mixed formats support.
> > + * return -EINVAL to fail the attempt of setup
> > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats
> > + * are detected.
> > + */
> > + if (tmp_format && tmp_format != format) {
> > + ret = -EINVAL;
> > + format = 0;
> > + goto out_unlock;
> > + }
> > +
> > + tmp_format = format;
> > + }
> > + ret = 0;
> > +
> > +out_unlock:
> > + if (format)
> > + *stage1_format = format;
> > + mutex_unlock(&iommu->lock);
> > + return ret;
> > +}
> > +
> > static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> > struct vfio_info_cap *caps)
> > {
> > struct vfio_info_cap_header *header;
> > struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> > + u32 formats = 0;
> > + int ret;
> > +
> > + ret = vfio_iommu_get_stage1_format(iommu, &formats);
> > + if (ret) {
> > + pr_warn("Failed to get stage-1 format\n");
> > + return ret;
>
> Looks like this generates a warning and causes the iommu_get_info ioctl
> to fail if the hardware doesn't support the pasid format attribute, or
> the domain list is empty. This breaks users on existing hardware.

oops, yes, it should not fail anything as it is just an extended feature.
let me correct it.

Thanks,
Yi Liu

2020-04-07 09:45:20

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

Hi Jean,

> From: Jean-Philippe Brucker <[email protected]>
> Sent: Friday, April 3, 2020 4:23 PM
> To: Auger Eric <[email protected]>
> userspace
>
> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
> > >>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> > >>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> @@ -2254,6 +2309,7
> > >>> @@ static int vfio_iommu_info_add_nesting_cap(struct
> > >> vfio_iommu *iommu,
> > >>> /* nesting iommu type supports PASID requests (alloc/free) */
> > >>> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
> > >> What is the meaning for ARM?
> > >
> > > I think it's just a software capability exposed to userspace, on
> > > userspace side, it has a choice to use it or not. :-) The reason
> > > define it and report it in cap nesting is that I'd like to make the
> > > pasid alloc/free be available just for IOMMU with type
> > > VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good
> > > for ARM. We can find a proper way to report the availability.
> >
> > Well it is more a question for jean-Philippe. Do we have a system wide
> > PASID allocation on ARM?
>
> We don't, the PASID spaces are per-VM on Arm, so this function should consult the
> IOMMU driver before setting flags. As you said on patch 3, nested doesn't
> necessarily imply PASID support. The SMMUv2 does not support PASID but does
> support nesting stages 1 and 2 for the IOVA space.
> SMMUv3 support of PASID depends on HW capabilities. So I think this needs to be
> finer grained:
>
> Does the container support:
> * VFIO_IOMMU_PASID_REQUEST?
> -> Yes for VT-d 3
> -> No for Arm SMMU
> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
> -> Yes for VT-d 3
> -> Sometimes for SMMUv2
> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to
> PASID tables being in GPA space.)
> * VFIO_IOMMU_BIND_PASID_TABLE?
> -> No for VT-d
> -> Sometimes for SMMUv3
>
> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.

good summary. do you expect to see any

>
> > >>> + nesting_cap->stage1_formats = formats;
> > >> as spotted by Kevin, since a single format is supported, rename
> > >
> > > ok, I was believing it may be possible on ARM or so. :-) will rename
> > > it.
>
> Yes I don't think an u32 is going to cut it for Arm :( We need to describe all sorts of
> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW
> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I
> guess we could have a secondary vendor capability for these?

Actually, I'm wondering if we can define some formats to stands for a set of
capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level
page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse
the capabilities.

Regards,
Yi Liu

2020-04-08 01:04:05

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

> From: Liu, Yi L
> Sent: Tuesday, April 7, 2020 5:43 PM
>
> > We don't, the PASID spaces are per-VM on Arm, so this function should
> > consult the IOMMU driver before setting flags. As you said on patch 3,
> > nested doesn't necessarily imply PASID support. The SMMUv2 does not
> > support PASID but does support nesting stages 1 and 2 for the IOVA space.
> > SMMUv3 support of PASID depends on HW capabilities. So I think this
> > needs to be finer grained:
> >
> > Does the container support:
> > * VFIO_IOMMU_PASID_REQUEST?
> > -> Yes for VT-d 3
> > -> No for Arm SMMU
> > * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
> > -> Yes for VT-d 3
> > -> Sometimes for SMMUv2
> > -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to
> > PASID tables being in GPA space.)
> > * VFIO_IOMMU_BIND_PASID_TABLE?
> > -> No for VT-d
> > -> Sometimes for SMMUv3
> >
> > Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
>
> good summary. do you expect to see any
please ignore this message. I planned to ask if possible to report
VFIO_IOMMU_CACHE_INVALIDATE only (no bind support). But I stopped
typing it when I came to believe it's unnecessary to report it if
there is no bind support.

Regards,
Yi Liu

2020-04-08 10:33:29

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

Hi Yi,

On 4/7/20 11:43 AM, Liu, Yi L wrote:
> Hi Jean,
>
>> From: Jean-Philippe Brucker <[email protected]>
>> Sent: Friday, April 3, 2020 4:23 PM
>> To: Auger Eric <[email protected]>
>> userspace
>>
>> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
>>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
>>>>>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
>> @@ -2254,6 +2309,7
>>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct
>>>>> vfio_iommu *iommu,
>>>>>> /* nesting iommu type supports PASID requests (alloc/free) */
>>>>>> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
>>>>> What is the meaning for ARM?
>>>>
>>>> I think it's just a software capability exposed to userspace, on
>>>> userspace side, it has a choice to use it or not. :-) The reason
>>>> define it and report it in cap nesting is that I'd like to make the
>>>> pasid alloc/free be available just for IOMMU with type
>>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good
>>>> for ARM. We can find a proper way to report the availability.
>>>
>>> Well it is more a question for jean-Philippe. Do we have a system wide
>>> PASID allocation on ARM?
>>
>> We don't, the PASID spaces are per-VM on Arm, so this function should consult the
>> IOMMU driver before setting flags. As you said on patch 3, nested doesn't
>> necessarily imply PASID support. The SMMUv2 does not support PASID but does
>> support nesting stages 1 and 2 for the IOVA space.
>> SMMUv3 support of PASID depends on HW capabilities. So I think this needs to be
>> finer grained:
>>
>> Does the container support:
>> * VFIO_IOMMU_PASID_REQUEST?
>> -> Yes for VT-d 3
>> -> No for Arm SMMU
>> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
>> -> Yes for VT-d 3
>> -> Sometimes for SMMUv2
>> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to
>> PASID tables being in GPA space.)
>> * VFIO_IOMMU_BIND_PASID_TABLE?
>> -> No for VT-d
>> -> Sometimes for SMMUv3
>>
>> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
>
> good summary. do you expect to see any
>
>>
>>>>>> + nesting_cap->stage1_formats = formats;
>>>>> as spotted by Kevin, since a single format is supported, rename
>>>>
>>>> ok, I was believing it may be possible on ARM or so. :-) will rename
>>>> it.
>>
>> Yes I don't think an u32 is going to cut it for Arm :( We need to describe all sorts of
>> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW
>> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I
>> guess we could have a secondary vendor capability for these?
>
> Actually, I'm wondering if we can define some formats to stands for a set of
> capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level
> page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse
> the capabilities.

But eventually do we really need all those capability getters? I mean
can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL()
to detect any mismatch? Definitively the error handling may be heavier
on userspace but can't we manage. My fear is we end up with an overly
complex series. This capability getter may be interesting if we can
switch to a fallback implementation but here I guess we don't have any
fallback. With smmuv3 nested stage we don't have any fallback solution
either. For the versions, it is different because the userspace shall be
able to adapt (or not) to the max version supported by the kernel.

Thanks

Eric
>
> Regards,
> Yi Liu
>

2020-04-09 08:16:05

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
> Hi Yi,
>
> On 4/7/20 11:43 AM, Liu, Yi L wrote:
> > Hi Jean,
> >
> >> From: Jean-Philippe Brucker <[email protected]>
> >> Sent: Friday, April 3, 2020 4:23 PM
> >> To: Auger Eric <[email protected]>
> >> userspace
> >>
> >> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
> >>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> >>>>>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> >> @@ -2254,6 +2309,7
> >>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct
> >>>>> vfio_iommu *iommu,
> >>>>>> /* nesting iommu type supports PASID requests (alloc/free) */
> >>>>>> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
> >>>>> What is the meaning for ARM?
> >>>>
> >>>> I think it's just a software capability exposed to userspace, on
> >>>> userspace side, it has a choice to use it or not. :-) The reason
> >>>> define it and report it in cap nesting is that I'd like to make the
> >>>> pasid alloc/free be available just for IOMMU with type
> >>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good
> >>>> for ARM. We can find a proper way to report the availability.
> >>>
> >>> Well it is more a question for jean-Philippe. Do we have a system wide
> >>> PASID allocation on ARM?
> >>
> >> We don't, the PASID spaces are per-VM on Arm, so this function should consult the
> >> IOMMU driver before setting flags. As you said on patch 3, nested doesn't
> >> necessarily imply PASID support. The SMMUv2 does not support PASID but does
> >> support nesting stages 1 and 2 for the IOVA space.
> >> SMMUv3 support of PASID depends on HW capabilities. So I think this needs to be
> >> finer grained:
> >>
> >> Does the container support:
> >> * VFIO_IOMMU_PASID_REQUEST?
> >> -> Yes for VT-d 3
> >> -> No for Arm SMMU
> >> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
> >> -> Yes for VT-d 3
> >> -> Sometimes for SMMUv2
> >> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to
> >> PASID tables being in GPA space.)
> >> * VFIO_IOMMU_BIND_PASID_TABLE?
> >> -> No for VT-d
> >> -> Sometimes for SMMUv3
> >>
> >> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
> >
> > good summary. do you expect to see any
> >
> >>
> >>>>>> + nesting_cap->stage1_formats = formats;
> >>>>> as spotted by Kevin, since a single format is supported, rename
> >>>>
> >>>> ok, I was believing it may be possible on ARM or so. :-) will rename
> >>>> it.
> >>
> >> Yes I don't think an u32 is going to cut it for Arm :( We need to describe all sorts of
> >> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW
> >> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I
> >> guess we could have a secondary vendor capability for these?
> >
> > Actually, I'm wondering if we can define some formats to stands for a set of
> > capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level
> > page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse
> > the capabilities.
>
> But eventually do we really need all those capability getters? I mean
> can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL()
> to detect any mismatch? Definitively the error handling may be heavier
> on userspace but can't we manage.

I think we need to present these capabilities at boot time, long before
the guest triggers a bind(). For example if the host SMMU doesn't support
16-bit ASID, we need to communicate that to the guest using vSMMU ID
registers or PROBE properties. Otherwise a bind() will succeed, but if the
guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events
which we'll inject into the guest, for no apparent reason from their
perspective.

In addition some VMMs may have fallbacks if shared page tables are not
available. They could fall back to a MAP/UNMAP interface, or simply not
present a vIOMMU to the guest.

Thanks,
Jean

> My fear is we end up with an overly
> complex series. This capability getter may be interesting if we can
> switch to a fallback implementation but here I guess we don't have any
> fallback. With smmuv3 nested stage we don't have any fallback solution
> either. For the versions, it is different because the userspace shall be
> able to adapt (or not) to the max version supported by the kernel.
>
> Thanks
>
> Eric
> >
> > Regards,
> > Yi Liu
> >
>

2020-04-09 09:04:15

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

Hi Jean,

On 4/9/20 10:14 AM, Jean-Philippe Brucker wrote:
> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
>> Hi Yi,
>>
>> On 4/7/20 11:43 AM, Liu, Yi L wrote:
>>> Hi Jean,
>>>
>>>> From: Jean-Philippe Brucker <[email protected]>
>>>> Sent: Friday, April 3, 2020 4:23 PM
>>>> To: Auger Eric <[email protected]>
>>>> userspace
>>>>
>>>> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
>>>>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
>>>>>>>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
>>>> @@ -2254,6 +2309,7
>>>>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct
>>>>>>> vfio_iommu *iommu,
>>>>>>>> /* nesting iommu type supports PASID requests (alloc/free) */
>>>>>>>> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
>>>>>>> What is the meaning for ARM?
>>>>>>
>>>>>> I think it's just a software capability exposed to userspace, on
>>>>>> userspace side, it has a choice to use it or not. :-) The reason
>>>>>> define it and report it in cap nesting is that I'd like to make the
>>>>>> pasid alloc/free be available just for IOMMU with type
>>>>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good
>>>>>> for ARM. We can find a proper way to report the availability.
>>>>>
>>>>> Well it is more a question for jean-Philippe. Do we have a system wide
>>>>> PASID allocation on ARM?
>>>>
>>>> We don't, the PASID spaces are per-VM on Arm, so this function should consult the
>>>> IOMMU driver before setting flags. As you said on patch 3, nested doesn't
>>>> necessarily imply PASID support. The SMMUv2 does not support PASID but does
>>>> support nesting stages 1 and 2 for the IOVA space.
>>>> SMMUv3 support of PASID depends on HW capabilities. So I think this needs to be
>>>> finer grained:
>>>>
>>>> Does the container support:
>>>> * VFIO_IOMMU_PASID_REQUEST?
>>>> -> Yes for VT-d 3
>>>> -> No for Arm SMMU
>>>> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
>>>> -> Yes for VT-d 3
>>>> -> Sometimes for SMMUv2
>>>> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to
>>>> PASID tables being in GPA space.)
>>>> * VFIO_IOMMU_BIND_PASID_TABLE?
>>>> -> No for VT-d
>>>> -> Sometimes for SMMUv3
>>>>
>>>> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
>>>
>>> good summary. do you expect to see any
>>>
>>>>
>>>>>>>> + nesting_cap->stage1_formats = formats;
>>>>>>> as spotted by Kevin, since a single format is supported, rename
>>>>>>
>>>>>> ok, I was believing it may be possible on ARM or so. :-) will rename
>>>>>> it.
>>>>
>>>> Yes I don't think an u32 is going to cut it for Arm :( We need to describe all sorts of
>>>> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW
>>>> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I
>>>> guess we could have a secondary vendor capability for these?
>>>
>>> Actually, I'm wondering if we can define some formats to stands for a set of
>>> capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level
>>> page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse
>>> the capabilities.
>>
>> But eventually do we really need all those capability getters? I mean
>> can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL()
>> to detect any mismatch? Definitively the error handling may be heavier
>> on userspace but can't we manage.
>
> I think we need to present these capabilities at boot time, long before
> the guest triggers a bind(). For example if the host SMMU doesn't support
> 16-bit ASID, we need to communicate that to the guest using vSMMU ID
> registers or PROBE properties. Otherwise a bind() will succeed, but if the
> guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events
> which we'll inject into the guest, for no apparent reason from their
> perspective.
OK I understand this case as in this situation we may be able to change
the way to iommu is exposed to the guest.
>
> In addition some VMMs may have fallbacks if shared page tables are not
> available. They could fall back to a MAP/UNMAP interface, or simply not
> present a vIOMMU to the guest.
fair enough, there is a need for such capability checker in the mid
term. But this patch introduces the capability to check whether system
wide PASID alloc is supported and this may not be requested at that
stage for the whole vSVM integration?

Thanks

Eric
>
> Thanks,
> Jean
>
>> My fear is we end up with an overly
>> complex series. This capability getter may be interesting if we can
>> switch to a fallback implementation but here I guess we don't have any
>> fallback. With smmuv3 nested stage we don't have any fallback solution
>> either. For the versions, it is different because the userspace shall be
>> able to adapt (or not) to the max version supported by the kernel.
>>
>> Thanks
>>
>> Eric
>>>
>>> Regards,
>>> Yi Liu
>>>
>>
>

2020-04-09 12:49:05

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

Hi Jean,

> From: Jean-Philippe Brucker <[email protected]>
> Sent: Thursday, April 9, 2020 4:15 PM
> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
>
> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
> > Hi Yi,
> >
> > On 4/7/20 11:43 AM, Liu, Yi L wrote:
> > > Hi Jean,
> > >
> > >> From: Jean-Philippe Brucker <[email protected]>
> > >> Sent: Friday, April 3, 2020 4:23 PM
> > >> To: Auger Eric <[email protected]>
> > >> userspace
> > >>
> > >> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
> > >>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> > >>>>>>
> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> > >> @@ -2254,6 +2309,7
> > >>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct
> > >>>>> vfio_iommu *iommu,
> > >>>>>> /* nesting iommu type supports PASID requests (alloc/free)
> */
> > >>>>>> nesting_cap->nesting_capabilities |=
> VFIO_IOMMU_PASID_REQS;
> > >>>>> What is the meaning for ARM?
> > >>>>
> > >>>> I think it's just a software capability exposed to userspace, on
> > >>>> userspace side, it has a choice to use it or not. :-) The reason
> > >>>> define it and report it in cap nesting is that I'd like to make the
> > >>>> pasid alloc/free be available just for IOMMU with type
> > >>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good
> > >>>> for ARM. We can find a proper way to report the availability.
> > >>>
> > >>> Well it is more a question for jean-Philippe. Do we have a system wide
> > >>> PASID allocation on ARM?
> > >>
> > >> We don't, the PASID spaces are per-VM on Arm, so this function should consult
> the
> > >> IOMMU driver before setting flags. As you said on patch 3, nested doesn't
> > >> necessarily imply PASID support. The SMMUv2 does not support PASID but does
> > >> support nesting stages 1 and 2 for the IOVA space.
> > >> SMMUv3 support of PASID depends on HW capabilities. So I think this needs to
> be
> > >> finer grained:
> > >>
> > >> Does the container support:
> > >> * VFIO_IOMMU_PASID_REQUEST?
> > >> -> Yes for VT-d 3
> > >> -> No for Arm SMMU
> > >> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
> > >> -> Yes for VT-d 3
> > >> -> Sometimes for SMMUv2
> > >> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to
> > >> PASID tables being in GPA space.)
> > >> * VFIO_IOMMU_BIND_PASID_TABLE?
> > >> -> No for VT-d
> > >> -> Sometimes for SMMUv3
> > >>
> > >> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
> > >
> > > good summary. do you expect to see any
> > >
> > >>
> > >>>>>> + nesting_cap->stage1_formats = formats;
> > >>>>> as spotted by Kevin, since a single format is supported, rename
> > >>>>
> > >>>> ok, I was believing it may be possible on ARM or so. :-) will rename
> > >>>> it.
> > >>
> > >> Yes I don't think an u32 is going to cut it for Arm :( We need to
> > >> describe all sorts
> of
> > >> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW
> > >> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I
> > >> guess we could have a secondary vendor capability for these?
> > >
> > > Actually, I'm wondering if we can define some formats to stands for a set of
> > > capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level
> > > page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse
> > > the capabilities.
> >
> > But eventually do we really need all those capability getters? I mean
> > can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL()
> > to detect any mismatch? Definitively the error handling may be heavier
> > on userspace but can't we manage.
>
> I think we need to present these capabilities at boot time, long before
> the guest triggers a bind(). For example if the host SMMU doesn't support
> 16-bit ASID, we need to communicate that to the guest using vSMMU ID
> registers or PROBE properties. Otherwise a bind() will succeed, but if the
> guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events
> which we'll inject into the guest, for no apparent reason from their
> perspective.
>
> In addition some VMMs may have fallbacks if shared page tables are not
> available. They could fall back to a MAP/UNMAP interface, or simply not
> present a vIOMMU to the guest.
>

Based on the comments, I think it would be a need to report iommu caps
in detail. So I guess iommu uapi needs to provide something alike vfio
cap chain in iommu uapi. Please feel free let me know your thoughts. :-)

In vfio, we can define a cap as below:

struct vfio_iommu_type1_info_cap_nesting {
struct vfio_info_cap_header header;
__u64 iommu_model;
#define VFIO_IOMMU_PASID_REQS (1 << 0)
#define VFIO_IOMMU_BIND_GPASID (1 << 1)
#define VFIO_IOMMU_CACHE_INV (1 << 2)
__u32 nesting_capabilities;
__u32 pasid_bits;
#define VFIO_IOMMU_VENDOR_SUB_CAP (1 << 3)
__u32 flags;
__u32 data_size;
__u8 data[]; /*iommu info caps defined by iommu uapi */
};

VFIO needs new iommu APIs to ask iommu driver whether PASID/bind_gpasid/
cache_inv/bind_gpasid_table is available or not and also the pasid
bits. After that VFIO will ask iommu driver about the iommu_cap_info
and fill in the @data[] field.

iommu uapi:
struct iommu_info_cap_header {
__u16 id; /* Identifies capability */
__u16 version; /* Version specific to the capability ID */
__u32 next; /* Offset of next capability */
};

#define IOMMU_INFO_CAP_INTEL_VTD 1
struct iommu_info_cap_intel_vtd {
struct iommu_info_cap_header header;
__u32 vaddr_width; /* VA addr_width*/
__u32 ipaddr_width; /* IPA addr_width, input of SL page table */
/* same definition with @flags instruct iommu_gpasid_bind_data_vtd */
__u64 flags;
};

#define IOMMU_INFO_CAP_ARM_SMMUv3 2
struct iommu_info_cap_arm_smmuv3 {
struct iommu_info_cap_header header;
...
};

Regards,
Yi Liu

2020-04-10 03:29:56

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

Hi Yi,

On 4/9/20 2:47 PM, Liu, Yi L wrote:
> Hi Jean,
>
>> From: Jean-Philippe Brucker <[email protected]>
>> Sent: Thursday, April 9, 2020 4:15 PM
>> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
>> userspace
>>
>> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
>>> Hi Yi,
>>>
>>> On 4/7/20 11:43 AM, Liu, Yi L wrote:
>>>> Hi Jean,
>>>>
>>>>> From: Jean-Philippe Brucker <[email protected]>
>>>>> Sent: Friday, April 3, 2020 4:23 PM
>>>>> To: Auger Eric <[email protected]>
>>>>> userspace
>>>>>
>>>>> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
>>>>>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
>>>>>>>>>
>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
>>>>> @@ -2254,6 +2309,7
>>>>>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct
>>>>>>>> vfio_iommu *iommu,
>>>>>>>>> /* nesting iommu type supports PASID requests (alloc/free)
>> */
>>>>>>>>> nesting_cap->nesting_capabilities |=
>> VFIO_IOMMU_PASID_REQS;
>>>>>>>> What is the meaning for ARM?
>>>>>>>
>>>>>>> I think it's just a software capability exposed to userspace, on
>>>>>>> userspace side, it has a choice to use it or not. :-) The reason
>>>>>>> define it and report it in cap nesting is that I'd like to make the
>>>>>>> pasid alloc/free be available just for IOMMU with type
>>>>>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good
>>>>>>> for ARM. We can find a proper way to report the availability.
>>>>>>
>>>>>> Well it is more a question for jean-Philippe. Do we have a system wide
>>>>>> PASID allocation on ARM?
>>>>>
>>>>> We don't, the PASID spaces are per-VM on Arm, so this function should consult
>> the
>>>>> IOMMU driver before setting flags. As you said on patch 3, nested doesn't
>>>>> necessarily imply PASID support. The SMMUv2 does not support PASID but does
>>>>> support nesting stages 1 and 2 for the IOVA space.
>>>>> SMMUv3 support of PASID depends on HW capabilities. So I think this needs to
>> be
>>>>> finer grained:
>>>>>
>>>>> Does the container support:
>>>>> * VFIO_IOMMU_PASID_REQUEST?
>>>>> -> Yes for VT-d 3
>>>>> -> No for Arm SMMU
>>>>> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
>>>>> -> Yes for VT-d 3
>>>>> -> Sometimes for SMMUv2
>>>>> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to
>>>>> PASID tables being in GPA space.)
>>>>> * VFIO_IOMMU_BIND_PASID_TABLE?
>>>>> -> No for VT-d
>>>>> -> Sometimes for SMMUv3
>>>>>
>>>>> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
>>>>
>>>> good summary. do you expect to see any
>>>>
>>>>>
>>>>>>>>> + nesting_cap->stage1_formats = formats;
>>>>>>>> as spotted by Kevin, since a single format is supported, rename
>>>>>>>
>>>>>>> ok, I was believing it may be possible on ARM or so. :-) will rename
>>>>>>> it.
>>>>>
>>>>> Yes I don't think an u32 is going to cut it for Arm :( We need to
>>>>> describe all sorts
>> of
>>>>> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW
>>>>> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I
>>>>> guess we could have a secondary vendor capability for these?
>>>>
>>>> Actually, I'm wondering if we can define some formats to stands for a set of
>>>> capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level
>>>> page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse
>>>> the capabilities.
>>>
>>> But eventually do we really need all those capability getters? I mean
>>> can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL()
>>> to detect any mismatch? Definitively the error handling may be heavier
>>> on userspace but can't we manage.
>>
>> I think we need to present these capabilities at boot time, long before
>> the guest triggers a bind(). For example if the host SMMU doesn't support
>> 16-bit ASID, we need to communicate that to the guest using vSMMU ID
>> registers or PROBE properties. Otherwise a bind() will succeed, but if the
>> guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events
>> which we'll inject into the guest, for no apparent reason from their
>> perspective.
>>
>> In addition some VMMs may have fallbacks if shared page tables are not
>> available. They could fall back to a MAP/UNMAP interface, or simply not
>> present a vIOMMU to the guest.
>>
>
> Based on the comments, I think it would be a need to report iommu caps
> in detail. So I guess iommu uapi needs to provide something alike vfio
> cap chain in iommu uapi. Please feel free let me know your thoughts. :-)

Yes to me it sounds sensible.
>
> In vfio, we can define a cap as below:
>
> struct vfio_iommu_type1_info_cap_nesting {
> struct vfio_info_cap_header header;
> __u64 iommu_model;
> #define VFIO_IOMMU_PASID_REQS (1 << 0)
I still think the name shall be changed
> #define VFIO_IOMMU_BIND_GPASID (1 << 1)
> #define VFIO_IOMMU_CACHE_INV (1 << 2)
this operation seems mandated as soon as we have a nested paging based
implementation?
> __u32 nesting_capabilities;
> __u32 pasid_bits;
> #define VFIO_IOMMU_VENDOR_SUB_CAP (1 << 3)
> __u32 flags;
> __u32 data_size;
> __u8 data[]; /*iommu info caps defined by iommu uapi */
> };
>
> VFIO needs new iommu APIs to ask iommu driver whether PASID/bind_gpasid/
> cache_inv/bind_gpasid_table is available or not and also the pasid
> bits. After that VFIO will ask iommu driver about the iommu_cap_info
> and fill in the @data[] field.
>
> iommu uapi:
> struct iommu_info_cap_header {
> __u16 id; /* Identifies capability */
> __u16 version; /* Version specific to the capability ID */
> __u32 next; /* Offset of next capability */
> };
>
> #define IOMMU_INFO_CAP_INTEL_VTD 1
> struct iommu_info_cap_intel_vtd {
> struct iommu_info_cap_header header;
> __u32 vaddr_width; /* VA addr_width*/
> __u32 ipaddr_width; /* IPA addr_width, input of SL page table */
> /* same definition with @flags instruct iommu_gpasid_bind_data_vtd */
> __u64 flags;
> };
>
> #define IOMMU_INFO_CAP_ARM_SMMUv3 2
> struct iommu_info_cap_arm_smmuv3 {
> struct iommu_info_cap_header header;
> ...
> };

Thanks

Eric
>
> Regards,
> Yi Liu
>

2020-04-10 03:58:47

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

Hi Eric,

> From: Auger Eric <[email protected]>
> Sent: Friday, April 10, 2020 11:28 AM
> To: Liu, Yi L <[email protected]>; Jean-Philippe Brucker <jean-
> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
>
> Hi Yi,
>
> On 4/9/20 2:47 PM, Liu, Yi L wrote:
> > Hi Jean,
> >
> >> From: Jean-Philippe Brucker <[email protected]>
> >> Sent: Thursday, April 9, 2020 4:15 PM
> >> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1
> >> format to userspace
> >>
> >> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
> >>> Hi Yi,
> >>>
> >>> On 4/7/20 11:43 AM, Liu, Yi L wrote:
> >>>> Hi Jean,
> >>>>
> >>>>> From: Jean-Philippe Brucker <[email protected]>
> >>>>> Sent: Friday, April 3, 2020 4:23 PM
> >>>>> To: Auger Eric <[email protected]> userspace
> >>>>>
> >>>>> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
> >>>>>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> >>>>>>>>>
> >> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> >>>>> @@ -2254,6 +2309,7
> >>>>>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct
> >>>>>>>> vfio_iommu *iommu,
> >>>>>>>>> /* nesting iommu type supports PASID requests
> (alloc/free)
> >> */
> >>>>>>>>> nesting_cap->nesting_capabilities |=
> >> VFIO_IOMMU_PASID_REQS;
> >>>>>>>> What is the meaning for ARM?
> >>>>>>>
> >>>>>>> I think it's just a software capability exposed to userspace, on
> >>>>>>> userspace side, it has a choice to use it or not. :-) The reason
> >>>>>>> define it and report it in cap nesting is that I'd like to make
> >>>>>>> the pasid alloc/free be available just for IOMMU with type
> >>>>>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not
> >>>>>>> good for ARM. We can find a proper way to report the availability.
> >>>>>>
> >>>>>> Well it is more a question for jean-Philippe. Do we have a system
> >>>>>> wide PASID allocation on ARM?
> >>>>>
> >>>>> We don't, the PASID spaces are per-VM on Arm, so this function
> >>>>> should consult
> >> the
> >>>>> IOMMU driver before setting flags. As you said on patch 3, nested
> >>>>> doesn't necessarily imply PASID support. The SMMUv2 does not
> >>>>> support PASID but does support nesting stages 1 and 2 for the IOVA space.
> >>>>> SMMUv3 support of PASID depends on HW capabilities. So I think
> >>>>> this needs to
> >> be
> >>>>> finer grained:
> >>>>>
> >>>>> Does the container support:
> >>>>> * VFIO_IOMMU_PASID_REQUEST?
> >>>>> -> Yes for VT-d 3
> >>>>> -> No for Arm SMMU
> >>>>> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
> >>>>> -> Yes for VT-d 3
> >>>>> -> Sometimes for SMMUv2
> >>>>> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler
> due to
> >>>>> PASID tables being in GPA space.)
> >>>>> * VFIO_IOMMU_BIND_PASID_TABLE?
> >>>>> -> No for VT-d
> >>>>> -> Sometimes for SMMUv3
> >>>>>
> >>>>> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
> >>>>
> >>>> good summary. do you expect to see any
> >>>>
> >>>>>
> >>>>>>>>> + nesting_cap->stage1_formats = formats;
> >>>>>>>> as spotted by Kevin, since a single format is supported, rename
> >>>>>>>
> >>>>>>> ok, I was believing it may be possible on ARM or so. :-) will
> >>>>>>> rename it.
> >>>>>
> >>>>> Yes I don't think an u32 is going to cut it for Arm :( We need to
> >>>>> describe all sorts
> >> of
> >>>>> capabilities for page and PASID tables (granules, GPA size,
> >>>>> ASID/PASID size, HW access/dirty, etc etc.) Just saying "Arm
> >>>>> stage-1 format" wouldn't mean much. I guess we could have a secondary
> vendor capability for these?
> >>>>
> >>>> Actually, I'm wondering if we can define some formats to stands for
> >>>> a set of capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may
> >>>> indicates the 1st level page table related caps (aw, a/d, SRE, EA
> >>>> and etc.). And vIOMMU can parse the capabilities.
> >>>
> >>> But eventually do we really need all those capability getters? I
> >>> mean can't we simply rely on the actual call to
> >>> VFIO_IOMMU_BIND_GUEST_PGTBL() to detect any mismatch? Definitively
> >>> the error handling may be heavier on userspace but can't we manage.
> >>
> >> I think we need to present these capabilities at boot time, long
> >> before the guest triggers a bind(). For example if the host SMMU
> >> doesn't support 16-bit ASID, we need to communicate that to the guest
> >> using vSMMU ID registers or PROBE properties. Otherwise a bind() will
> >> succeed, but if the guest uses 16-bit ASIDs in its CD, DMA will
> >> result in C_BAD_CD events which we'll inject into the guest, for no
> >> apparent reason from their perspective.
> >>
> >> In addition some VMMs may have fallbacks if shared page tables are
> >> not available. They could fall back to a MAP/UNMAP interface, or
> >> simply not present a vIOMMU to the guest.
> >>
> >
> > Based on the comments, I think it would be a need to report iommu caps
> > in detail. So I guess iommu uapi needs to provide something alike vfio
> > cap chain in iommu uapi. Please feel free let me know your thoughts.
> > :-)
>
> Yes to me it sounds sensible.
> >
> > In vfio, we can define a cap as below:
> >
> > struct vfio_iommu_type1_info_cap_nesting {
> > struct vfio_info_cap_header header;
> > __u64 iommu_model;
> > #define VFIO_IOMMU_PASID_REQS (1 << 0)
> I still think the name shall be changed

yes, I'll rename it per your suggestion.:-)

> > #define VFIO_IOMMU_BIND_GPASID (1 << 1)
> > #define VFIO_IOMMU_CACHE_INV (1 << 2)
> this operation seems mandated as soon as we have a nested paging based
> implementation?

oh, yes, should be. will remove it and comment in the code.

Regards,
Yi Liu

> > __u32 nesting_capabilities;
> > __u32 pasid_bits;
> > #define VFIO_IOMMU_VENDOR_SUB_CAP (1 << 3)
> > __u32 flags;
> > __u32 data_size;
> > __u8 data[]; /*iommu info caps defined by iommu uapi */
> > };
> >
> > VFIO needs new iommu APIs to ask iommu driver whether
> > PASID/bind_gpasid/ cache_inv/bind_gpasid_table is available or not and
> > also the pasid bits. After that VFIO will ask iommu driver about the
> > iommu_cap_info and fill in the @data[] field.
> >
> > iommu uapi:
> > struct iommu_info_cap_header {
> > __u16 id; /* Identifies capability */
> > __u16 version; /* Version specific to the capability ID */
> > __u32 next; /* Offset of next capability */
> > };
> >
> > #define IOMMU_INFO_CAP_INTEL_VTD 1
> > struct iommu_info_cap_intel_vtd {
> > struct iommu_info_cap_header header;
> > __u32 vaddr_width; /* VA addr_width*/
> > __u32 ipaddr_width; /* IPA addr_width, input of SL page table */
> > /* same definition with @flags instruct iommu_gpasid_bind_data_vtd */
> > __u64 flags;
> > };
> >
> > #define IOMMU_INFO_CAP_ARM_SMMUv3 2
> > struct iommu_info_cap_arm_smmuv3 {
> > struct iommu_info_cap_header header;
> > ...
> > };
>
> Thanks
>
> Eric
> >
> > Regards,
> > Yi Liu
> >

2020-04-10 12:32:04

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

Hi Jean, Eric,

> From: Liu, Yi L <[email protected]>
> Sent: Thursday, April 9, 2020 8:47 PM
> Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
>
[...]
> > > >>
> > > >> Yes I don't think an u32 is going to cut it for Arm :( We need to
> > > >> describe all sorts
> > of
> > > >> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size,
> HW
> > > >> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean
> much. I
> > > >> guess we could have a secondary vendor capability for these?
> > > >
> > > > Actually, I'm wondering if we can define some formats to stands for a set of
> > > > capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st
> level
> > > > page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse
> > > > the capabilities.
> > >
> > > But eventually do we really need all those capability getters? I mean
> > > can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL()
> > > to detect any mismatch? Definitively the error handling may be heavier
> > > on userspace but can't we manage.
> >
> > I think we need to present these capabilities at boot time, long before
> > the guest triggers a bind(). For example if the host SMMU doesn't support
> > 16-bit ASID, we need to communicate that to the guest using vSMMU ID
> > registers or PROBE properties. Otherwise a bind() will succeed, but if the
> > guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events
> > which we'll inject into the guest, for no apparent reason from their
> > perspective.
> >
> > In addition some VMMs may have fallbacks if shared page tables are not
> > available. They could fall back to a MAP/UNMAP interface, or simply not
> > present a vIOMMU to the guest.
> >
>
> Based on the comments, I think it would be a need to report iommu caps
> in detail. So I guess iommu uapi needs to provide something alike vfio
> cap chain in iommu uapi. Please feel free let me know your thoughts. :-)

Consider more, I guess it may be better to start simpler. Cap chain suits
the case in which there are multiple caps. e.g. some vendor iommu driver
may want to report iommu capabilities via multiple caps. Actually, in VT-d
side, the host IOMMU capability could be reported in a single cap structure.
I'm not sure about ARM side. Will there be multiple iommu_info_caps for ARM?

> In vfio, we can define a cap as below:
>
> struct vfio_iommu_type1_info_cap_nesting {
> struct vfio_info_cap_header header;
> __u64 iommu_model;
> #define VFIO_IOMMU_PASID_REQS (1 << 0)
> #define VFIO_IOMMU_BIND_GPASID (1 << 1)
> #define VFIO_IOMMU_CACHE_INV (1 << 2)
> __u32 nesting_capabilities;
> __u32 pasid_bits;
> #define VFIO_IOMMU_VENDOR_SUB_CAP (1 << 3)
> __u32 flags;
> __u32 data_size;
> __u8 data[]; /*iommu info caps defined by iommu uapi */
> };
>

If iommu vendor driver only needs one cap structure to report hw
capability, then I think we needn't implement cap chain in iommu
uapi. The @data[] field could be determined by the @iommu_model
and @flags fields. This would be easier. thoughts?

> VFIO needs new iommu APIs to ask iommu driver whether PASID/bind_gpasid/
> cache_inv/bind_gpasid_table is available or not and also the pasid
> bits. After that VFIO will ask iommu driver about the iommu_cap_info
> and fill in the @data[] field.
>
> iommu uapi:
> struct iommu_info_cap_header {
> __u16 id; /* Identifies capability */
> __u16 version; /* Version specific to the capability ID */
> __u32 next; /* Offset of next capability */
> };
>
> #define IOMMU_INFO_CAP_INTEL_VTD 1
> struct iommu_info_cap_intel_vtd {
> struct iommu_info_cap_header header;
> __u32 vaddr_width; /* VA addr_width*/
> __u32 ipaddr_width; /* IPA addr_width, input of SL page table */
> /* same definition with @flags instruct iommu_gpasid_bind_data_vtd */
> __u64 flags;
> };
>
> #define IOMMU_INFO_CAP_ARM_SMMUv3 2
> struct iommu_info_cap_arm_smmuv3 {
> struct iommu_info_cap_header header;
> ...
> };

Regards,
Yi Liu