2015-07-21 17:44:19

by Gerald Schaefer

Subject: [RFC PATCH 0/1] vfio-pci/iommu: Detach iommu group on remove path

Hi,

During IOMMU API function testing on s390, I hit the following scenario:

After binding a device to vfio-pci, the user completes the VFIO_SET_IOMMU
ioctl and then pauses (see the sample C program below). Now the device is
manually removed via "echo 1 > /sys/bus/pci/devices/.../remove". This
completes instantly because vfio_del_group_dev() does not consider the
device to be in use, so ops->request is skipped (probably because there was
no VFIO_GROUP_GET_DEVICE_FD ioctl so far, only the SET_IOMMU, which merely
triggered an "attach iommu group").

Although the SET_IOMMU ioctl triggered the attach_dev callback in the
underlying IOMMU API, removing the device in this way won't trigger the
detach_dev callback, neither during remove nor when the user program
continues with closing group/container.

On s390 this eventually leads to a kernel panic when binding the device
again to its non-vfio PCI driver, because of the missing arch-specific
cleanup in detach_dev. On x86 I couldn't trigger the panic but I could
verify that detach_dev also won't get called in this scenario, which
probably means at least some kind of memory leak there.

I think I found a way to fix this in vfio code by calling
vfio_group_try_dissolve_container() from within vfio_del_group_dev(),
but I'm not really familiar with this code and so there may be better
ways to fix it. Any thoughts?

Regards,
Gerald


Here is the sample C program to trigger the ioctl:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int main(void)
{
	int container, group, rc;

	container = open("/dev/vfio/vfio", O_RDWR);
	if (container < 0) {
		perror("open /dev/vfio/vfio");
		return -1;
	}

	/* The group number (0 here) depends on the system. */
	group = open("/dev/vfio/0", O_RDWR);
	if (group < 0) {
		perror("open /dev/vfio/0");
		return -1;
	}

	rc = ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	if (rc) {
		perror("ioctl VFIO_GROUP_SET_CONTAINER");
		return -1;
	}

	/* Triggers the iommu group attach; no device fd is requested. */
	rc = ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
	if (rc) {
		perror("ioctl VFIO_SET_IOMMU");
		return -1;
	}

	/* Pause so that the device can be removed via sysfs. */
	printf("Try device remove...\n");
	getchar();

	close(group);
	close(container);
	return 0;
}


Gerald Schaefer (1):
vfio-pci/iommu: Detach iommu group on remove path

drivers/vfio/vfio.c | 3 +++
1 file changed, 3 insertions(+)

--
2.3.8


2015-07-21 17:44:25

by Gerald Schaefer

Subject: [RFC PATCH 1/1] vfio-pci/iommu: Detach iommu group on remove path

When a user completes the VFIO_SET_IOMMU ioctl and the vfio-pci device is
removed thereafter (before any other ioctl like VFIO_GROUP_GET_DEVICE_FD),
then the detach_dev callback of the underlying IOMMU API is never called.

This patch adds a call to vfio_group_try_dissolve_container() to the remove
path, which will trigger the missing detach_dev callback in this scenario.

Signed-off-by: Gerald Schaefer <[email protected]>
---
drivers/vfio/vfio.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 2fb29df..9c5c784 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -711,6 +711,8 @@ static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
 	return true;
 }

+static void vfio_group_try_dissolve_container(struct vfio_group *group);
+
 /*
  * Decrement the device reference count and wait for the device to be
  * removed. Open file descriptors for the device... */
@@ -785,6 +787,7 @@ void *vfio_del_group_dev(struct device *dev)
 		}
 	} while (ret <= 0);

+	vfio_group_try_dissolve_container(group);
 	vfio_group_put(group);

 	return device_data;
--
2.3.8

2015-07-22 16:54:38

by Alex Williamson

Subject: Re: [RFC PATCH 1/1] vfio-pci/iommu: Detach iommu group on remove path

On Tue, 2015-07-21 at 19:44 +0200, Gerald Schaefer wrote:
> When a user completes the VFIO_SET_IOMMU ioctl and the vfio-pci device is
> removed thereafter (before any other ioctl like VFIO_GROUP_GET_DEVICE_FD),
> then the detach_dev callback of the underlying IOMMU API is never called.
>
> This patch adds a call to vfio_group_try_dissolve_container() to the remove
> path, which will trigger the missing detach_dev callback in this scenario.
>
> Signed-off-by: Gerald Schaefer <[email protected]>
> ---
> drivers/vfio/vfio.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 2fb29df..9c5c784 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -711,6 +711,8 @@ static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
> return true;
> }
>
> +static void vfio_group_try_dissolve_container(struct vfio_group *group);
> +
> /*
> * Decrement the device reference count and wait for the device to be
> * removed. Open file descriptors for the device... */
> @@ -785,6 +787,7 @@ void *vfio_del_group_dev(struct device *dev)
> }
> } while (ret <= 0);
>
> + vfio_group_try_dissolve_container(group);
> vfio_group_put(group);
>
> return device_data;


This won't work: vfio_group_try_dissolve_container() decrements
container_users, but an unused device is not a container user. Imagine
we had more than one device in the iommu group: one device is removed and
the container is dissolved despite the user holding a reference and other
viable devices remaining. Additionally, from an isolation perspective,
an unbind from vfio-pci should not pull the device out of the iommu
domain. The device is part of the domain because it's not isolated, and
that continues even after unbind.

I think what you want to do is detach a device from the iommu domain
only when it's being removed from the iommu group, such as through
iommu_group_remove_device(). We already have a bit of an asymmetry
there, as iommu_group_add_device() will add devices to the currently
active iommu domain for the group, but iommu_group_remove_device() does
not appear to do the reverse. Thanks,

Alex
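
The reference-count objection above can be illustrated with a small
user-space toy model. This is only a sketch: struct toy_group and the toy_*
functions are hypothetical simplifications invented for illustration, not
the kernel's struct vfio_group or its helpers. It assumes that the user's
VFIO_GROUP_SET_CONTAINER holds the only container reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; not the kernel's struct vfio_group. */
struct toy_group {
	int container_users;	/* references held on the container */
	bool container_set;	/* true while the group has a container */
	int devices;		/* devices still present in the group */
};

/* Models the user's SET_CONTAINER call: takes the one reference. */
static void toy_set_container(struct toy_group *g)
{
	g->container_set = true;
	g->container_users = 1;
}

/* Models the dissolve helper: drop one reference, unset on zero. */
static void toy_try_dissolve(struct toy_group *g)
{
	if (g->container_users > 0 && --g->container_users == 0)
		g->container_set = false;
}

/* Remove path as proposed in the RFC: also drops a reference. */
static void toy_remove_device_rfc(struct toy_group *g)
{
	g->devices--;
	toy_try_dissolve(g);
}
```

With two devices in the group, removing one of them through
toy_remove_device_rfc() leaves the container unset although the user never
released it and one device remains.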

2015-07-22 17:11:01

by Alex Williamson

Subject: Re: [RFC PATCH 1/1] vfio-pci/iommu: Detach iommu group on remove path

On Wed, 2015-07-22 at 10:54 -0600, Alex Williamson wrote:
> On Tue, 2015-07-21 at 19:44 +0200, Gerald Schaefer wrote:
> > When a user completes the VFIO_SET_IOMMU ioctl and the vfio-pci device is
> > removed thereafter (before any other ioctl like VFIO_GROUP_GET_DEVICE_FD),
> > then the detach_dev callback of the underlying IOMMU API is never called.
> >
> > This patch adds a call to vfio_group_try_dissolve_container() to the remove
> > path, which will trigger the missing detach_dev callback in this scenario.
> >
> > Signed-off-by: Gerald Schaefer <[email protected]>
> > ---
> > drivers/vfio/vfio.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 2fb29df..9c5c784 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -711,6 +711,8 @@ static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
> > return true;
> > }
> >
> > +static void vfio_group_try_dissolve_container(struct vfio_group *group);
> > +
> > /*
> > * Decrement the device reference count and wait for the device to be
> > * removed. Open file descriptors for the device... */
> > @@ -785,6 +787,7 @@ void *vfio_del_group_dev(struct device *dev)
> > }
> > } while (ret <= 0);
> >
> > + vfio_group_try_dissolve_container(group);
> > vfio_group_put(group);
> >
> > return device_data;
>
>
> This won't work, vfio_group_try_dissolve_container() decrements
> container_users, which an unused device is not. Imagine if we had more
> than one device in the iommu group, one device is removed and the
> container is dissolved despite the user holding a reference and other
> viable devices remaining. Additionally, from an isolation perspective,
> an unbind from vfio-pci should not pull the device out of the iommu
> domain, it's part of the domain because it's not isolated and that
> continues even after unbind.
>
> I think what you want to do is detach a device from the iommu domain
> only when it's being removed from iommu group, such as through
> iommu_group_remove_device(). We already have a bit of an asymmetry
> there as iommu_group_add_device() will add devices to the currently
> active iommu domain for the group, but iommu_group_remove_device() does
> not appear to do the reverse. Thanks,

BTW, VT-d on x86 avoids a leak by using its own notifier_block:
drivers/iommu/intel-iommu.c:device_notifier() catches
BUS_NOTIFY_REMOVED_DEVICE and removes the device from the domain (the
domain_exit() there is only used for non-IOMMU-API domains). It's
possible that this is the only IOMMU driver that avoids a leak in the
scenario you describe. Thanks,

Alex
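
As a rough illustration of the notifier approach described above, the
following user-space sketch models a notifier chain in which a driver
callback reacts to a "removed device" event by clearing its own attach
state. All toy_* names are hypothetical and exist only for this example;
the real kernel notifier API and the VT-d driver differ in detail.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NDEVS 8

enum toy_event { TOY_ADD_DEVICE, TOY_REMOVED_DEVICE };

/* Hypothetical stand-in for a notifier block. */
struct toy_notifier {
	void (*call)(struct toy_notifier *nb, enum toy_event ev, int dev);
	struct toy_notifier *next;
};

static struct toy_notifier *toy_chain;	/* head of the notifier chain */
static int toy_attached[TOY_NDEVS];	/* per-device "attached" flag */

static void toy_register_notifier(struct toy_notifier *nb)
{
	nb->next = toy_chain;
	toy_chain = nb;
}

/* Deliver an event for device id dev to every registered notifier. */
static void toy_notify(enum toy_event ev, int dev)
{
	for (struct toy_notifier *nb = toy_chain; nb; nb = nb->next)
		nb->call(nb, ev, dev);
}

/* Driver callback: clean up only when the device has been removed. */
static void toy_iommu_notifier(struct toy_notifier *nb,
			       enum toy_event ev, int dev)
{
	(void)nb;
	if (ev == TOY_REMOVED_DEVICE && dev >= 0 && dev < TOY_NDEVS)
		toy_attached[dev] = 0;
}
```

The point of the pattern is that cleanup is tied to device removal itself
and does not depend on which ioctls the user happened to issue before.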

2015-07-23 13:03:24

by Gerald Schaefer

Subject: Re: [RFC PATCH 1/1] vfio-pci/iommu: Detach iommu group on remove path

On Wed, 22 Jul 2015 10:54:35 -0600
Alex Williamson <[email protected]> wrote:

> On Tue, 2015-07-21 at 19:44 +0200, Gerald Schaefer wrote:
> > When a user completes the VFIO_SET_IOMMU ioctl and the vfio-pci
> > device is removed thereafter (before any other ioctl like
> > VFIO_GROUP_GET_DEVICE_FD), then the detach_dev callback of the
> > underlying IOMMU API is never called.
> >
> > This patch adds a call to vfio_group_try_dissolve_container() to
> > the remove path, which will trigger the missing detach_dev callback
> > in this scenario.
> >
> > Signed-off-by: Gerald Schaefer <[email protected]>
> > ---
> > drivers/vfio/vfio.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 2fb29df..9c5c784 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -711,6 +711,8 @@ static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
> > return true;
> > }
> >
> > +static void vfio_group_try_dissolve_container(struct vfio_group *group);
> > +
> > /*
> > * Decrement the device reference count and wait for the device to be
> > * removed. Open file descriptors for the device... */
> > @@ -785,6 +787,7 @@ void *vfio_del_group_dev(struct device *dev)
> > }
> > } while (ret <= 0);
> >
> > + vfio_group_try_dissolve_container(group);
> > vfio_group_put(group);
> >
> > return device_data;
>
>
> This won't work, vfio_group_try_dissolve_container() decrements
> container_users, which an unused device is not. Imagine if we had
> more than one device in the iommu group, one device is removed and the
> container is dissolved despite the user holding a reference and other
> viable devices remaining. Additionally, from an isolation
> perspective, an unbind from vfio-pci should not pull the device out
> of the iommu domain, it's part of the domain because it's not
> isolated and that continues even after unbind.
>
> I think what you want to do is detach a device from the iommu domain
> only when it's being removed from iommu group, such as through
> iommu_group_remove_device(). We already have a bit of an asymmetry
> there as iommu_group_add_device() will add devices to the currently
> active iommu domain for the group, but iommu_group_remove_device()
> does not appear to do the reverse. Thanks,

Interesting, I hadn't noticed this asymmetry before. Do you mean
something like this?

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f286090..82ac8b3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -447,6 +447,9 @@ rename:
 }
 EXPORT_SYMBOL_GPL(iommu_group_add_device);

+static void __iommu_detach_device(struct iommu_domain *domain,
+				  struct device *dev);
+
 /**
  * iommu_group_remove_device - remove a device from it's current group
  * @dev: device to be removed
@@ -466,6 +469,8 @@ void iommu_group_remove_device(struct device *dev)
 				     IOMMU_GROUP_NOTIFY_DEL_DEVICE,
 				     dev);
 	mutex_lock(&group->mutex);
+	if (group->domain)
+		__iommu_detach_device(group->domain, dev);
 	list_for_each_entry(tmp_device, &group->devices, list) {
 		if (tmp_device->dev == dev) {
 			device = tmp_device;

This would also fix the issue in my scenario but, as before, that
doesn't necessarily mean it is the correct fix. Adding the iommu list
and maintainer to cc.

Joerg, what do you think? (see https://lkml.org/lkml/2015/7/21/635 for
the problem description)
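
The add/remove symmetry discussed in this thread can be sketched as a
user-space toy model. The toy_* types and functions below are hypothetical
stand-ins, not the kernel's iommu_group or iommu_domain; the sketch only
shows that, when removal undoes the attach, the domain's attach count
returns to zero.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEVS 4

/* Hypothetical stand-in for an iommu domain. */
struct toy_domain {
	int attached;	/* devices currently attached to the domain */
};

/* Hypothetical stand-in for an iommu group. */
struct toy_iommu_group {
	struct toy_domain *domain;	/* active domain, or NULL */
	int devs[TOY_MAX_DEVS];		/* device ids in the group */
	int ndevs;
};

/* Attach every device of the group to the given domain. */
static void toy_attach_group(struct toy_iommu_group *g, struct toy_domain *d)
{
	g->domain = d;
	d->attached += g->ndevs;
}

/* Add a device; it joins the active domain if there is one. */
static int toy_group_add_device(struct toy_iommu_group *g, int dev)
{
	if (g->ndevs == TOY_MAX_DEVS)
		return -1;
	g->devs[g->ndevs++] = dev;
	if (g->domain)
		g->domain->attached++;
	return 0;
}

/* Remove a device; the symmetric step detaches it from the domain. */
static int toy_group_remove_device(struct toy_iommu_group *g, int dev)
{
	for (int i = 0; i < g->ndevs; i++) {
		if (g->devs[i] != dev)
			continue;
		if (g->domain)
			g->domain->attached--;
		g->devs[i] = g->devs[--g->ndevs];
		return 0;
	}
	return -1;	/* device not in this group */
}
```

Without the detach step in toy_group_remove_device(), the attach count
would stay at one after the only device is gone, which mirrors the missing
detach_dev call in the scenario described.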

2015-07-23 13:28:14

by Gerald Schaefer

Subject: Re: [RFC PATCH 1/1] vfio-pci/iommu: Detach iommu group on remove path

On Wed, 22 Jul 2015 11:10:57 -0600
Alex Williamson <[email protected]> wrote:

> On Wed, 2015-07-22 at 10:54 -0600, Alex Williamson wrote:
> > On Tue, 2015-07-21 at 19:44 +0200, Gerald Schaefer wrote:
> > > When a user completes the VFIO_SET_IOMMU ioctl and the vfio-pci
> > > device is removed thereafter (before any other ioctl like
> > > VFIO_GROUP_GET_DEVICE_FD), then the detach_dev callback of the
> > > underlying IOMMU API is never called.
> > >
> > > This patch adds a call to vfio_group_try_dissolve_container() to
> > > the remove path, which will trigger the missing detach_dev
> > > callback in this scenario.
> > >
> > > Signed-off-by: Gerald Schaefer <[email protected]>
> > > ---
> > > drivers/vfio/vfio.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > > index 2fb29df..9c5c784 100644
> > > --- a/drivers/vfio/vfio.c
> > > +++ b/drivers/vfio/vfio.c
> > > @@ -711,6 +711,8 @@ static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
> > > return true;
> > > }
> > >
> > > +static void vfio_group_try_dissolve_container(struct vfio_group *group);
> > > +
> > > /*
> > > * Decrement the device reference count and wait for the device to be
> > > * removed. Open file descriptors for the device... */
> > > @@ -785,6 +787,7 @@ void *vfio_del_group_dev(struct device *dev)
> > > }
> > > } while (ret <= 0);
> > >
> > > + vfio_group_try_dissolve_container(group);
> > > vfio_group_put(group);
> > >
> > > return device_data;
> >
> >
> > This won't work, vfio_group_try_dissolve_container() decrements
> > container_users, which an unused device is not. Imagine if we had
> > more than one device in the iommu group, one device is removed and
> > the container is dissolved despite the user holding a reference and
> > other viable devices remaining. Additionally, from an isolation
> > perspective, an unbind from vfio-pci should not pull the device out
> > of the iommu domain, it's part of the domain because it's not
> > isolated and that continues even after unbind.
> >
> > I think what you want to do is detach a device from the iommu domain
> > only when it's being removed from iommu group, such as through
> > iommu_group_remove_device(). We already have a bit of an asymmetry
> > there as iommu_group_add_device() will add devices to the currently
> > active iommu domain for the group, but iommu_group_remove_device()
> > does not appear to do the reverse. Thanks,
>
> BTW, VT-d on x86 avoids a leak using its own notifier_block,
> drivers/iommu/intel-iommu.c:device_notifier() catches
> BUS_NOTIFY_REMOVED_DEVICE and removes the device from the domain (the
> domain_exit() there is only used for non-IOMMU-API domains). It's
> possible that's the only IOMMU driver that avoids a leak due to the
> scenario you describe. Thanks,

Thanks, that's good to know. As a last resort I could also use the
notifier to work around the issue, but x86 seems to be the only arch
using this notifier so far, so a general fix would be nice.