2015-07-28 17:56:18

by Gerald Schaefer

[permalink] [raw]
Subject: [RFC PATCH 0/1] iommu: Detach device from domain when removed from group

Hi,

during IOMMU API function testing on s390 I hit the following scenario:

After binding a device to vfio-pci, the user completes the VFIO_SET_IOMMU
ioctl and stops, see the sample C program below. Now the device is manually
removed via "echo 1 > /sys/bus/pci/devices/.../remove".

Although the SET_IOMMU ioctl triggered the attach_dev callback in the
underlying IOMMU API, removing the device in this way won't trigger the
detach_dev callback, neither during remove nor when the user program
continues with closing group/container.

On s390, this eventually leads to a kernel panic when binding the device
again to its non-vfio PCI driver, because of the missing arch-specific
cleanup in detach_dev. On x86, the detach_dev callback will also not be
called directly, but there is a notifier that will catch
BUS_NOTIFY_REMOVED_DEVICE and eventually do the cleanup. Other
architectures w/o the notifier probably have at least some kind of memory
leak in this scenario, so a general fix would be nice.

My first approach was to try and fix this in VFIO code, but Alex Williamson
pointed me to some asymmetry in the IOMMU code: iommu_group_add_device()
will invoke the attach_dev callback, but iommu_group_remove_device() won't
trigger detach_dev. Fixing this asymmetry would fix the issue for me, but
is this the correct fix? Any thoughts?

Regards,
Gerald


Here is the sample C program to trigger the ioctl:

#include <stdio.h>
#include <fcntl.h>
#include <linux/vfio.h>

int main(void)
{
int container, group, rc;

container = open("/dev/vfio/vfio", O_RDWR);
if (container < 0) {
perror("open /dev/vfio/vfio\n");
return -1;
}

group = open("/dev/vfio/0", O_RDWR);
if (group < 0) {
perror("open /dev/vfio/0\n");
return -1;
}

rc = ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
if (rc) {
perror("ioctl VFIO_GROUP_SET_CONTAINER\n");
return -1;
}

rc = ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
if (rc) {
perror("ioctl VFIO_SET_IOMMU\n");
return -1;
}

printf("Try device remove...\n");
getchar();

close(group);
close(container);
return 0;
}

Gerald Schaefer (1):
iommu: Detach device from domain when removed from group

drivers/iommu/iommu.c | 5 +++++
1 file changed, 5 insertions(+)

--
2.3.8


2015-07-28 17:56:20

by Gerald Schaefer

[permalink] [raw]
Subject: [RFC PATCH 1/1] iommu: Detach device from domain when removed from group

This patch adds a call to __iommu_detach_device() to the
iommu_group_remove_device() function, which will trigger a missing
detach_dev callback in (at least) the following scenario:

When a user completes the VFIO_SET_IOMMU ioctl for a vfio-pci device,
and the corresponding device is removed thereafter (before any other
ioctl like VFIO_GROUP_GET_DEVICE_FD), then the detach_dev callback of
the underlying IOMMU API is never called.

This also fixes an asymmetry with iommu_group_add_device() and
iommu_group_remove_device(), where the former did an "attach_dev" but
the latter did no "detach_dev".

Signed-off-by: Gerald Schaefer <[email protected]>
---
drivers/iommu/iommu.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f286090..82ac8b3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -447,6 +447,9 @@ rename:
}
EXPORT_SYMBOL_GPL(iommu_group_add_device);

+static void __iommu_detach_device(struct iommu_domain *domain,
+ struct device *dev);
+
/**
* iommu_group_remove_device - remove a device from it's current group
* @dev: device to be removed
@@ -466,6 +469,8 @@ void iommu_group_remove_device(struct device *dev)
IOMMU_GROUP_NOTIFY_DEL_DEVICE, dev);

mutex_lock(&group->mutex);
+ if (group->domain)
+ __iommu_detach_device(group->domain, dev);
list_for_each_entry(tmp_device, &group->devices, list) {
if (tmp_device->dev == dev) {
device = tmp_device;
--
2.3.8

2015-08-03 12:25:48

by Gerald Schaefer

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] iommu: Detach device from domain when removed from group

On Tue, 28 Jul 2015 19:55:55 +0200
Gerald Schaefer <[email protected]> wrote:

> Hi,
>
> during IOMMU API function testing on s390 I hit the following scenario:
>
> After binding a device to vfio-pci, the user completes the VFIO_SET_IOMMU
> ioctl and stops, see the sample C program below. Now the device is manually
> removed via "echo 1 > /sys/bus/pci/devices/.../remove".
>
> Although the SET_IOMMU ioctl triggered the attach_dev callback in the
> underlying IOMMU API, removing the device in this way won't trigger the
> detach_dev callback, neither during remove nor when the user program
> continues with closing group/container.
>
> On s390, this eventually leads to a kernel panic when binding the device
> again to its non-vfio PCI driver, because of the missing arch-specific
> cleanup in detach_dev. On x86, the detach_dev callback will also not be
> called directly, but there is a notifier that will catch
> BUS_NOTIFY_REMOVED_DEVICE and eventually do the cleanup. Other
> architectures w/o the notifier probably have at least some kind of memory
> leak in this scenario, so a general fix would be nice.
>
> My first approach was to try and fix this in VFIO code, but Alex Williamson
> pointed me to some asymmetry in the IOMMU code: iommu_group_add_device()
> will invoke the attach_dev callback, but iommu_group_remove_device() won't
> trigger detach_dev. Fixing this asymmetry would fix the issue for me, but
> is this the correct fix? Any thoughts?

Ping.

The suggested fix may be completely wrong, but not having detach_dev called
seems like like a serious issue, any feedback would be greatly appreciated.


>
> Regards,
> Gerald
>
>
> Here is the sample C program to trigger the ioctl:
>
> #include <stdio.h>
> #include <fcntl.h>
> #include <linux/vfio.h>
>
> int main(void)
> {
> int container, group, rc;
>
> container = open("/dev/vfio/vfio", O_RDWR);
> if (container < 0) {
> perror("open /dev/vfio/vfio\n");
> return -1;
> }
>
> group = open("/dev/vfio/0", O_RDWR);
> if (group < 0) {
> perror("open /dev/vfio/0\n");
> return -1;
> }
>
> rc = ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
> if (rc) {
> perror("ioctl VFIO_GROUP_SET_CONTAINER\n");
> return -1;
> }
>
> rc = ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
> if (rc) {
> perror("ioctl VFIO_SET_IOMMU\n");
> return -1;
> }
>
> printf("Try device remove...\n");
> getchar();
>
> close(group);
> close(container);
> return 0;
> }
>
> Gerald Schaefer (1):
> iommu: Detach device from domain when removed from group
>
> drivers/iommu/iommu.c | 5 +++++
> 1 file changed, 5 insertions(+)
>

2015-08-03 15:49:00

by Joerg Roedel

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] iommu: Detach device from domain when removed from group

On Tue, Jul 28, 2015 at 07:55:55PM +0200, Gerald Schaefer wrote:
> On s390, this eventually leads to a kernel panic when binding the device
> again to its non-vfio PCI driver, because of the missing arch-specific
> cleanup in detach_dev. On x86, the detach_dev callback will also not be
> called directly, but there is a notifier that will catch
> BUS_NOTIFY_REMOVED_DEVICE and eventually do the cleanup. Other
> architectures w/o the notifier probably have at least some kind of memory
> leak in this scenario, so a general fix would be nice.

This notifier is not arch-specific, but registered against the bus the
iommu-ops are set for. Why does it not run on s390?


Joerg

2015-08-03 17:04:34

by Gerald Schaefer

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] iommu: Detach device from domain when removed from group

On Mon, 3 Aug 2015 17:48:55 +0200
Joerg Roedel <[email protected]> wrote:

> On Tue, Jul 28, 2015 at 07:55:55PM +0200, Gerald Schaefer wrote:
> > On s390, this eventually leads to a kernel panic when binding the device
> > again to its non-vfio PCI driver, because of the missing arch-specific
> > cleanup in detach_dev. On x86, the detach_dev callback will also not be
> > called directly, but there is a notifier that will catch
> > BUS_NOTIFY_REMOVED_DEVICE and eventually do the cleanup. Other
> > architectures w/o the notifier probably have at least some kind of memory
> > leak in this scenario, so a general fix would be nice.
>
> This notifier is not arch-specific, but registered against the bus the
> iommu-ops are set for. Why does it not run on s390?

Adding the notifier would of course also work on s390 (and all other affected
architectures). However, it seems that the "missing detach_dev" issue in this
scenario is not fundamentally fixed by using this notifier, it just seems to
hide the symptom by chance.

Adding the otherwise unneeded notifier just to work around this issue somehow
doesn't seem right, also given that x86 is so far the only user of it. At
least I thought it would be cleaner to fix it in common code and for all
architectures. Not sure what's wrong with fixing the asymmetry as suggested
in my patch, but I guess there are good reasons for having this asymmetry.

For now, I'll just add the notifier to my s390 implementation and post it
soon.

>
>
> Joerg
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>