2018-02-15 11:05:18

by Linu Cherian

Subject: Handling active DMA during a VFIO application crash

Hi,

I was exploring the implications of an application crash while DMA
is active from a VFIO PCI device, with the DMA configured and
started by the application using the VFIO APIs.

The expectation is that DMA is stopped/reset before we tear down the
IOMMU mappings and finally free the mmapped pages (on which DMA is
happening).

The stack trace below (with dump_stack() in vfio_pci_release) shows the
release path:
[ 201.564273] [<ffffff8008798b50>] vfio_pci_release+0x80/0x458
[ 201.564276] [<ffffff8008792b74>] vfio_device_fops_release+0x2c/0x50
[ 201.564279] [<ffffff8008269ef4>] __fput+0x9c/0x218
[ 201.564283] [<ffffff800826a0e8>] ____fput+0x20/0x30
[ 201.564286] [<ffffff80080e7fe0>] task_work_run+0xa0/0xc8
[ 201.564289] [<ffffff80080cbc7c>] do_exit+0x2bc/0x9c8
[ 201.564293] [<ffffff80080cd0ec>] do_group_exit+0x3c/0xa8
[ 201.564296] [<ffffff80080d94c4>] get_signal+0x3e4/0x538
[ 201.564299] [<ffffff80080892f0>] do_signal+0x70/0x660
[ 201.564302] [<ffffff8008089ce8>] do_notify_resume+0xe0/0x120


The PCI device is disabled/reset from vfio_pci_release, which is invoked
as part of the device fd release. The fd releases are in turn invoked
from exit_files and exit_task_work.

But exit_mm gets called before exit_files/exit_task_work in do_exit.

Assuming all pages allocated/mmapped to a process get freed in exit_mm,
is there a possibility that user pages configured for DMA can get freed
to the kernel before the VFIO device is stopped/reset?

Thanks.

--
Linu Cherian
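
For readers unfamiliar with how an application configures DMA through the VFIO APIs, here is a minimal sketch of the mapping request, assuming the type1 IOMMU backend. The struct, flags, and ioctl come from <linux/vfio.h>; the helper names make_dma_map and map_for_dma are hypothetical, and container_fd is assumed to be an already configured /dev/vfio/vfio descriptor. This is an illustration, not code from the application discussed in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical helper: build the request that asks the type1 IOMMU
 * backend to map a user buffer at a device-visible address (IOVA). */
struct vfio_iommu_type1_dma_map
make_dma_map(void *buf, uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map;

    assert(buf != NULL && size > 0);
    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = (uint64_t)(uintptr_t)buf;   /* process virtual address */
    map.iova  = iova;                       /* address the device will use */
    map.size  = size;
    return map;
}

/* Hypothetical helper: container_fd is assumed to be an open
 * /dev/vfio/vfio descriptor with a group attached and VFIO_SET_IOMMU
 * already done. Returns 0 on success, -1 with errno set on failure. */
int map_for_dma(int container_fd, void *buf, uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map = make_dma_map(buf, iova, size);

    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
```

The mapping belongs to the container descriptor rather than to the device descriptor, which matters for the teardown order asked about above.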


2018-02-15 16:22:40

by Alex Williamson

Subject: Re: Handling active DMA during a VFIO application crash

On Thu, 15 Feb 2018 16:34:06 +0530
Linu Cherian wrote:

> Hi,
>
> I was exploring the implications of an application crash while DMA
> is active from a VFIO PCI device, with the DMA configured and
> started by the application using the VFIO APIs.
>
> The expectation is that DMA is stopped/reset before we tear down the
> IOMMU mappings and finally free the mmapped pages (on which DMA is
> happening).
>
> The stack trace below (with dump_stack() in vfio_pci_release) shows the
> release path:
> [ 201.564273] [<ffffff8008798b50>] vfio_pci_release+0x80/0x458
> [ 201.564276] [<ffffff8008792b74>] vfio_device_fops_release+0x2c/0x50
> [ 201.564279] [<ffffff8008269ef4>] __fput+0x9c/0x218
> [ 201.564283] [<ffffff800826a0e8>] ____fput+0x20/0x30
> [ 201.564286] [<ffffff80080e7fe0>] task_work_run+0xa0/0xc8
> [ 201.564289] [<ffffff80080cbc7c>] do_exit+0x2bc/0x9c8
> [ 201.564293] [<ffffff80080cd0ec>] do_group_exit+0x3c/0xa8
> [ 201.564296] [<ffffff80080d94c4>] get_signal+0x3e4/0x538
> [ 201.564299] [<ffffff80080892f0>] do_signal+0x70/0x660
> [ 201.564302] [<ffffff8008089ce8>] do_notify_resume+0xe0/0x120
>
>
> The PCI device is disabled/reset from vfio_pci_release, which is invoked
> as part of the device fd release. The fd releases are in turn invoked
> from exit_files and exit_task_work.
>
> But exit_mm gets called before exit_files/exit_task_work in do_exit.
>
> Assuming all pages allocated/mmapped to a process get freed in exit_mm,
> is there a possibility that user pages configured for DMA can get freed
> to the kernel before the VFIO device is stopped/reset?

Pages mapped through the IOMMU are still pinned, so they have an
elevated reference count and I believe therefore cannot "get freed to
kernel". Nothing should therefore be able to allocate those pages
until the container is released, which happens even after the device is
released. Thanks,

Alex
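
The reference-count argument above can be illustrated with a small user-space model. This is only a sketch: struct toy_page and the toy_* functions are hypothetical stand-ins, not kernel code, and the model assumes exactly one reference held by the process mapping and one held by the IOMMU pin. It walks through the teardown order from the question (exit_mm, then device release, then container release) and checks that the page is still allocated when the device is reset.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a physical page; not the kernel's struct page. */
struct toy_page {
    int  refcount;   /* number of holders keeping the page alive */
    bool freed;      /* set once the last reference is dropped */
};

static void toy_get_page(struct toy_page *p)
{
    p->refcount++;
}

static void toy_put_page(struct toy_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        p->freed = true;   /* only now could the page be reallocated */
}

/* Runs the teardown order discussed in the thread and reports whether
 * the page was still allocated at the moment the device was reset. */
bool toy_page_alive_at_device_reset(void)
{
    struct toy_page page = { .refcount = 0, .freed = false };
    bool alive_at_reset;

    toy_get_page(&page);           /* process maps the buffer */
    toy_get_page(&page);           /* DMA mapping pins it for the IOMMU */

    toy_put_page(&page);           /* exit_mm: process mapping goes away */
    alive_at_reset = !page.freed;  /* device fd release: device is reset */
    toy_put_page(&page);           /* container release: unmap and unpin */

    assert(page.freed);            /* freed only after the final unpin */
    return alive_at_reset;
}
```

In this model toy_page_alive_at_device_reset() returns true: the pin taken at map time outlives exit_mm, so the page cannot be handed to another user while the device may still be doing DMA.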

2018-02-16 18:44:51

by Linu Cherian

Subject: Re: Handling active DMA during a VFIO application crash

Hi Alex,

On Thu Feb 15, 2018 at 09:21:09AM -0700, Alex Williamson wrote:
>
> Pages mapped through the IOMMU are still pinned, so they have an
> elevated reference count and I believe therefore cannot "get freed to
> kernel". Nothing should therefore be able to allocate those pages
> until the container is released, which happens even after the device is
> released. Thanks,
>
> Alex


Thanks for the clarification. I will dig through the code on this.

--
Linu Cherian