2015-04-25 12:16:09

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 00/32] powerpc/iommu/vfio: Enable Dynamic DMA windows


This enables the sPAPR-defined feature called Dynamic DMA windows (DDW).

Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus
where devices are allowed to do DMA. These ranges are called DMA windows.
By default, there is a single DMA window, 1GB or 2GB in size, mapped at zero
on a PCI bus.

High-speed devices may suffer from the limited size of this window.
Recent host kernels use a TCE bypass window on the POWER8 CPU which implements
a direct PCI bus address range mapping (at an offset of 1<<59) to the host memory.

For guests, PAPR defines a DDW RTAS API which allows pseries guests
to query the hypervisor about DDW support and capabilities (page size mask
for now). A pseries guest may request an additional (to the default)
DMA window using this RTAS API.
The existing pseries Linux guests request an additional window as big as
the guest RAM and map the entire guest RAM into it, which effectively creates
a direct mapping of the guest memory to a PCI bus.

The multiple DMA windows feature is supported by POWER7/POWER8 CPUs; however
this patchset only adds support for POWER8 as TCE tables are implemented
quite differently on POWER7 and POWER7 is not the highest priority.

This patchset reworks PPC64 IOMMU code and adds necessary structures
to support big windows.

Once a Linux guest discovers the presence of DDW, it does:
1. query the hypervisor about the number of available windows and page size masks;
2. create a window with the biggest possible page size (today 4K/64K/16M);
3. map the entire guest RAM via H_PUT_TCE* hypercalls;
4. switch dma_ops to direct_dma_ops on the selected PE.

Once this is done, H_PUT_TCE is not called anymore for 64-bit devices and
the guest does not waste time on DMA map/unmap operations.
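
As a rough guest-side illustration of steps 1-4 above (the host-side support
is what this patchset adds), the flow looks roughly like below. The helpers
and types here are made up for illustration only; the real code lives in
enable_ddw() in arch/powerpc/platforms/pseries/iommu.c:

/* Sketch only: ddw_query/ddw_create/ddw_map_all_ram/biggest_page_shift
 * and the ddw_* types are hypothetical names. */
static int ddw_enable_sketch(struct pci_dev *pdev, unsigned long ram_size)
{
	struct ddw_caps caps;
	struct ddw_window win;

	/* 1. "ibm,query-pe-dma-window": number of windows, page size mask */
	if (ddw_query(pdev, &caps))
		return -ENODEV;

	/* 2. "ibm,create-pe-dma-window" with the largest supported page size */
	if (ddw_create(pdev, biggest_page_shift(caps.page_size_mask),
			order_base_2(ram_size), &win))
		return -ENOSPC;

	/* 3. Map all of guest RAM via H_PUT_TCE/H_PUT_TCE_INDIRECT */
	ddw_map_all_ram(&win, ram_size);

	/* 4. 64-bit DMA now bypasses per-mapping hypercalls entirely */
	set_dma_ops(&pdev->dev, &dma_direct_ops);

	return 0;
}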

Note that 32-bit devices won't use DDW and will keep using the default
DMA window, so KVM optimizations will be required (to be posted later).

This is pushed to [email protected]:aik/linux.git
+ d9b711d...4d0247b 4d0247b -> vfio-for-github (forced update)

Changes:
v9:
* rebased on top of SRIOV (which is in upstream now)
* fixed multiple comments from David
* reworked ownership patches
* removed vfio: powerpc/spapr: Do cleanup when releasing the group (used to be #2)
as updated #1 should do this
* moved "powerpc/powernv: Implement accessor to TCE entry" to a separate patch
* added a patch which moves TCE Kill register address to PE from IOMMU table

v8:
* fixed a bug in error fallback in "powerpc/mmu: Add userspace-to-physical
addresses translation cache"
* fixed subject in "vfio: powerpc/spapr: Check that IOMMU page is fully
contained by system page"
* moved v2 documentation to the correct patch
* added checks for failed vzalloc() in "powerpc/iommu: Add userspace view
of TCE table"

v7:
* moved memory preregistration to the current process's MMU context
* added code preventing unregistration if some pages are still mapped;
for this, a userspace view of the table is stored in iommu_table
* added locked_vm counting for DDW tables (including userspace view of those)

v6:
* fixed a bunch of errors in "vfio: powerpc/spapr: Support Dynamic DMA windows"
* moved static IOMMU properties from iommu_table_group to iommu_table_group_ops

v5:
* added SPAPR_TCE_IOMMU_v2 to tell the userspace that there is a memory
pre-registration feature
* added backward compatibility
* renamed few things (mostly powerpc_iommu -> iommu_table_group)

v4:
* moved patches around to have VFIO and PPC patches separated as much as
possible
* now works with the existing upstream QEMU

v3:
* redesigned the whole thing
* multiple IOMMU groups per PHB -> one PHB is needed for VFIO in the guest ->
no problems with locked_vm counting; also we save memory on actual tables
* guest RAM preregistration is required for DDW
* PEs (IOMMU groups) are passed to VFIO with no DMA windows at all so
we do not bother with iommu_table::it_map anymore
* added multilevel TCE tables support to support really huge guests

v2:
* added missing __pa() in "powerpc/powernv: Release replaced TCE"
* reposted to make some noise




Alexey Kardashevskiy (32):
powerpc/iommu: Split iommu_free_table into 2 helpers
Revert "powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table
dynamically"
vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU
driver
vfio: powerpc/spapr: Check that IOMMU page is fully contained by
system page
vfio: powerpc/spapr: Use it_page_size
vfio: powerpc/spapr: Move locked_vm accounting to helpers
vfio: powerpc/spapr: Disable DMA mappings on disabled container
vfio: powerpc/spapr: Moving pinning/unpinning to helpers
vfio: powerpc/spapr: Rework groups attaching
powerpc/powernv: Do not set "read" flag if direction==DMA_NONE
powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table
powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group
vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership
control
powerpc/iommu: Fix IOMMU ownership control functions
powerpc/powernv/ioda/ioda2: Rework TCE invalidation in
tce_build()/tce_free()
powerpc/powernv/ioda: Move TCE kill register address to PE
powerpc/powernv: Implement accessor to TCE entry
powerpc/iommu/powernv: Release replaced TCE
powerpc/powernv/ioda2: Rework iommu_table creation
powerpc/powernv/ioda2: Introduce
pnv_pci_create_table/pnv_pci_free_table
powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window
powerpc/powernv: Implement multilevel TCE tables
powerpc/powernv/ioda: Define and implement DMA table/window management
callbacks
powerpc/powernv/ioda2: Use new helpers to do proper cleanup on PE
release
vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework ownership
powerpc/iommu: Add userspace view of TCE table
powerpc/iommu/ioda2: Add get_table_size() to calculate the size of
future table
powerpc/mmu: Add userspace-to-physical addresses translation cache
vfio: powerpc/spapr: Register memory and define IOMMU v2
vfio: powerpc/spapr: Use 32bit DMA window properties from table_group
vfio: powerpc/spapr: Support multiple groups in one container if
possible
vfio: powerpc/spapr: Support Dynamic DMA windows

Documentation/vfio.txt | 50 +-
arch/powerpc/include/asm/iommu.h | 111 ++-
arch/powerpc/include/asm/machdep.h | 25 -
arch/powerpc/include/asm/mmu-hash64.h | 3 +
arch/powerpc/include/asm/mmu_context.h | 17 +
arch/powerpc/include/asm/pci-bridge.h | 2 +-
arch/powerpc/kernel/eeh.c | 2 +-
arch/powerpc/kernel/iommu.c | 303 ++++----
arch/powerpc/kernel/vio.c | 5 +
arch/powerpc/mm/Makefile | 1 +
arch/powerpc/mm/mmu_context_hash64.c | 6 +
arch/powerpc/mm/mmu_context_hash64_iommu.c | 215 ++++++
arch/powerpc/platforms/cell/iommu.c | 8 +-
arch/powerpc/platforms/pasemi/iommu.c | 7 +-
arch/powerpc/platforms/powernv/pci-ioda.c | 520 ++++++++++----
arch/powerpc/platforms/powernv/pci-p5ioc2.c | 33 +-
arch/powerpc/platforms/powernv/pci.c | 275 +++++--
arch/powerpc/platforms/powernv/pci.h | 20 +-
arch/powerpc/platforms/pseries/iommu.c | 138 ++--
arch/powerpc/sysdev/dart_iommu.c | 12 +-
drivers/vfio/vfio_iommu_spapr_tce.c | 1034 ++++++++++++++++++++++++---
include/uapi/linux/vfio.h | 88 ++-
22 files changed, 2304 insertions(+), 571 deletions(-)
create mode 100644 arch/powerpc/mm/mmu_context_hash64_iommu.c

--
2.0.0


2015-04-25 12:16:06

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 01/32] powerpc/iommu: Split iommu_free_table into 2 helpers

The iommu_free_table helper releases the memory it uses (the TCE table and
@it_map) and releases the iommu_table struct as well. We might not want
the very last step as we sometimes store iommu_table in a parent structure.

This splits the helper: the new iommu_reset_table() releases the table's
resources and clears the struct, while iommu_free_table() additionally
frees the struct itself.
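
Roughly, the intended usage after the split (a sketch, not code from this
patch; pe and tbl stand for any embedded or standalone table):

	/* table embedded in a parent structure: release contents only */
	iommu_reset_table(&pe->tce32_table, "ioda2");

	/* standalone, kzalloc'ed table: release contents and the struct */
	iommu_free_table(tbl, "dart");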

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/include/asm/iommu.h | 1 +
arch/powerpc/kernel/iommu.c | 58 +++++++++++++++++++++++-----------------
2 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 1e27d63..e2cef38 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -105,6 +105,7 @@ static inline void *get_iommu_table_base(struct device *dev)
}

/* Frees table for an individual device node */
+extern void iommu_reset_table(struct iommu_table *tbl, const char *node_name);
extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);

/* Initializes an iommu_table based in values set in the passed-in
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index b054f33..5c154e1 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -708,23 +708,44 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
return tbl;
}

+void iommu_reset_table(struct iommu_table *tbl, const char *node_name)
+{
+ if (!tbl)
+ return;
+
+ if (tbl->it_map) {
+ unsigned long bitmap_sz;
+ unsigned int order;
+
+ /*
+ * In case we have reserved the first bit, we should not emit
+ * the warning below.
+ */
+ if (tbl->it_offset == 0)
+ clear_bit(0, tbl->it_map);
+
+ /* verify that table contains no entries */
+ if (!bitmap_empty(tbl->it_map, tbl->it_size))
+ pr_warn("%s: Unexpected TCEs for %s\n", __func__,
+ node_name);
+
+ /* calculate bitmap size in bytes */
+ bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
+
+ /* free bitmap */
+ order = get_order(bitmap_sz);
+ free_pages((unsigned long) tbl->it_map, order);
+ }
+
+ memset(tbl, 0, sizeof(*tbl));
+}
+
void iommu_free_table(struct iommu_table *tbl, const char *node_name)
{
- unsigned long bitmap_sz;
- unsigned int order;
-
- if (!tbl || !tbl->it_map) {
- printk(KERN_ERR "%s: expected TCE map for %s\n", __func__,
- node_name);
+ if (!tbl)
return;
- }

- /*
- * In case we have reserved the first bit, we should not emit
- * the warning below.
- */
- if (tbl->it_offset == 0)
- clear_bit(0, tbl->it_map);
+ iommu_reset_table(tbl, node_name);

#ifdef CONFIG_IOMMU_API
if (tbl->it_group) {
@@ -733,17 +754,6 @@ void iommu_free_table(struct iommu_table *tbl, const char *node_name)
}
#endif

- /* verify that table contains no entries */
- if (!bitmap_empty(tbl->it_map, tbl->it_size))
- pr_warn("%s: Unexpected TCEs for %s\n", __func__, node_name);
-
- /* calculate bitmap size in bytes */
- bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
-
- /* free bitmap */
- order = get_order(bitmap_sz);
- free_pages((unsigned long) tbl->it_map, order);
-
/* free table */
kfree(tbl);
}
--
2.0.0

2015-04-25 12:23:45

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 02/32] Revert "powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically"

This reverts commit 9e8d4a19ab66ec9e132d405357b9108a4f26efd3 as
tce32_table has exactly the same lifetime as the whole PE.

This makes use of a new iommu_reset_table() helper instead.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/include/asm/iommu.h | 3 ---
arch/powerpc/platforms/powernv/pci-ioda.c | 35 +++++++++++++------------------
arch/powerpc/platforms/powernv/pci.h | 2 +-
3 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index e2cef38..9d320e0 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -79,9 +79,6 @@ struct iommu_table {
struct iommu_group *it_group;
#endif
void (*set_bypass)(struct iommu_table *tbl, bool enable);
-#ifdef CONFIG_PPC_POWERNV
- void *data;
-#endif
};

/* Pure 2^n version of get_order */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 920c252..eff26ed 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1086,10 +1086,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
return;
}

- pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
- GFP_KERNEL, hose->node);
- pe->tce32_table->data = pe;
-
/* Associate it with all child devices */
pnv_ioda_setup_same_PE(bus, pe);

@@ -1295,7 +1291,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
bus = dev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
- tbl = pe->tce32_table;
+ tbl = &pe->tce32_table;
addr = tbl->it_base;

opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
@@ -1310,9 +1306,8 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
if (rc)
pe_warn(pe, "OPAL error %ld release DMA window\n", rc);

- iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
+ iommu_reset_table(tbl, of_node_full_name(dev->dev.of_node));
free_pages(addr, get_order(TCE32_TABLE_SIZE));
- pe->tce32_table = NULL;
}

static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
@@ -1460,10 +1455,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
continue;
}

- pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
- GFP_KERNEL, hose->node);
- pe->tce32_table->data = pe;
-
/* Put PE to the list */
mutex_lock(&phb->ioda.pe_list_mutex);
list_add_tail(&pe->list, &phb->ioda.pe_list);
@@ -1598,7 +1589,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev

pe = &phb->ioda.pe_array[pdn->pe_number];
WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
- set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table);
+ set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
}

static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
@@ -1625,7 +1616,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
} else {
dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
set_dma_ops(&pdev->dev, &dma_iommu_ops);
- set_iommu_table_base(&pdev->dev, pe->tce32_table);
+ set_iommu_table_base(&pdev->dev, &pe->tce32_table);
}
*pdev->dev.dma_mask = dma_mask;
return 0;
@@ -1662,9 +1653,9 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
list_for_each_entry(dev, &bus->devices, bus_list) {
if (add_to_iommu_group)
set_iommu_table_base_and_group(&dev->dev,
- pe->tce32_table);
+ &pe->tce32_table);
else
- set_iommu_table_base(&dev->dev, pe->tce32_table);
+ set_iommu_table_base(&dev->dev, &pe->tce32_table);

if (dev->subordinate)
pnv_ioda_setup_bus_dma(pe, dev->subordinate,
@@ -1754,7 +1745,8 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
__be64 *startp, __be64 *endp, bool rm)
{
- struct pnv_ioda_pe *pe = tbl->data;
+ struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
+ tce32_table);
struct pnv_phb *phb = pe->phb;

if (phb->type == PNV_PHB_IODA1)
@@ -1817,7 +1809,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
}

/* Setup linux iommu table */
- tbl = pe->tce32_table;
+ tbl = &pe->tce32_table;
pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
base << 28, IOMMU_PAGE_SHIFT_4K);

@@ -1862,7 +1854,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,

static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
{
- struct pnv_ioda_pe *pe = tbl->data;
+ struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
+ tce32_table);
uint16_t window_id = (pe->pe_number << 1 ) + 1;
int64_t rc;

@@ -1907,10 +1900,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
pe->tce_bypass_base = 1ull << 59;

/* Install set_bypass callback for VFIO */
- pe->tce32_table->set_bypass = pnv_pci_ioda2_set_bypass;
+ pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass;

/* Enable bypass by default */
- pnv_pci_ioda2_set_bypass(pe->tce32_table, true);
+ pnv_pci_ioda2_set_bypass(&pe->tce32_table, true);
}

static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
@@ -1958,7 +1951,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
}

/* Setup linux iommu table */
- tbl = pe->tce32_table;
+ tbl = &pe->tce32_table;
pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
IOMMU_PAGE_SHIFT_4K);

diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 070ee88..c954c64 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -57,7 +57,7 @@ struct pnv_ioda_pe {
/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
int tce32_seg;
int tce32_segcount;
- struct iommu_table *tce32_table;
+ struct iommu_table tce32_table;
phys_addr_t tce_inval_reg_phys;

/* 64-bit TCE bypass region */
--
2.0.0

2015-04-25 12:24:26

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 03/32] vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driver

This moves the page pinning (get_user_pages_fast()/put_page()) code out of
the platform IOMMU code and puts it into the VFIO IOMMU driver where it
belongs, as the platform code does not deal with page pinning.

This makes iommu_take_ownership()/iommu_release_ownership() deal with
the IOMMU table bitmap only.

This removes page unpinning from iommu_take_ownership() as the actual
TCE table might contain garbage and doing put_page() on it is undefined
behaviour.

Besides the last part, the rest of the patch is mechanical.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
Reviewed-by: David Gibson <[email protected]>
---
Changes:
v9:
* added missing tce_iommu_clear call after iommu_release_ownership()
* brought @offset (a local variable) back to make patch even more
mechanical

v4:
* s/iommu_tce_build(tbl, entry + 1/iommu_tce_build(tbl, entry + i/
---
arch/powerpc/include/asm/iommu.h | 4 --
arch/powerpc/kernel/iommu.c | 55 -------------------------
drivers/vfio/vfio_iommu_spapr_tce.c | 80 +++++++++++++++++++++++++++++++------
3 files changed, 67 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 9d320e0..4955233 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -199,10 +199,6 @@ extern int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
unsigned long hwaddr, enum dma_data_direction direction);
extern unsigned long iommu_clear_tce(struct iommu_table *tbl,
unsigned long entry);
-extern int iommu_clear_tces_and_put_pages(struct iommu_table *tbl,
- unsigned long entry, unsigned long pages);
-extern int iommu_put_tce_user_mode(struct iommu_table *tbl,
- unsigned long entry, unsigned long tce);

extern void iommu_flush_tce(struct iommu_table *tbl);
extern int iommu_take_ownership(struct iommu_table *tbl);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 5c154e1..fc8b253 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1001,30 +1001,6 @@ unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry)
}
EXPORT_SYMBOL_GPL(iommu_clear_tce);

-int iommu_clear_tces_and_put_pages(struct iommu_table *tbl,
- unsigned long entry, unsigned long pages)
-{
- unsigned long oldtce;
- struct page *page;
-
- for ( ; pages; --pages, ++entry) {
- oldtce = iommu_clear_tce(tbl, entry);
- if (!oldtce)
- continue;
-
- page = pfn_to_page(oldtce >> PAGE_SHIFT);
- WARN_ON(!page);
- if (page) {
- if (oldtce & TCE_PCI_WRITE)
- SetPageDirty(page);
- put_page(page);
- }
- }
-
- return 0;
-}
-EXPORT_SYMBOL_GPL(iommu_clear_tces_and_put_pages);
-
/*
* hwaddr is a kernel virtual address here (0xc... bazillion),
* tce_build converts it to a physical address.
@@ -1054,35 +1030,6 @@ int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
}
EXPORT_SYMBOL_GPL(iommu_tce_build);

-int iommu_put_tce_user_mode(struct iommu_table *tbl, unsigned long entry,
- unsigned long tce)
-{
- int ret;
- struct page *page = NULL;
- unsigned long hwaddr, offset = tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK;
- enum dma_data_direction direction = iommu_tce_direction(tce);
-
- ret = get_user_pages_fast(tce & PAGE_MASK, 1,
- direction != DMA_TO_DEVICE, &page);
- if (unlikely(ret != 1)) {
- /* pr_err("iommu_tce: get_user_pages_fast failed tce=%lx ioba=%lx ret=%d\n",
- tce, entry << tbl->it_page_shift, ret); */
- return -EFAULT;
- }
- hwaddr = (unsigned long) page_address(page) + offset;
-
- ret = iommu_tce_build(tbl, entry, hwaddr, direction);
- if (ret)
- put_page(page);
-
- if (ret < 0)
- pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%d\n",
- __func__, entry << tbl->it_page_shift, tce, ret);
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_put_tce_user_mode);
-
int iommu_take_ownership(struct iommu_table *tbl)
{
unsigned long sz = (tbl->it_size + 7) >> 3;
@@ -1096,7 +1043,6 @@ int iommu_take_ownership(struct iommu_table *tbl)
}

memset(tbl->it_map, 0xff, sz);
- iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, tbl->it_size);

/*
* Disable iommu bypass, otherwise the user can DMA to all of
@@ -1114,7 +1060,6 @@ void iommu_release_ownership(struct iommu_table *tbl)
{
unsigned long sz = (tbl->it_size + 7) >> 3;

- iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, tbl->it_size);
memset(tbl->it_map, 0, sz);

/* Restore bit#0 set by iommu_init_table() */
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 730b4ef..b95fa2b 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -147,6 +147,67 @@ static void tce_iommu_release(void *iommu_data)
kfree(container);
}

+static int tce_iommu_clear(struct tce_container *container,
+ struct iommu_table *tbl,
+ unsigned long entry, unsigned long pages)
+{
+ unsigned long oldtce;
+ struct page *page;
+
+ for ( ; pages; --pages, ++entry) {
+ oldtce = iommu_clear_tce(tbl, entry);
+ if (!oldtce)
+ continue;
+
+ page = pfn_to_page(oldtce >> PAGE_SHIFT);
+ WARN_ON(!page);
+ if (page) {
+ if (oldtce & TCE_PCI_WRITE)
+ SetPageDirty(page);
+ put_page(page);
+ }
+ }
+
+ return 0;
+}
+
+static long tce_iommu_build(struct tce_container *container,
+ struct iommu_table *tbl,
+ unsigned long entry, unsigned long tce, unsigned long pages)
+{
+ long i, ret = 0;
+ struct page *page = NULL;
+ unsigned long hva;
+ enum dma_data_direction direction = iommu_tce_direction(tce);
+
+ for (i = 0; i < pages; ++i) {
+ unsigned long offset = tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK;
+
+ ret = get_user_pages_fast(tce & PAGE_MASK, 1,
+ direction != DMA_TO_DEVICE, &page);
+ if (unlikely(ret != 1)) {
+ ret = -EFAULT;
+ break;
+ }
+ hva = (unsigned long) page_address(page) + offset;
+
+ ret = iommu_tce_build(tbl, entry + i, hva, direction);
+ if (ret) {
+ put_page(page);
+ pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%ld\n",
+ __func__, entry << tbl->it_page_shift,
+ tce, ret);
+ break;
+ }
+ tce += IOMMU_PAGE_SIZE_4K;
+ }
+
+ if (ret)
+ tce_iommu_clear(container, tbl, entry, i);
+
+ return ret;
+}
+
static long tce_iommu_ioctl(void *iommu_data,
unsigned int cmd, unsigned long arg)
{
@@ -195,7 +256,7 @@ static long tce_iommu_ioctl(void *iommu_data,
case VFIO_IOMMU_MAP_DMA: {
struct vfio_iommu_type1_dma_map param;
struct iommu_table *tbl = container->tbl;
- unsigned long tce, i;
+ unsigned long tce;

if (!tbl)
return -ENXIO;
@@ -229,17 +290,9 @@ static long tce_iommu_ioctl(void *iommu_data,
if (ret)
return ret;

- for (i = 0; i < (param.size >> IOMMU_PAGE_SHIFT_4K); ++i) {
- ret = iommu_put_tce_user_mode(tbl,
- (param.iova >> IOMMU_PAGE_SHIFT_4K) + i,
- tce);
- if (ret)
- break;
- tce += IOMMU_PAGE_SIZE_4K;
- }
- if (ret)
- iommu_clear_tces_and_put_pages(tbl,
- param.iova >> IOMMU_PAGE_SHIFT_4K, i);
+ ret = tce_iommu_build(container, tbl,
+ param.iova >> IOMMU_PAGE_SHIFT_4K,
+ tce, param.size >> IOMMU_PAGE_SHIFT_4K);

iommu_flush_tce(tbl);

@@ -273,7 +326,7 @@ static long tce_iommu_ioctl(void *iommu_data,
if (ret)
return ret;

- ret = iommu_clear_tces_and_put_pages(tbl,
+ ret = tce_iommu_clear(container, tbl,
param.iova >> IOMMU_PAGE_SHIFT_4K,
param.size >> IOMMU_PAGE_SHIFT_4K);
iommu_flush_tce(tbl);
@@ -357,6 +410,7 @@ static void tce_iommu_detach_group(void *iommu_data,
/* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
iommu_group_id(iommu_group), iommu_group); */
container->tbl = NULL;
+ tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
iommu_release_ownership(tbl);
}
mutex_unlock(&container->lock);
--
2.0.0

2015-04-25 12:16:12

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 04/32] vfio: powerpc/spapr: Check that IOMMU page is fully contained by system page

This checks that the TCE table page size is not bigger than the size of
the page we have just pinned and are about to put the physical address of
into the table.

Otherwise the hardware gets unwanted access to physical memory between
the end of the actual page and the end of the aligned-up TCE page.

Since compound_order() and compound_head() work correctly on non-huge
pages, there is no need for an additional check of whether the page is huge.
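
To illustrate the check, which boils down to
PAGE_SHIFT + compound_order(compound_head(page)) >= tbl->it_page_shift,
with 4K system pages (PAGE_SHIFT = 12):

	order-0 4K page,  4K IOMMU page:  12 + 0  = 12 >= 12  -> allowed
	order-0 4K page, 64K IOMMU page:  12 + 0  = 12 <  16  -> rejected (-EPERM)
	16M huge page,   16M IOMMU page:  12 + 12 = 24 >= 24  -> allowed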

Signed-off-by: Alexey Kardashevskiy <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
Reviewed-by: David Gibson <[email protected]>
---
Changes:
v8: changed subject

v6:
* the helper is simplified to one line

v4:
* s/tce_check_page_size/tce_page_is_contained/
---
drivers/vfio/vfio_iommu_spapr_tce.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index b95fa2b..735b308 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -47,6 +47,16 @@ struct tce_container {
bool enabled;
};

+static bool tce_page_is_contained(struct page *page, unsigned page_shift)
+{
+ /*
+ * Check that the TCE table granularity is not bigger than the size of
+ * a page we just found. Otherwise the hardware can get access to
+ * a bigger memory chunk that it should.
+ */
+ return (PAGE_SHIFT + compound_order(compound_head(page))) >= page_shift;
+}
+
static int tce_iommu_enable(struct tce_container *container)
{
int ret = 0;
@@ -189,6 +199,12 @@ static long tce_iommu_build(struct tce_container *container,
ret = -EFAULT;
break;
}
+
+ if (!tce_page_is_contained(page, tbl->it_page_shift)) {
+ ret = -EPERM;
+ break;
+ }
+
hva = (unsigned long) page_address(page) + offset;

ret = iommu_tce_build(tbl, entry + i, hva, direction);
--
2.0.0

2015-04-25 12:20:27

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 05/32] vfio: powerpc/spapr: Use it_page_size

This makes use of the it_page_size from the iommu_table struct
as the page size can differ.

This replaces the missing IOMMU_PAGE_SHIFT macro in commented-out debug code
as the recently introduced IOMMU_PAGE_XXX macros do not include
IOMMU_PAGE_SHIFT.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
---
drivers/vfio/vfio_iommu_spapr_tce.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 735b308..64300cc 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -91,7 +91,7 @@ static int tce_iommu_enable(struct tce_container *container)
* enforcing the limit based on the max that the guest can map.
*/
down_write(&current->mm->mmap_sem);
- npages = (tbl->it_size << IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
+ npages = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
locked = current->mm->locked_vm + npages;
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
@@ -120,7 +120,7 @@ static void tce_iommu_disable(struct tce_container *container)

down_write(&current->mm->mmap_sem);
current->mm->locked_vm -= (container->tbl->it_size <<
- IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
+ container->tbl->it_page_shift) >> PAGE_SHIFT;
up_write(&current->mm->mmap_sem);
}

@@ -215,7 +215,7 @@ static long tce_iommu_build(struct tce_container *container,
tce, ret);
break;
}
- tce += IOMMU_PAGE_SIZE_4K;
+ tce += IOMMU_PAGE_SIZE(tbl);
}

if (ret)
@@ -260,8 +260,8 @@ static long tce_iommu_ioctl(void *iommu_data,
if (info.argsz < minsz)
return -EINVAL;

- info.dma32_window_start = tbl->it_offset << IOMMU_PAGE_SHIFT_4K;
- info.dma32_window_size = tbl->it_size << IOMMU_PAGE_SHIFT_4K;
+ info.dma32_window_start = tbl->it_offset << tbl->it_page_shift;
+ info.dma32_window_size = tbl->it_size << tbl->it_page_shift;
info.flags = 0;

if (copy_to_user((void __user *)arg, &info, minsz))
@@ -291,8 +291,8 @@ static long tce_iommu_ioctl(void *iommu_data,
VFIO_DMA_MAP_FLAG_WRITE))
return -EINVAL;

- if ((param.size & ~IOMMU_PAGE_MASK_4K) ||
- (param.vaddr & ~IOMMU_PAGE_MASK_4K))
+ if ((param.size & ~IOMMU_PAGE_MASK(tbl)) ||
+ (param.vaddr & ~IOMMU_PAGE_MASK(tbl)))
return -EINVAL;

/* iova is checked by the IOMMU API */
@@ -307,8 +307,8 @@ static long tce_iommu_ioctl(void *iommu_data,
return ret;

ret = tce_iommu_build(container, tbl,
- param.iova >> IOMMU_PAGE_SHIFT_4K,
- tce, param.size >> IOMMU_PAGE_SHIFT_4K);
+ param.iova >> tbl->it_page_shift,
+ tce, param.size >> tbl->it_page_shift);

iommu_flush_tce(tbl);

@@ -334,17 +334,17 @@ static long tce_iommu_ioctl(void *iommu_data,
if (param.flags)
return -EINVAL;

- if (param.size & ~IOMMU_PAGE_MASK_4K)
+ if (param.size & ~IOMMU_PAGE_MASK(tbl))
return -EINVAL;

ret = iommu_tce_clear_param_check(tbl, param.iova, 0,
- param.size >> IOMMU_PAGE_SHIFT_4K);
+ param.size >> tbl->it_page_shift);
if (ret)
return ret;

ret = tce_iommu_clear(container, tbl,
- param.iova >> IOMMU_PAGE_SHIFT_4K,
- param.size >> IOMMU_PAGE_SHIFT_4K);
+ param.iova >> tbl->it_page_shift,
+ param.size >> tbl->it_page_shift);
iommu_flush_tce(tbl);

return ret;
--
2.0.0

2015-04-25 12:20:16

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 06/32] vfio: powerpc/spapr: Move locked_vm accounting to helpers

This moves locked pages accounting to helpers.
Later they will be reused for Dynamic DMA windows (DDW).

This reworks debug messages to show the current value and the limit.

This stores the number of locked pages in the container so the iommu
table pointer won't be needed when unlocking. This does not have an effect
now but it will with multiple tables per container as then we will
allow attaching/detaching groups on the fly and we may end up having
a container with no group attached but with the counter incremented.

While we are here, update the comment explaining why RLIMIT_MEMLOCK
might be required to be bigger than the guest RAM. This also prints
the pid of the current process in pr_warn/pr_debug.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
Reviewed-by: David Gibson <[email protected]>
---
Changes:
v4:
* new helpers do nothing if @npages == 0
* tce_iommu_disable() now can decrement the counter if the group was
detached (not possible now but will be in the future)
---
drivers/vfio/vfio_iommu_spapr_tce.c | 82 ++++++++++++++++++++++++++++---------
1 file changed, 63 insertions(+), 19 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 64300cc..40583f9 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -29,6 +29,51 @@
static void tce_iommu_detach_group(void *iommu_data,
struct iommu_group *iommu_group);

+static long try_increment_locked_vm(long npages)
+{
+ long ret = 0, locked, lock_limit;
+
+ if (!current || !current->mm)
+ return -ESRCH; /* process exited */
+
+ if (!npages)
+ return 0;
+
+ down_write(&current->mm->mmap_sem);
+ locked = current->mm->locked_vm + npages;
+ lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+ if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+ ret = -ENOMEM;
+ else
+ current->mm->locked_vm += npages;
+
+ pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid,
+ npages << PAGE_SHIFT,
+ current->mm->locked_vm << PAGE_SHIFT,
+ rlimit(RLIMIT_MEMLOCK),
+ ret ? " - exceeded" : "");
+
+ up_write(&current->mm->mmap_sem);
+
+ return ret;
+}
+
+static void decrement_locked_vm(long npages)
+{
+ if (!current || !current->mm || !npages)
+ return; /* process exited */
+
+ down_write(&current->mm->mmap_sem);
+ if (npages > current->mm->locked_vm)
+ npages = current->mm->locked_vm;
+ current->mm->locked_vm -= npages;
+ pr_debug("[%d] RLIMIT_MEMLOCK -%ld %ld/%ld\n", current->pid,
+ npages << PAGE_SHIFT,
+ current->mm->locked_vm << PAGE_SHIFT,
+ rlimit(RLIMIT_MEMLOCK));
+ up_write(&current->mm->mmap_sem);
+}
+
/*
* VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
*
@@ -45,6 +90,7 @@ struct tce_container {
struct mutex lock;
struct iommu_table *tbl;
bool enabled;
+ unsigned long locked_pages;
};

static bool tce_page_is_contained(struct page *page, unsigned page_shift)
@@ -60,7 +106,7 @@ static bool tce_page_is_contained(struct page *page, unsigned page_shift)
static int tce_iommu_enable(struct tce_container *container)
{
int ret = 0;
- unsigned long locked, lock_limit, npages;
+ unsigned long locked;
struct iommu_table *tbl = container->tbl;

if (!container->tbl)
@@ -89,21 +135,22 @@ static int tce_iommu_enable(struct tce_container *container)
* Also we don't have a nice way to fail on H_PUT_TCE due to ulimits,
* that would effectively kill the guest at random points, much better
* enforcing the limit based on the max that the guest can map.
+ *
+ * Unfortunately at the moment it counts whole tables, no matter how
+ * much memory the guest has. I.e. for 4GB guest and 4 IOMMU groups
+ * each with 2GB DMA window, 8GB will be counted here. The reason for
+ * this is that we cannot tell here the amount of RAM used by the guest
+ * as this information is only available from KVM and VFIO is
+ * KVM agnostic.
*/
- down_write(&current->mm->mmap_sem);
- npages = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
- locked = current->mm->locked_vm + npages;
- lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
- if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
- pr_warn("RLIMIT_MEMLOCK (%ld) exceeded\n",
- rlimit(RLIMIT_MEMLOCK));
- ret = -ENOMEM;
- } else {
+ locked = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
+ ret = try_increment_locked_vm(locked);
+ if (ret)
+ return ret;

- current->mm->locked_vm += npages;
- container->enabled = true;
- }
- up_write(&current->mm->mmap_sem);
+ container->locked_pages = locked;
+
+ container->enabled = true;

return ret;
}
@@ -115,13 +162,10 @@ static void tce_iommu_disable(struct tce_container *container)

container->enabled = false;

- if (!container->tbl || !current->mm)
+ if (!current->mm)
return;

- down_write(&current->mm->mmap_sem);
- current->mm->locked_vm -= (container->tbl->it_size <<
- container->tbl->it_page_shift) >> PAGE_SHIFT;
- up_write(&current->mm->mmap_sem);
+ decrement_locked_vm(container->locked_pages);
}

static void *tce_iommu_open(unsigned long arg)
--
2.0.0

2015-04-25 12:20:06

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 07/32] vfio: powerpc/spapr: Disable DMA mappings on disabled container

At the moment DMA map/unmap requests are handled irrespective of
the container's state. This allows userspace to pin memory which
it might not be allowed to pin.

This adds checks to MAP/UNMAP that the container is enabled; otherwise
-EPERM is returned.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
Reviewed-by: David Gibson <[email protected]>
---
drivers/vfio/vfio_iommu_spapr_tce.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 40583f9..e21479c 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -318,6 +318,9 @@ static long tce_iommu_ioctl(void *iommu_data,
struct iommu_table *tbl = container->tbl;
unsigned long tce;

+ if (!container->enabled)
+ return -EPERM;
+
if (!tbl)
return -ENXIO;

@@ -362,6 +365,9 @@ static long tce_iommu_ioctl(void *iommu_data,
struct vfio_iommu_type1_dma_unmap param;
struct iommu_table *tbl = container->tbl;

+ if (!container->enabled)
+ return -EPERM;
+
if (WARN_ON(!tbl))
return -ENXIO;

--
2.0.0

2015-04-25 12:16:48

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 08/32] vfio: powerpc/spapr: Moving pinning/unpinning to helpers

This is a pretty mechanical patch to make the next patches simpler.

The new tce_iommu_unuse_page() helper does put_page() now but it may skip
that once the memory registration patch is applied.

While we are here, this removes the unnecessary check of the value returned
by pfn_to_page() as it cannot possibly return NULL.

This moves tce_iommu_disable() later to let tce_iommu_clear() know whether
the container has been enabled, because if it has not been, then
put_page() must not be called on TCEs from the TCE table. This situation
is not yet possible but it will be after the KVM acceleration patchset is
applied.

This changes the code to work with physical addresses rather than linear
mapping addresses for better code readability. Following patches will
add an xchg() callback for the IOMMU table which will accept/return
physical addresses (unlike the current tce_build()), which will eliminate
redundant conversions.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
---
Changes:
v9:
* changed helpers to work with physical addresses rather than linear
(for simplicity - later ::xchg() will receive physical and avoid
additional conversions)

v6:
* tce_get_hva() returns hva via a pointer
---
drivers/vfio/vfio_iommu_spapr_tce.c | 61 +++++++++++++++++++++++++------------
1 file changed, 41 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index e21479c..115d5e6 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -191,69 +191,90 @@ static void tce_iommu_release(void *iommu_data)
struct tce_container *container = iommu_data;

WARN_ON(container->tbl && !container->tbl->it_group);
- tce_iommu_disable(container);

if (container->tbl && container->tbl->it_group)
tce_iommu_detach_group(iommu_data, container->tbl->it_group);

+ tce_iommu_disable(container);
mutex_destroy(&container->lock);

kfree(container);
}

+static void tce_iommu_unuse_page(struct tce_container *container,
+ unsigned long oldtce)
+{
+ struct page *page;
+
+ if (!(oldtce & (TCE_PCI_READ | TCE_PCI_WRITE)))
+ return;
+
+ page = pfn_to_page(oldtce >> PAGE_SHIFT);
+
+ if (oldtce & TCE_PCI_WRITE)
+ SetPageDirty(page);
+
+ put_page(page);
+}
+
static int tce_iommu_clear(struct tce_container *container,
struct iommu_table *tbl,
unsigned long entry, unsigned long pages)
{
unsigned long oldtce;
- struct page *page;

for ( ; pages; --pages, ++entry) {
oldtce = iommu_clear_tce(tbl, entry);
if (!oldtce)
continue;

- page = pfn_to_page(oldtce >> PAGE_SHIFT);
- WARN_ON(!page);
- if (page) {
- if (oldtce & TCE_PCI_WRITE)
- SetPageDirty(page);
- put_page(page);
- }
+ tce_iommu_unuse_page(container, oldtce);
}

return 0;
}

+static int tce_iommu_use_page(unsigned long tce, unsigned long *hpa)
+{
+ struct page *page = NULL;
+ enum dma_data_direction direction = iommu_tce_direction(tce);
+
+ if (get_user_pages_fast(tce & PAGE_MASK, 1,
+ direction != DMA_TO_DEVICE, &page) != 1)
+ return -EFAULT;
+
+ *hpa = __pa((unsigned long) page_address(page));
+
+ return 0;
+}
+
static long tce_iommu_build(struct tce_container *container,
struct iommu_table *tbl,
unsigned long entry, unsigned long tce, unsigned long pages)
{
long i, ret = 0;
- struct page *page = NULL;
- unsigned long hva;
+ struct page *page;
+ unsigned long hpa;
enum dma_data_direction direction = iommu_tce_direction(tce);

for (i = 0; i < pages; ++i) {
unsigned long offset = tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK;

- ret = get_user_pages_fast(tce & PAGE_MASK, 1,
- direction != DMA_TO_DEVICE, &page);
- if (unlikely(ret != 1)) {
- ret = -EFAULT;
+ ret = tce_iommu_use_page(tce, &hpa);
+ if (ret)
break;
- }

+ page = pfn_to_page(hpa >> PAGE_SHIFT);
if (!tce_page_is_contained(page, tbl->it_page_shift)) {
ret = -EPERM;
break;
}

- hva = (unsigned long) page_address(page) + offset;
-
- ret = iommu_tce_build(tbl, entry + i, hva, direction);
+ hpa |= offset;
+ ret = iommu_tce_build(tbl, entry + i, (unsigned long) __va(hpa),
+ direction);
if (ret) {
- put_page(page);
+ tce_iommu_unuse_page(container, hpa);
pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%ld\n",
__func__, entry << tbl->it_page_shift,
tce, ret);
--
2.0.0

2015-04-25 12:19:52

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 09/32] vfio: powerpc/spapr: Rework groups attaching

This is to make extended ownership and multiple groups support patches
simpler for review.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
Reviewed-by: David Gibson <[email protected]>
---
drivers/vfio/vfio_iommu_spapr_tce.c | 40 ++++++++++++++++++++++---------------
1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 115d5e6..0fbe03e 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -460,16 +460,21 @@ static int tce_iommu_attach_group(void *iommu_data,
iommu_group_id(container->tbl->it_group),
iommu_group_id(iommu_group));
ret = -EBUSY;
- } else if (container->enabled) {
+ goto unlock_exit;
+ }
+
+ if (container->enabled) {
pr_err("tce_vfio: attaching group #%u to enabled container\n",
iommu_group_id(iommu_group));
ret = -EBUSY;
- } else {
- ret = iommu_take_ownership(tbl);
- if (!ret)
- container->tbl = tbl;
+ goto unlock_exit;
}

+ ret = iommu_take_ownership(tbl);
+ if (!ret)
+ container->tbl = tbl;
+
+unlock_exit:
mutex_unlock(&container->lock);

return ret;
@@ -487,19 +492,22 @@ static void tce_iommu_detach_group(void *iommu_data,
pr_warn("tce_vfio: detaching group #%u, expected group is #%u\n",
iommu_group_id(iommu_group),
iommu_group_id(tbl->it_group));
- } else {
- if (container->enabled) {
- pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
- iommu_group_id(tbl->it_group));
- tce_iommu_disable(container);
- }
+ goto unlock_exit;
+ }

- /* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
- iommu_group_id(iommu_group), iommu_group); */
- container->tbl = NULL;
- tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
- iommu_release_ownership(tbl);
+ if (container->enabled) {
+ pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
+ iommu_group_id(tbl->it_group));
+ tce_iommu_disable(container);
}
+
+ /* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
+ iommu_group_id(iommu_group), iommu_group); */
+ container->tbl = NULL;
+ tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
+ iommu_release_ownership(tbl);
+
+unlock_exit:
mutex_unlock(&container->lock);
}

--
2.0.0

2015-04-25 12:17:06

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 10/32] powerpc/powernv: Do not set "read" flag if direction==DMA_NONE

Normally a bitmap from the iommu_table is used to track which TCE entries
are in use. Since we are going to use the iommu_table without its locks and
do xchg() instead, it becomes essential not to set bits which are not
implied by the direction flag, as the old TCE value (more precisely, its
permission bits) will be used to decide whether to put the page or not.

This adds iommu_direction_to_tce_perm() (its counterpart is there already)
and uses it for powernv's pnv_tce_build().

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
---
Changes:
v9:
* added comment why we must put only valid permission bits
---
arch/powerpc/include/asm/iommu.h | 1 +
arch/powerpc/kernel/iommu.c | 15 +++++++++++++++
arch/powerpc/platforms/powernv/pci.c | 7 +------
3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 4955233..5eb6e76 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -205,6 +205,7 @@ extern int iommu_take_ownership(struct iommu_table *tbl);
extern void iommu_release_ownership(struct iommu_table *tbl);

extern enum dma_data_direction iommu_tce_direction(unsigned long tce);
+extern unsigned long iommu_direction_to_tce_perm(enum dma_data_direction dir);

#endif /* __KERNEL__ */
#endif /* _ASM_IOMMU_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index fc8b253..e0e94c7 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -881,6 +881,21 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
}
}

+unsigned long iommu_direction_to_tce_perm(enum dma_data_direction dir)
+{
+ switch (dir) {
+ case DMA_BIDIRECTIONAL:
+ return TCE_PCI_READ | TCE_PCI_WRITE;
+ case DMA_FROM_DEVICE:
+ return TCE_PCI_WRITE;
+ case DMA_TO_DEVICE:
+ return TCE_PCI_READ;
+ default:
+ return 0;
+ }
+}
+EXPORT_SYMBOL_GPL(iommu_direction_to_tce_perm);
+
#ifdef CONFIG_IOMMU_API
/*
* SPAPR TCE API
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index bca2aeb..b7ea245 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -576,15 +576,10 @@ static int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
unsigned long uaddr, enum dma_data_direction direction,
struct dma_attrs *attrs, bool rm)
{
- u64 proto_tce;
+ u64 proto_tce = iommu_direction_to_tce_perm(direction);
__be64 *tcep, *tces;
u64 rpn;

- proto_tce = TCE_PCI_READ; // Read allowed
-
- if (direction != DMA_TO_DEVICE)
- proto_tce |= TCE_PCI_WRITE;
-
tces = tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
rpn = __pa(uaddr) >> tbl->it_page_shift;

--
2.0.0

2015-04-25 12:20:10

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 11/32] powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table

This adds an iommu_table_ops struct and puts a pointer to it into
the iommu_table struct. This moves the tce_build/tce_free/tce_get/tce_flush
callbacks from ppc_md to the new struct where they really belong.

This adds the requirement for @it_ops to be initialized before calling
iommu_init_table() to make sure that we do not leave any IOMMU table
with iommu_table_ops uninitialized. This is not a parameter of
iommu_init_table() though as there will be cases when iommu_init_table()
will not be called on TCE tables, for example - VFIO.

This does s/tce_build/set/, s/tce_free/clear/ and removes the redundant
"tce_" prefixes.

This removes the tce_xxx_rm handlers from ppc_md but does not add
them to iommu_table_ops as this will be done later if we decide to
support TCE hypercalls in real mode. This removes the _vm callbacks as
only virtual mode is supported for now, so this also removes the @rm
parameter.

For pSeries, this always uses tce_buildmulti_pSeriesLP/
tce_freemulti_pSeriesLP. This changes the multi callbacks to fall back to
tce_build_pSeriesLP/tce_free_pSeriesLP if FW_FEATURE_MULTITCE is not
present. The reason for this is that we still have to support the
"multitce=off" boot parameter in disable_multitce() and we do not want to
walk through all IOMMU tables in the system and replace "multi" callbacks
with single ones.

For powernv, this defines _ops per PHB type, which are P5IOC2/IODA1/IODA2.
This makes the callbacks for them public. Later patches will extend the
callbacks for IODA1/2.

No change in behaviour is expected.
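
As a minimal sketch of the new contract (the names below are illustrative;
real instances are cell_iommu_ops, iommu_table_iobmap_ops,
pnv_ioda1/2_iommu_ops and the pSeries ops in the diff below):

static struct iommu_table_ops my_iommu_ops = {
	.set	= my_tce_build,
	.clear	= my_tce_free,
	.get	= my_tce_get,
	.flush	= my_tce_flush,	/* core checks for NULL before calling flush */
};

	tbl->it_ops = &my_iommu_ops;	/* must be set first... */
	iommu_init_table(tbl, nid);	/* ...as it now does BUG_ON(!tbl->it_ops) */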

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
---
Changes:
v9:
* pnv_tce_build/pnv_tce_free/pnv_tce_get have been made public and lost
"rm" parameters to make following patches simpler (realmode is not
supported here anyway)
* got rid of _vm versions of callbacks
---
arch/powerpc/include/asm/iommu.h | 17 +++++++++++
arch/powerpc/include/asm/machdep.h | 25 ---------------
arch/powerpc/kernel/iommu.c | 46 ++++++++++++++--------------
arch/powerpc/kernel/vio.c | 5 +++
arch/powerpc/platforms/cell/iommu.c | 8 +++--
arch/powerpc/platforms/pasemi/iommu.c | 7 +++--
arch/powerpc/platforms/powernv/pci-ioda.c | 14 +++++++++
arch/powerpc/platforms/powernv/pci-p5ioc2.c | 7 +++++
arch/powerpc/platforms/powernv/pci.c | 47 +++++------------------------
arch/powerpc/platforms/powernv/pci.h | 5 +++
arch/powerpc/platforms/pseries/iommu.c | 34 ++++++++++++---------
arch/powerpc/sysdev/dart_iommu.c | 12 +++++---
12 files changed, 116 insertions(+), 111 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 5eb6e76..f0cab49 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -44,6 +44,22 @@
extern int iommu_is_off;
extern int iommu_force_on;

+struct iommu_table_ops {
+ int (*set)(struct iommu_table *tbl,
+ long index, long npages,
+ unsigned long uaddr,
+ enum dma_data_direction direction,
+ struct dma_attrs *attrs);
+ void (*clear)(struct iommu_table *tbl,
+ long index, long npages);
+ unsigned long (*get)(struct iommu_table *tbl, long index);
+ void (*flush)(struct iommu_table *tbl);
+};
+
+/* These are used by VIO */
+extern struct iommu_table_ops iommu_table_lpar_multi_ops;
+extern struct iommu_table_ops iommu_table_pseries_ops;
+
/*
* IOMAP_MAX_ORDER defines the largest contiguous block
* of dma space we can get. IOMAP_MAX_ORDER = 13
@@ -78,6 +94,7 @@ struct iommu_table {
#ifdef CONFIG_IOMMU_API
struct iommu_group *it_group;
#endif
+ struct iommu_table_ops *it_ops;
void (*set_bypass)(struct iommu_table *tbl, bool enable);
};

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index ef889943..ab721b4 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -65,31 +65,6 @@ struct machdep_calls {
* destroyed as well */
void (*hpte_clear_all)(void);

- int (*tce_build)(struct iommu_table *tbl,
- long index,
- long npages,
- unsigned long uaddr,
- enum dma_data_direction direction,
- struct dma_attrs *attrs);
- void (*tce_free)(struct iommu_table *tbl,
- long index,
- long npages);
- unsigned long (*tce_get)(struct iommu_table *tbl,
- long index);
- void (*tce_flush)(struct iommu_table *tbl);
-
- /* _rm versions are for real mode use only */
- int (*tce_build_rm)(struct iommu_table *tbl,
- long index,
- long npages,
- unsigned long uaddr,
- enum dma_data_direction direction,
- struct dma_attrs *attrs);
- void (*tce_free_rm)(struct iommu_table *tbl,
- long index,
- long npages);
- void (*tce_flush_rm)(struct iommu_table *tbl);
-
void __iomem * (*ioremap)(phys_addr_t addr, unsigned long size,
unsigned long flags, void *caller);
void (*iounmap)(volatile void __iomem *token);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index e0e94c7..e289f91 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -322,11 +322,11 @@ static dma_addr_t iommu_alloc(struct device *dev, struct iommu_table *tbl,
ret = entry << tbl->it_page_shift; /* Set the return dma address */

/* Put the TCEs in the HW table */
- build_fail = ppc_md.tce_build(tbl, entry, npages,
+ build_fail = tbl->it_ops->set(tbl, entry, npages,
(unsigned long)page &
IOMMU_PAGE_MASK(tbl), direction, attrs);

- /* ppc_md.tce_build() only returns non-zero for transient errors.
+ /* tbl->it_ops->set() only returns non-zero for transient errors.
* Clean up the table bitmap in this case and return
* DMA_ERROR_CODE. For all other errors the functionality is
* not altered.
@@ -337,8 +337,8 @@ static dma_addr_t iommu_alloc(struct device *dev, struct iommu_table *tbl,
}

/* Flush/invalidate TLB caches if necessary */
- if (ppc_md.tce_flush)
- ppc_md.tce_flush(tbl);
+ if (tbl->it_ops->flush)
+ tbl->it_ops->flush(tbl);

/* Make sure updates are seen by hardware */
mb();
@@ -408,7 +408,7 @@ static void __iommu_free(struct iommu_table *tbl, dma_addr_t dma_addr,
if (!iommu_free_check(tbl, dma_addr, npages))
return;

- ppc_md.tce_free(tbl, entry, npages);
+ tbl->it_ops->clear(tbl, entry, npages);

spin_lock_irqsave(&(pool->lock), flags);
bitmap_clear(tbl->it_map, free_entry, npages);
@@ -424,8 +424,8 @@ static void iommu_free(struct iommu_table *tbl, dma_addr_t dma_addr,
* not do an mb() here on purpose, it is not needed on any of
* the current platforms.
*/
- if (ppc_md.tce_flush)
- ppc_md.tce_flush(tbl);
+ if (tbl->it_ops->flush)
+ tbl->it_ops->flush(tbl);
}

int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
@@ -495,7 +495,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
npages, entry, dma_addr);

/* Insert into HW table */
- build_fail = ppc_md.tce_build(tbl, entry, npages,
+ build_fail = tbl->it_ops->set(tbl, entry, npages,
vaddr & IOMMU_PAGE_MASK(tbl),
direction, attrs);
if(unlikely(build_fail))
@@ -534,8 +534,8 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
}

/* Flush/invalidate TLB caches if necessary */
- if (ppc_md.tce_flush)
- ppc_md.tce_flush(tbl);
+ if (tbl->it_ops->flush)
+ tbl->it_ops->flush(tbl);

DBG("mapped %d elements:\n", outcount);

@@ -600,8 +600,8 @@ void ppc_iommu_unmap_sg(struct iommu_table *tbl, struct scatterlist *sglist,
* do not do an mb() here, the affected platforms do not need it
* when freeing.
*/
- if (ppc_md.tce_flush)
- ppc_md.tce_flush(tbl);
+ if (tbl->it_ops->flush)
+ tbl->it_ops->flush(tbl);
}

static void iommu_table_clear(struct iommu_table *tbl)
@@ -613,17 +613,17 @@ static void iommu_table_clear(struct iommu_table *tbl)
*/
if (!is_kdump_kernel() || is_fadump_active()) {
/* Clear the table in case firmware left allocations in it */
- ppc_md.tce_free(tbl, tbl->it_offset, tbl->it_size);
+ tbl->it_ops->clear(tbl, tbl->it_offset, tbl->it_size);
return;
}

#ifdef CONFIG_CRASH_DUMP
- if (ppc_md.tce_get) {
+ if (tbl->it_ops->get) {
unsigned long index, tceval, tcecount = 0;

/* Reserve the existing mappings left by the first kernel. */
for (index = 0; index < tbl->it_size; index++) {
- tceval = ppc_md.tce_get(tbl, index + tbl->it_offset);
+ tceval = tbl->it_ops->get(tbl, index + tbl->it_offset);
/*
* Freed TCE entry contains 0x7fffffffffffffff on JS20
*/
@@ -657,6 +657,8 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
unsigned int i;
struct iommu_pool *p;

+ BUG_ON(!tbl->it_ops);
+
/* number of bytes needed for the bitmap */
sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);

@@ -944,8 +946,8 @@ EXPORT_SYMBOL_GPL(iommu_tce_direction);
void iommu_flush_tce(struct iommu_table *tbl)
{
/* Flush/invalidate TLB caches if necessary */
- if (ppc_md.tce_flush)
- ppc_md.tce_flush(tbl);
+ if (tbl->it_ops->flush)
+ tbl->it_ops->flush(tbl);

/* Make sure updates are seen by hardware */
mb();
@@ -956,7 +958,7 @@ int iommu_tce_clear_param_check(struct iommu_table *tbl,
unsigned long ioba, unsigned long tce_value,
unsigned long npages)
{
- /* ppc_md.tce_free() does not support any value but 0 */
+ /* tbl->it_ops->clear() does not support any value but 0 */
if (tce_value)
return -EINVAL;

@@ -1004,9 +1006,9 @@ unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry)

spin_lock(&(pool->lock));

- oldtce = ppc_md.tce_get(tbl, entry);
+ oldtce = tbl->it_ops->get(tbl, entry);
if (oldtce & (TCE_PCI_WRITE | TCE_PCI_READ))
- ppc_md.tce_free(tbl, entry, 1);
+ tbl->it_ops->clear(tbl, entry, 1);
else
oldtce = 0;

@@ -1029,10 +1031,10 @@ int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,

spin_lock(&(pool->lock));

- oldtce = ppc_md.tce_get(tbl, entry);
+ oldtce = tbl->it_ops->get(tbl, entry);
/* Add new entry if it is not busy */
if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)))
- ret = ppc_md.tce_build(tbl, entry, 1, hwaddr, direction, NULL);
+ ret = tbl->it_ops->set(tbl, entry, 1, hwaddr, direction, NULL);

spin_unlock(&(pool->lock));

diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
index 5bfdab9..b41426c 100644
--- a/arch/powerpc/kernel/vio.c
+++ b/arch/powerpc/kernel/vio.c
@@ -1196,6 +1196,11 @@ static struct iommu_table *vio_build_iommu_table(struct vio_dev *dev)
tbl->it_type = TCE_VB;
tbl->it_blocksize = 16;

+ if (firmware_has_feature(FW_FEATURE_LPAR))
+ tbl->it_ops = &iommu_table_lpar_multi_ops;
+ else
+ tbl->it_ops = &iommu_table_pseries_ops;
+
return iommu_init_table(tbl, -1);
}

diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index 21b5023..14a582b 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -466,6 +466,11 @@ static inline u32 cell_iommu_get_ioid(struct device_node *np)
return *ioid;
}

+static struct iommu_table_ops cell_iommu_ops = {
+ .set = tce_build_cell,
+ .clear = tce_free_cell
+};
+
static struct iommu_window * __init
cell_iommu_setup_window(struct cbe_iommu *iommu, struct device_node *np,
unsigned long offset, unsigned long size,
@@ -492,6 +497,7 @@ cell_iommu_setup_window(struct cbe_iommu *iommu, struct device_node *np,
window->table.it_offset =
(offset >> window->table.it_page_shift) + pte_offset;
window->table.it_size = size >> window->table.it_page_shift;
+ window->table.it_ops = &cell_iommu_ops;

iommu_init_table(&window->table, iommu->nid);

@@ -1201,8 +1207,6 @@ static int __init cell_iommu_init(void)
/* Setup various callbacks */
cell_pci_controller_ops.dma_dev_setup = cell_pci_dma_dev_setup;
ppc_md.dma_get_required_mask = cell_dma_get_required_mask;
- ppc_md.tce_build = tce_build_cell;
- ppc_md.tce_free = tce_free_cell;

if (!iommu_fixed_disabled && cell_iommu_fixed_mapping_init() == 0)
goto bail;
diff --git a/arch/powerpc/platforms/pasemi/iommu.c b/arch/powerpc/platforms/pasemi/iommu.c
index b8f567b..c929644 100644
--- a/arch/powerpc/platforms/pasemi/iommu.c
+++ b/arch/powerpc/platforms/pasemi/iommu.c
@@ -134,6 +134,10 @@ static void iobmap_free(struct iommu_table *tbl, long index,
}
}

+static struct iommu_table_ops iommu_table_iobmap_ops = {
+ .set = iobmap_build,
+ .clear = iobmap_free
+};

static void iommu_table_iobmap_setup(void)
{
@@ -153,6 +157,7 @@ static void iommu_table_iobmap_setup(void)
* Should probably be 8 (64 bytes)
*/
iommu_table_iobmap.it_blocksize = 4;
+ iommu_table_iobmap.it_ops = &iommu_table_iobmap_ops;
iommu_init_table(&iommu_table_iobmap, 0);
pr_debug(" <- %s\n", __func__);
}
@@ -252,8 +257,6 @@ void __init iommu_init_early_pasemi(void)

pasemi_pci_controller_ops.dma_dev_setup = pci_dma_dev_setup_pasemi;
pasemi_pci_controller_ops.dma_bus_setup = pci_dma_bus_setup_pasemi;
- ppc_md.tce_build = iobmap_build;
- ppc_md.tce_free = iobmap_free;
set_pci_dma_ops(&dma_iommu_ops);
}

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index eff26ed..7a9137a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1710,6 +1710,12 @@ static void pnv_pci_ioda1_tce_invalidate(struct pnv_ioda_pe *pe,
*/
}

+static struct iommu_table_ops pnv_ioda1_iommu_ops = {
+ .set = pnv_tce_build,
+ .clear = pnv_tce_free,
+ .get = pnv_tce_get,
+};
+
static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
struct iommu_table *tbl,
__be64 *startp, __be64 *endp, bool rm)
@@ -1755,6 +1761,12 @@ void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
pnv_pci_ioda2_tce_invalidate(pe, tbl, startp, endp, rm);
}

+static struct iommu_table_ops pnv_ioda2_iommu_ops = {
+ .set = pnv_tce_build,
+ .clear = pnv_tce_free,
+ .get = pnv_tce_get,
+};
+
static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
struct pnv_ioda_pe *pe, unsigned int base,
unsigned int segs)
@@ -1828,6 +1840,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
TCE_PCI_SWINV_FREE |
TCE_PCI_SWINV_PAIR);
}
+ tbl->it_ops = &pnv_ioda1_iommu_ops;
iommu_init_table(tbl, phb->hose->node);

if (pe->flags & PNV_IODA_PE_DEV) {
@@ -1968,6 +1981,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
8);
tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
}
+ tbl->it_ops = &pnv_ioda2_iommu_ops;
iommu_init_table(tbl, phb->hose->node);

if (pe->flags & PNV_IODA_PE_DEV) {
diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
index 4729ca7..f05057e 100644
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
@@ -83,10 +83,17 @@ static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb)
static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb) { }
#endif /* CONFIG_PCI_MSI */

+static struct iommu_table_ops pnv_p5ioc2_iommu_ops = {
+ .set = pnv_tce_build,
+ .clear = pnv_tce_free,
+ .get = pnv_tce_get,
+};
+
static void pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
struct pci_dev *pdev)
{
if (phb->p5ioc2.iommu_table.it_map == NULL) {
+ phb->p5ioc2.iommu_table.it_ops = &pnv_p5ioc2_iommu_ops;
iommu_init_table(&phb->p5ioc2.iommu_table, phb->hose->node);
iommu_register_group(&phb->p5ioc2.iommu_table,
pci_domain_nr(phb->hose->bus), phb->opal_id);
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index b7ea245..4c3bbb1 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -572,9 +572,9 @@ struct pci_ops pnv_pci_ops = {
.write = pnv_pci_write_config,
};

-static int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
- unsigned long uaddr, enum dma_data_direction direction,
- struct dma_attrs *attrs, bool rm)
+int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
+ unsigned long uaddr, enum dma_data_direction direction,
+ struct dma_attrs *attrs)
{
u64 proto_tce = iommu_direction_to_tce_perm(direction);
__be64 *tcep, *tces;
@@ -592,22 +592,12 @@ static int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
* of flags if that becomes the case
*/
if (tbl->it_type & TCE_PCI_SWINV_CREATE)
- pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1, rm);
+ pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1, false);

return 0;
}

-static int pnv_tce_build_vm(struct iommu_table *tbl, long index, long npages,
- unsigned long uaddr,
- enum dma_data_direction direction,
- struct dma_attrs *attrs)
-{
- return pnv_tce_build(tbl, index, npages, uaddr, direction, attrs,
- false);
-}
-
-static void pnv_tce_free(struct iommu_table *tbl, long index, long npages,
- bool rm)
+void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
{
__be64 *tcep, *tces;

@@ -617,32 +607,14 @@ static void pnv_tce_free(struct iommu_table *tbl, long index, long npages,
*(tcep++) = cpu_to_be64(0);

if (tbl->it_type & TCE_PCI_SWINV_FREE)
- pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1, rm);
+ pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1, false);
}

-static void pnv_tce_free_vm(struct iommu_table *tbl, long index, long npages)
-{
- pnv_tce_free(tbl, index, npages, false);
-}
-
-static unsigned long pnv_tce_get(struct iommu_table *tbl, long index)
+unsigned long pnv_tce_get(struct iommu_table *tbl, long index)
{
return ((u64 *)tbl->it_base)[index - tbl->it_offset];
}

-static int pnv_tce_build_rm(struct iommu_table *tbl, long index, long npages,
- unsigned long uaddr,
- enum dma_data_direction direction,
- struct dma_attrs *attrs)
-{
- return pnv_tce_build(tbl, index, npages, uaddr, direction, attrs, true);
-}
-
-static void pnv_tce_free_rm(struct iommu_table *tbl, long index, long npages)
-{
- pnv_tce_free(tbl, index, npages, true);
-}
-
void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
void *tce_mem, u64 tce_size,
u64 dma_offset, unsigned page_shift)
@@ -757,11 +729,6 @@ void __init pnv_pci_init(void)
pci_devs_phb_init();

/* Configure IOMMU DMA hooks */
- ppc_md.tce_build = pnv_tce_build_vm;
- ppc_md.tce_free = pnv_tce_free_vm;
- ppc_md.tce_build_rm = pnv_tce_build_rm;
- ppc_md.tce_free_rm = pnv_tce_free_rm;
- ppc_md.tce_get = pnv_tce_get;
set_pci_dma_ops(&dma_iommu_ops);

/* Configure MSIs */
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index c954c64..7eb6076 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -200,6 +200,11 @@ struct pnv_phb {
};

extern struct pci_ops pnv_pci_ops;
+extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
+ unsigned long uaddr, enum dma_data_direction direction,
+ struct dma_attrs *attrs);
+extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
+extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);

void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
unsigned char *log_buff);
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 61d5a17..e379acf 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -193,7 +193,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
int ret = 0;
unsigned long flags;

- if (npages == 1) {
+ if ((npages == 1) || !firmware_has_feature(FW_FEATURE_MULTITCE)) {
return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
direction, attrs);
}
@@ -285,6 +285,9 @@ static void tce_freemulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long n
{
u64 rc;

+ if (!firmware_has_feature(FW_FEATURE_MULTITCE))
+ return tce_free_pSeriesLP(tbl, tcenum, npages);
+
rc = plpar_tce_stuff((u64)tbl->it_index, (u64)tcenum << 12, 0, npages);

if (rc && printk_ratelimit()) {
@@ -460,7 +463,6 @@ static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
}

-
#ifdef CONFIG_PCI
static void iommu_table_setparms(struct pci_controller *phb,
struct device_node *dn,
@@ -546,6 +548,12 @@ static void iommu_table_setparms_lpar(struct pci_controller *phb,
tbl->it_size = size >> tbl->it_page_shift;
}

+struct iommu_table_ops iommu_table_pseries_ops = {
+ .set = tce_build_pSeries,
+ .clear = tce_free_pSeries,
+ .get = tce_get_pseries
+};
+
static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
{
struct device_node *dn;
@@ -614,6 +622,7 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
pci->phb->node);

iommu_table_setparms(pci->phb, dn, tbl);
+ tbl->it_ops = &iommu_table_pseries_ops;
pci->iommu_table = iommu_init_table(tbl, pci->phb->node);
iommu_register_group(tbl, pci_domain_nr(bus), 0);

@@ -625,6 +634,11 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
pr_debug("ISA/IDE, window size is 0x%llx\n", pci->phb->dma_window_size);
}

+struct iommu_table_ops iommu_table_lpar_multi_ops = {
+ .set = tce_buildmulti_pSeriesLP,
+ .clear = tce_freemulti_pSeriesLP,
+ .get = tce_get_pSeriesLP
+};

static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
{
@@ -659,6 +673,7 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
ppci->phb->node);
iommu_table_setparms_lpar(ppci->phb, pdn, tbl, dma_window);
+ tbl->it_ops = &iommu_table_lpar_multi_ops;
ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node);
iommu_register_group(tbl, pci_domain_nr(bus), 0);
pr_debug(" created table: %p\n", ppci->iommu_table);
@@ -686,6 +701,7 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
phb->node);
iommu_table_setparms(phb, dn, tbl);
+ tbl->it_ops = &iommu_table_pseries_ops;
PCI_DN(dn)->iommu_table = iommu_init_table(tbl, phb->node);
iommu_register_group(tbl, pci_domain_nr(phb->bus), 0);
set_iommu_table_base_and_group(&dev->dev,
@@ -1108,6 +1124,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
pci->phb->node);
iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window);
+ tbl->it_ops = &iommu_table_lpar_multi_ops;
pci->iommu_table = iommu_init_table(tbl, pci->phb->node);
iommu_register_group(tbl, pci_domain_nr(pci->phb->bus), 0);
pr_debug(" created table: %p\n", pci->iommu_table);
@@ -1300,22 +1317,11 @@ void iommu_init_early_pSeries(void)
return;

if (firmware_has_feature(FW_FEATURE_LPAR)) {
- if (firmware_has_feature(FW_FEATURE_MULTITCE)) {
- ppc_md.tce_build = tce_buildmulti_pSeriesLP;
- ppc_md.tce_free = tce_freemulti_pSeriesLP;
- } else {
- ppc_md.tce_build = tce_build_pSeriesLP;
- ppc_md.tce_free = tce_free_pSeriesLP;
- }
- ppc_md.tce_get = tce_get_pSeriesLP;
pseries_pci_controller_ops.dma_bus_setup = pci_dma_bus_setup_pSeriesLP;
pseries_pci_controller_ops.dma_dev_setup = pci_dma_dev_setup_pSeriesLP;
ppc_md.dma_set_mask = dma_set_mask_pSeriesLP;
ppc_md.dma_get_required_mask = dma_get_required_mask_pSeriesLP;
} else {
- ppc_md.tce_build = tce_build_pSeries;
- ppc_md.tce_free = tce_free_pSeries;
- ppc_md.tce_get = tce_get_pseries;
pseries_pci_controller_ops.dma_bus_setup = pci_dma_bus_setup_pSeries;
pseries_pci_controller_ops.dma_dev_setup = pci_dma_dev_setup_pSeries;
}
@@ -1333,8 +1339,6 @@ static int __init disable_multitce(char *str)
firmware_has_feature(FW_FEATURE_LPAR) &&
firmware_has_feature(FW_FEATURE_MULTITCE)) {
printk(KERN_INFO "Disabling MULTITCE firmware feature\n");
- ppc_md.tce_build = tce_build_pSeriesLP;
- ppc_md.tce_free = tce_free_pSeriesLP;
powerpc_firmware_features &= ~FW_FEATURE_MULTITCE;
}
return 1;
diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index d00a566..90bcdfe 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -286,6 +286,12 @@ static int __init dart_init(struct device_node *dart_node)
return 0;
}

+static struct iommu_table_ops iommu_dart_ops = {
+ .set = dart_build,
+ .clear = dart_free,
+ .flush = dart_flush,
+};
+
static void iommu_table_dart_setup(void)
{
iommu_table_dart.it_busno = 0;
@@ -298,6 +304,7 @@ static void iommu_table_dart_setup(void)
iommu_table_dart.it_base = (unsigned long)dart_vbase;
iommu_table_dart.it_index = 0;
iommu_table_dart.it_blocksize = 1;
+ iommu_table_dart.it_ops = &iommu_dart_ops;
iommu_init_table(&iommu_table_dart, -1);

/* Reserve the last page of the DART to avoid possible prefetch
@@ -386,11 +393,6 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
if (dart_init(dn) != 0)
goto bail;

- /* Setup low level TCE operations for the core IOMMU code */
- ppc_md.tce_build = dart_build;
- ppc_md.tce_free = dart_free;
- ppc_md.tce_flush = dart_flush;
-
/* Setup bypass if supported */
if (dart_is_u4)
ppc_md.dma_set_mask = dart_dma_set_mask;
--
2.0.0

2015-04-25 12:20:00

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 12/32] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group

Modern IBM POWERPC systems support multiple (currently two) TCE tables
per IOMMU group (a.k.a. PE). This adds an iommu_table_group container
for TCE tables. Right now just one table is supported.

For P5IOC2 and IODA, iommu_table_group is embedded into the PE struct
(pnv_ioda_pe and pnv_phb) and does not require iommu_free_table(), only
iommu_reset_table().

For pSeries, this replaces multiple calls to kzalloc_node() with a new
iommu_pseries_group_alloc() helper and stores the table group struct
pointer in the pci_dn struct. For release, an iommu_table_group_free()
helper is added.

This should cause no behavioural change.
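
As an illustration, the pSeries setup path condenses to the pattern
below (a sketch of the hunks that follow, with error handling and
debug output dropped): the group is allocated once per pci_dn and the
only table so far is tables[0] inside it.

	/* Sketch: pci_dn now carries a table group instead of a bare
	 * iommu_table pointer; the default DMA window is tables[0]. */
	pci->table_group = iommu_pseries_group_alloc(pci->phb->node);
	tbl = &pci->table_group->tables[0];

	iommu_table_setparms(pci->phb, dn, tbl);
	tbl->it_ops = &iommu_table_pseries_ops;
	iommu_init_table(tbl, pci->phb->node);
	iommu_register_group(pci->table_group, pci_domain_nr(bus), 0);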

Signed-off-by: Alexey Kardashevskiy <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
---
Changes:
v9:
* s/it_group/it_table_group/
* added and used iommu_table_group_free(), from now iommu_free_table()
is only used for VIO
* added iommu_pseries_group_alloc()
* squashed "powerpc/iommu: Introduce iommu_table_alloc() helper" into this
---
arch/powerpc/include/asm/iommu.h | 18 +++--
arch/powerpc/include/asm/pci-bridge.h | 2 +-
arch/powerpc/kernel/eeh.c | 2 +-
arch/powerpc/kernel/iommu.c | 24 +++---
arch/powerpc/platforms/powernv/pci-ioda.c | 46 ++++++-----
arch/powerpc/platforms/powernv/pci-p5ioc2.c | 19 +++--
arch/powerpc/platforms/powernv/pci.h | 4 +-
arch/powerpc/platforms/pseries/iommu.c | 104 +++++++++++++++++--------
drivers/vfio/vfio_iommu_spapr_tce.c | 114 ++++++++++++++++++++--------
9 files changed, 222 insertions(+), 111 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index f0cab49..fa37519 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -91,9 +91,7 @@ struct iommu_table {
struct iommu_pool pools[IOMMU_NR_POOLS];
unsigned long *it_map; /* A simple allocation bitmap for now */
unsigned long it_page_shift;/* table iommu page size */
-#ifdef CONFIG_IOMMU_API
- struct iommu_group *it_group;
-#endif
+ struct iommu_table_group *it_table_group;
struct iommu_table_ops *it_ops;
void (*set_bypass)(struct iommu_table *tbl, bool enable);
};
@@ -127,14 +125,24 @@ extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
*/
extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
int nid);
+
+#define IOMMU_TABLE_GROUP_MAX_TABLES 1
+
+struct iommu_table_group {
#ifdef CONFIG_IOMMU_API
-extern void iommu_register_group(struct iommu_table *tbl,
+ struct iommu_group *group;
+#endif
+ struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
+};
+
+#ifdef CONFIG_IOMMU_API
+extern void iommu_register_group(struct iommu_table_group *table_group,
int pci_domain_number, unsigned long pe_num);
extern int iommu_add_device(struct device *dev);
extern void iommu_del_device(struct device *dev);
extern int __init tce_iommu_bus_notifier_init(void);
#else
-static inline void iommu_register_group(struct iommu_table *tbl,
+static inline void iommu_register_group(struct iommu_table_group *table_group,
int pci_domain_number,
unsigned long pe_num)
{
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 1811c44..e2d7479 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -185,7 +185,7 @@ struct pci_dn {

struct pci_dn *parent;
struct pci_controller *phb; /* for pci devices */
- struct iommu_table *iommu_table; /* for phb's or bridges */
+ struct iommu_table_group *table_group; /* for phb's or bridges */
struct device_node *node; /* back-pointer to the device_node */

int pci_ext_config_space; /* for pci devices */
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index a4c62eb..6bab695 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1407,7 +1407,7 @@ static int dev_has_iommu_table(struct device *dev, void *data)
return 0;

tbl = get_iommu_table_base(dev);
- if (tbl && tbl->it_group) {
+ if (tbl && tbl->it_table_group) {
*ppdev = pdev;
return 1;
}
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index e289f91..005146b 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -749,12 +749,8 @@ void iommu_free_table(struct iommu_table *tbl, const char *node_name)

iommu_reset_table(tbl, node_name);

-#ifdef CONFIG_IOMMU_API
- if (tbl->it_group) {
- iommu_group_put(tbl->it_group);
- BUG_ON(tbl->it_group);
- }
-#endif
+ /* iommu_free_table() is only used by VIO so no groups expected here */
+ BUG_ON(tbl->it_table_group);

/* free table */
kfree(tbl);
@@ -904,11 +900,11 @@ EXPORT_SYMBOL_GPL(iommu_direction_to_tce_perm);
*/
static void group_release(void *iommu_data)
{
- struct iommu_table *tbl = iommu_data;
- tbl->it_group = NULL;
+ struct iommu_table_group *table_group = iommu_data;
+ table_group->group = NULL;
}

-void iommu_register_group(struct iommu_table *tbl,
+void iommu_register_group(struct iommu_table_group *table_group,
int pci_domain_number, unsigned long pe_num)
{
struct iommu_group *grp;
@@ -920,8 +916,8 @@ void iommu_register_group(struct iommu_table *tbl,
PTR_ERR(grp));
return;
}
- tbl->it_group = grp;
- iommu_group_set_iommudata(grp, tbl, group_release);
+ table_group->group = grp;
+ iommu_group_set_iommudata(grp, table_group, group_release);
name = kasprintf(GFP_KERNEL, "domain%d-pe%lx",
pci_domain_number, pe_num);
if (!name)
@@ -1109,7 +1105,7 @@ int iommu_add_device(struct device *dev)
}

tbl = get_iommu_table_base(dev);
- if (!tbl || !tbl->it_group) {
+ if (!tbl || !tbl->it_table_group || !tbl->it_table_group->group) {
pr_debug("%s: Skipping device %s with no tbl\n",
__func__, dev_name(dev));
return 0;
@@ -1117,7 +1113,7 @@ int iommu_add_device(struct device *dev)

pr_debug("%s: Adding %s to iommu group %d\n",
__func__, dev_name(dev),
- iommu_group_id(tbl->it_group));
+ iommu_group_id(tbl->it_table_group->group));

if (PAGE_SIZE < IOMMU_PAGE_SIZE(tbl)) {
pr_err("%s: Invalid IOMMU page size %lx (%lx) on %s\n",
@@ -1126,7 +1122,7 @@ int iommu_add_device(struct device *dev)
return -EINVAL;
}

- return iommu_group_add_device(tbl->it_group, dev);
+ return iommu_group_add_device(tbl->it_table_group->group, dev);
}
EXPORT_SYMBOL_GPL(iommu_add_device);

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7a9137a..88472cb 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -23,6 +23,7 @@
#include <linux/io.h>
#include <linux/msi.h>
#include <linux/memblock.h>
+#include <linux/iommu.h>

#include <asm/sections.h>
#include <asm/io.h>
@@ -1291,7 +1292,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
bus = dev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
- tbl = &pe->tce32_table;
+ tbl = &pe->table_group.tables[0];
addr = tbl->it_base;

opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
@@ -1589,7 +1590,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev

pe = &phb->ioda.pe_array[pdn->pe_number];
WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
- set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
+ set_iommu_table_base_and_group(&pdev->dev, &pe->table_group.tables[0]);
}

static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
@@ -1616,7 +1617,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
} else {
dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
set_dma_ops(&pdev->dev, &dma_iommu_ops);
- set_iommu_table_base(&pdev->dev, &pe->tce32_table);
+ set_iommu_table_base(&pdev->dev, &pe->table_group.tables[0]);
}
*pdev->dev.dma_mask = dma_mask;
return 0;
@@ -1653,9 +1654,10 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
list_for_each_entry(dev, &bus->devices, bus_list) {
if (add_to_iommu_group)
set_iommu_table_base_and_group(&dev->dev,
- &pe->tce32_table);
+ &pe->table_group.tables[0]);
else
- set_iommu_table_base(&dev->dev, &pe->tce32_table);
+ set_iommu_table_base(&dev->dev,
+ &pe->table_group.tables[0]);

if (dev->subordinate)
pnv_ioda_setup_bus_dma(pe, dev->subordinate,
@@ -1751,8 +1753,8 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
__be64 *startp, __be64 *endp, bool rm)
{
- struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
+ struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
+ struct pnv_ioda_pe, table_group);
struct pnv_phb *phb = pe->phb;

if (phb->type == PNV_PHB_IODA1)
@@ -1820,8 +1822,11 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
}
}

+ /* Setup iommu */
+ pe->table_group.tables[0].it_table_group = &pe->table_group;
+
/* Setup linux iommu table */
- tbl = &pe->tce32_table;
+ tbl = &pe->table_group.tables[0];
pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
base << 28, IOMMU_PAGE_SHIFT_4K);

@@ -1844,15 +1849,15 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
iommu_init_table(tbl, phb->hose->node);

if (pe->flags & PNV_IODA_PE_DEV) {
- iommu_register_group(tbl, phb->hose->global_number,
+ iommu_register_group(&pe->table_group, phb->hose->global_number,
pe->pe_number);
set_iommu_table_base_and_group(&pe->pdev->dev, tbl);
} else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)) {
- iommu_register_group(tbl, phb->hose->global_number,
+ iommu_register_group(&pe->table_group, phb->hose->global_number,
pe->pe_number);
pnv_ioda_setup_bus_dma(pe, pe->pbus, true);
} else if (pe->flags & PNV_IODA_PE_VF) {
- iommu_register_group(tbl, phb->hose->global_number,
+ iommu_register_group(&pe->table_group, phb->hose->global_number,
pe->pe_number);
}

@@ -1867,8 +1872,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,

static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
{
- struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
+ struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
+ struct pnv_ioda_pe, table_group);
uint16_t window_id = (pe->pe_number << 1 ) + 1;
int64_t rc;

@@ -1913,10 +1918,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
pe->tce_bypass_base = 1ull << 59;

/* Install set_bypass callback for VFIO */
- pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass;
+ pe->table_group.tables[0].set_bypass = pnv_pci_ioda2_set_bypass;

/* Enable bypass by default */
- pnv_pci_ioda2_set_bypass(&pe->tce32_table, true);
+ pnv_pci_ioda2_set_bypass(&pe->table_group.tables[0], true);
}

static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
@@ -1963,8 +1968,11 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
goto fail;
}

+ /* Setup iommu */
+ pe->table_group.tables[0].it_table_group = &pe->table_group;
+
/* Setup linux iommu table */
- tbl = &pe->tce32_table;
+ tbl = &pe->table_group.tables[0];
pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
IOMMU_PAGE_SHIFT_4K);

@@ -1985,15 +1993,15 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
iommu_init_table(tbl, phb->hose->node);

if (pe->flags & PNV_IODA_PE_DEV) {
- iommu_register_group(tbl, phb->hose->global_number,
+ iommu_register_group(&pe->table_group, phb->hose->global_number,
pe->pe_number);
set_iommu_table_base_and_group(&pe->pdev->dev, tbl);
} else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)) {
- iommu_register_group(tbl, phb->hose->global_number,
+ iommu_register_group(&pe->table_group, phb->hose->global_number,
pe->pe_number);
pnv_ioda_setup_bus_dma(pe, pe->pbus, true);
} else if (pe->flags & PNV_IODA_PE_VF) {
- iommu_register_group(tbl, phb->hose->global_number,
+ iommu_register_group(&pe->table_group, phb->hose->global_number,
pe->pe_number);
}

diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
index f05057e..a073af0 100644
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
@@ -92,14 +92,17 @@ static struct iommu_table_ops pnv_p5ioc2_iommu_ops = {
static void pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
struct pci_dev *pdev)
{
- if (phb->p5ioc2.iommu_table.it_map == NULL) {
- phb->p5ioc2.iommu_table.it_ops = &pnv_p5ioc2_iommu_ops;
- iommu_init_table(&phb->p5ioc2.iommu_table, phb->hose->node);
- iommu_register_group(&phb->p5ioc2.iommu_table,
+ if (phb->p5ioc2.table_group.tables[0].it_map == NULL) {
+ phb->p5ioc2.table_group.tables[0].it_ops =
+ &pnv_p5ioc2_iommu_ops;
+ iommu_init_table(&phb->p5ioc2.table_group.tables[0],
+ phb->hose->node);
+ iommu_register_group(&phb->p5ioc2.table_group,
pci_domain_nr(phb->hose->bus), phb->opal_id);
}

- set_iommu_table_base_and_group(&pdev->dev, &phb->p5ioc2.iommu_table);
+ set_iommu_table_base_and_group(&pdev->dev,
+ &phb->p5ioc2.table_group.tables[0]);
}

static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
@@ -174,9 +177,13 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
/* Setup MSI support */
pnv_pci_init_p5ioc2_msis(phb);

+ /* Setup iommu */
+ phb->p5ioc2.table_group.tables[0].it_table_group =
+ &phb->p5ioc2.table_group;
+
/* Setup TCEs */
phb->dma_dev_setup = pnv_pci_p5ioc2_dma_dev_setup;
- pnv_pci_setup_iommu_table(&phb->p5ioc2.iommu_table,
+ pnv_pci_setup_iommu_table(&phb->p5ioc2.table_group.tables[0],
tce_mem, tce_size, 0,
IOMMU_PAGE_SHIFT_4K);
}
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 7eb6076..368d4ed 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -57,7 +57,7 @@ struct pnv_ioda_pe {
/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
int tce32_seg;
int tce32_segcount;
- struct iommu_table tce32_table;
+ struct iommu_table_group table_group;
phys_addr_t tce_inval_reg_phys;

/* 64-bit TCE bypass region */
@@ -122,7 +122,7 @@ struct pnv_phb {

union {
struct {
- struct iommu_table iommu_table;
+ struct iommu_table_group table_group;
} p5ioc2;

struct {
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e379acf..8c29919 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -36,6 +36,7 @@
#include <linux/crash_dump.h>
#include <linux/memory.h>
#include <linux/of.h>
+#include <linux/iommu.h>
#include <asm/io.h>
#include <asm/prom.h>
#include <asm/rtas.h>
@@ -51,6 +52,42 @@

#include "pseries.h"

+static struct iommu_table_group *iommu_pseries_group_alloc(int node)
+{
+ struct iommu_table_group *table_group;
+
+ table_group = kzalloc_node(sizeof(struct iommu_table_group), GFP_KERNEL,
+ node);
+ if (!table_group) {
+ pr_debug("%s failed\n", __func__);
+ return NULL;
+ }
+ table_group->tables[0].it_table_group = table_group;
+
+ return table_group;
+}
+
+static void iommu_table_group_free(struct iommu_table_group *table_group)
+{
+ long i;
+
+ if (!table_group)
+ return;
+
+ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
+ iommu_reset_table(&table_group->tables[i], "noname");
+
+#ifdef CONFIG_IOMMU_API
+ if (table_group->group) {
+ iommu_group_put(table_group->group);
+ BUG_ON(table_group->group);
+ }
+#endif
+
+ /* free table group */
+ kfree(table_group);
+}
+
static void tce_invalidate_pSeries_sw(struct iommu_table *tbl,
__be64 *startp, __be64 *endp)
{
@@ -618,13 +655,13 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
pci->phb->dma_window_size = 0x8000000ul;
pci->phb->dma_window_base_cur = 0x8000000ul;

- tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
- pci->phb->node);
+ pci->table_group = iommu_pseries_group_alloc(pci->phb->node);
+ tbl = &pci->table_group->tables[0];

iommu_table_setparms(pci->phb, dn, tbl);
tbl->it_ops = &iommu_table_pseries_ops;
- pci->iommu_table = iommu_init_table(tbl, pci->phb->node);
- iommu_register_group(tbl, pci_domain_nr(bus), 0);
+ iommu_init_table(tbl, pci->phb->node);
+ iommu_register_group(pci->table_group, pci_domain_nr(bus), 0);

/* Divide the rest (1.75GB) among the children */
pci->phb->dma_window_size = 0x80000000ul;
@@ -667,16 +704,17 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
ppci = PCI_DN(pdn);

pr_debug(" parent is %s, iommu_table: 0x%p\n",
- pdn->full_name, ppci->iommu_table);
+ pdn->full_name, ppci->table_group);

- if (!ppci->iommu_table) {
- tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
- ppci->phb->node);
+ if (!ppci->table_group) {
+ ppci->table_group = iommu_pseries_group_alloc(ppci->phb->node);
+ tbl = &ppci->table_group->tables[0];
iommu_table_setparms_lpar(ppci->phb, pdn, tbl, dma_window);
tbl->it_ops = &iommu_table_lpar_multi_ops;
- ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node);
- iommu_register_group(tbl, pci_domain_nr(bus), 0);
- pr_debug(" created table: %p\n", ppci->iommu_table);
+ iommu_init_table(tbl, ppci->phb->node);
+ iommu_register_group(ppci->table_group,
+ pci_domain_nr(bus), 0);
+ pr_debug(" created table: %p\n", ppci->table_group);
}
}

@@ -698,14 +736,16 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
struct pci_controller *phb = PCI_DN(dn)->phb;

pr_debug(" --> first child, no bridge. Allocating iommu table.\n");
- tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
- phb->node);
+ PCI_DN(dn)->table_group = iommu_pseries_group_alloc(phb->node);
+ tbl = &PCI_DN(dn)->table_group->tables[0];
iommu_table_setparms(phb, dn, tbl);
tbl->it_ops = &iommu_table_pseries_ops;
- PCI_DN(dn)->iommu_table = iommu_init_table(tbl, phb->node);
- iommu_register_group(tbl, pci_domain_nr(phb->bus), 0);
+ iommu_init_table(tbl, phb->node);
+ iommu_register_group(PCI_DN(dn)->table_group,
+ pci_domain_nr(phb->bus), 0);
set_iommu_table_base_and_group(&dev->dev,
- PCI_DN(dn)->iommu_table);
+ &PCI_DN(dn)->
+ table_group->tables[0]);
return;
}

@@ -713,12 +753,13 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
* an already allocated iommu table is found and use that.
*/

- while (dn && PCI_DN(dn) && PCI_DN(dn)->iommu_table == NULL)
+ while (dn && PCI_DN(dn) && PCI_DN(dn)->table_group == NULL)
dn = dn->parent;

if (dn && PCI_DN(dn))
set_iommu_table_base_and_group(&dev->dev,
- PCI_DN(dn)->iommu_table);
+ &PCI_DN(dn)->
+ table_group->tables[0]);
else
printk(KERN_WARNING "iommu: Device %s has no iommu table\n",
pci_name(dev));
@@ -1104,7 +1145,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
dn = pci_device_to_OF_node(dev);
pr_debug(" node is %s\n", dn->full_name);

- for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->iommu_table;
+ for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->table_group;
pdn = pdn->parent) {
dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
if (dma_window)
@@ -1120,19 +1161,20 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
pr_debug(" parent is %s\n", pdn->full_name);

pci = PCI_DN(pdn);
- if (!pci->iommu_table) {
- tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
- pci->phb->node);
+ if (!pci->table_group) {
+ pci->table_group = iommu_pseries_group_alloc(pci->phb->node);
+ tbl = &pci->table_group->tables[0];
iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window);
tbl->it_ops = &iommu_table_lpar_multi_ops;
- pci->iommu_table = iommu_init_table(tbl, pci->phb->node);
- iommu_register_group(tbl, pci_domain_nr(pci->phb->bus), 0);
- pr_debug(" created table: %p\n", pci->iommu_table);
+ iommu_init_table(tbl, pci->phb->node);
+ iommu_register_group(pci->table_group,
+ pci_domain_nr(pci->phb->bus), 0);
+ pr_debug(" created table: %p\n", pci->table_group);
} else {
- pr_debug(" found DMA window, table: %p\n", pci->iommu_table);
+ pr_debug(" found DMA window, table: %p\n", pci->table_group);
}

- set_iommu_table_base_and_group(&dev->dev, pci->iommu_table);
+ set_iommu_table_base_and_group(&dev->dev, &pci->table_group->tables[0]);
}

static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
@@ -1162,7 +1204,7 @@ static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
* search upwards in the tree until we either hit a dma-window
* property, OR find a parent with a table already allocated.
*/
- for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->iommu_table;
+ for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->table_group;
pdn = pdn->parent) {
dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
if (dma_window)
@@ -1206,7 +1248,7 @@ static u64 dma_get_required_mask_pSeriesLP(struct device *dev)
dn = pci_device_to_OF_node(pdev);

/* search upwards for ibm,dma-window */
- for (; dn && PCI_DN(dn) && !PCI_DN(dn)->iommu_table;
+ for (; dn && PCI_DN(dn) && !PCI_DN(dn)->table_group;
dn = dn->parent)
if (of_get_property(dn, "ibm,dma-window", NULL))
break;
@@ -1286,8 +1328,8 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
* the device node.
*/
remove_ddw(np, false);
- if (pci && pci->iommu_table)
- iommu_free_table(pci->iommu_table, np->full_name);
+ if (pci && pci->table_group)
+ iommu_table_group_free(pci->table_group);

spin_lock(&direct_window_list_lock);
list_for_each_entry(window, &direct_window_list, list) {
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 0fbe03e..17e884a 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -88,7 +88,7 @@ static void decrement_locked_vm(long npages)
*/
struct tce_container {
struct mutex lock;
- struct iommu_table *tbl;
+ struct iommu_group *grp;
bool enabled;
unsigned long locked_pages;
};
@@ -103,13 +103,41 @@ static bool tce_page_is_contained(struct page *page, unsigned page_shift)
return (PAGE_SHIFT + compound_order(compound_head(page))) >= page_shift;
}

+static struct iommu_table *spapr_tce_find_table(
+ struct tce_container *container,
+ phys_addr_t ioba)
+{
+ long i;
+ struct iommu_table *ret = NULL;
+ struct iommu_table_group *table_group;
+
+ table_group = iommu_group_get_iommudata(container->grp);
+ if (!table_group)
+ return NULL;
+
+ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
+ struct iommu_table *tbl = &table_group->tables[i];
+ unsigned long entry = ioba >> tbl->it_page_shift;
+ unsigned long start = tbl->it_offset;
+ unsigned long end = start + tbl->it_size;
+
+ if ((start <= entry) && (entry < end)) {
+ ret = tbl;
+ break;
+ }
+ }
+
+ return ret;
+}
+
static int tce_iommu_enable(struct tce_container *container)
{
int ret = 0;
unsigned long locked;
- struct iommu_table *tbl = container->tbl;
+ struct iommu_table *tbl;
+ struct iommu_table_group *table_group;

- if (!container->tbl)
+ if (!container->grp)
return -ENXIO;

if (!current->mm)
@@ -143,6 +171,11 @@ static int tce_iommu_enable(struct tce_container *container)
* as this information is only available from KVM and VFIO is
* KVM agnostic.
*/
+ table_group = iommu_group_get_iommudata(container->grp);
+ if (!table_group)
+ return -ENODEV;
+
+ tbl = &table_group->tables[0];
locked = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
ret = try_increment_locked_vm(locked);
if (ret)
@@ -190,10 +223,10 @@ static void tce_iommu_release(void *iommu_data)
{
struct tce_container *container = iommu_data;

- WARN_ON(container->tbl && !container->tbl->it_group);
+ WARN_ON(container->grp);

- if (container->tbl && container->tbl->it_group)
- tce_iommu_detach_group(iommu_data, container->tbl->it_group);
+ if (container->grp)
+ tce_iommu_detach_group(iommu_data, container->grp);

tce_iommu_disable(container);
mutex_destroy(&container->lock);
@@ -311,9 +344,16 @@ static long tce_iommu_ioctl(void *iommu_data,

case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
struct vfio_iommu_spapr_tce_info info;
- struct iommu_table *tbl = container->tbl;
+ struct iommu_table *tbl;
+ struct iommu_table_group *table_group;

- if (WARN_ON(!tbl))
+ if (WARN_ON(!container->grp))
+ return -ENXIO;
+
+ table_group = iommu_group_get_iommudata(container->grp);
+
+ tbl = &table_group->tables[0];
+ if (WARN_ON_ONCE(!tbl))
return -ENXIO;

minsz = offsetofend(struct vfio_iommu_spapr_tce_info,
@@ -336,17 +376,12 @@ static long tce_iommu_ioctl(void *iommu_data,
}
case VFIO_IOMMU_MAP_DMA: {
struct vfio_iommu_type1_dma_map param;
- struct iommu_table *tbl = container->tbl;
+ struct iommu_table *tbl;
unsigned long tce;

if (!container->enabled)
return -EPERM;

- if (!tbl)
- return -ENXIO;
-
- BUG_ON(!tbl->it_group);
-
minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);

if (copy_from_user(&param, (void __user *)arg, minsz))
@@ -359,6 +394,10 @@ static long tce_iommu_ioctl(void *iommu_data,
VFIO_DMA_MAP_FLAG_WRITE))
return -EINVAL;

+ tbl = spapr_tce_find_table(container, param.iova);
+ if (!tbl)
+ return -ENXIO;
+
if ((param.size & ~IOMMU_PAGE_MASK(tbl)) ||
(param.vaddr & ~IOMMU_PAGE_MASK(tbl)))
return -EINVAL;
@@ -384,14 +423,11 @@ static long tce_iommu_ioctl(void *iommu_data,
}
case VFIO_IOMMU_UNMAP_DMA: {
struct vfio_iommu_type1_dma_unmap param;
- struct iommu_table *tbl = container->tbl;
+ struct iommu_table *tbl;

if (!container->enabled)
return -EPERM;

- if (WARN_ON(!tbl))
- return -ENXIO;
-
minsz = offsetofend(struct vfio_iommu_type1_dma_unmap,
size);

@@ -405,6 +441,10 @@ static long tce_iommu_ioctl(void *iommu_data,
if (param.flags)
return -EINVAL;

+ tbl = spapr_tce_find_table(container, param.iova);
+ if (!tbl)
+ return -ENXIO;
+
if (param.size & ~IOMMU_PAGE_MASK(tbl))
return -EINVAL;

@@ -433,10 +473,10 @@ static long tce_iommu_ioctl(void *iommu_data,
mutex_unlock(&container->lock);
return 0;
case VFIO_EEH_PE_OP:
- if (!container->tbl || !container->tbl->it_group)
+ if (!container->grp)
return -ENODEV;

- return vfio_spapr_iommu_eeh_ioctl(container->tbl->it_group,
+ return vfio_spapr_iommu_eeh_ioctl(container->grp,
cmd, arg);
}

@@ -448,16 +488,15 @@ static int tce_iommu_attach_group(void *iommu_data,
{
int ret;
struct tce_container *container = iommu_data;
- struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
+ struct iommu_table_group *table_group;

- BUG_ON(!tbl);
mutex_lock(&container->lock);

/* pr_debug("tce_vfio: Attaching group #%u to iommu %p\n",
iommu_group_id(iommu_group), iommu_group); */
- if (container->tbl) {
+ if (container->grp) {
pr_warn("tce_vfio: Only one group per IOMMU container is allowed, existing id=%d, attaching id=%d\n",
- iommu_group_id(container->tbl->it_group),
+ iommu_group_id(container->grp),
iommu_group_id(iommu_group));
ret = -EBUSY;
goto unlock_exit;
@@ -470,9 +509,15 @@ static int tce_iommu_attach_group(void *iommu_data,
goto unlock_exit;
}

- ret = iommu_take_ownership(tbl);
+ table_group = iommu_group_get_iommudata(iommu_group);
+ if (!table_group) {
+ ret = -ENXIO;
+ goto unlock_exit;
+ }
+
+ ret = iommu_take_ownership(&table_group->tables[0]);
if (!ret)
- container->tbl = tbl;
+ container->grp = iommu_group;

unlock_exit:
mutex_unlock(&container->lock);
@@ -484,26 +529,31 @@ static void tce_iommu_detach_group(void *iommu_data,
struct iommu_group *iommu_group)
{
struct tce_container *container = iommu_data;
- struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
+ struct iommu_table_group *table_group;
+ struct iommu_table *tbl;

- BUG_ON(!tbl);
mutex_lock(&container->lock);
- if (tbl != container->tbl) {
+ if (iommu_group != container->grp) {
pr_warn("tce_vfio: detaching group #%u, expected group is #%u\n",
iommu_group_id(iommu_group),
- iommu_group_id(tbl->it_group));
+ iommu_group_id(container->grp));
goto unlock_exit;
}

if (container->enabled) {
pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
- iommu_group_id(tbl->it_group));
+ iommu_group_id(container->grp));
tce_iommu_disable(container);
}

/* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
iommu_group_id(iommu_group), iommu_group); */
- container->tbl = NULL;
+ container->grp = NULL;
+
+ table_group = iommu_group_get_iommudata(iommu_group);
+ BUG_ON(!table_group);
+
+ tbl = &table_group->tables[0];
tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
iommu_release_ownership(tbl);

--
2.0.0

2015-04-25 12:19:55

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 13/32] vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control

This adds tce_iommu_take_ownership() and tce_iommu_release_ownership(),
which call iommu_take_ownership()/iommu_release_ownership() in a loop
for every table in the group. As there is just one table now, no change
in behaviour is expected.

At the moment the iommu_table struct has a set_bypass() callback which
enables/disables DMA bypass on an IODA2 PHB. This is exposed to the
POWERPC IOMMU code, which calls the callback when external IOMMU users
such as VFIO are about to take control over a PHB.

The set_bypass() callback is not really an iommu_table function but an
IOMMU/PE function. This introduces an iommu_table_group_ops struct and
adds take_ownership()/release_ownership() callbacks to it, which are
called when an external user takes/releases control over the IOMMU.

This replaces set_bypass() with ownership callbacks as the operation is
not necessarily just enabling bypass; it can be something else or more,
so the callbacks get a more generic name.

The callbacks are implemented for IODA2 only. Other platforms (P5IOC2,
IODA1) will use the old iommu_take_ownership/iommu_release_ownership API.
The following patches will replace iommu_take_ownership/
iommu_release_ownership calls in IODA2 with full IOMMU table release/
create.
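
The resulting attach path in vfio_iommu_spapr_tce.c is a two-way
dispatch; a condensed sketch of the hunk below (locking and the error
path are trimmed):

	table_group = iommu_group_get_iommudata(iommu_group);

	if (!table_group->ops || !table_group->ops->take_ownership ||
			!table_group->ops->release_ownership) {
		/* P5IOC2/IODA1: grab each table's allocation bitmap */
		ret = tce_iommu_take_ownership(table_group);
	} else {
		/* IODA2: the PE-level callback also disables bypass */
		table_group->ops->take_ownership(table_group);
		ret = 0;
	}

	if (!ret)
		container->grp = iommu_group;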

Signed-off-by: Alexey Kardashevskiy <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
---
Changes:
v9:
* squashed "vfio: powerpc/spapr: powerpc/iommu: Rework IOMMU ownership control"
and "vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework IOMMU ownership control"
into a single patch
* moved helpers with a loop through tables in a group
to vfio_iommu_spapr_tce.c to keep the platform code free of IOMMU table
groups as much as possible
* added missing tce_iommu_clear() to tce_iommu_release_ownership()
* replaced the set_ownership(enable) callback with take_ownership() and
release_ownership()
---
arch/powerpc/include/asm/iommu.h | 13 +++++-
arch/powerpc/kernel/iommu.c | 11 ------
arch/powerpc/platforms/powernv/pci-ioda.c | 40 +++++++++++++++----
drivers/vfio/vfio_iommu_spapr_tce.c | 66 +++++++++++++++++++++++++++----
4 files changed, 103 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index fa37519..e63419e 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -93,7 +93,6 @@ struct iommu_table {
unsigned long it_page_shift;/* table iommu page size */
struct iommu_table_group *it_table_group;
struct iommu_table_ops *it_ops;
- void (*set_bypass)(struct iommu_table *tbl, bool enable);
};

/* Pure 2^n version of get_order */
@@ -128,11 +127,23 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,

#define IOMMU_TABLE_GROUP_MAX_TABLES 1

+struct iommu_table_group;
+
+struct iommu_table_group_ops {
+ /*
+ * Switches ownership from the kernel itself to an external
+ * user. While onwership is taken, the kernel cannot use IOMMU itself.
+ */
+ void (*take_ownership)(struct iommu_table_group *table_group);
+ void (*release_ownership)(struct iommu_table_group *table_group);
+};
+
struct iommu_table_group {
#ifdef CONFIG_IOMMU_API
struct iommu_group *group;
#endif
struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
+ struct iommu_table_group_ops *ops;
};

#ifdef CONFIG_IOMMU_API
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 005146b..2856d27 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1057,13 +1057,6 @@ int iommu_take_ownership(struct iommu_table *tbl)

memset(tbl->it_map, 0xff, sz);

- /*
- * Disable iommu bypass, otherwise the user can DMA to all of
- * our physical memory via the bypass window instead of just
- * the pages that has been explicitly mapped into the iommu
- */
- if (tbl->set_bypass)
- tbl->set_bypass(tbl, false);

return 0;
}
@@ -1078,10 +1071,6 @@ void iommu_release_ownership(struct iommu_table *tbl)
/* Restore bit#0 set by iommu_init_table() */
if (tbl->it_offset == 0)
set_bit(0, tbl->it_map);
-
- /* The kernel owns the device now, we can restore the iommu bypass */
- if (tbl->set_bypass)
- tbl->set_bypass(tbl, true);
}
EXPORT_SYMBOL_GPL(iommu_release_ownership);

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 88472cb..718d5cc 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1870,10 +1870,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
}

-static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
+static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
{
- struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
- struct pnv_ioda_pe, table_group);
uint16_t window_id = (pe->pe_number << 1 ) + 1;
int64_t rc;

@@ -1901,7 +1899,8 @@ static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
* host side.
*/
if (pe->pdev)
- set_iommu_table_base(&pe->pdev->dev, tbl);
+ set_iommu_table_base(&pe->pdev->dev,
+ &pe->table_group.tables[0]);
else
pnv_ioda_setup_bus_dma(pe, pe->pbus, false);
}
@@ -1917,13 +1916,35 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
/* TVE #1 is selected by PCI address bit 59 */
pe->tce_bypass_base = 1ull << 59;

- /* Install set_bypass callback for VFIO */
- pe->table_group.tables[0].set_bypass = pnv_pci_ioda2_set_bypass;
-
/* Enable bypass by default */
- pnv_pci_ioda2_set_bypass(&pe->table_group.tables[0], true);
+ pnv_pci_ioda2_set_bypass(pe, true);
}

+#ifdef CONFIG_IOMMU_API
+static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
+{
+ struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
+ table_group);
+
+ iommu_take_ownership(&table_group->tables[0]);
+ pnv_pci_ioda2_set_bypass(pe, false);
+}
+
+static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
+{
+ struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
+ table_group);
+
+ iommu_release_ownership(&table_group->tables[0]);
+ pnv_pci_ioda2_set_bypass(pe, true);
+}
+
+static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
+ .take_ownership = pnv_ioda2_take_ownership,
+ .release_ownership = pnv_ioda2_release_ownership,
+};
+#endif
+
static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
struct pnv_ioda_pe *pe)
{
@@ -1991,6 +2012,9 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
}
tbl->it_ops = &pnv_ioda2_iommu_ops;
iommu_init_table(tbl, phb->hose->node);
+#ifdef CONFIG_IOMMU_API
+ pe->table_group.ops = &pnv_pci_ioda2_ops;
+#endif

if (pe->flags & PNV_IODA_PE_DEV) {
iommu_register_group(&pe->table_group, phb->hose->global_number,
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 17e884a..dacc738 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -483,6 +483,43 @@ static long tce_iommu_ioctl(void *iommu_data,
return -ENOTTY;
}

+static void tce_iommu_release_ownership(struct tce_container *container,
+ struct iommu_table_group *table_group)
+{
+ int i;
+
+ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
+ struct iommu_table *tbl = &table_group->tables[i];
+
+ tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
+ if (tbl->it_map)
+ iommu_release_ownership(tbl);
+ }
+}
+
+static int tce_iommu_take_ownership(struct iommu_table_group *table_group)
+{
+ int i, j, rc = 0;
+
+ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
+ struct iommu_table *tbl = &table_group->tables[i];
+
+ if (!tbl->it_map)
+ continue;
+
+ rc = iommu_take_ownership(tbl);
+ if (rc) {
+ for (j = 0; j < i; ++j)
+ iommu_release_ownership(
+ &table_group->tables[j]);
+
+ return rc;
+ }
+ }
+
+ return 0;
+}
+
static int tce_iommu_attach_group(void *iommu_data,
struct iommu_group *iommu_group)
{
@@ -515,9 +552,23 @@ static int tce_iommu_attach_group(void *iommu_data,
goto unlock_exit;
}

- ret = iommu_take_ownership(&table_group->tables[0]);
- if (!ret)
- container->grp = iommu_group;
+ if (!table_group->ops || !table_group->ops->take_ownership ||
+ !table_group->ops->release_ownership) {
+ ret = tce_iommu_take_ownership(table_group);
+ } else {
+ /*
+ * Disable iommu bypass, otherwise the user can DMA to all of
+ * our physical memory via the bypass window instead of just
+ * the pages that has been explicitly mapped into the iommu
+ */
+ table_group->ops->take_ownership(table_group);
+ ret = 0;
+ }
+
+ if (ret)
+ goto unlock_exit;
+
+ container->grp = iommu_group;

unlock_exit:
mutex_unlock(&container->lock);
@@ -530,7 +581,6 @@ static void tce_iommu_detach_group(void *iommu_data,
{
struct tce_container *container = iommu_data;
struct iommu_table_group *table_group;
- struct iommu_table *tbl;

mutex_lock(&container->lock);
if (iommu_group != container->grp) {
@@ -553,9 +603,11 @@ static void tce_iommu_detach_group(void *iommu_data,
table_group = iommu_group_get_iommudata(iommu_group);
BUG_ON(!table_group);

- tbl = &table_group->tables[0];
- tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
- iommu_release_ownership(tbl);
+ /* Kernel owns the device now, we can restore bypass */
+ if (!table_group->ops || !table_group->ops->release_ownership)
+ tce_iommu_release_ownership(container, table_group);
+ else
+ table_group->ops->release_ownership(table_group);

unlock_exit:
mutex_unlock(&container->lock);
--
2.0.0

2015-04-25 12:19:50

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 14/32] powerpc/iommu: Fix IOMMU ownership control functions

This adds missing locks in iommu_take_ownership()/
iommu_release_ownership().

This marks all pages busy in iommu_table::it_map in order to catch
errors if there is an attempt to use this table while ownership over it
is taken.

This only clears the TCE content if there is no page marked busy in
it_map. Clearing must be done outside of the table locks as
iommu_clear_tce(), called from iommu_clear_tces_and_put_pages(), takes
those locks itself.

In order to use bitmap_empty(), the existing code clears bit#0, which
is set even in an empty table if the table is bus-mapped at 0:
iommu_init_table() reserves page#0 to prevent buggy drivers from
crashing when an allocated page is bus-mapped at zero (which is
correct). This restores the bit in the case of failure so that it_map
is brought back to the state it was in when iommu_take_ownership()
was called.
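
Put differently, the busy check performed under the pool locks boils
down to the following (a simplified restatement of the hunk below):

	/* Page 0 is reserved by iommu_init_table() for zero-based tables,
	 * so drop its bit before the emptiness check and put it back if
	 * the table turns out to be in use. */
	if (tbl->it_offset == 0)
		clear_bit(0, tbl->it_map);

	if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
		ret = -EBUSY;
		if (tbl->it_offset == 0)
			set_bit(0, tbl->it_map);
	} else {
		memset(tbl->it_map, 0xff, sz);	/* mark all entries busy */
	}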

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
v9:
* iommu_table_take_ownership() did not return @ret (and ignored EBUSY),
now it does return correct error.
* updated commit log about setting bit#0 in the case of failure

v5:
* do not store bit#0 value, it has to be set for zero-based table
anyway
* removed test_and_clear_bit
---
arch/powerpc/kernel/iommu.c | 31 +++++++++++++++++++++++++------
1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 2856d27..ea2c8ba 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1045,32 +1045,51 @@ EXPORT_SYMBOL_GPL(iommu_tce_build);

int iommu_take_ownership(struct iommu_table *tbl)
{
- unsigned long sz = (tbl->it_size + 7) >> 3;
+ unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
+ int ret = 0;
+
+ spin_lock_irqsave(&tbl->large_pool.lock, flags);
+ for (i = 0; i < tbl->nr_pools; i++)
+ spin_lock(&tbl->pools[i].lock);

if (tbl->it_offset == 0)
clear_bit(0, tbl->it_map);

if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
pr_err("iommu_tce: it_map is not empty");
- return -EBUSY;
+ ret = -EBUSY;
+ /* Restore bit#0 set by iommu_init_table() */
+ if (tbl->it_offset == 0)
+ set_bit(0, tbl->it_map);
+ } else {
+ memset(tbl->it_map, 0xff, sz);
}

- memset(tbl->it_map, 0xff, sz);
+ for (i = 0; i < tbl->nr_pools; i++)
+ spin_unlock(&tbl->pools[i].lock);
+ spin_unlock_irqrestore(&tbl->large_pool.lock, flags);

-
- return 0;
+ return ret;
}
EXPORT_SYMBOL_GPL(iommu_take_ownership);

void iommu_release_ownership(struct iommu_table *tbl)
{
- unsigned long sz = (tbl->it_size + 7) >> 3;
+ unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
+
+ spin_lock_irqsave(&tbl->large_pool.lock, flags);
+ for (i = 0; i < tbl->nr_pools; i++)
+ spin_lock(&tbl->pools[i].lock);

memset(tbl->it_map, 0, sz);

/* Restore bit#0 set by iommu_init_table() */
if (tbl->it_offset == 0)
set_bit(0, tbl->it_map);
+
+ for (i = 0; i < tbl->nr_pools; i++)
+ spin_unlock(&tbl->pools[i].lock);
+ spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
}
EXPORT_SYMBOL_GPL(iommu_release_ownership);

--
2.0.0

2015-04-25 12:20:37

by Alexey Kardashevskiy

Subject: [PATCH kernel v9 15/32] powerpc/powernv/ioda/ioda2: Rework TCE invalidation in tce_build()/tce_free()

The pnv_pci_ioda_tce_invalidate() helper invalidates the TCE cache. It
is supposed to be called on IODA1/2 and not on p5ioc2. It receives the
start and end host addresses of the TCE table.

IODA2 actually needs PCI addresses to invalidate the cache. Those can
be calculated from host addresses but, since we are going to implement
multi-level TCE tables, calculating a PCI address from a host address
might get either tricky or ugly as the TCE table remains flat on the
PCI bus but not in RAM.

This moves pnv_pci_ioda_tce_invalidate() out of the generic
pnv_tce_build()/pnv_tce_free() and defines IODA1/2-specific callbacks
which call the generic ones and do PHB-model-specific TCE cache
invalidation. P5IOC2 keeps using the generic callbacks as before.

This changes pnv_pci_ioda2_tce_invalidate() to receive a TCE index and
a number of pages, which are PCI addresses shifted by the IOMMU page
shift.

No change in behaviour is expected.
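
For reference, the range calculation in pnv_pci_ioda2_tce_invalidate()
changes roughly as follows (a sketch extracted from the hunks below;
the invalidation register writes themselves are unchanged):

	/* before: recover the entry number from host pointers */
	inc = tbl->it_offset + (((u64)startp - tbl->it_base) / sizeof(u64));
	start |= (inc << shift);

	/* after: @index is already a bus-relative TCE index */
	start |= (index << shift);
	end |= ((index + npages - 1) << shift);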

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
v9:
* removed confusing comment from commit log about unintentional calling of
pnv_pci_ioda_tce_invalidate()
* moved mechanical changes away to "powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table"
* fixed bug with broken invalidation in pnv_pci_ioda2_tce_invalidate -
@index includes @tbl->it_offset but old code added it anyway which later broke
DDW
---
arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++----------
arch/powerpc/platforms/powernv/pci.c | 17 ++----
2 files changed, 64 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 718d5cc..f070c44 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1665,18 +1665,20 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
}
}

-static void pnv_pci_ioda1_tce_invalidate(struct pnv_ioda_pe *pe,
- struct iommu_table *tbl,
- __be64 *startp, __be64 *endp, bool rm)
+static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
+ unsigned long index, unsigned long npages, bool rm)
{
+ struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
+ struct pnv_ioda_pe, table_group);
__be64 __iomem *invalidate = rm ?
(__be64 __iomem *)pe->tce_inval_reg_phys :
(__be64 __iomem *)tbl->it_index;
unsigned long start, end, inc;
const unsigned shift = tbl->it_page_shift;

- start = __pa(startp);
- end = __pa(endp);
+ start = __pa((__be64 *)tbl->it_base + index - tbl->it_offset);
+ end = __pa((__be64 *)tbl->it_base + index - tbl->it_offset +
+ npages - 1);

/* BML uses this case for p6/p7/galaxy2: Shift addr and put in node */
if (tbl->it_busno) {
@@ -1712,16 +1714,40 @@ static void pnv_pci_ioda1_tce_invalidate(struct pnv_ioda_pe *pe,
*/
}

+static int pnv_ioda1_tce_build(struct iommu_table *tbl, long index,
+ long npages, unsigned long uaddr,
+ enum dma_data_direction direction,
+ struct dma_attrs *attrs)
+{
+ long ret = pnv_tce_build(tbl, index, npages, uaddr, direction,
+ attrs);
+
+ if (!ret && (tbl->it_type & TCE_PCI_SWINV_CREATE))
+ pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
+
+ return ret;
+}
+
+static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
+ long npages)
+{
+ pnv_tce_free(tbl, index, npages);
+
+ if (tbl->it_type & TCE_PCI_SWINV_FREE)
+ pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
+}
+
static struct iommu_table_ops pnv_ioda1_iommu_ops = {
- .set = pnv_tce_build,
- .clear = pnv_tce_free,
+ .set = pnv_ioda1_tce_build,
+ .clear = pnv_ioda1_tce_free,
.get = pnv_tce_get,
};

-static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
- struct iommu_table *tbl,
- __be64 *startp, __be64 *endp, bool rm)
+static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
+ unsigned long index, unsigned long npages, bool rm)
{
+ struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
+ struct pnv_ioda_pe, table_group);
unsigned long start, end, inc;
__be64 __iomem *invalidate = rm ?
(__be64 __iomem *)pe->tce_inval_reg_phys :
@@ -1734,10 +1760,8 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
end = start;

/* Figure out the start, end and step */
- inc = tbl->it_offset + (((u64)startp - tbl->it_base) / sizeof(u64));
- start |= (inc << shift);
- inc = tbl->it_offset + (((u64)endp - tbl->it_base) / sizeof(u64));
- end |= (inc << shift);
+ start |= (index << shift);
+ end |= ((index + npages - 1) << shift);
inc = (0x1ull << shift);
mb();

@@ -1750,22 +1774,32 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
}
}

-void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
- __be64 *startp, __be64 *endp, bool rm)
+static int pnv_ioda2_tce_build(struct iommu_table *tbl, long index,
+ long npages, unsigned long uaddr,
+ enum dma_data_direction direction,
+ struct dma_attrs *attrs)
{
- struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
- struct pnv_ioda_pe, table_group);
- struct pnv_phb *phb = pe->phb;
-
- if (phb->type == PNV_PHB_IODA1)
- pnv_pci_ioda1_tce_invalidate(pe, tbl, startp, endp, rm);
- else
- pnv_pci_ioda2_tce_invalidate(pe, tbl, startp, endp, rm);
+ long ret = pnv_tce_build(tbl, index, npages, uaddr, direction,
+ attrs);
+
+ if (!ret && (tbl->it_type & TCE_PCI_SWINV_CREATE))
+ pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
+
+ return ret;
+}
+
+static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
+ long npages)
+{
+ pnv_tce_free(tbl, index, npages);
+
+ if (tbl->it_type & TCE_PCI_SWINV_FREE)
+ pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
}

static struct iommu_table_ops pnv_ioda2_iommu_ops = {
- .set = pnv_tce_build,
- .clear = pnv_tce_free,
+ .set = pnv_ioda2_tce_build,
+ .clear = pnv_ioda2_tce_free,
.get = pnv_tce_get,
};

diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 4c3bbb1..84b4ea4 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -577,37 +577,28 @@ int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
struct dma_attrs *attrs)
{
u64 proto_tce = iommu_direction_to_tce_perm(direction);
- __be64 *tcep, *tces;
+ __be64 *tcep;
u64 rpn;

- tces = tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
+ tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
rpn = __pa(uaddr) >> tbl->it_page_shift;

while (npages--)
*(tcep++) = cpu_to_be64(proto_tce |
(rpn++ << tbl->it_page_shift));

- /* Some implementations won't cache invalid TCEs and thus may not
- * need that flush. We'll probably turn it_type into a bit mask
- * of flags if that becomes the case
- */
- if (tbl->it_type & TCE_PCI_SWINV_CREATE)
- pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1, false);

return 0;
}

void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
{
- __be64 *tcep, *tces;
+ __be64 *tcep;

- tces = tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
+ tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;

while (npages--)
*(tcep++) = cpu_to_be64(0);
-
- if (tbl->it_type & TCE_PCI_SWINV_FREE)
- pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1, false);
}

unsigned long pnv_tce_get(struct iommu_table *tbl, long index)
--
2.0.0

2015-04-25 12:16:54

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 16/32] powerpc/powernv/ioda: Move TCE kill register address to PE

At the moment the DMA setup code looks for the "ibm,opal-tce-kill" property
which contains the TCE kill register address. Writing to this register
invalidates the TCE cache on an IODA/IODA2 hub.

This moves the register address from iommu_table to pnv_ioda_pe as
later there will be 2 tables per PE and it will be used for both tables.

This moves the property reading/remapping code to a helper to reduce
code duplication.

This adds a new pnv_pci_ioda2_tvt_invalidate() helper which invalidates
the entire table. It should be called after every call to
opal_pci_map_pe_dma_window(). It was not required before because
there is just a single TCE table and 64bit DMA is handled via the bypass
window (which has no table so no cache is used) but this is going
to change with Dynamic DMA windows (DDW).
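
A rough sketch of the intended setup order (using names from this patch;
the surrounding code is heavily simplified and only illustrative):

/* Sketch: per-PE DMA setup order after this change (illustrative only). */
static void example_setup_dma(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
{
	/* remap "ibm,opal-tce-kill" once per PE; fills pe->tce_inval_reg */
	pnv_pci_ioda_setup_opal_tce_kill(phb, pe);

	/* ... allocate the TCE table and call opal_pci_map_pe_dma_window() ... */

	/* flush any stale cached TCEs for this PE after (re)programming */
	pnv_pci_ioda2_tvt_invalidate(pe);
}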

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
v9:
* new in the series
---
arch/powerpc/platforms/powernv/pci-ioda.c | 69 +++++++++++++++++++------------
arch/powerpc/platforms/powernv/pci.h | 1 +
2 files changed, 44 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f070c44..b22b3ca 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1672,7 +1672,7 @@ static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
struct pnv_ioda_pe, table_group);
__be64 __iomem *invalidate = rm ?
(__be64 __iomem *)pe->tce_inval_reg_phys :
- (__be64 __iomem *)tbl->it_index;
+ pe->tce_inval_reg;
unsigned long start, end, inc;
const unsigned shift = tbl->it_page_shift;

@@ -1743,6 +1743,18 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
.get = pnv_tce_get,
};

+static inline void pnv_pci_ioda2_tvt_invalidate(struct pnv_ioda_pe *pe)
+{
+ /* 01xb - invalidate TCEs that match the specified PE# */
+ unsigned long addr = (0x4ull << 60) | (pe->pe_number & 0xFF);
+
+ if (!pe->tce_inval_reg)
+ return;
+
+ mb(); /* Ensure above stores are visible */
+ __raw_writeq(cpu_to_be64(addr), pe->tce_inval_reg);
+}
+
static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
unsigned long index, unsigned long npages, bool rm)
{
@@ -1751,7 +1763,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
unsigned long start, end, inc;
__be64 __iomem *invalidate = rm ?
(__be64 __iomem *)pe->tce_inval_reg_phys :
- (__be64 __iomem *)tbl->it_index;
+ pe->tce_inval_reg;
const unsigned shift = tbl->it_page_shift;

/* We'll invalidate DMA address in PE scope */
@@ -1803,13 +1815,31 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
.get = pnv_tce_get,
};

+static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
+ struct pnv_ioda_pe *pe)
+{
+ const __be64 *swinvp;
+
+ /* OPAL variant of PHB3 invalidated TCEs */
+ swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
+ if (!swinvp)
+ return;
+
+ /* We need a couple more fields -- an address and a data
+ * to or. Since the bus is only printed out on table free
+ * errors, and on the first pass the data will be a relative
+ * bus number, print that out instead.
+ */
+ pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
+ pe->tce_inval_reg = ioremap(pe->tce_inval_reg_phys, 8);
+}
+
static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
struct pnv_ioda_pe *pe, unsigned int base,
unsigned int segs)
{

struct page *tce_mem = NULL;
- const __be64 *swinvp;
struct iommu_table *tbl;
unsigned int i;
int64_t rc;
@@ -1823,6 +1853,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
if (WARN_ON(pe->tce32_seg >= 0))
return;

+ pnv_pci_ioda_setup_opal_tce_kill(phb, pe);
+
/* Grab a 32-bit TCE table */
pe->tce32_seg = base;
pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
@@ -1865,20 +1897,11 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
base << 28, IOMMU_PAGE_SHIFT_4K);

/* OPAL variant of P7IOC SW invalidated TCEs */
- swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
- if (swinvp) {
- /* We need a couple more fields -- an address and a data
- * to or. Since the bus is only printed out on table free
- * errors, and on the first pass the data will be a relative
- * bus number, print that out instead.
- */
- pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
- tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
- 8);
+ if (pe->tce_inval_reg)
tbl->it_type |= (TCE_PCI_SWINV_CREATE |
TCE_PCI_SWINV_FREE |
TCE_PCI_SWINV_PAIR);
- }
+
tbl->it_ops = &pnv_ioda1_iommu_ops;
iommu_init_table(tbl, phb->hose->node);

@@ -1984,7 +2007,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
{
struct page *tce_mem = NULL;
void *addr;
- const __be64 *swinvp;
struct iommu_table *tbl;
unsigned int tce_table_size, end;
int64_t rc;
@@ -1993,6 +2015,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
if (WARN_ON(pe->tce32_seg >= 0))
return;

+ pnv_pci_ioda_setup_opal_tce_kill(phb, pe);
+
/* The PE will reserve all possible 32-bits space */
pe->tce32_seg = 0;
end = (1 << ilog2(phb->ioda.m32_pci_base));
@@ -2023,6 +2047,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
goto fail;
}

+ pnv_pci_ioda2_tvt_invalidate(pe);
+
/* Setup iommu */
pe->table_group.tables[0].it_table_group = &pe->table_group;

@@ -2032,18 +2058,9 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
IOMMU_PAGE_SHIFT_4K);

/* OPAL variant of PHB3 invalidated TCEs */
- swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
- if (swinvp) {
- /* We need a couple more fields -- an address and a data
- * to or. Since the bus is only printed out on table free
- * errors, and on the first pass the data will be a relative
- * bus number, print that out instead.
- */
- pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
- tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
- 8);
+ if (pe->tce_inval_reg)
tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
- }
+
tbl->it_ops = &pnv_ioda2_iommu_ops;
iommu_init_table(tbl, phb->hose->node);
#ifdef CONFIG_IOMMU_API
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 368d4ed..bd83d85 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -59,6 +59,7 @@ struct pnv_ioda_pe {
int tce32_segcount;
struct iommu_table_group table_group;
phys_addr_t tce_inval_reg_phys;
+ __be64 __iomem *tce_inval_reg;

/* 64-bit TCE bypass region */
bool tce_bypass_enabled;
--
2.0.0

2015-04-25 12:22:52

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 17/32] powerpc/powernv: Implement accessor to TCE entry

This replaces direct accesses to the TCE table with a helper which
returns a TCE entry address. This does not make a difference now but will
when multi-level TCE tables are introduced.

No change in behavior is expected.
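
For reference, the access pattern after this change looks roughly as below
(a sketch; the real callers are pnv_tce_build()/pnv_tce_free()/pnv_tce_get()
in the diff):

/* Sketch: callers turn an absolute TCE index into a table-relative one
 * and dereference the address returned by pnv_tce().
 */
static void example_write_tce(struct iommu_table *tbl, long index, u64 tce)
{
	long idx = index - tbl->it_offset;	/* table-relative index */

	*(pnv_tce(tbl, idx)) = cpu_to_be64(tce);
}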

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
v9:
* new patch in the series to separate this mechanical change from
functional changes; it is placed here rather than right before
"powerpc/powernv: Implement multilevel TCE tables" in order
to let the next patch - "powerpc/iommu/powernv: Release replaced TCE" -
use pnv_tce() and avoid changing the same code twice
---
arch/powerpc/platforms/powernv/pci.c | 34 +++++++++++++++++++++-------------
1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 84b4ea4..ba75aa5 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -572,38 +572,46 @@ struct pci_ops pnv_pci_ops = {
.write = pnv_pci_write_config,
};

+static __be64 *pnv_tce(struct iommu_table *tbl, long idx)
+{
+ __be64 *tmp = ((__be64 *)tbl->it_base);
+
+ return tmp + idx;
+}
+
int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
unsigned long uaddr, enum dma_data_direction direction,
struct dma_attrs *attrs)
{
u64 proto_tce = iommu_direction_to_tce_perm(direction);
- __be64 *tcep;
- u64 rpn;
+ u64 rpn = __pa(uaddr) >> tbl->it_page_shift;
+ long i;

- tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
- rpn = __pa(uaddr) >> tbl->it_page_shift;
-
- while (npages--)
- *(tcep++) = cpu_to_be64(proto_tce |
- (rpn++ << tbl->it_page_shift));
+ for (i = 0; i < npages; i++) {
+ unsigned long newtce = proto_tce |
+ ((rpn + i) << tbl->it_page_shift);
+ unsigned long idx = index - tbl->it_offset + i;

+ *(pnv_tce(tbl, idx)) = cpu_to_be64(newtce);
+ }

return 0;
}

void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
{
- __be64 *tcep;
+ long i;

- tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
+ for (i = 0; i < npages; i++) {
+ unsigned long idx = index - tbl->it_offset + i;

- while (npages--)
- *(tcep++) = cpu_to_be64(0);
+ *(pnv_tce(tbl, idx)) = cpu_to_be64(0);
+ }
}

unsigned long pnv_tce_get(struct iommu_table *tbl, long index)
{
- return ((u64 *)tbl->it_base)[index - tbl->it_offset];
+ return *(pnv_tce(tbl, index - tbl->it_offset));
}

void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
--
2.0.0

2015-04-25 12:19:22

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 18/32] powerpc/iommu/powernv: Release replaced TCE

At the moment writing a new TCE value to the IOMMU table fails with EBUSY
if there is a valid entry already. However the PAPR specification allows
the guest to write a new TCE value without clearing the old one first.

Another problem this patch is addressing is the use of pool locks for
external IOMMU users such as VFIO. The pool locks are there to protect
the DMA page allocator rather than the entries, and since the host kernel
does not control what pages are in use, there is no point in pool locks;
exchange()+put_page(oldtce) is sufficient to avoid possible races.

This adds an exchange() callback to iommu_table_ops which does the same
thing as set() plus it returns the replaced TCE and DMA direction so
the caller can release the pages afterwards. exchange() receives
a physical address, unlike set() which receives a linear mapping address,
and returns a physical address as get() does.

This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement
for a platform to have exchange() implemented in order to support VFIO.

This replaces iommu_tce_build() and iommu_clear_tce() with
a single iommu_tce_xchg().

This makes sure that TCE permission bits are not set in the TCE passed to
the IOMMU API as those are to be calculated by platform code from the DMA
direction.

This moves SetPageDirty() to the IOMMU code to make it work for both
the VFIO ioctl interface and in-kernel TCE acceleration (when it becomes
available later).
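
A minimal sketch of the caller pattern enabled by exchange(), mirroring
what tce_iommu_clear()/tce_iommu_build() do in the diff (the function
name here is made up for illustration):

/* Sketch: write a new TCE, get the old physical address and direction
 * back, and release the old page if one was mapped.  SetPageDirty() is
 * already handled inside iommu_tce_xchg().
 */
static long example_replace_tce(struct iommu_table *tbl, unsigned long entry,
		unsigned long hpa, enum dma_data_direction dir)
{
	unsigned long oldhpa = hpa;
	enum dma_data_direction olddir = dir;
	long ret;

	ret = iommu_tce_xchg(tbl, entry, &oldhpa, &olddir);
	if (ret)
		return ret;

	if (olddir != DMA_NONE)		/* an old page was mapped, release it */
		put_page(pfn_to_page(oldhpa >> PAGE_SHIFT));

	return 0;
}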

Signed-off-by: Alexey Kardashevskiy <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
---
Changes:
v9:
* changed exchange() to work with physical addresses as these addresses
are never accessed by the code and physical addresses are actual values
we put into the IOMMU table
---
arch/powerpc/include/asm/iommu.h | 22 +++++++++--
arch/powerpc/kernel/iommu.c | 57 +++++++++-------------------
arch/powerpc/platforms/powernv/pci-ioda.c | 34 +++++++++++++++++
arch/powerpc/platforms/powernv/pci-p5ioc2.c | 3 ++
arch/powerpc/platforms/powernv/pci.c | 17 +++++++++
arch/powerpc/platforms/powernv/pci.h | 2 +
drivers/vfio/vfio_iommu_spapr_tce.c | 58 ++++++++++++++++++-----------
7 files changed, 128 insertions(+), 65 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index e63419e..7e7ca0a 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -45,13 +45,29 @@ extern int iommu_is_off;
extern int iommu_force_on;

struct iommu_table_ops {
+ /*
+ * When called with direction==DMA_NONE, it is equal to clear().
+ * uaddr is a linear map address.
+ */
int (*set)(struct iommu_table *tbl,
long index, long npages,
unsigned long uaddr,
enum dma_data_direction direction,
struct dma_attrs *attrs);
+#ifdef CONFIG_IOMMU_API
+ /*
+ * Exchanges existing TCE with new TCE plus direction bits;
+ * returns old TCE and DMA direction mask.
+ * @tce is a physical address.
+ */
+ int (*exchange)(struct iommu_table *tbl,
+ long index,
+ unsigned long *tce,
+ enum dma_data_direction *direction);
+#endif
void (*clear)(struct iommu_table *tbl,
long index, long npages);
+ /* get() returns a physical address */
unsigned long (*get)(struct iommu_table *tbl, long index);
void (*flush)(struct iommu_table *tbl);
};
@@ -152,6 +168,8 @@ extern void iommu_register_group(struct iommu_table_group *table_group,
extern int iommu_add_device(struct device *dev);
extern void iommu_del_device(struct device *dev);
extern int __init tce_iommu_bus_notifier_init(void);
+extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
+ unsigned long *tce, enum dma_data_direction *direction);
#else
static inline void iommu_register_group(struct iommu_table_group *table_group,
int pci_domain_number,
@@ -231,10 +249,6 @@ extern int iommu_tce_clear_param_check(struct iommu_table *tbl,
unsigned long npages);
extern int iommu_tce_put_param_check(struct iommu_table *tbl,
unsigned long ioba, unsigned long tce);
-extern int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
- unsigned long hwaddr, enum dma_data_direction direction);
-extern unsigned long iommu_clear_tce(struct iommu_table *tbl,
- unsigned long entry);

extern void iommu_flush_tce(struct iommu_table *tbl);
extern int iommu_take_ownership(struct iommu_table *tbl);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ea2c8ba..2eaba0c 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -975,9 +975,6 @@ EXPORT_SYMBOL_GPL(iommu_tce_clear_param_check);
int iommu_tce_put_param_check(struct iommu_table *tbl,
unsigned long ioba, unsigned long tce)
{
- if (!(tce & (TCE_PCI_WRITE | TCE_PCI_READ)))
- return -EINVAL;
-
if (tce & ~(IOMMU_PAGE_MASK(tbl) | TCE_PCI_WRITE | TCE_PCI_READ))
return -EINVAL;

@@ -995,44 +992,16 @@ int iommu_tce_put_param_check(struct iommu_table *tbl,
}
EXPORT_SYMBOL_GPL(iommu_tce_put_param_check);

-unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry)
+long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
+ unsigned long *tce, enum dma_data_direction *direction)
{
- unsigned long oldtce;
- struct iommu_pool *pool = get_pool(tbl, entry);
+ long ret;

- spin_lock(&(pool->lock));
+ ret = tbl->it_ops->exchange(tbl, entry, tce, direction);

- oldtce = tbl->it_ops->get(tbl, entry);
- if (oldtce & (TCE_PCI_WRITE | TCE_PCI_READ))
- tbl->it_ops->clear(tbl, entry, 1);
- else
- oldtce = 0;
-
- spin_unlock(&(pool->lock));
-
- return oldtce;
-}
-EXPORT_SYMBOL_GPL(iommu_clear_tce);
-
-/*
- * hwaddr is a kernel virtual address here (0xc... bazillion),
- * tce_build converts it to a physical address.
- */
-int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
- unsigned long hwaddr, enum dma_data_direction direction)
-{
- int ret = -EBUSY;
- unsigned long oldtce;
- struct iommu_pool *pool = get_pool(tbl, entry);
-
- spin_lock(&(pool->lock));
-
- oldtce = tbl->it_ops->get(tbl, entry);
- /* Add new entry if it is not busy */
- if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)))
- ret = tbl->it_ops->set(tbl, entry, 1, hwaddr, direction, NULL);
-
- spin_unlock(&(pool->lock));
+ if (!ret && ((*direction == DMA_FROM_DEVICE) ||
+ (*direction == DMA_BIDIRECTIONAL)))
+ SetPageDirty(pfn_to_page(*tce >> PAGE_SHIFT));

/* if (unlikely(ret))
pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%lx ret=%d\n",
@@ -1041,13 +1010,23 @@ int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,

return ret;
}
-EXPORT_SYMBOL_GPL(iommu_tce_build);
+EXPORT_SYMBOL_GPL(iommu_tce_xchg);

int iommu_take_ownership(struct iommu_table *tbl)
{
unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
int ret = 0;

+ /*
+ * VFIO does not control TCE entries allocation and the guest
+ * can write new TCEs on top of existing ones so iommu_tce_build()
+ * must be able to release old pages. This functionality
+ * requires exchange() callback defined so if it is not
+ * implemented, we disallow taking ownership over the table.
+ */
+ if (!tbl->it_ops->exchange)
+ return -EINVAL;
+
spin_lock_irqsave(&tbl->large_pool.lock, flags);
for (i = 0; i < tbl->nr_pools; i++)
spin_lock(&tbl->pools[i].lock);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index b22b3ca..fb765af 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1728,6 +1728,20 @@ static int pnv_ioda1_tce_build(struct iommu_table *tbl, long index,
return ret;
}

+#ifdef CONFIG_IOMMU_API
+static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, long index,
+ unsigned long *tce, enum dma_data_direction *direction)
+{
+ long ret = pnv_tce_xchg(tbl, index, tce, direction);
+
+ if (!ret && (tbl->it_type &
+ (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
+ pnv_pci_ioda1_tce_invalidate(tbl, index, 1, false);
+
+ return ret;
+}
+#endif
+
static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
long npages)
{
@@ -1739,6 +1753,9 @@ static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,

static struct iommu_table_ops pnv_ioda1_iommu_ops = {
.set = pnv_ioda1_tce_build,
+#ifdef CONFIG_IOMMU_API
+ .exchange = pnv_ioda1_tce_xchg,
+#endif
.clear = pnv_ioda1_tce_free,
.get = pnv_tce_get,
};
@@ -1800,6 +1817,20 @@ static int pnv_ioda2_tce_build(struct iommu_table *tbl, long index,
return ret;
}

+#ifdef CONFIG_IOMMU_API
+static int pnv_ioda2_tce_xchg(struct iommu_table *tbl, long index,
+ unsigned long *tce, enum dma_data_direction *direction)
+{
+ long ret = pnv_tce_xchg(tbl, index, tce, direction);
+
+ if (!ret && (tbl->it_type &
+ (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
+ pnv_pci_ioda2_tce_invalidate(tbl, index, 1, false);
+
+ return ret;
+}
+#endif
+
static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
long npages)
{
@@ -1811,6 +1842,9 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,

static struct iommu_table_ops pnv_ioda2_iommu_ops = {
.set = pnv_ioda2_tce_build,
+#ifdef CONFIG_IOMMU_API
+ .exchange = pnv_ioda2_tce_xchg,
+#endif
.clear = pnv_ioda2_tce_free,
.get = pnv_tce_get,
};
diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
index a073af0..7a6fd92 100644
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
@@ -85,6 +85,9 @@ static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb) { }

static struct iommu_table_ops pnv_p5ioc2_iommu_ops = {
.set = pnv_tce_build,
+#ifdef CONFIG_IOMMU_API
+ .exchange = pnv_tce_xchg,
+#endif
.clear = pnv_tce_free,
.get = pnv_tce_get,
};
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index ba75aa5..e8802ac 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -598,6 +598,23 @@ int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
return 0;
}

+#ifdef CONFIG_IOMMU_API
+int pnv_tce_xchg(struct iommu_table *tbl, long index,
+ unsigned long *tce, enum dma_data_direction *direction)
+{
+ u64 proto_tce = iommu_direction_to_tce_perm(*direction);
+ unsigned long newtce = *tce | proto_tce;
+ unsigned long idx = index - tbl->it_offset;
+
+ *tce = xchg(pnv_tce(tbl, idx), cpu_to_be64(newtce));
+ *tce = be64_to_cpu(*tce);
+ *direction = iommu_tce_direction(*tce);
+ *tce &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
+
+ return 0;
+}
+#endif
+
void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
{
long i;
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index bd83d85..b15cce5 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -205,6 +205,8 @@ extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
unsigned long uaddr, enum dma_data_direction direction,
struct dma_attrs *attrs);
extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
+extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
+ unsigned long *tce, enum dma_data_direction *direction);
extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);

void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index dacc738..2d51bbf 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -239,14 +239,7 @@ static void tce_iommu_unuse_page(struct tce_container *container,
{
struct page *page;

- if (!(oldtce & (TCE_PCI_READ | TCE_PCI_WRITE)))
- return;
-
page = pfn_to_page(oldtce >> PAGE_SHIFT);
-
- if (oldtce & TCE_PCI_WRITE)
- SetPageDirty(page);
-
put_page(page);
}

@@ -255,10 +248,17 @@ static int tce_iommu_clear(struct tce_container *container,
unsigned long entry, unsigned long pages)
{
unsigned long oldtce;
+ long ret;
+ enum dma_data_direction direction;

for ( ; pages; --pages, ++entry) {
- oldtce = iommu_clear_tce(tbl, entry);
- if (!oldtce)
+ direction = DMA_NONE;
+ oldtce = 0;
+ ret = iommu_tce_xchg(tbl, entry, &oldtce, &direction);
+ if (ret)
+ continue;
+
+ if (direction == DMA_NONE)
continue;

tce_iommu_unuse_page(container, oldtce);
@@ -283,12 +283,13 @@ static int tce_iommu_use_page(unsigned long tce, unsigned long *hpa)

static long tce_iommu_build(struct tce_container *container,
struct iommu_table *tbl,
- unsigned long entry, unsigned long tce, unsigned long pages)
+ unsigned long entry, unsigned long tce, unsigned long pages,
+ enum dma_data_direction direction)
{
long i, ret = 0;
struct page *page;
unsigned long hpa;
- enum dma_data_direction direction = iommu_tce_direction(tce);
+ enum dma_data_direction dirtmp;

for (i = 0; i < pages; ++i) {
unsigned long offset = tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK;
@@ -304,8 +305,8 @@ static long tce_iommu_build(struct tce_container *container,
}

hpa |= offset;
- ret = iommu_tce_build(tbl, entry + i, (unsigned long) __va(hpa),
- direction);
+ dirtmp = direction;
+ ret = iommu_tce_xchg(tbl, entry + i, &hpa, &dirtmp);
if (ret) {
tce_iommu_unuse_page(container, hpa);
pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%ld\n",
@@ -313,6 +314,10 @@ static long tce_iommu_build(struct tce_container *container,
tce, ret);
break;
}
+
+ if (dirtmp != DMA_NONE)
+ tce_iommu_unuse_page(container, hpa);
+
tce += IOMMU_PAGE_SIZE(tbl);
}

@@ -377,7 +382,7 @@ static long tce_iommu_ioctl(void *iommu_data,
case VFIO_IOMMU_MAP_DMA: {
struct vfio_iommu_type1_dma_map param;
struct iommu_table *tbl;
- unsigned long tce;
+ enum dma_data_direction direction;

if (!container->enabled)
return -EPERM;
@@ -398,24 +403,33 @@ static long tce_iommu_ioctl(void *iommu_data,
if (!tbl)
return -ENXIO;

- if ((param.size & ~IOMMU_PAGE_MASK(tbl)) ||
- (param.vaddr & ~IOMMU_PAGE_MASK(tbl)))
+ if (param.size & ~IOMMU_PAGE_MASK(tbl))
+ return -EINVAL;
+
+ if (param.vaddr & (TCE_PCI_READ | TCE_PCI_WRITE))
return -EINVAL;

/* iova is checked by the IOMMU API */
- tce = param.vaddr;
if (param.flags & VFIO_DMA_MAP_FLAG_READ)
- tce |= TCE_PCI_READ;
- if (param.flags & VFIO_DMA_MAP_FLAG_WRITE)
- tce |= TCE_PCI_WRITE;
+ if (param.flags & VFIO_DMA_MAP_FLAG_WRITE)
+ direction = DMA_BIDIRECTIONAL;
+ else
+ direction = DMA_TO_DEVICE;
+ else
+ if (param.flags & VFIO_DMA_MAP_FLAG_WRITE)
+ direction = DMA_FROM_DEVICE;
+ else
+ return -EINVAL;

- ret = iommu_tce_put_param_check(tbl, param.iova, tce);
+ ret = iommu_tce_put_param_check(tbl, param.iova, param.vaddr);
if (ret)
return ret;

ret = tce_iommu_build(container, tbl,
param.iova >> tbl->it_page_shift,
- tce, param.size >> tbl->it_page_shift);
+ param.vaddr,
+ param.size >> tbl->it_page_shift,
+ direction);

iommu_flush_tce(tbl);

--
2.0.0

2015-04-25 12:22:42

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 19/32] powerpc/powernv/ioda2: Rework iommu_table creation

This moves iommu_table creation to the beginning to make the following changes
easier to review. This starts using table parameters from the iommu_table
struct.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
v9:
* updated commit log and did minor cleanup
---
arch/powerpc/platforms/powernv/pci-ioda.c | 33 +++++++++++++++----------------
1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index fb765af..a80be34 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2041,7 +2041,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
{
struct page *tce_mem = NULL;
void *addr;
- struct iommu_table *tbl;
+ struct iommu_table *tbl = &pe->table_group.tables[0];
unsigned int tce_table_size, end;
int64_t rc;

@@ -2068,13 +2068,26 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
addr = page_address(tce_mem);
memset(addr, 0, tce_table_size);

+ /* Setup iommu */
+ tbl->it_table_group = &pe->table_group;
+
+ /* Setup linux iommu table */
+ pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
+ IOMMU_PAGE_SHIFT_4K);
+
+ tbl->it_ops = &pnv_ioda2_iommu_ops;
+ iommu_init_table(tbl, phb->hose->node);
+#ifdef CONFIG_IOMMU_API
+ pe->table_group.ops = &pnv_pci_ioda2_ops;
+#endif
+
/*
* Map TCE table through TVT. The TVE index is the PE number
* shifted by 1 bit for 32-bits DMA space.
*/
rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
- pe->pe_number << 1, 1, __pa(addr),
- tce_table_size, 0x1000);
+ pe->pe_number << 1, 1, __pa(tbl->it_base),
+ tbl->it_size << 3, 1ULL << tbl->it_page_shift);
if (rc) {
pe_err(pe, "Failed to configure 32-bit TCE table,"
" err %ld\n", rc);
@@ -2083,24 +2096,10 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,

pnv_pci_ioda2_tvt_invalidate(pe);

- /* Setup iommu */
- pe->table_group.tables[0].it_table_group = &pe->table_group;
-
- /* Setup linux iommu table */
- tbl = &pe->table_group.tables[0];
- pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
- IOMMU_PAGE_SHIFT_4K);
-
/* OPAL variant of PHB3 invalidated TCEs */
if (pe->tce_inval_reg)
tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);

- tbl->it_ops = &pnv_ioda2_iommu_ops;
- iommu_init_table(tbl, phb->hose->node);
-#ifdef CONFIG_IOMMU_API
- pe->table_group.ops = &pnv_pci_ioda2_ops;
-#endif
-
if (pe->flags & PNV_IODA_PE_DEV) {
iommu_register_group(&pe->table_group, phb->hose->global_number,
pe->pe_number);
--
2.0.0

2015-04-25 12:20:22

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 20/32] powerpc/powernv/ioda2: Introduce pnv_pci_create_table/pnv_pci_free_table

This is a part of moving TCE table allocation into an iommu_ops
callback to support multiple IOMMU groups per VFIO container.

This moves the table creation code to the file with common powernv PCI
helpers as it does not do anything IODA2-specific.

This adds a pnv_pci_free_table() helper to release the actual TCE table.

This enforces the window size to be a power of two.

This should cause no behavioural change.
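
For a sense of the sizes involved (a worked example, not part of the patch):
each TCE is 8 bytes, so a flat table covering a power-of-two window holds
window_size >> page_shift entries.

/* Sketch: bytes needed for a flat TCE table covering a given window.
 * E.g. a 2GB window with 4K IOMMU pages needs (2G / 4K) * 8 = 4MB.
 */
static unsigned long tce_table_bytes(unsigned long window_size,
				     unsigned int page_shift)
{
	unsigned long entries = window_size >> page_shift;

	return entries * 8;
}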

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
---
Changes:
v9:
* moved helpers to the common powernv pci.c file from pci-ioda.c
* moved bits from pnv_pci_create_table() to pnv_alloc_tce_table_pages()
---
arch/powerpc/platforms/powernv/pci-ioda.c | 36 ++++++------------
arch/powerpc/platforms/powernv/pci.c | 61 +++++++++++++++++++++++++++++++
arch/powerpc/platforms/powernv/pci.h | 4 ++
3 files changed, 76 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index a80be34..b9b3773 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1307,8 +1307,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
if (rc)
pe_warn(pe, "OPAL error %ld release DMA window\n", rc);

- iommu_reset_table(tbl, of_node_full_name(dev->dev.of_node));
- free_pages(addr, get_order(TCE32_TABLE_SIZE));
+ pnv_pci_free_table(tbl);
}

static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
@@ -2039,10 +2038,7 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
struct pnv_ioda_pe *pe)
{
- struct page *tce_mem = NULL;
- void *addr;
struct iommu_table *tbl = &pe->table_group.tables[0];
- unsigned int tce_table_size, end;
int64_t rc;

/* We shouldn't already have a 32-bit DMA associated */
@@ -2053,29 +2049,20 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,

/* The PE will reserve all possible 32-bits space */
pe->tce32_seg = 0;
- end = (1 << ilog2(phb->ioda.m32_pci_base));
- tce_table_size = (end / 0x1000) * 8;
pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
- end);
+ phb->ioda.m32_pci_base);

- /* Allocate TCE table */
- tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
- get_order(tce_table_size));
- if (!tce_mem) {
- pe_err(pe, "Failed to allocate a 32-bit TCE memory\n");
- goto fail;
+ rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
+ 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base, tbl);
+ if (rc) {
+ pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
+ return;
}
- addr = page_address(tce_mem);
- memset(addr, 0, tce_table_size);
-
- /* Setup iommu */
- tbl->it_table_group = &pe->table_group;
-
- /* Setup linux iommu table */
- pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
- IOMMU_PAGE_SHIFT_4K);

tbl->it_ops = &pnv_ioda2_iommu_ops;
+
+ /* Setup iommu */
+ tbl->it_table_group = &pe->table_group;
iommu_init_table(tbl, phb->hose->node);
#ifdef CONFIG_IOMMU_API
pe->table_group.ops = &pnv_pci_ioda2_ops;
@@ -2121,8 +2108,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
fail:
if (pe->tce32_seg >= 0)
pe->tce32_seg = -1;
- if (tce_mem)
- __free_pages(tce_mem, get_order(tce_table_size));
+ pnv_pci_free_table(tbl);
}

static void pnv_ioda_setup_dma(struct pnv_phb *phb)
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index e8802ac..6bcfad5 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -20,7 +20,9 @@
#include <linux/io.h>
#include <linux/msi.h>
#include <linux/iommu.h>
+#include <linux/memblock.h>

+#include <asm/mmzone.h>
#include <asm/sections.h>
#include <asm/io.h>
#include <asm/prom.h>
@@ -645,6 +647,65 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
tbl->it_type = TCE_PCI;
}

+static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
+ unsigned long *tce_table_allocated)
+{
+ struct page *tce_mem = NULL;
+ __be64 *addr;
+ unsigned order = max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT;
+ unsigned long local_allocated = 1UL << (order + PAGE_SHIFT);
+
+ tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
+ if (!tce_mem) {
+ pr_err("Failed to allocate a TCE memory, order=%d\n", order);
+ return NULL;
+ }
+ addr = page_address(tce_mem);
+ memset(addr, 0, local_allocated);
+ *tce_table_allocated = local_allocated;
+
+ return addr;
+}
+
+long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
+ __u64 bus_offset, __u32 page_shift, __u64 window_size,
+ struct iommu_table *tbl)
+{
+ void *addr;
+ unsigned long tce_table_allocated = 0;
+ const unsigned window_shift = ilog2(window_size);
+ unsigned entries_shift = window_shift - page_shift;
+ unsigned table_shift = entries_shift + 3;
+ const unsigned long tce_table_size = max(0x1000UL, 1UL << table_shift);
+
+ if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size))
+ return -EINVAL;
+
+ /* Allocate TCE table */
+ addr = pnv_alloc_tce_table_pages(nid, table_shift,
+ &tce_table_allocated);
+
+ /* Setup linux iommu table */
+ pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, bus_offset,
+ page_shift);
+
+ pr_info("Created TCE table: window size = %08llx, "
+ "tablesize = %lx (%lx), start @%08llx\n",
+ window_size, tce_table_size, tce_table_allocated,
+ bus_offset);
+
+ return 0;
+}
+
+void pnv_pci_free_table(struct iommu_table *tbl)
+{
+ if (!tbl->it_size)
+ return;
+
+ free_pages(tbl->it_base, get_order(tbl->it_size << 3));
+ iommu_reset_table(tbl, "pnv");
+}
+
static void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
{
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index b15cce5..e6cbbec 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -218,6 +218,10 @@ int pnv_pci_cfg_write(struct pci_dn *pdn,
extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
void *tce_mem, u64 tce_size,
u64 dma_offset, unsigned page_shift);
+extern long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
+ __u64 bus_offset, __u32 page_shift, __u64 window_size,
+ struct iommu_table *tbl);
+extern void pnv_pci_free_table(struct iommu_table *tbl);
extern void pnv_pci_init_p5ioc2_hub(struct device_node *np);
extern void pnv_pci_init_ioda_hub(struct device_node *np);
extern void pnv_pci_init_ioda2_phb(struct device_node *np);
--
2.0.0

2015-04-25 12:22:48

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 21/32] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window

This is a part of moving DMA window programming to an iommu_ops
callback. pnv_pci_ioda2_set_window() takes an iommu_table_group as
its first parameter (not pnv_ioda_pe) as it is going to be used as
a callback for the VFIO DDW code.

This adds pnv_pci_ioda2_tvt_invalidate() to invalidate the TVT as it is
a good thing to do. It does not have an immediate effect now as the table
is never recreated after reboot, but it will in the following patches.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
---
Changes:
v9:
* initialize pe->table_group.tables[0] at the very end when
tbl is fully initialized
* moved pnv_pci_ioda2_tvt_invalidate() from earlier patch
---
arch/powerpc/platforms/powernv/pci-ioda.c | 67 +++++++++++++++++++++++--------
1 file changed, 51 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index b9b3773..59baa15 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1960,6 +1960,52 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
}

+static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
+ struct iommu_table *tbl)
+{
+ struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
+ table_group);
+ struct pnv_phb *phb = pe->phb;
+ int64_t rc;
+ const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
+ const __u64 win_size = tbl->it_size << tbl->it_page_shift;
+
+ pe_info(pe, "Setting up window at %llx..%llx "
+ "pgsize=0x%x tablesize=0x%lx\n",
+ start_addr, start_addr + win_size - 1,
+ 1UL << tbl->it_page_shift, tbl->it_size << 3);
+
+ tbl->it_table_group = &pe->table_group;
+
+ /*
+ * Map TCE table through TVT. The TVE index is the PE number
+ * shifted by 1 bit for 32-bits DMA space.
+ */
+ rc = opal_pci_map_pe_dma_window(phb->opal_id,
+ pe->pe_number,
+ pe->pe_number << 1,
+ 1,
+ __pa(tbl->it_base),
+ tbl->it_size << 3,
+ 1ULL << tbl->it_page_shift);
+ if (rc) {
+ pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
+ goto fail;
+ }
+
+ pnv_pci_ioda2_tvt_invalidate(pe);
+
+ /* Store fully initialized *tbl (may be external) in PE */
+ pe->table_group.tables[0] = *tbl;
+
+ return 0;
+fail:
+ if (pe->tce32_seg >= 0)
+ pe->tce32_seg = -1;
+
+ return rc;
+}
+
static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
{
uint16_t window_id = (pe->pe_number << 1 ) + 1;
@@ -2068,21 +2114,16 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
pe->table_group.ops = &pnv_pci_ioda2_ops;
#endif

- /*
- * Map TCE table through TVT. The TVE index is the PE number
- * shifted by 1 bit for 32-bits DMA space.
- */
- rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
- pe->pe_number << 1, 1, __pa(tbl->it_base),
- tbl->it_size << 3, 1ULL << tbl->it_page_shift);
+ rc = pnv_pci_ioda2_set_window(&pe->table_group, tbl);
if (rc) {
pe_err(pe, "Failed to configure 32-bit TCE table,"
" err %ld\n", rc);
- goto fail;
+ pnv_pci_free_table(tbl);
+ if (pe->tce32_seg >= 0)
+ pe->tce32_seg = -1;
+ return;
}

- pnv_pci_ioda2_tvt_invalidate(pe);
-
/* OPAL variant of PHB3 invalidated TCEs */
if (pe->tce_inval_reg)
tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
@@ -2103,12 +2144,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
/* Also create a bypass window */
if (!pnv_iommu_bypass_disabled)
pnv_pci_ioda2_setup_bypass_pe(phb, pe);
-
- return;
-fail:
- if (pe->tce32_seg >= 0)
- pe->tce32_seg = -1;
- pnv_pci_free_table(tbl);
}

static void pnv_ioda_setup_dma(struct pnv_phb *phb)
--
2.0.0

2015-04-25 12:17:32

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 22/32] powerpc/powernv: Implement multilevel TCE tables

TCE tables might get too big in case of 4K IOMMU pages and DDW enabled
on huge guests (hundreds of GB of RAM) so the kernel might be unable to
allocate a contiguous chunk of physical memory to store the TCE table.

To address this, the POWER8 CPU (actually, IODA2) supports multi-level TCE
tables, up to 5 levels, which split the table into a tree of smaller subtables.

This adds multi-level TCE table support to the pnv_pci_create_table()
and pnv_pci_free_table() helpers.
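
A worked example of the index split a multi-level walk performs (a sketch
of what pnv_tce() does in the diff, using division instead of the
shift/mask form for clarity): with it_level_size = 512 (one 4K page of
8-byte entries) and two levels, idx = 1000 resolves to entry 1 in the top
level and entry 488 in the bottom level, since 1000 = 1 * 512 + 488.

/* Sketch: split a flat TCE index across two levels. */
static void split_index_two_levels(unsigned long idx, unsigned long level_size,
				   unsigned long *top, unsigned long *bottom)
{
	*top = idx / level_size;
	*bottom = idx % level_size;
}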

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
v9:
* moved from ioda2 to common powernv pci code
* fixed cleanup if allocation fails in the middle
* removed check for the size - all boundary checks happen in the calling code
anyway
---
arch/powerpc/include/asm/iommu.h | 2 +
arch/powerpc/platforms/powernv/pci-ioda.c | 15 +++--
arch/powerpc/platforms/powernv/pci.c | 94 +++++++++++++++++++++++++++++--
arch/powerpc/platforms/powernv/pci.h | 4 +-
4 files changed, 104 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7e7ca0a..0f50ee2 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -96,6 +96,8 @@ struct iommu_pool {
struct iommu_table {
unsigned long it_busno; /* Bus number this table belongs to */
unsigned long it_size; /* Size of iommu table in entries */
+ unsigned long it_indirect_levels;
+ unsigned long it_level_size;
unsigned long it_offset; /* Offset into global table */
unsigned long it_base; /* mapped address of tce table */
unsigned long it_index; /* which iommu table this is */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 59baa15..cc1d09c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1967,13 +1967,17 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
table_group);
struct pnv_phb *phb = pe->phb;
int64_t rc;
+ const unsigned long size = tbl->it_indirect_levels ?
+ tbl->it_level_size : tbl->it_size;
const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
const __u64 win_size = tbl->it_size << tbl->it_page_shift;

pe_info(pe, "Setting up window at %llx..%llx "
- "pgsize=0x%x tablesize=0x%lx\n",
+ "pgsize=0x%x tablesize=0x%lx "
+ "levels=%d levelsize=%x\n",
start_addr, start_addr + win_size - 1,
- 1UL << tbl->it_page_shift, tbl->it_size << 3);
+ 1UL << tbl->it_page_shift, tbl->it_size << 3,
+ tbl->it_indirect_levels + 1, tbl->it_level_size << 3);

tbl->it_table_group = &pe->table_group;

@@ -1984,9 +1988,9 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
rc = opal_pci_map_pe_dma_window(phb->opal_id,
pe->pe_number,
pe->pe_number << 1,
- 1,
+ tbl->it_indirect_levels + 1,
__pa(tbl->it_base),
- tbl->it_size << 3,
+ size << 3,
1ULL << tbl->it_page_shift);
if (rc) {
pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
@@ -2099,7 +2103,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
phb->ioda.m32_pci_base);

rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
- 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base, tbl);
+ 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base,
+ POWERNV_IOMMU_DEFAULT_LEVELS, tbl);
if (rc) {
pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
return;
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 6bcfad5..fc129c4 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -46,6 +46,8 @@
#define cfg_dbg(fmt...) do { } while(0)
//#define cfg_dbg(fmt...) printk(fmt)

+#define ROUND_UP(x, n) (((x) + (n) - 1ULL) & ~((n) - 1ULL))
+
#ifdef CONFIG_PCI_MSI
static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
{
@@ -577,6 +579,19 @@ struct pci_ops pnv_pci_ops = {
static __be64 *pnv_tce(struct iommu_table *tbl, long idx)
{
__be64 *tmp = ((__be64 *)tbl->it_base);
+ int level = tbl->it_indirect_levels;
+ const long shift = ilog2(tbl->it_level_size);
+ unsigned long mask = (tbl->it_level_size - 1) << (level * shift);
+
+ while (level) {
+ int n = (idx & mask) >> (level * shift);
+ unsigned long tce = be64_to_cpu(tmp[n]);
+
+ tmp = __va(tce & ~(TCE_PCI_READ | TCE_PCI_WRITE));
+ idx &= ~mask;
+ mask >>= shift;
+ --level;
+ }

return tmp + idx;
}
@@ -648,12 +663,18 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
}

static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
+ unsigned levels, unsigned long limit,
unsigned long *tce_table_allocated)
{
struct page *tce_mem = NULL;
- __be64 *addr;
+ __be64 *addr, *tmp;
unsigned order = max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT;
unsigned long local_allocated = 1UL << (order + PAGE_SHIFT);
+ unsigned entries = 1UL << (shift - 3);
+ long i;
+
+ if (limit == *tce_table_allocated)
+ return NULL;

tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
if (!tce_mem) {
@@ -662,14 +683,33 @@ static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
}
addr = page_address(tce_mem);
memset(addr, 0, local_allocated);
- *tce_table_allocated = local_allocated;
+
+ --levels;
+ if (!levels) {
+ /* Update tce_table_allocated with bottom level table size only */
+ *tce_table_allocated += local_allocated;
+ return addr;
+ }
+
+ for (i = 0; i < entries; ++i) {
+ tmp = pnv_alloc_tce_table_pages(nid, shift, levels, limit,
+ tce_table_allocated);
+ if (!tmp)
+ break;
+
+ addr[i] = cpu_to_be64(__pa(tmp) |
+ TCE_PCI_READ | TCE_PCI_WRITE);
+ }

return addr;
}

+static void pnv_free_tce_table_pages(unsigned long addr, unsigned long size,
+ unsigned level);
+
long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
__u64 bus_offset, __u32 page_shift, __u64 window_size,
- struct iommu_table *tbl)
+ __u32 levels, struct iommu_table *tbl)
{
void *addr;
unsigned long tce_table_allocated = 0;
@@ -678,16 +718,34 @@ long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
unsigned table_shift = entries_shift + 3;
const unsigned long tce_table_size = max(0x1000UL, 1UL << table_shift);

+ if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS))
+ return -EINVAL;
+
if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size))
return -EINVAL;

+ /* Adjust direct table size from window_size and levels */
+ entries_shift = ROUND_UP(entries_shift, levels) / levels;
+ table_shift = entries_shift + 3;
+ table_shift = max_t(unsigned, table_shift, PAGE_SHIFT);
+
/* Allocate TCE table */
addr = pnv_alloc_tce_table_pages(nid, table_shift,
- &tce_table_allocated);
+ levels, tce_table_size, &tce_table_allocated);
+ if (!addr)
+ return -ENOMEM;
+
+ if (tce_table_size != tce_table_allocated) {
+ pnv_free_tce_table_pages((unsigned long) addr,
+ tbl->it_level_size, tbl->it_indirect_levels);
+ return -ENOMEM;
+ }

/* Setup linux iommu table */
pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, bus_offset,
page_shift);
+ tbl->it_level_size = 1ULL << (table_shift - 3);
+ tbl->it_indirect_levels = levels - 1;

pr_info("Created TCE table: window size = %08llx, "
"tablesize = %lx (%lx), start @%08llx\n",
@@ -697,12 +755,38 @@ long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
return 0;
}

+static void pnv_free_tce_table_pages(unsigned long addr, unsigned long size,
+ unsigned level)
+{
+ addr &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
+
+ if (level) {
+ long i;
+ u64 *tmp = (u64 *) addr;
+
+ for (i = 0; i < size; ++i) {
+ unsigned long hpa = be64_to_cpu(tmp[i]);
+
+ if (!(hpa & (TCE_PCI_READ | TCE_PCI_WRITE)))
+ continue;
+
+ pnv_free_tce_table_pages((unsigned long) __va(hpa),
+ size, level - 1);
+ }
+ }
+
+ free_pages(addr, get_order(size << 3));
+}
+
void pnv_pci_free_table(struct iommu_table *tbl)
{
+ const unsigned long size = tbl->it_indirect_levels ?
+ tbl->it_level_size : tbl->it_size;
+
if (!tbl->it_size)
return;

- free_pages(tbl->it_base, get_order(tbl->it_size << 3));
+ pnv_free_tce_table_pages(tbl->it_base, size, tbl->it_indirect_levels);
iommu_reset_table(tbl, "pnv");
}

diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index e6cbbec..3d1ff584 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -218,9 +218,11 @@ int pnv_pci_cfg_write(struct pci_dn *pdn,
extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
void *tce_mem, u64 tce_size,
u64 dma_offset, unsigned page_shift);
+#define POWERNV_IOMMU_DEFAULT_LEVELS 1
+#define POWERNV_IOMMU_MAX_LEVELS 5
extern long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
__u64 bus_offset, __u32 page_shift, __u64 window_size,
- struct iommu_table *tbl);
+ __u32 levels, struct iommu_table *tbl);
extern void pnv_pci_free_table(struct iommu_table *tbl);
extern void pnv_pci_init_p5ioc2_hub(struct device_node *np);
extern void pnv_pci_init_ioda_hub(struct device_node *np);
--
2.0.0

2015-04-25 12:17:11

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 23/32] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks

This extends iommu_table_group_ops by a set of callbacks to support
dynamic DMA windows management.

create_table() creates a TCE table with specific parameters.
It receives an iommu_table_group to know the node ID in order to allocate
the TCE table memory closer to the PHB. The exact format of the allocated
multi-level table might also be specific to the PHB model (not
the case now though).
This callback calculates the DMA window offset on a PCI bus from @num
and stores it in the just created table.

set_window() sets the window at the specified TVT index + @num on the PHB.

unset_window() unsets the window from specified TVT.

This adds a free() callback to iommu_table_ops to free the memory
(potentially a tree of tables) allocated for the TCE table.

create_table() and free() are supposed to be called once per
VFIO container and set_window()/unset_window() are supposed to be
called for every group in a container.

This adds IOMMU capabilities to iommu_table_group such as the default
32bit window parameters and others.
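
A hedged sketch of the calling sequence these callbacks are designed for
(the actual VFIO code arrives later in the series; the wrapper name is
made up for illustration): create the table once per container, program it
into a group's TVT, and free it if programming fails.

/* Sketch: add DMA window @num to a group using the new callbacks. */
static long example_add_window(struct iommu_table_group *grp, int num,
		__u32 page_shift, __u64 window_size, __u32 levels,
		struct iommu_table *tbl)
{
	long ret;

	ret = grp->ops->create_table(grp, num, page_shift, window_size,
			levels, tbl);
	if (ret)
		return ret;

	ret = grp->ops->set_window(grp, num, tbl);
	if (ret)
		tbl->it_ops->free(tbl);	/* undo the allocation on failure */

	return ret;
}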

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/include/asm/iommu.h | 19 ++++++++
arch/powerpc/platforms/powernv/pci-ioda.c | 75 ++++++++++++++++++++++++++---
arch/powerpc/platforms/powernv/pci-p5ioc2.c | 12 +++--
3 files changed, 96 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 0f50ee2..7694546 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -70,6 +70,7 @@ struct iommu_table_ops {
/* get() returns a physical address */
unsigned long (*get)(struct iommu_table *tbl, long index);
void (*flush)(struct iommu_table *tbl);
+ void (*free)(struct iommu_table *tbl);
};

/* These are used by VIO */
@@ -148,6 +149,17 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
struct iommu_table_group;

struct iommu_table_group_ops {
+ long (*create_table)(struct iommu_table_group *table_group,
+ int num,
+ __u32 page_shift,
+ __u64 window_size,
+ __u32 levels,
+ struct iommu_table *tbl);
+ long (*set_window)(struct iommu_table_group *table_group,
+ int num,
+ struct iommu_table *tblnew);
+ long (*unset_window)(struct iommu_table_group *table_group,
+ int num);
/*
* Switches ownership from the kernel itself to an external
* user. While onwership is taken, the kernel cannot use IOMMU itself.
@@ -160,6 +172,13 @@ struct iommu_table_group {
#ifdef CONFIG_IOMMU_API
struct iommu_group *group;
#endif
+ /* Some key properties of IOMMU */
+ __u32 tce32_start;
+ __u32 tce32_size;
+ __u64 pgsizes; /* Bitmap of supported page sizes */
+ __u32 max_dynamic_windows_supported;
+ __u32 max_levels;
+
struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
struct iommu_table_group_ops *ops;
};
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index cc1d09c..4828837 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -24,6 +24,7 @@
#include <linux/msi.h>
#include <linux/memblock.h>
#include <linux/iommu.h>
+#include <linux/sizes.h>

#include <asm/sections.h>
#include <asm/io.h>
@@ -1846,6 +1847,7 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
#endif
.clear = pnv_ioda2_tce_free,
.get = pnv_tce_get,
+ .free = pnv_pci_free_table,
};

static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
@@ -1936,6 +1938,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
TCE_PCI_SWINV_PAIR);

tbl->it_ops = &pnv_ioda1_iommu_ops;
+ pe->table_group.tce32_start = tbl->it_offset << tbl->it_page_shift;
+ pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
iommu_init_table(tbl, phb->hose->node);

if (pe->flags & PNV_IODA_PE_DEV) {
@@ -1961,7 +1965,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
}

static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
- struct iommu_table *tbl)
+ int num, struct iommu_table *tbl)
{
struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
table_group);
@@ -1972,9 +1976,10 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
const __u64 win_size = tbl->it_size << tbl->it_page_shift;

- pe_info(pe, "Setting up window at %llx..%llx "
+ pe_info(pe, "Setting up window#%d at %llx..%llx "
"pgsize=0x%x tablesize=0x%lx "
"levels=%d levelsize=%x\n",
+ num,
start_addr, start_addr + win_size - 1,
1UL << tbl->it_page_shift, tbl->it_size << 3,
tbl->it_indirect_levels + 1, tbl->it_level_size << 3);
@@ -1987,7 +1992,7 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
*/
rc = opal_pci_map_pe_dma_window(phb->opal_id,
pe->pe_number,
- pe->pe_number << 1,
+ (pe->pe_number << 1) + num,
tbl->it_indirect_levels + 1,
__pa(tbl->it_base),
size << 3,
@@ -2000,7 +2005,7 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
pnv_pci_ioda2_tvt_invalidate(pe);

/* Store fully initialized *tbl (may be external) in PE */
- pe->table_group.tables[0] = *tbl;
+ pe->table_group.tables[num] = *tbl;

return 0;
fail:
@@ -2061,6 +2066,53 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
}

#ifdef CONFIG_IOMMU_API
+static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
+ int num, __u32 page_shift, __u64 window_size, __u32 levels,
+ struct iommu_table *tbl)
+{
+ struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
+ table_group);
+ int nid = pe->phb->hose->node;
+ __u64 bus_offset = num ? pe->tce_bypass_base : 0;
+ long ret;
+
+ ret = pnv_pci_create_table(table_group, nid, bus_offset, page_shift,
+ window_size, levels, tbl);
+ if (ret)
+ return ret;
+
+ tbl->it_ops = &pnv_ioda2_iommu_ops;
+ if (pe->tce_inval_reg)
+ tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
+
+ return 0;
+}
+
+static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
+ int num)
+{
+ struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
+ table_group);
+ struct pnv_phb *phb = pe->phb;
+ struct iommu_table *tbl = &pe->table_group.tables[num];
+ long ret;
+
+ pe_info(pe, "Removing DMA window #%d\n", num);
+
+ ret = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
+ (pe->pe_number << 1) + num,
+ 0/* levels */, 0/* table address */,
+ 0/* table size */, 0/* page size */);
+ if (ret)
+ pe_warn(pe, "Unmapping failed, ret = %ld\n", ret);
+ else
+ pnv_pci_ioda2_tvt_invalidate(pe);
+
+ memset(tbl, 0, sizeof(*tbl));
+
+ return ret;
+}
+
static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
{
struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
@@ -2080,6 +2132,9 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
}

static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
+ .create_table = pnv_pci_ioda2_create_table,
+ .set_window = pnv_pci_ioda2_set_window,
+ .unset_window = pnv_pci_ioda2_unset_window,
.take_ownership = pnv_ioda2_take_ownership,
.release_ownership = pnv_ioda2_release_ownership,
};
@@ -2102,8 +2157,16 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
phb->ioda.m32_pci_base);

+ pe->table_group.tce32_start = 0;
+ pe->table_group.tce32_size = phb->ioda.m32_pci_base;
+ pe->table_group.max_dynamic_windows_supported =
+ IOMMU_TABLE_GROUP_MAX_TABLES;
+ pe->table_group.max_levels = POWERNV_IOMMU_MAX_LEVELS;
+ pe->table_group.pgsizes = SZ_4K | SZ_64K | SZ_16M;
+
rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
- 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base,
+ pe->table_group.tce32_start, IOMMU_PAGE_SHIFT_4K,
+ pe->table_group.tce32_size,
POWERNV_IOMMU_DEFAULT_LEVELS, tbl);
if (rc) {
pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
@@ -2119,7 +2182,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
pe->table_group.ops = &pnv_pci_ioda2_ops;
#endif

- rc = pnv_pci_ioda2_set_window(&pe->table_group, tbl);
+ rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
if (rc) {
pe_err(pe, "Failed to configure 32-bit TCE table,"
" err %ld\n", rc);
diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
index 7a6fd92..d9de4c7 100644
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
@@ -116,6 +116,8 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
u64 phb_id;
int64_t rc;
static int primary = 1;
+ struct iommu_table_group *table_group;
+ struct iommu_table *tbl;

pr_info(" Initializing p5ioc2 PHB %s\n", np->full_name);

@@ -181,14 +183,16 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
pnv_pci_init_p5ioc2_msis(phb);

/* Setup iommu */
- phb->p5ioc2.table_group.tables[0].it_table_group =
- &phb->p5ioc2.table_group;
+ table_group = &phb->p5ioc2.table_group;
+ tbl = &phb->p5ioc2.table_group.tables[0];
+ tbl->it_table_group = table_group;

/* Setup TCEs */
phb->dma_dev_setup = pnv_pci_p5ioc2_dma_dev_setup;
- pnv_pci_setup_iommu_table(&phb->p5ioc2.table_group.tables[0],
- tce_mem, tce_size, 0,
+ pnv_pci_setup_iommu_table(tbl, tce_mem, tce_size, 0,
IOMMU_PAGE_SHIFT_4K);
+ table_group->tce32_start = tbl->it_offset << tbl->it_page_shift;
+ table_group->tce32_size = tbl->it_size << tbl->it_page_shift;
}

void __init pnv_pci_init_p5ioc2_hub(struct device_node *np)
--
2.0.0

2015-04-25 12:17:23

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 24/32] powerpc/powernv/ioda2: Use new helpers to do proper cleanup on PE release

The existing code programmed TVT#0 with some address and then
immediately released that memory.

This makes use of pnv_pci_ioda2_unset_window() and
pnv_pci_ioda2_set_bypass() which do correct resource release and
TVT update.
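
For reference, the release path now boils down to the following sequence
(a condensed sketch of the new code below; error handling omitted):

    /* clear TVE#0 (the default 32-bit window) and invalidate the TVT cache */
    pnv_pci_ioda2_unset_window(&pe->table_group, 0);
    /* clear TVE#1 (the 64-bit bypass window) */
    pnv_pci_ioda2_set_bypass(pe, false);
    /* finally release the TCE table pages themselves */
    pnv_pci_free_table(&pe->table_group.tables[0]);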

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/platforms/powernv/pci-ioda.c | 33 ++++++++++---------------------
1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 4828837..2a4b2b2 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1281,34 +1281,21 @@ m64_failed:
return -EBUSY;
}

+static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
+ int num);
+static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
+
static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe *pe)
{
- struct pci_bus *bus;
- struct pci_controller *hose;
- struct pnv_phb *phb;
- struct iommu_table *tbl;
- unsigned long addr;
- int64_t rc;
+ long rc;

- bus = dev->bus;
- hose = pci_bus_to_host(bus);
- phb = hose->private_data;
- tbl = &pe->table_group.tables[0];
- addr = tbl->it_base;
-
- opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
- pe->pe_number << 1, 1, __pa(addr),
- 0, 0x1000);
-
- rc = opal_pci_map_pe_dma_window_real(pe->phb->opal_id,
- pe->pe_number,
- (pe->pe_number << 1) + 1,
- pe->tce_bypass_base,
- 0);
+ rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
if (rc)
- pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
+ pe_warn(pe, "OPAL error %ld release default DMA window\n", rc);

- pnv_pci_free_table(tbl);
+ pnv_pci_ioda2_set_bypass(pe, false);
+
+ pnv_pci_free_table(&pe->table_group.tables[0]);
}

static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
--
2.0.0

2015-04-25 12:17:27

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 25/32] vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework ownership

Previously the IOMMU user (VFIO) would take control over the IOMMU table
belonging to a specific IOMMU group. This approach did not allow sharing
tables between IOMMU groups attached to the same container.

This introduces a new IOMMU ownership flavour where the user does not
just control the existing IOMMU table but can remove/create tables on demand.
If an IOMMU implements take/release_ownership() callbacks, this lets
the user have full control over the IOMMU group. When the ownership is taken,
the platform code removes all the windows so the caller must create them.
Before returning the ownership back to the platform code, VFIO
unprograms and removes all the tables it created.

This changes IODA2's ownership handlers to remove the existing table
rather than manipulate the existing one. From now on,
iommu_take_ownership() and iommu_release_ownership() are only called
from the vfio_iommu_spapr_tce driver.

In tce_iommu_detach_group(), this copies an iommu_table descriptor to the stack
as IODA2's unset_window() clears the descriptor embedded in the PE
and we would not be able to free the table afterwards.
This is a transitional hack and following patches will replace this code
anyway.

Old-style ownership is still supported allowing VFIO to run on older
P5IOC2 and IODA IO controllers.

No change in userspace-visible behaviour is expected. Since it recreates
TCE tables on each ownership change, related kernel traces will appear
more often.
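
For reference, the resulting ownership flow on IODA2 looks roughly like this
(a simplified sketch of the call sequence, not literal kernel code; error
handling omitted):

    /* VFIO takes the group: the platform drops the default window and bypass */
    table_group->ops->take_ownership(table_group);

    /* VFIO creates and programs whatever windows it needs */
    ret = table_group->ops->create_table(table_group, 0 /* window# */,
            IOMMU_PAGE_SHIFT_4K, table_group->tce32_size,
            1 /* levels */, &table_group->tables[0]);
    if (!ret)
        ret = table_group->ops->set_window(table_group, 0,
                &table_group->tables[0]);

    /* ... on detach, VFIO unprograms and frees the windows it created ... */
    table_group->ops->unset_window(table_group, 0);
    if (table_group->tables[0].it_ops->free)
        table_group->tables[0].it_ops->free(&table_group->tables[0]);

    /* the platform then recreates the default window and reenables bypass */
    table_group->ops->release_ownership(table_group);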

Signed-off-by: Alexey Kardashevskiy <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
---
Changes:
v9:
* fixed crash in tce_iommu_detach_group() on tbl->it_ops->free as
tce_iommu_attach_group() used to initialize the table from a descriptor
on the stack (it does not matter for the series as this bit is changed later anyway
but it ruins bisectability)

v6:
* fixed commit log that VFIO removes tables before passing ownership
back to the platform code, not userspace

---
arch/powerpc/platforms/powernv/pci-ioda.c | 27 +++++++++++++++++++++++--
drivers/vfio/vfio_iommu_spapr_tce.c | 33 +++++++++++++++++++++++++++++--
2 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 2a4b2b2..45bc131 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2105,16 +2105,39 @@ static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
table_group);

- iommu_take_ownership(&table_group->tables[0]);
pnv_pci_ioda2_set_bypass(pe, false);
+ pnv_pci_ioda2_unset_window(&pe->table_group, 0);
+ pnv_pci_free_table(&pe->table_group.tables[0]);
}

static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
{
struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
table_group);
+ struct iommu_table *tbl = &pe->table_group.tables[0];
+ int64_t rc;
+
+ rc = pnv_pci_ioda2_create_table(&pe->table_group, 0,
+ IOMMU_PAGE_SHIFT_4K,
+ pe->phb->ioda.m32_pci_base,
+ POWERNV_IOMMU_DEFAULT_LEVELS, tbl);
+ if (rc) {
+ pe_err(pe, "Failed to create 32-bit TCE table, err %ld",
+ rc);
+ return;
+ }
+
+ tbl->it_table_group = &pe->table_group;
+ iommu_init_table(tbl, pe->phb->hose->node);
+
+ rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
+ if (rc) {
+ pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
+ rc);
+ pnv_pci_free_table(tbl);
+ return;
+ }

- iommu_release_ownership(&table_group->tables[0]);
pnv_pci_ioda2_set_bypass(pe, true);
}

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 2d51bbf..892a584 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -569,6 +569,10 @@ static int tce_iommu_attach_group(void *iommu_data,
if (!table_group->ops || !table_group->ops->take_ownership ||
!table_group->ops->release_ownership) {
ret = tce_iommu_take_ownership(table_group);
+ } else if (!table_group->ops->create_table ||
+ !table_group->ops->set_window) {
+ WARN_ON_ONCE(1);
+ ret = -EFAULT;
} else {
/*
* Disable iommu bypass, otherwise the user can DMA to all of
@@ -576,7 +580,15 @@ static int tce_iommu_attach_group(void *iommu_data,
* the pages that has been explicitly mapped into the iommu
*/
table_group->ops->take_ownership(table_group);
- ret = 0;
+ ret = table_group->ops->create_table(table_group,
+ 0, /* window number */
+ IOMMU_PAGE_SHIFT_4K,
+ table_group->tce32_size,
+ 1, /* default levels */
+ &table_group->tables[0]);
+ if (!ret)
+ ret = table_group->ops->set_window(table_group, 0,
+ &table_group->tables[0]);
}

if (ret)
@@ -595,6 +607,7 @@ static void tce_iommu_detach_group(void *iommu_data,
{
struct tce_container *container = iommu_data;
struct iommu_table_group *table_group;
+ long i;

mutex_lock(&container->lock);
if (iommu_group != container->grp) {
@@ -620,8 +633,24 @@ static void tce_iommu_detach_group(void *iommu_data,
/* Kernel owns the device now, we can restore bypass */
if (!table_group->ops || !table_group->ops->release_ownership)
tce_iommu_release_ownership(container, table_group);
- else
+ else if (!table_group->ops->unset_window)
+ WARN_ON_ONCE(1);
+ else {
+ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
+ struct iommu_table tbl = table_group->tables[i];
+
+ if (!tbl.it_size)
+ continue;
+
+ table_group->ops->unset_window(table_group, i);
+ tce_iommu_clear(container, &tbl,
+ tbl.it_offset, tbl.it_size);
+ if (tbl.it_ops->free)
+ tbl.it_ops->free(&tbl);
+ }
+
table_group->ops->release_ownership(table_group);
+ }

unlock_exit:
mutex_unlock(&container->lock);
--
2.0.0

2015-04-25 12:17:19

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table

In order to support memory pre-registration, we need a way to track
the use of every registered memory region and only allow unregistration
if a region is not in use anymore. So we need a way to tell which
region a just-cleared TCE came from.

This adds a userspace view of the TCE table into the iommu_table struct.
It contains one userspace address per TCE entry. The table is only
allocated when ownership over an IOMMU group is taken, which means
it is only used from outside of the powernv code (such as VFIO).
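
The intended use (a sketch based on how VFIO consumes this later in the series;
mm_iommu_lookup()/mm_iommu_mapped_update() come from the pre-registration
patch) is to remember the userspace address behind every mapped TCE and to
look it up again when the TCE is cleared:

    /* on map: remember which userspace address backs this entry */
    unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
    if (pua)
        *pua = tce;

    /* on clear: find the registered region and drop its mapped counter */
    pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
    if (pua && *pua) {
        struct mm_iommu_table_group_mem_t *mem =
                mm_iommu_lookup(*pua, IOMMU_PAGE_SIZE(tbl));
        if (mem)
            mm_iommu_mapped_update(mem, false);
        *pua = 0;
    }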

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
v9:
* fixed code flow in error cases added in v8

v8:
* added ENOMEM on failed vzalloc()
---
arch/powerpc/include/asm/iommu.h | 6 ++++++
arch/powerpc/kernel/iommu.c | 18 ++++++++++++++++++
arch/powerpc/platforms/powernv/pci-ioda.c | 22 ++++++++++++++++++++--
3 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7694546..1472de3 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -111,9 +111,15 @@ struct iommu_table {
unsigned long *it_map; /* A simple allocation bitmap for now */
unsigned long it_page_shift;/* table iommu page size */
struct iommu_table_group *it_table_group;
+ unsigned long *it_userspace; /* userspace view of the table */
struct iommu_table_ops *it_ops;
};

+#define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
+ ((tbl)->it_userspace ? \
+ &((tbl)->it_userspace[(entry) - (tbl)->it_offset]) : \
+ NULL)
+
/* Pure 2^n version of get_order */
static inline __attribute_const__
int get_iommu_order(unsigned long size, struct iommu_table *tbl)
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 2eaba0c..74a3f52 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -38,6 +38,7 @@
#include <linux/pci.h>
#include <linux/iommu.h>
#include <linux/sched.h>
+#include <linux/vmalloc.h>
#include <asm/io.h>
#include <asm/prom.h>
#include <asm/iommu.h>
@@ -739,6 +740,8 @@ void iommu_reset_table(struct iommu_table *tbl, const char *node_name)
free_pages((unsigned long) tbl->it_map, order);
}

+ WARN_ON(tbl->it_userspace);
+
memset(tbl, 0, sizeof(*tbl));
}

@@ -1016,6 +1019,7 @@ int iommu_take_ownership(struct iommu_table *tbl)
{
unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
int ret = 0;
+ unsigned long *uas;

/*
* VFIO does not control TCE entries allocation and the guest
@@ -1027,6 +1031,10 @@ int iommu_take_ownership(struct iommu_table *tbl)
if (!tbl->it_ops->exchange)
return -EINVAL;

+ uas = vzalloc(sizeof(*uas) * tbl->it_size);
+ if (!uas)
+ return -ENOMEM;
+
spin_lock_irqsave(&tbl->large_pool.lock, flags);
for (i = 0; i < tbl->nr_pools; i++)
spin_lock(&tbl->pools[i].lock);
@@ -1044,6 +1052,13 @@ int iommu_take_ownership(struct iommu_table *tbl)
memset(tbl->it_map, 0xff, sz);
}

+ if (ret) {
+ vfree(uas);
+ } else {
+ BUG_ON(tbl->it_userspace);
+ tbl->it_userspace = uas;
+ }
+
for (i = 0; i < tbl->nr_pools; i++)
spin_unlock(&tbl->pools[i].lock);
spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
@@ -1056,6 +1071,9 @@ void iommu_release_ownership(struct iommu_table *tbl)
{
unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;

+ vfree(tbl->it_userspace);
+ tbl->it_userspace = NULL;
+
spin_lock_irqsave(&tbl->large_pool.lock, flags);
for (i = 0; i < tbl->nr_pools; i++)
spin_lock(&tbl->pools[i].lock);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 45bc131..e0be556 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -25,6 +25,7 @@
#include <linux/memblock.h>
#include <linux/iommu.h>
#include <linux/sizes.h>
+#include <linux/vmalloc.h>

#include <asm/sections.h>
#include <asm/io.h>
@@ -1827,6 +1828,14 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
}

+void pnv_pci_ioda2_free_table(struct iommu_table *tbl)
+{
+ vfree(tbl->it_userspace);
+ tbl->it_userspace = NULL;
+
+ pnv_pci_free_table(tbl);
+}
+
static struct iommu_table_ops pnv_ioda2_iommu_ops = {
.set = pnv_ioda2_tce_build,
#ifdef CONFIG_IOMMU_API
@@ -1834,7 +1843,7 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
#endif
.clear = pnv_ioda2_tce_free,
.get = pnv_tce_get,
- .free = pnv_pci_free_table,
+ .free = pnv_pci_ioda2_free_table,
};

static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
@@ -2062,12 +2071,21 @@ static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
int nid = pe->phb->hose->node;
__u64 bus_offset = num ? pe->tce_bypass_base : 0;
long ret;
+ unsigned long *uas, uas_cb = sizeof(*uas) * (window_size >> page_shift);
+
+ uas = vzalloc(uas_cb);
+ if (!uas)
+ return -ENOMEM;

ret = pnv_pci_create_table(table_group, nid, bus_offset, page_shift,
window_size, levels, tbl);
- if (ret)
+ if (ret) {
+ vfree(uas);
return ret;
+ }

+ BUG_ON(tbl->it_userspace);
+ tbl->it_userspace = uas;
tbl->it_ops = &pnv_ioda2_iommu_ops;
if (pe->tce_inval_reg)
tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
--
2.0.0

2015-04-25 12:17:00

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table

This adds a way for the IOMMU user to know how much memory a new table will
use so it can be accounted against the locked_vm limit before the allocation
happens.

This stores the allocated table size in pnv_pci_create_table()
so the locked_vm counter can be updated correctly when a table is
being disposed.

This defines an iommu_table_group_ops callback to let VFIO know
how much memory will be locked if a table is created.
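
The expected calling sequence on the VFIO side is roughly (a sketch mirroring
the helper added later in this series):

    table_size = table_group->ops->get_table_size(page_shift,
            window_size, levels);
    if (!table_size)
        return -EINVAL;     /* unsupported window geometry */

    ret = try_increment_locked_vm(table_size >> PAGE_SHIFT);
    if (ret)
        return ret;

    ret = table_group->ops->create_table(table_group, num, page_shift,
            window_size, levels, tbl);
    if (ret)
        decrement_locked_vm(table_size >> PAGE_SHIFT);
    /* on success, tbl->it_allocated_size is expected to equal table_size */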

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
v9:
* reimplemented the whole patch
---
arch/powerpc/include/asm/iommu.h | 5 +++++
arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++++++++
arch/powerpc/platforms/powernv/pci.c | 36 +++++++++++++++++++++++++++++++
arch/powerpc/platforms/powernv/pci.h | 2 ++
4 files changed, 57 insertions(+)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 1472de3..9844c106 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -99,6 +99,7 @@ struct iommu_table {
unsigned long it_size; /* Size of iommu table in entries */
unsigned long it_indirect_levels;
unsigned long it_level_size;
+ unsigned long it_allocated_size;
unsigned long it_offset; /* Offset into global table */
unsigned long it_base; /* mapped address of tce table */
unsigned long it_index; /* which iommu table this is */
@@ -155,6 +156,10 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
struct iommu_table_group;

struct iommu_table_group_ops {
+ unsigned long (*get_table_size)(
+ __u32 page_shift,
+ __u64 window_size,
+ __u32 levels);
long (*create_table)(struct iommu_table_group *table_group,
int num,
__u32 page_shift,
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e0be556..7f548b4 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2062,6 +2062,18 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
}

#ifdef CONFIG_IOMMU_API
+static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift,
+ __u64 window_size, __u32 levels)
+{
+ unsigned long ret = pnv_get_table_size(page_shift, window_size, levels);
+
+ if (!ret)
+ return ret;
+
+ /* Add size of it_userspace */
+ return ret + (window_size >> page_shift) * sizeof(unsigned long);
+}
+
static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
int num, __u32 page_shift, __u64 window_size, __u32 levels,
struct iommu_table *tbl)
@@ -2086,6 +2098,7 @@ static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,

BUG_ON(tbl->it_userspace);
tbl->it_userspace = uas;
+ tbl->it_allocated_size += uas_cb;
tbl->it_ops = &pnv_ioda2_iommu_ops;
if (pe->tce_inval_reg)
tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
@@ -2160,6 +2173,7 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
}

static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
+ .get_table_size = pnv_pci_ioda2_get_table_size,
.create_table = pnv_pci_ioda2_create_table,
.set_window = pnv_pci_ioda2_set_window,
.unset_window = pnv_pci_ioda2_unset_window,
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index fc129c4..1b5b48a 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -662,6 +662,38 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
tbl->it_type = TCE_PCI;
}

+unsigned long pnv_get_table_size(__u32 page_shift,
+ __u64 window_size, __u32 levels)
+{
+ unsigned long bytes = 0;
+ const unsigned window_shift = ilog2(window_size);
+ unsigned entries_shift = window_shift - page_shift;
+ unsigned table_shift = entries_shift + 3;
+ unsigned long tce_table_size = max(0x1000UL, 1UL << table_shift);
+ unsigned long direct_table_size;
+
+ if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS) ||
+ (window_size > memory_hotplug_max()) ||
+ !is_power_of_2(window_size))
+ return 0;
+
+ /* Calculate a direct table size from window_size and levels */
+ entries_shift = ROUND_UP(entries_shift, levels) / levels;
+ table_shift = entries_shift + 3;
+ table_shift = max_t(unsigned, table_shift, PAGE_SHIFT);
+ direct_table_size = 1UL << table_shift;
+
+ for ( ; levels; --levels) {
+ bytes += ROUND_UP(tce_table_size, direct_table_size);
+
+ tce_table_size /= direct_table_size;
+ tce_table_size <<= 3;
+ tce_table_size = ROUND_UP(tce_table_size, direct_table_size);
+ }
+
+ return bytes;
+}
+
static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
unsigned levels, unsigned long limit,
unsigned long *tce_table_allocated)
@@ -741,6 +773,10 @@ long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
return -ENOMEM;
}

+ tbl->it_allocated_size = pnv_get_table_size(page_shift, window_size,
+ levels);
+ WARN_ON(!tbl->it_allocated_size);
+
/* Setup linux iommu table */
pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, bus_offset,
page_shift);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 3d1ff584..ce4bc3c 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -224,6 +224,8 @@ extern long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
__u64 bus_offset, __u32 page_shift, __u64 window_size,
__u32 levels, struct iommu_table *tbl);
extern void pnv_pci_free_table(struct iommu_table *tbl);
+extern unsigned long pnv_get_table_size(__u32 page_shift,
+ __u64 window_size, __u32 levels);
extern void pnv_pci_init_p5ioc2_hub(struct device_node *np);
extern void pnv_pci_init_ioda_hub(struct device_node *np);
extern void pnv_pci_init_ioda2_phb(struct device_node *np);
--
2.0.0

2015-04-25 12:18:59

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 28/32] powerpc/mmu: Add userspace-to-physical addresses translation cache

We are adding support for DMA memory pre-registration to be used in
conjunction with VFIO. The idea is that the userspace which is going to
run a guest may want to pre-register a user space memory region so
it all gets pinned once and never goes away. Once this is done,
a hypervisor will not have to pin/unpin pages on every DMA map/unmap
request. This is going to help with multiple pinning of the same memory
and in-kernel acceleration of DMA requests.

This adds a list of memory regions to mm_context_t. Each region consists
of a header and a list of physical addresses. This adds API to:
1. register/unregister memory regions;
2. do final cleanup (which puts all pre-registered pages);
3. do userspace to physical address translation;
4. manage a mapped pages counter; when it is zero, it is safe to
unregister the region.

Multiple registration of the same region is allowed, kref is used to
track the number of registrations.
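
A rough sketch of how an IOMMU user such as VFIO is expected to drive the API
(error handling trimmed; the real callers appear in the following patches):

    struct mm_iommu_table_group_mem_t *mem;
    unsigned long hpa;

    /* registration: pin the whole region once and account it */
    ret = mm_iommu_alloc(ua, size >> PAGE_SHIFT, &mem);

    /* per-TCE translation while the region stays registered */
    mem = mm_iommu_lookup(ua, IOMMU_PAGE_SIZE(tbl));
    if (mem && !mm_iommu_ua_to_hpa(mem, ua, &hpa))
        mm_iommu_mapped_update(mem, true);  /* a TCE now references it */

    /* ... when the TCE is cleared ... */
    mm_iommu_mapped_update(mem, false);

    /* unregistration fails with -EBUSY while any page is still mapped */
    ret = mm_iommu_put(mem);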

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
v8:
* s/mm_iommu_table_group_mem_t/struct mm_iommu_table_group_mem_t/
* fixed error fallback look (s/[i]/[j]/)
---
arch/powerpc/include/asm/mmu-hash64.h | 3 +
arch/powerpc/include/asm/mmu_context.h | 17 +++
arch/powerpc/mm/Makefile | 1 +
arch/powerpc/mm/mmu_context_hash64.c | 6 +
arch/powerpc/mm/mmu_context_hash64_iommu.c | 215 +++++++++++++++++++++++++++++
5 files changed, 242 insertions(+)
create mode 100644 arch/powerpc/mm/mmu_context_hash64_iommu.c

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 1da6a81..a82f534 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -536,6 +536,9 @@ typedef struct {
/* for 4K PTE fragment support */
void *pte_frag;
#endif
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+ struct list_head iommu_group_mem_list;
+#endif
} mm_context_t;


diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 73382eb..d6116ca 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -16,6 +16,23 @@
*/
extern int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
extern void destroy_context(struct mm_struct *mm);
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+struct mm_iommu_table_group_mem_t;
+
+extern bool mm_iommu_preregistered(void);
+extern long mm_iommu_alloc(unsigned long ua, unsigned long entries,
+ struct mm_iommu_table_group_mem_t **pmem);
+extern struct mm_iommu_table_group_mem_t *mm_iommu_get(unsigned long ua,
+ unsigned long entries);
+extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
+extern void mm_iommu_cleanup(mm_context_t *ctx);
+extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
+ unsigned long size);
+extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
+ unsigned long ua, unsigned long *hpa);
+extern long mm_iommu_mapped_update(struct mm_iommu_table_group_mem_t *mem,
+ bool inc);
+#endif

extern void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next);
extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 9c8770b..e216704 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -36,3 +36,4 @@ obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage-prot.o
obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
obj-$(CONFIG_HIGHMEM) += highmem.o
obj-$(CONFIG_PPC_COPRO_BASE) += copro_fault.o
+obj-$(CONFIG_SPAPR_TCE_IOMMU) += mmu_context_hash64_iommu.o
diff --git a/arch/powerpc/mm/mmu_context_hash64.c b/arch/powerpc/mm/mmu_context_hash64.c
index 178876ae..eb3080c 100644
--- a/arch/powerpc/mm/mmu_context_hash64.c
+++ b/arch/powerpc/mm/mmu_context_hash64.c
@@ -89,6 +89,9 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
#ifdef CONFIG_PPC_64K_PAGES
mm->context.pte_frag = NULL;
#endif
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+ INIT_LIST_HEAD_RCU(&mm->context.iommu_group_mem_list);
+#endif
return 0;
}

@@ -132,6 +135,9 @@ static inline void destroy_pagetable_page(struct mm_struct *mm)

void destroy_context(struct mm_struct *mm)
{
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+ mm_iommu_cleanup(&mm->context);
+#endif

#ifdef CONFIG_PPC_ICSWX
drop_cop(mm->context.acop, mm);
diff --git a/arch/powerpc/mm/mmu_context_hash64_iommu.c b/arch/powerpc/mm/mmu_context_hash64_iommu.c
new file mode 100644
index 0000000..af7668c
--- /dev/null
+++ b/arch/powerpc/mm/mmu_context_hash64_iommu.c
@@ -0,0 +1,215 @@
+/*
+ * IOMMU helpers in MMU context.
+ *
+ * Copyright (C) 2015 IBM Corp. <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/rculist.h>
+#include <linux/vmalloc.h>
+#include <linux/kref.h>
+#include <asm/mmu_context.h>
+
+struct mm_iommu_table_group_mem_t {
+ struct list_head next;
+ struct rcu_head rcu;
+ struct kref kref; /* one reference per VFIO container */
+ atomic_t mapped; /* number of currently mapped pages */
+ u64 ua; /* userspace address */
+ u64 entries; /* number of entries in hpas[] */
+ u64 *hpas; /* vmalloc'ed */
+};
+
+bool mm_iommu_preregistered(void)
+{
+ if (!current || !current->mm)
+ return false;
+
+ return !list_empty(&current->mm->context.iommu_group_mem_list);
+}
+EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
+
+long mm_iommu_alloc(unsigned long ua, unsigned long entries,
+ struct mm_iommu_table_group_mem_t **pmem)
+{
+ struct mm_iommu_table_group_mem_t *mem;
+ long i, j;
+ struct page *page = NULL;
+
+ list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
+ next) {
+ if ((mem->ua == ua) && (mem->entries == entries))
+ return -EBUSY;
+
+ /* Overlap? */
+ if ((mem->ua < (ua + (entries << PAGE_SHIFT))) &&
+ (ua < (mem->ua + (mem->entries << PAGE_SHIFT))))
+ return -EINVAL;
+ }
+
+ mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+ if (!mem)
+ return -ENOMEM;
+
+ mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
+ if (!mem->hpas) {
+ kfree(mem);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < entries; ++i) {
+ if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+ 1/* pages */, 1/* iswrite */, &page)) {
+ for (j = 0; j < i; ++j)
+ put_page(pfn_to_page(
+ mem->hpas[j] >> PAGE_SHIFT));
+ vfree(mem->hpas);
+ kfree(mem);
+ return -EFAULT;
+ }
+
+ mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
+ }
+
+ kref_init(&mem->kref);
+ atomic_set(&mem->mapped, 0);
+ mem->ua = ua;
+ mem->entries = entries;
+ *pmem = mem;
+
+ list_add_rcu(&mem->next, &current->mm->context.iommu_group_mem_list);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(mm_iommu_alloc);
+
+static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem)
+{
+ long i;
+ struct page *page = NULL;
+
+ for (i = 0; i < mem->entries; ++i) {
+ if (!mem->hpas[i])
+ continue;
+
+ page = pfn_to_page(mem->hpas[i] >> PAGE_SHIFT);
+ if (!page)
+ continue;
+
+ put_page(page);
+ mem->hpas[i] = 0;
+ }
+}
+
+static void mm_iommu_free(struct rcu_head *head)
+{
+ struct mm_iommu_table_group_mem_t *mem = container_of(head,
+ struct mm_iommu_table_group_mem_t, rcu);
+
+ mm_iommu_unpin(mem);
+ vfree(mem->hpas);
+ kfree(mem);
+}
+
+static void mm_iommu_release(struct kref *kref)
+{
+ struct mm_iommu_table_group_mem_t *mem = container_of(kref,
+ struct mm_iommu_table_group_mem_t, kref);
+
+ list_del_rcu(&mem->next);
+ call_rcu(&mem->rcu, mm_iommu_free);
+}
+
+struct mm_iommu_table_group_mem_t *mm_iommu_get(unsigned long ua,
+ unsigned long entries)
+{
+ struct mm_iommu_table_group_mem_t *mem;
+
+ list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
+ next) {
+ if ((mem->ua == ua) && (mem->entries == entries)) {
+ kref_get(&mem->kref);
+ return mem;
+ }
+ }
+
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(mm_iommu_get);
+
+long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem)
+{
+ if (atomic_read(&mem->mapped))
+ return -EBUSY;
+
+ kref_put(&mem->kref, mm_iommu_release);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(mm_iommu_put);
+
+struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
+ unsigned long size)
+{
+ struct mm_iommu_table_group_mem_t *mem, *ret = NULL;
+
+ list_for_each_entry_rcu(mem,
+ &current->mm->context.iommu_group_mem_list,
+ next) {
+ if ((mem->ua <= ua) &&
+ (ua + size <= mem->ua +
+ (mem->entries << PAGE_SHIFT))) {
+ ret = mem;
+ break;
+ }
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(mm_iommu_lookup);
+
+long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
+ unsigned long ua, unsigned long *hpa)
+{
+ const long entry = (ua - mem->ua) >> PAGE_SHIFT;
+ u64 *va = &mem->hpas[entry];
+
+ if (entry >= mem->entries)
+ return -EFAULT;
+
+ *hpa = *va | (ua & ~PAGE_MASK);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
+
+long mm_iommu_mapped_update(struct mm_iommu_table_group_mem_t *mem, bool inc)
+{
+ long ret = 0;
+
+ if (inc)
+ atomic_inc(&mem->mapped);
+ else
+ ret = atomic_dec_if_positive(&mem->mapped);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(mm_iommu_mapped_update);
+
+void mm_iommu_cleanup(mm_context_t *ctx)
+{
+ while (!list_empty(&ctx->iommu_group_mem_list)) {
+ struct mm_iommu_table_group_mem_t *mem;
+
+ mem = list_first_entry(&ctx->iommu_group_mem_list,
+ struct mm_iommu_table_group_mem_t, next);
+ mm_iommu_release(&mem->kref);
+ }
+}
--
2.0.0

2015-04-25 12:16:38

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2

The existing implementation accounts the whole DMA window in
the locked_vm counter. This is going to be worse with multiple
containers and huge DMA windows. Also, real-time accounting would require
additional tracking of accounted pages due to the page size difference -
the IOMMU uses 4K pages and the system uses 4K or 64K pages.

Another issue is that actual page pinning/unpinning happens on every
DMA map/unmap request. This does not affect performance much now as
we spend far more time switching context between
guest/userspace/host, but it will start to matter when we add in-kernel
DMA map/unmap acceleration.

This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
The new IOMMU type deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
2 new ioctls to register/unregister DMA memory -
VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
which receive the user space address and size of a memory region which
needs to be pinned/unpinned and counted in locked_vm.
The new IOMMU type splits physical page pinning and TCE table updates into 2 different
operations. It requires 1) guest pages to be registered first and 2) subsequent
map/unmap requests to work only with pre-registered memory.
For the default single window case this means that the entire guest RAM
(instead of 2GB) needs to be pinned before using VFIO.
When a huge DMA window is added, no additional pinning will be
required; otherwise it would be guest RAM + 2GB.

The new memory registration ioctls are not supported by
VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
will require memory to be preregistered in order to work.

The accounting is done per user process.

This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
can do with v1 or v2 IOMMUs.
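
From the userspace side the expected flow is roughly as follows (a sketch;
container_fd, qemu_vaddr and ram_size are placeholders, vaddr and size must be
page aligned, group open/attach and the usual VFIO DMA map/unmap structs are
omitted):

    struct vfio_iommu_spapr_register_memory reg = {
        .argsz = sizeof(reg),
        .flags = 0,
        .vaddr = (__u64)(uintptr_t)qemu_vaddr,
        .size  = ram_size,
    };

    ioctl(container_fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU);
    ioctl(container_fd, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg);

    /* MAP/UNMAP_DMA now only update TCEs, pinning/accounting is already done */
    ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
    ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);

    ioctl(container_fd, VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY, &reg);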

Signed-off-by: Alexey Kardashevskiy <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
---
Changes:
v9:
* s/tce_get_hva_cached/tce_iommu_use_page_v2/

v7:
* now memory is registered per mm (i.e. process)
* moved memory registration code to powerpc/mmu
* merged "vfio: powerpc/spapr: Define v2 IOMMU" into this
* limited new ioctls to v2 IOMMU
* updated doc
* unsupported ioctls return -ENOTTY instead of -EPERM

v6:
* tce_get_hva_cached() returns hva via a pointer

v4:
* updated docs
* s/kzmalloc/vzalloc/
* in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
replaced offset with index
* renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory
and removed duplicating vfio_iommu_spapr_register_memory
---
Documentation/vfio.txt | 23 ++++
drivers/vfio/vfio_iommu_spapr_tce.c | 230 +++++++++++++++++++++++++++++++++++-
include/uapi/linux/vfio.h | 27 +++++
3 files changed, 274 insertions(+), 6 deletions(-)

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index 96978ec..94328c8 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -427,6 +427,29 @@ The code flow from the example above should be slightly changed:

....

+5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
+VFIO_IOMMU_DISABLE and implements 2 new ioctls:
+VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
+(which are unsupported in v1 IOMMU).
+
+PPC64 paravirtualized guests generate a lot of map/unmap requests,
+and the handling of those includes pinning/unpinning pages and updating
+mm::locked_vm counter to make sure we do not exceed the rlimit.
+The v2 IOMMU splits accounting and pinning into separate operations:
+
+- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
+receive a user space address and size of the block to be pinned.
+Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to
+be called with the exact address and size used for registering
+the memory block. The userspace is not expected to call these often.
+The ranges are stored in a linked list in a VFIO container.
+
+- VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
+IOMMU table and do not do pinning; instead these check that the userspace
+address is from pre-registered range.
+
+This separation helps in optimizing DMA for guests.
+
-------------------------------------------------------------------------------

[1] VFIO was originally an acronym for "Virtual Function I/O" in its
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 892a584..4cfc2c1 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -21,6 +21,7 @@
#include <linux/vfio.h>
#include <asm/iommu.h>
#include <asm/tce.h>
+#include <asm/mmu_context.h>

#define DRIVER_VERSION "0.1"
#define DRIVER_AUTHOR "[email protected]"
@@ -91,8 +92,58 @@ struct tce_container {
struct iommu_group *grp;
bool enabled;
unsigned long locked_pages;
+ bool v2;
};

+static long tce_unregister_pages(struct tce_container *container,
+ __u64 vaddr, __u64 size)
+{
+ long ret;
+ struct mm_iommu_table_group_mem_t *mem;
+
+ if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK))
+ return -EINVAL;
+
+ mem = mm_iommu_get(vaddr, size >> PAGE_SHIFT);
+ if (!mem)
+ return -EINVAL;
+
+ ret = mm_iommu_put(mem); /* undo kref_get() from mm_iommu_get() */
+ if (!ret)
+ ret = mm_iommu_put(mem);
+
+ return ret;
+}
+
+static long tce_register_pages(struct tce_container *container,
+ __u64 vaddr, __u64 size)
+{
+ long ret = 0;
+ struct mm_iommu_table_group_mem_t *mem;
+ unsigned long entries = size >> PAGE_SHIFT;
+
+ if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK) ||
+ ((vaddr + size) < vaddr))
+ return -EINVAL;
+
+ mem = mm_iommu_get(vaddr, entries);
+ if (!mem) {
+ ret = try_increment_locked_vm(entries);
+ if (ret)
+ return ret;
+
+ ret = mm_iommu_alloc(vaddr, entries, &mem);
+ if (ret) {
+ decrement_locked_vm(entries);
+ return ret;
+ }
+ }
+
+ container->enabled = true;
+
+ return 0;
+}
+
static bool tce_page_is_contained(struct page *page, unsigned page_shift)
{
/*
@@ -205,7 +256,7 @@ static void *tce_iommu_open(unsigned long arg)
{
struct tce_container *container;

- if (arg != VFIO_SPAPR_TCE_IOMMU) {
+ if ((arg != VFIO_SPAPR_TCE_IOMMU) && (arg != VFIO_SPAPR_TCE_v2_IOMMU)) {
pr_err("tce_vfio: Wrong IOMMU type\n");
return ERR_PTR(-EINVAL);
}
@@ -215,6 +266,7 @@ static void *tce_iommu_open(unsigned long arg)
return ERR_PTR(-ENOMEM);

mutex_init(&container->lock);
+ container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;

return container;
}
@@ -243,6 +295,47 @@ static void tce_iommu_unuse_page(struct tce_container *container,
put_page(page);
}

+static int tce_iommu_use_page_v2(unsigned long tce, unsigned long size,
+ unsigned long *phpa, struct mm_iommu_table_group_mem_t **pmem)
+{
+ long ret = 0;
+ struct mm_iommu_table_group_mem_t *mem;
+
+ mem = mm_iommu_lookup(tce, size);
+ if (!mem)
+ return -EINVAL;
+
+ ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
+ if (ret)
+ return -EINVAL;
+
+ *pmem = mem;
+
+ return 0;
+}
+
+static void tce_iommu_unuse_page_v2(struct iommu_table *tbl,
+ unsigned long entry)
+{
+ struct mm_iommu_table_group_mem_t *mem = NULL;
+ int ret;
+ unsigned long hpa = 0;
+ unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+
+ if (!pua || !current || !current->mm)
+ return;
+
+ ret = tce_iommu_use_page_v2(*pua, IOMMU_PAGE_SIZE(tbl),
+ &hpa, &mem);
+ if (ret)
+ pr_debug("%s: tce %lx at #%lx was not cached, ret=%d\n",
+ __func__, *pua, entry, ret);
+ if (mem)
+ mm_iommu_mapped_update(mem, false);
+
+ *pua = 0;
+}
+
static int tce_iommu_clear(struct tce_container *container,
struct iommu_table *tbl,
unsigned long entry, unsigned long pages)
@@ -261,6 +354,11 @@ static int tce_iommu_clear(struct tce_container *container,
if (direction == DMA_NONE)
continue;

+ if (container->v2) {
+ tce_iommu_unuse_page_v2(tbl, entry);
+ continue;
+ }
+
tce_iommu_unuse_page(container, oldtce);
}

@@ -327,6 +425,62 @@ static long tce_iommu_build(struct tce_container *container,
return ret;
}

+static long tce_iommu_build_v2(struct tce_container *container,
+ struct iommu_table *tbl,
+ unsigned long entry, unsigned long tce, unsigned long pages,
+ enum dma_data_direction direction)
+{
+ long i, ret = 0;
+ struct page *page;
+ unsigned long hpa;
+ enum dma_data_direction dirtmp;
+
+ for (i = 0; i < pages; ++i) {
+ struct mm_iommu_table_group_mem_t *mem = NULL;
+ unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl,
+ entry + i);
+
+ ret = tce_iommu_use_page_v2(tce, IOMMU_PAGE_SIZE(tbl),
+ &hpa, &mem);
+ if (ret)
+ break;
+
+ page = pfn_to_page(hpa >> PAGE_SHIFT);
+ if (!tce_page_is_contained(page, tbl->it_page_shift)) {
+ ret = -EPERM;
+ break;
+ }
+
+ /* Preserve offset within IOMMU page */
+ hpa |= tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK;
+ dirtmp = direction;
+
+ ret = iommu_tce_xchg(tbl, entry + i, &hpa, &dirtmp);
+ if (ret) {
+ /* dirtmp cannot be DMA_NONE here */
+ tce_iommu_unuse_page_v2(tbl, entry + i);
+ pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%ld\n",
+ __func__, entry << tbl->it_page_shift,
+ tce, ret);
+ break;
+ }
+
+ mm_iommu_mapped_update(mem, true);
+
+ if (dirtmp != DMA_NONE)
+ tce_iommu_unuse_page_v2(tbl, entry + i);
+
+ *pua = tce;
+
+ tce += IOMMU_PAGE_SIZE(tbl);
+ }
+
+ if (ret)
+ tce_iommu_clear(container, tbl, entry, i);
+
+ return ret;
+}
+
static long tce_iommu_ioctl(void *iommu_data,
unsigned int cmd, unsigned long arg)
{
@@ -338,6 +492,7 @@ static long tce_iommu_ioctl(void *iommu_data,
case VFIO_CHECK_EXTENSION:
switch (arg) {
case VFIO_SPAPR_TCE_IOMMU:
+ case VFIO_SPAPR_TCE_v2_IOMMU:
ret = 1;
break;
default:
@@ -425,11 +580,18 @@ static long tce_iommu_ioctl(void *iommu_data,
if (ret)
return ret;

- ret = tce_iommu_build(container, tbl,
- param.iova >> tbl->it_page_shift,
- param.vaddr,
- param.size >> tbl->it_page_shift,
- direction);
+ if (container->v2)
+ ret = tce_iommu_build_v2(container, tbl,
+ param.iova >> tbl->it_page_shift,
+ param.vaddr,
+ param.size >> tbl->it_page_shift,
+ direction);
+ else
+ ret = tce_iommu_build(container, tbl,
+ param.iova >> tbl->it_page_shift,
+ param.vaddr,
+ param.size >> tbl->it_page_shift,
+ direction);

iommu_flush_tce(tbl);

@@ -474,7 +636,60 @@ static long tce_iommu_ioctl(void *iommu_data,

return ret;
}
+ case VFIO_IOMMU_SPAPR_REGISTER_MEMORY: {
+ struct vfio_iommu_spapr_register_memory param;
+
+ if (!container->v2)
+ break;
+
+ minsz = offsetofend(struct vfio_iommu_spapr_register_memory,
+ size);
+
+ if (copy_from_user(&param, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (param.argsz < minsz)
+ return -EINVAL;
+
+ /* No flag is supported now */
+ if (param.flags)
+ return -EINVAL;
+
+ mutex_lock(&container->lock);
+ ret = tce_register_pages(container, param.vaddr, param.size);
+ mutex_unlock(&container->lock);
+
+ return ret;
+ }
+ case VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY: {
+ struct vfio_iommu_spapr_register_memory param;
+
+ if (!container->v2)
+ break;
+
+ minsz = offsetofend(struct vfio_iommu_spapr_register_memory,
+ size);
+
+ if (copy_from_user(&param, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (param.argsz < minsz)
+ return -EINVAL;
+
+ /* No flag is supported now */
+ if (param.flags)
+ return -EINVAL;
+
+ mutex_lock(&container->lock);
+ tce_unregister_pages(container, param.vaddr, param.size);
+ mutex_unlock(&container->lock);
+
+ return 0;
+ }
case VFIO_IOMMU_ENABLE:
+ if (container->v2)
+ break;
+
mutex_lock(&container->lock);
ret = tce_iommu_enable(container);
mutex_unlock(&container->lock);
@@ -482,6 +697,9 @@ static long tce_iommu_ioctl(void *iommu_data,


case VFIO_IOMMU_DISABLE:
+ if (container->v2)
+ break;
+
mutex_lock(&container->lock);
tce_iommu_disable(container);
mutex_unlock(&container->lock);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index b57b750..8fdcfb9 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -36,6 +36,8 @@
/* Two-stage IOMMU */
#define VFIO_TYPE1_NESTING_IOMMU 6 /* Implies v2 */

+#define VFIO_SPAPR_TCE_v2_IOMMU 7
+
/*
* The IOCTL interface is designed for extensibility by embedding the
* structure length (argsz) and flags into structures passed between
@@ -495,6 +497,31 @@ struct vfio_eeh_pe_op {

#define VFIO_EEH_PE_OP _IO(VFIO_TYPE, VFIO_BASE + 21)

+/**
+ * VFIO_IOMMU_SPAPR_REGISTER_MEMORY - _IOW(VFIO_TYPE, VFIO_BASE + 17, struct vfio_iommu_spapr_register_memory)
+ *
+ * Registers user space memory where DMA is allowed. It pins
+ * user pages and does the locked memory accounting so
+ * subsequent VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA calls
+ * get faster.
+ */
+struct vfio_iommu_spapr_register_memory {
+ __u32 argsz;
+ __u32 flags;
+ __u64 vaddr; /* Process virtual address */
+ __u64 size; /* Size of mapping (bytes) */
+};
+#define VFIO_IOMMU_SPAPR_REGISTER_MEMORY _IO(VFIO_TYPE, VFIO_BASE + 17)
+
+/**
+ * VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY - _IOW(VFIO_TYPE, VFIO_BASE + 18, struct vfio_iommu_spapr_register_memory)
+ *
+ * Unregisters user space memory registered with
+ * VFIO_IOMMU_SPAPR_REGISTER_MEMORY.
+ * Uses vfio_iommu_spapr_register_memory for parameters.
+ */
+#define VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY _IO(VFIO_TYPE, VFIO_BASE + 18)
+
/* ***************************************************************** */

#endif /* _UAPIVFIO_H */
--
2.0.0

2015-04-25 12:17:14

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 30/32] vfio: powerpc/spapr: Use 32bit DMA window properties from table_group

A table group might not have a table, but it always has the default 32bit
window parameters, so use these.

No change in behavior is expected.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
v9:
* new in the series - to make the next patch simpler
---
drivers/vfio/vfio_iommu_spapr_tce.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 4cfc2c1..a7d6729 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -185,7 +185,6 @@ static int tce_iommu_enable(struct tce_container *container)
{
int ret = 0;
unsigned long locked;
- struct iommu_table *tbl;
struct iommu_table_group *table_group;

if (!container->grp)
@@ -221,13 +220,19 @@ static int tce_iommu_enable(struct tce_container *container)
* this is that we cannot tell here the amount of RAM used by the guest
* as this information is only available from KVM and VFIO is
* KVM agnostic.
+ *
+ * So we do not allow enabling a container without a group attached
+ * as there is no way to know how much we should increment
+ * the locked_vm counter.
*/
table_group = iommu_group_get_iommudata(container->grp);
if (!table_group)
return -ENODEV;

- tbl = &table_group->tables[0];
- locked = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
+ if (!table_group->tce32_size)
+ return -EPERM;
+
+ locked = table_group->tce32_size >> PAGE_SHIFT;
ret = try_increment_locked_vm(locked);
if (ret)
return ret;
@@ -504,7 +509,6 @@ static long tce_iommu_ioctl(void *iommu_data,

case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
struct vfio_iommu_spapr_tce_info info;
- struct iommu_table *tbl;
struct iommu_table_group *table_group;

if (WARN_ON(!container->grp))
@@ -512,8 +516,7 @@ static long tce_iommu_ioctl(void *iommu_data,

table_group = iommu_group_get_iommudata(container->grp);

- tbl = &table_group->tables[0];
- if (WARN_ON_ONCE(!tbl))
+ if (!table_group)
return -ENXIO;

minsz = offsetofend(struct vfio_iommu_spapr_tce_info,
@@ -525,8 +528,8 @@ static long tce_iommu_ioctl(void *iommu_data,
if (info.argsz < minsz)
return -EINVAL;

- info.dma32_window_start = tbl->it_offset << tbl->it_page_shift;
- info.dma32_window_size = tbl->it_size << tbl->it_page_shift;
+ info.dma32_window_start = table_group->tce32_start;
+ info.dma32_window_size = table_group->tce32_size;
info.flags = 0;

if (copy_to_user((void __user *)arg, &info, minsz))
--
2.0.0

2015-04-25 12:20:33

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

At the moment only one group per container is supported.
POWER8 CPUs have a more flexible design and allow having 2 TCE tables per
IOMMU group, so we can relax this limitation and support multiple groups
per container.

This adds TCE table descriptors to a container and uses iommu_table_group_ops
to create/set DMA windows on IOMMU groups so the same TCE tables will be
shared between several IOMMU groups.
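
For example, a userspace sequence could look like this (hypothetical file
descriptors; both PEs must be backed by hardware providing
iommu_table_group_ops, i.e. IODA2):

    ioctl(group1_fd, VFIO_GROUP_SET_CONTAINER, &container_fd);
    ioctl(container_fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU);
    ioctl(group2_fd, VFIO_GROUP_SET_CONTAINER, &container_fd);
    /*
     * The container now owns the TCE table(s): the default window is
     * created once and ops->set_window() is called for every attached
     * group, so both PEs share the same table.
     */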

Signed-off-by: Alexey Kardashevskiy <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
---
Changes:
v7:
* updated doc
---
Documentation/vfio.txt | 8 +-
drivers/vfio/vfio_iommu_spapr_tce.c | 268 ++++++++++++++++++++++++++----------
2 files changed, 199 insertions(+), 77 deletions(-)

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index 94328c8..7dcf2b5 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -289,10 +289,12 @@ PPC64 sPAPR implementation note

This implementation has some specifics:

-1) Only one IOMMU group per container is supported as an IOMMU group
-represents the minimal entity which isolation can be guaranteed for and
-groups are allocated statically, one per a Partitionable Endpoint (PE)
+1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
+container is supported as an IOMMU table is allocated at boot time,
+one table per IOMMU group, which is a Partitionable Endpoint (PE)
(PE is often a PCI domain but not always).
+Newer systems (POWER8 with IODA2) have an improved hardware design which allows
+removing this limitation and having multiple IOMMU groups per VFIO container.

2) The hardware supports so called DMA windows - the PCI address range
within which DMA transfer is allowed, any attempt to access address space
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index a7d6729..970e3a2 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -82,6 +82,11 @@ static void decrement_locked_vm(long npages)
* into DMA'ble space using the IOMMU
*/

+struct tce_iommu_group {
+ struct list_head next;
+ struct iommu_group *grp;
+};
+
/*
* The container descriptor supports only a single group per container.
* Required by the API as the container is not supplied with the IOMMU group
@@ -89,10 +94,11 @@ static void decrement_locked_vm(long npages)
*/
struct tce_container {
struct mutex lock;
- struct iommu_group *grp;
bool enabled;
unsigned long locked_pages;
bool v2;
+ struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
+ struct list_head group_list;
};

static long tce_unregister_pages(struct tce_container *container,
@@ -154,20 +160,20 @@ static bool tce_page_is_contained(struct page *page, unsigned page_shift)
return (PAGE_SHIFT + compound_order(compound_head(page))) >= page_shift;
}

+static inline bool tce_groups_attached(struct tce_container *container)
+{
+ return !list_empty(&container->group_list);
+}
+
static struct iommu_table *spapr_tce_find_table(
struct tce_container *container,
phys_addr_t ioba)
{
long i;
struct iommu_table *ret = NULL;
- struct iommu_table_group *table_group;
-
- table_group = iommu_group_get_iommudata(container->grp);
- if (!table_group)
- return NULL;

for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
- struct iommu_table *tbl = &table_group->tables[i];
+ struct iommu_table *tbl = &container->tables[i];
unsigned long entry = ioba >> tbl->it_page_shift;
unsigned long start = tbl->it_offset;
unsigned long end = start + tbl->it_size;
@@ -186,9 +192,7 @@ static int tce_iommu_enable(struct tce_container *container)
int ret = 0;
unsigned long locked;
struct iommu_table_group *table_group;
-
- if (!container->grp)
- return -ENXIO;
+ struct tce_iommu_group *tcegrp;

if (!current->mm)
return -ESRCH; /* process exited */
@@ -225,7 +229,12 @@ static int tce_iommu_enable(struct tce_container *container)
* as there is no way to know how much we should increment
* the locked_vm counter.
*/
- table_group = iommu_group_get_iommudata(container->grp);
+ if (!tce_groups_attached(container))
+ return -ENODEV;
+
+ tcegrp = list_first_entry(&container->group_list,
+ struct tce_iommu_group, next);
+ table_group = iommu_group_get_iommudata(tcegrp->grp);
if (!table_group)
return -ENODEV;

@@ -257,6 +266,48 @@ static void tce_iommu_disable(struct tce_container *container)
decrement_locked_vm(container->locked_pages);
}

+static long tce_iommu_create_table(struct iommu_table_group *table_group,
+ int num,
+ __u32 page_shift,
+ __u64 window_size,
+ __u32 levels,
+ struct iommu_table *tbl)
+{
+ long ret, table_size;
+
+ table_size = table_group->ops->get_table_size(page_shift, window_size,
+ levels);
+ if (!table_size)
+ return -EINVAL;
+
+ ret = try_increment_locked_vm(table_size >> PAGE_SHIFT);
+ if (ret)
+ return ret;
+
+ ret = table_group->ops->create_table(table_group, num,
+ page_shift, window_size, levels, tbl);
+
+ WARN_ON(!ret && !tbl->it_ops->free);
+ WARN_ON(!ret && (tbl->it_allocated_size != table_size));
+
+ if (ret)
+ decrement_locked_vm(table_size >> PAGE_SHIFT);
+
+ return ret;
+}
+
+static void tce_iommu_free_table(struct iommu_table *tbl)
+{
+ unsigned long pages = tbl->it_allocated_size >> PAGE_SHIFT;
+
+ if (!tbl->it_size)
+ return;
+
+ tbl->it_ops->free(tbl);
+ decrement_locked_vm(pages);
+ memset(tbl, 0, sizeof(*tbl));
+}
+
static void *tce_iommu_open(unsigned long arg)
{
struct tce_container *container;
@@ -271,19 +322,41 @@ static void *tce_iommu_open(unsigned long arg)
return ERR_PTR(-ENOMEM);

mutex_init(&container->lock);
+ INIT_LIST_HEAD_RCU(&container->group_list);
+
container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;

return container;
}

+static int tce_iommu_clear(struct tce_container *container,
+ struct iommu_table *tbl,
+ unsigned long entry, unsigned long pages);
+
static void tce_iommu_release(void *iommu_data)
{
struct tce_container *container = iommu_data;
+ struct iommu_table_group *table_group;
+ struct tce_iommu_group *tcegrp;
+ long i;

- WARN_ON(container->grp);
+ while (tce_groups_attached(container)) {
+ tcegrp = list_first_entry(&container->group_list,
+ struct tce_iommu_group, next);
+ table_group = iommu_group_get_iommudata(tcegrp->grp);
+ tce_iommu_detach_group(iommu_data, tcegrp->grp);
+ }

- if (container->grp)
- tce_iommu_detach_group(iommu_data, container->grp);
+ /*
+ * If VFIO created a table, it was not disposed
+ * by tce_iommu_detach_group() so do it now.
+ */
+ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
+ struct iommu_table *tbl = &container->tables[i];
+
+ tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
+ tce_iommu_free_table(tbl);
+ }

tce_iommu_disable(container);
mutex_destroy(&container->lock);
@@ -509,12 +582,15 @@ static long tce_iommu_ioctl(void *iommu_data,

case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
struct vfio_iommu_spapr_tce_info info;
+ struct tce_iommu_group *tcegrp;
struct iommu_table_group *table_group;

- if (WARN_ON(!container->grp))
+ if (!tce_groups_attached(container))
return -ENXIO;

- table_group = iommu_group_get_iommudata(container->grp);
+ tcegrp = list_first_entry(&container->group_list,
+ struct tce_iommu_group, next);
+ table_group = iommu_group_get_iommudata(tcegrp->grp);

if (!table_group)
return -ENXIO;
@@ -707,12 +783,20 @@ static long tce_iommu_ioctl(void *iommu_data,
tce_iommu_disable(container);
mutex_unlock(&container->lock);
return 0;
- case VFIO_EEH_PE_OP:
- if (!container->grp)
- return -ENODEV;

- return vfio_spapr_iommu_eeh_ioctl(container->grp,
- cmd, arg);
+ case VFIO_EEH_PE_OP: {
+ struct tce_iommu_group *tcegrp;
+
+ ret = 0;
+ list_for_each_entry(tcegrp, &container->group_list, next) {
+ ret = vfio_spapr_iommu_eeh_ioctl(tcegrp->grp,
+ cmd, arg);
+ if (ret)
+ return ret;
+ }
+ return ret;
+ }
+
}

return -ENOTTY;
@@ -724,11 +808,14 @@ static void tce_iommu_release_ownership(struct tce_container *container,
int i;

for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
- struct iommu_table *tbl = &table_group->tables[i];
+ struct iommu_table *tbl = &container->tables[i];

tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
if (tbl->it_map)
iommu_release_ownership(tbl);
+
+ /* Reset the container's copy of the table descriptor */
+ memset(tbl, 0, sizeof(*tbl));
}
}

@@ -758,38 +845,56 @@ static int tce_iommu_take_ownership(struct iommu_table_group *table_group)
static int tce_iommu_attach_group(void *iommu_data,
struct iommu_group *iommu_group)
{
- int ret;
+ int ret, i;
struct tce_container *container = iommu_data;
struct iommu_table_group *table_group;
+ struct tce_iommu_group *tcegrp = NULL;
+ bool first_group = !tce_groups_attached(container);

mutex_lock(&container->lock);

/* pr_debug("tce_vfio: Attaching group #%u to iommu %p\n",
iommu_group_id(iommu_group), iommu_group); */
- if (container->grp) {
- pr_warn("tce_vfio: Only one group per IOMMU container is allowed, existing id=%d, attaching id=%d\n",
- iommu_group_id(container->grp),
- iommu_group_id(iommu_group));
- ret = -EBUSY;
- goto unlock_exit;
- }
-
- if (container->enabled) {
- pr_err("tce_vfio: attaching group #%u to enabled container\n",
- iommu_group_id(iommu_group));
- ret = -EBUSY;
- goto unlock_exit;
- }
-
table_group = iommu_group_get_iommudata(iommu_group);
- if (!table_group) {
- ret = -ENXIO;
+
+ if (!first_group && (!table_group->ops ||
+ !table_group->ops->take_ownership ||
+ !table_group->ops->release_ownership)) {
+ ret = -EBUSY;
+ goto unlock_exit;
+ }
+
+ /* Check if new group has the same iommu_ops (i.e. compatible) */
+ list_for_each_entry(tcegrp, &container->group_list, next) {
+ struct iommu_table_group *table_group_tmp;
+
+ if (tcegrp->grp == iommu_group) {
+ pr_warn("tce_vfio: Group %d is already attached\n",
+ iommu_group_id(iommu_group));
+ ret = -EBUSY;
+ goto unlock_exit;
+ }
+ table_group_tmp = iommu_group_get_iommudata(tcegrp->grp);
+ if (table_group_tmp->ops != table_group->ops) {
+ pr_warn("tce_vfio: Group %d is incompatible with group %d\n",
+ iommu_group_id(iommu_group),
+ iommu_group_id(tcegrp->grp));
+ ret = -EPERM;
+ goto unlock_exit;
+ }
+ }
+
+ tcegrp = kzalloc(sizeof(*tcegrp), GFP_KERNEL);
+ if (!tcegrp) {
+ ret = -ENOMEM;
goto unlock_exit;
}

if (!table_group->ops || !table_group->ops->take_ownership ||
!table_group->ops->release_ownership) {
ret = tce_iommu_take_ownership(table_group);
+ if (!ret)
+ container->tables[0] = table_group->tables[0];
} else if (!table_group->ops->create_table ||
!table_group->ops->set_window) {
WARN_ON_ONCE(1);
@@ -801,23 +906,46 @@ static int tce_iommu_attach_group(void *iommu_data,
* the pages that has been explicitly mapped into the iommu
*/
table_group->ops->take_ownership(table_group);
- ret = table_group->ops->create_table(table_group,
- 0, /* window number */
- IOMMU_PAGE_SHIFT_4K,
- table_group->tce32_size,
- 1, /* default levels */
- &table_group->tables[0]);
- if (!ret)
- ret = table_group->ops->set_window(table_group, 0,
- &table_group->tables[0]);
+ /*
+ * If this is the first group attached, check if there is
+ * a default DMA window and create one if there is none, as
+ * userspace expects it to exist.
+ */
+ if (first_group && !container->tables[0].it_size) {
+ ret = tce_iommu_create_table(table_group,
+ 0, /* window number */
+ IOMMU_PAGE_SHIFT_4K,
+ table_group->tce32_size,
+ 1, /* default levels */
+ &container->tables[0]);
+ if (ret)
+ goto unlock_exit;
+ }
+
+ /* Set all windows to the new group */
+ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
+ struct iommu_table *tbl = &container->tables[i];
+
+ if (!tbl->it_size)
+ continue;
+
+ /* Set the default window to a new group */
+ ret = table_group->ops->set_window(table_group, i, tbl);
+ if (ret)
+ break;
+ }
}

if (ret)
goto unlock_exit;

- container->grp = iommu_group;
+ tcegrp->grp = iommu_group;
+ list_add(&tcegrp->next, &container->group_list);

unlock_exit:
+ if (ret && tcegrp)
+ kfree(tcegrp);
+
mutex_unlock(&container->lock);

return ret;
@@ -828,25 +956,27 @@ static void tce_iommu_detach_group(void *iommu_data,
{
struct tce_container *container = iommu_data;
struct iommu_table_group *table_group;
+ struct tce_iommu_group *tcegrp;
long i;
+ bool found = false;

mutex_lock(&container->lock);
- if (iommu_group != container->grp) {
- pr_warn("tce_vfio: detaching group #%u, expected group is #%u\n",
- iommu_group_id(iommu_group),
- iommu_group_id(container->grp));
+
+ list_for_each_entry(tcegrp, &container->group_list, next) {
+ if (tcegrp->grp == iommu_group) {
+ found = true;
+ break;
+ }
+ }
+
+ if (!found) {
+ pr_warn("tce_vfio: detaching unattached group #%u\n",
+ iommu_group_id(iommu_group));
goto unlock_exit;
}

- if (container->enabled) {
- pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
- iommu_group_id(container->grp));
- tce_iommu_disable(container);
- }
-
- /* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
- iommu_group_id(iommu_group), iommu_group); */
- container->grp = NULL;
+ list_del(&tcegrp->next);
+ kfree(tcegrp);

table_group = iommu_group_get_iommudata(iommu_group);
BUG_ON(!table_group);
@@ -857,18 +987,8 @@ static void tce_iommu_detach_group(void *iommu_data,
else if (!table_group->ops->unset_window)
WARN_ON_ONCE(1);
else {
- for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
- struct iommu_table tbl = table_group->tables[i];
-
- if (!tbl.it_size)
- continue;
-
+ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
table_group->ops->unset_window(table_group, i);
- tce_iommu_clear(container, &tbl,
- tbl.it_offset, tbl.it_size);
- if (tbl.it_ops->free)
- tbl.it_ops->free(&tbl);
- }

table_group->ops->release_ownership(table_group);
}
--
2.0.0
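
For context, the attach/detach rework above is what lets a single VFIO container serve
several IOMMU groups. A minimal userspace sketch of that flow follows; it is illustrative
only, the group numbers and the helper name are invented, and error handling is omitted.

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical helper: share one SPAPR TCE v2 container between two groups */
static int open_shared_container(void)
{
	int container = open("/dev/vfio/vfio", O_RDWR);
	int group1 = open("/dev/vfio/4", O_RDWR);	/* group ids are made up */
	int group2 = open("/dev/vfio/5", O_RDWR);

	/* The first group attaches and the IOMMU backend is selected */
	ioctl(group1, VFIO_GROUP_SET_CONTAINER, &container);
	ioctl(container, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU);

	/*
	 * The second group attaches to the same, already set up container;
	 * tce_iommu_attach_group() checks it is compatible with the first one.
	 */
	ioctl(group2, VFIO_GROUP_SET_CONTAINER, &container);

	return container;
}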

2015-04-25 12:22:37

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v9 32/32] vfio: powerpc/spapr: Support Dynamic DMA windows

This adds create/remove window ioctls to create and remove DMA windows.
sPAPR defines a Dynamic DMA windows capability which allows
para-virtualized guests to create additional DMA windows on a PCI bus.
Existing Linux kernels use this new window to map the entire guest
memory and switch to direct DMA operations, saving time on map/unmap
requests which would otherwise happen in big numbers.

This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and
VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows.
The hardware and this driver currently support up to 2 windows.

This changes the VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return additional
information such as the number of supported windows and the maximum number
of TCE table levels.

DDW is added as a capability rather than a feature unique to SPAPR TCE
IOMMU v2, as we still want to support v2 on platforms which cannot do DDW,
for the sake of TCE acceleration in KVM (coming soon).
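
To make the flow concrete, here is a rough userspace sketch of driving the two new
ioctls. It is illustrative only: it assumes a container fd already set to
VFIO_SPAPR_TCE_v2_IOMMU with a group attached, the window geometry is an arbitrary
example, and error handling is trimmed.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int ddw_create_and_remove(int container)
{
	struct vfio_iommu_spapr_tce_create create;
	struct vfio_iommu_spapr_tce_remove remove;

	memset(&create, 0, sizeof(create));
	create.argsz = sizeof(create);
	create.page_shift = 16;			/* 64K IOMMU pages */
	create.window_size = 1ULL << 30;	/* 1GB window */
	create.levels = 1;			/* single-level TCE table */

	if (ioctl(container, VFIO_IOMMU_SPAPR_TCE_CREATE, &create))
		return -1;

	/* The platform chose the bus address; it is returned in start_addr */

	memset(&remove, 0, sizeof(remove));
	remove.argsz = sizeof(remove);
	remove.start_addr = create.start_addr;

	return ioctl(container, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
}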

Signed-off-by: Alexey Kardashevskiy <[email protected]>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <[email protected]>
---
Changes:
v7:
* s/VFIO_IOMMU_INFO_DDW/VFIO_IOMMU_SPAPR_INFO_DDW/
* fixed typos in and updated vfio.txt
* fixed VFIO_IOMMU_SPAPR_TCE_GET_INFO handler
* moved ddw properties to vfio_iommu_spapr_tce_ddw_info

v6:
* added explicit VFIO_IOMMU_INFO_DDW flag to vfio_iommu_spapr_tce_info,
it used to be page mask flags from platform code
* added explicit pgsizes field
* added cleanup if tce_iommu_create_window() failed in a middle
* added checks for callbacks in tce_iommu_create_window and remove those
from tce_iommu_remove_window when it is too late to test anyway
* spapr_tce_find_free_table returns sensible error code now
* updated description of VFIO_IOMMU_SPAPR_TCE_CREATE/
VFIO_IOMMU_SPAPR_TCE_REMOVE

v4:
* moved code to tce_iommu_create_window()/tce_iommu_remove_window()
helpers
* added docs
---
Documentation/vfio.txt | 19 ++++
arch/powerpc/include/asm/iommu.h | 2 +-
drivers/vfio/vfio_iommu_spapr_tce.c | 197 +++++++++++++++++++++++++++++++++++-
include/uapi/linux/vfio.h | 61 ++++++++++-
4 files changed, 274 insertions(+), 5 deletions(-)

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index 7dcf2b5..8b1ec51 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -452,6 +452,25 @@ address is from pre-registered range.

This separation helps in optimizing DMA for guests.

+6) The sPAPR specification allows guests to have one or more additional DMA
+windows on a PCI bus with a variable page size. Two ioctls have been added
+to support this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE.
+The platform has to support the functionality or an error will be returned to
+userspace. The existing hardware supports up to 2 DMA windows: one is 2GB
+long, uses 4K pages and is called the "default 32bit window"; the other can
+be as big as the entire RAM, can use a different page size, and is optional -
+guests create it at run time if the guest driver supports 64bit DMA.
+
+VFIO_IOMMU_SPAPR_TCE_CREATE receives a page shift, a DMA window size and
+a number of TCE table levels (useful if a TCE table is going to be big and
+the kernel may not be able to allocate enough physically contiguous memory).
+It creates a new window in an available slot and returns the bus address where
+the new window starts. Due to hardware limitations, userspace cannot choose
+the location of DMA windows.
+
+VFIO_IOMMU_SPAPR_TCE_REMOVE receives the bus start address of the window
+and removes it.
+
-------------------------------------------------------------------------------

[1] VFIO was originally an acronym for "Virtual Function I/O" in its
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 9844c106..282767f 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -151,7 +151,7 @@ extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
int nid);

-#define IOMMU_TABLE_GROUP_MAX_TABLES 1
+#define IOMMU_TABLE_GROUP_MAX_TABLES 2

struct iommu_table_group;

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 970e3a2..f04c6f5 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -266,6 +266,20 @@ static void tce_iommu_disable(struct tce_container *container)
decrement_locked_vm(container->locked_pages);
}

+static int spapr_tce_find_free_table(struct tce_container *container)
+{
+ int i;
+
+ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
+ struct iommu_table *tbl = &container->tables[i];
+
+ if (!tbl->it_size)
+ return i;
+ }
+
+ return -ENOSPC;
+}
+
static long tce_iommu_create_table(struct iommu_table_group *table_group,
int num,
__u32 page_shift,
@@ -559,11 +573,114 @@ static long tce_iommu_build_v2(struct tce_container *container,
return ret;
}

+static long tce_iommu_create_window(struct tce_container *container,
+ __u32 page_shift, __u64 window_size, __u32 levels,
+ __u64 *start_addr)
+{
+ struct tce_iommu_group *tcegrp;
+ struct iommu_table_group *table_group;
+ struct iommu_table *tbl;
+ long ret, num;
+
+ num = spapr_tce_find_free_table(container);
+ if (num < 0)
+ return num;
+
+ tbl = &container->tables[num];
+
+ /* Get the first group for ops::create_table */
+ tcegrp = list_first_entry(&container->group_list,
+ struct tce_iommu_group, next);
+ table_group = iommu_group_get_iommudata(tcegrp->grp);
+ if (!table_group)
+ return -EFAULT;
+
+ if (!(table_group->pgsizes & (1ULL << page_shift)))
+ return -EINVAL;
+
+ if (!table_group->ops->set_window || !table_group->ops->unset_window ||
+ !table_group->ops->get_table_size ||
+ !table_group->ops->create_table)
+ return -EPERM;
+
+ /* Create TCE table */
+ ret = tce_iommu_create_table(table_group, num,
+ page_shift, window_size, levels, tbl);
+ if (ret)
+ return ret;
+
+ BUG_ON(!tbl->it_ops->free);
+
+ /*
+ * Program the table to every group.
+ * Groups have been tested for compatibility at the attach time.
+ */
+ list_for_each_entry(tcegrp, &container->group_list, next) {
+ table_group = iommu_group_get_iommudata(tcegrp->grp);
+
+ ret = table_group->ops->set_window(table_group, num, tbl);
+ if (ret)
+ goto unset_exit;
+ }
+
+ /* Return start address assigned by platform in create_table() */
+ *start_addr = tbl->it_offset << tbl->it_page_shift;
+
+ return 0;
+
+unset_exit:
+ list_for_each_entry(tcegrp, &container->group_list, next) {
+ table_group = iommu_group_get_iommudata(tcegrp->grp);
+ table_group->ops->unset_window(table_group, num);
+ }
+ tce_iommu_free_table(tbl);
+
+ return ret;
+}
+
+static long tce_iommu_remove_window(struct tce_container *container,
+ __u64 start_addr)
+{
+ struct iommu_table_group *table_group = NULL;
+ struct iommu_table *tbl;
+ struct tce_iommu_group *tcegrp;
+ int num;
+
+ tbl = spapr_tce_find_table(container, start_addr);
+ if (!tbl)
+ return -EINVAL;
+
+ /* Detach groups from IOMMUs */
+ num = tbl - container->tables;
+ list_for_each_entry(tcegrp, &container->group_list, next) {
+ table_group = iommu_group_get_iommudata(tcegrp->grp);
+
+ /*
+ * SPAPR TCE IOMMU exposes the default DMA window to
+ * the guest via dma32_window_start/size of
+ * VFIO_IOMMU_SPAPR_TCE_GET_INFO. Some platforms allow
+ * the userspace to remove this window, some do not so
+ * here we check for the platform capability.
+ */
+ if (!table_group->ops || !table_group->ops->unset_window)
+ return -EPERM;
+
+ if (container->tables[num].it_size)
+ table_group->ops->unset_window(table_group, num);
+ }
+
+ /* Free table */
+ tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
+ tce_iommu_free_table(tbl);
+
+ return 0;
+}
+
static long tce_iommu_ioctl(void *iommu_data,
unsigned int cmd, unsigned long arg)
{
struct tce_container *container = iommu_data;
- unsigned long minsz;
+ unsigned long minsz, ddwsz;
long ret;

switch (cmd) {
@@ -607,6 +724,21 @@ static long tce_iommu_ioctl(void *iommu_data,
info.dma32_window_start = table_group->tce32_start;
info.dma32_window_size = table_group->tce32_size;
info.flags = 0;
+ memset(&info.ddw, 0, sizeof(info.ddw));
+
+ if (table_group->max_dynamic_windows_supported &&
+ container->v2) {
+ info.flags |= VFIO_IOMMU_SPAPR_INFO_DDW;
+ info.ddw.pgsizes = table_group->pgsizes;
+ info.ddw.max_dynamic_windows_supported =
+ table_group->max_dynamic_windows_supported;
+ info.ddw.levels = table_group->max_levels;
+ }
+
+ ddwsz = offsetofend(struct vfio_iommu_spapr_tce_info, ddw);
+
+ if (info.argsz >= ddwsz)
+ minsz = ddwsz;

if (copy_to_user((void __user *)arg, &info, minsz))
return -EFAULT;
@@ -797,6 +929,69 @@ static long tce_iommu_ioctl(void *iommu_data,
return ret;
}

+ case VFIO_IOMMU_SPAPR_TCE_CREATE: {
+ struct vfio_iommu_spapr_tce_create create;
+
+ if (!container->v2)
+ break;
+
+ if (!tce_groups_attached(container))
+ return -ENXIO;
+
+ minsz = offsetofend(struct vfio_iommu_spapr_tce_create,
+ start_addr);
+
+ if (copy_from_user(&create, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (create.argsz < minsz)
+ return -EINVAL;
+
+ if (create.flags)
+ return -EINVAL;
+
+ mutex_lock(&container->lock);
+
+ ret = tce_iommu_create_window(container, create.page_shift,
+ create.window_size, create.levels,
+ &create.start_addr);
+
+ mutex_unlock(&container->lock);
+
+ if (!ret && copy_to_user((void __user *)arg, &create, minsz))
+ ret = -EFAULT;
+
+ return ret;
+ }
+ case VFIO_IOMMU_SPAPR_TCE_REMOVE: {
+ struct vfio_iommu_spapr_tce_remove remove;
+
+ if (!container->v2)
+ break;
+
+ if (!tce_groups_attached(container))
+ return -ENXIO;
+
+ minsz = offsetofend(struct vfio_iommu_spapr_tce_remove,
+ start_addr);
+
+ if (copy_from_user(&remove, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (remove.argsz < minsz)
+ return -EINVAL;
+
+ if (remove.flags)
+ return -EINVAL;
+
+ mutex_lock(&container->lock);
+
+ ret = tce_iommu_remove_window(container, remove.start_addr);
+
+ mutex_unlock(&container->lock);
+
+ return ret;
+ }
}

return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 8fdcfb9..dde0fe5 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -445,6 +445,23 @@ struct vfio_iommu_type1_dma_unmap {
/* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */

/*
+ * The SPAPR TCE DDW info struct provides information about
+ * the Dynamic DMA windows capability.
+ *
+ * @pgsizes contains a page size bitmask, 4K/64K/16M are supported.
+ * @max_dynamic_windows_supported tells the maximum number of windows
+ * which the platform can create.
+ * @levels tells the maximum number of levels in multi-level IOMMU tables;
+ * this allows splitting a table into smaller chunks which reduces
+ * the amount of physically contiguous memory required for the table.
+ */
+struct vfio_iommu_spapr_tce_ddw_info {
+ __u64 pgsizes; /* Bitmap of supported page sizes */
+ __u32 max_dynamic_windows_supported;
+ __u32 levels;
+};
+
+/*
* The SPAPR TCE info struct provides the information about the PCI bus
* address ranges available for DMA, these values are programmed into
* the hardware so the guest has to know that information.
@@ -454,14 +471,17 @@ struct vfio_iommu_type1_dma_unmap {
* addresses too so the window works as a filter rather than an offset
* for IOVA addresses.
*
- * A flag will need to be added if other page sizes are supported,
- * so as defined here, it is always 4k.
+ * Flags supported:
+ * - VFIO_IOMMU_SPAPR_INFO_DDW: informs the userspace that dynamic DMA windows
+ * (DDW) support is present. @ddw is only supported when DDW is present.
*/
struct vfio_iommu_spapr_tce_info {
__u32 argsz;
- __u32 flags; /* reserved for future use */
+ __u32 flags;
+#define VFIO_IOMMU_SPAPR_INFO_DDW (1 << 0) /* DDW supported */
__u32 dma32_window_start; /* 32 bit window start (bytes) */
__u32 dma32_window_size; /* 32 bit window size (bytes) */
+ struct vfio_iommu_spapr_tce_ddw_info ddw;
};

#define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
@@ -522,6 +542,41 @@ struct vfio_iommu_spapr_register_memory {
*/
#define VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY _IO(VFIO_TYPE, VFIO_BASE + 18)

+/**
+ * VFIO_IOMMU_SPAPR_TCE_CREATE - _IOWR(VFIO_TYPE, VFIO_BASE + 19, struct vfio_iommu_spapr_tce_create)
+ *
+ * Creates an additional TCE table and programs it (sets a new DMA window)
+ * to every IOMMU group in the container. It receives page shift, window
+ * size and number of levels in the TCE table being created.
+ *
+ * It allocates and returns an offset on a PCI bus of the new DMA window.
+ */
+struct vfio_iommu_spapr_tce_create {
+ __u32 argsz;
+ __u32 flags;
+ /* in */
+ __u32 page_shift;
+ __u64 window_size;
+ __u32 levels;
+ /* out */
+ __u64 start_addr;
+};
+#define VFIO_IOMMU_SPAPR_TCE_CREATE _IO(VFIO_TYPE, VFIO_BASE + 19)
+
+/**
+ * VFIO_IOMMU_SPAPR_TCE_REMOVE - _IOW(VFIO_TYPE, VFIO_BASE + 20, struct vfio_iommu_spapr_tce_remove)
+ *
+ * Unprograms a TCE table from all groups in the container and destroys it.
+ * It receives a PCI bus offset as a window id.
+ */
+struct vfio_iommu_spapr_tce_remove {
+ __u32 argsz;
+ __u32 flags;
+ /* in */
+ __u64 start_addr;
+};
+#define VFIO_IOMMU_SPAPR_TCE_REMOVE _IO(VFIO_TYPE, VFIO_BASE + 20)
+
/* ***************************************************************** */

#endif /* _UAPIVFIO_H */
--
2.0.0
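
A short, illustrative sketch of how userspace could probe for the DDW capability with
the extended VFIO_IOMMU_SPAPR_TCE_GET_INFO before trying to create a window. The
container is assumed to be a v2 container with a group attached and the helper name
is made up; this is a sketch of the uapi added above, not part of the patch.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int ddw_supported(int container, unsigned int page_shift)
{
	struct vfio_iommu_spapr_tce_info info;

	memset(&info, 0, sizeof(info));
	info.argsz = sizeof(info);	/* large enough to cover the new ddw field */

	if (ioctl(container, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info))
		return 0;

	if (!(info.flags & VFIO_IOMMU_SPAPR_INFO_DDW))
		return 0;

	/* ddw.pgsizes is a bitmask of supported IOMMU page sizes (4K/64K/16M) */
	return !!(info.ddw.pgsizes & (1ULL << page_shift));
}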

2015-04-27 21:05:24

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 02/32] Revert "powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically"

On Sat, 2015-04-25 at 22:14 +1000, Alexey Kardashevskiy wrote:
> This reverts commit 9e8d4a19ab66ec9e132d405357b9108a4f26efd3 as
> tce32_table has exactly the same life time as the whole PE.

scripts/checkpatch.pl would like your commit reference to appear as:

commit 9e8d4a19ab66 ("powerpc/powernv: Allocate struct pnv_ioda_pe
iommu_table dynamically")

>
> This makes use of a new iommu_reset_table() helper instead.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> arch/powerpc/include/asm/iommu.h | 3 ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 35 +++++++++++++------------------
> arch/powerpc/platforms/powernv/pci.h | 2 +-
> 3 files changed, 15 insertions(+), 25 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index e2cef38..9d320e0 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -79,9 +79,6 @@ struct iommu_table {
> struct iommu_group *it_group;
> #endif
> void (*set_bypass)(struct iommu_table *tbl, bool enable);
> -#ifdef CONFIG_PPC_POWERNV
> - void *data;
> -#endif
> };
>
> /* Pure 2^n version of get_order */
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 920c252..eff26ed 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1086,10 +1086,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
> return;
> }
>
> - pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
> - GFP_KERNEL, hose->node);
> - pe->tce32_table->data = pe;
> -
> /* Associate it with all child devices */
> pnv_ioda_setup_same_PE(bus, pe);
>
> @@ -1295,7 +1291,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
> bus = dev->bus;
> hose = pci_bus_to_host(bus);
> phb = hose->private_data;
> - tbl = pe->tce32_table;
> + tbl = &pe->tce32_table;
> addr = tbl->it_base;
>
> opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
> @@ -1310,9 +1306,8 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
> if (rc)
> pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
>
> - iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
> + iommu_reset_table(tbl, of_node_full_name(dev->dev.of_node));
> free_pages(addr, get_order(TCE32_TABLE_SIZE));
> - pe->tce32_table = NULL;
> }
>
> static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
> @@ -1460,10 +1455,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
> continue;
> }
>
> - pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
> - GFP_KERNEL, hose->node);
> - pe->tce32_table->data = pe;
> -
> /* Put PE to the list */
> mutex_lock(&phb->ioda.pe_list_mutex);
> list_add_tail(&pe->list, &phb->ioda.pe_list);
> @@ -1598,7 +1589,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev
>
> pe = &phb->ioda.pe_array[pdn->pe_number];
> WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
> - set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table);
> + set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
> }
>
> static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
> @@ -1625,7 +1616,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
> } else {
> dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
> set_dma_ops(&pdev->dev, &dma_iommu_ops);
> - set_iommu_table_base(&pdev->dev, pe->tce32_table);
> + set_iommu_table_base(&pdev->dev, &pe->tce32_table);
> }
> *pdev->dev.dma_mask = dma_mask;
> return 0;
> @@ -1662,9 +1653,9 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
> list_for_each_entry(dev, &bus->devices, bus_list) {
> if (add_to_iommu_group)
> set_iommu_table_base_and_group(&dev->dev,
> - pe->tce32_table);
> + &pe->tce32_table);
> else
> - set_iommu_table_base(&dev->dev, pe->tce32_table);
> + set_iommu_table_base(&dev->dev, &pe->tce32_table);
>
> if (dev->subordinate)
> pnv_ioda_setup_bus_dma(pe, dev->subordinate,
> @@ -1754,7 +1745,8 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
> void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
> __be64 *startp, __be64 *endp, bool rm)
> {
> - struct pnv_ioda_pe *pe = tbl->data;
> + struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
> + tce32_table);
> struct pnv_phb *phb = pe->phb;
>
> if (phb->type == PNV_PHB_IODA1)
> @@ -1817,7 +1809,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> }
>
> /* Setup linux iommu table */
> - tbl = pe->tce32_table;
> + tbl = &pe->tce32_table;
> pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
> base << 28, IOMMU_PAGE_SHIFT_4K);
>
> @@ -1862,7 +1854,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>
> static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
> {
> - struct pnv_ioda_pe *pe = tbl->data;
> + struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
> + tce32_table);
> uint16_t window_id = (pe->pe_number << 1 ) + 1;
> int64_t rc;
>
> @@ -1907,10 +1900,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
> pe->tce_bypass_base = 1ull << 59;
>
> /* Install set_bypass callback for VFIO */
> - pe->tce32_table->set_bypass = pnv_pci_ioda2_set_bypass;
> + pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass;
>
> /* Enable bypass by default */
> - pnv_pci_ioda2_set_bypass(pe->tce32_table, true);
> + pnv_pci_ioda2_set_bypass(&pe->tce32_table, true);
> }
>
> static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> @@ -1958,7 +1951,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> }
>
> /* Setup linux iommu table */
> - tbl = pe->tce32_table;
> + tbl = &pe->tce32_table;
> pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
> IOMMU_PAGE_SHIFT_4K);
>
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 070ee88..c954c64 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -57,7 +57,7 @@ struct pnv_ioda_pe {
> /* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
> int tce32_seg;
> int tce32_segcount;
> - struct iommu_table *tce32_table;
> + struct iommu_table tce32_table;
> phys_addr_t tce_inval_reg_phys;
>
> /* 64-bit TCE bypass region */


2015-04-27 21:05:40

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 16/32] powerpc/powernv/ioda: Move TCE kill register address to PE

On Sat, 2015-04-25 at 22:14 +1000, Alexey Kardashevskiy wrote:
> At the moment the DMA setup code looks for the "ibm,opal-tce-kill" property
> which contains the TCE kill register address. Writes to this register
> invalidates TCE cache on IODA/IODA2 hub.
>
> This moves the register address from iommu_table to pnv_ioda_pe as
> later there will be 2 tables per PE and it will be used for both tables.
>
> This moves the property reading/remapping code to a helper to reduce
> code duplication.
>
> This adds a new pnv_pci_ioda2_tvt_invalidate() helper which invalidates
> the entire table. It should be called after every call to
> opal_pci_map_pe_dma_window(). It was not required before because
> there is just a single TCE table and 64bit DMA is handled via bypass
> window (which has no table so no cache is used) but this is going
> to change with Dynamic DMA windows (DDW).
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> Changes:
> v9:
> * new in the series
> ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 69 +++++++++++++++++++------------
> arch/powerpc/platforms/powernv/pci.h | 1 +
> 2 files changed, 44 insertions(+), 26 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index f070c44..b22b3ca 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1672,7 +1672,7 @@ static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
> struct pnv_ioda_pe, table_group);
> __be64 __iomem *invalidate = rm ?
> (__be64 __iomem *)pe->tce_inval_reg_phys :
> - (__be64 __iomem *)tbl->it_index;
> + pe->tce_inval_reg;
> unsigned long start, end, inc;
> const unsigned shift = tbl->it_page_shift;
>
> @@ -1743,6 +1743,18 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
> .get = pnv_tce_get,
> };
>
> +static inline void pnv_pci_ioda2_tvt_invalidate(struct pnv_ioda_pe *pe)
> +{
> + /* 01xb - invalidate TCEs that match the specified PE# */
> + unsigned long addr = (0x4ull << 60) | (pe->pe_number & 0xFF);
> +
> + if (!pe->tce_inval_reg)
> + return;
> +
> + mb(); /* Ensure above stores are visible */


ERROR: code indent should use tabs where possible


> + __raw_writeq(cpu_to_be64(addr), pe->tce_inval_reg);
> +}
> +
> static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
> unsigned long index, unsigned long npages, bool rm)
> {
> @@ -1751,7 +1763,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
> unsigned long start, end, inc;
> __be64 __iomem *invalidate = rm ?
> (__be64 __iomem *)pe->tce_inval_reg_phys :
> - (__be64 __iomem *)tbl->it_index;
> + pe->tce_inval_reg;
> const unsigned shift = tbl->it_page_shift;
>
> /* We'll invalidate DMA address in PE scope */
> @@ -1803,13 +1815,31 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
> .get = pnv_tce_get,
> };
>
> +static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
> + struct pnv_ioda_pe *pe)
> +{
> + const __be64 *swinvp;
> +
> + /* OPAL variant of PHB3 invalidated TCEs */
> + swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
> + if (!swinvp)
> + return;
> +
> + /* We need a couple more fields -- an address and a data
> + * to or. Since the bus is only printed out on table free
> + * errors, and on the first pass the data will be a relative
> + * bus number, print that out instead.
> + */
> + pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
> + pe->tce_inval_reg = ioremap(pe->tce_inval_reg_phys, 8);
> +}
> +
> static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> struct pnv_ioda_pe *pe, unsigned int base,
> unsigned int segs)
> {
>
> struct page *tce_mem = NULL;
> - const __be64 *swinvp;
> struct iommu_table *tbl;
> unsigned int i;
> int64_t rc;
> @@ -1823,6 +1853,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> if (WARN_ON(pe->tce32_seg >= 0))
> return;
>
> + pnv_pci_ioda_setup_opal_tce_kill(phb, pe);
> +
> /* Grab a 32-bit TCE table */
> pe->tce32_seg = base;
> pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
> @@ -1865,20 +1897,11 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> base << 28, IOMMU_PAGE_SHIFT_4K);
>
> /* OPAL variant of P7IOC SW invalidated TCEs */
> - swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
> - if (swinvp) {
> - /* We need a couple more fields -- an address and a data
> - * to or. Since the bus is only printed out on table free
> - * errors, and on the first pass the data will be a relative
> - * bus number, print that out instead.
> - */
> - pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
> - tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
> - 8);
> + if (pe->tce_inval_reg)
> tbl->it_type |= (TCE_PCI_SWINV_CREATE |
> TCE_PCI_SWINV_FREE |
> TCE_PCI_SWINV_PAIR);
> - }
> +
> tbl->it_ops = &pnv_ioda1_iommu_ops;
> iommu_init_table(tbl, phb->hose->node);
>
> @@ -1984,7 +2007,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> {
> struct page *tce_mem = NULL;
> void *addr;
> - const __be64 *swinvp;
> struct iommu_table *tbl;
> unsigned int tce_table_size, end;
> int64_t rc;
> @@ -1993,6 +2015,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> if (WARN_ON(pe->tce32_seg >= 0))
> return;
>
> + pnv_pci_ioda_setup_opal_tce_kill(phb, pe);
> +
> /* The PE will reserve all possible 32-bits space */
> pe->tce32_seg = 0;
> end = (1 << ilog2(phb->ioda.m32_pci_base));
> @@ -2023,6 +2047,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> goto fail;
> }
>
> + pnv_pci_ioda2_tvt_invalidate(pe);
> +
> /* Setup iommu */
> pe->table_group.tables[0].it_table_group = &pe->table_group;
>
> @@ -2032,18 +2058,9 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> IOMMU_PAGE_SHIFT_4K);
>
> /* OPAL variant of PHB3 invalidated TCEs */
> - swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
> - if (swinvp) {
> - /* We need a couple more fields -- an address and a data
> - * to or. Since the bus is only printed out on table free
> - * errors, and on the first pass the data will be a relative
> - * bus number, print that out instead.
> - */
> - pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
> - tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
> - 8);
> + if (pe->tce_inval_reg)
> tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
> - }
> +
> tbl->it_ops = &pnv_ioda2_iommu_ops;
> iommu_init_table(tbl, phb->hose->node);
> #ifdef CONFIG_IOMMU_API
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 368d4ed..bd83d85 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -59,6 +59,7 @@ struct pnv_ioda_pe {
> int tce32_segcount;
> struct iommu_table_group table_group;
> phys_addr_t tce_inval_reg_phys;
> + __be64 __iomem *tce_inval_reg;
>
> /* 64-bit TCE bypass region */
> bool tce_bypass_enabled;


2015-04-27 22:18:27

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 30/32] vfio: powerpc/spapr: Use 32bit DMA window properties from table_group

On Sat, 2015-04-25 at 22:14 +1000, Alexey Kardashevskiy wrote:
> A table group might not have a table but it always has the default 32bit
> window parameters so use these.
>
> No change in behavior is expected.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> Changes:
> v9:
> * new in the series - to make the next patch simpler
> ---
> drivers/vfio/vfio_iommu_spapr_tce.c | 19 +++++++++++--------
> 1 file changed, 11 insertions(+), 8 deletions(-)


Acked-by: Alex Williamson <[email protected]>


> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 4cfc2c1..a7d6729 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -185,7 +185,6 @@ static int tce_iommu_enable(struct tce_container *container)
> {
> int ret = 0;
> unsigned long locked;
> - struct iommu_table *tbl;
> struct iommu_table_group *table_group;
>
> if (!container->grp)
> @@ -221,13 +220,19 @@ static int tce_iommu_enable(struct tce_container *container)
> * this is that we cannot tell here the amount of RAM used by the guest
> * as this information is only available from KVM and VFIO is
> * KVM agnostic.
> + *
> + * So we do not allow enabling a container without a group attached
> + * as there is no way to know how much we should increment
> + * the locked_vm counter.
> */
> table_group = iommu_group_get_iommudata(container->grp);
> if (!table_group)
> return -ENODEV;
>
> - tbl = &table_group->tables[0];
> - locked = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
> + if (!table_group->tce32_size)
> + return -EPERM;
> +
> + locked = table_group->tce32_size >> PAGE_SHIFT;
> ret = try_increment_locked_vm(locked);
> if (ret)
> return ret;
> @@ -504,7 +509,6 @@ static long tce_iommu_ioctl(void *iommu_data,
>
> case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
> struct vfio_iommu_spapr_tce_info info;
> - struct iommu_table *tbl;
> struct iommu_table_group *table_group;
>
> if (WARN_ON(!container->grp))
> @@ -512,8 +516,7 @@ static long tce_iommu_ioctl(void *iommu_data,
>
> table_group = iommu_group_get_iommudata(container->grp);
>
> - tbl = &table_group->tables[0];
> - if (WARN_ON_ONCE(!tbl))
> + if (!table_group)
> return -ENXIO;
>
> minsz = offsetofend(struct vfio_iommu_spapr_tce_info,
> @@ -525,8 +528,8 @@ static long tce_iommu_ioctl(void *iommu_data,
> if (info.argsz < minsz)
> return -EINVAL;
>
> - info.dma32_window_start = tbl->it_offset << tbl->it_page_shift;
> - info.dma32_window_size = tbl->it_size << tbl->it_page_shift;
> + info.dma32_window_start = table_group->tce32_start;
> + info.dma32_window_size = table_group->tce32_size;
> info.flags = 0;
>
> if (copy_to_user((void __user *)arg, &info, minsz))


2015-04-29 03:26:09

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 01/32] powerpc/iommu: Split iommu_free_table into 2 helpers

On Sat, Apr 25, 2015 at 10:14:25PM +1000, Alexey Kardashevskiy wrote:
> The iommu_free_table helper release memory it is using (the TCE table and
> @it_map) and release the iommu_table struct as well. We might not want
> the very last step as we store iommu_table in parent structures.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>

Reviewed-by: David Gibson <[email protected]>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 03:26:20

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 02/32] Revert "powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically"

On Sat, Apr 25, 2015 at 10:14:26PM +1000, Alexey Kardashevskiy wrote:
> This reverts commit 9e8d4a19ab66ec9e132d405357b9108a4f26efd3 as
> tce32_table has exactly the same life time as the whole PE.
>
> This makes use of a new iommu_reset_table() helper instead.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>

Reviewed-by: David Gibson <[email protected]>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 03:26:13

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 08/32] vfio: powerpc/spapr: Moving pinning/unpinning to helpers

On Sat, Apr 25, 2015 at 10:14:32PM +1000, Alexey Kardashevskiy wrote:
> This is a pretty mechanical patch to make next patches simpler.
>
> New tce_iommu_unuse_page() helper does put_page() now but it might skip
> that after the memory registering patch applied.
>
> As we are here, this removes unnecessary checks for a value returned
> by pfn_to_page() as it cannot possibly return NULL.
>
> This moves tce_iommu_disable() later to let tce_iommu_clear() know if
> the container has been enabled because if it has not been, then
> put_page() must not be called on TCEs from the TCE table. This situation
> is not yet possible but it will after KVM acceleration patchset is
> applied.
>
> This changes code to work with physical addresses rather than linear
> mapping addresses for better code readability. Following patches will
> add an xchg() callback for an IOMMU table which will accept/return
> physical addresses (unlike current tce_build()) which will eliminate
> redundant conversions.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> [aw: for the vfio related changes]
> Acked-by: Alex Williamson <[email protected]>

Reviewed-by: David Gibson <[email protected]>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 03:27:45

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 09/32] vfio: powerpc/spapr: Rework groups attaching

On Sat, Apr 25, 2015 at 10:14:33PM +1000, Alexey Kardashevskiy wrote:
> This is to make extended ownership and multiple groups support patches
> simpler for review.
>
> This should cause no behavioural change.

Um.. this doesn't appear to be true. Previously removing a group from
an enabled container would fail with EBUSY, now it forces a disable.

>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> [aw: for the vfio related changes]
> Acked-by: Alex Williamson <[email protected]>
> Reviewed-by: David Gibson <[email protected]>
> ---
> drivers/vfio/vfio_iommu_spapr_tce.c | 40 ++++++++++++++++++++++---------------
> 1 file changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 115d5e6..0fbe03e 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -460,16 +460,21 @@ static int tce_iommu_attach_group(void *iommu_data,
> iommu_group_id(container->tbl->it_group),
> iommu_group_id(iommu_group));
> ret = -EBUSY;
> - } else if (container->enabled) {
> + goto unlock_exit;
> + }
> +
> + if (container->enabled) {
> pr_err("tce_vfio: attaching group #%u to enabled container\n",
> iommu_group_id(iommu_group));
> ret = -EBUSY;
> - } else {
> - ret = iommu_take_ownership(tbl);
> - if (!ret)
> - container->tbl = tbl;
> + goto unlock_exit;
> }
>
> + ret = iommu_take_ownership(tbl);
> + if (!ret)
> + container->tbl = tbl;
> +
> +unlock_exit:
> mutex_unlock(&container->lock);
>
> return ret;
> @@ -487,19 +492,22 @@ static void tce_iommu_detach_group(void *iommu_data,
> pr_warn("tce_vfio: detaching group #%u, expected group is #%u\n",
> iommu_group_id(iommu_group),
> iommu_group_id(tbl->it_group));
> - } else {
> - if (container->enabled) {
> - pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
> - iommu_group_id(tbl->it_group));
> - tce_iommu_disable(container);
> - }
> + goto unlock_exit;
> + }
>
> - /* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
> - iommu_group_id(iommu_group), iommu_group); */
> - container->tbl = NULL;
> - tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
> - iommu_release_ownership(tbl);
> + if (container->enabled) {
> + pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
> + iommu_group_id(tbl->it_group));
> + tce_iommu_disable(container);
> }
> +
> + /* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
> + iommu_group_id(iommu_group), iommu_group); */
> + container->tbl = NULL;
> + tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
> + iommu_release_ownership(tbl);
> +
> +unlock_exit:
> mutex_unlock(&container->lock);
> }
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 03:26:11

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 12/32] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group

On Sat, Apr 25, 2015 at 10:14:36PM +1000, Alexey Kardashevskiy wrote:
> Modern IBM POWERPC systems support multiple (currently two) TCE tables
> per IOMMU group (a.k.a. PE). This adds a iommu_table_group container
> for TCE tables. Right now just one table is supported.
>
> For P5IOC2 and IODA, iommu_table_group is embedded into PE struct
> (pnv_ioda_pe and pnv_phb) and does not require iommu_free_table(), only
> iommu_reset_table().
>
> For pSeries, this replaces multiple calls of kzalloc_node() with a new
> iommu_pseries_group_alloc() helper and stores the table group struct
> pointer into the pci_dn struct. For release, a iommu_table_group_free()
> helper is added.
>
> This should cause no behavioural change.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> [aw: for the vfio related changes]
> Acked-by: Alex Williamson <[email protected]>

I'm not particularly fond of the "table_group" name, but I can't
really think of a better name for now. So,

Reviewed-by: David Gibson <[email protected]>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 03:27:31

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 13/32] vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control

On Sat, Apr 25, 2015 at 10:14:37PM +1000, Alexey Kardashevskiy wrote:
> This adds tce_iommu_take_ownership() and tce_iommu_release_ownership
> which call in a loop iommu_take_ownership()/iommu_release_ownership()
> for every table on the group. As there is just one now, no change in
> behaviour is expected.
>
> At the moment the iommu_table struct has a set_bypass() which enables/
> disables DMA bypass on IODA2 PHB. This is exposed to POWERPC IOMMU code
> which calls this callback when external IOMMU users such as VFIO are
> about to get over a PHB.
>
> The set_bypass() callback is not really an iommu_table function but
> IOMMU/PE function. This introduces a iommu_table_group_ops struct and
> adds take_ownership()/release_ownership() callbacks to it which are
> called when an external user takes/releases control over the IOMMU.
>
> This replaces set_bypass() with ownership callbacks as it is not
> necessarily just bypass enabling, it can be something else/more
> so let's give it more generic name.
>
> The callbacks is implemented for IODA2 only. Other platforms (P5IOC2,
> IODA1) will use the old iommu_take_ownership/iommu_release_ownership API.
> The following patches will replace iommu_take_ownership/
> iommu_release_ownership calls in IODA2 with full IOMMU table release/
> create.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> [aw: for the vfio related changes]
> Acked-by: Alex Williamson <[email protected]>
> ---
> Changes:
> v9:
> * squashed "vfio: powerpc/spapr: powerpc/iommu: Rework IOMMU ownership control"
> and "vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework IOMMU ownership control"
> into a single patch
> * moved helpers with a loop through tables in a group
> to vfio_iommu_spapr_tce.c to keep the platform code free of IOMMU table
> groups as much as possible
> * added missing tce_iommu_clear() to tce_iommu_release_ownership()
> * replaced the set_ownership(enable) callback with take_ownership() and
> release_ownership()
> ---
> arch/powerpc/include/asm/iommu.h | 13 +++++-
> arch/powerpc/kernel/iommu.c | 11 ------
> arch/powerpc/platforms/powernv/pci-ioda.c | 40 +++++++++++++++----
> drivers/vfio/vfio_iommu_spapr_tce.c | 66 +++++++++++++++++++++++++++----
> 4 files changed, 103 insertions(+), 27 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index fa37519..e63419e 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -93,7 +93,6 @@ struct iommu_table {
> unsigned long it_page_shift;/* table iommu page size */
> struct iommu_table_group *it_table_group;
> struct iommu_table_ops *it_ops;
> - void (*set_bypass)(struct iommu_table *tbl, bool enable);
> };
>
> /* Pure 2^n version of get_order */
> @@ -128,11 +127,23 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
>
> #define IOMMU_TABLE_GROUP_MAX_TABLES 1
>
> +struct iommu_table_group;
> +
> +struct iommu_table_group_ops {
> + /*
> + * Switches ownership from the kernel itself to an external
> + * user. While onwership is taken, the kernel cannot use IOMMU itself.

Typo in "onwership". I'd also like to see this be even more explicit
that "take" is the "core kernel -> vfio/whatever" transition and
release is the reverse.

> + */
> + void (*take_ownership)(struct iommu_table_group *table_group);
> + void (*release_ownership)(struct iommu_table_group *table_group);
> +};
> +
> struct iommu_table_group {
> #ifdef CONFIG_IOMMU_API
> struct iommu_group *group;
> #endif
> struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
> + struct iommu_table_group_ops *ops;
> };
>
> #ifdef CONFIG_IOMMU_API
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 005146b..2856d27 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -1057,13 +1057,6 @@ int iommu_take_ownership(struct iommu_table *tbl)
>
> memset(tbl->it_map, 0xff, sz);
>
> - /*
> - * Disable iommu bypass, otherwise the user can DMA to all of
> - * our physical memory via the bypass window instead of just
> - * the pages that has been explicitly mapped into the iommu
> - */
> - if (tbl->set_bypass)
> - tbl->set_bypass(tbl, false);
>
> return 0;
> }
> @@ -1078,10 +1071,6 @@ void iommu_release_ownership(struct iommu_table *tbl)
> /* Restore bit#0 set by iommu_init_table() */
> if (tbl->it_offset == 0)
> set_bit(0, tbl->it_map);
> -
> - /* The kernel owns the device now, we can restore the iommu bypass */
> - if (tbl->set_bypass)
> - tbl->set_bypass(tbl, true);
> }
> EXPORT_SYMBOL_GPL(iommu_release_ownership);
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 88472cb..718d5cc 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1870,10 +1870,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> __free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
> }
>
> -static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
> +static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
> {
> - struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
> - struct pnv_ioda_pe, table_group);
> uint16_t window_id = (pe->pe_number << 1 ) + 1;
> int64_t rc;
>
> @@ -1901,7 +1899,8 @@ static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
> * host side.
> */
> if (pe->pdev)
> - set_iommu_table_base(&pe->pdev->dev, tbl);
> + set_iommu_table_base(&pe->pdev->dev,
> + &pe->table_group.tables[0]);
> else
> pnv_ioda_setup_bus_dma(pe, pe->pbus, false);
> }
> @@ -1917,13 +1916,35 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
> /* TVE #1 is selected by PCI address bit 59 */
> pe->tce_bypass_base = 1ull << 59;
>
> - /* Install set_bypass callback for VFIO */
> - pe->table_group.tables[0].set_bypass = pnv_pci_ioda2_set_bypass;
> -
> /* Enable bypass by default */
> - pnv_pci_ioda2_set_bypass(&pe->table_group.tables[0], true);
> + pnv_pci_ioda2_set_bypass(pe, true);
> }
>
> +#ifdef CONFIG_IOMMU_API
> +static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
> +{
> + struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> + table_group);
> +
> + iommu_take_ownership(&table_group->tables[0]);
> + pnv_pci_ioda2_set_bypass(pe, false);
> +}
> +
> +static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
> +{
> + struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> + table_group);
> +
> + iommu_release_ownership(&table_group->tables[0]);
> + pnv_pci_ioda2_set_bypass(pe, true);
> +}
> +
> +static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
> + .take_ownership = pnv_ioda2_take_ownership,
> + .release_ownership = pnv_ioda2_release_ownership,
> +};
> +#endif
> +
> static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> struct pnv_ioda_pe *pe)
> {
> @@ -1991,6 +2012,9 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> }
> tbl->it_ops = &pnv_ioda2_iommu_ops;
> iommu_init_table(tbl, phb->hose->node);
> +#ifdef CONFIG_IOMMU_API
> + pe->table_group.ops = &pnv_pci_ioda2_ops;
> +#endif
>
> if (pe->flags & PNV_IODA_PE_DEV) {
> iommu_register_group(&pe->table_group, phb->hose->global_number,
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 17e884a..dacc738 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -483,6 +483,43 @@ static long tce_iommu_ioctl(void *iommu_data,
> return -ENOTTY;
> }
>
> +static void tce_iommu_release_ownership(struct tce_container *container,
> + struct iommu_table_group *table_group)
> +{
> + int i;
> +
> + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> + struct iommu_table *tbl = &table_group->tables[i];
> +
> + tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
> + if (tbl->it_map)
> + iommu_release_ownership(tbl);
> + }
> +}
> +
> +static int tce_iommu_take_ownership(struct iommu_table_group *table_group)
> +{
> + int i, j, rc = 0;
> +
> + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> + struct iommu_table *tbl = &table_group->tables[i];
> +
> + if (!tbl->it_map)
> + continue;
> +
> + rc = iommu_take_ownership(tbl);
> + if (rc) {
> + for (j = 0; j < i; ++j)
> + iommu_release_ownership(
> + &table_group->tables[j]);
> +
> + return rc;
> + }
> + }
> +
> + return 0;
> +}
> +
> static int tce_iommu_attach_group(void *iommu_data,
> struct iommu_group *iommu_group)
> {
> @@ -515,9 +552,23 @@ static int tce_iommu_attach_group(void *iommu_data,
> goto unlock_exit;
> }
>
> - ret = iommu_take_ownership(&table_group->tables[0]);
> - if (!ret)
> - container->grp = iommu_group;
> + if (!table_group->ops || !table_group->ops->take_ownership ||
> + !table_group->ops->release_ownership) {
> + ret = tce_iommu_take_ownership(table_group);

Haven't looked at the rest of the series. I'm hoping that you're
eventually planning to replace this fallback with setting the
take_ownership call for p5ioc etc. to point to
tce_iommu_take_ownership.

> + } else {
> + /*
> + * Disable iommu bypass, otherwise the user can DMA to all of
> + * our physical memory via the bypass window instead of just
> + * the pages that has been explicitly mapped into the iommu
> + */
> + table_group->ops->take_ownership(table_group);
> + ret = 0;
> + }
> +
> + if (ret)
> + goto unlock_exit;
> +
> + container->grp = iommu_group;
>
> unlock_exit:
> mutex_unlock(&container->lock);
> @@ -530,7 +581,6 @@ static void tce_iommu_detach_group(void *iommu_data,
> {
> struct tce_container *container = iommu_data;
> struct iommu_table_group *table_group;
> - struct iommu_table *tbl;
>
> mutex_lock(&container->lock);
> if (iommu_group != container->grp) {
> @@ -553,9 +603,11 @@ static void tce_iommu_detach_group(void *iommu_data,
> table_group = iommu_group_get_iommudata(iommu_group);
> BUG_ON(!table_group);
>
> - tbl = &table_group->tables[0];
> - tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
> - iommu_release_ownership(tbl);
> + /* Kernel owns the device now, we can restore bypass */
> + if (!table_group->ops || !table_group->ops->release_ownership)
> + tce_iommu_release_ownership(container, table_group);
> + else
> + table_group->ops->release_ownership(table_group);
>
> unlock_exit:
> mutex_unlock(&container->lock);

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 03:26:16

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 14/32] powerpc/iommu: Fix IOMMU ownership control functions

On Sat, Apr 25, 2015 at 10:14:38PM +1000, Alexey Kardashevskiy wrote:
> This adds missing locks in iommu_take_ownership()/
> iommu_release_ownership().
>
> This marks all pages busy in iommu_table::it_map in order to catch
> errors if there is an attempt to use this table while ownership over it
> is taken.
>
> This only clears TCE content if there is no page marked busy in it_map.
> Clearing must be done outside of the table locks as iommu_clear_tce()
> called from iommu_clear_tces_and_put_pages() does this.
>
> In order to use bitmap_empty(), the existing code clears bit#0 which
> is set even in an empty table if it is bus-mapped at 0 as
> iommu_init_table() reserves page#0 to prevent buggy drivers
> from crashing when allocated page is bus-mapped at zero
> (which is correct). This restores the bit in the case of failure
> to bring the it_map to the state it was in when we called
> iommu_take_ownership().

Ah! I finally understand what all this bit#0 stuff is about. Thanks
for the explanation.

>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>

Reviewed-by: David Gibson <[email protected]>

With one small comment..


> ---
> Changes:
> v9:
> * iommu_table_take_ownership() did not return @ret (and ignored EBUSY),
> now it does return correct error.
> * updated commit log about setting bit#0 in the case of failure
>
> v5:
> * do not store bit#0 value, it has to be set for zero-based table
> anyway
> * removed test_and_clear_bit
> ---
> arch/powerpc/kernel/iommu.c | 31 +++++++++++++++++++++++++------
> 1 file changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 2856d27..ea2c8ba 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -1045,32 +1045,51 @@ EXPORT_SYMBOL_GPL(iommu_tce_build);
>
> int iommu_take_ownership(struct iommu_table *tbl)
> {
> - unsigned long sz = (tbl->it_size + 7) >> 3;
> + unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
> + int ret = 0;
> +
> + spin_lock_irqsave(&tbl->large_pool.lock, flags);
> + for (i = 0; i < tbl->nr_pools; i++)
> + spin_lock(&tbl->pools[i].lock);

> if (tbl->it_offset == 0)
> clear_bit(0, tbl->it_map);
>
> if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
> pr_err("iommu_tce: it_map is not empty");
> - return -EBUSY;
> + ret = -EBUSY;
> + /* Restore bit#0 set by iommu_init_table() */
> + if (tbl->it_offset == 0)
> + set_bit(0, tbl->it_map);
> + } else {
> + memset(tbl->it_map, 0xff, sz);
> }
>
> - memset(tbl->it_map, 0xff, sz);
> + for (i = 0; i < tbl->nr_pools; i++)
> + spin_unlock(&tbl->pools[i].lock);

I *think* it's safe in this case, but releasing locks not in the
reverse order you acquired them makes me a bit nervous.

> + spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
>
> -
> - return 0;
> + return ret;
> }
> EXPORT_SYMBOL_GPL(iommu_take_ownership);
>
> void iommu_release_ownership(struct iommu_table *tbl)
> {
> - unsigned long sz = (tbl->it_size + 7) >> 3;
> + unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
> +
> + spin_lock_irqsave(&tbl->large_pool.lock, flags);
> + for (i = 0; i < tbl->nr_pools; i++)
> + spin_lock(&tbl->pools[i].lock);
>
> memset(tbl->it_map, 0, sz);
>
> /* Restore bit#0 set by iommu_init_table() */
> if (tbl->it_offset == 0)
> set_bit(0, tbl->it_map);
> +
> + for (i = 0; i < tbl->nr_pools; i++)
> + spin_unlock(&tbl->pools[i].lock);
> + spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
> }
> EXPORT_SYMBOL_GPL(iommu_release_ownership);
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 03:27:47

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 15/32] powerpc/powernv/ioda/ioda2: Rework TCE invalidation in tce_build()/tce_free()

On Sat, Apr 25, 2015 at 10:14:39PM +1000, Alexey Kardashevskiy wrote:
> The pnv_pci_ioda_tce_invalidate() helper invalidates the TCE cache. It is
> supposed to be called on IODA1/2 and not called on p5ioc2. It receives
> the start and end host addresses of the TCE table.
>
> IODA2 actually needs PCI addresses to invalidate the cache. Those
> can be calculated from host addresses but since we are going
> to implement multi-level TCE tables, calculating PCI address from
> a host address might get either tricky or ugly as TCE table remains flat
> on PCI bus but not in RAM.
>
> This moves pnv_pci_ioda_tce_invalidate() from generic pnv_tce_build/
> pnt_tce_free and defines IODA1/2-specific callbacks which call generic
> ones and do PHB-model-specific TCE cache invalidation. P5IOC2 keeps
> using generic callbacks as before.
>
> This changes pnv_pci_ioda2_tce_invalidate() to receive a TCE index and
> a number of pages, which are PCI addresses shifted by the IOMMU page shift.
>
> No change in behaviour is expected.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> Changes:
> v9:
> * removed confusing comment from commit log about unintentional calling of
> pnv_pci_ioda_tce_invalidate()
> * moved mechanical changes away to "powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table"
> * fixed bug with broken invalidation in pnv_pci_ioda2_tce_invalidate -
> @index includes @tbl->it_offset but old code added it anyway which later broke
> DDW
> ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++----------
> arch/powerpc/platforms/powernv/pci.c | 17 ++----
> 2 files changed, 64 insertions(+), 39 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 718d5cc..f070c44 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1665,18 +1665,20 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
> }
> }
>
> -static void pnv_pci_ioda1_tce_invalidate(struct pnv_ioda_pe *pe,
> - struct iommu_table *tbl,
> - __be64 *startp, __be64 *endp, bool rm)
> +static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
> + unsigned long index, unsigned long npages, bool rm)
> {
> + struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
> + struct pnv_ioda_pe, table_group);
> __be64 __iomem *invalidate = rm ?
> (__be64 __iomem *)pe->tce_inval_reg_phys :
> (__be64 __iomem *)tbl->it_index;
> unsigned long start, end, inc;
> const unsigned shift = tbl->it_page_shift;
>
> - start = __pa(startp);
> - end = __pa(endp);
> + start = __pa((__be64 *)tbl->it_base + index - tbl->it_offset);
> + end = __pa((__be64 *)tbl->it_base + index - tbl->it_offset +
> + npages - 1);

This doesn't look right. The arguments to __pa don't appear to be
addresses (since index and it_offset are in units of (TCE) pages, not
bytes).
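
For reference, what the quoted lines compute, assuming pointer arithmetic
on a __be64 * scales the index by the 8-byte entry size (sketch only):

	__be64 *tcep = (__be64 *)tbl->it_base + (index - tbl->it_offset);

	start = __pa(tcep);			/* host byte address of the first TCE */
	end   = __pa(tcep + npages - 1);	/* host byte address of the last TCE */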

>
> /* BML uses this case for p6/p7/galaxy2: Shift addr and put in node */
> if (tbl->it_busno) {
> @@ -1712,16 +1714,40 @@ static void pnv_pci_ioda1_tce_invalidate(struct pnv_ioda_pe *pe,
> */
> }
>
> +static int pnv_ioda1_tce_build(struct iommu_table *tbl, long index,
> + long npages, unsigned long uaddr,
> + enum dma_data_direction direction,
> + struct dma_attrs *attrs)
> +{
> + long ret = pnv_tce_build(tbl, index, npages, uaddr, direction,
> + attrs);
> +
> + if (!ret && (tbl->it_type & TCE_PCI_SWINV_CREATE))
> + pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
> +
> + return ret;
> +}
> +
> +static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
> + long npages)
> +{
> + pnv_tce_free(tbl, index, npages);
> +
> + if (tbl->it_type & TCE_PCI_SWINV_FREE)
> + pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
> +}
> +
> static struct iommu_table_ops pnv_ioda1_iommu_ops = {
> - .set = pnv_tce_build,
> - .clear = pnv_tce_free,
> + .set = pnv_ioda1_tce_build,
> + .clear = pnv_ioda1_tce_free,
> .get = pnv_tce_get,
> };
>
> -static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
> - struct iommu_table *tbl,
> - __be64 *startp, __be64 *endp, bool rm)
> +static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
> + unsigned long index, unsigned long npages, bool rm)
> {
> + struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
> + struct pnv_ioda_pe, table_group);
> unsigned long start, end, inc;
> __be64 __iomem *invalidate = rm ?
> (__be64 __iomem *)pe->tce_inval_reg_phys :
> @@ -1734,10 +1760,8 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
> end = start;
>
> /* Figure out the start, end and step */
> - inc = tbl->it_offset + (((u64)startp - tbl->it_base) / sizeof(u64));
> - start |= (inc << shift);
> - inc = tbl->it_offset + (((u64)endp - tbl->it_base) / sizeof(u64));
> - end |= (inc << shift);
> + start |= (index << shift);
> + end |= ((index + npages - 1) << shift);
> inc = (0x1ull << shift);
> mb();
>
> @@ -1750,22 +1774,32 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
> }
> }
>
> -void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
> - __be64 *startp, __be64 *endp, bool rm)
> +static int pnv_ioda2_tce_build(struct iommu_table *tbl, long index,
> + long npages, unsigned long uaddr,
> + enum dma_data_direction direction,
> + struct dma_attrs *attrs)
> {
> - struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
> - struct pnv_ioda_pe, table_group);
> - struct pnv_phb *phb = pe->phb;
> -
> - if (phb->type == PNV_PHB_IODA1)
> - pnv_pci_ioda1_tce_invalidate(pe, tbl, startp, endp, rm);
> - else
> - pnv_pci_ioda2_tce_invalidate(pe, tbl, startp, endp, rm);
> + long ret = pnv_tce_build(tbl, index, npages, uaddr, direction,
> + attrs);
> +
> + if (!ret && (tbl->it_type & TCE_PCI_SWINV_CREATE))
> + pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
> +
> + return ret;
> +}
> +
> +static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
> + long npages)
> +{
> + pnv_tce_free(tbl, index, npages);
> +
> + if (tbl->it_type & TCE_PCI_SWINV_FREE)
> + pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
> }
>
> static struct iommu_table_ops pnv_ioda2_iommu_ops = {
> - .set = pnv_tce_build,
> - .clear = pnv_tce_free,
> + .set = pnv_ioda2_tce_build,
> + .clear = pnv_ioda2_tce_free,
> .get = pnv_tce_get,
> };
>
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index 4c3bbb1..84b4ea4 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -577,37 +577,28 @@ int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
> struct dma_attrs *attrs)
> {
> u64 proto_tce = iommu_direction_to_tce_perm(direction);
> - __be64 *tcep, *tces;
> + __be64 *tcep;
> u64 rpn;
>
> - tces = tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
> + tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
> rpn = __pa(uaddr) >> tbl->it_page_shift;
>
> while (npages--)
> *(tcep++) = cpu_to_be64(proto_tce |
> (rpn++ << tbl->it_page_shift));
>
> - /* Some implementations won't cache invalid TCEs and thus may not
> - * need that flush. We'll probably turn it_type into a bit mask
> - * of flags if that becomes the case
> - */
> - if (tbl->it_type & TCE_PCI_SWINV_CREATE)
> - pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1, false);
>
> return 0;
> }
>
> void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
> {
> - __be64 *tcep, *tces;
> + __be64 *tcep;
>
> - tces = tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
> + tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
>
> while (npages--)
> *(tcep++) = cpu_to_be64(0);
> -
> - if (tbl->it_type & TCE_PCI_SWINV_FREE)
> - pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1, false);
> }
>
> unsigned long pnv_tce_get(struct iommu_table *tbl, long index)

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 03:26:17

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 16/32] powerpc/powernv/ioda: Move TCE kill register address to PE

On Sat, Apr 25, 2015 at 10:14:40PM +1000, Alexey Kardashevskiy wrote:
> At the moment the DMA setup code looks for the "ibm,opal-tce-kill" property
> which contains the TCE kill register address. Writes to this register
> invalidate the TCE cache on the IODA/IODA2 hub.
>
> This moves the register address from iommu_table to pnv_ioda_pe as
> later there will be 2 tables per PE and it will be used for both tables.
>
> This moves the property reading/remapping code to a helper to reduce
> code duplication.
>
> This adds a new pnv_pci_ioda2_tvt_invalidate() helper which invalidates
> the entire table. It should be called after every call to
> opal_pci_map_pe_dma_window(). It was not required before because
> there is just a single TCE table and 64bit DMA is handled via the bypass
> window (which has no table, so no cache is used) but this is going
> to change with Dynamic DMA windows (DDW).
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> Changes:
> v9:
> * new in the series
> ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 69 +++++++++++++++++++------------
> arch/powerpc/platforms/powernv/pci.h | 1 +
> 2 files changed, 44 insertions(+), 26 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index f070c44..b22b3ca 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1672,7 +1672,7 @@ static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
> struct pnv_ioda_pe, table_group);
> __be64 __iomem *invalidate = rm ?
> (__be64 __iomem *)pe->tce_inval_reg_phys :
> - (__be64 __iomem *)tbl->it_index;
> + pe->tce_inval_reg;
> unsigned long start, end, inc;
> const unsigned shift = tbl->it_page_shift;
>
> @@ -1743,6 +1743,18 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
> .get = pnv_tce_get,
> };
>
> +static inline void pnv_pci_ioda2_tvt_invalidate(struct pnv_ioda_pe *pe)
> +{
> + /* 01xb - invalidate TCEs that match the specified PE# */
> + unsigned long addr = (0x4ull << 60) | (pe->pe_number & 0xFF);

This doesn't really look like an address, but rather the data you're
writing to the register.
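
Something like this would read better (hypothetical rename only):

	/* the value is invalidation data for the kill register, not an address */
	unsigned long val = (0x4ull << 60) | (pe->pe_number & 0xFF);

	__raw_writeq(cpu_to_be64(val), pe->tce_inval_reg);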

> + if (!pe->tce_inval_reg)
> + return;
> +
> + mb(); /* Ensure above stores are visible */
> + __raw_writeq(cpu_to_be64(addr), pe->tce_inval_reg);
> +}
> +
> static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
> unsigned long index, unsigned long npages, bool rm)
> {
> @@ -1751,7 +1763,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
> unsigned long start, end, inc;
> __be64 __iomem *invalidate = rm ?
> (__be64 __iomem *)pe->tce_inval_reg_phys :
> - (__be64 __iomem *)tbl->it_index;
> + pe->tce_inval_reg;
> const unsigned shift = tbl->it_page_shift;
>
> /* We'll invalidate DMA address in PE scope */
> @@ -1803,13 +1815,31 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
> .get = pnv_tce_get,
> };
>
> +static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
> + struct pnv_ioda_pe *pe)
> +{
> + const __be64 *swinvp;
> +
> + /* OPAL variant of PHB3 invalidated TCEs */
> + swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
> + if (!swinvp)
> + return;
> +
> + /* We need a couple more fields -- an address and a data
> + * to or. Since the bus is only printed out on table free
> + * errors, and on the first pass the data will be a relative
> + * bus number, print that out instead.
> + */

The comment above appears to have nothing to do with the surrounding code.

> + pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
> + pe->tce_inval_reg = ioremap(pe->tce_inval_reg_phys, 8);
> +}
> +
> static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> struct pnv_ioda_pe *pe, unsigned int base,
> unsigned int segs)
> {
>
> struct page *tce_mem = NULL;
> - const __be64 *swinvp;
> struct iommu_table *tbl;
> unsigned int i;
> int64_t rc;
> @@ -1823,6 +1853,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> if (WARN_ON(pe->tce32_seg >= 0))
> return;
>
> + pnv_pci_ioda_setup_opal_tce_kill(phb, pe);
> +
> /* Grab a 32-bit TCE table */
> pe->tce32_seg = base;
> pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
> @@ -1865,20 +1897,11 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> base << 28, IOMMU_PAGE_SHIFT_4K);
>
> /* OPAL variant of P7IOC SW invalidated TCEs */
> - swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
> - if (swinvp) {
> - /* We need a couple more fields -- an address and a data
> - * to or. Since the bus is only printed out on table free
> - * errors, and on the first pass the data will be a relative
> - * bus number, print that out instead.
> - */

.. although I guess it didn't make any more sense in its original context.

> - pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
> - tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
> - 8);
> + if (pe->tce_inval_reg)
> tbl->it_type |= (TCE_PCI_SWINV_CREATE |
> TCE_PCI_SWINV_FREE |
> TCE_PCI_SWINV_PAIR);
> - }
> +
> tbl->it_ops = &pnv_ioda1_iommu_ops;
> iommu_init_table(tbl, phb->hose->node);
>
> @@ -1984,7 +2007,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> {
> struct page *tce_mem = NULL;
> void *addr;
> - const __be64 *swinvp;
> struct iommu_table *tbl;
> unsigned int tce_table_size, end;
> int64_t rc;
> @@ -1993,6 +2015,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> if (WARN_ON(pe->tce32_seg >= 0))
> return;
>
> + pnv_pci_ioda_setup_opal_tce_kill(phb, pe);
> +
> /* The PE will reserve all possible 32-bits space */
> pe->tce32_seg = 0;
> end = (1 << ilog2(phb->ioda.m32_pci_base));
> @@ -2023,6 +2047,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> goto fail;
> }
>
> + pnv_pci_ioda2_tvt_invalidate(pe);
> +

This looks to be a change in behaviour - if it's replacing a previous
invalidation, I'm not seeing where.

> /* Setup iommu */
> pe->table_group.tables[0].it_table_group = &pe->table_group;
>
> @@ -2032,18 +2058,9 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> IOMMU_PAGE_SHIFT_4K);
>
> /* OPAL variant of PHB3 invalidated TCEs */
> - swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
> - if (swinvp) {
> - /* We need a couple more fields -- an address and a data
> - * to or. Since the bus is only printed out on table free
> - * errors, and on the first pass the data will be a relative
> - * bus number, print that out instead.
> - */
> - pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
> - tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
> - 8);
> + if (pe->tce_inval_reg)
> tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
> - }
> +
> tbl->it_ops = &pnv_ioda2_iommu_ops;
> iommu_init_table(tbl, phb->hose->node);
> #ifdef CONFIG_IOMMU_API
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 368d4ed..bd83d85 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -59,6 +59,7 @@ struct pnv_ioda_pe {
> int tce32_segcount;
> struct iommu_table_group table_group;
> phys_addr_t tce_inval_reg_phys;
> + __be64 __iomem *tce_inval_reg;
>
> /* 64-bit TCE bypass region */
> bool tce_bypass_enabled;

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 05:50:06

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 17/32] powerpc/powernv: Implement accessor to TCE entry

On Sat, Apr 25, 2015 at 10:14:41PM +1000, Alexey Kardashevskiy wrote:
> This replaces direct accesses to the TCE table with a helper which
> returns a TCE entry address. This does not make a difference now but will
> when multi-level TCE tables get introduced.
>
> No change in behavior is expected.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>

Reviewed-by: David Gibson <[email protected]>


> ---
> Changes:
> v9:
> * new patch in the series to separate this mechanical change from
> functional changes; this is not right before
> "powerpc/powernv: Implement multilevel TCE tables" but here in order
> to let the next patch - "powerpc/iommu/powernv: Release replaced TCE" -
> use pnv_tce() and avoid changing the same code twice
> ---
> arch/powerpc/platforms/powernv/pci.c | 34 +++++++++++++++++++++-------------
> 1 file changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index 84b4ea4..ba75aa5 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -572,38 +572,46 @@ struct pci_ops pnv_pci_ops = {
> .write = pnv_pci_write_config,
> };
>
> +static __be64 *pnv_tce(struct iommu_table *tbl, long idx)
> +{
> + __be64 *tmp = ((__be64 *)tbl->it_base);
> +
> + return tmp + idx;
> +}
> +
> int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
> unsigned long uaddr, enum dma_data_direction direction,
> struct dma_attrs *attrs)
> {
> u64 proto_tce = iommu_direction_to_tce_perm(direction);
> - __be64 *tcep;
> - u64 rpn;
> + u64 rpn = __pa(uaddr) >> tbl->it_page_shift;

I guess this was a problem in the existing code, not this patch. But
"uaddr" is a really bad name (and unsigned long is a bad type) for
what must actually be a kernel linear mapping address.

> + long i;
>
> - tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
> - rpn = __pa(uaddr) >> tbl->it_page_shift;
> -
> - while (npages--)
> - *(tcep++) = cpu_to_be64(proto_tce |
> - (rpn++ << tbl->it_page_shift));
> + for (i = 0; i < npages; i++) {
> + unsigned long newtce = proto_tce |
> + ((rpn + i) << tbl->it_page_shift);
> + unsigned long idx = index - tbl->it_offset + i;
>
> + *(pnv_tce(tbl, idx)) = cpu_to_be64(newtce);
> + }
>
> return 0;
> }
>
> void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
> {
> - __be64 *tcep;
> + long i;
>
> - tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
> + for (i = 0; i < npages; i++) {
> + unsigned long idx = index - tbl->it_offset + i;
>
> - while (npages--)
> - *(tcep++) = cpu_to_be64(0);
> + *(pnv_tce(tbl, idx)) = cpu_to_be64(0);
> + }
> }
>
> unsigned long pnv_tce_get(struct iommu_table *tbl, long index)
> {
> - return ((u64 *)tbl->it_base)[index - tbl->it_offset];
> + return *(pnv_tce(tbl, index - tbl->it_offset));
> }
>
> void pnv_pci_setup_iommu_table(struct iommu_table *tbl,

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 05:50:50

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 18/32] powerpc/iommu/powernv: Release replaced TCE

On Sat, Apr 25, 2015 at 10:14:42PM +1000, Alexey Kardashevskiy wrote:
> At the moment, writing a new TCE value to the IOMMU table fails with EBUSY
> if there is a valid entry already. However, the PAPR specification allows
> the guest to write a new TCE value without clearing the old one first.
>
> Another problem this patch is addressing is the use of pool locks for
> external IOMMU users such as VFIO. The pool locks are there to protect
> the DMA page allocator rather than the entries, and since the host kernel
> does not control what pages are in use, there is no point in the pool locks;
> exchange()+put_page(oldtce) is sufficient to avoid possible races.
>
> This adds an exchange() callback to iommu_table_ops which does the same
> thing as set() plus it returns the replaced TCE and DMA direction so
> the caller can release the pages afterwards. exchange() receives
> a physical address, unlike set() which receives a linear mapping address,
> and returns a physical address as clear() does.
>
> This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement
> for a platform to have exchange() implemented in order to support VFIO.
>
> This replaces iommu_tce_build() and iommu_clear_tce() with
> a single iommu_tce_xchg().
>
> This makes sure that TCE permission bits are not set in the TCE passed to
> the IOMMU API as those are to be calculated by platform code from the DMA
> direction.
>
> This moves SetPageDirty() to the IOMMU code to make it work for both
> the VFIO ioctl interface and in-kernel TCE acceleration (when the latter
> becomes available later).
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> [aw: for the vfio related changes]
> Acked-by: Alex Williamson <[email protected]>

This looks mostly good, but there are a couple of details that need fixing.
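
As an aside, the exchange()+put_page() pattern the commit message describes
ends up looking roughly like this on the caller side (sketch, simplified
from the VFIO hunk below):

	unsigned long hpa = 0;
	enum dma_data_direction dir = DMA_NONE;

	/* swap in the new entry (or a clear), then release whatever was there */
	if (!iommu_tce_xchg(tbl, entry, &hpa, &dir) && (dir != DMA_NONE))
		put_page(pfn_to_page(hpa >> PAGE_SHIFT));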

> ---
> Changes:
> v9:
> * changed exchange() to work with physical addresses as these addresses
> are never accessed by the code and physical addresses are actual values
> we put into the IOMMU table
> ---
> arch/powerpc/include/asm/iommu.h | 22 +++++++++--
> arch/powerpc/kernel/iommu.c | 57 +++++++++-------------------
> arch/powerpc/platforms/powernv/pci-ioda.c | 34 +++++++++++++++++
> arch/powerpc/platforms/powernv/pci-p5ioc2.c | 3 ++
> arch/powerpc/platforms/powernv/pci.c | 17 +++++++++
> arch/powerpc/platforms/powernv/pci.h | 2 +
> drivers/vfio/vfio_iommu_spapr_tce.c | 58 ++++++++++++++++++-----------
> 7 files changed, 128 insertions(+), 65 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index e63419e..7e7ca0a 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -45,13 +45,29 @@ extern int iommu_is_off;
> extern int iommu_force_on;
>
> struct iommu_table_ops {
> + /*
> + * When called with direction==DMA_NONE, it is equal to clear().
> + * uaddr is a linear map address.
> + */
> int (*set)(struct iommu_table *tbl,
> long index, long npages,
> unsigned long uaddr,
> enum dma_data_direction direction,
> struct dma_attrs *attrs);
> +#ifdef CONFIG_IOMMU_API
> + /*
> + * Exchanges existing TCE with new TCE plus direction bits;
> + * returns old TCE and DMA direction mask.
> + * @tce is a physical address.
> + */
> + int (*exchange)(struct iommu_table *tbl,
> + long index,
> + unsigned long *tce,

I'd prefer to call this "address" or "paddr" or something, since it's
not a full TCE entry (which would contain permission bits).
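
i.e. something like (hypothetical naming only, same shape as the quoted
prototype):

	int (*exchange)(struct iommu_table *tbl, long index,
			unsigned long *hpa,	/* physical address, no permission bits */
			enum dma_data_direction *direction);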

> + enum dma_data_direction *direction);
> +#endif
> void (*clear)(struct iommu_table *tbl,
> long index, long npages);
> + /* get() returns a physical address */
> unsigned long (*get)(struct iommu_table *tbl, long index);
> void (*flush)(struct iommu_table *tbl);
> };
> @@ -152,6 +168,8 @@ extern void iommu_register_group(struct iommu_table_group *table_group,
> extern int iommu_add_device(struct device *dev);
> extern void iommu_del_device(struct device *dev);
> extern int __init tce_iommu_bus_notifier_init(void);
> +extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
> + unsigned long *tce, enum dma_data_direction *direction);
> #else
> static inline void iommu_register_group(struct iommu_table_group *table_group,
> int pci_domain_number,
> @@ -231,10 +249,6 @@ extern int iommu_tce_clear_param_check(struct iommu_table *tbl,
> unsigned long npages);
> extern int iommu_tce_put_param_check(struct iommu_table *tbl,
> unsigned long ioba, unsigned long tce);
> -extern int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
> - unsigned long hwaddr, enum dma_data_direction direction);
> -extern unsigned long iommu_clear_tce(struct iommu_table *tbl,
> - unsigned long entry);
>
> extern void iommu_flush_tce(struct iommu_table *tbl);
> extern int iommu_take_ownership(struct iommu_table *tbl);
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index ea2c8ba..2eaba0c 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -975,9 +975,6 @@ EXPORT_SYMBOL_GPL(iommu_tce_clear_param_check);
> int iommu_tce_put_param_check(struct iommu_table *tbl,
> unsigned long ioba, unsigned long tce)
> {
> - if (!(tce & (TCE_PCI_WRITE | TCE_PCI_READ)))
> - return -EINVAL;
> -
> if (tce & ~(IOMMU_PAGE_MASK(tbl) | TCE_PCI_WRITE | TCE_PCI_READ))
> return -EINVAL;

Since the value you're passing is now an address rather than a full
TCE, can't you remove the permission bits from this check, rather than
checking that elsewhere?
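
i.e. something like (sketch of the suggested simplification, assuming the
value is now a plain, IOMMU-page-aligned address):

	if (tce & ~IOMMU_PAGE_MASK(tbl))
		return -EINVAL;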

> @@ -995,44 +992,16 @@ int iommu_tce_put_param_check(struct iommu_table *tbl,
> }
> EXPORT_SYMBOL_GPL(iommu_tce_put_param_check);
>
> -unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry)
> +long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
> + unsigned long *tce, enum dma_data_direction *direction)
> {
> - unsigned long oldtce;
> - struct iommu_pool *pool = get_pool(tbl, entry);
> + long ret;
>
> - spin_lock(&(pool->lock));
> + ret = tbl->it_ops->exchange(tbl, entry, tce, direction);
>
> - oldtce = tbl->it_ops->get(tbl, entry);
> - if (oldtce & (TCE_PCI_WRITE | TCE_PCI_READ))
> - tbl->it_ops->clear(tbl, entry, 1);
> - else
> - oldtce = 0;
> -
> - spin_unlock(&(pool->lock));
> -
> - return oldtce;
> -}
> -EXPORT_SYMBOL_GPL(iommu_clear_tce);
> -
> -/*
> - * hwaddr is a kernel virtual address here (0xc... bazillion),
> - * tce_build converts it to a physical address.
> - */
> -int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
> - unsigned long hwaddr, enum dma_data_direction direction)
> -{
> - int ret = -EBUSY;
> - unsigned long oldtce;
> - struct iommu_pool *pool = get_pool(tbl, entry);
> -
> - spin_lock(&(pool->lock));
> -
> - oldtce = tbl->it_ops->get(tbl, entry);
> - /* Add new entry if it is not busy */
> - if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)))
> - ret = tbl->it_ops->set(tbl, entry, 1, hwaddr, direction, NULL);
> -
> - spin_unlock(&(pool->lock));
> + if (!ret && ((*direction == DMA_FROM_DEVICE) ||
> + (*direction == DMA_BIDIRECTIONAL)))
> + SetPageDirty(pfn_to_page(*tce >> PAGE_SHIFT));
>
> /* if (unlikely(ret))
> pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%lx ret=%d\n",
> @@ -1041,13 +1010,23 @@ int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
>
> return ret;
> }
> -EXPORT_SYMBOL_GPL(iommu_tce_build);
> +EXPORT_SYMBOL_GPL(iommu_tce_xchg);
>
> int iommu_take_ownership(struct iommu_table *tbl)
> {
> unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
> int ret = 0;
>
> + /*
> + * VFIO does not control TCE entries allocation and the guest
> + * can write new TCEs on top of existing ones so iommu_tce_build()
> + * must be able to release old pages. This functionality
> + * requires exchange() callback defined so if it is not
> + * implemented, we disallow taking ownership over the table.
> + */
> + if (!tbl->it_ops->exchange)
> + return -EINVAL;
> +
> spin_lock_irqsave(&tbl->large_pool.lock, flags);
> for (i = 0; i < tbl->nr_pools; i++)
> spin_lock(&tbl->pools[i].lock);
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index b22b3ca..fb765af 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1728,6 +1728,20 @@ static int pnv_ioda1_tce_build(struct iommu_table *tbl, long index,
> return ret;
> }
>
> +#ifdef CONFIG_IOMMU_API
> +static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, long index,
> + unsigned long *tce, enum dma_data_direction *direction)
> +{
> + long ret = pnv_tce_xchg(tbl, index, tce, direction);
> +
> + if (!ret && (tbl->it_type &
> + (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
> + pnv_pci_ioda1_tce_invalidate(tbl, index, 1, false);
> +
> + return ret;
> +}
> +#endif
> +
> static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
> long npages)
> {
> @@ -1739,6 +1753,9 @@ static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
>
> static struct iommu_table_ops pnv_ioda1_iommu_ops = {
> .set = pnv_ioda1_tce_build,
> +#ifdef CONFIG_IOMMU_API
> + .exchange = pnv_ioda1_tce_xchg,
> +#endif
> .clear = pnv_ioda1_tce_free,
> .get = pnv_tce_get,
> };
> @@ -1800,6 +1817,20 @@ static int pnv_ioda2_tce_build(struct iommu_table *tbl, long index,
> return ret;
> }
>
> +#ifdef CONFIG_IOMMU_API
> +static int pnv_ioda2_tce_xchg(struct iommu_table *tbl, long index,
> + unsigned long *tce, enum dma_data_direction *direction)
> +{
> + long ret = pnv_tce_xchg(tbl, index, tce, direction);
> +
> + if (!ret && (tbl->it_type &
> + (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
> + pnv_pci_ioda2_tce_invalidate(tbl, index, 1, false);
> +
> + return ret;
> +}
> +#endif
> +
> static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
> long npages)
> {
> @@ -1811,6 +1842,9 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
>
> static struct iommu_table_ops pnv_ioda2_iommu_ops = {
> .set = pnv_ioda2_tce_build,
> +#ifdef CONFIG_IOMMU_API
> + .exchange = pnv_ioda2_tce_xchg,
> +#endif
> .clear = pnv_ioda2_tce_free,
> .get = pnv_tce_get,
> };
> diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
> index a073af0..7a6fd92 100644
> --- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
> +++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
> @@ -85,6 +85,9 @@ static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb) { }
>
> static struct iommu_table_ops pnv_p5ioc2_iommu_ops = {
> .set = pnv_tce_build,
> +#ifdef CONFIG_IOMMU_API
> + .exchange = pnv_tce_xchg,
> +#endif
> .clear = pnv_tce_free,
> .get = pnv_tce_get,
> };
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index ba75aa5..e8802ac 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -598,6 +598,23 @@ int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
> return 0;
> }
>
> +#ifdef CONFIG_IOMMU_API
> +int pnv_tce_xchg(struct iommu_table *tbl, long index,
> + unsigned long *tce, enum dma_data_direction *direction)
> +{
> + u64 proto_tce = iommu_direction_to_tce_perm(*direction);
> + unsigned long newtce = *tce | proto_tce;
> + unsigned long idx = index - tbl->it_offset;

Should this have a BUG_ON or WARN_ON if the supplied tce has bits set
below the page mask?
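
e.g. (sketch):

	/* catch stray low bits (permission bits or misalignment) in the address */
	if (WARN_ON(*tce & ~IOMMU_PAGE_MASK(tbl)))
		return -EINVAL;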

> + *tce = xchg(pnv_tce(tbl, idx), cpu_to_be64(newtce));
> + *tce = be64_to_cpu(*tce);
> + *direction = iommu_tce_direction(*tce);
> + *tce &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
> +
> + return 0;
> +}
> +#endif
> +
> void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
> {
> long i;
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index bd83d85..b15cce5 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -205,6 +205,8 @@ extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
> unsigned long uaddr, enum dma_data_direction direction,
> struct dma_attrs *attrs);
> extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
> +extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
> + unsigned long *tce, enum dma_data_direction *direction);
> extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
>
> void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index dacc738..2d51bbf 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -239,14 +239,7 @@ static void tce_iommu_unuse_page(struct tce_container *container,
> {
> struct page *page;
>
> - if (!(oldtce & (TCE_PCI_READ | TCE_PCI_WRITE)))
> - return;
> -
> page = pfn_to_page(oldtce >> PAGE_SHIFT);
> -
> - if (oldtce & TCE_PCI_WRITE)
> - SetPageDirty(page);
> -
> put_page(page);
> }
>
> @@ -255,10 +248,17 @@ static int tce_iommu_clear(struct tce_container *container,
> unsigned long entry, unsigned long pages)
> {
> unsigned long oldtce;
> + long ret;
> + enum dma_data_direction direction;
>
> for ( ; pages; --pages, ++entry) {
> - oldtce = iommu_clear_tce(tbl, entry);
> - if (!oldtce)
> + direction = DMA_NONE;
> + oldtce = 0;
> + ret = iommu_tce_xchg(tbl, entry, &oldtce, &direction);
> + if (ret)
> + continue;
> +
> + if (direction == DMA_NONE)
> continue;
>
> tce_iommu_unuse_page(container, oldtce);
> @@ -283,12 +283,13 @@ static int tce_iommu_use_page(unsigned long tce, unsigned long *hpa)
>
> static long tce_iommu_build(struct tce_container *container,
> struct iommu_table *tbl,
> - unsigned long entry, unsigned long tce, unsigned long pages)
> + unsigned long entry, unsigned long tce, unsigned long pages,
> + enum dma_data_direction direction)
> {
> long i, ret = 0;
> struct page *page;
> unsigned long hpa;
> - enum dma_data_direction direction = iommu_tce_direction(tce);
> + enum dma_data_direction dirtmp;
>
> for (i = 0; i < pages; ++i) {
> unsigned long offset = tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK;
> @@ -304,8 +305,8 @@ static long tce_iommu_build(struct tce_container *container,
> }
>
> hpa |= offset;
> - ret = iommu_tce_build(tbl, entry + i, (unsigned long) __va(hpa),
> - direction);
> + dirtmp = direction;
> + ret = iommu_tce_xchg(tbl, entry + i, &hpa, &dirtmp);
> if (ret) {
> tce_iommu_unuse_page(container, hpa);
> pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%ld\n",
> @@ -313,6 +314,10 @@ static long tce_iommu_build(struct tce_container *container,
> tce, ret);
> break;
> }
> +
> + if (dirtmp != DMA_NONE)
> + tce_iommu_unuse_page(container, hpa);
> +
> tce += IOMMU_PAGE_SIZE(tbl);
> }
>
> @@ -377,7 +382,7 @@ static long tce_iommu_ioctl(void *iommu_data,
> case VFIO_IOMMU_MAP_DMA: {
> struct vfio_iommu_type1_dma_map param;
> struct iommu_table *tbl;
> - unsigned long tce;
> + enum dma_data_direction direction;
>
> if (!container->enabled)
> return -EPERM;
> @@ -398,24 +403,33 @@ static long tce_iommu_ioctl(void *iommu_data,
> if (!tbl)
> return -ENXIO;
>
> - if ((param.size & ~IOMMU_PAGE_MASK(tbl)) ||
> - (param.vaddr & ~IOMMU_PAGE_MASK(tbl)))
> + if (param.size & ~IOMMU_PAGE_MASK(tbl))
> + return -EINVAL;
> +
> + if (param.vaddr & (TCE_PCI_READ | TCE_PCI_WRITE))
> return -EINVAL;

This doesn't look right - the existing check against PAGE_MASK
is still correct and included the check for the permission bits as well.

> /* iova is checked by the IOMMU API */
> - tce = param.vaddr;
> if (param.flags & VFIO_DMA_MAP_FLAG_READ)
> - tce |= TCE_PCI_READ;
> - if (param.flags & VFIO_DMA_MAP_FLAG_WRITE)
> - tce |= TCE_PCI_WRITE;
> + if (param.flags & VFIO_DMA_MAP_FLAG_WRITE)
> + direction = DMA_BIDIRECTIONAL;
> + else
> + direction = DMA_TO_DEVICE;
> + else
> + if (param.flags & VFIO_DMA_MAP_FLAG_WRITE)
> + direction = DMA_FROM_DEVICE;
> + else
> + return -EINVAL;
>
> - ret = iommu_tce_put_param_check(tbl, param.iova, tce);
> + ret = iommu_tce_put_param_check(tbl, param.iova, param.vaddr);
> if (ret)
> return ret;
>
> ret = tce_iommu_build(container, tbl,
> param.iova >> tbl->it_page_shift,
> - tce, param.size >> tbl->it_page_shift);
> + param.vaddr,
> + param.size >> tbl->it_page_shift,
> + direction);
>
> iommu_flush_tce(tbl);
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 05:50:07

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 19/32] powerpc/powernv/ioda2: Rework iommu_table creation

On Sat, Apr 25, 2015 at 10:14:43PM +1000, Alexey Kardashevskiy wrote:
> This moves iommu_table creation to the beginning to make the following
> changes easier to review. This starts using table parameters from the
> iommu_table struct.
>
> This should cause no behavioural change.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>

Reviewed-by: David Gibson <[email protected]>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 05:50:54

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 20/32] powerpc/powernv/ioda2: Introduce pnv_pci_create_table/pnv_pci_free_table

On Sat, Apr 25, 2015 at 10:14:44PM +1000, Alexey Kardashevskiy wrote:
> This is a part of moving TCE table allocation into an iommu_ops
> callback to support multiple IOMMU groups per one VFIO container.
>
> This moves the table creation code to the file with common powernv-pci
> helpers as it does not do anything IODA2-specific.
>
> This adds a pnv_pci_free_table() helper to release the actual TCE table.
>
> This enforces window size to be a power of two.
>
> This should cause no behavioural change.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> Reviewed-by: David Gibson <[email protected]>
> ---
> Changes:
> v9:
> * moved helpers to the common powernv pci.c file from pci-ioda.c
> * moved bits from pnv_pci_create_table() to pnv_alloc_tce_table_pages()
> ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 36 ++++++------------
> arch/powerpc/platforms/powernv/pci.c | 61 +++++++++++++++++++++++++++++++
> arch/powerpc/platforms/powernv/pci.h | 4 ++
> 3 files changed, 76 insertions(+), 25 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index a80be34..b9b3773 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1307,8 +1307,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
> if (rc)
> pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
>
> - iommu_reset_table(tbl, of_node_full_name(dev->dev.of_node));
> - free_pages(addr, get_order(TCE32_TABLE_SIZE));
> + pnv_pci_free_table(tbl);
> }
>
> static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
> @@ -2039,10 +2038,7 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
> static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> struct pnv_ioda_pe *pe)
> {
> - struct page *tce_mem = NULL;
> - void *addr;
> struct iommu_table *tbl = &pe->table_group.tables[0];
> - unsigned int tce_table_size, end;
> int64_t rc;
>
> /* We shouldn't already have a 32-bit DMA associated */
> @@ -2053,29 +2049,20 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>
> /* The PE will reserve all possible 32-bits space */
> pe->tce32_seg = 0;
> - end = (1 << ilog2(phb->ioda.m32_pci_base));
> - tce_table_size = (end / 0x1000) * 8;
> pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
> - end);
> + phb->ioda.m32_pci_base);
>
> - /* Allocate TCE table */
> - tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
> - get_order(tce_table_size));
> - if (!tce_mem) {
> - pe_err(pe, "Failed to allocate a 32-bit TCE memory\n");
> - goto fail;
> + rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
> + 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base, tbl);
> + if (rc) {
> + pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
> + return;
> }
> - addr = page_address(tce_mem);
> - memset(addr, 0, tce_table_size);
> -
> - /* Setup iommu */
> - tbl->it_table_group = &pe->table_group;
> -
> - /* Setup linux iommu table */
> - pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
> - IOMMU_PAGE_SHIFT_4K);
>
> tbl->it_ops = &pnv_ioda2_iommu_ops;
> +
> + /* Setup iommu */
> + tbl->it_table_group = &pe->table_group;
> iommu_init_table(tbl, phb->hose->node);
> #ifdef CONFIG_IOMMU_API
> pe->table_group.ops = &pnv_pci_ioda2_ops;
> @@ -2121,8 +2108,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> fail:
> if (pe->tce32_seg >= 0)
> pe->tce32_seg = -1;
> - if (tce_mem)
> - __free_pages(tce_mem, get_order(tce_table_size));
> + pnv_pci_free_table(tbl);
> }
>
> static void pnv_ioda_setup_dma(struct pnv_phb *phb)
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index e8802ac..6bcfad5 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -20,7 +20,9 @@
> #include <linux/io.h>
> #include <linux/msi.h>
> #include <linux/iommu.h>
> +#include <linux/memblock.h>
>
> +#include <asm/mmzone.h>
> #include <asm/sections.h>
> #include <asm/io.h>
> #include <asm/prom.h>
> @@ -645,6 +647,65 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
> tbl->it_type = TCE_PCI;
> }
>
> +static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
> + unsigned long *tce_table_allocated)

I'm a bit confused by the tce_table_allocated parameter. What's the
circumstance where more memory is requested than required, and why
does it matter to the caller?

> +{
> + struct page *tce_mem = NULL;
> + __be64 *addr;
> + unsigned order = max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT;
> + unsigned long local_allocated = 1UL << (order + PAGE_SHIFT);
> +
> + tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
> + if (!tce_mem) {
> + pr_err("Failed to allocate a TCE memory, order=%d\n", order);
> + return NULL;
> + }
> + addr = page_address(tce_mem);
> + memset(addr, 0, local_allocated);
> + *tce_table_allocated = local_allocated;
> +
> + return addr;
> +}
> +
> +long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
> + __u64 bus_offset, __u32 page_shift, __u64 window_size,
> + struct iommu_table *tbl)

The table_group parameter is redundant, isn't it? It must be equal to
tbl->table_group, yes?

Or would it make more sense for this function to set
tbl->table_group? And for that matter wouldn't it make more sense for
this to set it_size as well?
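
A sketch of that alternative, purely hypothetical:

	/* let the helper wire up the table it just populated */
	tbl->it_table_group = table_group;
	tbl->it_size = tce_table_size >> 3;	/* entries, assuming 8-byte TCEs */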

> +{
> + void *addr;
> + unsigned long tce_table_allocated = 0;
> + const unsigned window_shift = ilog2(window_size);
> + unsigned entries_shift = window_shift - page_shift;
> + unsigned table_shift = entries_shift + 3;
> + const unsigned long tce_table_size = max(0x1000UL, 1UL << table_shift);

So, here you round up to 4k, then in the alloc function you round up to
PAGE_SIZE (which may or may not be the same). It's not clear to me why
there are two rounds of rounding up.

> + if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size))
> + return -EINVAL;
> +
> + /* Allocate TCE table */
> + addr = pnv_alloc_tce_table_pages(nid, table_shift,
> + &tce_table_allocated);
> +
> + /* Setup linux iommu table */
> + pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, bus_offset,
> + page_shift);
> +
> + pr_info("Created TCE table: window size = %08llx, "
> + "tablesize = %lx (%lx), start @%08llx\n",
> + window_size, tce_table_size, tce_table_allocated,
> + bus_offset);
> +
> + return 0;
> +}
> +
> +void pnv_pci_free_table(struct iommu_table *tbl)
> +{
> + if (!tbl->it_size)
> + return;
> +
> + free_pages(tbl->it_base, get_order(tbl->it_size << 3));
> + iommu_reset_table(tbl, "pnv");
> +}
> +
> static void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
> {
> struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index b15cce5..e6cbbec 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -218,6 +218,10 @@ int pnv_pci_cfg_write(struct pci_dn *pdn,
> extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
> void *tce_mem, u64 tce_size,
> u64 dma_offset, unsigned page_shift);
> +extern long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
> + __u64 bus_offset, __u32 page_shift, __u64 window_size,
> + struct iommu_table *tbl);
> +extern void pnv_pci_free_table(struct iommu_table *tbl);
> extern void pnv_pci_init_p5ioc2_hub(struct device_node *np);
> extern void pnv_pci_init_ioda_hub(struct device_node *np);
> extern void pnv_pci_init_ioda2_phb(struct device_node *np);

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 05:51:54

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 21/32] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window

On Sat, Apr 25, 2015 at 10:14:45PM +1000, Alexey Kardashevskiy wrote:
> This is a part of moving DMA window programming to an iommu_ops
> callback. pnv_pci_ioda2_set_window() takes an iommu_table_group as
> a first parameter (not pnv_ioda_pe) as it is going to be used as
> a callback for VFIO DDW code.
>
> This adds pnv_pci_ioda2_tvt_invalidate() to invalidate TVT as it is
> a good thing to do.

What's the TVT and why is invalidating it a good thing?

Also, it looks like it didn't add it, just moved it.

> It does not have immediate effect now as the table
> is never recreated after reboot but it will in the following patches.
>
> This should cause no behavioural change.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> Reviewed-by: David Gibson <[email protected]>

Really? I don't remember this one.

> ---
> Changes:
> v9:
> * initialize pe->table_group.tables[0] at the very end when
> tbl is fully initialized
> * moved pnv_pci_ioda2_tvt_invalidate() from earlier patch
> ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 67 +++++++++++++++++++++++--------
> 1 file changed, 51 insertions(+), 16 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index b9b3773..59baa15 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1960,6 +1960,52 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> __free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
> }
>
> +static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> + struct iommu_table *tbl)
> +{
> + struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> + table_group);
> + struct pnv_phb *phb = pe->phb;
> + int64_t rc;
> + const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
> + const __u64 win_size = tbl->it_size << tbl->it_page_shift;
> +
> + pe_info(pe, "Setting up window at %llx..%llx "
> + "pgsize=0x%x tablesize=0x%lx\n",
> + start_addr, start_addr + win_size - 1,
> + 1UL << tbl->it_page_shift, tbl->it_size << 3);
> +
> + tbl->it_table_group = &pe->table_group;
> +
> + /*
> + * Map TCE table through TVT. The TVE index is the PE number
> + * shifted by 1 bit for 32-bits DMA space.
> + */
> + rc = opal_pci_map_pe_dma_window(phb->opal_id,
> + pe->pe_number,
> + pe->pe_number << 1,
> + 1,
> + __pa(tbl->it_base),
> + tbl->it_size << 3,
> + 1ULL << tbl->it_page_shift);
> + if (rc) {
> + pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
> + goto fail;
> + }
> +
> + pnv_pci_ioda2_tvt_invalidate(pe);
> +
> + /* Store fully initialized *tbl (may be external) in PE */
> + pe->table_group.tables[0] = *tbl;

Hrm, a non-atomic copy of a whole structure into the array. Is that
really what you want?

> + return 0;
> +fail:
> + if (pe->tce32_seg >= 0)
> + pe->tce32_seg = -1;
> +
> + return rc;
> +}
> +
> static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
> {
> uint16_t window_id = (pe->pe_number << 1 ) + 1;
> @@ -2068,21 +2114,16 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> pe->table_group.ops = &pnv_pci_ioda2_ops;
> #endif
>
> - /*
> - * Map TCE table through TVT. The TVE index is the PE number
> - * shifted by 1 bit for 32-bits DMA space.
> - */
> - rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
> - pe->pe_number << 1, 1, __pa(tbl->it_base),
> - tbl->it_size << 3, 1ULL << tbl->it_page_shift);
> + rc = pnv_pci_ioda2_set_window(&pe->table_group, tbl);
> if (rc) {
> pe_err(pe, "Failed to configure 32-bit TCE table,"
> " err %ld\n", rc);
> - goto fail;
> + pnv_pci_free_table(tbl);
> + if (pe->tce32_seg >= 0)
> + pe->tce32_seg = -1;
> + return;
> }
>
> - pnv_pci_ioda2_tvt_invalidate(pe);
> -
> /* OPAL variant of PHB3 invalidated TCEs */
> if (pe->tce_inval_reg)
> tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
> @@ -2103,12 +2144,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> /* Also create a bypass window */
> if (!pnv_iommu_bypass_disabled)
> pnv_pci_ioda2_setup_bypass_pe(phb, pe);
> -
> - return;
> -fail:
> - if (pe->tce32_seg >= 0)
> - pe->tce32_seg = -1;
> - pnv_pci_free_table(tbl);
> }
>
> static void pnv_ioda_setup_dma(struct pnv_phb *phb)

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-29 05:52:00

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 22/32] powerpc/powernv: Implement multilevel TCE tables

On Sat, Apr 25, 2015 at 10:14:46PM +1000, Alexey Kardashevskiy wrote:
> TCE tables might get too big in the case of 4K IOMMU pages and DDW enabled
> on huge guests (hundreds of GB of RAM), so the kernel might be unable to
> allocate a contiguous chunk of physical memory to store the TCE table.
>
> To address this, the POWER8 CPU (actually, IODA2) supports multi-level TCE
> tables of up to 5 levels, which split the table into a tree of smaller subtables.
>
> This adds multi-level TCE tables support to pnv_pci_create_table()
> and pnv_pci_free_table() helpers.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> Changes:
> v9:
> * moved from ioda2 to common powernv pci code
> * fixed cleanup if allocation fails in a middle
> * removed check for the size - all boundary checks happen in the calling code
> anyway
> ---
> arch/powerpc/include/asm/iommu.h | 2 +
> arch/powerpc/platforms/powernv/pci-ioda.c | 15 +++--
> arch/powerpc/platforms/powernv/pci.c | 94 +++++++++++++++++++++++++++++--
> arch/powerpc/platforms/powernv/pci.h | 4 +-
> 4 files changed, 104 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index 7e7ca0a..0f50ee2 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -96,6 +96,8 @@ struct iommu_pool {
> struct iommu_table {
> unsigned long it_busno; /* Bus number this table belongs to */
> unsigned long it_size; /* Size of iommu table in entries */
> + unsigned long it_indirect_levels;
> + unsigned long it_level_size;
> unsigned long it_offset; /* Offset into global table */
> unsigned long it_base; /* mapped address of tce table */
> unsigned long it_index; /* which iommu table this is */
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 59baa15..cc1d09c 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1967,13 +1967,17 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> table_group);
> struct pnv_phb *phb = pe->phb;
> int64_t rc;
> + const unsigned long size = tbl->it_indirect_levels ?
> + tbl->it_level_size : tbl->it_size;
> const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
> const __u64 win_size = tbl->it_size << tbl->it_page_shift;
>
> pe_info(pe, "Setting up window at %llx..%llx "
> - "pgsize=0x%x tablesize=0x%lx\n",
> + "pgsize=0x%x tablesize=0x%lx "
> + "levels=%d levelsize=%x\n",
> start_addr, start_addr + win_size - 1,
> - 1UL << tbl->it_page_shift, tbl->it_size << 3);
> + 1UL << tbl->it_page_shift, tbl->it_size << 3,
> + tbl->it_indirect_levels + 1, tbl->it_level_size << 3);
>
> tbl->it_table_group = &pe->table_group;
>
> @@ -1984,9 +1988,9 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> rc = opal_pci_map_pe_dma_window(phb->opal_id,
> pe->pe_number,
> pe->pe_number << 1,
> - 1,
> + tbl->it_indirect_levels + 1,
> __pa(tbl->it_base),
> - tbl->it_size << 3,
> + size << 3,
> 1ULL << tbl->it_page_shift);
> if (rc) {
> pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
> @@ -2099,7 +2103,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> phb->ioda.m32_pci_base);
>
> rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
> - 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base, tbl);
> + 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base,
> + POWERNV_IOMMU_DEFAULT_LEVELS, tbl);
> if (rc) {
> pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
> return;
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index 6bcfad5..fc129c4 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -46,6 +46,8 @@
> #define cfg_dbg(fmt...) do { } while(0)
> //#define cfg_dbg(fmt...) printk(fmt)
>
> +#define ROUND_UP(x, n) (((x) + (n) - 1ULL) & ~((n) - 1ULL))

Use the existing ALIGN_UP macro instead of creating a new one.

> #ifdef CONFIG_PCI_MSI
> static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
> {
> @@ -577,6 +579,19 @@ struct pci_ops pnv_pci_ops = {
> static __be64 *pnv_tce(struct iommu_table *tbl, long idx)
> {
> __be64 *tmp = ((__be64 *)tbl->it_base);
> + int level = tbl->it_indirect_levels;
> + const long shift = ilog2(tbl->it_level_size);
> + unsigned long mask = (tbl->it_level_size - 1) << (level * shift);
> +
> + while (level) {
> + int n = (idx & mask) >> (level * shift);
> + unsigned long tce = be64_to_cpu(tmp[n]);
> +
> + tmp = __va(tce & ~(TCE_PCI_READ | TCE_PCI_WRITE));
> + idx &= ~mask;
> + mask >>= shift;
> + --level;
> + }
>
> return tmp + idx;
> }
> @@ -648,12 +663,18 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
> }
>
> static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
> + unsigned levels, unsigned long limit,
> unsigned long *tce_table_allocated)
> {
> struct page *tce_mem = NULL;
> - __be64 *addr;
> + __be64 *addr, *tmp;
> unsigned order = max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT;
> unsigned long local_allocated = 1UL << (order + PAGE_SHIFT);
> + unsigned entries = 1UL << (shift - 3);
> + long i;
> +
> + if (limit == *tce_table_allocated)
> + return NULL;

If this is for what I think, it seems a bit unsafe. Shouldn't it
be >=? Otherwise it could fail to trip if the limit isn't exactly
a multiple of the bottom level allocation unit.

> tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
> if (!tce_mem) {
> @@ -662,14 +683,33 @@ static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
> }
> addr = page_address(tce_mem);
> memset(addr, 0, local_allocated);
> - *tce_table_allocated = local_allocated;
> +
> + --levels;
> + if (!levels) {
> + /* Update tce_table_allocated with bottom level table size only */
> + *tce_table_allocated += local_allocated;
> + return addr;
> + }
> +
> + for (i = 0; i < entries; ++i) {
> + tmp = pnv_alloc_tce_table_pages(nid, shift, levels, limit,
> + tce_table_allocated);

Urgh.. it's a limited depth so it *might* be ok, but recursion is
generally avoided in the kernel, because of the very limited stack
size.

> + if (!tmp)
> + break;
> +
> + addr[i] = cpu_to_be64(__pa(tmp) |
> + TCE_PCI_READ | TCE_PCI_WRITE);
> + }

It also seems like it would make sense for this function to set
it_indirect_levels and it_level_size, rather than leaving it to the
caller.
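
i.e. roughly (sketch, assuming the allocator is handed the iommu_table):

	tbl->it_level_size = 1ULL << (shift - 3);	/* entries per level */
	tbl->it_indirect_levels = levels - 1;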

> return addr;
> }
>
> +static void pnv_free_tce_table_pages(unsigned long addr, unsigned long size,
> + unsigned level);
> +
> long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
> __u64 bus_offset, __u32 page_shift, __u64 window_size,
> - struct iommu_table *tbl)
> + __u32 levels, struct iommu_table *tbl)
> {
> void *addr;
> unsigned long tce_table_allocated = 0;
> @@ -678,16 +718,34 @@ long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
> unsigned table_shift = entries_shift + 3;
> const unsigned long tce_table_size = max(0x1000UL, 1UL << table_shift);
>
> + if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS))
> + return -EINVAL;
> +
> if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size))
> return -EINVAL;
>
> + /* Adjust direct table size from window_size and levels */
> + entries_shift = ROUND_UP(entries_shift, levels) / levels;

ROUND_UP() only works if the second parameter is a power of 2. Is
that always true for levels?

For division rounding up, the usual idiom is just ((a + (b - 1)) / b)
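
Concretely, a sketch of the adjustment using DIV_ROUND_UP() from
<linux/kernel.h>, which is safe for any divisor:

	/* split the window's entries_shift evenly across the levels, rounding up */
	entries_shift = DIV_ROUND_UP(entries_shift, levels);
	table_shift = entries_shift + 3;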


> + table_shift = entries_shift + 3;
> + table_shift = max_t(unsigned, table_shift, PAGE_SHIFT);

Does the PAGE_SHIFT rounding make sense any more? I would have
thought you'd round the level size up to page size, rather than the
whole thing.

> /* Allocate TCE table */
> addr = pnv_alloc_tce_table_pages(nid, table_shift,
> - &tce_table_allocated);
> + levels, tce_table_size, &tce_table_allocated);
> + if (!addr)
> + return -ENOMEM;
> +
> + if (tce_table_size != tce_table_allocated) {
> + pnv_free_tce_table_pages((unsigned long) addr,
> + tbl->it_level_size, tbl->it_indirect_levels);
> + return -ENOMEM;
> + }
>
> /* Setup linux iommu table */
> pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, bus_offset,
> page_shift);
> + tbl->it_level_size = 1ULL << (table_shift - 3);
> + tbl->it_indirect_levels = levels - 1;
>
> pr_info("Created TCE table: window size = %08llx, "
> "tablesize = %lx (%lx), start @%08llx\n",
> @@ -697,12 +755,38 @@ long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
> return 0;
> }
>
> +static void pnv_free_tce_table_pages(unsigned long addr, unsigned long size,
> + unsigned level)
> +{
> + addr &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
> +
> + if (level) {
> + long i;
> + u64 *tmp = (u64 *) addr;
> +
> + for (i = 0; i < size; ++i) {
> + unsigned long hpa = be64_to_cpu(tmp[i]);
> +
> + if (!(hpa & (TCE_PCI_READ | TCE_PCI_WRITE)))
> + continue;
> +
> + pnv_free_tce_table_pages((unsigned long) __va(hpa),
> + size, level - 1);
> + }
> + }
> +
> + free_pages(addr, get_order(size << 3));
> +}
> +
> void pnv_pci_free_table(struct iommu_table *tbl)
> {
> + const unsigned long size = tbl->it_indirect_levels ?
> + tbl->it_level_size : tbl->it_size;
> +
> if (!tbl->it_size)
> return;
>
> - free_pages(tbl->it_base, get_order(tbl->it_size << 3));
> + pnv_free_tce_table_pages(tbl->it_base, size, tbl->it_indirect_levels);
> iommu_reset_table(tbl, "pnv");
> }
>
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index e6cbbec..3d1ff584 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -218,9 +218,11 @@ int pnv_pci_cfg_write(struct pci_dn *pdn,
> extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
> void *tce_mem, u64 tce_size,
> u64 dma_offset, unsigned page_shift);
> +#define POWERNV_IOMMU_DEFAULT_LEVELS 1
> +#define POWERNV_IOMMU_MAX_LEVELS 5
> extern long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
> __u64 bus_offset, __u32 page_shift, __u64 window_size,
> - struct iommu_table *tbl);
> + __u32 levels, struct iommu_table *tbl);
> extern void pnv_pci_free_table(struct iommu_table *tbl);
> extern void pnv_pci_init_p5ioc2_hub(struct device_node *np);
> extern void pnv_pci_init_ioda_hub(struct device_node *np);

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


2015-04-29 05:50:09

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 23/32] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks

On Sat, Apr 25, 2015 at 10:14:47PM +1000, Alexey Kardashevskiy wrote:
> This extends iommu_table_group_ops by a set of callbacks to support
> dynamic DMA windows management.
>
> create_table() creates a TCE table with specific parameters.
> it receives iommu_table_group to know nodeid in order to allocate
> TCE table memory closer to the PHB. The exact format of allocated
> multi-level table might be also specific to the PHB model (not
> the case now though).
> This callback calculated the DMA window offset on a PCI bus from @num
> and stores it in a just created table.
>
> set_window() sets the window at specified TVT index + @num on PHB.
>
> unset_window() unsets the window from specified TVT.
>
> This adds a free() callback to iommu_table_ops to free the memory
> (potentially a tree of tables) allocated for the TCE table.

Doesn't the free callback belong with the previous patch introducing
multi-level tables?

> create_table() and free() are supposed to be called once per
> VFIO container and set_window()/unset_window() are supposed to be
> called for every group in a container.
>
> This adds IOMMU capabilities to iommu_table_group such as default
> 32bit window parameters and others.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> arch/powerpc/include/asm/iommu.h | 19 ++++++++
> arch/powerpc/platforms/powernv/pci-ioda.c | 75 ++++++++++++++++++++++++++---
> arch/powerpc/platforms/powernv/pci-p5ioc2.c | 12 +++--
> 3 files changed, 96 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index 0f50ee2..7694546 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -70,6 +70,7 @@ struct iommu_table_ops {
> /* get() returns a physical address */
> unsigned long (*get)(struct iommu_table *tbl, long index);
> void (*flush)(struct iommu_table *tbl);
> + void (*free)(struct iommu_table *tbl);
> };
>
> /* These are used by VIO */
> @@ -148,6 +149,17 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
> struct iommu_table_group;
>
> struct iommu_table_group_ops {
> + long (*create_table)(struct iommu_table_group *table_group,
> + int num,
> + __u32 page_shift,
> + __u64 window_size,
> + __u32 levels,
> + struct iommu_table *tbl);
> + long (*set_window)(struct iommu_table_group *table_group,
> + int num,
> + struct iommu_table *tblnew);
> + long (*unset_window)(struct iommu_table_group *table_group,
> + int num);
> /*
> * Switches ownership from the kernel itself to an external
> * user. While onwership is taken, the kernel cannot use IOMMU itself.
> @@ -160,6 +172,13 @@ struct iommu_table_group {
> #ifdef CONFIG_IOMMU_API
> struct iommu_group *group;
> #endif
> + /* Some key properties of IOMMU */
> + __u32 tce32_start;
> + __u32 tce32_size;
> + __u64 pgsizes; /* Bitmap of supported page sizes */
> + __u32 max_dynamic_windows_supported;
> + __u32 max_levels;

With this information, table_group seems even more like a bad name.
"iommu_state" maybe?

> struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
> struct iommu_table_group_ops *ops;
> };
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index cc1d09c..4828837 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -24,6 +24,7 @@
> #include <linux/msi.h>
> #include <linux/memblock.h>
> #include <linux/iommu.h>
> +#include <linux/sizes.h>
>
> #include <asm/sections.h>
> #include <asm/io.h>
> @@ -1846,6 +1847,7 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
> #endif
> .clear = pnv_ioda2_tce_free,
> .get = pnv_tce_get,
> + .free = pnv_pci_free_table,
> };
>
> static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
> @@ -1936,6 +1938,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> TCE_PCI_SWINV_PAIR);
>
> tbl->it_ops = &pnv_ioda1_iommu_ops;
> + pe->table_group.tce32_start = tbl->it_offset << tbl->it_page_shift;
> + pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
> iommu_init_table(tbl, phb->hose->node);
>
> if (pe->flags & PNV_IODA_PE_DEV) {
> @@ -1961,7 +1965,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> }
>
> static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> - struct iommu_table *tbl)
> + int num, struct iommu_table *tbl)
> {
> struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> table_group);
> @@ -1972,9 +1976,10 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
> const __u64 win_size = tbl->it_size << tbl->it_page_shift;
>
> - pe_info(pe, "Setting up window at %llx..%llx "
> + pe_info(pe, "Setting up window#%d at %llx..%llx "
> "pgsize=0x%x tablesize=0x%lx "
> "levels=%d levelsize=%x\n",
> + num,
> start_addr, start_addr + win_size - 1,
> 1UL << tbl->it_page_shift, tbl->it_size << 3,
> tbl->it_indirect_levels + 1, tbl->it_level_size << 3);
> @@ -1987,7 +1992,7 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> */
> rc = opal_pci_map_pe_dma_window(phb->opal_id,
> pe->pe_number,
> - pe->pe_number << 1,
> + (pe->pe_number << 1) + num,

Heh, yes, well, that makes it rather clear that only 2 tables are possible.
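
For concreteness, the TVE index computed by the hunk above is (window_id is
just an illustrative name):

	/* bit 0 selects the window, the remaining bits carry the PE number */
	window_id = (pe->pe_number << 1) + num;

so num can only ever be 0 or 1, i.e. at most two windows per PE with this
encoding.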

> tbl->it_indirect_levels + 1,
> __pa(tbl->it_base),
> size << 3,
> @@ -2000,7 +2005,7 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> pnv_pci_ioda2_tvt_invalidate(pe);
>
> /* Store fully initialized *tbl (may be external) in PE */
> - pe->table_group.tables[0] = *tbl;
> + pe->table_group.tables[num] = *tbl;

I'm a bit confused by this whole set_window thing. Is the idea that
with multiple groups in a container you have multiple table_groups,
each with different copies of the iommu_table structures, but pointing
to the same actual TCE entries (it_base)? It seems to me not terribly
obvious when you "create" a table and when you "set" a window.
It's also kind of hard to assess whether the relative lifetimes of the
table_group, struct iommu_table and the actual TCE tables are correct.

Would it make more sense for table_group to become the
non-vfio-specific counterpart to the vfio container?
i.e. representing one set of DMA mappings, which one or more PEs could
be bound to.

> return 0;
> fail:
> @@ -2061,6 +2066,53 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
> }
>
> #ifdef CONFIG_IOMMU_API
> +static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
> + int num, __u32 page_shift, __u64 window_size, __u32 levels,
> + struct iommu_table *tbl)
> +{
> + struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> + table_group);
> + int nid = pe->phb->hose->node;
> + __u64 bus_offset = num ? pe->tce_bypass_base : 0;
> + long ret;
> +
> + ret = pnv_pci_create_table(table_group, nid, bus_offset, page_shift,
> + window_size, levels, tbl);
> + if (ret)
> + return ret;
> +
> + tbl->it_ops = &pnv_ioda2_iommu_ops;
> + if (pe->tce_inval_reg)
> + tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
> +
> + return 0;
> +}
> +
> +static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
> + int num)
> +{
> + struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> + table_group);
> + struct pnv_phb *phb = pe->phb;
> + struct iommu_table *tbl = &pe->table_group.tables[num];
> + long ret;
> +
> + pe_info(pe, "Removing DMA window #%d\n", num);
> +
> + ret = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
> + (pe->pe_number << 1) + num,
> + 0/* levels */, 0/* table address */,
> + 0/* table size */, 0/* page size */);
> + if (ret)
> + pe_warn(pe, "Unmapping failed, ret = %ld\n", ret);
> + else
> + pnv_pci_ioda2_tvt_invalidate(pe);
> +
> + memset(tbl, 0, sizeof(*tbl));
> +
> + return ret;
> +}
> +
> static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
> {
> struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> @@ -2080,6 +2132,9 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
> }
>
> static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
> + .create_table = pnv_pci_ioda2_create_table,
> + .set_window = pnv_pci_ioda2_set_window,
> + .unset_window = pnv_pci_ioda2_unset_window,
> .take_ownership = pnv_ioda2_take_ownership,
> .release_ownership = pnv_ioda2_release_ownership,
> };
> @@ -2102,8 +2157,16 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
> phb->ioda.m32_pci_base);
>
> + pe->table_group.tce32_start = 0;
> + pe->table_group.tce32_size = phb->ioda.m32_pci_base;
> + pe->table_group.max_dynamic_windows_supported =
> + IOMMU_TABLE_GROUP_MAX_TABLES;
> + pe->table_group.max_levels = POWERNV_IOMMU_MAX_LEVELS;
> + pe->table_group.pgsizes = SZ_4K | SZ_64K | SZ_16M;
> +
> rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
> - 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base,
> + pe->table_group.tce32_start, IOMMU_PAGE_SHIFT_4K,
> + pe->table_group.tce32_size,
> POWERNV_IOMMU_DEFAULT_LEVELS, tbl);
> if (rc) {
> pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
> @@ -2119,7 +2182,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> pe->table_group.ops = &pnv_pci_ioda2_ops;
> #endif
>
> - rc = pnv_pci_ioda2_set_window(&pe->table_group, tbl);
> + rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> if (rc) {
> pe_err(pe, "Failed to configure 32-bit TCE table,"
> " err %ld\n", rc);
> diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
> index 7a6fd92..d9de4c7 100644
> --- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
> +++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
> @@ -116,6 +116,8 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
> u64 phb_id;
> int64_t rc;
> static int primary = 1;
> + struct iommu_table_group *table_group;
> + struct iommu_table *tbl;
>
> pr_info(" Initializing p5ioc2 PHB %s\n", np->full_name);
>
> @@ -181,14 +183,16 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
> pnv_pci_init_p5ioc2_msis(phb);
>
> /* Setup iommu */
> - phb->p5ioc2.table_group.tables[0].it_table_group =
> - &phb->p5ioc2.table_group;
> + table_group = &phb->p5ioc2.table_group;
> + tbl = &phb->p5ioc2.table_group.tables[0];
> + tbl->it_table_group = table_group;
>
> /* Setup TCEs */
> phb->dma_dev_setup = pnv_pci_p5ioc2_dma_dev_setup;
> - pnv_pci_setup_iommu_table(&phb->p5ioc2.table_group.tables[0],
> - tce_mem, tce_size, 0,
> + pnv_pci_setup_iommu_table(tbl, tce_mem, tce_size, 0,
> IOMMU_PAGE_SHIFT_4K);
> + table_group->tce32_start = tbl->it_offset << tbl->it_page_shift;
> + table_group->tce32_size = tbl->it_size << tbl->it_page_shift;

Doesn't pgsizes need to be set here (although it will only include 4K,
I'm assuming)?
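
Presumably a single line mirroring the IODA2 setup, e.g. (an assumption on
my part, since p5ioc2 here only uses 4K IOMMU pages):

	table_group->pgsizes = SZ_4K;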

> }
>
> void __init pnv_pci_init_p5ioc2_hub(struct device_node *np)

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


2015-04-29 05:50:57

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 25/32] vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework ownership

On Sat, Apr 25, 2015 at 10:14:49PM +1000, Alexey Kardashevskiy wrote:
> Before the IOMMU user (VFIO) would take control over the IOMMU table
> belonging to a specific IOMMU group. This approach did not allow sharing
> tables between IOMMU groups attached to the same container.
>
> This introduces a new IOMMU ownership flavour when the user can not
> just control the existing IOMMU table but remove/create tables on demand.
> If an IOMMU implements take/release_ownership() callbacks, this lets
> the user have full control over the IOMMU group. When the ownership is taken,
> the platform code removes all the windows so the caller must create them.
> Before returning the ownership back to the platform code, VFIO
> unprograms and removes all the tables it created.
>
> This changes IODA2's onwership handler to remove the existing table

"onwership"

> rather than manipulating with the existing one. From now on,
> iommu_take_ownership() and iommu_release_ownership() are only called
> from the vfio_iommu_spapr_tce driver.
>
> In tce_iommu_detach_group(), this copies a iommu_table descriptor on stack
> as IODA2's unset_window() will clear the descriptor embedded into PE
> and we will not be able to free the table afterwards.
> This is a transitional hack and following patches will replace this code
> anyway.
>
> Old-style ownership is still supported allowing VFIO to run on older
> P5IOC2 and IODA IO controllers.
>
> No change in userspace-visible behaviour is expected. Since it recreates
> TCE tables on each ownership change, related kernel traces will appear
> more often.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> [aw: for the vfio related changes]
> Acked-by: Alex Williamson <[email protected]>
> ---
> Changes:
> v9:
> * fixed crash in tce_iommu_detach_group() on tbl->it_ops->free as
> tce_iommu_attach_group() used to initialize the table from a descriptor
> on stack (it does not matter for the series as this bit is changed later anyway
> but it ruing bisectability)
>
> v6:
> * fixed commit log that VFIO removes tables before passing ownership
> back to the platform code, not userspace
>
> 1
> ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 27 +++++++++++++++++++++++--
> drivers/vfio/vfio_iommu_spapr_tce.c | 33 +++++++++++++++++++++++++++++--
> 2 files changed, 56 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 2a4b2b2..45bc131 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2105,16 +2105,39 @@ static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
> struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> table_group);
>
> - iommu_take_ownership(&table_group->tables[0]);
> pnv_pci_ioda2_set_bypass(pe, false);
> + pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> + pnv_pci_free_table(&pe->table_group.tables[0]);
> }
>
> static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
> {
> struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> table_group);
> + struct iommu_table *tbl = &pe->table_group.tables[0];
> + int64_t rc;
> +
> + rc = pnv_pci_ioda2_create_table(&pe->table_group, 0,
> + IOMMU_PAGE_SHIFT_4K,
> + pe->phb->ioda.m32_pci_base,
> + POWERNV_IOMMU_DEFAULT_LEVELS, tbl);
> + if (rc) {
> + pe_err(pe, "Failed to create 32-bit TCE table, err %ld",
> + rc);
> + return;
> + }
> +
> + tbl->it_table_group = &pe->table_group;
> + iommu_init_table(tbl, pe->phb->hose->node);
> +
> + rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> + if (rc) {
> + pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
> + rc);
> + pnv_pci_free_table(tbl);
> + return;
> + }

It seems like you want a helper function called both here and in the
initial PE setup. Otherwise you encourage future bugs where the
initial PE setup changes, but taking and releasing IOMMU ownership
from VFIO no longer sets up exactly the same thing again.
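
Something along these lines, perhaps (a sketch only;
pnv_pci_ioda2_setup_default_window() is a hypothetical name and error
reporting is left to the callers):

	static long pnv_pci_ioda2_setup_default_window(struct pnv_ioda_pe *pe)
	{
		struct iommu_table *tbl = &pe->table_group.tables[0];
		long rc;

		rc = pnv_pci_ioda2_create_table(&pe->table_group, 0,
				IOMMU_PAGE_SHIFT_4K, pe->phb->ioda.m32_pci_base,
				POWERNV_IOMMU_DEFAULT_LEVELS, tbl);
		if (rc)
			return rc;

		tbl->it_table_group = &pe->table_group;
		iommu_init_table(tbl, pe->phb->hose->node);

		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
		if (rc)
			pnv_pci_free_table(tbl);

		return rc;
	}

Both pnv_pci_ioda2_setup_dma_pe() and the release path could then call it,
which keeps the two setups from drifting apart.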

> - iommu_release_ownership(&table_group->tables[0]);
> pnv_pci_ioda2_set_bypass(pe, true);
> }
>
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 2d51bbf..892a584 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -569,6 +569,10 @@ static int tce_iommu_attach_group(void *iommu_data,
> if (!table_group->ops || !table_group->ops->take_ownership ||
> !table_group->ops->release_ownership) {
> ret = tce_iommu_take_ownership(table_group);
> + } else if (!table_group->ops->create_table ||
> + !table_group->ops->set_window) {
> + WARN_ON_ONCE(1);
> + ret = -EFAULT;
> } else {
> /*
> * Disable iommu bypass, otherwise the user can DMA to all of
> @@ -576,7 +580,15 @@ static int tce_iommu_attach_group(void *iommu_data,
> * the pages that has been explicitly mapped into the iommu
> */
> table_group->ops->take_ownership(table_group);
> - ret = 0;
> + ret = table_group->ops->create_table(table_group,
> + 0, /* window number */
> + IOMMU_PAGE_SHIFT_4K,
> + table_group->tce32_size,
> + 1, /* default levels */
> + &table_group->tables[0]);
> + if (!ret)
> + ret = table_group->ops->set_window(table_group, 0,
> + &table_group->tables[0]);
> }
>
> if (ret)
> @@ -595,6 +607,7 @@ static void tce_iommu_detach_group(void *iommu_data,
> {
> struct tce_container *container = iommu_data;
> struct iommu_table_group *table_group;
> + long i;
>
> mutex_lock(&container->lock);
> if (iommu_group != container->grp) {
> @@ -620,8 +633,24 @@ static void tce_iommu_detach_group(void *iommu_data,
> /* Kernel owns the device now, we can restore bypass */
> if (!table_group->ops || !table_group->ops->release_ownership)
> tce_iommu_release_ownership(container, table_group);
> - else
> + else if (!table_group->ops->unset_window)
> + WARN_ON_ONCE(1);
> + else {
> + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> + struct iommu_table tbl = table_group->tables[i];
> +
> + if (!tbl.it_size)
> + continue;
> +
> + table_group->ops->unset_window(table_group, i);
> + tce_iommu_clear(container, &tbl,
> + tbl.it_offset, tbl.it_size);
> + if (tbl.it_ops->free)
> + tbl.it_ops->free(&tbl);
> + }
> +
> table_group->ops->release_ownership(table_group);
> + }
>
> unlock_exit:
> mutex_unlock(&container->lock);

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


2015-04-29 22:33:56

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table

On Sat, Apr 25, 2015 at 10:14:50PM +1000, Alexey Kardashevskiy wrote:
> In order to support memory pre-registration, we need a way to track
> the use of every registered memory region and only allow unregistration
> if a region is not in use anymore. So we need a way to tell from what
> region the just cleared TCE was from.
>
> This adds a userspace view of the TCE table into iommu_table struct.
> It contains userspace address, one per TCE entry. The table is only
> allocated when the ownership over an IOMMU group is taken which means
> it is only used from outside of the powernv code (such as VFIO).
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> Changes:
> v9:
> * fixed code flow in error cases added in v8
>
> v8:
> * added ENOMEM on failed vzalloc()
> ---
> arch/powerpc/include/asm/iommu.h | 6 ++++++
> arch/powerpc/kernel/iommu.c | 18 ++++++++++++++++++
> arch/powerpc/platforms/powernv/pci-ioda.c | 22 ++++++++++++++++++++--
> 3 files changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index 7694546..1472de3 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -111,9 +111,15 @@ struct iommu_table {
> unsigned long *it_map; /* A simple allocation bitmap for now */
> unsigned long it_page_shift;/* table iommu page size */
> struct iommu_table_group *it_table_group;
> + unsigned long *it_userspace; /* userspace view of the table */

A single unsigned long doesn't seem like enough. How do you know
which process's address space this address refers to?

> struct iommu_table_ops *it_ops;
> };
>
> +#define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
> + ((tbl)->it_userspace ? \
> + &((tbl)->it_userspace[(entry) - (tbl)->it_offset]) : \
> + NULL)
> +
> /* Pure 2^n version of get_order */
> static inline __attribute_const__
> int get_iommu_order(unsigned long size, struct iommu_table *tbl)
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 2eaba0c..74a3f52 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -38,6 +38,7 @@
> #include <linux/pci.h>
> #include <linux/iommu.h>
> #include <linux/sched.h>
> +#include <linux/vmalloc.h>
> #include <asm/io.h>
> #include <asm/prom.h>
> #include <asm/iommu.h>
> @@ -739,6 +740,8 @@ void iommu_reset_table(struct iommu_table *tbl, const char *node_name)
> free_pages((unsigned long) tbl->it_map, order);
> }
>
> + WARN_ON(tbl->it_userspace);
> +
> memset(tbl, 0, sizeof(*tbl));
> }
>
> @@ -1016,6 +1019,7 @@ int iommu_take_ownership(struct iommu_table *tbl)
> {
> unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
> int ret = 0;
> + unsigned long *uas;
>
> /*
> * VFIO does not control TCE entries allocation and the guest
> @@ -1027,6 +1031,10 @@ int iommu_take_ownership(struct iommu_table *tbl)
> if (!tbl->it_ops->exchange)
> return -EINVAL;
>
> + uas = vzalloc(sizeof(*uas) * tbl->it_size);
> + if (!uas)
> + return -ENOMEM;
> +
> spin_lock_irqsave(&tbl->large_pool.lock, flags);
> for (i = 0; i < tbl->nr_pools; i++)
> spin_lock(&tbl->pools[i].lock);
> @@ -1044,6 +1052,13 @@ int iommu_take_ownership(struct iommu_table *tbl)
> memset(tbl->it_map, 0xff, sz);
> }
>
> + if (ret) {
> + vfree(uas);
> + } else {
> + BUG_ON(tbl->it_userspace);
> + tbl->it_userspace = uas;
> + }
> +
> for (i = 0; i < tbl->nr_pools; i++)
> spin_unlock(&tbl->pools[i].lock);
> spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
> @@ -1056,6 +1071,9 @@ void iommu_release_ownership(struct iommu_table *tbl)
> {
> unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
>
> + vfree(tbl->it_userspace);
> + tbl->it_userspace = NULL;
> +
> spin_lock_irqsave(&tbl->large_pool.lock, flags);
> for (i = 0; i < tbl->nr_pools; i++)
> spin_lock(&tbl->pools[i].lock);
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 45bc131..e0be556 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -25,6 +25,7 @@
> #include <linux/memblock.h>
> #include <linux/iommu.h>
> #include <linux/sizes.h>
> +#include <linux/vmalloc.h>
>
> #include <asm/sections.h>
> #include <asm/io.h>
> @@ -1827,6 +1828,14 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
> pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
> }
>
> +void pnv_pci_ioda2_free_table(struct iommu_table *tbl)
> +{
> + vfree(tbl->it_userspace);
> + tbl->it_userspace = NULL;
> +
> + pnv_pci_free_table(tbl);
> +}
> +
> static struct iommu_table_ops pnv_ioda2_iommu_ops = {
> .set = pnv_ioda2_tce_build,
> #ifdef CONFIG_IOMMU_API
> @@ -1834,7 +1843,7 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
> #endif
> .clear = pnv_ioda2_tce_free,
> .get = pnv_tce_get,
> - .free = pnv_pci_free_table,
> + .free = pnv_pci_ioda2_free_table,
> };
>
> static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
> @@ -2062,12 +2071,21 @@ static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
> int nid = pe->phb->hose->node;
> __u64 bus_offset = num ? pe->tce_bypass_base : 0;
> long ret;
> + unsigned long *uas, uas_cb = sizeof(*uas) * (window_size >> page_shift);
> +
> + uas = vzalloc(uas_cb);
> + if (!uas)
> + return -ENOMEM;

I don't see why this is allocated both here and in take_ownership.
Isn't this function used for core-kernel users of the iommu as well,
in which case it shouldn't need the it_userspace?


> ret = pnv_pci_create_table(table_group, nid, bus_offset, page_shift,
> window_size, levels, tbl);
> - if (ret)
> + if (ret) {
> + vfree(uas);
> return ret;
> + }
>
> + BUG_ON(tbl->it_userspace);
> + tbl->it_userspace = uas;
> tbl->it_ops = &pnv_ioda2_iommu_ops;
> if (pe->tce_inval_reg)
> tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


2015-04-29 22:33:24

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table

On Sat, Apr 25, 2015 at 10:14:51PM +1000, Alexey Kardashevskiy wrote:
> This adds a way for the IOMMU user to know how much a new table will
> use so it can be accounted in the locked_vm limit before allocation
> happens.
>
> This stores the allocated table size in pnv_pci_create_table()
> so the locked_vm counter can be updated correctly when a table is
> being disposed.
>
> This defines an iommu_table_group_ops callback to let VFIO know
> how much memory will be locked if a table is created.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> Changes:
> v9:
> * reimplemented the whole patch
> ---
> arch/powerpc/include/asm/iommu.h | 5 +++++
> arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++++++++
> arch/powerpc/platforms/powernv/pci.c | 36 +++++++++++++++++++++++++++++++
> arch/powerpc/platforms/powernv/pci.h | 2 ++
> 4 files changed, 57 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index 1472de3..9844c106 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -99,6 +99,7 @@ struct iommu_table {
> unsigned long it_size; /* Size of iommu table in entries */
> unsigned long it_indirect_levels;
> unsigned long it_level_size;
> + unsigned long it_allocated_size;
> unsigned long it_offset; /* Offset into global table */
> unsigned long it_base; /* mapped address of tce table */
> unsigned long it_index; /* which iommu table this is */
> @@ -155,6 +156,10 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
> struct iommu_table_group;
>
> struct iommu_table_group_ops {
> + unsigned long (*get_table_size)(
> + __u32 page_shift,
> + __u64 window_size,
> + __u32 levels);
> long (*create_table)(struct iommu_table_group *table_group,
> int num,
> __u32 page_shift,
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index e0be556..7f548b4 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2062,6 +2062,18 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
> }
>
> #ifdef CONFIG_IOMMU_API
> +static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift,
> + __u64 window_size, __u32 levels)
> +{
> + unsigned long ret = pnv_get_table_size(page_shift, window_size, levels);
> +
> + if (!ret)
> + return ret;
> +
> + /* Add size of it_userspace */
> + return ret + (window_size >> page_shift) * sizeof(unsigned long);

This doesn't make much sense. The userspace view can't possibly be a
property of the specific low-level IOMMU model.

> +}
> +
> static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
> int num, __u32 page_shift, __u64 window_size, __u32 levels,
> struct iommu_table *tbl)
> @@ -2086,6 +2098,7 @@ static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
>
> BUG_ON(tbl->it_userspace);
> tbl->it_userspace = uas;
> + tbl->it_allocated_size += uas_cb;
> tbl->it_ops = &pnv_ioda2_iommu_ops;
> if (pe->tce_inval_reg)
> tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
> @@ -2160,6 +2173,7 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
> }
>
> static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
> + .get_table_size = pnv_pci_ioda2_get_table_size,
> .create_table = pnv_pci_ioda2_create_table,
> .set_window = pnv_pci_ioda2_set_window,
> .unset_window = pnv_pci_ioda2_unset_window,
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index fc129c4..1b5b48a 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -662,6 +662,38 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
> tbl->it_type = TCE_PCI;
> }
>
> +unsigned long pnv_get_table_size(__u32 page_shift,
> + __u64 window_size, __u32 levels)
> +{
> + unsigned long bytes = 0;
> + const unsigned window_shift = ilog2(window_size);
> + unsigned entries_shift = window_shift - page_shift;
> + unsigned table_shift = entries_shift + 3;
> + unsigned long tce_table_size = max(0x1000UL, 1UL << table_shift);
> + unsigned long direct_table_size;
> +
> + if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS) ||
> + (window_size > memory_hotplug_max()) ||
> + !is_power_of_2(window_size))
> + return 0;
> +
> + /* Calculate a direct table size from window_size and levels */
> + entries_shift = ROUND_UP(entries_shift, levels) / levels;
> + table_shift = entries_shift + 3;
> + table_shift = max_t(unsigned, table_shift, PAGE_SHIFT);
> + direct_table_size = 1UL << table_shift;
> +
> + for ( ; levels; --levels) {
> + bytes += ROUND_UP(tce_table_size, direct_table_size);
> +
> + tce_table_size /= direct_table_size;
> + tce_table_size <<= 3;
> + tce_table_size = ROUND_UP(tce_table_size, direct_table_size);
> + }
> +
> + return bytes;
> +}
> +
> static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
> unsigned levels, unsigned long limit,
> unsigned long *tce_table_allocated)
> @@ -741,6 +773,10 @@ long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
> return -ENOMEM;
> }
>
> + tbl->it_allocated_size = pnv_get_table_size(page_shift, window_size,
> + levels);
> + WARN_ON(!tbl->it_allocated_size);
> +
> /* Setup linux iommu table */
> pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, bus_offset,
> page_shift);
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 3d1ff584..ce4bc3c 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -224,6 +224,8 @@ extern long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
> __u64 bus_offset, __u32 page_shift, __u64 window_size,
> __u32 levels, struct iommu_table *tbl);
> extern void pnv_pci_free_table(struct iommu_table *tbl);
> +extern unsigned long pnv_get_table_size(__u32 page_shift,
> + __u64 window_size, __u32 levels);
> extern void pnv_pci_init_p5ioc2_hub(struct device_node *np);
> extern void pnv_pci_init_ioda_hub(struct device_node *np);
> extern void pnv_pci_init_ioda2_phb(struct device_node *np);

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


2015-04-29 22:33:26

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 28/32] powerpc/mmu: Add userspace-to-physical addresses translation cache

On Sat, Apr 25, 2015 at 10:14:52PM +1000, Alexey Kardashevskiy wrote:
> We are adding support for DMA memory pre-registration to be used in
> conjunction with VFIO. The idea is that the userspace which is going to
> run a guest may want to pre-register a user space memory region so
> it all gets pinned once and never goes away. Having this done,
> a hypervisor will not have to pin/unpin pages on every DMA map/unmap
> request. This is going to help with multiple pinning of the same memory
> and in-kernel acceleration of DMA requests.
>
> This adds a list of memory regions to mm_context_t. Each region consists
> of a header and a list of physical addresses. This adds API to:
> 1. register/unregister memory regions;
> 2. do final cleanup (which puts all pre-registered pages);
> 3. do userspace to physical address translation;
> 4. manage a mapped pages counter; when it is zero, it is safe to
> unregister the region.
>
> Multiple registration of the same region is allowed, kref is used to
> track the number of registrations.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> Changes:
> v8:
> * s/mm_iommu_table_group_mem_t/struct mm_iommu_table_group_mem_t/
> * fixed error fallback look (s/[i]/[j]/)
> ---
> arch/powerpc/include/asm/mmu-hash64.h | 3 +
> arch/powerpc/include/asm/mmu_context.h | 17 +++
> arch/powerpc/mm/Makefile | 1 +
> arch/powerpc/mm/mmu_context_hash64.c | 6 +
> arch/powerpc/mm/mmu_context_hash64_iommu.c | 215 +++++++++++++++++++++++++++++
> 5 files changed, 242 insertions(+)
> create mode 100644 arch/powerpc/mm/mmu_context_hash64_iommu.c
>
> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> index 1da6a81..a82f534 100644
> --- a/arch/powerpc/include/asm/mmu-hash64.h
> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> @@ -536,6 +536,9 @@ typedef struct {
> /* for 4K PTE fragment support */
> void *pte_frag;
> #endif
> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> + struct list_head iommu_group_mem_list;
> +#endif

Urgh. I know I'm not one to talk, having done the hugepage crap in
there, but man mm_context_t has grown to a bloated mess from originally
being just intended as a context ID integer :/.

> } mm_context_t;
>
>
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 73382eb..d6116ca 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -16,6 +16,23 @@
> */
> extern int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
> extern void destroy_context(struct mm_struct *mm);
> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> +struct mm_iommu_table_group_mem_t;
> +
> +extern bool mm_iommu_preregistered(void);
> +extern long mm_iommu_alloc(unsigned long ua, unsigned long entries,
> + struct mm_iommu_table_group_mem_t **pmem);
> +extern struct mm_iommu_table_group_mem_t *mm_iommu_get(unsigned long ua,
> + unsigned long entries);
> +extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
> +extern void mm_iommu_cleanup(mm_context_t *ctx);
> +extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
> + unsigned long size);
> +extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> + unsigned long ua, unsigned long *hpa);
> +extern long mm_iommu_mapped_update(struct mm_iommu_table_group_mem_t *mem,
> + bool inc);
> +#endif
>
> extern void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next);
> extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> index 9c8770b..e216704 100644
> --- a/arch/powerpc/mm/Makefile
> +++ b/arch/powerpc/mm/Makefile
> @@ -36,3 +36,4 @@ obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage-prot.o
> obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
> obj-$(CONFIG_HIGHMEM) += highmem.o
> obj-$(CONFIG_PPC_COPRO_BASE) += copro_fault.o
> +obj-$(CONFIG_SPAPR_TCE_IOMMU) += mmu_context_hash64_iommu.o
> diff --git a/arch/powerpc/mm/mmu_context_hash64.c b/arch/powerpc/mm/mmu_context_hash64.c
> index 178876ae..eb3080c 100644
> --- a/arch/powerpc/mm/mmu_context_hash64.c
> +++ b/arch/powerpc/mm/mmu_context_hash64.c
> @@ -89,6 +89,9 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
> #ifdef CONFIG_PPC_64K_PAGES
> mm->context.pte_frag = NULL;
> #endif
> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> + INIT_LIST_HEAD_RCU(&mm->context.iommu_group_mem_list);
> +#endif
> return 0;
> }
>
> @@ -132,6 +135,9 @@ static inline void destroy_pagetable_page(struct mm_struct *mm)
>
> void destroy_context(struct mm_struct *mm)
> {
> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> + mm_iommu_cleanup(&mm->context);
> +#endif
>
> #ifdef CONFIG_PPC_ICSWX
> drop_cop(mm->context.acop, mm);
> diff --git a/arch/powerpc/mm/mmu_context_hash64_iommu.c b/arch/powerpc/mm/mmu_context_hash64_iommu.c
> new file mode 100644
> index 0000000..af7668c
> --- /dev/null
> +++ b/arch/powerpc/mm/mmu_context_hash64_iommu.c
> @@ -0,0 +1,215 @@
> +/*
> + * IOMMU helpers in MMU context.
> + *
> + * Copyright (C) 2015 IBM Corp. <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + *
> + */
> +
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/rculist.h>
> +#include <linux/vmalloc.h>
> +#include <linux/kref.h>
> +#include <asm/mmu_context.h>
> +
> +struct mm_iommu_table_group_mem_t {
> + struct list_head next;
> + struct rcu_head rcu;
> + struct kref kref; /* one reference per VFIO container */
> + atomic_t mapped; /* number of currently mapped pages */
> + u64 ua; /* userspace address */
> + u64 entries; /* number of entries in hpas[] */

Maybe 'npages', since this is used to determine the range of user
addresses covered, not just the number of entries in hpas.

> + u64 *hpas; /* vmalloc'ed */
> +};
> +
> +bool mm_iommu_preregistered(void)
> +{
> + if (!current || !current->mm)
> + return false;
> +
> + return !list_empty(&current->mm->context.iommu_group_mem_list);
> +}
> +EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
> +
> +long mm_iommu_alloc(unsigned long ua, unsigned long entries,
> + struct mm_iommu_table_group_mem_t **pmem)
> +{
> + struct mm_iommu_table_group_mem_t *mem;
> + long i, j;
> + struct page *page = NULL;
> +
> + list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
> + next) {
> + if ((mem->ua == ua) && (mem->entries == entries))
> + return -EBUSY;
> +
> + /* Overlap? */
> + if ((mem->ua < (ua + (entries << PAGE_SHIFT))) &&
> + (ua < (mem->ua + (mem->entries << PAGE_SHIFT))))
> + return -EINVAL;
> + }
> +
> + mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> + if (!mem)
> + return -ENOMEM;
> +
> + mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
> + if (!mem->hpas) {
> + kfree(mem);
> + return -ENOMEM;
> + }
> +
> + for (i = 0; i < entries; ++i) {
> + if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
> + 1/* pages */, 1/* iswrite */, &page)) {

Do you really need to call gup() in a loop? It can do more than one
page at a time..

That might work better if you kept a list of struct page *s instead of
hpas.
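
A rough sketch of the batched variant (assuming the whole range is handed to
get_user_pages_fast() at once; the pages[] array and its error path are
additions, not something from the patch):

	struct page **pages;
	long pinned, i;

	pages = vzalloc(entries * sizeof(pages[0]));
	if (!pages) {
		vfree(mem->hpas);
		kfree(mem);
		return -ENOMEM;
	}

	pinned = get_user_pages_fast(ua, entries, 1/* iswrite */, pages);
	if (pinned != entries) {
		/* put back whatever did get pinned and bail out */
		for (i = 0; i < pinned; ++i)
			put_page(pages[i]);
		vfree(pages);
		vfree(mem->hpas);
		kfree(mem);
		return -EFAULT;
	}

	for (i = 0; i < entries; ++i)
		mem->hpas[i] = page_to_pfn(pages[i]) << PAGE_SHIFT;
	vfree(pages);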

> + for (j = 0; j < i; ++j)
> + put_page(pfn_to_page(
> + mem->hpas[j] >> PAGE_SHIFT));
> + vfree(mem->hpas);
> + kfree(mem);
> + return -EFAULT;
> + }
> +
> + mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
> + }
> +
> + kref_init(&mem->kref);
> + atomic_set(&mem->mapped, 0);
> + mem->ua = ua;
> + mem->entries = entries;
> + *pmem = mem;
> +
> + list_add_rcu(&mem->next, &current->mm->context.iommu_group_mem_list);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(mm_iommu_alloc);
> +
> +static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem)
> +{
> + long i;
> + struct page *page = NULL;
> +
> + for (i = 0; i < mem->entries; ++i) {
> + if (!mem->hpas[i])
> + continue;
> +
> + page = pfn_to_page(mem->hpas[i] >> PAGE_SHIFT);
> + if (!page)
> + continue;
> +
> + put_page(page);
> + mem->hpas[i] = 0;
> + }
> +}
> +
> +static void mm_iommu_free(struct rcu_head *head)
> +{
> + struct mm_iommu_table_group_mem_t *mem = container_of(head,
> + struct mm_iommu_table_group_mem_t, rcu);
> +
> + mm_iommu_unpin(mem);
> + vfree(mem->hpas);
> + kfree(mem);
> +}
> +
> +static void mm_iommu_release(struct kref *kref)
> +{
> + struct mm_iommu_table_group_mem_t *mem = container_of(kref,
> + struct mm_iommu_table_group_mem_t, kref);
> +
> + list_del_rcu(&mem->next);
> + call_rcu(&mem->rcu, mm_iommu_free);
> +}
> +
> +struct mm_iommu_table_group_mem_t *mm_iommu_get(unsigned long ua,
> + unsigned long entries)
> +{
> + struct mm_iommu_table_group_mem_t *mem;
> +
> + list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
> + next) {
> + if ((mem->ua == ua) && (mem->entries == entries)) {
> + kref_get(&mem->kref);
> + return mem;
> + }
> + }
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL_GPL(mm_iommu_get);
> +
> +long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem)
> +{
> + if (atomic_read(&mem->mapped))
> + return -EBUSY;

What prevents a race between the atomic_read() above and the release below?

> + kref_put(&mem->kref, mm_iommu_release);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(mm_iommu_put);
> +
> +struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
> + unsigned long size)
> +{
> + struct mm_iommu_table_group_mem_t *mem, *ret = NULL;
> +
> + list_for_each_entry_rcu(mem,
> + &current->mm->context.iommu_group_mem_list,
> + next) {
> + if ((mem->ua <= ua) &&
> + (ua + size <= mem->ua +
> + (mem->entries << PAGE_SHIFT))) {
> + ret = mem;
> + break;
> + }
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(mm_iommu_lookup);
> +
> +long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> + unsigned long ua, unsigned long *hpa)

Return type should be int; it's just an error code.

> +{
> + const long entry = (ua - mem->ua) >> PAGE_SHIFT;
> + u64 *va = &mem->hpas[entry];
> +
> + if (entry >= mem->entries)
> + return -EFAULT;
> +
> + *hpa = *va | (ua & ~PAGE_MASK);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
> +
> +long mm_iommu_mapped_update(struct mm_iommu_table_group_mem_t *mem, bool inc)
> +{
> + long ret = 0;
> +
> + if (inc)
> + atomic_inc(&mem->mapped);
> + else
> + ret = atomic_dec_if_positive(&mem->mapped);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(mm_iommu_mapped_update);

I think this would be clearer as separate inc and dec functions.
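
e.g. something like (a sketch; the names are only suggestions):

	void mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem)
	{
		atomic_inc(&mem->mapped);
	}

	long mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem)
	{
		/* returns the new count, or a negative value if it was already 0 */
		return atomic_dec_if_positive(&mem->mapped);
	}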

> +
> +void mm_iommu_cleanup(mm_context_t *ctx)
> +{
> + while (!list_empty(&ctx->iommu_group_mem_list)) {
> + struct mm_iommu_table_group_mem_t *mem;
> +
> + mem = list_first_entry(&ctx->iommu_group_mem_list,
> + struct mm_iommu_table_group_mem_t, next);
> + mm_iommu_release(&mem->kref);
> + }
> +}

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


2015-04-29 09:00:43

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 16/32] powerpc/powernv/ioda: Move TCE kill register address to PE

On 04/29/2015 01:25 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:40PM +1000, Alexey Kardashevskiy wrote:
>> At the moment the DMA setup code looks for the "ibm,opal-tce-kill" property
>> which contains the TCE kill register address. Writes to this register
>> invalidates TCE cache on IODA/IODA2 hub.
>>
>> This moves the register address from iommu_table to pnv_ioda_pe as
>> later there will be 2 tables per PE and it will be used for both tables.
>>
>> This moves the property reading/remapping code to a helper to reduce
>> code duplication.
>>
>> This adds a new pnv_pci_ioda2_tvt_invalidate() helper which invalidates
>> the entire table. It should be called after every call to
>> opal_pci_map_pe_dma_window(). It was not required before because
>> there is just a single TCE table and 64bit DMA is handled via bypass
>> window (which has no table so no chache is used) but this is going
>> to change with Dynamic DMA windows (DDW).
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> ---
>> Changes:
>> v9:
>> * new in the series
>> ---
>> arch/powerpc/platforms/powernv/pci-ioda.c | 69 +++++++++++++++++++------------
>> arch/powerpc/platforms/powernv/pci.h | 1 +
>> 2 files changed, 44 insertions(+), 26 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index f070c44..b22b3ca 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -1672,7 +1672,7 @@ static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
>> struct pnv_ioda_pe, table_group);
>> __be64 __iomem *invalidate = rm ?
>> (__be64 __iomem *)pe->tce_inval_reg_phys :
>> - (__be64 __iomem *)tbl->it_index;
>> + pe->tce_inval_reg;
>> unsigned long start, end, inc;
>> const unsigned shift = tbl->it_page_shift;
>>
>> @@ -1743,6 +1743,18 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
>> .get = pnv_tce_get,
>> };
>>
>> +static inline void pnv_pci_ioda2_tvt_invalidate(struct pnv_ioda_pe *pe)
>> +{
>> + /* 01xb - invalidate TCEs that match the specified PE# */
>> + unsigned long addr = (0x4ull << 60) | (pe->pe_number & 0xFF);
>
> This doesn't really look like an address, but rather the data you're
> writing to the register.


This thing is made of an "invalidate operation" (0x4 here), an "invalidate
address" (a PCI bus address, zero here as we reset everything; that is what
most of the bits are for) and an "invalidate PE number". So what should I
call it? :)



>> + if (!pe->tce_inval_reg)
>> + return;
>> +
>> + mb(); /* Ensure above stores are visible */
>> + __raw_writeq(cpu_to_be64(addr), pe->tce_inval_reg);
>> +}
>> +
>> static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
>> unsigned long index, unsigned long npages, bool rm)
>> {
>> @@ -1751,7 +1763,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
>> unsigned long start, end, inc;
>> __be64 __iomem *invalidate = rm ?
>> (__be64 __iomem *)pe->tce_inval_reg_phys :
>> - (__be64 __iomem *)tbl->it_index;
>> + pe->tce_inval_reg;
>> const unsigned shift = tbl->it_page_shift;
>>
>> /* We'll invalidate DMA address in PE scope */
>> @@ -1803,13 +1815,31 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
>> .get = pnv_tce_get,
>> };
>>
>> +static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
>> + struct pnv_ioda_pe *pe)
>> +{
>> + const __be64 *swinvp;
>> +
>> + /* OPAL variant of PHB3 invalidated TCEs */
>> + swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
>> + if (!swinvp)
>> + return;
>> +
>> + /* We need a couple more fields -- an address and a data
>> + * to or. Since the bus is only printed out on table free
>> + * errors, and on the first pass the data will be a relative
>> + * bus number, print that out instead.
>> + */
>
> The comment above appears to have nothing to do with the surrounding code.

I'll just remove it.


>
>> + pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
>> + pe->tce_inval_reg = ioremap(pe->tce_inval_reg_phys, 8);
>> +}
>> +
>> static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> struct pnv_ioda_pe *pe, unsigned int base,
>> unsigned int segs)
>> {
>>
>> struct page *tce_mem = NULL;
>> - const __be64 *swinvp;
>> struct iommu_table *tbl;
>> unsigned int i;
>> int64_t rc;
>> @@ -1823,6 +1853,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> if (WARN_ON(pe->tce32_seg >= 0))
>> return;
>>
>> + pnv_pci_ioda_setup_opal_tce_kill(phb, pe);
>> +
>> /* Grab a 32-bit TCE table */
>> pe->tce32_seg = base;
>> pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>> @@ -1865,20 +1897,11 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> base << 28, IOMMU_PAGE_SHIFT_4K);
>>
>> /* OPAL variant of P7IOC SW invalidated TCEs */
>> - swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
>> - if (swinvp) {
>> - /* We need a couple more fields -- an address and a data
>> - * to or. Since the bus is only printed out on table free
>> - * errors, and on the first pass the data will be a relative
>> - * bus number, print that out instead.
>> - */
>
> .. although I guess it didn't make any more sense in its original context.
>
>> - pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
>> - tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
>> - 8);
>> + if (pe->tce_inval_reg)
>> tbl->it_type |= (TCE_PCI_SWINV_CREATE |
>> TCE_PCI_SWINV_FREE |
>> TCE_PCI_SWINV_PAIR);
>> - }
>> +
>> tbl->it_ops = &pnv_ioda1_iommu_ops;
>> iommu_init_table(tbl, phb->hose->node);
>>
>> @@ -1984,7 +2007,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> {
>> struct page *tce_mem = NULL;
>> void *addr;
>> - const __be64 *swinvp;
>> struct iommu_table *tbl;
>> unsigned int tce_table_size, end;
>> int64_t rc;
>> @@ -1993,6 +2015,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> if (WARN_ON(pe->tce32_seg >= 0))
>> return;
>>
>> + pnv_pci_ioda_setup_opal_tce_kill(phb, pe);
>> +
>> /* The PE will reserve all possible 32-bits space */
>> pe->tce32_seg = 0;
>> end = (1 << ilog2(phb->ioda.m32_pci_base));
>> @@ -2023,6 +2047,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> goto fail;
>> }
>>
>> + pnv_pci_ioda2_tvt_invalidate(pe);
>> +
>
> This looks to be a change in behaviour - if it's replacing a previous
> invalidation, I'm not seeing where.


It is a new thing and the patch adds it. And it does not say anywhere that
this patch does not change behavior.


>
>> /* Setup iommu */
>> pe->table_group.tables[0].it_table_group = &pe->table_group;
>>
>> @@ -2032,18 +2058,9 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> IOMMU_PAGE_SHIFT_4K);
>>
>> /* OPAL variant of PHB3 invalidated TCEs */
>> - swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
>> - if (swinvp) {
>> - /* We need a couple more fields -- an address and a data
>> - * to or. Since the bus is only printed out on table free
>> - * errors, and on the first pass the data will be a relative
>> - * bus number, print that out instead.
>> - */
>> - pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
>> - tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
>> - 8);
>> + if (pe->tce_inval_reg)
>> tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
>> - }
>> +
>> tbl->it_ops = &pnv_ioda2_iommu_ops;
>> iommu_init_table(tbl, phb->hose->node);
>> #ifdef CONFIG_IOMMU_API
>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>> index 368d4ed..bd83d85 100644
>> --- a/arch/powerpc/platforms/powernv/pci.h
>> +++ b/arch/powerpc/platforms/powernv/pci.h
>> @@ -59,6 +59,7 @@ struct pnv_ioda_pe {
>> int tce32_segcount;
>> struct iommu_table_group table_group;
>> phys_addr_t tce_inval_reg_phys;
>> + __be64 __iomem *tce_inval_reg;
>>
>> /* 64-bit TCE bypass region */
>> bool tce_bypass_enabled;
>


--
Alexey

2015-04-29 09:02:28

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 17/32] powerpc/powernv: Implement accessor to TCE entry

On 04/29/2015 02:04 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:41PM +1000, Alexey Kardashevskiy wrote:
>> This replaces direct accesses to TCE table with a helper which
>> returns an TCE entry address. This does not make difference now but will
>> when multi-level TCE tables get introduces.
>>
>> No change in behavior is expected.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>
> Reviewed-by: David Gibson <[email protected]>
>
>
>> ---
>> Changes:
>> v9:
>> * new patch in the series to separate this mechanical change from
>> functional changes; this is not right before
>> "powerpc/powernv: Implement multilevel TCE tables" but here in order
>> to let the next patch - "powerpc/iommu/powernv: Release replaced TCE" -
>> use pnv_tce() and avoid changing the same code twice
>> ---
>> arch/powerpc/platforms/powernv/pci.c | 34 +++++++++++++++++++++-------------
>> 1 file changed, 21 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index 84b4ea4..ba75aa5 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -572,38 +572,46 @@ struct pci_ops pnv_pci_ops = {
>> .write = pnv_pci_write_config,
>> };
>>
>> +static __be64 *pnv_tce(struct iommu_table *tbl, long idx)
>> +{
>> + __be64 *tmp = ((__be64 *)tbl->it_base);
>> +
>> + return tmp + idx;
>> +}
>> +
>> int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
>> unsigned long uaddr, enum dma_data_direction direction,
>> struct dma_attrs *attrs)
>> {
>> u64 proto_tce = iommu_direction_to_tce_perm(direction);
>> - __be64 *tcep;
>> - u64 rpn;
>> + u64 rpn = __pa(uaddr) >> tbl->it_page_shift;
>
> I guess this was a problem in the existing code, not this patch. But
> "uaddr" is a really bad name (and unsigned long is a bad type) for
> what must actually be a kernel linear mapping address.


Yes, and maybe one day I'll clean this up. s/uaddr/linear/ and
s/hwaddr/hpa/ are the first things to do globally but not in this patchset.


>
>> + long i;
>>
>> - tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
>> - rpn = __pa(uaddr) >> tbl->it_page_shift;
>> -
>> - while (npages--)
>> - *(tcep++) = cpu_to_be64(proto_tce |
>> - (rpn++ << tbl->it_page_shift));
>> + for (i = 0; i < npages; i++) {
>> + unsigned long newtce = proto_tce |
>> + ((rpn + i) << tbl->it_page_shift);
>> + unsigned long idx = index - tbl->it_offset + i;
>>
>> + *(pnv_tce(tbl, idx)) = cpu_to_be64(newtce);
>> + }
>>
>> return 0;
>> }
>>
>> void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
>> {
>> - __be64 *tcep;
>> + long i;
>>
>> - tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
>> + for (i = 0; i < npages; i++) {
>> + unsigned long idx = index - tbl->it_offset + i;
>>
>> - while (npages--)
>> - *(tcep++) = cpu_to_be64(0);
>> + *(pnv_tce(tbl, idx)) = cpu_to_be64(0);
>> + }
>> }
>>
>> unsigned long pnv_tce_get(struct iommu_table *tbl, long index)
>> {
>> - return ((u64 *)tbl->it_base)[index - tbl->it_offset];
>> + return *(pnv_tce(tbl, index - tbl->it_offset));
>> }
>>
>> void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>


--
Alexey

2015-04-29 09:12:49

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 20/32] powerpc/powernv/ioda2: Introduce pnv_pci_create_table/pnv_pci_free_table

On 04/29/2015 02:39 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:44PM +1000, Alexey Kardashevskiy wrote:
>> This is a part of moving TCE table allocation into an iommu_ops
>> callback to support multiple IOMMU groups per one VFIO container.
>>
>> This moves a table creation window to the file with common powernv-pci
>> helpers as it does not do anything IODA2-specific.
>>
>> This adds pnv_pci_free_table() helper to release the actual TCE table.
>>
>> This enforces window size to be a power of two.
>>
>> This should cause no behavioural change.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> Reviewed-by: David Gibson <[email protected]>
>> ---
>> Changes:
>> v9:
>> * moved helpers to the common powernv pci.c file from pci-ioda.c
>> * moved bits from pnv_pci_create_table() to pnv_alloc_tce_table_pages()
>> ---
>> arch/powerpc/platforms/powernv/pci-ioda.c | 36 ++++++------------
>> arch/powerpc/platforms/powernv/pci.c | 61 +++++++++++++++++++++++++++++++
>> arch/powerpc/platforms/powernv/pci.h | 4 ++
>> 3 files changed, 76 insertions(+), 25 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index a80be34..b9b3773 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -1307,8 +1307,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
>> if (rc)
>> pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
>>
>> - iommu_reset_table(tbl, of_node_full_name(dev->dev.of_node));
>> - free_pages(addr, get_order(TCE32_TABLE_SIZE));
>> + pnv_pci_free_table(tbl);
>> }
>>
>> static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>> @@ -2039,10 +2038,7 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
>> static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> struct pnv_ioda_pe *pe)
>> {
>> - struct page *tce_mem = NULL;
>> - void *addr;
>> struct iommu_table *tbl = &pe->table_group.tables[0];
>> - unsigned int tce_table_size, end;
>> int64_t rc;
>>
>> /* We shouldn't already have a 32-bit DMA associated */
>> @@ -2053,29 +2049,20 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>
>> /* The PE will reserve all possible 32-bits space */
>> pe->tce32_seg = 0;
>> - end = (1 << ilog2(phb->ioda.m32_pci_base));
>> - tce_table_size = (end / 0x1000) * 8;
>> pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
>> - end);
>> + phb->ioda.m32_pci_base);
>>
>> - /* Allocate TCE table */
>> - tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
>> - get_order(tce_table_size));
>> - if (!tce_mem) {
>> - pe_err(pe, "Failed to allocate a 32-bit TCE memory\n");
>> - goto fail;
>> + rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
>> + 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base, tbl);
>> + if (rc) {
>> + pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
>> + return;
>> }
>> - addr = page_address(tce_mem);
>> - memset(addr, 0, tce_table_size);
>> -
>> - /* Setup iommu */
>> - tbl->it_table_group = &pe->table_group;
>> -
>> - /* Setup linux iommu table */
>> - pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
>> - IOMMU_PAGE_SHIFT_4K);
>>
>> tbl->it_ops = &pnv_ioda2_iommu_ops;
>> +
>> + /* Setup iommu */
>> + tbl->it_table_group = &pe->table_group;
>> iommu_init_table(tbl, phb->hose->node);
>> #ifdef CONFIG_IOMMU_API
>> pe->table_group.ops = &pnv_pci_ioda2_ops;
>> @@ -2121,8 +2108,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> fail:
>> if (pe->tce32_seg >= 0)
>> pe->tce32_seg = -1;
>> - if (tce_mem)
>> - __free_pages(tce_mem, get_order(tce_table_size));
>> + pnv_pci_free_table(tbl);
>> }
>>
>> static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index e8802ac..6bcfad5 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -20,7 +20,9 @@
>> #include <linux/io.h>
>> #include <linux/msi.h>
>> #include <linux/iommu.h>
>> +#include <linux/memblock.h>
>>
>> +#include <asm/mmzone.h>
>> #include <asm/sections.h>
>> #include <asm/io.h>
>> #include <asm/prom.h>
>> @@ -645,6 +647,65 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>> tbl->it_type = TCE_PCI;
>> }
>>
>> +static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
>> + unsigned long *tce_table_allocated)
>
> I'm a bit confused by the tce_table_allocated parameter. What's the
> circumstance where more memory is requested than required, and why
> does it matter to the caller?

It does not make much sense here but it does for "powerpc/powernv:
Implement multilevel TCE tables" - I was trying to avoid changing the same
lines many times.

The idea is that if a multilevel table is requested, I do not really want
to allocate the whole tree. For example, if the userspace asked for a 64K
table and 5 levels, the result will be a list of just 5 pages - the last
one will be the actual table and the upper levels will each have a single
valid TCE entry pointing to the next level.

But I change the prototype there anyway so I'll just move this
tce_table_allocated thing there.
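
To make that a bit more concrete, here is a very rough sketch of what I
mean - not the actual patch code, using single pages per level and with
error unwinding omitted:

/*
 * Sketch only: allocate one page per level, chain the levels with a
 * single valid entry each, and report how much memory was really
 * allocated back to the caller.
 */
static __be64 *sketch_alloc_levels(int nid, unsigned levels,
                unsigned long *allocated)
{
        __be64 *root = NULL, *prev = NULL;
        unsigned i;

        *allocated = 0;
        for (i = 0; i < levels; ++i) {
                struct page *p = alloc_pages_node(nid,
                                GFP_KERNEL | __GFP_ZERO, 0);
                __be64 *level;

                if (!p)
                        return NULL;    /* unwinding omitted */

                level = page_address(p);
                *allocated += PAGE_SIZE;

                if (!root)
                        root = level;
                if (prev)
                        /* single valid entry pointing to the next level */
                        prev[0] = cpu_to_be64(__pa(level) |
                                        TCE_PCI_READ | TCE_PCI_WRITE);
                prev = level;
        }

        /* only the last level holds real TCEs, the rest is filled lazily */
        return root;
}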



>> +{
>> + struct page *tce_mem = NULL;
>> + __be64 *addr;
>> + unsigned order = max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT;
>> + unsigned long local_allocated = 1UL << (order + PAGE_SHIFT);
>> +
>> + tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
>> + if (!tce_mem) {
>> + pr_err("Failed to allocate a TCE memory, order=%d\n", order);
>> + return NULL;
>> + }
>> + addr = page_address(tce_mem);
>> + memset(addr, 0, local_allocated);
>> + *tce_table_allocated = local_allocated;
>> +
>> + return addr;
>> +}
>> +
>> +long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
>> + __u64 bus_offset, __u32 page_shift, __u64 window_size,
>> + struct iommu_table *tbl)
>
> The table_group parameter is redundant, isn't it? It must be equal to
> tbl->table_group, yes?
>
> Or would it make more sense for this function to set
> tbl->table_group? And for that matter wouldn't it make more sense for
> this to set it_size as well?


It is too many changes already :)


>> +{
>> + void *addr;
>> + unsigned long tce_table_allocated = 0;
>> + const unsigned window_shift = ilog2(window_size);
>> + unsigned entries_shift = window_shift - page_shift;
>> + unsigned table_shift = entries_shift + 3;
>> + const unsigned long tce_table_size = max(0x1000UL, 1UL << table_shift);
>
> So, here you round up to 4k, the in the alloc function you round up to
> PAGE_SIZE (which may or may not be the same). It's not clear to me why
> there are two rounds of rounding up.
>
>> + if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size))
>> + return -EINVAL;
>> +
>> + /* Allocate TCE table */
>> + addr = pnv_alloc_tce_table_pages(nid, table_shift,
>> + &tce_table_allocated);
>> +
>> + /* Setup linux iommu table */
>> + pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, bus_offset,
>> + page_shift);
>> +
>> + pr_info("Created TCE table: window size = %08llx, "
>> + "tablesize = %lx (%lx), start @%08llx\n",
>> + window_size, tce_table_size, tce_table_allocated,
>> + bus_offset);
>> +
>> + return 0;
>> +}
>> +
>> +void pnv_pci_free_table(struct iommu_table *tbl)
>> +{
>> + if (!tbl->it_size)
>> + return;
>> +
>> + free_pages(tbl->it_base, get_order(tbl->it_size << 3));
>> + iommu_reset_table(tbl, "pnv");
>> +}
>> +
>> static void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
>> {
>> struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>> index b15cce5..e6cbbec 100644
>> --- a/arch/powerpc/platforms/powernv/pci.h
>> +++ b/arch/powerpc/platforms/powernv/pci.h
>> @@ -218,6 +218,10 @@ int pnv_pci_cfg_write(struct pci_dn *pdn,
>> extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>> void *tce_mem, u64 tce_size,
>> u64 dma_offset, unsigned page_shift);
>> +extern long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
>> + __u64 bus_offset, __u32 page_shift, __u64 window_size,
>> + struct iommu_table *tbl);
>> +extern void pnv_pci_free_table(struct iommu_table *tbl);
>> extern void pnv_pci_init_p5ioc2_hub(struct device_node *np);
>> extern void pnv_pci_init_ioda_hub(struct device_node *np);
>> extern void pnv_pci_init_ioda2_phb(struct device_node *np);
>


--
Alexey

2015-04-29 09:20:01

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 13/32] vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control

On 04/29/2015 01:02 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:37PM +1000, Alexey Kardashevskiy wrote:
>> This adds tce_iommu_take_ownership() and tce_iommu_release_ownership
>> which call in a loop iommu_take_ownership()/iommu_release_ownership()
>> for every table on the group. As there is just one now, no change in
>> behaviour is expected.
>>
>> At the moment the iommu_table struct has a set_bypass() which enables/
>> disables DMA bypass on IODA2 PHB. This is exposed to POWERPC IOMMU code
>> which calls this callback when external IOMMU users such as VFIO are
>> about to get over a PHB.
>>
>> The set_bypass() callback is not really an iommu_table function but
>> IOMMU/PE function. This introduces a iommu_table_group_ops struct and
>> adds take_ownership()/release_ownership() callbacks to it which are
>> called when an external user takes/releases control over the IOMMU.
>>
>> This replaces set_bypass() with ownership callbacks as it is not
>> necessarily just bypass enabling, it can be something else/more
>> so let's give it more generic name.
>>
>> The callbacks is implemented for IODA2 only. Other platforms (P5IOC2,
>> IODA1) will use the old iommu_take_ownership/iommu_release_ownership API.
>> The following patches will replace iommu_take_ownership/
>> iommu_release_ownership calls in IODA2 with full IOMMU table release/
>> create.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> [aw: for the vfio related changes]
>> Acked-by: Alex Williamson <[email protected]>
>> ---
>> Changes:
>> v9:
>> * squashed "vfio: powerpc/spapr: powerpc/iommu: Rework IOMMU ownership control"
>> and "vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework IOMMU ownership control"
>> into a single patch
>> * moved helpers with a loop through tables in a group
>> to vfio_iommu_spapr_tce.c to keep the platform code free of IOMMU table
>> groups as much as possible
>> * added missing tce_iommu_clear() to tce_iommu_release_ownership()
>> * replaced the set_ownership(enable) callback with take_ownership() and
>> release_ownership()
>> ---
>> arch/powerpc/include/asm/iommu.h | 13 +++++-
>> arch/powerpc/kernel/iommu.c | 11 ------
>> arch/powerpc/platforms/powernv/pci-ioda.c | 40 +++++++++++++++----
>> drivers/vfio/vfio_iommu_spapr_tce.c | 66 +++++++++++++++++++++++++++----
>> 4 files changed, 103 insertions(+), 27 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>> index fa37519..e63419e 100644
>> --- a/arch/powerpc/include/asm/iommu.h
>> +++ b/arch/powerpc/include/asm/iommu.h
>> @@ -93,7 +93,6 @@ struct iommu_table {
>> unsigned long it_page_shift;/* table iommu page size */
>> struct iommu_table_group *it_table_group;
>> struct iommu_table_ops *it_ops;
>> - void (*set_bypass)(struct iommu_table *tbl, bool enable);
>> };
>>
>> /* Pure 2^n version of get_order */
>> @@ -128,11 +127,23 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
>>
>> #define IOMMU_TABLE_GROUP_MAX_TABLES 1
>>
>> +struct iommu_table_group;
>> +
>> +struct iommu_table_group_ops {
>> + /*
>> + * Switches ownership from the kernel itself to an external
>> + * user. While onwership is taken, the kernel cannot use IOMMU itself.
>
> Typo in "onwership". I'd also like to see this be even more explicit
> that "take" is the "core kernel -> vfio/whatever" transition and
> release is the reverse.


Will this work?

/*
* Switches ownership from the kernel itself to an external
* user.
* The ownership is taken when VFIO starts using the IOMMU group
* and released when the platform code gets the control over the group back.
* While ownership is taken, the platform code cannot use IOMMU itself.
*/


>> + */
>> + void (*take_ownership)(struct iommu_table_group *table_group);
>> + void (*release_ownership)(struct iommu_table_group *table_group);
>> +};
>> +
>> struct iommu_table_group {
>> #ifdef CONFIG_IOMMU_API
>> struct iommu_group *group;
>> #endif
>> struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
>> + struct iommu_table_group_ops *ops;
>> };
>>
>> #ifdef CONFIG_IOMMU_API
>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>> index 005146b..2856d27 100644
>> --- a/arch/powerpc/kernel/iommu.c
>> +++ b/arch/powerpc/kernel/iommu.c
>> @@ -1057,13 +1057,6 @@ int iommu_take_ownership(struct iommu_table *tbl)
>>
>> memset(tbl->it_map, 0xff, sz);
>>
>> - /*
>> - * Disable iommu bypass, otherwise the user can DMA to all of
>> - * our physical memory via the bypass window instead of just
>> - * the pages that has been explicitly mapped into the iommu
>> - */
>> - if (tbl->set_bypass)
>> - tbl->set_bypass(tbl, false);
>>
>> return 0;
>> }
>> @@ -1078,10 +1071,6 @@ void iommu_release_ownership(struct iommu_table *tbl)
>> /* Restore bit#0 set by iommu_init_table() */
>> if (tbl->it_offset == 0)
>> set_bit(0, tbl->it_map);
>> -
>> - /* The kernel owns the device now, we can restore the iommu bypass */
>> - if (tbl->set_bypass)
>> - tbl->set_bypass(tbl, true);
>> }
>> EXPORT_SYMBOL_GPL(iommu_release_ownership);
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 88472cb..718d5cc 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -1870,10 +1870,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> __free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
>> }
>>
>> -static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
>> +static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
>> {
>> - struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
>> - struct pnv_ioda_pe, table_group);
>> uint16_t window_id = (pe->pe_number << 1 ) + 1;
>> int64_t rc;
>>
>> @@ -1901,7 +1899,8 @@ static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
>> * host side.
>> */
>> if (pe->pdev)
>> - set_iommu_table_base(&pe->pdev->dev, tbl);
>> + set_iommu_table_base(&pe->pdev->dev,
>> + &pe->table_group.tables[0]);
>> else
>> pnv_ioda_setup_bus_dma(pe, pe->pbus, false);
>> }
>> @@ -1917,13 +1916,35 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
>> /* TVE #1 is selected by PCI address bit 59 */
>> pe->tce_bypass_base = 1ull << 59;
>>
>> - /* Install set_bypass callback for VFIO */
>> - pe->table_group.tables[0].set_bypass = pnv_pci_ioda2_set_bypass;
>> -
>> /* Enable bypass by default */
>> - pnv_pci_ioda2_set_bypass(&pe->table_group.tables[0], true);
>> + pnv_pci_ioda2_set_bypass(pe, true);
>> }
>>
>> +#ifdef CONFIG_IOMMU_API
>> +static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
>> +{
>> + struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
>> + table_group);
>> +
>> + iommu_take_ownership(&table_group->tables[0]);
>> + pnv_pci_ioda2_set_bypass(pe, false);
>> +}
>> +
>> +static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
>> +{
>> + struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
>> + table_group);
>> +
>> + iommu_release_ownership(&table_group->tables[0]);
>> + pnv_pci_ioda2_set_bypass(pe, true);
>> +}
>> +
>> +static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
>> + .take_ownership = pnv_ioda2_take_ownership,
>> + .release_ownership = pnv_ioda2_release_ownership,
>> +};
>> +#endif
>> +
>> static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> struct pnv_ioda_pe *pe)
>> {
>> @@ -1991,6 +2012,9 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> }
>> tbl->it_ops = &pnv_ioda2_iommu_ops;
>> iommu_init_table(tbl, phb->hose->node);
>> +#ifdef CONFIG_IOMMU_API
>> + pe->table_group.ops = &pnv_pci_ioda2_ops;
>> +#endif
>>
>> if (pe->flags & PNV_IODA_PE_DEV) {
>> iommu_register_group(&pe->table_group, phb->hose->global_number,
>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>> index 17e884a..dacc738 100644
>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>> @@ -483,6 +483,43 @@ static long tce_iommu_ioctl(void *iommu_data,
>> return -ENOTTY;
>> }
>>
>> +static void tce_iommu_release_ownership(struct tce_container *container,
>> + struct iommu_table_group *table_group)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
>> + struct iommu_table *tbl = &table_group->tables[i];
>> +
>> + tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
>> + if (tbl->it_map)
>> + iommu_release_ownership(tbl);
>> + }
>> +}
>> +
>> +static int tce_iommu_take_ownership(struct iommu_table_group *table_group)
>> +{
>> + int i, j, rc = 0;
>> +
>> + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
>> + struct iommu_table *tbl = &table_group->tables[i];
>> +
>> + if (!tbl->it_map)
>> + continue;
>> +
>> + rc = iommu_take_ownership(tbl);
>> + if (rc) {
>> + for (j = 0; j < i; ++j)
>> + iommu_release_ownership(
>> + &table_group->tables[j]);
>> +
>> + return rc;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static int tce_iommu_attach_group(void *iommu_data,
>> struct iommu_group *iommu_group)
>> {
>> @@ -515,9 +552,23 @@ static int tce_iommu_attach_group(void *iommu_data,
>> goto unlock_exit;
>> }
>>
>> - ret = iommu_take_ownership(&table_group->tables[0]);
>> - if (!ret)
>> - container->grp = iommu_group;
>> + if (!table_group->ops || !table_group->ops->take_ownership ||
>> + !table_group->ops->release_ownership) {
>> + ret = tce_iommu_take_ownership(table_group);
>
> Haven't looked at the rest of the series. I'm hoping that you're
> eventually planning to replace this fallback with setting the
> take_ownership call for p5ioc etc. to point to
> tce_iommu_take_ownership.


Why? I do not really want p5ioc2 or ioda1 to have
take_ownership/release_ownership callbacks defined, as they would only do
this default stuff, which is never going to change: this hardware is quite
old and extremely rare, so there is no real customer for it. Should I
still convert these to callbacks?



>> + } else {
>> + /*
>> + * Disable iommu bypass, otherwise the user can DMA to all of
>> + * our physical memory via the bypass window instead of just
>> + * the pages that has been explicitly mapped into the iommu
>> + */
>> + table_group->ops->take_ownership(table_group);
>> + ret = 0;
>> + }
>> +
>> + if (ret)
>> + goto unlock_exit;
>> +
>> + container->grp = iommu_group;
>>
>> unlock_exit:
>> mutex_unlock(&container->lock);
>> @@ -530,7 +581,6 @@ static void tce_iommu_detach_group(void *iommu_data,
>> {
>> struct tce_container *container = iommu_data;
>> struct iommu_table_group *table_group;
>> - struct iommu_table *tbl;
>>
>> mutex_lock(&container->lock);
>> if (iommu_group != container->grp) {
>> @@ -553,9 +603,11 @@ static void tce_iommu_detach_group(void *iommu_data,
>> table_group = iommu_group_get_iommudata(iommu_group);
>> BUG_ON(!table_group);
>>
>> - tbl = &table_group->tables[0];
>> - tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
>> - iommu_release_ownership(tbl);
>> + /* Kernel owns the device now, we can restore bypass */
>> + if (!table_group->ops || !table_group->ops->release_ownership)
>> + tce_iommu_release_ownership(container, table_group);
>> + else
>> + table_group->ops->release_ownership(table_group);
>>
>> unlock_exit:
>> mutex_unlock(&container->lock);
>


--
Alexey

2015-04-29 09:26:38

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 21/32] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window

On 04/29/2015 02:45 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:45PM +1000, Alexey Kardashevskiy wrote:
>> This is a part of moving DMA window programming to an iommu_ops
>> callback. pnv_pci_ioda2_set_window() takes an iommu_table_group as
>> a first parameter (not pnv_ioda_pe) as it is going to be used as
>> a callback for VFIO DDW code.
>>
>> This adds pnv_pci_ioda2_tvt_invalidate() to invalidate TVT as it is
>> a good thing to do.
>
> What's the TVT and why is invalidating it a good thing?


"TCE Validation Table". Yeah, I need to rephrase it. Will do.


> Also, it looks like it didn't add it, just move it.

Agrh. Lost it in rebases. Will fix.


>> It does not have immediate effect now as the table
>> is never recreated after reboot but it will in the following patches.
>>
>> This should cause no behavioural change.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> Reviewed-by: David Gibson <[email protected]>
>
> Really? I don't remember this one.


Message-ID: <[email protected]>
:)

But I believe it did not have TVT stuff then so I should have removed your
RB from here.

>
>> ---
>> Changes:
>> v9:
>> * initialize pe->table_group.tables[0] at the very end when
>> tbl is fully initialized
>> * moved pnv_pci_ioda2_tvt_invalidate() from earlier patch
>> ---
>> arch/powerpc/platforms/powernv/pci-ioda.c | 67 +++++++++++++++++++++++--------
>> 1 file changed, 51 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index b9b3773..59baa15 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -1960,6 +1960,52 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> __free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
>> }
>>
>> +static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
>> + struct iommu_table *tbl)
>> +{
>> + struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
>> + table_group);
>> + struct pnv_phb *phb = pe->phb;
>> + int64_t rc;
>> + const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
>> + const __u64 win_size = tbl->it_size << tbl->it_page_shift;
>> +
>> + pe_info(pe, "Setting up window at %llx..%llx "
>> + "pgsize=0x%x tablesize=0x%lx\n",
>> + start_addr, start_addr + win_size - 1,
>> + 1UL << tbl->it_page_shift, tbl->it_size << 3);
>> +
>> + tbl->it_table_group = &pe->table_group;
>> +
>> + /*
>> + * Map TCE table through TVT. The TVE index is the PE number
>> + * shifted by 1 bit for 32-bits DMA space.
>> + */
>> + rc = opal_pci_map_pe_dma_window(phb->opal_id,
>> + pe->pe_number,
>> + pe->pe_number << 1,
>> + 1,
>> + __pa(tbl->it_base),
>> + tbl->it_size << 3,
>> + 1ULL << tbl->it_page_shift);
>> + if (rc) {
>> + pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
>> + goto fail;
>> + }
>> +
>> + pnv_pci_ioda2_tvt_invalidate(pe);
>> +
>> + /* Store fully initialized *tbl (may be external) in PE */
>> + pe->table_group.tables[0] = *tbl;
>
> Hrm, a non-atomic copy of a whole structure into the array. Is that
> really what you want?


set_window() is called from VFIO (protected by a mutex there) and from the
platform code, which I believe is not racy (or hotplug takes care of it
anyway). Or am I missing something else?
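
I.e. on the VFIO side the call is roughly this (simplified, not the exact
code):

        mutex_lock(&container->lock);
        ret = table_group->ops->set_window(table_group, num, tbl);
        mutex_unlock(&container->lock);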


>> + return 0;
>> +fail:
>> + if (pe->tce32_seg >= 0)
>> + pe->tce32_seg = -1;
>> +
>> + return rc;
>> +}
>> +
>> static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
>> {
>> uint16_t window_id = (pe->pe_number << 1 ) + 1;
>> @@ -2068,21 +2114,16 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> pe->table_group.ops = &pnv_pci_ioda2_ops;
>> #endif
>>
>> - /*
>> - * Map TCE table through TVT. The TVE index is the PE number
>> - * shifted by 1 bit for 32-bits DMA space.
>> - */
>> - rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
>> - pe->pe_number << 1, 1, __pa(tbl->it_base),
>> - tbl->it_size << 3, 1ULL << tbl->it_page_shift);
>> + rc = pnv_pci_ioda2_set_window(&pe->table_group, tbl);
>> if (rc) {
>> pe_err(pe, "Failed to configure 32-bit TCE table,"
>> " err %ld\n", rc);
>> - goto fail;
>> + pnv_pci_free_table(tbl);
>> + if (pe->tce32_seg >= 0)
>> + pe->tce32_seg = -1;
>> + return;
>> }
>>
>> - pnv_pci_ioda2_tvt_invalidate(pe);
>> -
>> /* OPAL variant of PHB3 invalidated TCEs */
>> if (pe->tce_inval_reg)
>> tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
>> @@ -2103,12 +2144,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> /* Also create a bypass window */
>> if (!pnv_iommu_bypass_disabled)
>> pnv_pci_ioda2_setup_bypass_pe(phb, pe);
>> -
>> - return;
>> -fail:
>> - if (pe->tce32_seg >= 0)
>> - pe->tce32_seg = -1;
>> - pnv_pci_free_table(tbl);
>> }
>>
>> static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>


--
Alexey

2015-04-29 09:44:39

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 23/32] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks

On 04/29/2015 03:30 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:47PM +1000, Alexey Kardashevskiy wrote:
>> This extends iommu_table_group_ops by a set of callbacks to support
>> dynamic DMA windows management.
>>
>> create_table() creates a TCE table with specific parameters.
>> it receives iommu_table_group to know nodeid in order to allocate
>> TCE table memory closer to the PHB. The exact format of allocated
>> multi-level table might be also specific to the PHB model (not
>> the case now though).
>> This callback calculated the DMA window offset on a PCI bus from @num
>> and stores it in a just created table.
>>
>> set_window() sets the window at specified TVT index + @num on PHB.
>>
>> unset_window() unsets the window from specified TVT.
>>
>> This adds a free() callback to iommu_table_ops to free the memory
>> (potentially a tree of tables) allocated for the TCE table.
>
> Doesn't the free callback belong with the previous patch introducing
> multi-level tables?



If I did that, you would say "why is it here if nothing calls it" on the
"multilevel" patch, and "I see the allocation but I do not see the memory
release" ;)

I need some rule of thumb here. I think it is a bit cleaner if the same
patch adds a callback for memory allocation and its counterpart, no?



>> create_table() and free() are supposed to be called once per
>> VFIO container and set_window()/unset_window() are supposed to be
>> called for every group in a container.
>>
>> This adds IOMMU capabilities to iommu_table_group such as default
>> 32bit window parameters and others.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> ---
>> arch/powerpc/include/asm/iommu.h | 19 ++++++++
>> arch/powerpc/platforms/powernv/pci-ioda.c | 75 ++++++++++++++++++++++++++---
>> arch/powerpc/platforms/powernv/pci-p5ioc2.c | 12 +++--
>> 3 files changed, 96 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>> index 0f50ee2..7694546 100644
>> --- a/arch/powerpc/include/asm/iommu.h
>> +++ b/arch/powerpc/include/asm/iommu.h
>> @@ -70,6 +70,7 @@ struct iommu_table_ops {
>> /* get() returns a physical address */
>> unsigned long (*get)(struct iommu_table *tbl, long index);
>> void (*flush)(struct iommu_table *tbl);
>> + void (*free)(struct iommu_table *tbl);
>> };
>>
>> /* These are used by VIO */
>> @@ -148,6 +149,17 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
>> struct iommu_table_group;
>>
>> struct iommu_table_group_ops {
>> + long (*create_table)(struct iommu_table_group *table_group,
>> + int num,
>> + __u32 page_shift,
>> + __u64 window_size,
>> + __u32 levels,
>> + struct iommu_table *tbl);
>> + long (*set_window)(struct iommu_table_group *table_group,
>> + int num,
>> + struct iommu_table *tblnew);
>> + long (*unset_window)(struct iommu_table_group *table_group,
>> + int num);
>> /*
>> * Switches ownership from the kernel itself to an external
>> * user. While onwership is taken, the kernel cannot use IOMMU itself.
>> @@ -160,6 +172,13 @@ struct iommu_table_group {
>> #ifdef CONFIG_IOMMU_API
>> struct iommu_group *group;
>> #endif
>> + /* Some key properties of IOMMU */
>> + __u32 tce32_start;
>> + __u32 tce32_size;
>> + __u64 pgsizes; /* Bitmap of supported page sizes */
>> + __u32 max_dynamic_windows_supported;
>> + __u32 max_levels;
>
> With this information, table_group seems even more like a bad name.
> "iommu_state" maybe?


Please, no. We will never come to an agreement then :( And "iommu_state" is
too general anyway; it won't pass.


>> struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
>> struct iommu_table_group_ops *ops;
>> };
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index cc1d09c..4828837 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -24,6 +24,7 @@
>> #include <linux/msi.h>
>> #include <linux/memblock.h>
>> #include <linux/iommu.h>
>> +#include <linux/sizes.h>
>>
>> #include <asm/sections.h>
>> #include <asm/io.h>
>> @@ -1846,6 +1847,7 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
>> #endif
>> .clear = pnv_ioda2_tce_free,
>> .get = pnv_tce_get,
>> + .free = pnv_pci_free_table,
>> };
>>
>> static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
>> @@ -1936,6 +1938,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> TCE_PCI_SWINV_PAIR);
>>
>> tbl->it_ops = &pnv_ioda1_iommu_ops;
>> + pe->table_group.tce32_start = tbl->it_offset << tbl->it_page_shift;
>> + pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
>> iommu_init_table(tbl, phb->hose->node);
>>
>> if (pe->flags & PNV_IODA_PE_DEV) {
>> @@ -1961,7 +1965,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> }
>>
>> static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
>> - struct iommu_table *tbl)
>> + int num, struct iommu_table *tbl)
>> {
>> struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
>> table_group);
>> @@ -1972,9 +1976,10 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
>> const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
>> const __u64 win_size = tbl->it_size << tbl->it_page_shift;
>>
>> - pe_info(pe, "Setting up window at %llx..%llx "
>> + pe_info(pe, "Setting up window#%d at %llx..%llx "
>> "pgsize=0x%x tablesize=0x%lx "
>> "levels=%d levelsize=%x\n",
>> + num,
>> start_addr, start_addr + win_size - 1,
>> 1UL << tbl->it_page_shift, tbl->it_size << 3,
>> tbl->it_indirect_levels + 1, tbl->it_level_size << 3);
>> @@ -1987,7 +1992,7 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
>> */
>> rc = opal_pci_map_pe_dma_window(phb->opal_id,
>> pe->pe_number,
>> - pe->pe_number << 1,
>> + (pe->pe_number << 1) + num,
>
> Heh, yes, well, that makes it rather clear that only 2 tables are possible.
>
>> tbl->it_indirect_levels + 1,
>> __pa(tbl->it_base),
>> size << 3,
>> @@ -2000,7 +2005,7 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
>> pnv_pci_ioda2_tvt_invalidate(pe);
>>
>> /* Store fully initialized *tbl (may be external) in PE */
>> - pe->table_group.tables[0] = *tbl;
>> + pe->table_group.tables[num] = *tbl;
>
> I'm a bit confused by this whole set_window thing. Is the idea that
> with multiple groups in a container you have multiple table_group s
> each with different copies of the iommu_table structures, but pointing
> to the same actual TCE entries (it_base)?

Yes.

> It seems to me not terribly
> obvious when you "create" a table and when you "set" a window.


A table is not attached anywhere until its address is programmed into the
hardware (in set_window()); until then it is just a table in memory. For
POWER8/IODA2, I create a table before I attach any group to a container and
then program this table into every attached group; right now this is done
in the container's attach_group(). So later we can hotplug any host PCI
device into a container - it will program the same TCE table into every new
group in the container.
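
Roughly, the flow is like this (only an illustration, not the code from
this series; tables[0], table_created and window_size are made-up container
fields for the example):

static long sketch_attach_group(struct tce_container *container,
                struct iommu_table_group *table_group)
{
        struct iommu_table *tbl = &container->tables[0];
        long ret;

        if (!container->table_created) {
                /* once per container: only allocates the table in memory */
                ret = table_group->ops->create_table(table_group, 0,
                                IOMMU_PAGE_SHIFT_4K, container->window_size,
                                1 /* levels */, tbl);
                if (ret)
                        return ret;
                container->table_created = true;
        }

        /* for every attached group: programs the same table into the TVT */
        return table_group->ops->set_window(table_group, 0, tbl);
}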


> It's also kind of hard to assess whether the relative lifetimes are
> correct of the table_group, struct iommu_table and the actual TCE tables.

That is true. I do not know how to improve this, though.


> Would it make more sense for table_group to become the
> non-vfio-specific counterpart to the vfio container?
> i.e. representing one set of DMA mappings, which one or more PEs could
> be bound to.


table_group is embedded into the PE, and the table/table_group callbacks
access the PE when invalidating the TCE table. So I will need something to
get to the PE. Or just have an array of 2 iommu_table structs.
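
I.e. the callbacks rely on the same pattern as in the patches above - the
embedded table_group is what lets them get back to the PE (sketch):

static void sketch_group_invalidate(struct iommu_table_group *table_group)
{
        struct pnv_ioda_pe *pe = container_of(table_group,
                        struct pnv_ioda_pe, table_group);

        /* pe->pe_number, pe->tce_inval_reg and so on are reachable now */
        pnv_pci_ioda2_tvt_invalidate(pe);
}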



>
>> return 0;
>> fail:
>> @@ -2061,6 +2066,53 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
>> }
>>
>> #ifdef CONFIG_IOMMU_API
>> +static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
>> + int num, __u32 page_shift, __u64 window_size, __u32 levels,
>> + struct iommu_table *tbl)
>> +{
>> + struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
>> + table_group);
>> + int nid = pe->phb->hose->node;
>> + __u64 bus_offset = num ? pe->tce_bypass_base : 0;
>> + long ret;
>> +
>> + ret = pnv_pci_create_table(table_group, nid, bus_offset, page_shift,
>> + window_size, levels, tbl);
>> + if (ret)
>> + return ret;
>> +
>> + tbl->it_ops = &pnv_ioda2_iommu_ops;
>> + if (pe->tce_inval_reg)
>> + tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
>> +
>> + return 0;
>> +}
>> +
>> +static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
>> + int num)
>> +{
>> + struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
>> + table_group);
>> + struct pnv_phb *phb = pe->phb;
>> + struct iommu_table *tbl = &pe->table_group.tables[num];
>> + long ret;
>> +
>> + pe_info(pe, "Removing DMA window #%d\n", num);
>> +
>> + ret = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
>> + (pe->pe_number << 1) + num,
>> + 0/* levels */, 0/* table address */,
>> + 0/* table size */, 0/* page size */);
>> + if (ret)
>> + pe_warn(pe, "Unmapping failed, ret = %ld\n", ret);
>> + else
>> + pnv_pci_ioda2_tvt_invalidate(pe);
>> +
>> + memset(tbl, 0, sizeof(*tbl));
>> +
>> + return ret;
>> +}
>> +
>> static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
>> {
>> struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
>> @@ -2080,6 +2132,9 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
>> }
>>
>> static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
>> + .create_table = pnv_pci_ioda2_create_table,
>> + .set_window = pnv_pci_ioda2_set_window,
>> + .unset_window = pnv_pci_ioda2_unset_window,
>> .take_ownership = pnv_ioda2_take_ownership,
>> .release_ownership = pnv_ioda2_release_ownership,
>> };
>> @@ -2102,8 +2157,16 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
>> phb->ioda.m32_pci_base);
>>
>> + pe->table_group.tce32_start = 0;
>> + pe->table_group.tce32_size = phb->ioda.m32_pci_base;
>> + pe->table_group.max_dynamic_windows_supported =
>> + IOMMU_TABLE_GROUP_MAX_TABLES;
>> + pe->table_group.max_levels = POWERNV_IOMMU_MAX_LEVELS;
>> + pe->table_group.pgsizes = SZ_4K | SZ_64K | SZ_16M;
>> +
>> rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
>> - 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base,
>> + pe->table_group.tce32_start, IOMMU_PAGE_SHIFT_4K,
>> + pe->table_group.tce32_size,
>> POWERNV_IOMMU_DEFAULT_LEVELS, tbl);
>> if (rc) {
>> pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
>> @@ -2119,7 +2182,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> pe->table_group.ops = &pnv_pci_ioda2_ops;
>> #endif
>>
>> - rc = pnv_pci_ioda2_set_window(&pe->table_group, tbl);
>> + rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
>> if (rc) {
>> pe_err(pe, "Failed to configure 32-bit TCE table,"
>> " err %ld\n", rc);
>> diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
>> index 7a6fd92..d9de4c7 100644
>> --- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
>> +++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
>> @@ -116,6 +116,8 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
>> u64 phb_id;
>> int64_t rc;
>> static int primary = 1;
>> + struct iommu_table_group *table_group;
>> + struct iommu_table *tbl;
>>
>> pr_info(" Initializing p5ioc2 PHB %s\n", np->full_name);
>>
>> @@ -181,14 +183,16 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
>> pnv_pci_init_p5ioc2_msis(phb);
>>
>> /* Setup iommu */
>> - phb->p5ioc2.table_group.tables[0].it_table_group =
>> - &phb->p5ioc2.table_group;
>> + table_group = &phb->p5ioc2.table_group;
>> + tbl = &phb->p5ioc2.table_group.tables[0];
>> + tbl->it_table_group = table_group;
>>
>> /* Setup TCEs */
>> phb->dma_dev_setup = pnv_pci_p5ioc2_dma_dev_setup;
>> - pnv_pci_setup_iommu_table(&phb->p5ioc2.table_group.tables[0],
>> - tce_mem, tce_size, 0,
>> + pnv_pci_setup_iommu_table(tbl, tce_mem, tce_size, 0,
>> IOMMU_PAGE_SHIFT_4K);
>> + table_group->tce32_start = tbl->it_offset << tbl->it_page_shift;
>> + table_group->tce32_size = tbl->it_size << tbl->it_page_shift;
>
> Doesn't pgsizes need to be set here (although it will only include 4K,
> I'm assuming).


No, pgsizes are not returned to userspace for p5ioc2/ioda1 as they do
not support DDW. No pgsizes => no DDW.



>> }
>>
>> void __init pnv_pci_init_p5ioc2_hub(struct device_node *np)
>


--
Alexey

2015-04-29 09:51:46

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 18/32] powerpc/iommu/powernv: Release replaced TCE

On 04/29/2015 02:18 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:42PM +1000, Alexey Kardashevskiy wrote:
>> At the moment writing new TCE value to the IOMMU table fails with EBUSY
>> if there is a valid entry already. However PAPR specification allows
>> the guest to write new TCE value without clearing it first.
>>
>> Another problem this patch is addressing is the use of pool locks for
>> external IOMMU users such as VFIO. The pool locks are to protect
>> DMA page allocator rather than entries and since the host kernel does
>> not control what pages are in use, there is no point in pool locks and
>> exchange()+put_page(oldtce) is sufficient to avoid possible races.
>>
>> This adds an exchange() callback to iommu_table_ops which does the same
>> thing as set() plus it returns replaced TCE and DMA direction so
>> the caller can release the pages afterwards. The exchange() receives
>> a physical address unlike set() which receives linear mapping address;
>> and returns a physical address as the clear() does.
>>
>> This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement
>> for a platform to have exchange() implemented in order to support VFIO.
>>
>> This replaces iommu_tce_build() and iommu_clear_tce() with
>> a single iommu_tce_xchg().
>>
>> This makes sure that TCE permission bits are not set in TCE passed to
>> IOMMU API as those are to be calculated by platform code from DMA direction.
>>
>> This moves SetPageDirty() to the IOMMU code to make it work for both
>> VFIO ioctl interface in in-kernel TCE acceleration (when it becomes
>> available later).
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> [aw: for the vfio related changes]
>> Acked-by: Alex Williamson <[email protected]>
>
> This looks mostly good, but there are couple of details that need fixing.
>


[...]

>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index ba75aa5..e8802ac 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -598,6 +598,23 @@ int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
>> return 0;
>> }
>>
>> +#ifdef CONFIG_IOMMU_API
>> +int pnv_tce_xchg(struct iommu_table *tbl, long index,
>> + unsigned long *tce, enum dma_data_direction *direction)
>> +{
>> + u64 proto_tce = iommu_direction_to_tce_perm(*direction);
>> + unsigned long newtce = *tce | proto_tce;
>> + unsigned long idx = index - tbl->it_offset;
>
> Should this have a BUG_ON or WARN_ON if the supplied tce has bits set
> below the page mask?


Why? The caller already checks these bits; do we really need to duplicate the check here?


>> + *tce = xchg(pnv_tce(tbl, idx), cpu_to_be64(newtce));
>> + *tce = be64_to_cpu(*tce);
>> + *direction = iommu_tce_direction(*tce);
>> + *tce &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
>> +
>> + return 0;
>> +}
>> +#endif



--
Alexey

2015-04-30 00:38:30

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 17/32] powerpc/powernv: Implement accessor to TCE entry

On Wed, Apr 29, 2015 at 07:02:17PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 02:04 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:41PM +1000, Alexey Kardashevskiy wrote:
> >>This replaces direct accesses to TCE table with a helper which
> >>returns an TCE entry address. This does not make difference now but will
> >>when multi-level TCE tables get introduces.
> >>
> >>No change in behavior is expected.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >
> >Reviewed-by: David Gibson <[email protected]>
> >
> >
> >>---
> >>Changes:
> >>v9:
> >>* new patch in the series to separate this mechanical change from
> >>functional changes; this is not right before
> >>"powerpc/powernv: Implement multilevel TCE tables" but here in order
> >>to let the next patch - "powerpc/iommu/powernv: Release replaced TCE" -
> >>use pnv_tce() and avoid changing the same code twice
> >>---
> >> arch/powerpc/platforms/powernv/pci.c | 34 +++++++++++++++++++++-------------
> >> 1 file changed, 21 insertions(+), 13 deletions(-)
> >>
> >>diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> >>index 84b4ea4..ba75aa5 100644
> >>--- a/arch/powerpc/platforms/powernv/pci.c
> >>+++ b/arch/powerpc/platforms/powernv/pci.c
> >>@@ -572,38 +572,46 @@ struct pci_ops pnv_pci_ops = {
> >> .write = pnv_pci_write_config,
> >> };
> >>
> >>+static __be64 *pnv_tce(struct iommu_table *tbl, long idx)
> >>+{
> >>+ __be64 *tmp = ((__be64 *)tbl->it_base);
> >>+
> >>+ return tmp + idx;
> >>+}
> >>+
> >> int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
> >> unsigned long uaddr, enum dma_data_direction direction,
> >> struct dma_attrs *attrs)
> >> {
> >> u64 proto_tce = iommu_direction_to_tce_perm(direction);
> >>- __be64 *tcep;
> >>- u64 rpn;
> >>+ u64 rpn = __pa(uaddr) >> tbl->it_page_shift;
> >
> >I guess this was a problem in the existing code, not this patch. But
> >"uaddr" is a really bad name (and unsigned long is a bad type) for
> >what must actually be a kernel linear mapping address.
>
>
> Yes and may be one day I'll clean this up. s/uaddr/linear/ and s/hwaddr/hpa/
> are the first things to do globally but not in this patchset.

Ok.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-30 02:29:38

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 09/32] vfio: powerpc/spapr: Rework groups attaching

On 04/29/2015 12:16 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:33PM +1000, Alexey Kardashevskiy wrote:
>> This is to make extended ownership and multiple groups support patches
>> simpler for review.
>>
>> This should cause no behavioural change.
>
> Um.. this doesn't appear to be true. Previously removing a group from
> an enabled container would fail with EBUSY, now it forces a disable.


This is the original tce_iommu_detach_group(), in which I cannot find the
EBUSY you are referring to; it did and still does force a disable. What am
I missing here?

static void tce_iommu_detach_group(void *iommu_data,
                struct iommu_group *iommu_group)
{
        struct tce_container *container = iommu_data;
        struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);

        BUG_ON(!tbl);
        mutex_lock(&container->lock);
        if (tbl != container->tbl) {
                pr_warn("tce_vfio: detaching group #%u, expected group is #%u\n",
                                iommu_group_id(iommu_group),
                                iommu_group_id(tbl->it_group));
        } else {
                if (container->enabled) {
                        pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
                                        iommu_group_id(tbl->it_group));
                        tce_iommu_disable(container);
                }

                /* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
                                iommu_group_id(iommu_group), iommu_group); */
                container->tbl = NULL;
                tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
                iommu_release_ownership(tbl);
        }
        mutex_unlock(&container->lock);
}



>
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> [aw: for the vfio related changes]
>> Acked-by: Alex Williamson <[email protected]>
>> Reviewed-by: David Gibson <[email protected]>
>> ---
>> drivers/vfio/vfio_iommu_spapr_tce.c | 40 ++++++++++++++++++++++---------------
>> 1 file changed, 24 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>> index 115d5e6..0fbe03e 100644
>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>> @@ -460,16 +460,21 @@ static int tce_iommu_attach_group(void *iommu_data,
>> iommu_group_id(container->tbl->it_group),
>> iommu_group_id(iommu_group));
>> ret = -EBUSY;
>> - } else if (container->enabled) {
>> + goto unlock_exit;
>> + }
>> +
>> + if (container->enabled) {
>> pr_err("tce_vfio: attaching group #%u to enabled container\n",
>> iommu_group_id(iommu_group));
>> ret = -EBUSY;
>> - } else {
>> - ret = iommu_take_ownership(tbl);
>> - if (!ret)
>> - container->tbl = tbl;
>> + goto unlock_exit;
>> }
>>
>> + ret = iommu_take_ownership(tbl);
>> + if (!ret)
>> + container->tbl = tbl;
>> +
>> +unlock_exit:
>> mutex_unlock(&container->lock);
>>
>> return ret;
>> @@ -487,19 +492,22 @@ static void tce_iommu_detach_group(void *iommu_data,
>> pr_warn("tce_vfio: detaching group #%u, expected group is #%u\n",
>> iommu_group_id(iommu_group),
>> iommu_group_id(tbl->it_group));
>> - } else {
>> - if (container->enabled) {
>> - pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
>> - iommu_group_id(tbl->it_group));
>> - tce_iommu_disable(container);
>> - }
>> + goto unlock_exit;
>> + }
>>
>> - /* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
>> - iommu_group_id(iommu_group), iommu_group); */
>> - container->tbl = NULL;
>> - tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
>> - iommu_release_ownership(tbl);
>> + if (container->enabled) {
>> + pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
>> + iommu_group_id(tbl->it_group));
>> + tce_iommu_disable(container);
>> }
>> +
>> + /* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
>> + iommu_group_id(iommu_group), iommu_group); */
>> + container->tbl = NULL;
>> + tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
>> + iommu_release_ownership(tbl);
>> +
>> +unlock_exit:
>> mutex_unlock(&container->lock);
>> }
>>
>


--
Alexey

2015-04-30 02:30:57

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 12/32] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group

On 04/29/2015 12:49 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:36PM +1000, Alexey Kardashevskiy wrote:
>> Modern IBM POWERPC systems support multiple (currently two) TCE tables
>> per IOMMU group (a.k.a. PE). This adds a iommu_table_group container
>> for TCE tables. Right now just one table is supported.
>>
>> For P5IOC2 and IODA, iommu_table_group is embedded into PE struct
>> (pnv_ioda_pe and pnv_phb) and does not require iommu_free_table(), only .
>> iommu_reset_table().
>>
>> For pSeries, this replaces multiple calls of kzalloc_node() with a new
>> iommu_pseries_group_alloc() helper and stores the table group struct
>> pointer into the pci_dn struct. For release, a iommu_table_group_free()
>> helper is added.
>>
>> This should cause no behavioural change.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> [aw: for the vfio related changes]
>> Acked-by: Alex Williamson <[email protected]>
>
> I'm not particularly fond of the "table_group" name, but I can't
> really think of a better name for now. So,


I asked Ben again. iommu_state is not much better either. I'd stick to
iommu_table_group.


> Reviewed-by: David Gibson <[email protected]>





--
Alexey

2015-04-30 02:58:22

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 15/32] powerpc/powernv/ioda/ioda2: Rework TCE invalidation in tce_build()/tce_free()

On 04/29/2015 01:18 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:39PM +1000, Alexey Kardashevskiy wrote:
>> The pnv_pci_ioda_tce_invalidate() helper invalidates TCE cache. It is
>> supposed to be called on IODA1/2 and not called on p5ioc2. It receives
>> start and end host addresses of TCE table.
>>
>> IODA2 actually needs PCI addresses to invalidate the cache. Those
>> can be calculated from host addresses but since we are going
>> to implement multi-level TCE tables, calculating PCI address from
>> a host address might get either tricky or ugly as TCE table remains flat
>> on PCI bus but not in RAM.
>>
>> This moves pnv_pci_ioda_tce_invalidate() from generic pnv_tce_build/
>> pnt_tce_free and defines IODA1/2-specific callbacks which call generic
>> ones and do PHB-model-specific TCE cache invalidation. P5IOC2 keeps
>> using generic callbacks as before.
>>
>> This changes pnv_pci_ioda2_tce_invalidate() to receives TCE index and
>> number of pages which are PCI addresses shifted by IOMMU page shift.
>>
>> No change in behaviour is expected.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> ---
>> Changes:
>> v9:
>> * removed confusing comment from commit log about unintentional calling of
>> pnv_pci_ioda_tce_invalidate()
>> * moved mechanical changes away to "powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table"
>> * fixed bug with broken invalidation in pnv_pci_ioda2_tce_invalidate -
>> @index includes @tbl->it_offset but old code added it anyway which later broke
>> DDW
>> ---
>> arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++----------
>> arch/powerpc/platforms/powernv/pci.c | 17 ++----
>> 2 files changed, 64 insertions(+), 39 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 718d5cc..f070c44 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -1665,18 +1665,20 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
>> }
>> }
>>
>> -static void pnv_pci_ioda1_tce_invalidate(struct pnv_ioda_pe *pe,
>> - struct iommu_table *tbl,
>> - __be64 *startp, __be64 *endp, bool rm)
>> +static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
>> + unsigned long index, unsigned long npages, bool rm)
>> {
>> + struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
>> + struct pnv_ioda_pe, table_group);
>> __be64 __iomem *invalidate = rm ?
>> (__be64 __iomem *)pe->tce_inval_reg_phys :
>> (__be64 __iomem *)tbl->it_index;
>> unsigned long start, end, inc;
>> const unsigned shift = tbl->it_page_shift;
>>
>> - start = __pa(startp);
>> - end = __pa(endp);
>> + start = __pa((__be64 *)tbl->it_base + index - tbl->it_offset);
>> + end = __pa((__be64 *)tbl->it_base + index - tbl->it_offset +
>> + npages - 1);
>
> This doesn't look right. The arguments to __pa don't appear to be
> addresses (since index and if_offset are in units of (TCE) pages, not
> bytes).


tbl->it_base is an address and it is cast to __be64 *, which means:

(char *)tbl->it_base + (index - tbl->it_offset) * sizeof(__be64)

which seems to be correct (I just removed extra parentheses compared to the
old code), no?
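
Or, spelled out as a trivial helper (sketch only, not patch code):

static __be64 *tce_addr_check(struct iommu_table *tbl, long index)
{
        /* pointer arithmetic on __be64 * already scales by sizeof(__be64) */
        __be64 *a = (__be64 *)tbl->it_base + (index - tbl->it_offset);
        /* the same address computed in bytes */
        __be64 *b = (__be64 *)((char *)tbl->it_base +
                        (index - tbl->it_offset) * sizeof(__be64));

        WARN_ON(a != b);
        return a;
}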


>
>>
>> /* BML uses this case for p6/p7/galaxy2: Shift addr and put in node */
>> if (tbl->it_busno) {
>> @@ -1712,16 +1714,40 @@ static void pnv_pci_ioda1_tce_invalidate(struct pnv_ioda_pe *pe,
>> */
>> }
>>
>> +static int pnv_ioda1_tce_build(struct iommu_table *tbl, long index,
>> + long npages, unsigned long uaddr,
>> + enum dma_data_direction direction,
>> + struct dma_attrs *attrs)
>> +{
>> + long ret = pnv_tce_build(tbl, index, npages, uaddr, direction,
>> + attrs);
>> +
>> + if (!ret && (tbl->it_type & TCE_PCI_SWINV_CREATE))
>> + pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
>> +
>> + return ret;
>> +}
>> +
>> +static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
>> + long npages)
>> +{
>> + pnv_tce_free(tbl, index, npages);
>> +
>> + if (tbl->it_type & TCE_PCI_SWINV_FREE)
>> + pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
>> +}
>> +
>> static struct iommu_table_ops pnv_ioda1_iommu_ops = {
>> - .set = pnv_tce_build,
>> - .clear = pnv_tce_free,
>> + .set = pnv_ioda1_tce_build,
>> + .clear = pnv_ioda1_tce_free,
>> .get = pnv_tce_get,
>> };
>>
>> -static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
>> - struct iommu_table *tbl,
>> - __be64 *startp, __be64 *endp, bool rm)
>> +static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
>> + unsigned long index, unsigned long npages, bool rm)
>> {
>> + struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
>> + struct pnv_ioda_pe, table_group);
>> unsigned long start, end, inc;
>> __be64 __iomem *invalidate = rm ?
>> (__be64 __iomem *)pe->tce_inval_reg_phys :
>> @@ -1734,10 +1760,8 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
>> end = start;
>>
>> /* Figure out the start, end and step */
>> - inc = tbl->it_offset + (((u64)startp - tbl->it_base) / sizeof(u64));
>> - start |= (inc << shift);
>> - inc = tbl->it_offset + (((u64)endp - tbl->it_base) / sizeof(u64));
>> - end |= (inc << shift);
>> + start |= (index << shift);
>> + end |= ((index + npages - 1) << shift);
>> inc = (0x1ull << shift);
>> mb();
>>
>> @@ -1750,22 +1774,32 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
>> }
>> }
>>
>> -void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
>> - __be64 *startp, __be64 *endp, bool rm)
>> +static int pnv_ioda2_tce_build(struct iommu_table *tbl, long index,
>> + long npages, unsigned long uaddr,
>> + enum dma_data_direction direction,
>> + struct dma_attrs *attrs)
>> {
>> - struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
>> - struct pnv_ioda_pe, table_group);
>> - struct pnv_phb *phb = pe->phb;
>> -
>> - if (phb->type == PNV_PHB_IODA1)
>> - pnv_pci_ioda1_tce_invalidate(pe, tbl, startp, endp, rm);
>> - else
>> - pnv_pci_ioda2_tce_invalidate(pe, tbl, startp, endp, rm);
>> + long ret = pnv_tce_build(tbl, index, npages, uaddr, direction,
>> + attrs);
>> +
>> + if (!ret && (tbl->it_type & TCE_PCI_SWINV_CREATE))
>> + pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
>> +
>> + return ret;
>> +}
>> +
>> +static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
>> + long npages)
>> +{
>> + pnv_tce_free(tbl, index, npages);
>> +
>> + if (tbl->it_type & TCE_PCI_SWINV_FREE)
>> + pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
>> }
>>
>> static struct iommu_table_ops pnv_ioda2_iommu_ops = {
>> - .set = pnv_tce_build,
>> - .clear = pnv_tce_free,
>> + .set = pnv_ioda2_tce_build,
>> + .clear = pnv_ioda2_tce_free,
>> .get = pnv_tce_get,
>> };
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index 4c3bbb1..84b4ea4 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -577,37 +577,28 @@ int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
>> struct dma_attrs *attrs)
>> {
>> u64 proto_tce = iommu_direction_to_tce_perm(direction);
>> - __be64 *tcep, *tces;
>> + __be64 *tcep;
>> u64 rpn;
>>
>> - tces = tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
>> + tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
>> rpn = __pa(uaddr) >> tbl->it_page_shift;
>>
>> while (npages--)
>> *(tcep++) = cpu_to_be64(proto_tce |
>> (rpn++ << tbl->it_page_shift));
>>
>> - /* Some implementations won't cache invalid TCEs and thus may not
>> - * need that flush. We'll probably turn it_type into a bit mask
>> - * of flags if that becomes the case
>> - */
>> - if (tbl->it_type & TCE_PCI_SWINV_CREATE)
>> - pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1, false);
>>
>> return 0;
>> }
>>
>> void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
>> {
>> - __be64 *tcep, *tces;
>> + __be64 *tcep;
>>
>> - tces = tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
>> + tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
>>
>> while (npages--)
>> *(tcep++) = cpu_to_be64(0);
>> -
>> - if (tbl->it_type & TCE_PCI_SWINV_FREE)
>> - pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1, false);
>> }
>>
>> unsigned long pnv_tce_get(struct iommu_table *tbl, long index)
>


--
Alexey

2015-04-30 04:38:03

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 09/32] vfio: powerpc/spapr: Rework groups attaching

On Thu, Apr 30, 2015 at 12:29:30PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 12:16 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:33PM +1000, Alexey Kardashevskiy wrote:
> >>This is to make extended ownership and multiple groups support patches
> >>simpler for review.
> >>
> >>This should cause no behavioural change.
> >
> >Um.. this doesn't appear to be true. Previously removing a group from
> >an enabled container would fail with EBUSY, now it forces a disable.
>
>
> This is the original tce_iommu_detach_group() where I cannot find EBUSY you
> are referring to; it did and does enforce disable. What am I missing
> here?

Sorry, my mistake. I misread the patch.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-30 04:38:43

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 13/32] vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control

On Wed, Apr 29, 2015 at 07:19:51PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 01:02 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:37PM +1000, Alexey Kardashevskiy wrote:
> >>This adds tce_iommu_take_ownership() and tce_iommu_release_ownership
> >>which call in a loop iommu_take_ownership()/iommu_release_ownership()
> >>for every table on the group. As there is just one now, no change in
> >>behaviour is expected.
> >>
> >>At the moment the iommu_table struct has a set_bypass() which enables/
> >>disables DMA bypass on IODA2 PHB. This is exposed to POWERPC IOMMU code
> >>which calls this callback when external IOMMU users such as VFIO are
> >>about to take control over a PHB.
> >>
> >>The set_bypass() callback is not really an iommu_table function but
> >>IOMMU/PE function. This introduces a iommu_table_group_ops struct and
> >>adds take_ownership()/release_ownership() callbacks to it which are
> >>called when an external user takes/releases control over the IOMMU.
> >>
> >>This replaces set_bypass() with ownership callbacks as it is not
> >>necessarily just bypass enabling, it can be something else/more
> >>so let's give it a more generic name.
> >>
> >>The callbacks are implemented for IODA2 only. Other platforms (P5IOC2,
> >>IODA1) will use the old iommu_take_ownership/iommu_release_ownership API.
> >>The following patches will replace iommu_take_ownership/
> >>iommu_release_ownership calls in IODA2 with full IOMMU table release/
> >>create.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>[aw: for the vfio related changes]
> >>Acked-by: Alex Williamson <[email protected]>
> >>---
> >>Changes:
> >>v9:
> >>* squashed "vfio: powerpc/spapr: powerpc/iommu: Rework IOMMU ownership control"
> >>and "vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework IOMMU ownership control"
> >>into a single patch
> >>* moved helpers with a loop through tables in a group
> >>to vfio_iommu_spapr_tce.c to keep the platform code free of IOMMU table
> >>groups as much as possible
> >>* added missing tce_iommu_clear() to tce_iommu_release_ownership()
> >>* replaced the set_ownership(enable) callback with take_ownership() and
> >>release_ownership()
> >>---
> >> arch/powerpc/include/asm/iommu.h | 13 +++++-
> >> arch/powerpc/kernel/iommu.c | 11 ------
> >> arch/powerpc/platforms/powernv/pci-ioda.c | 40 +++++++++++++++----
> >> drivers/vfio/vfio_iommu_spapr_tce.c | 66 +++++++++++++++++++++++++++----
> >> 4 files changed, 103 insertions(+), 27 deletions(-)
> >>
> >>diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >>index fa37519..e63419e 100644
> >>--- a/arch/powerpc/include/asm/iommu.h
> >>+++ b/arch/powerpc/include/asm/iommu.h
> >>@@ -93,7 +93,6 @@ struct iommu_table {
> >> unsigned long it_page_shift;/* table iommu page size */
> >> struct iommu_table_group *it_table_group;
> >> struct iommu_table_ops *it_ops;
> >>- void (*set_bypass)(struct iommu_table *tbl, bool enable);
> >> };
> >>
> >> /* Pure 2^n version of get_order */
> >>@@ -128,11 +127,23 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
> >>
> >> #define IOMMU_TABLE_GROUP_MAX_TABLES 1
> >>
> >>+struct iommu_table_group;
> >>+
> >>+struct iommu_table_group_ops {
> >>+ /*
> >>+ * Switches ownership from the kernel itself to an external
> >>+ * user. While onwership is taken, the kernel cannot use IOMMU itself.
> >
> >Typo in "onwership". I'd also like to see this be even more explicit
> >that "take" is the "core kernel -> vfio/whatever" transition and
> >release is the reverse.
>
>
> Will this work?
>
> /*
> * Switches ownership from the kernel itself to an external
> * user.
> * The ownership is taken when VFIO starts using the IOMMU group
> * and released when the platform code gets the control over the group back.
> * While ownership is taken, the platform code cannot use IOMMU itself.
> */

Hrm, verbose and still doesn't emphasise the point that always
confuses me enough. I'd prefer:

/* Switch ownership from platform code to external user (e.g. VFIO) */

above "take" then

/* Switch ownership from external user (e.g. VFIO) back to core */

above "release"..

>
>
> >>+ */
> >>+ void (*take_ownership)(struct iommu_table_group *table_group);
> >>+ void (*release_ownership)(struct iommu_table_group *table_group);
> >>+};
> >>+
> >> struct iommu_table_group {
> >> #ifdef CONFIG_IOMMU_API
> >> struct iommu_group *group;
> >> #endif
> >> struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
> >>+ struct iommu_table_group_ops *ops;
> >> };
> >>
> >> #ifdef CONFIG_IOMMU_API
> >>diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> >>index 005146b..2856d27 100644
> >>--- a/arch/powerpc/kernel/iommu.c
> >>+++ b/arch/powerpc/kernel/iommu.c
> >>@@ -1057,13 +1057,6 @@ int iommu_take_ownership(struct iommu_table *tbl)
> >>
> >> memset(tbl->it_map, 0xff, sz);
> >>
> >>- /*
> >>- * Disable iommu bypass, otherwise the user can DMA to all of
> >>- * our physical memory via the bypass window instead of just
> >>- * the pages that has been explicitly mapped into the iommu
> >>- */
> >>- if (tbl->set_bypass)
> >>- tbl->set_bypass(tbl, false);
> >>
> >> return 0;
> >> }
> >>@@ -1078,10 +1071,6 @@ void iommu_release_ownership(struct iommu_table *tbl)
> >> /* Restore bit#0 set by iommu_init_table() */
> >> if (tbl->it_offset == 0)
> >> set_bit(0, tbl->it_map);
> >>-
> >>- /* The kernel owns the device now, we can restore the iommu bypass */
> >>- if (tbl->set_bypass)
> >>- tbl->set_bypass(tbl, true);
> >> }
> >> EXPORT_SYMBOL_GPL(iommu_release_ownership);
> >>
> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>index 88472cb..718d5cc 100644
> >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>@@ -1870,10 +1870,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> >> __free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
> >> }
> >>
> >>-static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
> >>+static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
> >> {
> >>- struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
> >>- struct pnv_ioda_pe, table_group);
> >> uint16_t window_id = (pe->pe_number << 1 ) + 1;
> >> int64_t rc;
> >>
> >>@@ -1901,7 +1899,8 @@ static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
> >> * host side.
> >> */
> >> if (pe->pdev)
> >>- set_iommu_table_base(&pe->pdev->dev, tbl);
> >>+ set_iommu_table_base(&pe->pdev->dev,
> >>+ &pe->table_group.tables[0]);
> >> else
> >> pnv_ioda_setup_bus_dma(pe, pe->pbus, false);
> >> }
> >>@@ -1917,13 +1916,35 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
> >> /* TVE #1 is selected by PCI address bit 59 */
> >> pe->tce_bypass_base = 1ull << 59;
> >>
> >>- /* Install set_bypass callback for VFIO */
> >>- pe->table_group.tables[0].set_bypass = pnv_pci_ioda2_set_bypass;
> >>-
> >> /* Enable bypass by default */
> >>- pnv_pci_ioda2_set_bypass(&pe->table_group.tables[0], true);
> >>+ pnv_pci_ioda2_set_bypass(pe, true);
> >> }
> >>
> >>+#ifdef CONFIG_IOMMU_API
> >>+static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
> >>+{
> >>+ struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> >>+ table_group);
> >>+
> >>+ iommu_take_ownership(&table_group->tables[0]);
> >>+ pnv_pci_ioda2_set_bypass(pe, false);
> >>+}
> >>+
> >>+static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
> >>+{
> >>+ struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> >>+ table_group);
> >>+
> >>+ iommu_release_ownership(&table_group->tables[0]);
> >>+ pnv_pci_ioda2_set_bypass(pe, true);
> >>+}
> >>+
> >>+static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
> >>+ .take_ownership = pnv_ioda2_take_ownership,
> >>+ .release_ownership = pnv_ioda2_release_ownership,
> >>+};
> >>+#endif
> >>+
> >> static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> >> struct pnv_ioda_pe *pe)
> >> {
> >>@@ -1991,6 +2012,9 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> >> }
> >> tbl->it_ops = &pnv_ioda2_iommu_ops;
> >> iommu_init_table(tbl, phb->hose->node);
> >>+#ifdef CONFIG_IOMMU_API
> >>+ pe->table_group.ops = &pnv_pci_ioda2_ops;
> >>+#endif
> >>
> >> if (pe->flags & PNV_IODA_PE_DEV) {
> >> iommu_register_group(&pe->table_group, phb->hose->global_number,
> >>diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>index 17e884a..dacc738 100644
> >>--- a/drivers/vfio/vfio_iommu_spapr_tce.c
> >>+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>@@ -483,6 +483,43 @@ static long tce_iommu_ioctl(void *iommu_data,
> >> return -ENOTTY;
> >> }
> >>
> >>+static void tce_iommu_release_ownership(struct tce_container *container,
> >>+ struct iommu_table_group *table_group)
> >>+{
> >>+ int i;
> >>+
> >>+ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> >>+ struct iommu_table *tbl = &table_group->tables[i];
> >>+
> >>+ tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
> >>+ if (tbl->it_map)
> >>+ iommu_release_ownership(tbl);
> >>+ }
> >>+}
> >>+
> >>+static int tce_iommu_take_ownership(struct iommu_table_group *table_group)
> >>+{
> >>+ int i, j, rc = 0;
> >>+
> >>+ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> >>+ struct iommu_table *tbl = &table_group->tables[i];
> >>+
> >>+ if (!tbl->it_map)
> >>+ continue;
> >>+
> >>+ rc = iommu_take_ownership(tbl);
> >>+ if (rc) {
> >>+ for (j = 0; j < i; ++j)
> >>+ iommu_release_ownership(
> >>+ &table_group->tables[j]);
> >>+
> >>+ return rc;
> >>+ }
> >>+ }
> >>+
> >>+ return 0;
> >>+}
> >>+
> >> static int tce_iommu_attach_group(void *iommu_data,
> >> struct iommu_group *iommu_group)
> >> {
> >>@@ -515,9 +552,23 @@ static int tce_iommu_attach_group(void *iommu_data,
> >> goto unlock_exit;
> >> }
> >>
> >>- ret = iommu_take_ownership(&table_group->tables[0]);
> >>- if (!ret)
> >>- container->grp = iommu_group;
> >>+ if (!table_group->ops || !table_group->ops->take_ownership ||
> >>+ !table_group->ops->release_ownership) {
> >>+ ret = tce_iommu_take_ownership(table_group);
> >
> >Haven't looked at the rest of the series. I'm hoping that you're
> >eventually planning to replace this fallback with setting the
> >take_ownership call for p5ioc etc. to point to
> >tce_iommu_take_ownership.
>
>
> Why? I do not really want p5ioc2 or ioda1 to have
> take_ownership/release_ownership callbacks defined, as they would only do this
> default stuff, which is never going to change since this hardware is quite
> old and extremely rare, so there is no real customer for it. Should I still
> convert these to callbacks?

Leave it for now - having this fallback makes more sense in light of
the changes later in the series that I hadn't read when I made the
comment.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-30 04:39:46

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 15/32] powerpc/powernv/ioda/ioda2: Rework TCE invalidation in tce_build()/tce_free()

On Thu, Apr 30, 2015 at 12:58:12PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 01:18 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:39PM +1000, Alexey Kardashevskiy wrote:
> >>The pnv_pci_ioda_tce_invalidate() helper invalidates TCE cache. It is
> >>supposed to be called on IODA1/2 and not called on p5ioc2. It receives
> >>start and end host addresses of TCE table.
> >>
> >>IODA2 actually needs PCI addresses to invalidate the cache. Those
> >>can be calculated from host addresses but since we are going
> >>to implement multi-level TCE tables, calculating PCI address from
> >>a host address might get either tricky or ugly as TCE table remains flat
> >>on PCI bus but not in RAM.
> >>
> >>This moves pnv_pci_ioda_tce_invalidate() from generic pnv_tce_build/
> >>pnt_tce_free and defines IODA1/2-specific callbacks which call generic
> >>ones and do PHB-model-specific TCE cache invalidation. P5IOC2 keeps
> >>using generic callbacks as before.
> >>
> >>This changes pnv_pci_ioda2_tce_invalidate() to receive TCE index and
> >>number of pages which are PCI addresses shifted by IOMMU page shift.
> >>
> >>No change in behaviour is expected.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>---
> >>Changes:
> >>v9:
> >>* removed confusing comment from commit log about unintentional calling of
> >>pnv_pci_ioda_tce_invalidate()
> >>* moved mechanical changes away to "powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table"
> >>* fixed bug with broken invalidation in pnv_pci_ioda2_tce_invalidate -
> >>@index includes @tbl->it_offset but old code added it anyway which later broke
> >>DDW
> >>---
> >> arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++----------
> >> arch/powerpc/platforms/powernv/pci.c | 17 ++----
> >> 2 files changed, 64 insertions(+), 39 deletions(-)
> >>
> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>index 718d5cc..f070c44 100644
> >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>@@ -1665,18 +1665,20 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
> >> }
> >> }
> >>
> >>-static void pnv_pci_ioda1_tce_invalidate(struct pnv_ioda_pe *pe,
> >>- struct iommu_table *tbl,
> >>- __be64 *startp, __be64 *endp, bool rm)
> >>+static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
> >>+ unsigned long index, unsigned long npages, bool rm)
> >> {
> >>+ struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
> >>+ struct pnv_ioda_pe, table_group);
> >> __be64 __iomem *invalidate = rm ?
> >> (__be64 __iomem *)pe->tce_inval_reg_phys :
> >> (__be64 __iomem *)tbl->it_index;
> >> unsigned long start, end, inc;
> >> const unsigned shift = tbl->it_page_shift;
> >>
> >>- start = __pa(startp);
> >>- end = __pa(endp);
> >>+ start = __pa((__be64 *)tbl->it_base + index - tbl->it_offset);
> >>+ end = __pa((__be64 *)tbl->it_base + index - tbl->it_offset +
> >>+ npages - 1);
> >
> >This doesn't look right. The arguments to __pa don't appear to be
> >addresses (since index and if_offset are in units of (TCE) pages, not
> >bytes).
>
>
> tbl->it_base is an address and it is casted to __be64* which means:
>
> (char*)tbl->it_base + (index - tbl->it_offset)*sizeof(__be64).
>
> Which seems to be correct (I just removed extra braces compared to the old
> code), no?

Ah, yes, I'm just forgetting my C pointer arithmetic rules.
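
For anyone else tripping over the same thing, a stand-alone user-space sketch
of the equivalence, with made-up values and uint64_t standing in for __be64:

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint64_t table[16];
	unsigned long it_base = (unsigned long)table;	/* like tbl->it_base */
	unsigned long it_offset = 2, index = 5;		/* hypothetical values */

	uint64_t *a = ((uint64_t *)it_base) + index - it_offset;
	uint64_t *b = (uint64_t *)((char *)it_base +
			(index - it_offset) * sizeof(uint64_t));

	assert(a == b);		/* both point at the same TCE entry */
	return 0;
}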

Reviewed-by: David Gibson <[email protected]>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-30 04:38:05

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 16/32] powerpc/powernv/ioda: Move TCE kill register address to PE

On Wed, Apr 29, 2015 at 07:00:30PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 01:25 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:40PM +1000, Alexey Kardashevskiy wrote:
> >>At the moment the DMA setup code looks for the "ibm,opal-tce-kill" property
> >>which contains the TCE kill register address. Writes to this register
> >>invalidate TCE cache on IODA/IODA2 hub.
> >>
> >>This moves the register address from iommu_table to pnv_ioda_pe as
> >>later there will be 2 tables per PE and it will be used for both tables.
> >>
> >>This moves the property reading/remapping code to a helper to reduce
> >>code duplication.
> >>
> >>This adds a new pnv_pci_ioda2_tvt_invalidate() helper which invalidates
> >>the entire table. It should be called after every call to
> >>opal_pci_map_pe_dma_window(). It was not required before because
> >>there is just a single TCE table and 64bit DMA is handled via bypass
> >>window (which has no table so no cache is used) but this is going
> >>to change with Dynamic DMA windows (DDW).
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>---
> >>Changes:
> >>v9:
> >>* new in the series
> >>---
> >> arch/powerpc/platforms/powernv/pci-ioda.c | 69 +++++++++++++++++++------------
> >> arch/powerpc/platforms/powernv/pci.h | 1 +
> >> 2 files changed, 44 insertions(+), 26 deletions(-)
> >>
> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>index f070c44..b22b3ca 100644
> >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>@@ -1672,7 +1672,7 @@ static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
> >> struct pnv_ioda_pe, table_group);
> >> __be64 __iomem *invalidate = rm ?
> >> (__be64 __iomem *)pe->tce_inval_reg_phys :
> >>- (__be64 __iomem *)tbl->it_index;
> >>+ pe->tce_inval_reg;
> >> unsigned long start, end, inc;
> >> const unsigned shift = tbl->it_page_shift;
> >>
> >>@@ -1743,6 +1743,18 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
> >> .get = pnv_tce_get,
> >> };
> >>
> >>+static inline void pnv_pci_ioda2_tvt_invalidate(struct pnv_ioda_pe *pe)
> >>+{
> >>+ /* 01xb - invalidate TCEs that match the specified PE# */
> >>+ unsigned long addr = (0x4ull << 60) | (pe->pe_number & 0xFF);
> >
> >This doesn't really look like an address, but rather the data you're
> >writing to the register.
>
>
> This thing is made of "invalidate operation" (0x4 here), "invalidate
> address" (pci address but it is zero here as we reset everything, most bits
> are here) and "invalidate PE number". So what should I call it? :)

Ah, I see. An address from the hardware point of view, but not so
much from the kernel point of view. Probably just call it 'val' or
'data'.
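
Whatever it ends up being called, the composition is simple enough to sketch
in user space; the field layout follows the quoted patch and is purely
illustrative:

#include <stdint.h>
#include <stdio.h>

/* "invalidate everything that matches the given PE#" */
static uint64_t tce_kill_pe_val(unsigned int pe_number)
{
	uint64_t op   = 0x4ull << 60;		/* 01xb: invalidate by PE# */
	uint64_t addr = 0;			/* no PCI address: whole PE */
	uint64_t pe   = pe_number & 0xFF;	/* invalidate PE number */

	return op | addr | pe;
}

int main(void)
{
	printf("val for PE 3: 0x%016llx\n",
			(unsigned long long)tce_kill_pe_val(3));
	return 0;
}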

>
>
>
> >>+ if (!pe->tce_inval_reg)
> >>+ return;
> >>+
> >>+ mb(); /* Ensure above stores are visible */
> >>+ __raw_writeq(cpu_to_be64(addr), pe->tce_inval_reg);
> >>+}
> >>+
> >> static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
> >> unsigned long index, unsigned long npages, bool rm)
> >> {
> >>@@ -1751,7 +1763,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
> >> unsigned long start, end, inc;
> >> __be64 __iomem *invalidate = rm ?
> >> (__be64 __iomem *)pe->tce_inval_reg_phys :
> >>- (__be64 __iomem *)tbl->it_index;
> >>+ pe->tce_inval_reg;
> >> const unsigned shift = tbl->it_page_shift;
> >>
> >> /* We'll invalidate DMA address in PE scope */
> >>@@ -1803,13 +1815,31 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
> >> .get = pnv_tce_get,
> >> };
> >>
> >>+static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
> >>+ struct pnv_ioda_pe *pe)
> >>+{
> >>+ const __be64 *swinvp;
> >>+
> >>+ /* OPAL variant of PHB3 invalidated TCEs */
> >>+ swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
> >>+ if (!swinvp)
> >>+ return;
> >>+
> >>+ /* We need a couple more fields -- an address and a data
> >>+ * to or. Since the bus is only printed out on table free
> >>+ * errors, and on the first pass the data will be a relative
> >>+ * bus number, print that out instead.
> >>+ */
> >
> >The comment above appears to have nothing to do with the surrounding code.
>
> I'll just remove it.

Ok, good.

>
>
> >
> >>+ pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
> >>+ pe->tce_inval_reg = ioremap(pe->tce_inval_reg_phys, 8);
> >>+}
> >>+
> >> static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> >> struct pnv_ioda_pe *pe, unsigned int base,
> >> unsigned int segs)
> >> {
> >>
> >> struct page *tce_mem = NULL;
> >>- const __be64 *swinvp;
> >> struct iommu_table *tbl;
> >> unsigned int i;
> >> int64_t rc;
> >>@@ -1823,6 +1853,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> >> if (WARN_ON(pe->tce32_seg >= 0))
> >> return;
> >>
> >>+ pnv_pci_ioda_setup_opal_tce_kill(phb, pe);
> >>+
> >> /* Grab a 32-bit TCE table */
> >> pe->tce32_seg = base;
> >> pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
> >>@@ -1865,20 +1897,11 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> >> base << 28, IOMMU_PAGE_SHIFT_4K);
> >>
> >> /* OPAL variant of P7IOC SW invalidated TCEs */
> >>- swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
> >>- if (swinvp) {
> >>- /* We need a couple more fields -- an address and a data
> >>- * to or. Since the bus is only printed out on table free
> >>- * errors, and on the first pass the data will be a relative
> >>- * bus number, print that out instead.
> >>- */
> >
> >.. although I guess it didn't make any more sense in its original context.
> >
> >>- pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
> >>- tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
> >>- 8);
> >>+ if (pe->tce_inval_reg)
> >> tbl->it_type |= (TCE_PCI_SWINV_CREATE |
> >> TCE_PCI_SWINV_FREE |
> >> TCE_PCI_SWINV_PAIR);
> >>- }
> >>+
> >> tbl->it_ops = &pnv_ioda1_iommu_ops;
> >> iommu_init_table(tbl, phb->hose->node);
> >>
> >>@@ -1984,7 +2007,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> >> {
> >> struct page *tce_mem = NULL;
> >> void *addr;
> >>- const __be64 *swinvp;
> >> struct iommu_table *tbl;
> >> unsigned int tce_table_size, end;
> >> int64_t rc;
> >>@@ -1993,6 +2015,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> >> if (WARN_ON(pe->tce32_seg >= 0))
> >> return;
> >>
> >>+ pnv_pci_ioda_setup_opal_tce_kill(phb, pe);
> >>+
> >> /* The PE will reserve all possible 32-bits space */
> >> pe->tce32_seg = 0;
> >> end = (1 << ilog2(phb->ioda.m32_pci_base));
> >>@@ -2023,6 +2047,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> >> goto fail;
> >> }
> >>
> >>+ pnv_pci_ioda2_tvt_invalidate(pe);
> >>+
> >
> >This looks to be a change in behaviour - if it's replacing a previous
> >invalidation, I'm not seeing where.
>
>
> It is a new thing and the patch adds it. And it does not say anywhere that
> this patch does not change behavior.

Ah, ok, I think I see.

Seems I was even more tired than I realised yesterday and making a
bunch of mistakes while reviewing.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-30 04:38:45

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 18/32] powerpc/iommu/powernv: Release replaced TCE

On Wed, Apr 29, 2015 at 07:51:21PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 02:18 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:42PM +1000, Alexey Kardashevskiy wrote:
> >>At the moment writing new TCE value to the IOMMU table fails with EBUSY
> >>if there is a valid entry already. However PAPR specification allows
> >>the guest to write new TCE value without clearing it first.
> >>
> >>Another problem this patch is addressing is the use of pool locks for
> >>external IOMMU users such as VFIO. The pool locks are to protect
> >>DMA page allocator rather than entries and since the host kernel does
> >>not control what pages are in use, there is no point in pool locks and
> >>exchange()+put_page(oldtce) is sufficient to avoid possible races.
> >>
> >>This adds an exchange() callback to iommu_table_ops which does the same
> >>thing as set() plus it returns replaced TCE and DMA direction so
> >>the caller can release the pages afterwards. The exchange() receives
> >>a physical address unlike set() which receives linear mapping address;
> >>and returns a physical address as the clear() does.
> >>
> >>This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement
> >>for a platform to have exchange() implemented in order to support VFIO.
> >>
> >>This replaces iommu_tce_build() and iommu_clear_tce() with
> >>a single iommu_tce_xchg().
> >>
> >>This makes sure that TCE permission bits are not set in TCE passed to
> >>IOMMU API as those are to be calculated by platform code from DMA direction.
> >>
> >>This moves SetPageDirty() to the IOMMU code to make it work for both
> >>VFIO ioctl interface and in-kernel TCE acceleration (when it becomes
> >>available later).
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>[aw: for the vfio related changes]
> >>Acked-by: Alex Williamson <[email protected]>
> >
> >This looks mostly good, but there are couple of details that need fixing.
> >
>
>
> [...]
>
> >>diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> >>index ba75aa5..e8802ac 100644
> >>--- a/arch/powerpc/platforms/powernv/pci.c
> >>+++ b/arch/powerpc/platforms/powernv/pci.c
> >>@@ -598,6 +598,23 @@ int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
> >> return 0;
> >> }
> >>
> >>+#ifdef CONFIG_IOMMU_API
> >>+int pnv_tce_xchg(struct iommu_table *tbl, long index,
> >>+ unsigned long *tce, enum dma_data_direction *direction)
> >>+{
> >>+ u64 proto_tce = iommu_direction_to_tce_perm(*direction);
> >>+ unsigned long newtce = *tce | proto_tce;
> >>+ unsigned long idx = index - tbl->it_offset;
> >
> >Should this have a BUG_ON or WARN_ON if the supplied tce has bits set
> >below the page mask?
>
>
> Why? The caller checks these bits, do we really need to duplicate it
> here?

Because this is the crunch point where bad bits will actually cause
strange stuff to happen.

As much as anything the point of a BUG_ON would be to document that
this function expects the parameter to be aligned.
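
Something as small as the following would both document and catch it; this is
a user-space stand-in, the real thing would be a WARN_ON/BUG_ON inside
pnv_tce_xchg(), and the names and page shift here are made up:

#include <stdio.h>

#define IT_PAGE_SHIFT 12	/* 4K IOMMU pages, hypothetical */

static void check_tce_aligned(unsigned long tce)
{
	unsigned long page_mask = (1UL << IT_PAGE_SHIFT) - 1;

	if (tce & page_mask)	/* WARN_ON_ONCE(...) in the kernel */
		fprintf(stderr, "unaligned TCE 0x%lx\n", tce);
}

int main(void)
{
	check_tce_aligned(0x10000);	/* aligned: silent */
	check_tce_aligned(0x10234);	/* would warn */
	return 0;
}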

>
>
> >>+ *tce = xchg(pnv_tce(tbl, idx), cpu_to_be64(newtce));
> >>+ *tce = be64_to_cpu(*tce);
> >>+ *direction = iommu_tce_direction(*tce);
> >>+ *tce &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
> >>+
> >>+ return 0;
> >>+}
> >>+#endif
>
>
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-30 04:39:48

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 20/32] powerpc/powernv/ioda2: Introduce pnv_pci_create_table/pnv_pci_free_table

On Wed, Apr 29, 2015 at 07:12:37PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 02:39 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:44PM +1000, Alexey Kardashevskiy wrote:
> >>This is a part of moving TCE table allocation into an iommu_ops
> >>callback to support multiple IOMMU groups per one VFIO container.
> >>
> >>This moves a table creation window to the file with common powernv-pci
> >>helpers as it does not do anything IODA2-specific.
> >>
> >>This adds pnv_pci_free_table() helper to release the actual TCE table.
> >>
> >>This enforces window size to be a power of two.
> >>
> >>This should cause no behavioural change.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>Reviewed-by: David Gibson <[email protected]>
> >>---
> >>Changes:
> >>v9:
> >>* moved helpers to the common powernv pci.c file from pci-ioda.c
> >>* moved bits from pnv_pci_create_table() to pnv_alloc_tce_table_pages()
> >>---
> >> arch/powerpc/platforms/powernv/pci-ioda.c | 36 ++++++------------
> >> arch/powerpc/platforms/powernv/pci.c | 61 +++++++++++++++++++++++++++++++
> >> arch/powerpc/platforms/powernv/pci.h | 4 ++
> >> 3 files changed, 76 insertions(+), 25 deletions(-)
> >>
> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>index a80be34..b9b3773 100644
> >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>@@ -1307,8 +1307,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
> >> if (rc)
> >> pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
> >>
> >>- iommu_reset_table(tbl, of_node_full_name(dev->dev.of_node));
> >>- free_pages(addr, get_order(TCE32_TABLE_SIZE));
> >>+ pnv_pci_free_table(tbl);
> >> }
> >>
> >> static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
> >>@@ -2039,10 +2038,7 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
> >> static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> >> struct pnv_ioda_pe *pe)
> >> {
> >>- struct page *tce_mem = NULL;
> >>- void *addr;
> >> struct iommu_table *tbl = &pe->table_group.tables[0];
> >>- unsigned int tce_table_size, end;
> >> int64_t rc;
> >>
> >> /* We shouldn't already have a 32-bit DMA associated */
> >>@@ -2053,29 +2049,20 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> >>
> >> /* The PE will reserve all possible 32-bits space */
> >> pe->tce32_seg = 0;
> >>- end = (1 << ilog2(phb->ioda.m32_pci_base));
> >>- tce_table_size = (end / 0x1000) * 8;
> >> pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
> >>- end);
> >>+ phb->ioda.m32_pci_base);
> >>
> >>- /* Allocate TCE table */
> >>- tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
> >>- get_order(tce_table_size));
> >>- if (!tce_mem) {
> >>- pe_err(pe, "Failed to allocate a 32-bit TCE memory\n");
> >>- goto fail;
> >>+ rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
> >>+ 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base, tbl);
> >>+ if (rc) {
> >>+ pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
> >>+ return;
> >> }
> >>- addr = page_address(tce_mem);
> >>- memset(addr, 0, tce_table_size);
> >>-
> >>- /* Setup iommu */
> >>- tbl->it_table_group = &pe->table_group;
> >>-
> >>- /* Setup linux iommu table */
> >>- pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
> >>- IOMMU_PAGE_SHIFT_4K);
> >>
> >> tbl->it_ops = &pnv_ioda2_iommu_ops;
> >>+
> >>+ /* Setup iommu */
> >>+ tbl->it_table_group = &pe->table_group;
> >> iommu_init_table(tbl, phb->hose->node);
> >> #ifdef CONFIG_IOMMU_API
> >> pe->table_group.ops = &pnv_pci_ioda2_ops;
> >>@@ -2121,8 +2108,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> >> fail:
> >> if (pe->tce32_seg >= 0)
> >> pe->tce32_seg = -1;
> >>- if (tce_mem)
> >>- __free_pages(tce_mem, get_order(tce_table_size));
> >>+ pnv_pci_free_table(tbl);
> >> }
> >>
> >> static void pnv_ioda_setup_dma(struct pnv_phb *phb)
> >>diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> >>index e8802ac..6bcfad5 100644
> >>--- a/arch/powerpc/platforms/powernv/pci.c
> >>+++ b/arch/powerpc/platforms/powernv/pci.c
> >>@@ -20,7 +20,9 @@
> >> #include <linux/io.h>
> >> #include <linux/msi.h>
> >> #include <linux/iommu.h>
> >>+#include <linux/memblock.h>
> >>
> >>+#include <asm/mmzone.h>
> >> #include <asm/sections.h>
> >> #include <asm/io.h>
> >> #include <asm/prom.h>
> >>@@ -645,6 +647,65 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
> >> tbl->it_type = TCE_PCI;
> >> }
> >>
> >>+static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
> >>+ unsigned long *tce_table_allocated)
> >
> >I'm a bit confused by the tce_table_allocated parameter. What's the
> >circumstance where more memory is requested than required, and why
> >does it matter to the caller?
>
> It does not make much sense here but it does for "powerpc/powernv: Implement
> multilevel TCE tables" - I was trying to avoid changing the same lines many
> times.
>
> The idea is if multilevel table is requested, I do not really want to
> allocate the whole tree. For example, if the userspace asked for 64K table
> and 5 levels, the result will be a list of just 5 pages - last one will be
> the actual table and upper levels will have a single valud TCE entry
> pointing to next level.
>
> But I change the prototype there anyway so I'll just move this
> tce_table_allocated thing there.

Yeah, I think that's better. It is more churn, but I think the
clearer reviewability is worth it.
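
A stand-alone sketch of the lazy layout described above: only one page per
level is allocated up front, each upper level holding a single entry that
points at the next one. Names and sizes are illustrative only:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define ENTRIES_PER_PAGE 512	/* 4K page / 8-byte TCE, hypothetical */

static uint64_t *alloc_level_chain(int levels)
{
	uint64_t *top = NULL, *prev = NULL;

	for (int i = 0; i < levels; i++) {
		uint64_t *page = calloc(ENTRIES_PER_PAGE, sizeof(uint64_t));

		if (!page)
			return NULL;	/* error unwinding elided */
		if (!top)
			top = page;
		if (prev)
			prev[0] = (uint64_t)(uintptr_t)page; /* single valid entry */
		prev = page;
	}
	return top;	/* the last page in the chain is the leaf table */
}

int main(void)
{
	printf("root of a 5-level chain: %p\n", (void *)alloc_level_chain(5));
	return 0;
}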

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-30 04:38:41

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 21/32] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window

On Wed, Apr 29, 2015 at 07:26:28PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 02:45 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:45PM +1000, Alexey Kardashevskiy wrote:
> >>This is a part of moving DMA window programming to an iommu_ops
> >>callback. pnv_pci_ioda2_set_window() takes an iommu_table_group as
> >>a first parameter (not pnv_ioda_pe) as it is going to be used as
> >>a callback for VFIO DDW code.
> >>
> >>This adds pnv_pci_ioda2_tvt_invalidate() to invalidate TVT as it is
> >>a good thing to do.
> >
> >What's the TVT and why is invalidating it a good thing?
>
>
> "TCE Validation Table". Yeah, I need to rephrase it. Will do.
>
>
> >Also, it looks like it didn't add it, just move it.
>
> Agrh. Lost it in rebases. Will fix.
>
>
> >>It does not have immediate effect now as the table
> >>is never recreated after reboot but it will in the following patches.
> >>
> >>This should cause no behavioural change.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>Reviewed-by: David Gibson <[email protected]>
> >
> >Really? I don't remember this one.
>
>
> Message-ID: <[email protected]>
> :)
>
> But I believe it did not have TVT stuff then so I should have removed your
> RB from here.

Yeah, that's probably why I didn't recognize it.

>
> >
> >>---
> >>Changes:
> >>v9:
> >>* initialize pe->table_group.tables[0] at the very end when
> >>tbl is fully initialized
> >>* moved pnv_pci_ioda2_tvt_invalidate() from earlier patch
> >>---
> >> arch/powerpc/platforms/powernv/pci-ioda.c | 67 +++++++++++++++++++++++--------
> >> 1 file changed, 51 insertions(+), 16 deletions(-)
> >>
> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>index b9b3773..59baa15 100644
> >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>@@ -1960,6 +1960,52 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> >> __free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
> >> }
> >>
> >>+static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> >>+ struct iommu_table *tbl)
> >>+{
> >>+ struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> >>+ table_group);
> >>+ struct pnv_phb *phb = pe->phb;
> >>+ int64_t rc;
> >>+ const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
> >>+ const __u64 win_size = tbl->it_size << tbl->it_page_shift;
> >>+
> >>+ pe_info(pe, "Setting up window at %llx..%llx "
> >>+ "pgsize=0x%x tablesize=0x%lx\n",
> >>+ start_addr, start_addr + win_size - 1,
> >>+ 1UL << tbl->it_page_shift, tbl->it_size << 3);
> >>+
> >>+ tbl->it_table_group = &pe->table_group;
> >>+
> >>+ /*
> >>+ * Map TCE table through TVT. The TVE index is the PE number
> >>+ * shifted by 1 bit for 32-bits DMA space.
> >>+ */
> >>+ rc = opal_pci_map_pe_dma_window(phb->opal_id,
> >>+ pe->pe_number,
> >>+ pe->pe_number << 1,
> >>+ 1,
> >>+ __pa(tbl->it_base),
> >>+ tbl->it_size << 3,
> >>+ 1ULL << tbl->it_page_shift);
> >>+ if (rc) {
> >>+ pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
> >>+ goto fail;
> >>+ }
> >>+
> >>+ pnv_pci_ioda2_tvt_invalidate(pe);
> >>+
> >>+ /* Store fully initialized *tbl (may be external) in PE */
> >>+ pe->table_group.tables[0] = *tbl;
> >
> >Hrm, a non-atomic copy of a whole structure into the array. Is that
> >really what you want?
>
>
> set_window is called from VFIO (protected by mutex there) and the platform
> code which I believe is not racy (or hotplug takes care of it anyway). Or I
> am missing something else?

Sorry, I wasn't clear. It's not that I actually think the copy is
going to race with anything now.

It's more that copying whole structures about is a rather odd way of
doing things, and makes it much less obvious how object lifetimes
interact.

From what I've seen of the rest of the series it seems like the
following scheme would make more sense:

* struct iommu_table has identical lifetime to the actual tables
allocated under it.
* So, the "create" function both allocates the header structure,
all the actual TCE tables under it, and fills in the header
with the details of same (size, levelsize, levels etc.)
* table_group would have an array of pointers to iommu_table
structs, rather than embedding an array of iommu_table structs
* This pointers would be optionally populated
* set_window function would populate the table_group array with
a previously "create"ed iommu_table
* unset window would clear the pointer, and unref the iommu_table
* "free" and "reset" for a single table would be rolled back into a
single function

Unless there's some reason I've missed that you want to embed the
whole array of iommu_table structs.
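
In data-structure terms the scheme above would look roughly like this; field
and constant names are only illustrative:

#define IOMMU_TABLE_GROUP_MAX_TABLES 2

struct iommu_table;		/* allocated by create_table(), refcounted,
				 * freed when the last user drops it */
struct iommu_table_group_ops;

struct iommu_table_group {
	/* Optionally populated by set_window(), cleared by unset_window() */
	struct iommu_table *tables[IOMMU_TABLE_GROUP_MAX_TABLES];
	struct iommu_table_group_ops *ops;
};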

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-30 04:38:07

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 23/32] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks

On Wed, Apr 29, 2015 at 07:44:20PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 03:30 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:47PM +1000, Alexey Kardashevskiy wrote:
> >>This extends iommu_table_group_ops by a set of callbacks to support
> >>dynamic DMA windows management.
> >>
> >>create_table() creates a TCE table with specific parameters.
> >>it receives iommu_table_group to know nodeid in order to allocate
> >>TCE table memory closer to the PHB. The exact format of allocated
> >>multi-level table might be also specific to the PHB model (not
> >>the case now though).
> >>This callback calculates the DMA window offset on a PCI bus from @num
> >>and stores it in a just created table.
> >>
> >>set_window() sets the window at specified TVT index + @num on PHB.
> >>
> >>unset_window() unsets the window from specified TVT.
> >>
> >>This adds a free() callback to iommu_table_ops to free the memory
> >>(potentially a tree of tables) allocated for the TCE table.
> >
> >Doesn't the free callback belong with the previous patch introducing
> >multi-level tables?
>
>
>
> If I did that, you would say "why is it here if nothing calls it" on
> "multilevel" patch and "I see the allocation but I do not see memory
> release" ;)

Yeah, fair enough ;)

> I need some rule of thumb here. I think it is a bit cleaner if the same
> patch adds a callback for memory allocation and its counterpart, no?

On further consideration, yes, I think you're right.

> >>create_table() and free() are supposed to be called once per
> >>VFIO container and set_window()/unset_window() are supposed to be
> >>called for every group in a container.
> >>
> >>This adds IOMMU capabilities to iommu_table_group such as default
> >>32bit window parameters and others.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>---
> >> arch/powerpc/include/asm/iommu.h | 19 ++++++++
> >> arch/powerpc/platforms/powernv/pci-ioda.c | 75 ++++++++++++++++++++++++++---
> >> arch/powerpc/platforms/powernv/pci-p5ioc2.c | 12 +++--
> >> 3 files changed, 96 insertions(+), 10 deletions(-)
> >>
> >>diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >>index 0f50ee2..7694546 100644
> >>--- a/arch/powerpc/include/asm/iommu.h
> >>+++ b/arch/powerpc/include/asm/iommu.h
> >>@@ -70,6 +70,7 @@ struct iommu_table_ops {
> >> /* get() returns a physical address */
> >> unsigned long (*get)(struct iommu_table *tbl, long index);
> >> void (*flush)(struct iommu_table *tbl);
> >>+ void (*free)(struct iommu_table *tbl);
> >> };
> >>
> >> /* These are used by VIO */
> >>@@ -148,6 +149,17 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
> >> struct iommu_table_group;
> >>
> >> struct iommu_table_group_ops {
> >>+ long (*create_table)(struct iommu_table_group *table_group,
> >>+ int num,
> >>+ __u32 page_shift,
> >>+ __u64 window_size,
> >>+ __u32 levels,
> >>+ struct iommu_table *tbl);
> >>+ long (*set_window)(struct iommu_table_group *table_group,
> >>+ int num,
> >>+ struct iommu_table *tblnew);
> >>+ long (*unset_window)(struct iommu_table_group *table_group,
> >>+ int num);
> >> /*
> >> * Switches ownership from the kernel itself to an external
> >> * user. While onwership is taken, the kernel cannot use IOMMU itself.
> >>@@ -160,6 +172,13 @@ struct iommu_table_group {
> >> #ifdef CONFIG_IOMMU_API
> >> struct iommu_group *group;
> >> #endif
> >>+ /* Some key properties of IOMMU */
> >>+ __u32 tce32_start;
> >>+ __u32 tce32_size;
> >>+ __u64 pgsizes; /* Bitmap of supported page sizes */
> >>+ __u32 max_dynamic_windows_supported;
> >>+ __u32 max_levels;
> >
> >With this information, table_group seems even more like a bad name.
> >"iommu_state" maybe?
>
>
> Please, no. We will never come to agreement then :( And "iommu_state" is too
> general anyway, it won't pass.
>
>
> >> struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
> >> struct iommu_table_group_ops *ops;
> >> };
> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>index cc1d09c..4828837 100644
> >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>@@ -24,6 +24,7 @@
> >> #include <linux/msi.h>
> >> #include <linux/memblock.h>
> >> #include <linux/iommu.h>
> >>+#include <linux/sizes.h>
> >>
> >> #include <asm/sections.h>
> >> #include <asm/io.h>
> >>@@ -1846,6 +1847,7 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
> >> #endif
> >> .clear = pnv_ioda2_tce_free,
> >> .get = pnv_tce_get,
> >>+ .free = pnv_pci_free_table,
> >> };
> >>
> >> static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
> >>@@ -1936,6 +1938,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> >> TCE_PCI_SWINV_PAIR);
> >>
> >> tbl->it_ops = &pnv_ioda1_iommu_ops;
> >>+ pe->table_group.tce32_start = tbl->it_offset << tbl->it_page_shift;
> >>+ pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
> >> iommu_init_table(tbl, phb->hose->node);
> >>
> >> if (pe->flags & PNV_IODA_PE_DEV) {
> >>@@ -1961,7 +1965,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> >> }
> >>
> >> static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> >>- struct iommu_table *tbl)
> >>+ int num, struct iommu_table *tbl)
> >> {
> >> struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> >> table_group);
> >>@@ -1972,9 +1976,10 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> >> const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
> >> const __u64 win_size = tbl->it_size << tbl->it_page_shift;
> >>
> >>- pe_info(pe, "Setting up window at %llx..%llx "
> >>+ pe_info(pe, "Setting up window#%d at %llx..%llx "
> >> "pgsize=0x%x tablesize=0x%lx "
> >> "levels=%d levelsize=%x\n",
> >>+ num,
> >> start_addr, start_addr + win_size - 1,
> >> 1UL << tbl->it_page_shift, tbl->it_size << 3,
> >> tbl->it_indirect_levels + 1, tbl->it_level_size << 3);
> >>@@ -1987,7 +1992,7 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> >> */
> >> rc = opal_pci_map_pe_dma_window(phb->opal_id,
> >> pe->pe_number,
> >>- pe->pe_number << 1,
> >>+ (pe->pe_number << 1) + num,
> >
> >Heh, yes, well, that makes it rather clear that only 2 tables are possible.
> >
> >> tbl->it_indirect_levels + 1,
> >> __pa(tbl->it_base),
> >> size << 3,
> >>@@ -2000,7 +2005,7 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> >> pnv_pci_ioda2_tvt_invalidate(pe);
> >>
> >> /* Store fully initialized *tbl (may be external) in PE */
> >>- pe->table_group.tables[0] = *tbl;
> >>+ pe->table_group.tables[num] = *tbl;
> >
> >I'm a bit confused by this whole set_window thing. Is the idea that
> >with multiple groups in a container you have multiple table_group s
> >each with different copies of the iommu_table structures, but pointing
> >to the same actual TCE entries (it_base)?
>
> Yes.
>
> >It seems to me not terribly
> >obvious when you "create" a table and when you "set" a window.
>
>
> A table is not attached anywhere until its address is programmed (in
> set_window()) to the hardware, it is just a table in memory. For
> POWER8/IODA2, I create a table before I attach any group to a container,
> then I program this table to every attached container, right now it is done
> in container's attach_group(). So later we can hotplug any host PCI device
> to a container - it will program same TCE table to every new group in the
> container.

So you "create" once, then "set" it to one or more table_groups? It
seems odd that "create" is a table_group callback in that case.

> >It's also kind of hard to assess whether the relative lifetimes are
> >correct of the table_group, struct iommu_table and the actual TCE tables.
>
> That is true. Do not know how to improve this though.

So I think the scheme I suggested in reply to an earlier patch helps
this. With that the lifetime of the struct iommu_table represents the
lifetime of the TCE table in the hardware sense as well, which I think
makes things clearer.

>
>
> >Would it make more sense for table_group to become the
> >non-vfio-specific counterpart to the vfio container?
> >i.e. representing one set of DMA mappings, which one or more PEs could
> >be bound to.
>
>
> table_group is embedded into PE and table/table_group callbacks access PE
> when invalidating TCE table. So I will need something to access PE. Or just
> have an array of 2 iommu_table.
>
>
>
> >
> >> return 0;
> >> fail:
> >>@@ -2061,6 +2066,53 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
> >> }
> >>
> >> #ifdef CONFIG_IOMMU_API
> >>+static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
> >>+ int num, __u32 page_shift, __u64 window_size, __u32 levels,
> >>+ struct iommu_table *tbl)
> >>+{
> >>+ struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> >>+ table_group);
> >>+ int nid = pe->phb->hose->node;
> >>+ __u64 bus_offset = num ? pe->tce_bypass_base : 0;
> >>+ long ret;
> >>+
> >>+ ret = pnv_pci_create_table(table_group, nid, bus_offset, page_shift,
> >>+ window_size, levels, tbl);
> >>+ if (ret)
> >>+ return ret;
> >>+
> >>+ tbl->it_ops = &pnv_ioda2_iommu_ops;
> >>+ if (pe->tce_inval_reg)
> >>+ tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
> >>+
> >>+ return 0;
> >>+}
> >>+
> >>+static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
> >>+ int num)
> >>+{
> >>+ struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> >>+ table_group);
> >>+ struct pnv_phb *phb = pe->phb;
> >>+ struct iommu_table *tbl = &pe->table_group.tables[num];
> >>+ long ret;
> >>+
> >>+ pe_info(pe, "Removing DMA window #%d\n", num);
> >>+
> >>+ ret = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
> >>+ (pe->pe_number << 1) + num,
> >>+ 0/* levels */, 0/* table address */,
> >>+ 0/* table size */, 0/* page size */);
> >>+ if (ret)
> >>+ pe_warn(pe, "Unmapping failed, ret = %ld\n", ret);
> >>+ else
> >>+ pnv_pci_ioda2_tvt_invalidate(pe);
> >>+
> >>+ memset(tbl, 0, sizeof(*tbl));
> >>+
> >>+ return ret;
> >>+}
> >>+
> >> static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
> >> {
> >> struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> >>@@ -2080,6 +2132,9 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
> >> }
> >>
> >> static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
> >>+ .create_table = pnv_pci_ioda2_create_table,
> >>+ .set_window = pnv_pci_ioda2_set_window,
> >>+ .unset_window = pnv_pci_ioda2_unset_window,
> >> .take_ownership = pnv_ioda2_take_ownership,
> >> .release_ownership = pnv_ioda2_release_ownership,
> >> };
> >>@@ -2102,8 +2157,16 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> >> pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
> >> phb->ioda.m32_pci_base);
> >>
> >>+ pe->table_group.tce32_start = 0;
> >>+ pe->table_group.tce32_size = phb->ioda.m32_pci_base;
> >>+ pe->table_group.max_dynamic_windows_supported =
> >>+ IOMMU_TABLE_GROUP_MAX_TABLES;
> >>+ pe->table_group.max_levels = POWERNV_IOMMU_MAX_LEVELS;
> >>+ pe->table_group.pgsizes = SZ_4K | SZ_64K | SZ_16M;
> >>+
> >> rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
> >>- 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base,
> >>+ pe->table_group.tce32_start, IOMMU_PAGE_SHIFT_4K,
> >>+ pe->table_group.tce32_size,
> >> POWERNV_IOMMU_DEFAULT_LEVELS, tbl);
> >> if (rc) {
> >> pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
> >>@@ -2119,7 +2182,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> >> pe->table_group.ops = &pnv_pci_ioda2_ops;
> >> #endif
> >>
> >>- rc = pnv_pci_ioda2_set_window(&pe->table_group, tbl);
> >>+ rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> >> if (rc) {
> >> pe_err(pe, "Failed to configure 32-bit TCE table,"
> >> " err %ld\n", rc);
> >>diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
> >>index 7a6fd92..d9de4c7 100644
> >>--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
> >>+++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
> >>@@ -116,6 +116,8 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
> >> u64 phb_id;
> >> int64_t rc;
> >> static int primary = 1;
> >>+ struct iommu_table_group *table_group;
> >>+ struct iommu_table *tbl;
> >>
> >> pr_info(" Initializing p5ioc2 PHB %s\n", np->full_name);
> >>
> >>@@ -181,14 +183,16 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
> >> pnv_pci_init_p5ioc2_msis(phb);
> >>
> >> /* Setup iommu */
> >>- phb->p5ioc2.table_group.tables[0].it_table_group =
> >>- &phb->p5ioc2.table_group;
> >>+ table_group = &phb->p5ioc2.table_group;
> >>+ tbl = &phb->p5ioc2.table_group.tables[0];
> >>+ tbl->it_table_group = table_group;
> >>
> >> /* Setup TCEs */
> >> phb->dma_dev_setup = pnv_pci_p5ioc2_dma_dev_setup;
> >>- pnv_pci_setup_iommu_table(&phb->p5ioc2.table_group.tables[0],
> >>- tce_mem, tce_size, 0,
> >>+ pnv_pci_setup_iommu_table(tbl, tce_mem, tce_size, 0,
> >> IOMMU_PAGE_SHIFT_4K);
> >>+ table_group->tce32_start = tbl->it_offset << tbl->it_page_shift;
> >>+ table_group->tce32_size = tbl->it_size << tbl->it_page_shift;
> >
> >Doesn't pgsizes need to be set here (although it will only include 4K,
> >I'm assuming).
>
>
> No, pgsizes are not returned to the userspace for p5ioc2/ioda1 as they do
> not support DDW. No pgsize => no DDW.

Ah, ok.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-30 07:11:09

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 28/32] powerpc/mmu: Add userspace-to-physical addresses translation cache

On Sat, Apr 25, 2015 at 10:14:52PM +1000, Alexey Kardashevskiy wrote:
> We are adding support for DMA memory pre-registration to be used in
> conjunction with VFIO. The idea is that the userspace which is going to
> run a guest may want to pre-register a user space memory region so
> it all gets pinned once and never goes away. Having this done,
> a hypervisor will not have to pin/unpin pages on every DMA map/unmap
> request. This is going to help with multiple pinning of the same memory
> and in-kernel acceleration of DMA requests.
>
> This adds a list of memory regions to mm_context_t. Each region consists
> of a header and a list of physical addresses. This adds API to:
> 1. register/unregister memory regions;
> 2. do final cleanup (which puts all pre-registered pages);
> 3. do userspace to physical address translation;
> 4. manage a mapped pages counter; when it is zero, it is safe to
> unregister the region.
>
> Multiple registration of the same region is allowed, kref is used to
> track the number of registrations.
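
The bookkeeping described above boils down to a per-mm list of regions, each
with its userspace address, size in pages, a pinned-pages array and a use
count; roughly like this, with types and names illustrative rather than the
exact API:

#include <stdbool.h>

struct mem_region {
	struct mem_region *next;
	unsigned long ua;		/* userspace address of the region */
	unsigned long entries;		/* number of system pages */
	unsigned long *hpas;		/* host physical address per page */
	unsigned int refs;		/* kref-style registration count */
};

/* true if [ua, ua + (pages << page_shift)) overlaps a registered region */
static bool region_overlaps(struct mem_region *list, unsigned long ua,
		unsigned long pages, unsigned int page_shift)
{
	struct mem_region *m;

	for (m = list; m; m = m->next)
		if (m->ua < ua + (pages << page_shift) &&
		    ua < m->ua + (m->entries << page_shift))
			return true;
	return false;
}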

[snip]
> +long mm_iommu_alloc(unsigned long ua, unsigned long entries,
> + struct mm_iommu_table_group_mem_t **pmem)
> +{
> + struct mm_iommu_table_group_mem_t *mem;
> + long i, j;
> + struct page *page = NULL;
> +
> + list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
> + next) {
> + if ((mem->ua == ua) && (mem->entries == entries))
> + return -EBUSY;
> +
> + /* Overlap? */
> + if ((mem->ua < (ua + (entries << PAGE_SHIFT))) &&
> + (ua < (mem->ua + (mem->entries << PAGE_SHIFT))))
> + return -EINVAL;
> + }
> +
> + mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> + if (!mem)
> + return -ENOMEM;
> +
> + mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
> + if (!mem->hpas) {
> + kfree(mem);
> + return -ENOMEM;
> + }

So, I've thought more about this and I'm really confused as to what
this is supposed to be accomplishing.

I see that you need to keep track of what regions are registered, so
you don't double lock or unlock, but I don't see what the point of
> actually storing the translations in hpas is.

I had assumed it was so that you could later on get to the
translations in real mode when you do in-kernel acceleration. But
that doesn't make sense, because the array is vmalloc()ed, so can't be
accessed in real mode anyway.

I can't think of a circumstance in which you can use hpas where you
couldn't just walk the page tables anyway.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-04-30 07:11:16

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2

On Sat, Apr 25, 2015 at 10:14:53PM +1000, Alexey Kardashevskiy wrote:
> The existing implementation accounts the whole DMA window in
> the locked_vm counter. This is going to be worse with multiple
> containers and huge DMA windows. Also, real-time accounting would require
> additional tracking of accounted pages due to the page size difference -
> IOMMU uses 4K pages and system uses 4K or 64K pages.
>
> Another issue is that actual pages pinning/unpinning happens on every
> DMA map/unmap request. This does not affect the performance much now as
> we spend way too much time on switching context between
> guest/userspace/host but this will start to matter when we add in-kernel
> DMA map/unmap acceleration.
>
> This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
> New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
> 2 new ioctls to register/unregister DMA memory -
> VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
> which receive user space address and size of a memory region which
> needs to be pinned/unpinned and counted in locked_vm.
> The new IOMMU splits physical page pinning and TCE table updates into 2 different
> operations. It requires 1) guest pages to be registered first, 2) subsequent
> map/unmap requests to work only with pre-registered memory.
> For the default single window case this means that the entire guest
> (instead of 2GB) needs to be pinned before using VFIO.
> When a huge DMA window is added, no additional pinning will be
> required, otherwise it would be guest RAM + 2GB.
>
> The new memory registration ioctls are not supported by
> VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
> will require memory to be preregistered in order to work.
>
> The accounting is done per the user process.
>
> This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
> can do with v1 or v2 IOMMUs.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> [aw: for the vfio related changes]
> Acked-by: Alex Williamson <[email protected]>
> ---
> Changes:
> v9:
> * s/tce_get_hva_cached/tce_iommu_use_page_v2/
>
> v7:
> * now memory is registered per mm (i.e. process)
> * moved memory registration code to powerpc/mmu
> * merged "vfio: powerpc/spapr: Define v2 IOMMU" into this
> * limited new ioctls to v2 IOMMU
> * updated doc
> * unsupported ioctls return -ENOTTY instead of -EPERM
>
> v6:
> * tce_get_hva_cached() returns hva via a pointer
>
> v4:
> * updated docs
> * s/kzmalloc/vzalloc/
> * in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
> replaced offset with index
> * renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory
> and removed duplicating vfio_iommu_spapr_register_memory
> ---
> Documentation/vfio.txt | 23 ++++
> drivers/vfio/vfio_iommu_spapr_tce.c | 230 +++++++++++++++++++++++++++++++++++-
> include/uapi/linux/vfio.h | 27 +++++
> 3 files changed, 274 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> index 96978ec..94328c8 100644
> --- a/Documentation/vfio.txt
> +++ b/Documentation/vfio.txt
> @@ -427,6 +427,29 @@ The code flow from the example above should be slightly changed:
>
> ....
>
> +5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
> +VFIO_IOMMU_DISABLE and implements 2 new ioctls:
> +VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
> +(which are unsupported in v1 IOMMU).

A summary of the semantic differences between v1 and v2 would be nice.
At this point it's not really clear to me if there's a case for
creating v2, or if this could just be done by adding (optional)
functionality to v1.

> +PPC64 paravirtualized guests generate a lot of map/unmap requests,
> +and the handling of those includes pinning/unpinning pages and updating
> +mm::locked_vm counter to make sure we do not exceed the rlimit.
> +The v2 IOMMU splits accounting and pinning into separate operations:
> +
> +- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
> +receive a user space address and size of the block to be pinned.
> +Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to
> +be called with the exact address and size used for registering
> +the memory block. The userspace is not expected to call these often.
> +The ranges are stored in a linked list in a VFIO container.
> +
> +- VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
> +IOMMU table and do not do pinning; instead these check that the userspace
> +address is from pre-registered range.
> +
> +This separation helps in optimizing DMA for guests.
> +
> -------------------------------------------------------------------------------
>
> [1] VFIO was originally an acronym for "Virtual Function I/O" in its
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 892a584..4cfc2c1 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c

So, from things you said at other points, I thought the idea was that
this registration stuff could also be used on non-Power IOMMUs. Did I
misunderstand, or is that a possibility for the future?

> @@ -21,6 +21,7 @@
> #include <linux/vfio.h>
> #include <asm/iommu.h>
> #include <asm/tce.h>
> +#include <asm/mmu_context.h>
>
> #define DRIVER_VERSION "0.1"
> #define DRIVER_AUTHOR "[email protected]"
> @@ -91,8 +92,58 @@ struct tce_container {
> struct iommu_group *grp;
> bool enabled;
> unsigned long locked_pages;
> + bool v2;
> };
>
> +static long tce_unregister_pages(struct tce_container *container,
> + __u64 vaddr, __u64 size)
> +{
> + long ret;
> + struct mm_iommu_table_group_mem_t *mem;
> +
> + if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK))
> + return -EINVAL;
> +
> + mem = mm_iommu_get(vaddr, size >> PAGE_SHIFT);
> + if (!mem)
> + return -EINVAL;
> +
> + ret = mm_iommu_put(mem); /* undo kref_get() from mm_iommu_get() */
> + if (!ret)
> + ret = mm_iommu_put(mem);
> +
> + return ret;
> +}
> +
> +static long tce_register_pages(struct tce_container *container,
> + __u64 vaddr, __u64 size)
> +{
> + long ret = 0;
> + struct mm_iommu_table_group_mem_t *mem;
> + unsigned long entries = size >> PAGE_SHIFT;
> +
> + if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK) ||
> + ((vaddr + size) < vaddr))
> + return -EINVAL;
> +
> + mem = mm_iommu_get(vaddr, entries);
> + if (!mem) {
> + ret = try_increment_locked_vm(entries);
> + if (ret)
> + return ret;
> +
> + ret = mm_iommu_alloc(vaddr, entries, &mem);
> + if (ret) {
> + decrement_locked_vm(entries);
> + return ret;
> + }
> + }
> +
> + container->enabled = true;
> +
> + return 0;
> +}

So requiring that registered regions get unregistered with exactly the
same addr/length is reasonable. I'm a bit less convinced that
disallowing overlaps is a good idea. What if two libraries in the
same process are trying to use VFIO - they may not know if the regions
they try to register are overlapping.
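
For reference, here is a minimal userspace sketch of the registration flow under
discussion. It only uses the uapi additions quoted later in this patch (struct
vfio_iommu_spapr_register_memory and the two new ioctls); container setup, the
VFIO_SPAPR_TCE_v2_IOMMU selection and all error handling are omitted, and the
helper names are illustrative:

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Pin and account a block once so later MAP_DMA/UNMAP_DMA calls stay cheap */
static int spapr_register_ram(int container_fd, void *addr, unsigned long size)
{
	struct vfio_iommu_spapr_register_memory reg = {
		.argsz = sizeof(reg),
		.flags = 0,				/* no flags are defined yet */
		.vaddr = (__u64)(unsigned long)addr,	/* must be page aligned */
		.size = size,				/* must be page aligned */
	};

	return ioctl(container_fd, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg);
}

/* Must use exactly the vaddr/size that were passed at registration time */
static int spapr_unregister_ram(int container_fd, void *addr, unsigned long size)
{
	struct vfio_iommu_spapr_register_memory reg = {
		.argsz = sizeof(reg),
		.vaddr = (__u64)(unsigned long)addr,
		.size = size,
	};

	return ioctl(container_fd, VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY, &reg);
}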

> static bool tce_page_is_contained(struct page *page, unsigned page_shift)
> {
> /*
> @@ -205,7 +256,7 @@ static void *tce_iommu_open(unsigned long arg)
> {
> struct tce_container *container;
>
> - if (arg != VFIO_SPAPR_TCE_IOMMU) {
> + if ((arg != VFIO_SPAPR_TCE_IOMMU) && (arg != VFIO_SPAPR_TCE_v2_IOMMU)) {
> pr_err("tce_vfio: Wrong IOMMU type\n");
> return ERR_PTR(-EINVAL);
> }
> @@ -215,6 +266,7 @@ static void *tce_iommu_open(unsigned long arg)
> return ERR_PTR(-ENOMEM);
>
> mutex_init(&container->lock);
> + container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;
>
> return container;
> }
> @@ -243,6 +295,47 @@ static void tce_iommu_unuse_page(struct tce_container *container,
> put_page(page);
> }
>
> +static int tce_iommu_use_page_v2(unsigned long tce, unsigned long size,
> + unsigned long *phpa, struct mm_iommu_table_group_mem_t **pmem)
> +{
> + long ret = 0;
> + struct mm_iommu_table_group_mem_t *mem;
> +
> + mem = mm_iommu_lookup(tce, size);
> + if (!mem)
> + return -EINVAL;
> +
> + ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
> + if (ret)
> + return -EINVAL;
> +
> + *pmem = mem;
> +
> + return 0;
> +}
> +
> +static void tce_iommu_unuse_page_v2(struct iommu_table *tbl,
> + unsigned long entry)
> +{
> + struct mm_iommu_table_group_mem_t *mem = NULL;
> + int ret;
> + unsigned long hpa = 0;
> + unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
> +
> + if (!pua || !current || !current->mm)
> + return;
> +
> + ret = tce_iommu_use_page_v2(*pua, IOMMU_PAGE_SIZE(tbl),
> + &hpa, &mem);
> + if (ret)
> + pr_debug("%s: tce %lx at #%lx was not cached, ret=%d\n",
> + __func__, *pua, entry, ret);
> + if (mem)
> + mm_iommu_mapped_update(mem, false);
> +
> + *pua = 0;
> +}
> +
> static int tce_iommu_clear(struct tce_container *container,
> struct iommu_table *tbl,
> unsigned long entry, unsigned long pages)
> @@ -261,6 +354,11 @@ static int tce_iommu_clear(struct tce_container *container,
> if (direction == DMA_NONE)
> continue;
>
> + if (container->v2) {
> + tce_iommu_unuse_page_v2(tbl, entry);
> + continue;
> + }
> +
> tce_iommu_unuse_page(container, oldtce);
> }
>
> @@ -327,6 +425,62 @@ static long tce_iommu_build(struct tce_container *container,
> return ret;
> }
>
> +static long tce_iommu_build_v2(struct tce_container *container,
> + struct iommu_table *tbl,
> + unsigned long entry, unsigned long tce, unsigned long pages,
> + enum dma_data_direction direction)
> +{
> + long i, ret = 0;
> + struct page *page;
> + unsigned long hpa;
> + enum dma_data_direction dirtmp;
> +
> + for (i = 0; i < pages; ++i) {
> + struct mm_iommu_table_group_mem_t *mem = NULL;
> + unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl,
> + entry + i);
> +
> + ret = tce_iommu_use_page_v2(tce, IOMMU_PAGE_SIZE(tbl),
> + &hpa, &mem);
> + if (ret)
> + break;
> +
> + page = pfn_to_page(hpa >> PAGE_SHIFT);
> + if (!tce_page_is_contained(page, tbl->it_page_shift)) {
> + ret = -EPERM;
> + break;
> + }
> +
> + /* Preserve offset within IOMMU page */
> + hpa |= tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK;
> + dirtmp = direction;
> +
> + ret = iommu_tce_xchg(tbl, entry + i, &hpa, &dirtmp);
> + if (ret) {
> + /* dirtmp cannot be DMA_NONE here */
> + tce_iommu_unuse_page_v2(tbl, entry + i);
> + pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%ld\n",
> + __func__, entry << tbl->it_page_shift,
> + tce, ret);
> + break;
> + }
> +
> + mm_iommu_mapped_update(mem, true);
> +
> + if (dirtmp != DMA_NONE)
> + tce_iommu_unuse_page_v2(tbl, entry + i);
> +
> + *pua = tce;
> +
> + tce += IOMMU_PAGE_SIZE(tbl);
> + }
> +
> + if (ret)
> + tce_iommu_clear(container, tbl, entry, i);
> +
> + return ret;
> +}
> +
> static long tce_iommu_ioctl(void *iommu_data,
> unsigned int cmd, unsigned long arg)
> {
> @@ -338,6 +492,7 @@ static long tce_iommu_ioctl(void *iommu_data,
> case VFIO_CHECK_EXTENSION:
> switch (arg) {
> case VFIO_SPAPR_TCE_IOMMU:
> + case VFIO_SPAPR_TCE_v2_IOMMU:
> ret = 1;
> break;
> default:
> @@ -425,11 +580,18 @@ static long tce_iommu_ioctl(void *iommu_data,
> if (ret)
> return ret;
>
> - ret = tce_iommu_build(container, tbl,
> - param.iova >> tbl->it_page_shift,
> - param.vaddr,
> - param.size >> tbl->it_page_shift,
> - direction);
> + if (container->v2)
> + ret = tce_iommu_build_v2(container, tbl,
> + param.iova >> tbl->it_page_shift,
> + param.vaddr,
> + param.size >> tbl->it_page_shift,
> + direction);
> + else
> + ret = tce_iommu_build(container, tbl,
> + param.iova >> tbl->it_page_shift,
> + param.vaddr,
> + param.size >> tbl->it_page_shift,
> + direction);
>
> iommu_flush_tce(tbl);
>
> @@ -474,7 +636,60 @@ static long tce_iommu_ioctl(void *iommu_data,
>
> return ret;
> }
> + case VFIO_IOMMU_SPAPR_REGISTER_MEMORY: {
> + struct vfio_iommu_spapr_register_memory param;
> +
> + if (!container->v2)
> + break;
> +
> + minsz = offsetofend(struct vfio_iommu_spapr_register_memory,
> + size);
> +
> + if (copy_from_user(&param, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (param.argsz < minsz)
> + return -EINVAL;
> +
> + /* No flag is supported now */
> + if (param.flags)
> + return -EINVAL;
> +
> + mutex_lock(&container->lock);
> + ret = tce_register_pages(container, param.vaddr, param.size);
> + mutex_unlock(&container->lock);

AFAICT, this is the only call to tce_register_pages(), so why not put
the mutex into the function.

> +
> + return ret;
> + }
> + case VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY: {
> + struct vfio_iommu_spapr_register_memory param;
> +
> + if (!container->v2)
> + break;
> +
> + minsz = offsetofend(struct vfio_iommu_spapr_register_memory,
> + size);
> +
> + if (copy_from_user(&param, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (param.argsz < minsz)
> + return -EINVAL;
> +
> + /* No flag is supported now */
> + if (param.flags)
> + return -EINVAL;
> +
> + mutex_lock(&container->lock);
> + tce_unregister_pages(container, param.vaddr, param.size);
> + mutex_unlock(&container->lock);
> +
> + return 0;
> + }
> case VFIO_IOMMU_ENABLE:
> + if (container->v2)
> + break;
> +
> mutex_lock(&container->lock);
> ret = tce_iommu_enable(container);
> mutex_unlock(&container->lock);
> @@ -482,6 +697,9 @@ static long tce_iommu_ioctl(void *iommu_data,
>
>
> case VFIO_IOMMU_DISABLE:
> + if (container->v2)
> + break;
> +
> mutex_lock(&container->lock);
> tce_iommu_disable(container);
> mutex_unlock(&container->lock);
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index b57b750..8fdcfb9 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -36,6 +36,8 @@
> /* Two-stage IOMMU */
> #define VFIO_TYPE1_NESTING_IOMMU 6 /* Implies v2 */
>
> +#define VFIO_SPAPR_TCE_v2_IOMMU 7
> +
> /*
> * The IOCTL interface is designed for extensibility by embedding the
> * structure length (argsz) and flags into structures passed between
> @@ -495,6 +497,31 @@ struct vfio_eeh_pe_op {
>
> #define VFIO_EEH_PE_OP _IO(VFIO_TYPE, VFIO_BASE + 21)
>
> +/**
> + * VFIO_IOMMU_SPAPR_REGISTER_MEMORY - _IOW(VFIO_TYPE, VFIO_BASE + 17, struct vfio_iommu_spapr_register_memory)
> + *
> + * Registers user space memory where DMA is allowed. It pins
> + * user pages and does the locked memory accounting so
> + * subsequent VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA calls
> + * get faster.
> + */
> +struct vfio_iommu_spapr_register_memory {
> + __u32 argsz;
> + __u32 flags;
> + __u64 vaddr; /* Process virtual address */
> + __u64 size; /* Size of mapping (bytes) */
> +};
> +#define VFIO_IOMMU_SPAPR_REGISTER_MEMORY _IO(VFIO_TYPE, VFIO_BASE + 17)
> +
> +/**
> + * VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY - _IOW(VFIO_TYPE, VFIO_BASE + 18, struct vfio_iommu_spapr_register_memory)
> + *
> + * Unregisters user space memory registered with
> + * VFIO_IOMMU_SPAPR_REGISTER_MEMORY.
> + * Uses vfio_iommu_spapr_register_memory for parameters.
> + */
> +#define VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY _IO(VFIO_TYPE, VFIO_BASE + 18)
> +
> /* ***************************************************************** */
>
> #endif /* _UAPIVFIO_H */

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


2015-04-30 07:11:12

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 30/32] vfio: powerpc/spapr: Use 32bit DMA window properties from table_group

On Sat, Apr 25, 2015 at 10:14:54PM +1000, Alexey Kardashevskiy wrote:
> A table group might not have a table but it always has the default 32bit
> window parameters so use these.
>
> No change in behavior is expected.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>

It would be easier to review if you took this and the parts of the
earlier patch which add the tce32_* fields to table_group and rolled
them up into a patch of their own.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


2015-04-30 07:45:26

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
> At the moment only one group per container is supported.
> POWER8 CPUs have a more flexible design and allow having 2 TCE tables per
> IOMMU group so we can relax this limitation and support multiple groups
> per container.

It's not obvious why allowing multiple TCE tables per PE has any
bearing on allowing multiple groups per container.

> This adds TCE table descriptors to a container and uses iommu_table_group_ops
> to create/set DMA windows on IOMMU groups so the same TCE tables will be
> shared between several IOMMU groups.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> [aw: for the vfio related changes]
> Acked-by: Alex Williamson <[email protected]>
> ---
> Changes:
> v7:
> * updated doc
> ---
> Documentation/vfio.txt | 8 +-
> drivers/vfio/vfio_iommu_spapr_tce.c | 268 ++++++++++++++++++++++++++----------
> 2 files changed, 199 insertions(+), 77 deletions(-)
>
> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> index 94328c8..7dcf2b5 100644
> --- a/Documentation/vfio.txt
> +++ b/Documentation/vfio.txt
> @@ -289,10 +289,12 @@ PPC64 sPAPR implementation note
>
> This implementation has some specifics:
>
> -1) Only one IOMMU group per container is supported as an IOMMU group
> -represents the minimal entity which isolation can be guaranteed for and
> -groups are allocated statically, one per a Partitionable Endpoint (PE)
> +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
> +container is supported as an IOMMU table is allocated at boot time,
> +one table per IOMMU group, which is a Partitionable Endpoint (PE)
> (PE is often a PCI domain but not always).

I thought the more fundamental problem was that different PEs tended
to use disjoint bus address ranges, so even by duplicating put_tce
across PEs you couldn't have a common address space.

> +Newer systems (POWER8 with IODA2) have an improved hardware design which allows
> +this limitation to be removed and multiple IOMMU groups per VFIO container.
>
> 2) The hardware supports so called DMA windows - the PCI address range
> within which DMA transfer is allowed, any attempt to access address space
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index a7d6729..970e3a2 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -82,6 +82,11 @@ static void decrement_locked_vm(long npages)
> * into DMA'ble space using the IOMMU
> */
>
> +struct tce_iommu_group {
> + struct list_head next;
> + struct iommu_group *grp;
> +};
> +
> /*
> * The container descriptor supports only a single group per container.
> * Required by the API as the container is not supplied with the IOMMU group
> @@ -89,10 +94,11 @@ static void decrement_locked_vm(long npages)
> */
> struct tce_container {
> struct mutex lock;
> - struct iommu_group *grp;
> bool enabled;
> unsigned long locked_pages;
> bool v2;
> + struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];

Hrm, so here we have more copies of the full iommu_table structures,
which again muddies the lifetime. The table_group pointer is
presumably meaningless in these copies, which seems dangerously
confusing.

> + struct list_head group_list;
> };
>
> static long tce_unregister_pages(struct tce_container *container,
> @@ -154,20 +160,20 @@ static bool tce_page_is_contained(struct page *page, unsigned page_shift)
> return (PAGE_SHIFT + compound_order(compound_head(page))) >= page_shift;
> }
>
> +static inline bool tce_groups_attached(struct tce_container *container)
> +{
> + return !list_empty(&container->group_list);
> +}
> +
> static struct iommu_table *spapr_tce_find_table(
> struct tce_container *container,
> phys_addr_t ioba)
> {
> long i;
> struct iommu_table *ret = NULL;
> - struct iommu_table_group *table_group;
> -
> - table_group = iommu_group_get_iommudata(container->grp);
> - if (!table_group)
> - return NULL;
>
> for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> - struct iommu_table *tbl = &table_group->tables[i];
> + struct iommu_table *tbl = &container->tables[i];
> unsigned long entry = ioba >> tbl->it_page_shift;
> unsigned long start = tbl->it_offset;
> unsigned long end = start + tbl->it_size;
> @@ -186,9 +192,7 @@ static int tce_iommu_enable(struct tce_container *container)
> int ret = 0;
> unsigned long locked;
> struct iommu_table_group *table_group;
> -
> - if (!container->grp)
> - return -ENXIO;
> + struct tce_iommu_group *tcegrp;
>
> if (!current->mm)
> return -ESRCH; /* process exited */
> @@ -225,7 +229,12 @@ static int tce_iommu_enable(struct tce_container *container)
> * as there is no way to know how much we should increment
> * the locked_vm counter.
> */
> - table_group = iommu_group_get_iommudata(container->grp);
> + if (!tce_groups_attached(container))
> + return -ENODEV;
> +
> + tcegrp = list_first_entry(&container->group_list,
> + struct tce_iommu_group, next);
> + table_group = iommu_group_get_iommudata(tcegrp->grp);
> if (!table_group)
> return -ENODEV;
>
> @@ -257,6 +266,48 @@ static void tce_iommu_disable(struct tce_container *container)
> decrement_locked_vm(container->locked_pages);
> }
>
> +static long tce_iommu_create_table(struct iommu_table_group *table_group,
> + int num,
> + __u32 page_shift,
> + __u64 window_size,
> + __u32 levels,
> + struct iommu_table *tbl)

With multiple groups (and therefore PEs) per container, this seems
wrong. There's only one table_group per PE, so what's special about
the PE whose table group is passed in here?

> +{
> + long ret, table_size;
> +
> + table_size = table_group->ops->get_table_size(page_shift, window_size,
> + levels);
> + if (!table_size)
> + return -EINVAL;
> +
> + ret = try_increment_locked_vm(table_size >> PAGE_SHIFT);
> + if (ret)
> + return ret;
> +
> + ret = table_group->ops->create_table(table_group, num,
> + page_shift, window_size, levels, tbl);
> +
> + WARN_ON(!ret && !tbl->it_ops->free);
> + WARN_ON(!ret && (tbl->it_allocated_size != table_size));
> +
> + if (ret)
> + decrement_locked_vm(table_size >> PAGE_SHIFT);
> +
> + return ret;
> +}
> +
> +static void tce_iommu_free_table(struct iommu_table *tbl)
> +{
> + unsigned long pages = tbl->it_allocated_size >> PAGE_SHIFT;
> +
> + if (!tbl->it_size)
> + return;
> +
> + tbl->it_ops->free(tbl);

So, this is exactly the case where the lifetimes are badly confusing.
How can you be confident here that another copy of the iommu_table
struct isn't referencing the same TCE tables?

> + decrement_locked_vm(pages);
> + memset(tbl, 0, sizeof(*tbl));
> +}
> +
> static void *tce_iommu_open(unsigned long arg)
> {
> struct tce_container *container;
> @@ -271,19 +322,41 @@ static void *tce_iommu_open(unsigned long arg)
> return ERR_PTR(-ENOMEM);
>
> mutex_init(&container->lock);
> + INIT_LIST_HEAD_RCU(&container->group_list);

I see no other mentions of rcu related to this list, which doesn't
seem right.

> container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;
>
> return container;
> }
>
> +static int tce_iommu_clear(struct tce_container *container,
> + struct iommu_table *tbl,
> + unsigned long entry, unsigned long pages);
> +
> static void tce_iommu_release(void *iommu_data)
> {
> struct tce_container *container = iommu_data;
> + struct iommu_table_group *table_group;
> + struct tce_iommu_group *tcegrp;
> + long i;
>
> - WARN_ON(container->grp);
> + while (tce_groups_attached(container)) {
> + tcegrp = list_first_entry(&container->group_list,
> + struct tce_iommu_group, next);
> + table_group = iommu_group_get_iommudata(tcegrp->grp);
> + tce_iommu_detach_group(iommu_data, tcegrp->grp);
> + }
>
> - if (container->grp)
> - tce_iommu_detach_group(iommu_data, container->grp);
> + /*
> + * If VFIO created a table, it was not disposed
> + * by tce_iommu_detach_group() so do it now.
> + */
> + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> + struct iommu_table *tbl = &container->tables[i];
> +
> + tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
> + tce_iommu_free_table(tbl);
> + }
>
> tce_iommu_disable(container);
> mutex_destroy(&container->lock);
> @@ -509,12 +582,15 @@ static long tce_iommu_ioctl(void *iommu_data,
>
> case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
> struct vfio_iommu_spapr_tce_info info;
> + struct tce_iommu_group *tcegrp;
> struct iommu_table_group *table_group;
>
> - if (WARN_ON(!container->grp))
> + if (!tce_groups_attached(container))
> return -ENXIO;
>
> - table_group = iommu_group_get_iommudata(container->grp);
> + tcegrp = list_first_entry(&container->group_list,
> + struct tce_iommu_group, next);
> + table_group = iommu_group_get_iommudata(tcegrp->grp);
>
> if (!table_group)
> return -ENXIO;
> @@ -707,12 +783,20 @@ static long tce_iommu_ioctl(void *iommu_data,
> tce_iommu_disable(container);
> mutex_unlock(&container->lock);
> return 0;
> - case VFIO_EEH_PE_OP:
> - if (!container->grp)
> - return -ENODEV;
>
> - return vfio_spapr_iommu_eeh_ioctl(container->grp,
> - cmd, arg);
> + case VFIO_EEH_PE_OP: {
> + struct tce_iommu_group *tcegrp;
> +
> + ret = 0;
> + list_for_each_entry(tcegrp, &container->group_list, next) {
> + ret = vfio_spapr_iommu_eeh_ioctl(tcegrp->grp,
> + cmd, arg);
> + if (ret)
> + return ret;

Hrm. It occurs to me that EEH may need a way of referencing
individual groups. Even if multiple PEs are referencing the same TCE
tables, presumably EEH will isolate them individually.

> + }
> + return ret;
> + }
> +
> }
>
> return -ENOTTY;
> @@ -724,11 +808,14 @@ static void tce_iommu_release_ownership(struct tce_container *container,
> int i;
>
> for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> - struct iommu_table *tbl = &table_group->tables[i];
> + struct iommu_table *tbl = &container->tables[i];
>
> tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
> if (tbl->it_map)
> iommu_release_ownership(tbl);
> +
> + /* Reset the container's copy of the table descriptor */
> + memset(tbl, 0, sizeof(*tbl));
> }
> }
>
> @@ -758,38 +845,56 @@ static int tce_iommu_take_ownership(struct iommu_table_group *table_group)
> static int tce_iommu_attach_group(void *iommu_data,
> struct iommu_group *iommu_group)
> {
> - int ret;
> + int ret, i;
> struct tce_container *container = iommu_data;
> struct iommu_table_group *table_group;
> + struct tce_iommu_group *tcegrp = NULL;
> + bool first_group = !tce_groups_attached(container);
>
> mutex_lock(&container->lock);
>
> /* pr_debug("tce_vfio: Attaching group #%u to iommu %p\n",
> iommu_group_id(iommu_group), iommu_group); */
> - if (container->grp) {
> - pr_warn("tce_vfio: Only one group per IOMMU container is allowed, existing id=%d, attaching id=%d\n",
> - iommu_group_id(container->grp),
> - iommu_group_id(iommu_group));
> - ret = -EBUSY;
> - goto unlock_exit;
> - }
> -
> - if (container->enabled) {
> - pr_err("tce_vfio: attaching group #%u to enabled container\n",
> - iommu_group_id(iommu_group));
> - ret = -EBUSY;
> - goto unlock_exit;
> - }
> -
> table_group = iommu_group_get_iommudata(iommu_group);
> - if (!table_group) {
> - ret = -ENXIO;
> +
> + if (!first_group && (!table_group->ops ||
> + !table_group->ops->take_ownership ||
> + !table_group->ops->release_ownership)) {
> + ret = -EBUSY;
> + goto unlock_exit;
> + }
> +
> + /* Check if new group has the same iommu_ops (i.e. compatible) */
> + list_for_each_entry(tcegrp, &container->group_list, next) {
> + struct iommu_table_group *table_group_tmp;
> +
> + if (tcegrp->grp == iommu_group) {
> + pr_warn("tce_vfio: Group %d is already attached\n",
> + iommu_group_id(iommu_group));
> + ret = -EBUSY;
> + goto unlock_exit;
> + }
> + table_group_tmp = iommu_group_get_iommudata(tcegrp->grp);
> + if (table_group_tmp->ops != table_group->ops) {
> + pr_warn("tce_vfio: Group %d is incompatible with group %d\n",
> + iommu_group_id(iommu_group),
> + iommu_group_id(tcegrp->grp));
> + ret = -EPERM;
> + goto unlock_exit;
> + }
> + }
> +
> + tcegrp = kzalloc(sizeof(*tcegrp), GFP_KERNEL);
> + if (!tcegrp) {
> + ret = -ENOMEM;
> goto unlock_exit;
> }
>
> if (!table_group->ops || !table_group->ops->take_ownership ||
> !table_group->ops->release_ownership) {
> ret = tce_iommu_take_ownership(table_group);
> + if (!ret)
> + container->tables[0] = table_group->tables[0];
> } else if (!table_group->ops->create_table ||
> !table_group->ops->set_window) {
> WARN_ON_ONCE(1);
> @@ -801,23 +906,46 @@ static int tce_iommu_attach_group(void *iommu_data,
> * the pages that has been explicitly mapped into the iommu
> */
> table_group->ops->take_ownership(table_group);
> - ret = table_group->ops->create_table(table_group,
> - 0, /* window number */
> - IOMMU_PAGE_SHIFT_4K,
> - table_group->tce32_size,
> - 1, /* default levels */
> - &table_group->tables[0]);
> - if (!ret)
> - ret = table_group->ops->set_window(table_group, 0,
> - &table_group->tables[0]);
> + /*
> + * If it the first group attached, check if there is
> + * a default DMA window and create one if none as
> + * the userspace expects it to exist.
> + */
> + if (first_group && !container->tables[0].it_size) {
> + ret = tce_iommu_create_table(table_group,
> + 0, /* window number */
> + IOMMU_PAGE_SHIFT_4K,
> + table_group->tce32_size,
> + 1, /* default levels */
> + &container->tables[0]);
> + if (ret)
> + goto unlock_exit;
> + }
> +
> + /* Set all windows to the new group */
> + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> + struct iommu_table *tbl = &container->tables[i];
> +
> + if (!tbl->it_size)
> + continue;
> +
> + /* Set the default window to a new group */
> + ret = table_group->ops->set_window(table_group, i, tbl);
> + if (ret)
> + break;
> + }
> }
>
> if (ret)
> goto unlock_exit;
>
> - container->grp = iommu_group;
> + tcegrp->grp = iommu_group;
> + list_add(&tcegrp->next, &container->group_list);
>
> unlock_exit:
> + if (ret && tcegrp)
> + kfree(tcegrp);
> +
> mutex_unlock(&container->lock);
>
> return ret;
> @@ -828,25 +956,27 @@ static void tce_iommu_detach_group(void *iommu_data,
> {
> struct tce_container *container = iommu_data;
> struct iommu_table_group *table_group;
> + struct tce_iommu_group *tcegrp;
> long i;
> + bool found = false;
>
> mutex_lock(&container->lock);
> - if (iommu_group != container->grp) {
> - pr_warn("tce_vfio: detaching group #%u, expected group is #%u\n",
> - iommu_group_id(iommu_group),
> - iommu_group_id(container->grp));
> +
> + list_for_each_entry(tcegrp, &container->group_list, next) {
> + if (tcegrp->grp == iommu_group) {
> + found = true;
> + break;
> + }
> + }
> +
> + if (!found) {
> + pr_warn("tce_vfio: detaching unattached group #%u\n",
> + iommu_group_id(iommu_group));
> goto unlock_exit;
> }
>
> - if (container->enabled) {
> - pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
> - iommu_group_id(container->grp));
> - tce_iommu_disable(container);
> - }
> -
> - /* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
> - iommu_group_id(iommu_group), iommu_group); */
> - container->grp = NULL;
> + list_del(&tcegrp->next);
> + kfree(tcegrp);
>
> table_group = iommu_group_get_iommudata(iommu_group);
> BUG_ON(!table_group);
> @@ -857,18 +987,8 @@ static void tce_iommu_detach_group(void *iommu_data,
> else if (!table_group->ops->unset_window)
> WARN_ON_ONCE(1);
> else {
> - for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> - struct iommu_table tbl = table_group->tables[i];
> -
> - if (!tbl.it_size)
> - continue;
> -
> + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
> table_group->ops->unset_window(table_group, i);
> - tce_iommu_clear(container, &tbl,
> - tbl.it_offset, tbl.it_size);
> - if (tbl.it_ops->free)
> - tbl.it_ops->free(&tbl);
> - }
>
> table_group->ops->release_ownership(table_group);
> }

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


2015-04-30 08:25:37

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH kernel v9 28/32] powerpc/mmu: Add userspace-to-physical addresses translation cache

On Thu, Apr 30, 2015 at 04:34:55PM +1000, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:52PM +1000, Alexey Kardashevskiy wrote:
> > We are adding support for DMA memory pre-registration to be used in
> > conjunction with VFIO. The idea is that the userspace which is going to
> > run a guest may want to pre-register a user space memory region so
> > it all gets pinned once and never goes away. Having this done,
> > a hypervisor will not have to pin/unpin pages on every DMA map/unmap
> > request. This is going to help with multiple pinning of the same memory
> > and in-kernel acceleration of DMA requests.
> >
> > This adds a list of memory regions to mm_context_t. Each region consists
> > of a header and a list of physical addresses. This adds API to:
> > 1. register/unregister memory regions;
> > 2. do final cleanup (which puts all pre-registered pages);
> > 3. do userspace to physical address translation;
> > 4. manage a mapped pages counter; when it is zero, it is safe to
> > unregister the region.
> >
> > Multiple registration of the same region is allowed, kref is used to
> > track the number of registrations.
>
> [snip]
> > +long mm_iommu_alloc(unsigned long ua, unsigned long entries,
> > + struct mm_iommu_table_group_mem_t **pmem)
> > +{
> > + struct mm_iommu_table_group_mem_t *mem;
> > + long i, j;
> > + struct page *page = NULL;
> > +
> > + list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
> > + next) {
> > + if ((mem->ua == ua) && (mem->entries == entries))
> > + return -EBUSY;
> > +
> > + /* Overlap? */
> > + if ((mem->ua < (ua + (entries << PAGE_SHIFT))) &&
> > + (ua < (mem->ua + (mem->entries << PAGE_SHIFT))))
> > + return -EINVAL;
> > + }
> > +
> > + mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > + if (!mem)
> > + return -ENOMEM;
> > +
> > + mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
> > + if (!mem->hpas) {
> > + kfree(mem);
> > + return -ENOMEM;
> > + }
>
> So, I've thought more about this and I'm really confused as to what
> this is supposed to be accomplishing.
>
> I see that you need to keep track of what regions are registered, so
> you don't double lock or unlock, but I don't see what the point of
> actually storing the translations in hpas is.
>
> I had assumed it was so that you could later on get to the
> translations in real mode when you do in-kernel acceleration. But
> that doesn't make sense, because the array is vmalloc()ed, so can't be
> accessed in real mode anyway.

We can access vmalloc'd arrays in real mode using real_vmalloc_addr().

> I can't think of a circumstance in which you can use hpas where you
> couldn't just walk the page tables anyway.

The problem with walking the page tables is that there is no guarantee
that the page you find that way is the page that was returned by the
gup_fast() we did earlier. Storing the hpas means that we know for
sure that the page we're doing DMA to is one that we have an elevated
page count on.

Also, there are various points where a Linux PTE is made temporarily
invalid for a short time. If we happened to do a H_PUT_TCE on one cpu
while another cpu was doing that, we'd get a spurious failure returned
by the H_PUT_TCE.

Paul.
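
For illustration, a rough sketch (not code from this series) of how the
vzalloc()ed hpas[] array could be consulted from real mode; real_vmalloc_addr()
is the existing KVM HV helper Paul refers to, and the ua/entries/hpas fields
follow the mm_iommu_table_group_mem_t layout from patch 28:

static long rm_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
		unsigned long ua, unsigned long *hpa)
{
	unsigned long entry = (ua - mem->ua) >> PAGE_SHIFT;
	unsigned long *va;

	if (entry >= mem->entries)
		return -EFAULT;

	/* hpas[] lives in vmalloc space, translate before touching it */
	va = real_vmalloc_addr(&mem->hpas[entry]);
	if (!va)
		return -EFAULT;

	/* The stored page is the one gup_fast() took a reference on */
	*hpa = *va | (ua & ~PAGE_MASK);
	return 0;
}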

2015-04-30 09:33:38

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

On 04/30/2015 05:22 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
>> At the moment only one group per container is supported.
>> POWER8 CPUs have a more flexible design and allow having 2 TCE tables per
>> IOMMU group so we can relax this limitation and support multiple groups
>> per container.
>
> It's not obvious why allowing multiple TCE tables per PE has any
> bearing on allowing multiple groups per container.


This patchset is a global TCE tables rework (patches 1..30, roughly) with 2
outcomes:
1. reusing the same IOMMU table for multiple groups - patch 31;
2. allowing dynamic create/remove of IOMMU tables - patch 32.

I can remove this one from the patchset and post it separately later but
since 1..30 aim to support both 1) and 2), I think I had better keep them all
together (it might explain some of the changes I do in 1..30).



>> This adds TCE table descriptors to a container and uses iommu_table_group_ops
>> to create/set DMA windows on IOMMU groups so the same TCE tables will be
>> shared between several IOMMU groups.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> [aw: for the vfio related changes]
>> Acked-by: Alex Williamson <[email protected]>
>> ---
>> Changes:
>> v7:
>> * updated doc
>> ---
>> Documentation/vfio.txt | 8 +-
>> drivers/vfio/vfio_iommu_spapr_tce.c | 268 ++++++++++++++++++++++++++----------
>> 2 files changed, 199 insertions(+), 77 deletions(-)
>>
>> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
>> index 94328c8..7dcf2b5 100644
>> --- a/Documentation/vfio.txt
>> +++ b/Documentation/vfio.txt
>> @@ -289,10 +289,12 @@ PPC64 sPAPR implementation note
>>
>> This implementation has some specifics:
>>
>> -1) Only one IOMMU group per container is supported as an IOMMU group
>> -represents the minimal entity which isolation can be guaranteed for and
>> -groups are allocated statically, one per a Partitionable Endpoint (PE)
>> +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
>> +container is supported as an IOMMU table is allocated at boot time,
>> +one table per IOMMU group, which is a Partitionable Endpoint (PE)
>> (PE is often a PCI domain but not always).
>
> I thought the more fundamental problem was that different PEs tended
> to use disjoint bus address ranges, so even by duplicating put_tce
> across PEs you couldn't have a common address space.


Sorry, I am not following you here.

By duplicating put_tce, I can have multiple IOMMU groups on the same
virtual PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple
groups per container" does this, and the address ranges will be the same.

What I cannot do on p5ioc2 is program the same table into multiple
physical PHBs (or I could, but it is very different from IODA2, pretty
ugly, and might not always be possible because I would have to allocate
these pages from some common pool and face problems like fragmentation).



>> +Newer systems (POWER8 with IODA2) have an improved hardware design which allows
>> +this limitation to be removed and multiple IOMMU groups per VFIO container.
>>
>> 2) The hardware supports so called DMA windows - the PCI address range
>> within which DMA transfer is allowed, any attempt to access address space
>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>> index a7d6729..970e3a2 100644
>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>> @@ -82,6 +82,11 @@ static void decrement_locked_vm(long npages)
>> * into DMA'ble space using the IOMMU
>> */
>>
>> +struct tce_iommu_group {
>> + struct list_head next;
>> + struct iommu_group *grp;
>> +};
>> +
>> /*
>> * The container descriptor supports only a single group per container.
>> * Required by the API as the container is not supplied with the IOMMU group
>> @@ -89,10 +94,11 @@ static void decrement_locked_vm(long npages)
>> */
>> struct tce_container {
>> struct mutex lock;
>> - struct iommu_group *grp;
>> bool enabled;
>> unsigned long locked_pages;
>> bool v2;
>> + struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
>
> Hrm, so here we have more copies of the full iommu_table structures,
> which again muddies the lifetime. The table_group pointer is
> presumably meaningless in these copies, which seems dangerously
> confusing.


Ouch. This is bad. No, table_group is not meaningless here as it is used to
get to the PE number to invalidate the TCE cache. I just realized that although
I need to update just a single table, I still have to invalidate the TCE cache
for every attached group/PE, so I need a list of iommu_table_group's here,
not a single pointer...
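
Purely as an illustration of that direction (none of this is in the series; the
per-group invalidate hook and the helper name are assumptions), the container
could keep walking its group_list after a single table update:

static void tce_invalidate_all_groups(struct tce_container *container,
		struct iommu_table *tbl)
{
	struct tce_iommu_group *tcegrp;

	/* One table update, but every attached PE caches TCEs separately */
	list_for_each_entry(tcegrp, &container->group_list, next) {
		struct iommu_table_group *table_group =
				iommu_group_get_iommudata(tcegrp->grp);

		if (table_group && table_group->ops &&
				table_group->ops->invalidate)
			table_group->ops->invalidate(table_group, tbl);
	}
}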



>> + struct list_head group_list;
>> };
>>
>> static long tce_unregister_pages(struct tce_container *container,
>> @@ -154,20 +160,20 @@ static bool tce_page_is_contained(struct page *page, unsigned page_shift)
>> return (PAGE_SHIFT + compound_order(compound_head(page))) >= page_shift;
>> }
>>
>> +static inline bool tce_groups_attached(struct tce_container *container)
>> +{
>> + return !list_empty(&container->group_list);
>> +}
>> +
>> static struct iommu_table *spapr_tce_find_table(
>> struct tce_container *container,
>> phys_addr_t ioba)
>> {
>> long i;
>> struct iommu_table *ret = NULL;
>> - struct iommu_table_group *table_group;
>> -
>> - table_group = iommu_group_get_iommudata(container->grp);
>> - if (!table_group)
>> - return NULL;
>>
>> for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
>> - struct iommu_table *tbl = &table_group->tables[i];
>> + struct iommu_table *tbl = &container->tables[i];
>> unsigned long entry = ioba >> tbl->it_page_shift;
>> unsigned long start = tbl->it_offset;
>> unsigned long end = start + tbl->it_size;
>> @@ -186,9 +192,7 @@ static int tce_iommu_enable(struct tce_container *container)
>> int ret = 0;
>> unsigned long locked;
>> struct iommu_table_group *table_group;
>> -
>> - if (!container->grp)
>> - return -ENXIO;
>> + struct tce_iommu_group *tcegrp;
>>
>> if (!current->mm)
>> return -ESRCH; /* process exited */
>> @@ -225,7 +229,12 @@ static int tce_iommu_enable(struct tce_container *container)
>> * as there is no way to know how much we should increment
>> * the locked_vm counter.
>> */
>> - table_group = iommu_group_get_iommudata(container->grp);
>> + if (!tce_groups_attached(container))
>> + return -ENODEV;
>> +
>> + tcegrp = list_first_entry(&container->group_list,
>> + struct tce_iommu_group, next);
>> + table_group = iommu_group_get_iommudata(tcegrp->grp);
>> if (!table_group)
>> return -ENODEV;
>>
>> @@ -257,6 +266,48 @@ static void tce_iommu_disable(struct tce_container *container)
>> decrement_locked_vm(container->locked_pages);
>> }
>>
>> +static long tce_iommu_create_table(struct iommu_table_group *table_group,
>> + int num,
>> + __u32 page_shift,
>> + __u64 window_size,
>> + __u32 levels,
>> + struct iommu_table *tbl)
>
> With multiple groups (and therefore PEs) per container, this seems
> wrong. There's only one table_group per PE, so what's special about
> the PE whose table group is passed in here?


The created table is allocated on the same node as the table_group
(pe->phb->hose->node). This does not make much sense if we put multiple
groups into the same container, but we will recommend that people avoid
putting groups from different NUMA nodes into the same container.

Also, the allocated table gets its bus offset initialized in create_table()
(which is IODA2-specific knowledge). It is there to emphasize the fact that
we do not get to choose where to map the window on the bus: it is hardcoded,
and it is easier to deal with tables whose offset is set once. I could add
a bus_offset parameter to set_window(), but it would just be converted back to
the window number.



>> +{
>> + long ret, table_size;
>> +
>> + table_size = table_group->ops->get_table_size(page_shift, window_size,
>> + levels);
>> + if (!table_size)
>> + return -EINVAL;
>> +
>> + ret = try_increment_locked_vm(table_size >> PAGE_SHIFT);
>> + if (ret)
>> + return ret;
>> +
>> + ret = table_group->ops->create_table(table_group, num,
>> + page_shift, window_size, levels, tbl);
>> +
>> + WARN_ON(!ret && !tbl->it_ops->free);
>> + WARN_ON(!ret && (tbl->it_allocated_size != table_size));
>> +
>> + if (ret)
>> + decrement_locked_vm(table_size >> PAGE_SHIFT);
>> +
>> + return ret;
>> +}
>> +
>> +static void tce_iommu_free_table(struct iommu_table *tbl)
>> +{
>> + unsigned long pages = tbl->it_allocated_size >> PAGE_SHIFT;
>> +
>> + if (!tbl->it_size)
>> + return;
>> +
>> + tbl->it_ops->free(tbl);
>
> So, this is exactly the case where the lifetimes are badly confusing.
> How can you be confident here that another copy of the iommu_table
> struct isn't referencing the same TCE tables?


Window create/remove is handled by a single driver file. It is not like
there are many of these tables. But yes, valid point.



>> + decrement_locked_vm(pages);
>> + memset(tbl, 0, sizeof(*tbl));
>> +}
>> +
>> static void *tce_iommu_open(unsigned long arg)
>> {
>> struct tce_container *container;
>> @@ -271,19 +322,41 @@ static void *tce_iommu_open(unsigned long arg)
>> return ERR_PTR(-ENOMEM);
>>
>> mutex_init(&container->lock);
>> + INIT_LIST_HEAD_RCU(&container->group_list);
>
> I see no other mentions of rcu related to this list, which doesn't
> seem right.
>
>> container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;
>>
>> return container;
>> }
>>
>> +static int tce_iommu_clear(struct tce_container *container,
>> + struct iommu_table *tbl,
>> + unsigned long entry, unsigned long pages);
>> +
>> static void tce_iommu_release(void *iommu_data)
>> {
>> struct tce_container *container = iommu_data;
>> + struct iommu_table_group *table_group;
>> + struct tce_iommu_group *tcegrp;
>> + long i;
>>
>> - WARN_ON(container->grp);
>> + while (tce_groups_attached(container)) {
>> + tcegrp = list_first_entry(&container->group_list,
>> + struct tce_iommu_group, next);
>> + table_group = iommu_group_get_iommudata(tcegrp->grp);
>> + tce_iommu_detach_group(iommu_data, tcegrp->grp);
>> + }
>>
>> - if (container->grp)
>> - tce_iommu_detach_group(iommu_data, container->grp);
>> + /*
>> + * If VFIO created a table, it was not disposed
>> + * by tce_iommu_detach_group() so do it now.
>> + */
>> + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
>> + struct iommu_table *tbl = &container->tables[i];
>> +
>> + tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
>> + tce_iommu_free_table(tbl);
>> + }
>>
>> tce_iommu_disable(container);
>> mutex_destroy(&container->lock);
>> @@ -509,12 +582,15 @@ static long tce_iommu_ioctl(void *iommu_data,
>>
>> case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
>> struct vfio_iommu_spapr_tce_info info;
>> + struct tce_iommu_group *tcegrp;
>> struct iommu_table_group *table_group;
>>
>> - if (WARN_ON(!container->grp))
>> + if (!tce_groups_attached(container))
>> return -ENXIO;
>>
>> - table_group = iommu_group_get_iommudata(container->grp);
>> + tcegrp = list_first_entry(&container->group_list,
>> + struct tce_iommu_group, next);
>> + table_group = iommu_group_get_iommudata(tcegrp->grp);
>>
>> if (!table_group)
>> return -ENXIO;
>> @@ -707,12 +783,20 @@ static long tce_iommu_ioctl(void *iommu_data,
>> tce_iommu_disable(container);
>> mutex_unlock(&container->lock);
>> return 0;
>> - case VFIO_EEH_PE_OP:
>> - if (!container->grp)
>> - return -ENODEV;
>>
>> - return vfio_spapr_iommu_eeh_ioctl(container->grp,
>> - cmd, arg);
>> + case VFIO_EEH_PE_OP: {
>> + struct tce_iommu_group *tcegrp;
>> +
>> + ret = 0;
>> + list_for_each_entry(tcegrp, &container->group_list, next) {
>> + ret = vfio_spapr_iommu_eeh_ioctl(tcegrp->grp,
>> + cmd, arg);
>> + if (ret)
>> + return ret;
>
> Hrm. It occurs to me that EEH may need a way of referencing
> individual groups. Even if multiple PEs are referencing the same TCE
> tables, presumably EEH will isolate them individually.


Well. I asked our EEH guy Gavin, he did not object to this change but I'll
double check :)



--
Alexey

2015-04-30 09:56:31

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 23/32] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks

On 04/30/2015 02:37 PM, David Gibson wrote:
> On Wed, Apr 29, 2015 at 07:44:20PM +1000, Alexey Kardashevskiy wrote:
>> On 04/29/2015 03:30 PM, David Gibson wrote:
>>> On Sat, Apr 25, 2015 at 10:14:47PM +1000, Alexey Kardashevskiy wrote:
>>>> This extends iommu_table_group_ops by a set of callbacks to support
>>>> dynamic DMA windows management.
>>>>
>>>> create_table() creates a TCE table with specific parameters.
>>>> it receives iommu_table_group to know nodeid in order to allocate
>>>> TCE table memory closer to the PHB. The exact format of allocated
>>>> multi-level table might be also specific to the PHB model (not
>>>> the case now though).
>>>> This callback calculates the DMA window offset on a PCI bus from @num
>>>> and stores it in a just created table.
>>>>
>>>> set_window() sets the window at specified TVT index + @num on PHB.
>>>>
>>>> unset_window() unsets the window from specified TVT.
>>>>
>>>> This adds a free() callback to iommu_table_ops to free the memory
>>>> (potentially a tree of tables) allocated for the TCE table.
>>>
>>> Doesn't the free callback belong with the previous patch introducing
>>> multi-level tables?
>>
>>
>>
>> If I did that, you would say "why is it here if nothing calls it" on the
>> "multilevel" patch and "I see the allocation but I do not see the memory
>> release" ;)
>
> Yeah, fair enough ;)
>
>> I need some rule of thumb here. I think it is a bit cleaner if the same
>> patch adds a callback for memory allocation and its counterpart, no?
>
> On further consideration, yes, I think you're right.
>
>>>> create_table() and free() are supposed to be called once per
>>>> VFIO container and set_window()/unset_window() are supposed to be
>>>> called for every group in a container.
>>>>
>>>> This adds IOMMU capabilities to iommu_table_group such as default
>>>> 32bit window parameters and others.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>> ---
>>>> arch/powerpc/include/asm/iommu.h | 19 ++++++++
>>>> arch/powerpc/platforms/powernv/pci-ioda.c | 75 ++++++++++++++++++++++++++---
>>>> arch/powerpc/platforms/powernv/pci-p5ioc2.c | 12 +++--
>>>> 3 files changed, 96 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>>>> index 0f50ee2..7694546 100644
>>>> --- a/arch/powerpc/include/asm/iommu.h
>>>> +++ b/arch/powerpc/include/asm/iommu.h
>>>> @@ -70,6 +70,7 @@ struct iommu_table_ops {
>>>> /* get() returns a physical address */
>>>> unsigned long (*get)(struct iommu_table *tbl, long index);
>>>> void (*flush)(struct iommu_table *tbl);
>>>> + void (*free)(struct iommu_table *tbl);
>>>> };
>>>>
>>>> /* These are used by VIO */
>>>> @@ -148,6 +149,17 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
>>>> struct iommu_table_group;
>>>>
>>>> struct iommu_table_group_ops {
>>>> + long (*create_table)(struct iommu_table_group *table_group,
>>>> + int num,
>>>> + __u32 page_shift,
>>>> + __u64 window_size,
>>>> + __u32 levels,
>>>> + struct iommu_table *tbl);
>>>> + long (*set_window)(struct iommu_table_group *table_group,
>>>> + int num,
>>>> + struct iommu_table *tblnew);
>>>> + long (*unset_window)(struct iommu_table_group *table_group,
>>>> + int num);
>>>> /*
>>>> * Switches ownership from the kernel itself to an external
>>>> * user. While ownership is taken, the kernel cannot use IOMMU itself.
>>>> @@ -160,6 +172,13 @@ struct iommu_table_group {
>>>> #ifdef CONFIG_IOMMU_API
>>>> struct iommu_group *group;
>>>> #endif
>>>> + /* Some key properties of IOMMU */
>>>> + __u32 tce32_start;
>>>> + __u32 tce32_size;
>>>> + __u64 pgsizes; /* Bitmap of supported page sizes */
>>>> + __u32 max_dynamic_windows_supported;
>>>> + __u32 max_levels;
>>>
>>> With this information, table_group seems even more like a bad name.
>>> "iommu_state" maybe?
>>
>>
>> Please, no. We will never come to agreement then :( And "iommu_state" is too
>> general anyway, it won't pass.
>>
>>
>>>> struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
>>>> struct iommu_table_group_ops *ops;
>>>> };
>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>> index cc1d09c..4828837 100644
>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>> @@ -24,6 +24,7 @@
>>>> #include <linux/msi.h>
>>>> #include <linux/memblock.h>
>>>> #include <linux/iommu.h>
>>>> +#include <linux/sizes.h>
>>>>
>>>> #include <asm/sections.h>
>>>> #include <asm/io.h>
>>>> @@ -1846,6 +1847,7 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
>>>> #endif
>>>> .clear = pnv_ioda2_tce_free,
>>>> .get = pnv_tce_get,
>>>> + .free = pnv_pci_free_table,
>>>> };
>>>>
>>>> static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
>>>> @@ -1936,6 +1938,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>>>> TCE_PCI_SWINV_PAIR);
>>>>
>>>> tbl->it_ops = &pnv_ioda1_iommu_ops;
>>>> + pe->table_group.tce32_start = tbl->it_offset << tbl->it_page_shift;
>>>> + pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
>>>> iommu_init_table(tbl, phb->hose->node);
>>>>
>>>> if (pe->flags & PNV_IODA_PE_DEV) {
>>>> @@ -1961,7 +1965,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>>>> }
>>>>
>>>> static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
>>>> - struct iommu_table *tbl)
>>>> + int num, struct iommu_table *tbl)
>>>> {
>>>> struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
>>>> table_group);
>>>> @@ -1972,9 +1976,10 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
>>>> const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
>>>> const __u64 win_size = tbl->it_size << tbl->it_page_shift;
>>>>
>>>> - pe_info(pe, "Setting up window at %llx..%llx "
>>>> + pe_info(pe, "Setting up window#%d at %llx..%llx "
>>>> "pgsize=0x%x tablesize=0x%lx "
>>>> "levels=%d levelsize=%x\n",
>>>> + num,
>>>> start_addr, start_addr + win_size - 1,
>>>> 1UL << tbl->it_page_shift, tbl->it_size << 3,
>>>> tbl->it_indirect_levels + 1, tbl->it_level_size << 3);
>>>> @@ -1987,7 +1992,7 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
>>>> */
>>>> rc = opal_pci_map_pe_dma_window(phb->opal_id,
>>>> pe->pe_number,
>>>> - pe->pe_number << 1,
>>>> + (pe->pe_number << 1) + num,
>>>
>>> Heh, yes, well, that makes it rather clear that only 2 tables are possible.
>>>
>>>> tbl->it_indirect_levels + 1,
>>>> __pa(tbl->it_base),
>>>> size << 3,
>>>> @@ -2000,7 +2005,7 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
>>>> pnv_pci_ioda2_tvt_invalidate(pe);
>>>>
>>>> /* Store fully initialized *tbl (may be external) in PE */
>>>> - pe->table_group.tables[0] = *tbl;
>>>> + pe->table_group.tables[num] = *tbl;
>>>
>>> I'm a bit confused by this whole set_window thing. Is the idea that
>>> with multiple groups in a container you have multiple table_groups
>>> each with different copies of the iommu_table structures, but pointing
>>> to the same actual TCE entries (it_base)?
>>
>> Yes.
>>
>>> It seems to me not terribly
>>> obvious when you "create" a table and when you "set" a window.
>>
>>
>> A table is not attached anywhere until its address is programmed (in
>> set_window()) to the hardware, it is just a table in memory. For
>> POWER8/IODA2, I create a table before I attach any group to a container,
>> then I program this table to every attached container, right now it is done
>> in container's attach_group(). So later we can hotplug any host PCI device
>> to a container - it will program same TCE table to every new group in the
>> container.
>
> So you "create" once, then "set" it to one or more table_groups? It
> seems odd that "create" is a table_group callback in that case.


Where else could it be? ppc_md? We are getting rid of these. Some global
function? We do not want VFIO to know about this. I have run out of ideas here.




--
Alexey
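
To make the flow concrete, here is a short sketch of the "create once, then set
on every group" sequence being discussed, built from the create_table()/
set_window() callbacks quoted above; the helper name and the container fields
mirror what tce_iommu_attach_group() does in patch 31 and are illustrative only:

static long container_add_group(struct tce_container *container,
		struct iommu_table_group *table_group)
{
	long ret = 0;
	int i;

	/* The default window is created once, for the first attached group */
	if (!container->tables[0].it_size) {
		ret = table_group->ops->create_table(table_group,
				0,			/* window number */
				IOMMU_PAGE_SHIFT_4K,
				table_group->tce32_size,
				1,			/* levels */
				&container->tables[0]);
		if (ret)
			return ret;
	}

	/* Every attached group gets the same table(s) programmed into its TVT */
	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
		if (!container->tables[i].it_size)
			continue;

		ret = table_group->ops->set_window(table_group, i,
				&container->tables[i]);
		if (ret)
			break;
	}

	return ret;
}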

2015-05-01 01:46:45

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

On Thu, 2015-04-30 at 19:33 +1000, Alexey Kardashevskiy wrote:
> On 04/30/2015 05:22 PM, David Gibson wrote:
> > On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
> >> At the moment only one group per container is supported.
> >> POWER8 CPUs have a more flexible design and allow having 2 TCE tables per
> >> IOMMU group so we can relax this limitation and support multiple groups
> >> per container.
> >
> > It's not obvious why allowing multiple TCE tables per PE has any
> > bearing on allowing multiple groups per container.
>
>
> This patchset is a global TCE tables rework (patches 1..30, roughly) with 2
> outcomes:
> 1. reusing the same IOMMU table for multiple groups - patch 31;
> 2. allowing dynamic create/remove of IOMMU tables - patch 32.
>
> I can remove this one from the patchset and post it separately later but
> since 1..30 aim to support both 1) and 2), I'd think I better keep them all
> together (might explain some of changes I do in 1..30).

I think you are talking past each other :-)

But yes, having 2 tables per group is orthogonal to the ability of
having multiple groups per container.

The latter is made possible on P8 in large part because each PE has its
own DMA address space (unlike P5IOC2 or P7IOC where a single address
space is segmented).

Also, on P8 you can actually make the TVT entries point to the same
table in memory, thus removing the need to duplicate the actual
tables (though you still have to duplicate the invalidations). I would
however recommend only sharing the table that way within a chip/node.
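
Roughly, in terms of the structures from Alexey's patch (a purely
illustrative sketch, not the actual implementation; it just reuses the
tce_container/tce_iommu_group list and the set_window() callback the
patch adds):

    /* Hypothetical helper: point every attached PE's TVE at one table */
    static long share_table_across_pes(struct tce_container *container,
            int num, struct iommu_table *tbl)
    {
        struct tce_iommu_group *tcegrp;
        long ret;

        list_for_each_entry(tcegrp, &container->group_list, next) {
            struct iommu_table_group *table_group =
                iommu_group_get_iommudata(tcegrp->grp);

            /* The same it_base is programmed into each PE's TVE... */
            ret = table_group->ops->set_window(table_group, num, tbl);
            if (ret)
                return ret;
            /* ...but the TCE kill/invalidate still has to be issued
             * once per PE (done inside set_window() in the patch). */
        }
        return 0;
    }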

.../..

> >>
> >> -1) Only one IOMMU group per container is supported as an IOMMU group
> >> -represents the minimal entity which isolation can be guaranteed for and
> >> -groups are allocated statically, one per a Partitionable Endpoint (PE)
> >> +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
> >> +container is supported as an IOMMU table is allocated at the boot time,
> >> +one table per a IOMMU group which is a Partitionable Endpoint (PE)
> >> (PE is often a PCI domain but not always).

> > I thought the more fundamental problem was that different PEs tended
> > to use disjoint bus address ranges, so even by duplicating put_tce
> > across PEs you couldn't have a common address space.

Yes. This is the problem with P7IOC and earlier. It *could* be doable on
P7IOC by making them the same PE but let's not go there.

> Sorry, I am not following you here.
>
> By duplicating put_tce, I can have multiple IOMMU groups on the same
> virtual PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple
> groups per container" does this, the address ranges will be the same.

But that is only possible on P8 because only there do we have separate
address spaces between PEs.

> What I cannot do on p5ioc2 is programming the same table to multiple
> physical PHBs (or I could but it is very different than IODA2 and pretty
> ugly and might not always be possible because I would have to allocate
> these pages from some common pool and face problems like fragmentation).

And P7IOC has a similar issue. The DMA address top bits index the
window on P7IOC within a shared address space. It's possible to
configure a TVT to cover multiple devices but with very serious
limitations.

> >> +Newer systems (POWER8 with IODA2) have improved hardware design which allows
> >> +to remove this limitation and have multiple IOMMU groups per a VFIO container.
> >>
> >> 2) The hardware supports so called DMA windows - the PCI address range
> >> within which DMA transfer is allowed, any attempt to access address space
> >> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> index a7d6729..970e3a2 100644
> >> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> >> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> @@ -82,6 +82,11 @@ static void decrement_locked_vm(long npages)
> >> * into DMA'ble space using the IOMMU
> >> */
> >>
> >> +struct tce_iommu_group {
> >> + struct list_head next;
> >> + struct iommu_group *grp;
> >> +};
> >> +
> >> /*
> >> * The container descriptor supports only a single group per container.
> >> * Required by the API as the container is not supplied with the IOMMU group
> >> @@ -89,10 +94,11 @@ static void decrement_locked_vm(long npages)
> >> */
> >> struct tce_container {
> >> struct mutex lock;
> >> - struct iommu_group *grp;
> >> bool enabled;
> >> unsigned long locked_pages;
> >> bool v2;
> >> + struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
> >
> > Hrm, so here we have more copies of the full iommu_table structures,
> > which again muddies the lifetime. The table_group pointer is
> > presumably meaningless in these copies, which seems dangerously
> > confusing.
>
>
> Ouch. This is bad. No, table_group is not pointless here as it is used to
> get to the PE number to invalidate TCE cache. I just realized although I
> need to update just a single table, I still have to invalidate TCE cache
> for every attached group/PE so I need a list of iommu_table_group's here,
> not a single pointer...
>
>
>
> >> + struct list_head group_list;
> >> };
> >>
> >> static long tce_unregister_pages(struct tce_container *container,
> >> @@ -154,20 +160,20 @@ static bool tce_page_is_contained(struct page *page, unsigned page_shift)
> >> return (PAGE_SHIFT + compound_order(compound_head(page))) >= page_shift;
> >> }
> >>
> >> +static inline bool tce_groups_attached(struct tce_container *container)
> >> +{
> >> + return !list_empty(&container->group_list);
> >> +}
> >> +
> >> static struct iommu_table *spapr_tce_find_table(
> >> struct tce_container *container,
> >> phys_addr_t ioba)
> >> {
> >> long i;
> >> struct iommu_table *ret = NULL;
> >> - struct iommu_table_group *table_group;
> >> -
> >> - table_group = iommu_group_get_iommudata(container->grp);
> >> - if (!table_group)
> >> - return NULL;
> >>
> >> for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> >> - struct iommu_table *tbl = &table_group->tables[i];
> >> + struct iommu_table *tbl = &container->tables[i];
> >> unsigned long entry = ioba >> tbl->it_page_shift;
> >> unsigned long start = tbl->it_offset;
> >> unsigned long end = start + tbl->it_size;
> >> @@ -186,9 +192,7 @@ static int tce_iommu_enable(struct tce_container *container)
> >> int ret = 0;
> >> unsigned long locked;
> >> struct iommu_table_group *table_group;
> >> -
> >> - if (!container->grp)
> >> - return -ENXIO;
> >> + struct tce_iommu_group *tcegrp;
> >>
> >> if (!current->mm)
> >> return -ESRCH; /* process exited */
> >> @@ -225,7 +229,12 @@ static int tce_iommu_enable(struct tce_container *container)
> >> * as there is no way to know how much we should increment
> >> * the locked_vm counter.
> >> */
> >> - table_group = iommu_group_get_iommudata(container->grp);
> >> + if (!tce_groups_attached(container))
> >> + return -ENODEV;
> >> +
> >> + tcegrp = list_first_entry(&container->group_list,
> >> + struct tce_iommu_group, next);
> >> + table_group = iommu_group_get_iommudata(tcegrp->grp);
> >> if (!table_group)
> >> return -ENODEV;
> >>
> >> @@ -257,6 +266,48 @@ static void tce_iommu_disable(struct tce_container *container)
> >> decrement_locked_vm(container->locked_pages);
> >> }
> >>
> >> +static long tce_iommu_create_table(struct iommu_table_group *table_group,
> >> + int num,
> >> + __u32 page_shift,
> >> + __u64 window_size,
> >> + __u32 levels,
> >> + struct iommu_table *tbl)
> >
> > With multiple groups (and therefore PEs) per container, this seems
> > wrong. There's only one table_group per PE, so what's special about
> > PE whose table group is passed in here.
>
>
> The created table is allocated at the same node as table_group
> (pe->phb->hose->node). This does not make much sense if we put multiple
> groups to the same container but we will recommend people to avoid putting
> groups from different NUMA nodes to the same container.
>
> Also, the allocated table gets bus offset initialized in create_table()
> (which is IODA2-specific knowledge). It is there to emphasize the fact that
> we do not get to choose where to map the window on a bus, it is hardcoded
> and easier to deal with the tables which have offset set once - I could add
> a bus_offset parameter to set_window() but it would be converted back to
> the window number.
>
>
>
> >> +{
> >> + long ret, table_size;
> >> +
> >> + table_size = table_group->ops->get_table_size(page_shift, window_size,
> >> + levels);
> >> + if (!table_size)
> >> + return -EINVAL;
> >> +
> >> + ret = try_increment_locked_vm(table_size >> PAGE_SHIFT);
> >> + if (ret)
> >> + return ret;
> >> +
> >> + ret = table_group->ops->create_table(table_group, num,
> >> + page_shift, window_size, levels, tbl);
> >> +
> >> + WARN_ON(!ret && !tbl->it_ops->free);
> >> + WARN_ON(!ret && (tbl->it_allocated_size != table_size));
> >> +
> >> + if (ret)
> >> + decrement_locked_vm(table_size >> PAGE_SHIFT);
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static void tce_iommu_free_table(struct iommu_table *tbl)
> >> +{
> >> + unsigned long pages = tbl->it_allocated_size >> PAGE_SHIFT;
> >> +
> >> + if (!tbl->it_size)
> >> + return;
> >> +
> >> + tbl->it_ops->free(tbl);
> >
> > So, this is exactly the case where the lifetimes are badly confusing.
> > How can you be confident here that another copy of the iommu_table
> > struct isn't referencing the same TCE tables?
>
>
> Create/remove window is handled by a single file driver. It is not like
> there are many of these tables. But yes, valid point.
>
>
>
> >> + decrement_locked_vm(pages);
> >> + memset(tbl, 0, sizeof(*tbl));
> >> +}
> >> +
> >> static void *tce_iommu_open(unsigned long arg)
> >> {
> >> struct tce_container *container;
> >> @@ -271,19 +322,41 @@ static void *tce_iommu_open(unsigned long arg)
> >> return ERR_PTR(-ENOMEM);
> >>
> >> mutex_init(&container->lock);
> >> + INIT_LIST_HEAD_RCU(&container->group_list);
> >
> > I see no other mentions of rcu related to this list, which doesn't
> > seem right.
> >
> >> container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;
> >>
> >> return container;
> >> }
> >>
> >> +static int tce_iommu_clear(struct tce_container *container,
> >> + struct iommu_table *tbl,
> >> + unsigned long entry, unsigned long pages);
> >> +
> >> static void tce_iommu_release(void *iommu_data)
> >> {
> >> struct tce_container *container = iommu_data;
> >> + struct iommu_table_group *table_group;
> >> + struct tce_iommu_group *tcegrp;
> >> + long i;
> >>
> >> - WARN_ON(container->grp);
> >> + while (tce_groups_attached(container)) {
> >> + tcegrp = list_first_entry(&container->group_list,
> >> + struct tce_iommu_group, next);
> >> + table_group = iommu_group_get_iommudata(tcegrp->grp);
> >> + tce_iommu_detach_group(iommu_data, tcegrp->grp);
> >> + }
> >>
> >> - if (container->grp)
> >> - tce_iommu_detach_group(iommu_data, container->grp);
> >> + /*
> >> + * If VFIO created a table, it was not disposed
> >> + * by tce_iommu_detach_group() so do it now.
> >> + */
> >> + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> >> + struct iommu_table *tbl = &container->tables[i];
> >> +
> >> + tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
> >> + tce_iommu_free_table(tbl);
> >> + }
> >>
> >> tce_iommu_disable(container);
> >> mutex_destroy(&container->lock);
> >> @@ -509,12 +582,15 @@ static long tce_iommu_ioctl(void *iommu_data,
> >>
> >> case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
> >> struct vfio_iommu_spapr_tce_info info;
> >> + struct tce_iommu_group *tcegrp;
> >> struct iommu_table_group *table_group;
> >>
> >> - if (WARN_ON(!container->grp))
> >> + if (!tce_groups_attached(container))
> >> return -ENXIO;
> >>
> >> - table_group = iommu_group_get_iommudata(container->grp);
> >> + tcegrp = list_first_entry(&container->group_list,
> >> + struct tce_iommu_group, next);
> >> + table_group = iommu_group_get_iommudata(tcegrp->grp);
> >>
> >> if (!table_group)
> >> return -ENXIO;
> >> @@ -707,12 +783,20 @@ static long tce_iommu_ioctl(void *iommu_data,
> >> tce_iommu_disable(container);
> >> mutex_unlock(&container->lock);
> >> return 0;
> >> - case VFIO_EEH_PE_OP:
> >> - if (!container->grp)
> >> - return -ENODEV;
> >>
> >> - return vfio_spapr_iommu_eeh_ioctl(container->grp,
> >> - cmd, arg);
> >> + case VFIO_EEH_PE_OP: {
> >> + struct tce_iommu_group *tcegrp;
> >> +
> >> + ret = 0;
> >> + list_for_each_entry(tcegrp, &container->group_list, next) {
> >> + ret = vfio_spapr_iommu_eeh_ioctl(tcegrp->grp,
> >> + cmd, arg);
> >> + if (ret)
> >> + return ret;
> >
> > Hrm. It occurs to me that EEH may need a way of referencing
> > individual groups. Even if multiple PEs are referencing the same TCE
> > tables, presumably EEH will isolate them individually.
>
>
> Well. I asked our EEH guy Gavin, he did not object to this change but I'll
> double check :)
>
>
>

2015-05-01 03:56:00

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 23/32] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks

On Thu, Apr 30, 2015 at 07:56:17PM +1000, Alexey Kardashevskiy wrote:
> On 04/30/2015 02:37 PM, David Gibson wrote:
> >On Wed, Apr 29, 2015 at 07:44:20PM +1000, Alexey Kardashevskiy wrote:
> >>On 04/29/2015 03:30 PM, David Gibson wrote:
> >>>On Sat, Apr 25, 2015 at 10:14:47PM +1000, Alexey Kardashevskiy wrote:
> >>>>This extends iommu_table_group_ops by a set of callbacks to support
> >>>>dynamic DMA windows management.
> >>>>
> >>>>create_table() creates a TCE table with specific parameters.
> >>>>It receives iommu_table_group to know the node id in order to allocate
> >>>>TCE table memory closer to the PHB. The exact format of allocated
> >>>>multi-level table might also be specific to the PHB model (not
> >>>>the case now though).
> >>>>This callback calculates the DMA window offset on a PCI bus from @num
> >>>>and stores it in a just created table.
> >>>>
> >>>>set_window() sets the window at specified TVT index + @num on PHB.
> >>>>
> >>>>unset_window() unsets the window from specified TVT.
> >>>>
> >>>>This adds a free() callback to iommu_table_ops to free the memory
> >>>>(potentially a tree of tables) allocated for the TCE table.
> >>>
> >>>Doesn't the free callback belong with the previous patch introducing
> >>>multi-level tables?
> >>
> >>
> >>
> >>If I did that, you would say "why is it here if nothing calls it" on
> >>"multilevel" patch and "I see the allocation but I do not see memory
> >>release" ;)
> >
> >Yeah, fair enough ;)
> >
> >>I need some rule of thumb here. I think it is a bit cleaner if the same
> >>patch adds a callback for memory allocation and its counterpart, no?
> >
> >On further consideration, yes, I think you're right.
> >
> >>>>create_table() and free() are supposed to be called once per
> >>>>VFIO container and set_window()/unset_window() are supposed to be
> >>>>called for every group in a container.
> >>>>
> >>>>This adds IOMMU capabilities to iommu_table_group such as default
> >>>>32bit window parameters and others.
> >>>>
> >>>>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>>>---
> >>>> arch/powerpc/include/asm/iommu.h | 19 ++++++++
> >>>> arch/powerpc/platforms/powernv/pci-ioda.c | 75 ++++++++++++++++++++++++++---
> >>>> arch/powerpc/platforms/powernv/pci-p5ioc2.c | 12 +++--
> >>>> 3 files changed, 96 insertions(+), 10 deletions(-)
> >>>>
> >>>>diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >>>>index 0f50ee2..7694546 100644
> >>>>--- a/arch/powerpc/include/asm/iommu.h
> >>>>+++ b/arch/powerpc/include/asm/iommu.h
> >>>>@@ -70,6 +70,7 @@ struct iommu_table_ops {
> >>>> /* get() returns a physical address */
> >>>> unsigned long (*get)(struct iommu_table *tbl, long index);
> >>>> void (*flush)(struct iommu_table *tbl);
> >>>>+ void (*free)(struct iommu_table *tbl);
> >>>> };
> >>>>
> >>>> /* These are used by VIO */
> >>>>@@ -148,6 +149,17 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
> >>>> struct iommu_table_group;
> >>>>
> >>>> struct iommu_table_group_ops {
> >>>>+ long (*create_table)(struct iommu_table_group *table_group,
> >>>>+ int num,
> >>>>+ __u32 page_shift,
> >>>>+ __u64 window_size,
> >>>>+ __u32 levels,
> >>>>+ struct iommu_table *tbl);
> >>>>+ long (*set_window)(struct iommu_table_group *table_group,
> >>>>+ int num,
> >>>>+ struct iommu_table *tblnew);
> >>>>+ long (*unset_window)(struct iommu_table_group *table_group,
> >>>>+ int num);
> >>>> /*
> >>>> * Switches ownership from the kernel itself to an external
> >>>> * user. While onwership is taken, the kernel cannot use IOMMU itself.
> >>>>@@ -160,6 +172,13 @@ struct iommu_table_group {
> >>>> #ifdef CONFIG_IOMMU_API
> >>>> struct iommu_group *group;
> >>>> #endif
> >>>>+ /* Some key properties of IOMMU */
> >>>>+ __u32 tce32_start;
> >>>>+ __u32 tce32_size;
> >>>>+ __u64 pgsizes; /* Bitmap of supported page sizes */
> >>>>+ __u32 max_dynamic_windows_supported;
> >>>>+ __u32 max_levels;
> >>>
> >>>With this information, table_group seems even more like a bad name.
> >>>"iommu_state" maybe?
> >>
> >>
> >>Please, no. We will never come to agreement then :( And "iommu_state" is too
> >>general anyway, it won't pass.
> >>
> >>
> >>>> struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
> >>>> struct iommu_table_group_ops *ops;
> >>>> };
> >>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>index cc1d09c..4828837 100644
> >>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>@@ -24,6 +24,7 @@
> >>>> #include <linux/msi.h>
> >>>> #include <linux/memblock.h>
> >>>> #include <linux/iommu.h>
> >>>>+#include <linux/sizes.h>
> >>>>
> >>>> #include <asm/sections.h>
> >>>> #include <asm/io.h>
> >>>>@@ -1846,6 +1847,7 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
> >>>> #endif
> >>>> .clear = pnv_ioda2_tce_free,
> >>>> .get = pnv_tce_get,
> >>>>+ .free = pnv_pci_free_table,
> >>>> };
> >>>>
> >>>> static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
> >>>>@@ -1936,6 +1938,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> >>>> TCE_PCI_SWINV_PAIR);
> >>>>
> >>>> tbl->it_ops = &pnv_ioda1_iommu_ops;
> >>>>+ pe->table_group.tce32_start = tbl->it_offset << tbl->it_page_shift;
> >>>>+ pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
> >>>> iommu_init_table(tbl, phb->hose->node);
> >>>>
> >>>> if (pe->flags & PNV_IODA_PE_DEV) {
> >>>>@@ -1961,7 +1965,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> >>>> }
> >>>>
> >>>> static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> >>>>- struct iommu_table *tbl)
> >>>>+ int num, struct iommu_table *tbl)
> >>>> {
> >>>> struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> >>>> table_group);
> >>>>@@ -1972,9 +1976,10 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> >>>> const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
> >>>> const __u64 win_size = tbl->it_size << tbl->it_page_shift;
> >>>>
> >>>>- pe_info(pe, "Setting up window at %llx..%llx "
> >>>>+ pe_info(pe, "Setting up window#%d at %llx..%llx "
> >>>> "pgsize=0x%x tablesize=0x%lx "
> >>>> "levels=%d levelsize=%x\n",
> >>>>+ num,
> >>>> start_addr, start_addr + win_size - 1,
> >>>> 1UL << tbl->it_page_shift, tbl->it_size << 3,
> >>>> tbl->it_indirect_levels + 1, tbl->it_level_size << 3);
> >>>>@@ -1987,7 +1992,7 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> >>>> */
> >>>> rc = opal_pci_map_pe_dma_window(phb->opal_id,
> >>>> pe->pe_number,
> >>>>- pe->pe_number << 1,
> >>>>+ (pe->pe_number << 1) + num,
> >>>
> >>>Heh, yes, well, that makes it rather clear that only 2 tables are possible.
> >>>
> >>>> tbl->it_indirect_levels + 1,
> >>>> __pa(tbl->it_base),
> >>>> size << 3,
> >>>>@@ -2000,7 +2005,7 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> >>>> pnv_pci_ioda2_tvt_invalidate(pe);
> >>>>
> >>>> /* Store fully initialized *tbl (may be external) in PE */
> >>>>- pe->table_group.tables[0] = *tbl;
> >>>>+ pe->table_group.tables[num] = *tbl;
> >>>
> >>>I'm a bit confused by this whole set_window thing. Is the idea that
> >>>with multiple groups in a container you have multiple table_groups,
> >>>each with different copies of the iommu_table structures, but pointing
> >>>to the same actual TCE entries (it_base)?
> >>
> >>Yes.
> >>
> >>>It seems to me not terribly
> >>>obvious when you "create" a table and when you "set" a window.
> >>
> >>
> >>A table is not attached anywhere until its address is programmed (in
> >>set_window()) to the hardware, it is just a table in memory. For
> >>POWER8/IODA2, I create a table before I attach any group to a container,
> >>then I program this table to every attached container, right now it is done
> >>in container's attach_group(). So later we can hotplug any host PCI device
> >>to a container - it will program same TCE table to every new group in the
> >>container.
> >
> >So you "create" once, then "set" it to one or more table_groups? It
> >seems odd that "create" is a table_group callback in that case.
>
>
> Where else could it be? ppc_md? We are getting rid of these. Some global
> function? We do not want VFIO to know about this. I've run out of ideas here.

Yeah, I guess it has to be in table_group, despite the oddness. I
guess the point is that it's the first group that determines the type
of IOMMU you're using for this container. IIRC you already check on
set_window that any additional groups have a compatible
(i.e. identical) IOMMU.
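
Something along these lines is what I have in mind for that attach-time
check (a hypothetical helper, only comparing the properties the patch
already exposes in iommu_table_group):

    /* Hypothetical: later groups must match the first attached group */
    static bool table_group_is_compatible(struct iommu_table_group *first,
            struct iommu_table_group *new)
    {
        return (first->ops == new->ops) &&
            (first->tce32_start == new->tce32_start) &&
            (first->tce32_size == new->tce32_size) &&
            (first->pgsizes == new->pgsizes) &&
            (first->max_dynamic_windows_supported ==
                new->max_dynamic_windows_supported) &&
            (first->max_levels == new->max_levels);
    }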

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-05-01 03:56:03

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 28/32] powerpc/mmu: Add userspace-to-physical addresses translation cache

On Thu, Apr 30, 2015 at 06:25:25PM +1000, Paul Mackerras wrote:
> On Thu, Apr 30, 2015 at 04:34:55PM +1000, David Gibson wrote:
> > On Sat, Apr 25, 2015 at 10:14:52PM +1000, Alexey Kardashevskiy wrote:
> > > We are adding support for DMA memory pre-registration to be used in
> > > conjunction with VFIO. The idea is that the userspace which is going to
> > > run a guest may want to pre-register a user space memory region so
> > > it all gets pinned once and never goes away. Having this done,
> > > a hypervisor will not have to pin/unpin pages on every DMA map/unmap
> > > request. This is going to help with multiple pinning of the same memory
> > > and in-kernel acceleration of DMA requests.
> > >
> > > This adds a list of memory regions to mm_context_t. Each region consists
> > > of a header and a list of physical addresses. This adds API to:
> > > 1. register/unregister memory regions;
> > > 2. do final cleanup (which puts all pre-registered pages);
> > > 3. do userspace to physical address translation;
> > > 4. manage a mapped pages counter; when it is zero, it is safe to
> > > unregister the region.
> > >
> > > Multiple registration of the same region is allowed, kref is used to
> > > track the number of registrations.
> >
> > [snip]
> > > +long mm_iommu_alloc(unsigned long ua, unsigned long entries,
> > > + struct mm_iommu_table_group_mem_t **pmem)
> > > +{
> > > + struct mm_iommu_table_group_mem_t *mem;
> > > + long i, j;
> > > + struct page *page = NULL;
> > > +
> > > + list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
> > > + next) {
> > > + if ((mem->ua == ua) && (mem->entries == entries))
> > > + return -EBUSY;
> > > +
> > > + /* Overlap? */
> > > + if ((mem->ua < (ua + (entries << PAGE_SHIFT))) &&
> > > + (ua < (mem->ua + (mem->entries << PAGE_SHIFT))))
> > > + return -EINVAL;
> > > + }
> > > +
> > > + mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > > + if (!mem)
> > > + return -ENOMEM;
> > > +
> > > + mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
> > > + if (!mem->hpas) {
> > > + kfree(mem);
> > > + return -ENOMEM;
> > > + }
> >
> > So, I've thought more about this and I'm really confused as to what
> > this is supposed to be accomplishing.
> >
> > I see that you need to keep track of what regions are registered, so
> > you don't double lock or unlock, but I don't see what the point of
> > actualy storing the translations in hpas is.
> >
> > I had assumed it was so that you could later on get to the
> > translations in real mode when you do in-kernel acceleration. But
> > that doesn't make sense, because the array is vmalloc()ed, so can't be
> > accessed in real mode anyway.
>
> We can access vmalloc'd arrays in real mode using real_vmalloc_addr().

Ah, ok.

> > I can't think of a circumstance in which you can use hpas where you
> > couldn't just walk the page tables anyway.
>
> The problem with walking the page tables is that there is no guarantee
> that the page you find that way is the page that was returned by the
> gup_fast() we did earlier. Storing the hpas means that we know for
> sure that the page we're doing DMA to is one that we have an elevated
> page count on.
>
> Also, there are various points where a Linux PTE is made temporarily
> invalid for a short time. If we happened to do a H_PUT_TCE on one cpu
> while another cpu was doing that, we'd get a spurious failure returned
> by the H_PUT_TCE.

I think we want this explanation in the commit message. And/or in a
comment somewhere, I'm not sure.
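
Something like the following sketch (illustrative names only, using the
real_vmalloc_addr() helper Paul mentions) is presumably what the
real-mode H_PUT_TCE path would do with the stored hpas:

    /* Sketch: real-mode ua->hpa lookup over a preregistered region */
    static long mem_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
            unsigned long ua, unsigned long *hpa)
    {
        unsigned long entry, *pa;

        if (ua < mem->ua ||
            ua >= mem->ua + (mem->entries << PAGE_SHIFT))
            return -EFAULT;

        entry = (ua - mem->ua) >> PAGE_SHIFT;

        /* hpas[] is vmalloc'd, translate the pointer for real mode */
        pa = (unsigned long *) real_vmalloc_addr(&mem->hpas[entry]);
        if (!pa)
            return -EFAULT;

        /* The stored page came from gup_fast(), so it is pinned */
        *hpa = *pa | (ua & ~PAGE_MASK);
        return 0;
    }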

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-05-01 04:01:29

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table

On 04/29/2015 04:31 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:50PM +1000, Alexey Kardashevskiy wrote:
>> In order to support memory pre-registration, we need a way to track
>> the use of every registered memory region and only allow unregistration
>> if a region is not in use anymore. So we need a way to tell which
>> region the just cleared TCE came from.
>>
>> This adds a userspace view of the TCE table into iommu_table struct.
>> It contains userspace address, one per TCE entry. The table is only
>> allocated when the ownership over an IOMMU group is taken which means
>> it is only used from outside of the powernv code (such as VFIO).
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> ---
>> Changes:
>> v9:
>> * fixed code flow in error cases added in v8
>>
>> v8:
>> * added ENOMEM on failed vzalloc()
>> ---
>> arch/powerpc/include/asm/iommu.h | 6 ++++++
>> arch/powerpc/kernel/iommu.c | 18 ++++++++++++++++++
>> arch/powerpc/platforms/powernv/pci-ioda.c | 22 ++++++++++++++++++++--
>> 3 files changed, 44 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>> index 7694546..1472de3 100644
>> --- a/arch/powerpc/include/asm/iommu.h
>> +++ b/arch/powerpc/include/asm/iommu.h
>> @@ -111,9 +111,15 @@ struct iommu_table {
>> unsigned long *it_map; /* A simple allocation bitmap for now */
>> unsigned long it_page_shift;/* table iommu page size */
>> struct iommu_table_group *it_table_group;
>> + unsigned long *it_userspace; /* userspace view of the table */
>
> A single unsigned long doesn't seem like enough.

Why single? This is an array.

> How do you know
> which process's address space this address refers to?

It is the current task. Multiple userspaces cannot use the same container/tables.



>> struct iommu_table_ops *it_ops;
>> };
>>
>> +#define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
>> + ((tbl)->it_userspace ? \
>> + &((tbl)->it_userspace[(entry) - (tbl)->it_offset]) : \
>> + NULL)
>> +
>> /* Pure 2^n version of get_order */
>> static inline __attribute_const__
>> int get_iommu_order(unsigned long size, struct iommu_table *tbl)
>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>> index 2eaba0c..74a3f52 100644
>> --- a/arch/powerpc/kernel/iommu.c
>> +++ b/arch/powerpc/kernel/iommu.c
>> @@ -38,6 +38,7 @@
>> #include <linux/pci.h>
>> #include <linux/iommu.h>
>> #include <linux/sched.h>
>> +#include <linux/vmalloc.h>
>> #include <asm/io.h>
>> #include <asm/prom.h>
>> #include <asm/iommu.h>
>> @@ -739,6 +740,8 @@ void iommu_reset_table(struct iommu_table *tbl, const char *node_name)
>> free_pages((unsigned long) tbl->it_map, order);
>> }
>>
>> + WARN_ON(tbl->it_userspace);
>> +
>> memset(tbl, 0, sizeof(*tbl));
>> }
>>
>> @@ -1016,6 +1019,7 @@ int iommu_take_ownership(struct iommu_table *tbl)
>> {
>> unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
>> int ret = 0;
>> + unsigned long *uas;
>>
>> /*
>> * VFIO does not control TCE entries allocation and the guest
>> @@ -1027,6 +1031,10 @@ int iommu_take_ownership(struct iommu_table *tbl)
>> if (!tbl->it_ops->exchange)
>> return -EINVAL;
>>
>> + uas = vzalloc(sizeof(*uas) * tbl->it_size);
>> + if (!uas)
>> + return -ENOMEM;
>> +
>> spin_lock_irqsave(&tbl->large_pool.lock, flags);
>> for (i = 0; i < tbl->nr_pools; i++)
>> spin_lock(&tbl->pools[i].lock);
>> @@ -1044,6 +1052,13 @@ int iommu_take_ownership(struct iommu_table *tbl)
>> memset(tbl->it_map, 0xff, sz);
>> }
>>
>> + if (ret) {
>> + vfree(uas);
>> + } else {
>> + BUG_ON(tbl->it_userspace);
>> + tbl->it_userspace = uas;
>> + }
>> +
>> for (i = 0; i < tbl->nr_pools; i++)
>> spin_unlock(&tbl->pools[i].lock);
>> spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
>> @@ -1056,6 +1071,9 @@ void iommu_release_ownership(struct iommu_table *tbl)
>> {
>> unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
>>
>> + vfree(tbl->it_userspace);
>> + tbl->it_userspace = NULL;
>> +
>> spin_lock_irqsave(&tbl->large_pool.lock, flags);
>> for (i = 0; i < tbl->nr_pools; i++)
>> spin_lock(&tbl->pools[i].lock);
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 45bc131..e0be556 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -25,6 +25,7 @@
>> #include <linux/memblock.h>
>> #include <linux/iommu.h>
>> #include <linux/sizes.h>
>> +#include <linux/vmalloc.h>
>>
>> #include <asm/sections.h>
>> #include <asm/io.h>
>> @@ -1827,6 +1828,14 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
>> pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
>> }
>>
>> +void pnv_pci_ioda2_free_table(struct iommu_table *tbl)
>> +{
>> + vfree(tbl->it_userspace);
>> + tbl->it_userspace = NULL;
>> +
>> + pnv_pci_free_table(tbl);
>> +}
>> +
>> static struct iommu_table_ops pnv_ioda2_iommu_ops = {
>> .set = pnv_ioda2_tce_build,
>> #ifdef CONFIG_IOMMU_API
>> @@ -1834,7 +1843,7 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
>> #endif
>> .clear = pnv_ioda2_tce_free,
>> .get = pnv_tce_get,
>> - .free = pnv_pci_free_table,
>> + .free = pnv_pci_ioda2_free_table,
>> };
>>
>> static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
>> @@ -2062,12 +2071,21 @@ static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
>> int nid = pe->phb->hose->node;
>> __u64 bus_offset = num ? pe->tce_bypass_base : 0;
>> long ret;
>> + unsigned long *uas, uas_cb = sizeof(*uas) * (window_size >> page_shift);
>> +
>> + uas = vzalloc(uas_cb);
>> + if (!uas)
>> + return -ENOMEM;
>
> I don't see why this is allocated both here as well as in
> take_ownership.

Where else? The only alternative is vfio_iommu_spapr_tce but I really do
not want to touch iommu_table fields there.


> Isn't this function used for core-kernel users of the
> iommu as well, in which case it shouldn't need the it_userspace.


No. This is an iommu_table_group_ops callback which calls what the platform
code calls (pnv_pci_create_table()) plus allocates this it_userspace thing.
The callback is only called from VFIO.


>
>> ret = pnv_pci_create_table(table_group, nid, bus_offset, page_shift,
>> window_size, levels, tbl);
>> - if (ret)
>> + if (ret) {
>> + vfree(uas);
>> return ret;
>> + }
>>
>> + BUG_ON(tbl->it_userspace);
>> + tbl->it_userspace = uas;
>> tbl->it_ops = &pnv_ioda2_iommu_ops;
>> if (pe->tce_inval_reg)
>> tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
>


--
Alexey

2015-05-01 04:11:12

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table

On 04/29/2015 04:40 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:51PM +1000, Alexey Kardashevskiy wrote:
>> This adds a way for the IOMMU user to know how much a new table will
>> use so it can be accounted in the locked_vm limit before allocation
>> happens.
>>
>> This stores the allocated table size in pnv_pci_create_table()
>> so the locked_vm counter can be updated correctly when a table is
>> being disposed.
>>
>> This defines an iommu_table_group_ops callback to let VFIO know
>> how much memory will be locked if a table is created.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> ---
>> Changes:
>> v9:
>> * reimplemented the whole patch
>> ---
>> arch/powerpc/include/asm/iommu.h | 5 +++++
>> arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++++++++
>> arch/powerpc/platforms/powernv/pci.c | 36 +++++++++++++++++++++++++++++++
>> arch/powerpc/platforms/powernv/pci.h | 2 ++
>> 4 files changed, 57 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>> index 1472de3..9844c106 100644
>> --- a/arch/powerpc/include/asm/iommu.h
>> +++ b/arch/powerpc/include/asm/iommu.h
>> @@ -99,6 +99,7 @@ struct iommu_table {
>> unsigned long it_size; /* Size of iommu table in entries */
>> unsigned long it_indirect_levels;
>> unsigned long it_level_size;
>> + unsigned long it_allocated_size;
>> unsigned long it_offset; /* Offset into global table */
>> unsigned long it_base; /* mapped address of tce table */
>> unsigned long it_index; /* which iommu table this is */
>> @@ -155,6 +156,10 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
>> struct iommu_table_group;
>>
>> struct iommu_table_group_ops {
>> + unsigned long (*get_table_size)(
>> + __u32 page_shift,
>> + __u64 window_size,
>> + __u32 levels);
>> long (*create_table)(struct iommu_table_group *table_group,
>> int num,
>> __u32 page_shift,
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index e0be556..7f548b4 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -2062,6 +2062,18 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
>> }
>>
>> #ifdef CONFIG_IOMMU_API
>> +static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift,
>> + __u64 window_size, __u32 levels)
>> +{
>> + unsigned long ret = pnv_get_table_size(page_shift, window_size, levels);
>> +
>> + if (!ret)
>> + return ret;
>> +
>> + /* Add size of it_userspace */
>> + return ret + (window_size >> page_shift) * sizeof(unsigned long);
>
> This doesn't make much sense. The userspace view can't possibly be a
> property of the specific low-level IOMMU model.


This it_userspace thing is all about memory preregistration.

I need some way to track how many actual mappings the
mm_iommu_table_group_mem_t has in order to decide whether to allow
unregistering or not.

When I clear a TCE, I can read the old value, which is a host physical
address; I cannot use that to find the preregistered region and adjust the
mappings counter. I can only use userspace addresses for this (not even
guest physical addresses, as this is VFIO and there is probably no KVM).

So I have to keep userspace addresses somewhere, one per IOMMU page, and
the iommu_table seems a natural place for this.
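
Roughly this is what the clear path needs the saved address for (an
illustrative sketch only; mm_iommu_lookup()/mm_iommu_mapped_dec() stand
for the lookup and counter helpers from the preregistration patch, the
exact names may differ):

    static void tce_iommu_unuse_page_v2(struct iommu_table *tbl,
            unsigned long entry)
    {
        unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
        struct mm_iommu_table_group_mem_t *mem;

        if (!pua || !*pua)
            return;

        /* The saved userspace address finds the preregistered region... */
        mem = mm_iommu_lookup(*pua, 1ULL << tbl->it_page_shift);
        if (mem)
            /* ...so its "mapped" counter can be dropped */
            mm_iommu_mapped_dec(mem);

        *pua = 0;
    }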





>
>> +}
>> +
>> static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
>> int num, __u32 page_shift, __u64 window_size, __u32 levels,
>> struct iommu_table *tbl)
>> @@ -2086,6 +2098,7 @@ static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
>>
>> BUG_ON(tbl->it_userspace);
>> tbl->it_userspace = uas;
>> + tbl->it_allocated_size += uas_cb;
>> tbl->it_ops = &pnv_ioda2_iommu_ops;
>> if (pe->tce_inval_reg)
>> tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
>> @@ -2160,6 +2173,7 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
>> }
>>
>> static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
>> + .get_table_size = pnv_pci_ioda2_get_table_size,
>> .create_table = pnv_pci_ioda2_create_table,
>> .set_window = pnv_pci_ioda2_set_window,
>> .unset_window = pnv_pci_ioda2_unset_window,
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index fc129c4..1b5b48a 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -662,6 +662,38 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>> tbl->it_type = TCE_PCI;
>> }
>>
>> +unsigned long pnv_get_table_size(__u32 page_shift,
>> + __u64 window_size, __u32 levels)
>> +{
>> + unsigned long bytes = 0;
>> + const unsigned window_shift = ilog2(window_size);
>> + unsigned entries_shift = window_shift - page_shift;
>> + unsigned table_shift = entries_shift + 3;
>> + unsigned long tce_table_size = max(0x1000UL, 1UL << table_shift);
>> + unsigned long direct_table_size;
>> +
>> + if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS) ||
>> + (window_size > memory_hotplug_max()) ||
>> + !is_power_of_2(window_size))
>> + return 0;
>> +
>> + /* Calculate a direct table size from window_size and levels */
>> + entries_shift = ROUND_UP(entries_shift, levels) / levels;
>> + table_shift = entries_shift + 3;
>> + table_shift = max_t(unsigned, table_shift, PAGE_SHIFT);
>> + direct_table_size = 1UL << table_shift;
>> +
>> + for ( ; levels; --levels) {
>> + bytes += ROUND_UP(tce_table_size, direct_table_size);
>> +
>> + tce_table_size /= direct_table_size;
>> + tce_table_size <<= 3;
>> + tce_table_size = ROUND_UP(tce_table_size, direct_table_size);
>> + }
>> +
>> + return bytes;
>> +}
>> +
>> static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
>> unsigned levels, unsigned long limit,
>> unsigned long *tce_table_allocated)
>> @@ -741,6 +773,10 @@ long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
>> return -ENOMEM;
>> }
>>
>> + tbl->it_allocated_size = pnv_get_table_size(page_shift, window_size,
>> + levels);
>> + WARN_ON(!tbl->it_allocated_size);
>> +
>> /* Setup linux iommu table */
>> pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, bus_offset,
>> page_shift);
>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>> index 3d1ff584..ce4bc3c 100644
>> --- a/arch/powerpc/platforms/powernv/pci.h
>> +++ b/arch/powerpc/platforms/powernv/pci.h
>> @@ -224,6 +224,8 @@ extern long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
>> __u64 bus_offset, __u32 page_shift, __u64 window_size,
>> __u32 levels, struct iommu_table *tbl);
>> extern void pnv_pci_free_table(struct iommu_table *tbl);
>> +extern unsigned long pnv_get_table_size(__u32 page_shift,
>> + __u64 window_size, __u32 levels);
>> extern void pnv_pci_init_p5ioc2_hub(struct device_node *np);
>> extern void pnv_pci_init_ioda_hub(struct device_node *np);
>> extern void pnv_pci_init_ioda2_phb(struct device_node *np);
>


--
Alexey

2015-05-01 04:54:03

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table

On Fri, May 01, 2015 at 02:01:17PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 04:31 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:50PM +1000, Alexey Kardashevskiy wrote:
> >>In order to support memory pre-registration, we need a way to track
> >>the use of every registered memory region and only allow unregistration
> >>if a region is not in use anymore. So we need a way to tell which
> >>region the just cleared TCE came from.
> >>
> >>This adds a userspace view of the TCE table into iommu_table struct.
> >>It contains userspace address, one per TCE entry. The table is only
> >>allocated when the ownership over an IOMMU group is taken which means
> >>it is only used from outside of the powernv code (such as VFIO).
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>---
> >>Changes:
> >>v9:
> >>* fixed code flow in error cases added in v8
> >>
> >>v8:
> >>* added ENOMEM on failed vzalloc()
> >>---
> >> arch/powerpc/include/asm/iommu.h | 6 ++++++
> >> arch/powerpc/kernel/iommu.c | 18 ++++++++++++++++++
> >> arch/powerpc/platforms/powernv/pci-ioda.c | 22 ++++++++++++++++++++--
> >> 3 files changed, 44 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >>index 7694546..1472de3 100644
> >>--- a/arch/powerpc/include/asm/iommu.h
> >>+++ b/arch/powerpc/include/asm/iommu.h
> >>@@ -111,9 +111,15 @@ struct iommu_table {
> >> unsigned long *it_map; /* A simple allocation bitmap for now */
> >> unsigned long it_page_shift;/* table iommu page size */
> >> struct iommu_table_group *it_table_group;
> >>+ unsigned long *it_userspace; /* userspace view of the table */
> >
> >A single unsigned long doesn't seem like enough.
>
> Why single? This is an array.

As in single per page.

> > How do you know
> >which process's address space this address refers to?
>
> It is the current task. Multiple userspaces cannot use the same container/tables.

Where is that enforced?

More to the point, that's a VFIO constraint, but it's here affecting
the design of a structure owned by the platform code.

[snip]
> >> static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
> >>@@ -2062,12 +2071,21 @@ static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
> >> int nid = pe->phb->hose->node;
> >> __u64 bus_offset = num ? pe->tce_bypass_base : 0;
> >> long ret;
> >>+ unsigned long *uas, uas_cb = sizeof(*uas) * (window_size >> page_shift);
> >>+
> >>+ uas = vzalloc(uas_cb);
> >>+ if (!uas)
> >>+ return -ENOMEM;
> >
> >I don't see why this is allocated both here as well as in
> >take_ownership.
>
> Where else? The only alternative is vfio_iommu_spapr_tce but I really do not
> want to touch iommu_table fields there.

Well, to put it another way, why isn't take_ownership calling create
itself (or at least a common helper)?

Clearly the it_userspace table needs to have a lifetime which matches
the TCE table itself, so there should be a single function that marks
the beginning of that joint lifetime.
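
i.e. something like this pair of helpers (hypothetical names; they assume
the table geometry, it_size, is already set up), called from
create_table()/take_ownership() and free()/release_ownership()
respectively, instead of open-coding the vzalloc()/vfree() in two places:

    /* Hypothetical common helpers tying it_userspace to the table */
    static long pnv_pci_ioda2_alloc_userspace(struct iommu_table *tbl)
    {
        unsigned long *uas = vzalloc(sizeof(*uas) * tbl->it_size);

        if (!uas)
            return -ENOMEM;

        BUG_ON(tbl->it_userspace);
        tbl->it_userspace = uas;

        return 0;
    }

    static void pnv_pci_ioda2_free_userspace(struct iommu_table *tbl)
    {
        vfree(tbl->it_userspace);
        tbl->it_userspace = NULL;
    }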

> >Isn't this function used for core-kernel users of the
> >iommu as well, in which case it shouldn't need the it_userspace.
>
>
> No. This is an iommu_table_group_ops callback which calls what the platform
> code calls (pnv_pci_create_table()) plus allocates this it_userspace thing.
> The callback is only called from VFIO.

Ok.

As touched on above, it seems more like this should be owned by VFIO
code than the platform code.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-05-01 04:54:07

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

On Thu, Apr 30, 2015 at 07:33:09PM +1000, Alexey Kardashevskiy wrote:
> On 04/30/2015 05:22 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
> >>At the moment only one group per container is supported.
> >>POWER8 CPUs have a more flexible design and allow having 2 TCE tables per
> >>IOMMU group so we can relax this limitation and support multiple groups
> >>per container.
> >
> >It's not obvious why allowing multiple TCE tables per PE has any
> >bearing on allowing multiple groups per container.
>
>
> This patchset is a global TCE tables rework (patches 1..30, roughly) with 2
> outcomes:
> 1. reusing the same IOMMU table for multiple groups - patch 31;
> 2. allowing dynamic create/remove of IOMMU tables - patch 32.
>
> I can remove this one from the patchset and post it separately later but
> since 1..30 aim to support both 1) and 2), I'd think I better keep them all
> together (might explain some of changes I do in 1..30).

The combined patchset is fine. My comment is because your commit
message says that multiple groups are possible *because* 2 TCE tables
per group are allowed, and it's not at all clear why one follows from
the other.

> >>This adds TCE table descriptors to a container and uses iommu_table_group_ops
> >>to create/set DMA windows on IOMMU groups so the same TCE tables will be
> >>shared between several IOMMU groups.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>[aw: for the vfio related changes]
> >>Acked-by: Alex Williamson <[email protected]>
> >>---
> >>Changes:
> >>v7:
> >>* updated doc
> >>---
> >> Documentation/vfio.txt | 8 +-
> >> drivers/vfio/vfio_iommu_spapr_tce.c | 268 ++++++++++++++++++++++++++----------
> >> 2 files changed, 199 insertions(+), 77 deletions(-)
> >>
> >>diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> >>index 94328c8..7dcf2b5 100644
> >>--- a/Documentation/vfio.txt
> >>+++ b/Documentation/vfio.txt
> >>@@ -289,10 +289,12 @@ PPC64 sPAPR implementation note
> >>
> >> This implementation has some specifics:
> >>
> >>-1) Only one IOMMU group per container is supported as an IOMMU group
> >>-represents the minimal entity which isolation can be guaranteed for and
> >>-groups are allocated statically, one per a Partitionable Endpoint (PE)
> >>+1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
> >>+container is supported as an IOMMU table is allocated at the boot time,
> >>+one table per a IOMMU group which is a Partitionable Endpoint (PE)
> >> (PE is often a PCI domain but not always).
> >
> >I thought the more fundamental problem was that different PEs tended
> >to use disjoint bus address ranges, so even by duplicating put_tce
> >across PEs you couldn't have a common address space.
>
>
> Sorry, I am not following you here.
>
> By duplicating put_tce, I can have multiple IOMMU groups on the same virtual
> PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple groups
> per container" does this, the address ranges will be the same.

Oh, ok. For some reason I thought that (at least on the older
machines) the different PEs used different and not easily changeable
DMA windows in bus address space.

> What I cannot do on p5ioc2 is programming the same table to multiple
> physical PHBs (or I could but it is very different than IODA2 and pretty
> ugly and might not always be possible because I would have to allocate these
> pages from some common pool and face problems like fragmentation).

So allowing multiple groups per container should be possible (at the
kernel rather than qemu level) by writing the same value to multiple
TCE tables. I guess it's not worth doing for just the almost-obsolete
IOMMUs though.

>
>
>
> >>+Newer systems (POWER8 with IODA2) have improved hardware design which allows
> >>+to remove this limitation and have multiple IOMMU groups per a VFIO container.
> >>
> >> 2) The hardware supports so called DMA windows - the PCI address range
> >> within which DMA transfer is allowed, any attempt to access address space
> >>diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>index a7d6729..970e3a2 100644
> >>--- a/drivers/vfio/vfio_iommu_spapr_tce.c
> >>+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>@@ -82,6 +82,11 @@ static void decrement_locked_vm(long npages)
> >> * into DMA'ble space using the IOMMU
> >> */
> >>
> >>+struct tce_iommu_group {
> >>+ struct list_head next;
> >>+ struct iommu_group *grp;
> >>+};
> >>+
> >> /*
> >> * The container descriptor supports only a single group per container.
> >> * Required by the API as the container is not supplied with the IOMMU group
> >>@@ -89,10 +94,11 @@ static void decrement_locked_vm(long npages)
> >> */
> >> struct tce_container {
> >> struct mutex lock;
> >>- struct iommu_group *grp;
> >> bool enabled;
> >> unsigned long locked_pages;
> >> bool v2;
> >>+ struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
> >
> >Hrm, so here we have more copies of the full iommu_table structures,
> >which again muddies the lifetime. The table_group pointer is
> >presumably meaningless in these copies, which seems dangerously
> >confusing.
>
>
> Ouch. This is bad. No, table_group is not pointless here as it is used to
> get to the PE number to invalidate TCE cache. I just realized although I
> need to update just a single table, I still have to invalidate TCE cache for
> every attached group/PE so I need a list of iommu_table_group's here, not a
> single pointer...

Right.

> >>+ struct list_head group_list;
> >> };
> >>
> >> static long tce_unregister_pages(struct tce_container *container,
> >>@@ -154,20 +160,20 @@ static bool tce_page_is_contained(struct page *page, unsigned page_shift)
> >> return (PAGE_SHIFT + compound_order(compound_head(page))) >= page_shift;
> >> }
> >>
> >>+static inline bool tce_groups_attached(struct tce_container *container)
> >>+{
> >>+ return !list_empty(&container->group_list);
> >>+}
> >>+
> >> static struct iommu_table *spapr_tce_find_table(
> >> struct tce_container *container,
> >> phys_addr_t ioba)
> >> {
> >> long i;
> >> struct iommu_table *ret = NULL;
> >>- struct iommu_table_group *table_group;
> >>-
> >>- table_group = iommu_group_get_iommudata(container->grp);
> >>- if (!table_group)
> >>- return NULL;
> >>
> >> for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> >>- struct iommu_table *tbl = &table_group->tables[i];
> >>+ struct iommu_table *tbl = &container->tables[i];
> >> unsigned long entry = ioba >> tbl->it_page_shift;
> >> unsigned long start = tbl->it_offset;
> >> unsigned long end = start + tbl->it_size;
> >>@@ -186,9 +192,7 @@ static int tce_iommu_enable(struct tce_container *container)
> >> int ret = 0;
> >> unsigned long locked;
> >> struct iommu_table_group *table_group;
> >>-
> >>- if (!container->grp)
> >>- return -ENXIO;
> >>+ struct tce_iommu_group *tcegrp;
> >>
> >> if (!current->mm)
> >> return -ESRCH; /* process exited */
> >>@@ -225,7 +229,12 @@ static int tce_iommu_enable(struct tce_container *container)
> >> * as there is no way to know how much we should increment
> >> * the locked_vm counter.
> >> */
> >>- table_group = iommu_group_get_iommudata(container->grp);
> >>+ if (!tce_groups_attached(container))
> >>+ return -ENODEV;
> >>+
> >>+ tcegrp = list_first_entry(&container->group_list,
> >>+ struct tce_iommu_group, next);
> >>+ table_group = iommu_group_get_iommudata(tcegrp->grp);
> >> if (!table_group)
> >> return -ENODEV;
> >>
> >>@@ -257,6 +266,48 @@ static void tce_iommu_disable(struct tce_container *container)
> >> decrement_locked_vm(container->locked_pages);
> >> }
> >>
> >>+static long tce_iommu_create_table(struct iommu_table_group *table_group,
> >>+ int num,
> >>+ __u32 page_shift,
> >>+ __u64 window_size,
> >>+ __u32 levels,
> >>+ struct iommu_table *tbl)
> >
> >With multiple groups (and therefore PEs) per container, this seems
> >wrong. There's only one table_group per PE, so what's special about
> >PE whose table group is passed in here.
>
>
> The created table is allocated at the same node as table_group
> (pe->phb->hose->node). This does not make much sense if we put multiple
> groups to the same container but we will recommend people to avoid putting
> groups from different NUMA nodes to the same container.

Ok. I guess the point is that the first PE attached has a special
bearing on how the table is configured, and that's what the
table_group here represents.

> Also, the allocated table gets bus offset initialized in create_table()
> (which is IODA2-specific knowledge). It is there to emphasize the fact that
> we do not get to choose where to map the window on a bus, it is hardcoded
> and easier to deal with the tables which have offset set once - I could add
> a bus_offset parameter to set_window() but it would be converted back to the
> window number.
>
>
>
> >>+{
> >>+ long ret, table_size;
> >>+
> >>+ table_size = table_group->ops->get_table_size(page_shift, window_size,
> >>+ levels);
> >>+ if (!table_size)
> >>+ return -EINVAL;
> >>+
> >>+ ret = try_increment_locked_vm(table_size >> PAGE_SHIFT);
> >>+ if (ret)
> >>+ return ret;
> >>+
> >>+ ret = table_group->ops->create_table(table_group, num,
> >>+ page_shift, window_size, levels, tbl);
> >>+
> >>+ WARN_ON(!ret && !tbl->it_ops->free);
> >>+ WARN_ON(!ret && (tbl->it_allocated_size != table_size));
> >>+
> >>+ if (ret)
> >>+ decrement_locked_vm(table_size >> PAGE_SHIFT);
> >>+
> >>+ return ret;
> >>+}
> >>+
> >>+static void tce_iommu_free_table(struct iommu_table *tbl)
> >>+{
> >>+ unsigned long pages = tbl->it_allocated_size >> PAGE_SHIFT;
> >>+
> >>+ if (!tbl->it_size)
> >>+ return;
> >>+
> >>+ tbl->it_ops->free(tbl);
> >
> >So, this is exactly the case where the lifetimes are badly confusing.
> >How can you be confident here that another copy of the iommu_table
> >struct isn't referencing the same TCE tables?
>
>
> Create/remove window is handled by a single file driver. It is not like there
> are many of these tables. But yes, valid point.
>
>
>
> >>+ decrement_locked_vm(pages);
> >>+ memset(tbl, 0, sizeof(*tbl));
> >>+}
> >>+
> >> static void *tce_iommu_open(unsigned long arg)
> >> {
> >> struct tce_container *container;
> >>@@ -271,19 +322,41 @@ static void *tce_iommu_open(unsigned long arg)
> >> return ERR_PTR(-ENOMEM);
> >>
> >> mutex_init(&container->lock);
> >>+ INIT_LIST_HEAD_RCU(&container->group_list);
> >
> >I see no other mentions of rcu related to this list, which doesn't
> >seem right.
> >
> >> container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;
> >>
> >> return container;
> >> }
> >>
> >>+static int tce_iommu_clear(struct tce_container *container,
> >>+ struct iommu_table *tbl,
> >>+ unsigned long entry, unsigned long pages);
> >>+
> >> static void tce_iommu_release(void *iommu_data)
> >> {
> >> struct tce_container *container = iommu_data;
> >>+ struct iommu_table_group *table_group;
> >>+ struct tce_iommu_group *tcegrp;
> >>+ long i;
> >>
> >>- WARN_ON(container->grp);
> >>+ while (tce_groups_attached(container)) {
> >>+ tcegrp = list_first_entry(&container->group_list,
> >>+ struct tce_iommu_group, next);
> >>+ table_group = iommu_group_get_iommudata(tcegrp->grp);
> >>+ tce_iommu_detach_group(iommu_data, tcegrp->grp);
> >>+ }
> >>
> >>- if (container->grp)
> >>- tce_iommu_detach_group(iommu_data, container->grp);
> >>+ /*
> >>+ * If VFIO created a table, it was not disposed
> >>+ * by tce_iommu_detach_group() so do it now.
> >>+ */
> >>+ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
> >>+ struct iommu_table *tbl = &container->tables[i];
> >>+
> >>+ tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
> >>+ tce_iommu_free_table(tbl);
> >>+ }
> >>
> >> tce_iommu_disable(container);
> >> mutex_destroy(&container->lock);
> >>@@ -509,12 +582,15 @@ static long tce_iommu_ioctl(void *iommu_data,
> >>
> >> case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
> >> struct vfio_iommu_spapr_tce_info info;
> >>+ struct tce_iommu_group *tcegrp;
> >> struct iommu_table_group *table_group;
> >>
> >>- if (WARN_ON(!container->grp))
> >>+ if (!tce_groups_attached(container))
> >> return -ENXIO;
> >>
> >>- table_group = iommu_group_get_iommudata(container->grp);
> >>+ tcegrp = list_first_entry(&container->group_list,
> >>+ struct tce_iommu_group, next);
> >>+ table_group = iommu_group_get_iommudata(tcegrp->grp);
> >>
> >> if (!table_group)
> >> return -ENXIO;
> >>@@ -707,12 +783,20 @@ static long tce_iommu_ioctl(void *iommu_data,
> >> tce_iommu_disable(container);
> >> mutex_unlock(&container->lock);
> >> return 0;
> >>- case VFIO_EEH_PE_OP:
> >>- if (!container->grp)
> >>- return -ENODEV;
> >>
> >>- return vfio_spapr_iommu_eeh_ioctl(container->grp,
> >>- cmd, arg);
> >>+ case VFIO_EEH_PE_OP: {
> >>+ struct tce_iommu_group *tcegrp;
> >>+
> >>+ ret = 0;
> >>+ list_for_each_entry(tcegrp, &container->group_list, next) {
> >>+ ret = vfio_spapr_iommu_eeh_ioctl(tcegrp->grp,
> >>+ cmd, arg);
> >>+ if (ret)
> >>+ return ret;
> >
> >Hrm. It occurs to me that EEH may need a way of referencing
> >individual groups. Even if multiple PEs are referencing the same TCE
> >tables, presumably EEH will isolate them individually.
>
>
> Well. I asked our EEH guy Gavin, he did not object to this change but I'll
> double check :)
>
>
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-05-01 04:35:34

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2

On 04/30/2015 04:55 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:53PM +1000, Alexey Kardashevskiy wrote:
>> The existing implementation accounts the whole DMA window in
>> the locked_vm counter. This is going to be worse with multiple
>> containers and huge DMA windows. Also, real-time accounting would require
>> additional tracking of accounted pages due to the page size difference -
>> IOMMU uses 4K pages and system uses 4K or 64K pages.
>>
>> Another issue is that actual pages pinning/unpinning happens on every
>> DMA map/unmap request. This does not affect the performance much now as
>> we spend way too much time now on switching context between
>> guest/userspace/host but this will start to matter when we add in-kernel
>> DMA map/unmap acceleration.
>>
>> This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
>> New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
>> 2 new ioctls to register/unregister DMA memory -
>> VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
>> which receive user space address and size of a memory region which
>> needs to be pinned/unpinned and counted in locked_vm.
>> New IOMMU splits physical pages pinning and TCE table update into 2 different
>> operations. It requires 1) guest pages to be registered first 2) consequent
>> map/unmap requests to work only with pre-registered memory.
>> For the default single window case this means that the entire guest
>> (instead of 2GB) needs to be pinned before using VFIO.
>> When a huge DMA window is added, no additional pinning will be
>> required, otherwise it would be guest RAM + 2GB.
>>
>> The new memory registration ioctls are not supported by
>> VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
>> will require memory to be preregistered in order to work.
>>
>> The accounting is done per the user process.
>>
>> This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
>> can do with v1 or v2 IOMMUs.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> [aw: for the vfio related changes]
>> Acked-by: Alex Williamson <[email protected]>
>> ---
>> Changes:
>> v9:
>> * s/tce_get_hva_cached/tce_iommu_use_page_v2/
>>
>> v7:
>> * now memory is registered per mm (i.e. process)
>> * moved memory registration code to powerpc/mmu
>> * merged "vfio: powerpc/spapr: Define v2 IOMMU" into this
>> * limited new ioctls to v2 IOMMU
>> * updated doc
>> * unsupported ioclts return -ENOTTY instead of -EPERM
>>
>> v6:
>> * tce_get_hva_cached() returns hva via a pointer
>>
>> v4:
>> * updated docs
>> * s/kzmalloc/vzalloc/
>> * in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
>> replaced offset with index
>> * renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory
>> and removed duplicating vfio_iommu_spapr_register_memory
>> ---
>> Documentation/vfio.txt | 23 ++++
>> drivers/vfio/vfio_iommu_spapr_tce.c | 230 +++++++++++++++++++++++++++++++++++-
>> include/uapi/linux/vfio.h | 27 +++++
>> 3 files changed, 274 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
>> index 96978ec..94328c8 100644
>> --- a/Documentation/vfio.txt
>> +++ b/Documentation/vfio.txt
>> @@ -427,6 +427,29 @@ The code flow from the example above should be slightly changed:
>>
>> ....
>>
>> +5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
>> +VFIO_IOMMU_DISABLE and implements 2 new ioctls:
>> +VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
>> +(which are unsupported in v1 IOMMU).
>
> A summary of the semantic differeces between v1 and v2 would be nice.
> At this point it's not really clear to me if there's a case for
> creating v2, or if this could just be done by adding (optional)
> functionality to v1.

v1: memory preregistration is not supported; explicit enable/disable
ioctls are required.

v2: memory preregistration is required; explicit enable/disable are
prohibited (as they are not needed).

Mixing these in one IOMMU type caused a lot of problems, like: should I
increment locked_vm by the 32bit window size on enable() or not; what do
I do about page pinning on map/unmap (check whether the page is from
registered memory and skip pinning?).

Having 2 IOMMU models makes everything a lot simpler.
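
For illustration only, a rough userspace sketch of the two flows
(container_fd, buf, buf_size, map/unmap are hypothetical and error
handling is omitted; only the ioctl and struct names are the real ones):

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* v1: no preregistration; explicit enable/disable, pinning on every map */
ioctl(container_fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
ioctl(container_fd, VFIO_IOMMU_ENABLE);		/* accounts the whole window */
ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);	/* pins pages here */
/* ... */
ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
ioctl(container_fd, VFIO_IOMMU_DISABLE);

/* v2: preregistration required; enable/disable are not used */
struct vfio_iommu_spapr_register_memory reg = {
	.argsz = sizeof(reg),
	.flags = 0,
	.vaddr = (__u64)(unsigned long)buf,	/* page aligned */
	.size = buf_size,			/* page aligned */
};
ioctl(container_fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU);
ioctl(container_fd, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg); /* pin + locked_vm once */
ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);	/* TCE update only, no pinning */
/* ... */
ioctl(container_fd, VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY, &reg);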


>> +PPC64 paravirtualized guests generate a lot of map/unmap requests,
>> +and the handling of those includes pinning/unpinning pages and updating
>> +mm::locked_vm counter to make sure we do not exceed the rlimit.
>> +The v2 IOMMU splits accounting and pinning into separate operations:
>> +
>> +- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
>> +receive a user space address and size of the block to be pinned.
>> +Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to
>> +be called with the exact address and size used for registering
>> +the memory block. The userspace is not expected to call these often.
>> +The ranges are stored in a linked list in a VFIO container.
>> +
>> +- VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
>> +IOMMU table and do not do pinning; instead these check that the userspace
>> +address is from pre-registered range.
>> +
>> +This separation helps in optimizing DMA for guests.
>> +
>> -------------------------------------------------------------------------------
>>
>> [1] VFIO was originally an acronym for "Virtual Function I/O" in its
>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>> index 892a584..4cfc2c1 100644
>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>
> So, from things you said at other points, I thought the idea was that
> this registration stuff could also be used on non-Power IOMMUs. Did I
> misunderstand, or is that a possibility for the future?


I never said a thing about non-PPC :) I seriously doubt any other arch has
this hypervisor interface with H_PUT_TCE (maybe s390? :) ); for the others
there is no profit from memory preregistration as they (at least x86) map
the entire guest before it starts, which essentially is the same
preregistration.


btw later we may want to implement a simple IOMMU v3 which will do pinning
+ locked_vm accounting at map time as x86 does, for http://dpdk.org/ -
these things do not really have to bother with preregistration (even if it
is just a single additional ioctl).



>> @@ -21,6 +21,7 @@
>> #include <linux/vfio.h>
>> #include <asm/iommu.h>
>> #include <asm/tce.h>
>> +#include <asm/mmu_context.h>
>>
>> #define DRIVER_VERSION "0.1"
>> #define DRIVER_AUTHOR "[email protected]"
>> @@ -91,8 +92,58 @@ struct tce_container {
>> struct iommu_group *grp;
>> bool enabled;
>> unsigned long locked_pages;
>> + bool v2;
>> };
>>
>> +static long tce_unregister_pages(struct tce_container *container,
>> + __u64 vaddr, __u64 size)
>> +{
>> + long ret;
>> + struct mm_iommu_table_group_mem_t *mem;
>> +
>> + if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK))
>> + return -EINVAL;
>> +
>> + mem = mm_iommu_get(vaddr, size >> PAGE_SHIFT);
>> + if (!mem)
>> + return -EINVAL;
>> +
>> + ret = mm_iommu_put(mem); /* undo kref_get() from mm_iommu_get() */
>> + if (!ret)
>> + ret = mm_iommu_put(mem);
>> +
>> + return ret;
>> +}
>> +
>> +static long tce_register_pages(struct tce_container *container,
>> + __u64 vaddr, __u64 size)
>> +{
>> + long ret = 0;
>> + struct mm_iommu_table_group_mem_t *mem;
>> + unsigned long entries = size >> PAGE_SHIFT;
>> +
>> + if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK) ||
>> + ((vaddr + size) < vaddr))
>> + return -EINVAL;
>> +
>> + mem = mm_iommu_get(vaddr, entries);
>> + if (!mem) {
>> + ret = try_increment_locked_vm(entries);
>> + if (ret)
>> + return ret;
>> +
>> + ret = mm_iommu_alloc(vaddr, entries, &mem);
>> + if (ret) {
>> + decrement_locked_vm(entries);
>> + return ret;
>> + }
>> + }
>> +
>> + container->enabled = true;
>> +
>> + return 0;
>> +}
>
> So requiring that registered regions get unregistered with exactly the
> same addr/length is reasonable. I'm a bit less convinced that
> disallowing overlaps is a good idea. What if two libraries in the
> same process are trying to use VFIO - they may not know if the regions
> they try to register are overlapping.


Sorry, I do not understand. A library allocates RAM and is expected to
register it via the additional ioctl, that's it. Another library allocates
another chunk of memory; the chunks won't overlap and neither will the
registered areas.


>> static bool tce_page_is_contained(struct page *page, unsigned page_shift)
>> {
>> /*
>> @@ -205,7 +256,7 @@ static void *tce_iommu_open(unsigned long arg)
>> {
>> struct tce_container *container;
>>
>> - if (arg != VFIO_SPAPR_TCE_IOMMU) {
>> + if ((arg != VFIO_SPAPR_TCE_IOMMU) && (arg != VFIO_SPAPR_TCE_v2_IOMMU)) {
>> pr_err("tce_vfio: Wrong IOMMU type\n");
>> return ERR_PTR(-EINVAL);
>> }
>> @@ -215,6 +266,7 @@ static void *tce_iommu_open(unsigned long arg)
>> return ERR_PTR(-ENOMEM);
>>
>> mutex_init(&container->lock);
>> + container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;
>>
>> return container;
>> }
>> @@ -243,6 +295,47 @@ static void tce_iommu_unuse_page(struct tce_container *container,
>> put_page(page);
>> }
>>
>> +static int tce_iommu_use_page_v2(unsigned long tce, unsigned long size,
>> + unsigned long *phpa, struct mm_iommu_table_group_mem_t **pmem)


You suggested s/tce_get_hpa/tce_iommu_use_page/ but in this particular
patch it is confusing as tce_iommu_unuse_page_v2() calls it to find the
corresponding mm_iommu_table_group_mem_t by the userspace address of a
page that is no longer being used.

tce_iommu_use_page() (without v2) does actually use the page, but this one
I'll rename back to tce_iommu_ua_to_hpa_v2(), is that ok?


>> +{
>> + long ret = 0;
>> + struct mm_iommu_table_group_mem_t *mem;
>> +
>> + mem = mm_iommu_lookup(tce, size);
>> + if (!mem)
>> + return -EINVAL;
>> +
>> + ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
>> + if (ret)
>> + return -EINVAL;
>> +
>> + *pmem = mem;
>> +
>> + return 0;
>> +}
>> +
>> +static void tce_iommu_unuse_page_v2(struct iommu_table *tbl,
>> + unsigned long entry)
>> +{
>> + struct mm_iommu_table_group_mem_t *mem = NULL;
>> + int ret;
>> + unsigned long hpa = 0;
>> + unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
>> +
>> + if (!pua || !current || !current->mm)
>> + return;
>> +
>> + ret = tce_iommu_use_page_v2(*pua, IOMMU_PAGE_SIZE(tbl),
>> + &hpa, &mem);
>> + if (ret)
>> + pr_debug("%s: tce %lx at #%lx was not cached, ret=%d\n",
>> + __func__, *pua, entry, ret);
>> + if (mem)
>> + mm_iommu_mapped_update(mem, false);
>> +
>> + *pua = 0;
>> +}
>> +
>> static int tce_iommu_clear(struct tce_container *container,
>> struct iommu_table *tbl,
>> unsigned long entry, unsigned long pages)
>> @@ -261,6 +354,11 @@ static int tce_iommu_clear(struct tce_container *container,
>> if (direction == DMA_NONE)
>> continue;
>>
>> + if (container->v2) {
>> + tce_iommu_unuse_page_v2(tbl, entry);
>> + continue;
>> + }
>> +
>> tce_iommu_unuse_page(container, oldtce);
>> }
>>
>> @@ -327,6 +425,62 @@ static long tce_iommu_build(struct tce_container *container,
>> return ret;
>> }
>>
>> +static long tce_iommu_build_v2(struct tce_container *container,
>> + struct iommu_table *tbl,
>> + unsigned long entry, unsigned long tce, unsigned long pages,
>> + enum dma_data_direction direction)
>> +{
>> + long i, ret = 0;
>> + struct page *page;
>> + unsigned long hpa;
>> + enum dma_data_direction dirtmp;
>> +
>> + for (i = 0; i < pages; ++i) {
>> + struct mm_iommu_table_group_mem_t *mem = NULL;
>> + unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl,
>> + entry + i);
>> +
>> + ret = tce_iommu_use_page_v2(tce, IOMMU_PAGE_SIZE(tbl),
>> + &hpa, &mem);
>> + if (ret)
>> + break;
>> +
>> + page = pfn_to_page(hpa >> PAGE_SHIFT);
>> + if (!tce_page_is_contained(page, tbl->it_page_shift)) {
>> + ret = -EPERM;
>> + break;
>> + }
>> +
>> + /* Preserve offset within IOMMU page */
>> + hpa |= tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK;
>> + dirtmp = direction;
>> +
>> + ret = iommu_tce_xchg(tbl, entry + i, &hpa, &dirtmp);
>> + if (ret) {
>> + /* dirtmp cannot be DMA_NONE here */
>> + tce_iommu_unuse_page_v2(tbl, entry + i);
>> + pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%ld\n",
>> + __func__, entry << tbl->it_page_shift,
>> + tce, ret);
>> + break;
>> + }
>> +
>> + mm_iommu_mapped_update(mem, true);
>> +
>> + if (dirtmp != DMA_NONE)
>> + tce_iommu_unuse_page_v2(tbl, entry + i);
>> +
>> + *pua = tce;
>> +
>> + tce += IOMMU_PAGE_SIZE(tbl);
>> + }
>> +
>> + if (ret)
>> + tce_iommu_clear(container, tbl, entry, i);
>> +
>> + return ret;
>> +}
>> +
>> static long tce_iommu_ioctl(void *iommu_data,
>> unsigned int cmd, unsigned long arg)
>> {
>> @@ -338,6 +492,7 @@ static long tce_iommu_ioctl(void *iommu_data,
>> case VFIO_CHECK_EXTENSION:
>> switch (arg) {
>> case VFIO_SPAPR_TCE_IOMMU:
>> + case VFIO_SPAPR_TCE_v2_IOMMU:
>> ret = 1;
>> break;
>> default:
>> @@ -425,11 +580,18 @@ static long tce_iommu_ioctl(void *iommu_data,
>> if (ret)
>> return ret;
>>
>> - ret = tce_iommu_build(container, tbl,
>> - param.iova >> tbl->it_page_shift,
>> - param.vaddr,
>> - param.size >> tbl->it_page_shift,
>> - direction);
>> + if (container->v2)
>> + ret = tce_iommu_build_v2(container, tbl,
>> + param.iova >> tbl->it_page_shift,
>> + param.vaddr,
>> + param.size >> tbl->it_page_shift,
>> + direction);
>> + else
>> + ret = tce_iommu_build(container, tbl,
>> + param.iova >> tbl->it_page_shift,
>> + param.vaddr,
>> + param.size >> tbl->it_page_shift,
>> + direction);
>>
>> iommu_flush_tce(tbl);
>>
>> @@ -474,7 +636,60 @@ static long tce_iommu_ioctl(void *iommu_data,
>>
>> return ret;
>> }
>> + case VFIO_IOMMU_SPAPR_REGISTER_MEMORY: {
>> + struct vfio_iommu_spapr_register_memory param;
>> +
>> + if (!container->v2)
>> + break;
>> +
>> + minsz = offsetofend(struct vfio_iommu_spapr_register_memory,
>> + size);
>> +
>> + if (copy_from_user(&param, (void __user *)arg, minsz))
>> + return -EFAULT;
>> +
>> + if (param.argsz < minsz)
>> + return -EINVAL;
>> +
>> + /* No flag is supported now */
>> + if (param.flags)
>> + return -EINVAL;
>> +
>> + mutex_lock(&container->lock);
>> + ret = tce_register_pages(container, param.vaddr, param.size);
>> + mutex_unlock(&container->lock);
>
> AFAICT, this is the only call to tce_register_pages(), so why not put
> the mutex into the function.

1) I can use "return" in tce_register_pages() instead of "goto
unlock_exit". Convenient.

2) I keep mutex_lock()/mutex_unlock() in the immediate
vfio_iommu_driver_ops callbacks (i.e. tce_iommu_ioctl,
tce_iommu_attach_group, tce_iommu_detach_group) and do not spread them all
over the file, which I find easier to track, no?


>> +
>> + return ret;
>> + }
>> + case VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY: {
>> + struct vfio_iommu_spapr_register_memory param;
>> +
>> + if (!container->v2)
>> + break;
>> +
>> + minsz = offsetofend(struct vfio_iommu_spapr_register_memory,
>> + size);
>> +
>> + if (copy_from_user(&param, (void __user *)arg, minsz))
>> + return -EFAULT;
>> +
>> + if (param.argsz < minsz)
>> + return -EINVAL;
>> +
>> + /* No flag is supported now */
>> + if (param.flags)
>> + return -EINVAL;
>> +
>> + mutex_lock(&container->lock);
>> + tce_unregister_pages(container, param.vaddr, param.size);
>> + mutex_unlock(&container->lock);
>> +
>> + return 0;
>> + }
>> case VFIO_IOMMU_ENABLE:
>> + if (container->v2)
>> + break;
>> +
>> mutex_lock(&container->lock);
>> ret = tce_iommu_enable(container);
>> mutex_unlock(&container->lock);
>> @@ -482,6 +697,9 @@ static long tce_iommu_ioctl(void *iommu_data,
>>
>>
>> case VFIO_IOMMU_DISABLE:
>> + if (container->v2)
>> + break;
>> +
>> mutex_lock(&container->lock);
>> tce_iommu_disable(container);
>> mutex_unlock(&container->lock);
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index b57b750..8fdcfb9 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -36,6 +36,8 @@
>> /* Two-stage IOMMU */
>> #define VFIO_TYPE1_NESTING_IOMMU 6 /* Implies v2 */
>>
>> +#define VFIO_SPAPR_TCE_v2_IOMMU 7
>> +
>> /*
>> * The IOCTL interface is designed for extensibility by embedding the
>> * structure length (argsz) and flags into structures passed between
>> @@ -495,6 +497,31 @@ struct vfio_eeh_pe_op {
>>
>> #define VFIO_EEH_PE_OP _IO(VFIO_TYPE, VFIO_BASE + 21)
>>
>> +/**
>> + * VFIO_IOMMU_SPAPR_REGISTER_MEMORY - _IOW(VFIO_TYPE, VFIO_BASE + 17, struct vfio_iommu_spapr_register_memory)
>> + *
>> + * Registers user space memory where DMA is allowed. It pins
>> + * user pages and does the locked memory accounting so
>> + * subsequent VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA calls
>> + * get faster.
>> + */
>> +struct vfio_iommu_spapr_register_memory {
>> + __u32 argsz;
>> + __u32 flags;
>> + __u64 vaddr; /* Process virtual address */
>> + __u64 size; /* Size of mapping (bytes) */
>> +};
>> +#define VFIO_IOMMU_SPAPR_REGISTER_MEMORY _IO(VFIO_TYPE, VFIO_BASE + 17)
>> +
>> +/**
>> + * VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY - _IOW(VFIO_TYPE, VFIO_BASE + 18, struct vfio_iommu_spapr_register_memory)
>> + *
>> + * Unregisters user space memory registered with
>> + * VFIO_IOMMU_SPAPR_REGISTER_MEMORY.
>> + * Uses vfio_iommu_spapr_register_memory for parameters.
>> + */
>> +#define VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY _IO(VFIO_TYPE, VFIO_BASE + 18)
>> +
>> /* ***************************************************************** */
>>
>> #endif /* _UAPIVFIO_H */
>


--
Alexey

2015-05-01 04:54:00

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

On Fri, May 01, 2015 at 10:46:08AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-04-30 at 19:33 +1000, Alexey Kardashevskiy wrote:
> > On 04/30/2015 05:22 PM, David Gibson wrote:
> > > On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
> > >> At the moment only one group per container is supported.
> > >> POWER8 CPUs have more flexible design and allows naving 2 TCE tables per
> > >> IOMMU group so we can relax this limitation and support multiple groups
> > >> per container.
> > >
> > > It's not obvious why allowing multiple TCE tables per PE has any
> > > pearing on allowing multiple groups per container.
> >
> >
> > This patchset is a global TCE tables rework (patches 1..30, roughly) with 2
> > outcomes:
> > 1. reusing the same IOMMU table for multiple groups - patch 31;
> > 2. allowing dynamic create/remove of IOMMU tables - patch 32.
> >
> > I can remove this one from the patchset and post it separately later but
> > since 1..30 aim to support both 1) and 2), I'd think I better keep them all
> > together (might explain some of changes I do in 1..30).
>
> I think you are talking past each other :-)
>
> But yes, having 2 tables per group is orthogonal to the ability of
> having multiple groups per container.
>
> The latter is made possible on P8 in large part because each PE has its
> own DMA address space (unlike P5IOC2 or P7IOC where a single address
> space is segmented).
>
> Also, on P8 you can actually make the TVT entries point to the same
> table in memory, thus removing the need to duplicate the actual
> tables (though you still have to duplicate the invalidations). I would
> however recommend only sharing the table that way within a chip/node.
>
> .../..
>
> > >>
> > >> -1) Only one IOMMU group per container is supported as an IOMMU group
> > >> -represents the minimal entity which isolation can be guaranteed for and
> > >> -groups are allocated statically, one per a Partitionable Endpoint (PE)
> > >> +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
> > >> +container is supported as an IOMMU table is allocated at the boot time,
> > >> +one table per a IOMMU group which is a Partitionable Endpoint (PE)
> > >> (PE is often a PCI domain but not always).
>
> > > I thought the more fundamental problem was that different PEs tended
> > > to use disjoint bus address ranges, so even by duplicating put_tce
> > > across PEs you couldn't have a common address space.
>
> Yes. This is the problem with P7IOC and earlier. It *could* be doable on
> P7IOC by making them the same PE but let's not go there.
>
> > Sorry, I am not following you here.
> >
> > By duplicating put_tce, I can have multiple IOMMU groups on the same
> > virtual PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple
> > groups per container" does this, the address ranges will the same.
>
> But that is only possible on P8 because only there do we have separate
> address spaces between PEs.
>
> > What I cannot do on p5ioc2 is programming the same table to multiple
> > physical PHBs (or I could but it is very different than IODA2 and pretty
> > ugly and might not always be possible because I would have to allocate
> > these pages from some common pool and face problems like fragmentation).
>
> And P7IOC has a similar issue. The DMA address top bits indexes the
> window on P7IOC within a shared address space. It's possible to
> configure a TVT to cover multiple devices but with very serious
> limitations.

Ok. To check my understanding, does this sound reasonable (rough sketch
below):

* The table_group more-or-less represents a PE, but in a way you can
reference without first knowing the specific IOMMU hardware type.

* When attaching multiple groups to the same container, the first PE
(i.e. table_group) attached is used as a representative so that
subsequent groups can be checked for compatibility with the first
PE and therefore all PEs currently included in the container

- This is why the table_group appears in some places where it
doesn't seem sensible from a pure object ownership point of
view
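
Something like this, in pseudo-C (invented helper name
table_groups_compatible(), locking and the actual ownership transfer
omitted), just to check I have the shape right:

static int tce_iommu_attach_group(void *iommu_data, struct iommu_group *grp)
{
	struct tce_container *container = iommu_data;
	struct iommu_table_group *table_group = iommu_group_get_iommudata(grp);
	struct tce_iommu_group *first, *tcegrp;

	if (tce_groups_attached(container)) {
		/* the first attached group acts as the representative PE */
		first = list_first_entry(&container->group_list,
				struct tce_iommu_group, next);
		if (!table_groups_compatible(
				iommu_group_get_iommudata(first->grp),
				table_group))
			return -EBUSY;
	}

	tcegrp = kzalloc(sizeof(*tcegrp), GFP_KERNEL);
	if (!tcegrp)
		return -ENOMEM;

	tcegrp->grp = grp;
	list_add(&tcegrp->next, &container->group_list);

	return 0;
}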

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-05-01 05:24:11

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table

On Fri, May 01, 2015 at 02:10:58PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 04:40 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:51PM +1000, Alexey Kardashevskiy wrote:
> >>This adds a way for the IOMMU user to know how much a new table will
> >>use so it can be accounted in the locked_vm limit before allocation
> >>happens.
> >>
> >>This stores the allocated table size in pnv_pci_create_table()
> >>so the locked_vm counter can be updated correctly when a table is
> >>being disposed.
> >>
> >>This defines an iommu_table_group_ops callback to let VFIO know
> >>how much memory will be locked if a table is created.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>---
> >>Changes:
> >>v9:
> >>* reimplemented the whole patch
> >>---
> >> arch/powerpc/include/asm/iommu.h | 5 +++++
> >> arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++++++++
> >> arch/powerpc/platforms/powernv/pci.c | 36 +++++++++++++++++++++++++++++++
> >> arch/powerpc/platforms/powernv/pci.h | 2 ++
> >> 4 files changed, 57 insertions(+)
> >>
> >>diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >>index 1472de3..9844c106 100644
> >>--- a/arch/powerpc/include/asm/iommu.h
> >>+++ b/arch/powerpc/include/asm/iommu.h
> >>@@ -99,6 +99,7 @@ struct iommu_table {
> >> unsigned long it_size; /* Size of iommu table in entries */
> >> unsigned long it_indirect_levels;
> >> unsigned long it_level_size;
> >>+ unsigned long it_allocated_size;
> >> unsigned long it_offset; /* Offset into global table */
> >> unsigned long it_base; /* mapped address of tce table */
> >> unsigned long it_index; /* which iommu table this is */
> >>@@ -155,6 +156,10 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
> >> struct iommu_table_group;
> >>
> >> struct iommu_table_group_ops {
> >>+ unsigned long (*get_table_size)(
> >>+ __u32 page_shift,
> >>+ __u64 window_size,
> >>+ __u32 levels);
> >> long (*create_table)(struct iommu_table_group *table_group,
> >> int num,
> >> __u32 page_shift,
> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>index e0be556..7f548b4 100644
> >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>@@ -2062,6 +2062,18 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
> >> }
> >>
> >> #ifdef CONFIG_IOMMU_API
> >>+static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift,
> >>+ __u64 window_size, __u32 levels)
> >>+{
> >>+ unsigned long ret = pnv_get_table_size(page_shift, window_size, levels);
> >>+
> >>+ if (!ret)
> >>+ return ret;
> >>+
> >>+ /* Add size of it_userspace */
> >>+ return ret + (window_size >> page_shift) * sizeof(unsigned long);
> >
> >This doesn't make much sense. The userspace view can't possibly be a
> >property of the specific low-level IOMMU model.
>
>
> This it_userspace thing is all about memory preregistration.
>
> I need some way to track how many actual mappings the
> mm_iommu_table_group_mem_t has in order to decide whether to allow
> unregistering or not.
>
> When I clear TCE, I can read the old value which is host physical address
> which I cannot use to find the preregistered region and adjust the mappings
> counter; I can only use userspace addresses for this (not even guest
> physical addresses as it is VFIO and probably no KVM).
>
> So I have to keep userspace addresses somewhere, one per IOMMU page, and the
> iommu_table seems a natural place for this.

Well... sort of. But as noted elsewhere this pulls VFIO-specific
constraints into a platform code structure. And whether you get this
table depends on the platform IOMMU type rather than on what VFIO
wants to do with it, which doesn't make sense.

What might make more sense is an opaque pointer in iommu_table for use
by the table "owner" (in the take_ownership sense). The pointer would
be stored in iommu_table, but VFIO is responsible for populating and
managing its contents.

Or you could just put the userspace mappings in the container.
Although you might want a different data structure in that case.

The other thing to bear in mind is that registered regions are likely
to be large contiguous blocks in user addresses, though obviously not
contiguous in physical addresses. So you might be able to compactify this
information by storing it as a list of variable-length blocks in
userspace address space, rather than a per-page address.
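
Something along these lines, maybe (totally untested, all names invented):

/* one entry per registered block, contiguous in userspace addresses */
struct tce_ua_range {
	struct list_head list;
	unsigned long entry;	/* first TCE entry covered */
	unsigned long npages;	/* number of IOMMU pages in the block */
	unsigned long ua;	/* userspace address backing the first entry */
};

/* find the userspace address backing a given TCE entry */
static unsigned long tce_entry_to_ua(struct list_head *ranges,
		unsigned long entry, unsigned int shift)
{
	struct tce_ua_range *r;

	list_for_each_entry(r, ranges, list)
		if (entry >= r->entry && entry < r->entry + r->npages)
			return r->ua + ((entry - r->entry) << shift);

	return 0;
}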



But... isn't there a bigger problem here? As Paulus was pointing out,
there's nothing guaranteeing the page tables continue to contain the
same page as was there at gup() time.

What's going to happen if you REGISTER a memory region, then mremap()
over it? Then attempt to PUT_TCE a page in the region? Or what if you
mremap() it to someplace else then try to PUT_TCE a page there? Or
REGISTER it again in its new location?

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-05-01 05:24:24

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2

On Fri, May 01, 2015 at 02:35:23PM +1000, Alexey Kardashevskiy wrote:
> On 04/30/2015 04:55 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:53PM +1000, Alexey Kardashevskiy wrote:
> >>The existing implementation accounts the whole DMA window in
> >>the locked_vm counter. This is going to be worse with multiple
> >>containers and huge DMA windows. Also, real-time accounting would requite
> >>additional tracking of accounted pages due to the page size difference -
> >>IOMMU uses 4K pages and system uses 4K or 64K pages.
> >>
> >>Another issue is that actual pages pinning/unpinning happens on every
> >>DMA map/unmap request. This does not affect the performance much now as
> >>we spend way too much time now on switching context between
> >>guest/userspace/host but this will start to matter when we add in-kernel
> >>DMA map/unmap acceleration.
> >>
> >>This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
> >>New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
> >>2 new ioctls to register/unregister DMA memory -
> >>VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
> >>which receive user space address and size of a memory region which
> >>needs to be pinned/unpinned and counted in locked_vm.
> >>New IOMMU splits physical pages pinning and TCE table update into 2 different
> >>operations. It requires 1) guest pages to be registered first 2) consequent
> >>map/unmap requests to work only with pre-registered memory.
> >>For the default single window case this means that the entire guest
> >>(instead of 2GB) needs to be pinned before using VFIO.
> >>When a huge DMA window is added, no additional pinning will be
> >>required, otherwise it would be guest RAM + 2GB.
> >>
> >>The new memory registration ioctls are not supported by
> >>VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
> >>will require memory to be preregistered in order to work.
> >>
> >>The accounting is done per the user process.
> >>
> >>This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
> >>can do with v1 or v2 IOMMUs.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>[aw: for the vfio related changes]
> >>Acked-by: Alex Williamson <[email protected]>
> >>---
> >>Changes:
> >>v9:
> >>* s/tce_get_hva_cached/tce_iommu_use_page_v2/
> >>
> >>v7:
> >>* now memory is registered per mm (i.e. process)
> >>* moved memory registration code to powerpc/mmu
> >>* merged "vfio: powerpc/spapr: Define v2 IOMMU" into this
> >>* limited new ioctls to v2 IOMMU
> >>* updated doc
> >>* unsupported ioclts return -ENOTTY instead of -EPERM
> >>
> >>v6:
> >>* tce_get_hva_cached() returns hva via a pointer
> >>
> >>v4:
> >>* updated docs
> >>* s/kzmalloc/vzalloc/
> >>* in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
> >>replaced offset with index
> >>* renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory
> >>and removed duplicating vfio_iommu_spapr_register_memory
> >>---
> >> Documentation/vfio.txt | 23 ++++
> >> drivers/vfio/vfio_iommu_spapr_tce.c | 230 +++++++++++++++++++++++++++++++++++-
> >> include/uapi/linux/vfio.h | 27 +++++
> >> 3 files changed, 274 insertions(+), 6 deletions(-)
> >>
> >>diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> >>index 96978ec..94328c8 100644
> >>--- a/Documentation/vfio.txt
> >>+++ b/Documentation/vfio.txt
> >>@@ -427,6 +427,29 @@ The code flow from the example above should be slightly changed:
> >>
> >> ....
> >>
> >>+5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
> >>+VFIO_IOMMU_DISABLE and implements 2 new ioctls:
> >>+VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
> >>+(which are unsupported in v1 IOMMU).
> >
> >A summary of the semantic differeces between v1 and v2 would be nice.
> >At this point it's not really clear to me if there's a case for
> >creating v2, or if this could just be done by adding (optional)
> >functionality to v1.
>
> v1: memory preregistration is not supported; explicit enable/disable ioctls
> are required
>
> v2: memory preregistration is required; explicit enable/disable are
> prohibited (as they are not needed).
>
> Mixing these in one IOMMU type caused a lot of problems like should I
> increment locked_vm by the 32bit window size on enable() or not; what do I
> do about pages pinning when map/map (check if it is from registered memory
> and do not pin?).
>
> Having 2 IOMMU models makes everything a lot simpler.

Ok. Would it simplify it further if you made v2 only usable on IODA2
hardware?

> >>+PPC64 paravirtualized guests generate a lot of map/unmap requests,
> >>+and the handling of those includes pinning/unpinning pages and updating
> >>+mm::locked_vm counter to make sure we do not exceed the rlimit.
> >>+The v2 IOMMU splits accounting and pinning into separate operations:
> >>+
> >>+- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
> >>+receive a user space address and size of the block to be pinned.
> >>+Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to
> >>+be called with the exact address and size used for registering
> >>+the memory block. The userspace is not expected to call these often.
> >>+The ranges are stored in a linked list in a VFIO container.
> >>+
> >>+- VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
> >>+IOMMU table and do not do pinning; instead these check that the userspace
> >>+address is from pre-registered range.
> >>+
> >>+This separation helps in optimizing DMA for guests.
> >>+
> >> -------------------------------------------------------------------------------
> >>
> >> [1] VFIO was originally an acronym for "Virtual Function I/O" in its
> >>diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>index 892a584..4cfc2c1 100644
> >>--- a/drivers/vfio/vfio_iommu_spapr_tce.c
> >>+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >
> >So, from things you said at other points, I thought the idea was that
> >this registration stuff could also be used on non-Power IOMMUs. Did I
> >misunderstand, or is that a possibility for the future?
>
>
> I never said a thing about non-PPC :) I seriously doubt any other arch has
> this hypervisor interface with H_PUT_TCE (may be s390? :) ); for others
> there is no profit from memory preregistration as they (at least x86) do map
> the entire guest before it starts which essentially is that preregistration.
>
>
> btw later we may want to implement simple IOMMU v3 which will do pinning +
> locked_vm when mapping as x86 does, for http://dpdk.org/ - these things do
> not really have to bother with preregistration (even if it just a single
> additional ioctl).
>
>
>
> >>@@ -21,6 +21,7 @@
> >> #include <linux/vfio.h>
> >> #include <asm/iommu.h>
> >> #include <asm/tce.h>
> >>+#include <asm/mmu_context.h>
> >>
> >> #define DRIVER_VERSION "0.1"
> >> #define DRIVER_AUTHOR "[email protected]"
> >>@@ -91,8 +92,58 @@ struct tce_container {
> >> struct iommu_group *grp;
> >> bool enabled;
> >> unsigned long locked_pages;
> >>+ bool v2;
> >> };
> >>
> >>+static long tce_unregister_pages(struct tce_container *container,
> >>+ __u64 vaddr, __u64 size)
> >>+{
> >>+ long ret;
> >>+ struct mm_iommu_table_group_mem_t *mem;
> >>+
> >>+ if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK))
> >>+ return -EINVAL;
> >>+
> >>+ mem = mm_iommu_get(vaddr, size >> PAGE_SHIFT);
> >>+ if (!mem)
> >>+ return -EINVAL;
> >>+
> >>+ ret = mm_iommu_put(mem); /* undo kref_get() from mm_iommu_get() */
> >>+ if (!ret)
> >>+ ret = mm_iommu_put(mem);
> >>+
> >>+ return ret;
> >>+}
> >>+
> >>+static long tce_register_pages(struct tce_container *container,
> >>+ __u64 vaddr, __u64 size)
> >>+{
> >>+ long ret = 0;
> >>+ struct mm_iommu_table_group_mem_t *mem;
> >>+ unsigned long entries = size >> PAGE_SHIFT;
> >>+
> >>+ if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK) ||
> >>+ ((vaddr + size) < vaddr))
> >>+ return -EINVAL;
> >>+
> >>+ mem = mm_iommu_get(vaddr, entries);
> >>+ if (!mem) {
> >>+ ret = try_increment_locked_vm(entries);
> >>+ if (ret)
> >>+ return ret;
> >>+
> >>+ ret = mm_iommu_alloc(vaddr, entries, &mem);
> >>+ if (ret) {
> >>+ decrement_locked_vm(entries);
> >>+ return ret;
> >>+ }
> >>+ }
> >>+
> >>+ container->enabled = true;
> >>+
> >>+ return 0;
> >>+}
> >
> >So requiring that registered regions get unregistered with exactly the
> >same addr/length is reasonable. I'm a bit less convinced that
> >disallowing overlaps is a good idea. What if two libraries in the
> >same process are trying to use VFIO - they may not know if the regions
> >they try to register are overlapping.
>
>
> Sorry, I do not understand. A library allocates RAM. A library is expected
> to do register it via additional ioctl, that's it. Another library allocates
> another chunk of memory and it won't overlap and the registered areas won't
> either.

So the case I'm thinking of is where a library does VFIO using a buffer
passed into it from the program at large. Another library does the
same.

The main program, unaware of the VFIO shenanigans, passes different
parts of the same page to the 2 libraries.

This is somewhat similar to the case of the horribly, horribly broken
semantics of POSIX file range locks (they're both hard to implement and
dangerous in the multi-library case similar to the above).

>
>
> >> static bool tce_page_is_contained(struct page *page, unsigned page_shift)
> >> {
> >> /*
> >>@@ -205,7 +256,7 @@ static void *tce_iommu_open(unsigned long arg)
> >> {
> >> struct tce_container *container;
> >>
> >>- if (arg != VFIO_SPAPR_TCE_IOMMU) {
> >>+ if ((arg != VFIO_SPAPR_TCE_IOMMU) && (arg != VFIO_SPAPR_TCE_v2_IOMMU)) {
> >> pr_err("tce_vfio: Wrong IOMMU type\n");
> >> return ERR_PTR(-EINVAL);
> >> }
> >>@@ -215,6 +266,7 @@ static void *tce_iommu_open(unsigned long arg)
> >> return ERR_PTR(-ENOMEM);
> >>
> >> mutex_init(&container->lock);
> >>+ container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;
> >>
> >> return container;
> >> }
> >>@@ -243,6 +295,47 @@ static void tce_iommu_unuse_page(struct tce_container *container,
> >> put_page(page);
> >> }
> >>
> >>+static int tce_iommu_use_page_v2(unsigned long tce, unsigned long size,
> >>+ unsigned long *phpa, struct mm_iommu_table_group_mem_t **pmem)
>
>
> You suggested s/tce_get_hpa/tce_iommu_use_page/ but in this particular patch
> it is confusing as tce_iommu_unuse_page_v2() calls it to find corresponding
> mm_iommu_table_group_mem_t by the userspace address address of a page being
> stopped used.
>
> tce_iommu_use_page (without v2) does use the page but this one I'll rename
> back to tce_iommu_ua_to_hpa_v2(), is that ok?

Sorry, I couldn't follow this comment.

>
>
> >>+{
> >>+ long ret = 0;
> >>+ struct mm_iommu_table_group_mem_t *mem;
> >>+
> >>+ mem = mm_iommu_lookup(tce, size);
> >>+ if (!mem)
> >>+ return -EINVAL;
> >>+
> >>+ ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
> >>+ if (ret)
> >>+ return -EINVAL;
> >>+
> >>+ *pmem = mem;
> >>+
> >>+ return 0;
> >>+}
> >>+
> >>+static void tce_iommu_unuse_page_v2(struct iommu_table *tbl,
> >>+ unsigned long entry)
> >>+{
> >>+ struct mm_iommu_table_group_mem_t *mem = NULL;
> >>+ int ret;
> >>+ unsigned long hpa = 0;
> >>+ unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
> >>+
> >>+ if (!pua || !current || !current->mm)
> >>+ return;
> >>+
> >>+ ret = tce_iommu_use_page_v2(*pua, IOMMU_PAGE_SIZE(tbl),
> >>+ &hpa, &mem);
> >>+ if (ret)
> >>+ pr_debug("%s: tce %lx at #%lx was not cached, ret=%d\n",
> >>+ __func__, *pua, entry, ret);
> >>+ if (mem)
> >>+ mm_iommu_mapped_update(mem, false);
> >>+
> >>+ *pua = 0;
> >>+}
> >>+
> >> static int tce_iommu_clear(struct tce_container *container,
> >> struct iommu_table *tbl,
> >> unsigned long entry, unsigned long pages)
> >>@@ -261,6 +354,11 @@ static int tce_iommu_clear(struct tce_container *container,
> >> if (direction == DMA_NONE)
> >> continue;
> >>
> >>+ if (container->v2) {
> >>+ tce_iommu_unuse_page_v2(tbl, entry);
> >>+ continue;
> >>+ }
> >>+
> >> tce_iommu_unuse_page(container, oldtce);
> >> }
> >>
> >>@@ -327,6 +425,62 @@ static long tce_iommu_build(struct tce_container *container,
> >> return ret;
> >> }
> >>
> >>+static long tce_iommu_build_v2(struct tce_container *container,
> >>+ struct iommu_table *tbl,
> >>+ unsigned long entry, unsigned long tce, unsigned long pages,
> >>+ enum dma_data_direction direction)
> >>+{
> >>+ long i, ret = 0;
> >>+ struct page *page;
> >>+ unsigned long hpa;
> >>+ enum dma_data_direction dirtmp;
> >>+
> >>+ for (i = 0; i < pages; ++i) {
> >>+ struct mm_iommu_table_group_mem_t *mem = NULL;
> >>+ unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl,
> >>+ entry + i);
> >>+
> >>+ ret = tce_iommu_use_page_v2(tce, IOMMU_PAGE_SIZE(tbl),
> >>+ &hpa, &mem);
> >>+ if (ret)
> >>+ break;
> >>+
> >>+ page = pfn_to_page(hpa >> PAGE_SHIFT);
> >>+ if (!tce_page_is_contained(page, tbl->it_page_shift)) {
> >>+ ret = -EPERM;
> >>+ break;
> >>+ }
> >>+
> >>+ /* Preserve offset within IOMMU page */
> >>+ hpa |= tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK;
> >>+ dirtmp = direction;
> >>+
> >>+ ret = iommu_tce_xchg(tbl, entry + i, &hpa, &dirtmp);
> >>+ if (ret) {
> >>+ /* dirtmp cannot be DMA_NONE here */
> >>+ tce_iommu_unuse_page_v2(tbl, entry + i);
> >>+ pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%ld\n",
> >>+ __func__, entry << tbl->it_page_shift,
> >>+ tce, ret);
> >>+ break;
> >>+ }
> >>+
> >>+ mm_iommu_mapped_update(mem, true);
> >>+
> >>+ if (dirtmp != DMA_NONE)
> >>+ tce_iommu_unuse_page_v2(tbl, entry + i);
> >>+
> >>+ *pua = tce;
> >>+
> >>+ tce += IOMMU_PAGE_SIZE(tbl);
> >>+ }
> >>+
> >>+ if (ret)
> >>+ tce_iommu_clear(container, tbl, entry, i);
> >>+
> >>+ return ret;
> >>+}
> >>+
> >> static long tce_iommu_ioctl(void *iommu_data,
> >> unsigned int cmd, unsigned long arg)
> >> {
> >>@@ -338,6 +492,7 @@ static long tce_iommu_ioctl(void *iommu_data,
> >> case VFIO_CHECK_EXTENSION:
> >> switch (arg) {
> >> case VFIO_SPAPR_TCE_IOMMU:
> >>+ case VFIO_SPAPR_TCE_v2_IOMMU:
> >> ret = 1;
> >> break;
> >> default:
> >>@@ -425,11 +580,18 @@ static long tce_iommu_ioctl(void *iommu_data,
> >> if (ret)
> >> return ret;
> >>
> >>- ret = tce_iommu_build(container, tbl,
> >>- param.iova >> tbl->it_page_shift,
> >>- param.vaddr,
> >>- param.size >> tbl->it_page_shift,
> >>- direction);
> >>+ if (container->v2)
> >>+ ret = tce_iommu_build_v2(container, tbl,
> >>+ param.iova >> tbl->it_page_shift,
> >>+ param.vaddr,
> >>+ param.size >> tbl->it_page_shift,
> >>+ direction);
> >>+ else
> >>+ ret = tce_iommu_build(container, tbl,
> >>+ param.iova >> tbl->it_page_shift,
> >>+ param.vaddr,
> >>+ param.size >> tbl->it_page_shift,
> >>+ direction);
> >>
> >> iommu_flush_tce(tbl);
> >>
> >>@@ -474,7 +636,60 @@ static long tce_iommu_ioctl(void *iommu_data,
> >>
> >> return ret;
> >> }
> >>+ case VFIO_IOMMU_SPAPR_REGISTER_MEMORY: {
> >>+ struct vfio_iommu_spapr_register_memory param;
> >>+
> >>+ if (!container->v2)
> >>+ break;
> >>+
> >>+ minsz = offsetofend(struct vfio_iommu_spapr_register_memory,
> >>+ size);
> >>+
> >>+ if (copy_from_user(&param, (void __user *)arg, minsz))
> >>+ return -EFAULT;
> >>+
> >>+ if (param.argsz < minsz)
> >>+ return -EINVAL;
> >>+
> >>+ /* No flag is supported now */
> >>+ if (param.flags)
> >>+ return -EINVAL;
> >>+
> >>+ mutex_lock(&container->lock);
> >>+ ret = tce_register_pages(container, param.vaddr, param.size);
> >>+ mutex_unlock(&container->lock);
> >
> >AFAICT, this is the only call to tce_register_pages(), so why not put
> >the mutex into the function.
>
> 1) I can use "return" in tce_register_pages() instead of "goto unlock_exit".
> Convinient.
>
> 2) I keep mutex_lock()/mutex_unlock() in immediate vfio_iommu_driver_ops
> callbacks (i.e. tce_iommu_ioctl, tce_iommu_attach_group,
> tce_iommu_detach_group) and do not spread them all over the file which I
> find easier to track, no?

Yeah, fair enough.

> >>+
> >>+ return ret;
> >>+ }
> >>+ case VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY: {
> >>+ struct vfio_iommu_spapr_register_memory param;
> >>+
> >>+ if (!container->v2)
> >>+ break;
> >>+
> >>+ minsz = offsetofend(struct vfio_iommu_spapr_register_memory,
> >>+ size);
> >>+
> >>+ if (copy_from_user(&param, (void __user *)arg, minsz))
> >>+ return -EFAULT;
> >>+
> >>+ if (param.argsz < minsz)
> >>+ return -EINVAL;
> >>+
> >>+ /* No flag is supported now */
> >>+ if (param.flags)
> >>+ return -EINVAL;
> >>+
> >>+ mutex_lock(&container->lock);
> >>+ tce_unregister_pages(container, param.vaddr, param.size);
> >>+ mutex_unlock(&container->lock);
> >>+
> >>+ return 0;
> >>+ }
> >> case VFIO_IOMMU_ENABLE:
> >>+ if (container->v2)
> >>+ break;
> >>+
> >> mutex_lock(&container->lock);
> >> ret = tce_iommu_enable(container);
> >> mutex_unlock(&container->lock);
> >>@@ -482,6 +697,9 @@ static long tce_iommu_ioctl(void *iommu_data,
> >>
> >>
> >> case VFIO_IOMMU_DISABLE:
> >>+ if (container->v2)
> >>+ break;
> >>+
> >> mutex_lock(&container->lock);
> >> tce_iommu_disable(container);
> >> mutex_unlock(&container->lock);
> >>diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>index b57b750..8fdcfb9 100644
> >>--- a/include/uapi/linux/vfio.h
> >>+++ b/include/uapi/linux/vfio.h
> >>@@ -36,6 +36,8 @@
> >> /* Two-stage IOMMU */
> >> #define VFIO_TYPE1_NESTING_IOMMU 6 /* Implies v2 */
> >>
> >>+#define VFIO_SPAPR_TCE_v2_IOMMU 7
> >>+
> >> /*
> >> * The IOCTL interface is designed for extensibility by embedding the
> >> * structure length (argsz) and flags into structures passed between
> >>@@ -495,6 +497,31 @@ struct vfio_eeh_pe_op {
> >>
> >> #define VFIO_EEH_PE_OP _IO(VFIO_TYPE, VFIO_BASE + 21)
> >>
> >>+/**
> >>+ * VFIO_IOMMU_SPAPR_REGISTER_MEMORY - _IOW(VFIO_TYPE, VFIO_BASE + 17, struct vfio_iommu_spapr_register_memory)
> >>+ *
> >>+ * Registers user space memory where DMA is allowed. It pins
> >>+ * user pages and does the locked memory accounting so
> >>+ * subsequent VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA calls
> >>+ * get faster.
> >>+ */
> >>+struct vfio_iommu_spapr_register_memory {
> >>+ __u32 argsz;
> >>+ __u32 flags;
> >>+ __u64 vaddr; /* Process virtual address */
> >>+ __u64 size; /* Size of mapping (bytes) */
> >>+};
> >>+#define VFIO_IOMMU_SPAPR_REGISTER_MEMORY _IO(VFIO_TYPE, VFIO_BASE + 17)
> >>+
> >>+/**
> >>+ * VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY - _IOW(VFIO_TYPE, VFIO_BASE + 18, struct vfio_iommu_spapr_register_memory)
> >>+ *
> >>+ * Unregisters user space memory registered with
> >>+ * VFIO_IOMMU_SPAPR_REGISTER_MEMORY.
> >>+ * Uses vfio_iommu_spapr_register_memory for parameters.
> >>+ */
> >>+#define VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY _IO(VFIO_TYPE, VFIO_BASE + 18)
> >>+
> >> /* ***************************************************************** */
> >>
> >> #endif /* _UAPIVFIO_H */
> >
>
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-05-01 06:05:34

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

On 05/01/2015 02:33 PM, David Gibson wrote:
> On Thu, Apr 30, 2015 at 07:33:09PM +1000, Alexey Kardashevskiy wrote:
>> On 04/30/2015 05:22 PM, David Gibson wrote:
>>> On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
>>>> At the moment only one group per container is supported.
>>>> POWER8 CPUs have more flexible design and allows naving 2 TCE tables per
>>>> IOMMU group so we can relax this limitation and support multiple groups
>>>> per container.
>>>
>>> It's not obvious why allowing multiple TCE tables per PE has any
>>> pearing on allowing multiple groups per container.
>>
>>
>> This patchset is a global TCE tables rework (patches 1..30, roughly) with 2
>> outcomes:
>> 1. reusing the same IOMMU table for multiple groups - patch 31;
>> 2. allowing dynamic create/remove of IOMMU tables - patch 32.
>>
>> I can remove this one from the patchset and post it separately later but
>> since 1..30 aim to support both 1) and 2), I'd think I better keep them all
>> together (might explain some of changes I do in 1..30).
>
> The combined patchset is fine. My comment is because your commit
> message says that multiple groups are possible *because* 2 TCE tables
> per group are allowed, and it's not at all clear why one follows from
> the other.


Ah. That's wrong indeed, I'll fix it.


>>>> This adds TCE table descriptors to a container and uses iommu_table_group_ops
>>>> to create/set DMA windows on IOMMU groups so the same TCE tables will be
>>>> shared between several IOMMU groups.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>> [aw: for the vfio related changes]
>>>> Acked-by: Alex Williamson <[email protected]>
>>>> ---
>>>> Changes:
>>>> v7:
>>>> * updated doc
>>>> ---
>>>> Documentation/vfio.txt | 8 +-
>>>> drivers/vfio/vfio_iommu_spapr_tce.c | 268 ++++++++++++++++++++++++++----------
>>>> 2 files changed, 199 insertions(+), 77 deletions(-)
>>>>
>>>> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
>>>> index 94328c8..7dcf2b5 100644
>>>> --- a/Documentation/vfio.txt
>>>> +++ b/Documentation/vfio.txt
>>>> @@ -289,10 +289,12 @@ PPC64 sPAPR implementation note
>>>>
>>>> This implementation has some specifics:
>>>>
>>>> -1) Only one IOMMU group per container is supported as an IOMMU group
>>>> -represents the minimal entity which isolation can be guaranteed for and
>>>> -groups are allocated statically, one per a Partitionable Endpoint (PE)
>>>> +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
>>>> +container is supported as an IOMMU table is allocated at the boot time,
>>>> +one table per a IOMMU group which is a Partitionable Endpoint (PE)
>>>> (PE is often a PCI domain but not always).
>>>
>>> I thought the more fundamental problem was that different PEs tended
>>> to use disjoint bus address ranges, so even by duplicating put_tce
>>> across PEs you couldn't have a common address space.
>>
>>
>> Sorry, I am not following you here.
>>
>> By duplicating put_tce, I can have multiple IOMMU groups on the same virtual
>> PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple groups
>> per container" does this, the address ranges will the same.
>
> Oh, ok. For some reason I thought that (at least on the older
> machines) the different PEs used different and not easily changeable
> DMA windows in bus addresses space.


They do use different tables (which VFIO does not get to remove/create and
uses these old helpers - iommu_take/release_ownership), correct. But all
these windows are mapped at zero on a PE's PCI bus and nothing prevents me
from updating all these tables with the same TCE values when handling
H_PUT_TCE. Yes, it is slow, but it works (a bit more detail below).



>> What I cannot do on p5ioc2 is programming the same table to multiple
>> physical PHBs (or I could but it is very different than IODA2 and pretty
>> ugly and might not always be possible because I would have to allocate these
>> pages from some common pool and face problems like fragmentation).
>
> So allowing multiple groups per container should be possible (at the
> kernel rather than qemu level) by writing the same value to multiple
> TCE tables. I guess its not worth doing for just the almost-obsolete
> IOMMUs though.


It is done at the QEMU level though. As it works now, QEMU opens a group
and walks through all existing containers trying to attach the new group
to one of them. If that succeeds (always on x86; on POWER8 after this
patch), the TCE table is shared. If it fails, QEMU creates another
container, attaches it to the same VFIO/PHB address space and attaches the
group there.

Then the only thing left is repeating the ioctl() in vfio_container_ioctl()
for every container in the VFIO address space; this is what that QEMU patch
does (the first version of that patch called ioctl() only for the first
container in the address space).

From the kernel perspective there are 2 isolated containers; I'd like to
keep it this way.
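
Roughly, on the QEMU side (pseudocode, invented helper names, error
handling omitted):

/* attach a new group, reusing a container (and its TCE table) if we can */
group_fd = vfio_get_group_fd(groupid);
attached = false;
QLIST_FOREACH(container, &space->containers, next) {
	if (ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container->fd) == 0) {
		attached = true;	/* the TCE table is shared now */
		break;
	}
}
if (!attached) {
	/* open a new /dev/vfio/vfio, do VFIO_SET_IOMMU, etc. */
	container = vfio_new_container(space);
	ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container->fd);
}

/* vfio_container_ioctl() then simply replays the request on every container */
QLIST_FOREACH(container, &space->containers, next)
	ret = ioctl(container->fd, req, param);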

btw thanks for the detailed review :)

--
Alexey

2015-05-01 06:27:59

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2

On 05/01/2015 03:23 PM, David Gibson wrote:
> On Fri, May 01, 2015 at 02:35:23PM +1000, Alexey Kardashevskiy wrote:
>> On 04/30/2015 04:55 PM, David Gibson wrote:
>>> On Sat, Apr 25, 2015 at 10:14:53PM +1000, Alexey Kardashevskiy wrote:
>>>> The existing implementation accounts the whole DMA window in
>>>> the locked_vm counter. This is going to be worse with multiple
>>>> containers and huge DMA windows. Also, real-time accounting would requite
>>>> additional tracking of accounted pages due to the page size difference -
>>>> IOMMU uses 4K pages and system uses 4K or 64K pages.
>>>>
>>>> Another issue is that actual pages pinning/unpinning happens on every
>>>> DMA map/unmap request. This does not affect the performance much now as
>>>> we spend way too much time now on switching context between
>>>> guest/userspace/host but this will start to matter when we add in-kernel
>>>> DMA map/unmap acceleration.
>>>>
>>>> This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
>>>> New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
>>>> 2 new ioctls to register/unregister DMA memory -
>>>> VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
>>>> which receive user space address and size of a memory region which
>>>> needs to be pinned/unpinned and counted in locked_vm.
>>>> New IOMMU splits physical pages pinning and TCE table update into 2 different
>>>> operations. It requires 1) guest pages to be registered first 2) consequent
>>>> map/unmap requests to work only with pre-registered memory.
>>>> For the default single window case this means that the entire guest
>>>> (instead of 2GB) needs to be pinned before using VFIO.
>>>> When a huge DMA window is added, no additional pinning will be
>>>> required, otherwise it would be guest RAM + 2GB.
>>>>
>>>> The new memory registration ioctls are not supported by
>>>> VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
>>>> will require memory to be preregistered in order to work.
>>>>
>>>> The accounting is done per the user process.
>>>>
>>>> This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
>>>> can do with v1 or v2 IOMMUs.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>> [aw: for the vfio related changes]
>>>> Acked-by: Alex Williamson <[email protected]>
>>>> ---
>>>> Changes:
>>>> v9:
>>>> * s/tce_get_hva_cached/tce_iommu_use_page_v2/
>>>>
>>>> v7:
>>>> * now memory is registered per mm (i.e. process)
>>>> * moved memory registration code to powerpc/mmu
>>>> * merged "vfio: powerpc/spapr: Define v2 IOMMU" into this
>>>> * limited new ioctls to v2 IOMMU
>>>> * updated doc
>>>> * unsupported ioclts return -ENOTTY instead of -EPERM
>>>>
>>>> v6:
>>>> * tce_get_hva_cached() returns hva via a pointer
>>>>
>>>> v4:
>>>> * updated docs
>>>> * s/kzmalloc/vzalloc/
>>>> * in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
>>>> replaced offset with index
>>>> * renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory
>>>> and removed duplicating vfio_iommu_spapr_register_memory
>>>> ---
>>>> Documentation/vfio.txt | 23 ++++
>>>> drivers/vfio/vfio_iommu_spapr_tce.c | 230 +++++++++++++++++++++++++++++++++++-
>>>> include/uapi/linux/vfio.h | 27 +++++
>>>> 3 files changed, 274 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
>>>> index 96978ec..94328c8 100644
>>>> --- a/Documentation/vfio.txt
>>>> +++ b/Documentation/vfio.txt
>>>> @@ -427,6 +427,29 @@ The code flow from the example above should be slightly changed:
>>>>
>>>> ....
>>>>
>>>> +5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
>>>> +VFIO_IOMMU_DISABLE and implements 2 new ioctls:
>>>> +VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
>>>> +(which are unsupported in v1 IOMMU).
>>>
>>> A summary of the semantic differeces between v1 and v2 would be nice.
>>> At this point it's not really clear to me if there's a case for
>>> creating v2, or if this could just be done by adding (optional)
>>> functionality to v1.
>>
>> v1: memory preregistration is not supported; explicit enable/disable ioctls
>> are required
>>
>> v2: memory preregistration is required; explicit enable/disable are
>> prohibited (as they are not needed).
>>
>> Mixing these in one IOMMU type caused a lot of problems like should I
>> increment locked_vm by the 32bit window size on enable() or not; what do I
>> do about pages pinning when map/map (check if it is from registered memory
>> and do not pin?).
>>
>> Having 2 IOMMU models makes everything a lot simpler.
>
> Ok. Would it simplify it further if you made v2 only usable on IODA2
> hardware?


Very little. V2 addresses the memory pinning issue which is handled the
same way on IODA2 and older hardware, including KVM acceleration. Whether
to enable DDW or not - that is handled just fine via extra properties in
the GET_INFO ioctl().

IODA2 and the others are different in handling multiple groups per
container but this does not require changes to the userspace API.

And remember, the only machine I can use 100% of the time is
POWER7/P5IOC2, so it is really useful if at least some bits of the patchset
can be tested there; if it were a bit less different from IODA2, I would
have implemented DDW there too :)


>>>> +PPC64 paravirtualized guests generate a lot of map/unmap requests,
>>>> +and the handling of those includes pinning/unpinning pages and updating
>>>> +mm::locked_vm counter to make sure we do not exceed the rlimit.
>>>> +The v2 IOMMU splits accounting and pinning into separate operations:
>>>> +
>>>> +- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
>>>> +receive a user space address and size of the block to be pinned.
>>>> +Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to
>>>> +be called with the exact address and size used for registering
>>>> +the memory block. The userspace is not expected to call these often.
>>>> +The ranges are stored in a linked list in a VFIO container.
>>>> +
>>>> +- VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
>>>> +IOMMU table and do not do pinning; instead these check that the userspace
>>>> +address is from pre-registered range.
>>>> +
>>>> +This separation helps in optimizing DMA for guests.
>>>> +
>>>> -------------------------------------------------------------------------------
>>>>
>>>> [1] VFIO was originally an acronym for "Virtual Function I/O" in its
>>>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>>>> index 892a584..4cfc2c1 100644
>>>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>>>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>>>
>>> So, from things you said at other points, I thought the idea was that
>>> this registration stuff could also be used on non-Power IOMMUs. Did I
>>> misunderstand, or is that a possibility for the future?
>>
>>
>> I never said a thing about non-PPC :) I seriously doubt any other arch has
>> this hypervisor interface with H_PUT_TCE (maybe s390? :) ); for others
>> there is no profit from memory preregistration as they (at least x86) map
>> the entire guest before it starts, which is essentially that preregistration.
>>
>>
>> btw later we may want to implement a simple IOMMU v3 which will do pinning +
>> locked_vm accounting at map time as x86 does, for http://dpdk.org/ - these
>> things do not really have to bother with preregistration (even if it is just
>> a single additional ioctl).
>>
>>
>>
>>>> @@ -21,6 +21,7 @@
>>>> #include <linux/vfio.h>
>>>> #include <asm/iommu.h>
>>>> #include <asm/tce.h>
>>>> +#include <asm/mmu_context.h>
>>>>
>>>> #define DRIVER_VERSION "0.1"
>>>> #define DRIVER_AUTHOR "[email protected]"
>>>> @@ -91,8 +92,58 @@ struct tce_container {
>>>> struct iommu_group *grp;
>>>> bool enabled;
>>>> unsigned long locked_pages;
>>>> + bool v2;
>>>> };
>>>>
>>>> +static long tce_unregister_pages(struct tce_container *container,
>>>> + __u64 vaddr, __u64 size)
>>>> +{
>>>> + long ret;
>>>> + struct mm_iommu_table_group_mem_t *mem;
>>>> +
>>>> + if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK))
>>>> + return -EINVAL;
>>>> +
>>>> + mem = mm_iommu_get(vaddr, size >> PAGE_SHIFT);
>>>> + if (!mem)
>>>> + return -EINVAL;
>>>> +
>>>> + ret = mm_iommu_put(mem); /* undo kref_get() from mm_iommu_get() */
>>>> + if (!ret)
>>>> + ret = mm_iommu_put(mem);
>>>> +
>>>> + return ret;
>>>> +}
>>>> +
>>>> +static long tce_register_pages(struct tce_container *container,
>>>> + __u64 vaddr, __u64 size)
>>>> +{
>>>> + long ret = 0;
>>>> + struct mm_iommu_table_group_mem_t *mem;
>>>> + unsigned long entries = size >> PAGE_SHIFT;
>>>> +
>>>> + if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK) ||
>>>> + ((vaddr + size) < vaddr))
>>>> + return -EINVAL;
>>>> +
>>>> + mem = mm_iommu_get(vaddr, entries);
>>>> + if (!mem) {
>>>> + ret = try_increment_locked_vm(entries);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + ret = mm_iommu_alloc(vaddr, entries, &mem);
>>>> + if (ret) {
>>>> + decrement_locked_vm(entries);
>>>> + return ret;
>>>> + }
>>>> + }
>>>> +
>>>> + container->enabled = true;
>>>> +
>>>> + return 0;
>>>> +}
>>>
>>> So requiring that registered regions get unregistered with exactly the
>>> same addr/length is reasonable. I'm a bit less convinced that
>>> disallowing overlaps is a good idea. What if two libraries in the
>>> same process are trying to use VFIO - they may not know if the regions
>>> they try to register are overlapping.
>>
>>
>> Sorry, I do not understand. A library allocates RAM. A library is expected
>> to register it via the additional ioctl, that's it. Another library allocates
>> another chunk of memory which won't overlap, so the registered areas won't
>> either.
>
> So the case I'm thinking is where the library does VFIO using a buffer
> passed into it from the program at large. Another library does the
> same.
>
> The main program, unaware of the VFIO shenanigans passes different
> parts of the same page to the 2 libraries.
>
> This is somewhat similar to the case of the horribly, horribly broken
> semantics of POSIX file range locks (it's both hard to implement and
> dangerous in the multi-library case similar to above).


Ok. I'll implement an x86-alike V3 SPAPR TCE IOMMU for these people, later :)

V2 addresses issues caused by H_PUT_TCE + DDW RTAS interfaces.



>>>> static bool tce_page_is_contained(struct page *page, unsigned page_shift)
>>>> {
>>>> /*
>>>> @@ -205,7 +256,7 @@ static void *tce_iommu_open(unsigned long arg)
>>>> {
>>>> struct tce_container *container;
>>>>
>>>> - if (arg != VFIO_SPAPR_TCE_IOMMU) {
>>>> + if ((arg != VFIO_SPAPR_TCE_IOMMU) && (arg != VFIO_SPAPR_TCE_v2_IOMMU)) {
>>>> pr_err("tce_vfio: Wrong IOMMU type\n");
>>>> return ERR_PTR(-EINVAL);
>>>> }
>>>> @@ -215,6 +266,7 @@ static void *tce_iommu_open(unsigned long arg)
>>>> return ERR_PTR(-ENOMEM);
>>>>
>>>> mutex_init(&container->lock);
>>>> + container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;
>>>>
>>>> return container;
>>>> }
>>>> @@ -243,6 +295,47 @@ static void tce_iommu_unuse_page(struct tce_container *container,
>>>> put_page(page);
>>>> }
>>>>
>>>> +static int tce_iommu_use_page_v2(unsigned long tce, unsigned long size,
>>>> + unsigned long *phpa, struct mm_iommu_table_group_mem_t **pmem)
>>
>>
>> You suggested s/tce_get_hpa/tce_iommu_use_page/ but in this particular patch
>> it is confusing as tce_iommu_unuse_page_v2() calls it to find the corresponding
>> mm_iommu_table_group_mem_t by the userspace address of a page that is no
>> longer being used.
>>
>> tce_iommu_use_page (without v2) does use the page but this one I'll rename
>> back to tce_iommu_ua_to_hpa_v2(), is that ok?
>
> Sorry, I couldn't follow this comment.


For V1 IOMMU, I used to have:
tce_get_hpa() - this converted UA to linear address and did gup();
tce_iommu_unuse_page() - this did put_page().

You suggested (*) to rename the first one to tce_use_page() which makes sense.

V2 introduces its own versions of use/unuse but these use preregistered
memory and do not do gup()/put_page(). I named them:
tce_get_hpa_cached()
tce_iommu_unuse_page_v2()

Then, applying your comment (*) to the V2 IOMMU, I renamed
tce_get_hpa_cached() to tce_iommu_use_page_v2(), and I do not like the
result now (in the chunk below). I'll rename it to
tce_iommu_ua_to_hpa_v2() - will that be ok?



>
>>
>>
>>>> +{
>>>> + long ret = 0;
>>>> + struct mm_iommu_table_group_mem_t *mem;
>>>> +
>>>> + mem = mm_iommu_lookup(tce, size);
>>>> + if (!mem)
>>>> + return -EINVAL;
>>>> +
>>>> + ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
>>>> + if (ret)
>>>> + return -EINVAL;
>>>> +
>>>> + *pmem = mem;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static void tce_iommu_unuse_page_v2(struct iommu_table *tbl,
>>>> + unsigned long entry)
>>>> +{
>>>> + struct mm_iommu_table_group_mem_t *mem = NULL;
>>>> + int ret;
>>>> + unsigned long hpa = 0;
>>>> + unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
>>>> +
>>>> + if (!pua || !current || !current->mm)
>>>> + return;
>>>> +
>>>> + ret = tce_iommu_use_page_v2(*pua, IOMMU_PAGE_SIZE(tbl),
>>>> + &hpa, &mem);
>>>> + if (ret)
>>>> + pr_debug("%s: tce %lx at #%lx was not cached, ret=%d\n",
>>>> + __func__, *pua, entry, ret);
>>>> + if (mem)
>>>> + mm_iommu_mapped_update(mem, false);
>>>> +
>>>> + *pua = 0;
>>>> +}
>>>> +


--
Alexey

2015-05-01 06:53:17

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table

On 05/01/2015 03:12 PM, David Gibson wrote:
> On Fri, May 01, 2015 at 02:10:58PM +1000, Alexey Kardashevskiy wrote:
>> On 04/29/2015 04:40 PM, David Gibson wrote:
>>> On Sat, Apr 25, 2015 at 10:14:51PM +1000, Alexey Kardashevskiy wrote:
>>>> This adds a way for the IOMMU user to know how much a new table will
>>>> use so it can be accounted in the locked_vm limit before allocation
>>>> happens.
>>>>
>>>> This stores the allocated table size in pnv_pci_create_table()
>>>> so the locked_vm counter can be updated correctly when a table is
>>>> being disposed.
>>>>
>>>> This defines an iommu_table_group_ops callback to let VFIO know
>>>> how much memory will be locked if a table is created.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>> ---
>>>> Changes:
>>>> v9:
>>>> * reimplemented the whole patch
>>>> ---
>>>> arch/powerpc/include/asm/iommu.h | 5 +++++
>>>> arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++++++++
>>>> arch/powerpc/platforms/powernv/pci.c | 36 +++++++++++++++++++++++++++++++
>>>> arch/powerpc/platforms/powernv/pci.h | 2 ++
>>>> 4 files changed, 57 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>>>> index 1472de3..9844c106 100644
>>>> --- a/arch/powerpc/include/asm/iommu.h
>>>> +++ b/arch/powerpc/include/asm/iommu.h
>>>> @@ -99,6 +99,7 @@ struct iommu_table {
>>>> unsigned long it_size; /* Size of iommu table in entries */
>>>> unsigned long it_indirect_levels;
>>>> unsigned long it_level_size;
>>>> + unsigned long it_allocated_size;
>>>> unsigned long it_offset; /* Offset into global table */
>>>> unsigned long it_base; /* mapped address of tce table */
>>>> unsigned long it_index; /* which iommu table this is */
>>>> @@ -155,6 +156,10 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
>>>> struct iommu_table_group;
>>>>
>>>> struct iommu_table_group_ops {
>>>> + unsigned long (*get_table_size)(
>>>> + __u32 page_shift,
>>>> + __u64 window_size,
>>>> + __u32 levels);
>>>> long (*create_table)(struct iommu_table_group *table_group,
>>>> int num,
>>>> __u32 page_shift,
>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>> index e0be556..7f548b4 100644
>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>> @@ -2062,6 +2062,18 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
>>>> }
>>>>
>>>> #ifdef CONFIG_IOMMU_API
>>>> +static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift,
>>>> + __u64 window_size, __u32 levels)
>>>> +{
>>>> + unsigned long ret = pnv_get_table_size(page_shift, window_size, levels);
>>>> +
>>>> + if (!ret)
>>>> + return ret;
>>>> +
>>>> + /* Add size of it_userspace */
>>>> + return ret + (window_size >> page_shift) * sizeof(unsigned long);
>>>
>>> This doesn't make much sense. The userspace view can't possibly be a
>>> property of the specific low-level IOMMU model.
>>
>>
>> This it_userspace thing is all about memory preregistration.
>>
>> I need some way to track how many actual mappings the
>> mm_iommu_table_group_mem_t has in order to decide whether to allow
>> unregistering or not.
>>
>> When I clear TCE, I can read the old value which is host physical address
>> which I cannot use to find the preregistered region and adjust the mappings
>> counter; I can only use userspace addresses for this (not even guest
>> physical addresses as it is VFIO and probably no KVM).
>>
>> So I have to keep userspace addresses somewhere, one per IOMMU page, and the
>> iommu_table seems a natural place for this.
>
> Well.. sort of. But as noted elsewhere this pulls VFIO specific
> constraints into a platform code structure. And whether you get this
> table depends on the platform IOMMU type rather than on what VFIO
> wants to do with it, which doesn't make sense.
>
> What might make more sense is an opaque pointer io iommu_table for use
> by the table "owner" (in the take_ownership sense). The pointer would
> be stored in iommu_table, but VFIO is responsible for populating and
> managing its contents.
>
> Or you could just put the userspace mappings in the container.
> Although you might want a different data structure in that case.

Nope. I need this table for in-kernel acceleration to update the mappings
counter per mm_iommu_table_group_mem_t. In KVM's real-mode handlers, I only
have IOMMU tables, not containers or groups. QEMU creates a guest view of
the table (KVM_CREATE_SPAPR_TCE) specifying a LIOBN, and then attaches TCE
tables to it via a set of ioctls (one per IOMMU group) to the VFIO KVM device.

So even if I call it it_opaque (instead of it_userspace), I will still need a
common place (visible to both VFIO and PowerKVM) to put this:
#define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry)

So far this place was arch/powerpc/include/asm/iommu.h and the iommu_table
struct.
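For illustration, one possible shape of that accessor (the exact definition
in the patch may differ; the NULL check here is just an assumption of the
sketch):

/* Hypothetical shape of the common accessor: one userspace address per TCE
 * entry, reachable from VFIO and from KVM's real mode handlers with nothing
 * but the iommu_table pointer. */
#define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
	((tbl)->it_userspace ? &((tbl)->it_userspace[(entry)]) : NULL)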


> The other thing to bear in mind is that registered regions are likely
> to be large contiguous blocks in user addresses, though obviously not
> contiguous in physical addr. So you might be able to compaticfy this
> information by storing it as a list of variable length blocks in
> userspace address space, rather than a per-page address..

It is 8 bytes per system page - 8/65536 = 0.00012 (or about 26MB for a 200GB
guest) - very little overhead.
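Spelling the arithmetic out (assuming 64K system pages): 200GB / 64K =
3,276,800 IOMMU entries, and at 8 bytes per entry that is 26,214,400 bytes,
i.e. ~26MB, or 8/65536 ≈ 0.012% of the guest RAM.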


> But.. isn't there a bigger problem here. As Paulus was pointing out,
> there's nothing guaranteeing the page tables continue to contain the
> same page as was there at gup() time.

This can happen if the userspace remaps memory which it registered/mapped
for DMA via VFIO, no? If so, then the userspace simply should not do this -
it is DMA memory, it cannot be moved like that. What am I missing here?


> What's going to happen if you REGISTER a memory region, then mremap()
> over it?

The registered pages will remain pinned and PUT_TCE will use that region
for translation (and this will fail as the userspace addresses changed).

I do not see how it is different from the situation when the userspace
mapped a page and mremap()ed it while it is DMA-mapped.

> Then attempt to PUT_TCE a page in the region? Or what if you
> mremap() it to someplace else then try to PUT_TCE a page there?

This will fail - a new userspace address has to be preregistered.

> Or REGISTER it again in its new location?

It will be pinned twice + some memory overhead to store the same host
physical address(es) twice.



--
Alexey

2015-05-01 07:12:56

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table

On 05/01/2015 02:23 PM, David Gibson wrote:
> On Fri, May 01, 2015 at 02:01:17PM +1000, Alexey Kardashevskiy wrote:
>> On 04/29/2015 04:31 PM, David Gibson wrote:
>>> On Sat, Apr 25, 2015 at 10:14:50PM +1000, Alexey Kardashevskiy wrote:
>>>> In order to support memory pre-registration, we need a way to track
>>>> the use of every registered memory region and only allow unregistration
>>>> if a region is not in use anymore. So we need a way to tell from what
>>>> region the just cleared TCE was from.
>>>>
>>>> This adds a userspace view of the TCE table into iommu_table struct.
>>>> It contains userspace address, one per TCE entry. The table is only
>>>> allocated when the ownership over an IOMMU group is taken which means
>>>> it is only used from outside of the powernv code (such as VFIO).
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>> ---
>>>> Changes:
>>>> v9:
>>>> * fixed code flow in error cases added in v8
>>>>
>>>> v8:
>>>> * added ENOMEM on failed vzalloc()
>>>> ---
>>>> arch/powerpc/include/asm/iommu.h | 6 ++++++
>>>> arch/powerpc/kernel/iommu.c | 18 ++++++++++++++++++
>>>> arch/powerpc/platforms/powernv/pci-ioda.c | 22 ++++++++++++++++++++--
>>>> 3 files changed, 44 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>>>> index 7694546..1472de3 100644
>>>> --- a/arch/powerpc/include/asm/iommu.h
>>>> +++ b/arch/powerpc/include/asm/iommu.h
>>>> @@ -111,9 +111,15 @@ struct iommu_table {
>>>> unsigned long *it_map; /* A simple allocation bitmap for now */
>>>> unsigned long it_page_shift;/* table iommu page size */
>>>> struct iommu_table_group *it_table_group;
>>>> + unsigned long *it_userspace; /* userspace view of the table */
>>>
>>> A single unsigned long doesn't seem like enough.
>>
>> Why single? This is an array.
>
> As in single per page.


Sorry, I am not following you here.
It is per IOMMU page. MAP/UNMAP work with IOMMU pages which are fully
backed by either a system page or a huge page.


>
>>> How do you know
>>> which process's address space this address refers to?
>>
>> It is a current task. Multiple userspaces cannot use the same container/tables.
>
> Where is that enforced?


It is accessed from VFIO DMA map/unmap, which are ioctl()s on a container's
fd, which is per process. Same for KVM - when IOMMU groups are registered in
KVM, the fds of the opened IOMMU groups are passed there. Or I did not
understand the question...


> More to the point, that's a VFIO constraint, but it's here affecting
> the design of a structure owned by the platform code.

Right. But keeping in mind KVM, I cannot think of any better design here.


> [snip]
>>>> static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
>>>> @@ -2062,12 +2071,21 @@ static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
>>>> int nid = pe->phb->hose->node;
>>>> __u64 bus_offset = num ? pe->tce_bypass_base : 0;
>>>> long ret;
>>>> + unsigned long *uas, uas_cb = sizeof(*uas) * (window_size >> page_shift);
>>>> +
>>>> + uas = vzalloc(uas_cb);
>>>> + if (!uas)
>>>> + return -ENOMEM;
>>>
>>> I don't see why this is allocated both here as well as in
>>> take_ownership.
>>
>> Where else? The only alternative is vfio_iommu_spapr_tce but I really do not
>> want to touch iommu_table fields there.
>
> Well to put it another way, why isn't take_ownership calling create
> itself (or at least a common helper).

I am trying to keep the DDW stuff away from the platform-oriented
arch/powerpc/kernel/iommu.c, whose main purpose is to implement
iommu_alloc() & co. - it already has enough in it.

I'd rather move the it_userspace allocation completely to vfio_iommu_spapr_tce
(should have done that earlier, actually) - would this be ok?
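Roughly what I have in mind (a sketch only - the helper names are made up
here, and the exact hook in the VFIO driver is still to be decided):

/* Rough sketch: the VFIO SPAPR TCE driver owns the userspace view */
static long tce_iommu_userspace_view_alloc(struct iommu_table *tbl)
{
	unsigned long cb = tbl->it_size * sizeof(*tbl->it_userspace);

	tbl->it_userspace = vzalloc(cb);

	return tbl->it_userspace ? 0 : -ENOMEM;
}

static void tce_iommu_userspace_view_free(struct iommu_table *tbl)
{
	vfree(tbl->it_userspace);
	tbl->it_userspace = NULL;
}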


> Clearly the it_userspace table needs to have lifetime which matches
> the TCE table itself, so there should be a single function that marks
> the beginning of that joint lifetime.


No. it_userspace lives only as long as the platform code does not control
the table. For IODA2 that equals the lifetime of the table; for
IODA1/P5IOC2 it does not.



>>> Isn't this function used for core-kernel users of the
>>> iommu as well, in which case it shouldn't need the it_userspace.
>>
>>
>> No. This is an iommu_table_group_ops callback which calls what the platform
>> code calls (pnv_pci_create_table()) plus allocates this it_userspace thing.
>> The callback is only called from VFIO.
>
> Ok.
>
> As touched on above it seems more like this should be owned by VFIO
> code than the platform code.

Agree now :) I'll move the allocation to VFIO. Thanks!


--
Alexey

2015-05-01 09:49:06

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 22/32] powerpc/powernv: Implement multilevel TCE tables

On 04/29/2015 03:04 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:46PM +1000, Alexey Kardashevskiy wrote:
>> TCE tables might get too big in case of 4K IOMMU pages and DDW enabled
>> on huge guests (hundreds of GB of RAM) so the kernel might be unable to
>> allocate contiguous chunk of physical memory to store the TCE table.
>>
>> To address this, POWER8 CPU (actually, IODA2) supports multi-level TCE tables,
>> up to 5 levels which splits the table into a tree of smaller subtables.
>>
>> This adds multi-level TCE tables support to pnv_pci_create_table()
>> and pnv_pci_free_table() helpers.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> ---
>> Changes:
>> v9:
>> * moved from ioda2 to common powernv pci code
>> * fixed cleanup if allocation fails in a middle
>> * removed check for the size - all boundary checks happen in the calling code
>> anyway
>> ---
>> arch/powerpc/include/asm/iommu.h | 2 +
>> arch/powerpc/platforms/powernv/pci-ioda.c | 15 +++--
>> arch/powerpc/platforms/powernv/pci.c | 94 +++++++++++++++++++++++++++++--
>> arch/powerpc/platforms/powernv/pci.h | 4 +-
>> 4 files changed, 104 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>> index 7e7ca0a..0f50ee2 100644
>> --- a/arch/powerpc/include/asm/iommu.h
>> +++ b/arch/powerpc/include/asm/iommu.h
>> @@ -96,6 +96,8 @@ struct iommu_pool {
>> struct iommu_table {
>> unsigned long it_busno; /* Bus number this table belongs to */
>> unsigned long it_size; /* Size of iommu table in entries */
>> + unsigned long it_indirect_levels;
>> + unsigned long it_level_size;
>> unsigned long it_offset; /* Offset into global table */
>> unsigned long it_base; /* mapped address of tce table */
>> unsigned long it_index; /* which iommu table this is */
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 59baa15..cc1d09c 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -1967,13 +1967,17 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
>> table_group);
>> struct pnv_phb *phb = pe->phb;
>> int64_t rc;
>> + const unsigned long size = tbl->it_indirect_levels ?
>> + tbl->it_level_size : tbl->it_size;
>> const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
>> const __u64 win_size = tbl->it_size << tbl->it_page_shift;
>>
>> pe_info(pe, "Setting up window at %llx..%llx "
>> - "pgsize=0x%x tablesize=0x%lx\n",
>> + "pgsize=0x%x tablesize=0x%lx "
>> + "levels=%d levelsize=%x\n",
>> start_addr, start_addr + win_size - 1,
>> - 1UL << tbl->it_page_shift, tbl->it_size << 3);
>> + 1UL << tbl->it_page_shift, tbl->it_size << 3,
>> + tbl->it_indirect_levels + 1, tbl->it_level_size << 3);
>>
>> tbl->it_table_group = &pe->table_group;
>>
>> @@ -1984,9 +1988,9 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
>> rc = opal_pci_map_pe_dma_window(phb->opal_id,
>> pe->pe_number,
>> pe->pe_number << 1,
>> - 1,
>> + tbl->it_indirect_levels + 1,
>> __pa(tbl->it_base),
>> - tbl->it_size << 3,
>> + size << 3,
>> 1ULL << tbl->it_page_shift);
>> if (rc) {
>> pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
>> @@ -2099,7 +2103,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> phb->ioda.m32_pci_base);
>>
>> rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
>> - 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base, tbl);
>> + 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base,
>> + POWERNV_IOMMU_DEFAULT_LEVELS, tbl);
>> if (rc) {
>> pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
>> return;
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index 6bcfad5..fc129c4 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -46,6 +46,8 @@
>> #define cfg_dbg(fmt...) do { } while(0)
>> //#define cfg_dbg(fmt...) printk(fmt)
>>
>> +#define ROUND_UP(x, n) (((x) + (n) - 1ULL) & ~((n) - 1ULL))
>
> Use the existing ALIGN_UP macro instead of creating a new one.


Ok. I knew it existed, it is just _ALIGN_UP (with an underscore) and
PPC-only - this is why I did not find it :)


>> #ifdef CONFIG_PCI_MSI
>> static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
>> {
>> @@ -577,6 +579,19 @@ struct pci_ops pnv_pci_ops = {
>> static __be64 *pnv_tce(struct iommu_table *tbl, long idx)
>> {
>> __be64 *tmp = ((__be64 *)tbl->it_base);
>> + int level = tbl->it_indirect_levels;
>> + const long shift = ilog2(tbl->it_level_size);
>> + unsigned long mask = (tbl->it_level_size - 1) << (level * shift);
>> +
>> + while (level) {
>> + int n = (idx & mask) >> (level * shift);
>> + unsigned long tce = be64_to_cpu(tmp[n]);
>> +
>> + tmp = __va(tce & ~(TCE_PCI_READ | TCE_PCI_WRITE));
>> + idx &= ~mask;
>> + mask >>= shift;
>> + --level;
>> + }
>>
>> return tmp + idx;
>> }
>> @@ -648,12 +663,18 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>> }
>>
>> static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
>> + unsigned levels, unsigned long limit,
>> unsigned long *tce_table_allocated)
>> {
>> struct page *tce_mem = NULL;
>> - __be64 *addr;
>> + __be64 *addr, *tmp;
>> unsigned order = max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT;
>> unsigned long local_allocated = 1UL << (order + PAGE_SHIFT);
>> + unsigned entries = 1UL << (shift - 3);
>> + long i;
>> +
>> + if (limit == *tce_table_allocated)
>> + return NULL;
>
> If this is for what I think, it seems a bit unsafe. Shouldn't it be
>> =, otherwise it could fail to trip if the limit isn't exactly a
>> multiple of the bottom level allocation unit.

Good point, will fix.


>> tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
>> if (!tce_mem) {
>> @@ -662,14 +683,33 @@ static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
>> }
>> addr = page_address(tce_mem);
>> memset(addr, 0, local_allocated);
>> - *tce_table_allocated = local_allocated;
>> +
>> + --levels;
>> + if (!levels) {
>> + /* Update tce_table_allocated with bottom level table size only */
>> + *tce_table_allocated += local_allocated;
>> + return addr;
>> + }
>> +
>> + for (i = 0; i < entries; ++i) {
>> + tmp = pnv_alloc_tce_table_pages(nid, shift, levels, limit,
>> + tce_table_allocated);
>
> Urgh.. it's a limited depth so it *might* be ok, but recursion is
> generally avoided in the kernel, becuase of the very limited stack
> size.


It is 5 levels max, with about 7 64bit values per frame, so there should be
room for it on the stack. Avoiding recursion here - I can do that, but it is
going to look ugly :-/


>> + if (!tmp)
>> + break;
>> +
>> + addr[i] = cpu_to_be64(__pa(tmp) |
>> + TCE_PCI_READ | TCE_PCI_WRITE);
>> + }
>
> It also seems like it would make sense for this function ti set
> it_indirect_levels ant it_level_size, rather than leaving it to the
> caller.


Mmm. Are you sure? It calls itself recursively, so it does not seem like
the right place for setting up it_indirect_levels and it_level_size.


>> return addr;
>> }
>>
>> +static void pnv_free_tce_table_pages(unsigned long addr, unsigned long size,
>> + unsigned level);
>> +
>> long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
>> __u64 bus_offset, __u32 page_shift, __u64 window_size,
>> - struct iommu_table *tbl)
>> + __u32 levels, struct iommu_table *tbl)
>> {
>> void *addr;
>> unsigned long tce_table_allocated = 0;
>> @@ -678,16 +718,34 @@ long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
>> unsigned table_shift = entries_shift + 3;
>> const unsigned long tce_table_size = max(0x1000UL, 1UL << table_shift);
>>
>> + if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS))
>> + return -EINVAL;
>> +
>> if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size))
>> return -EINVAL;
>>
>> + /* Adjust direct table size from window_size and levels */
>> + entries_shift = ROUND_UP(entries_shift, levels) / levels;
>
> ROUND_UP() only works if the second parameter is a power of 2. Is
> that always true for levels?
>
> For division rounding up, the usual idiom is just ((a + (b - 1)) / b)


Yes, I think this is what I actually wanted.
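So the reworked computation would look roughly like this (sketch only,
reusing the variable names from the patch):

/* Split the total index bits evenly across levels, rounding up,
 * e.g. a 30-bit index over 4 levels becomes 8 bits per level. */
unsigned entries_shift = window_shift - page_shift;
unsigned level_shift;

entries_shift = DIV_ROUND_UP(entries_shift, levels);	/* (a + b - 1) / b */
level_shift = entries_shift + 3;			/* 8 bytes per TCE */
level_shift = max_t(unsigned, level_shift, PAGE_SHIFT);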


>> + table_shift = entries_shift + 3;
>> + table_shift = max_t(unsigned, table_shift, PAGE_SHIFT);
>
> Does the PAGE_SHIFT rounding make sense any more? I would have
> thought you'd round the level size up to page size, rather than the
> whole thing.


At this point in the code @table_shift is level_shift but it is not that
obvious :) I'll rework it. Thanks.


>> /* Allocate TCE table */
>> addr = pnv_alloc_tce_table_pages(nid, table_shift,
>> - &tce_table_allocated);
>> + levels, tce_table_size, &tce_table_allocated);
>> + if (!addr)
>> + return -ENOMEM;
>> +
>> + if (tce_table_size != tce_table_allocated) {
>> + pnv_free_tce_table_pages((unsigned long) addr,
>> + tbl->it_level_size, tbl->it_indirect_levels);
>> + return -ENOMEM;
>> + }
>>
>> /* Setup linux iommu table */
>> pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, bus_offset,
>> page_shift);
>> + tbl->it_level_size = 1ULL << (table_shift - 3);
>> + tbl->it_indirect_levels = levels - 1;
>>
>> pr_info("Created TCE table: window size = %08llx, "
>> "tablesize = %lx (%lx), start @%08llx\n",
>> @@ -697,12 +755,38 @@ long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
>> return 0;
>> }
>>
>> +static void pnv_free_tce_table_pages(unsigned long addr, unsigned long size,
>> + unsigned level)
>> +{
>> + addr &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
>> +
>> + if (level) {
>> + long i;
>> + u64 *tmp = (u64 *) addr;
>> +
>> + for (i = 0; i < size; ++i) {
>> + unsigned long hpa = be64_to_cpu(tmp[i]);
>> +
>> + if (!(hpa & (TCE_PCI_READ | TCE_PCI_WRITE)))
>> + continue;
>> +
>> + pnv_free_tce_table_pages((unsigned long) __va(hpa),
>> + size, level - 1);
>> + }
>> + }
>> +
>> + free_pages(addr, get_order(size << 3));
>> +}
>> +
>> void pnv_pci_free_table(struct iommu_table *tbl)
>> {
>> + const unsigned long size = tbl->it_indirect_levels ?
>> + tbl->it_level_size : tbl->it_size;
>> +
>> if (!tbl->it_size)
>> return;
>>
>> - free_pages(tbl->it_base, get_order(tbl->it_size << 3));
>> + pnv_free_tce_table_pages(tbl->it_base, size, tbl->it_indirect_levels);
>> iommu_reset_table(tbl, "pnv");
>> }
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>> index e6cbbec..3d1ff584 100644
>> --- a/arch/powerpc/platforms/powernv/pci.h
>> +++ b/arch/powerpc/platforms/powernv/pci.h
>> @@ -218,9 +218,11 @@ int pnv_pci_cfg_write(struct pci_dn *pdn,
>> extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>> void *tce_mem, u64 tce_size,
>> u64 dma_offset, unsigned page_shift);
>> +#define POWERNV_IOMMU_DEFAULT_LEVELS 1
>> +#define POWERNV_IOMMU_MAX_LEVELS 5
>> extern long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
>> __u64 bus_offset, __u32 page_shift, __u64 window_size,
>> - struct iommu_table *tbl);
>> + __u32 levels, struct iommu_table *tbl);
>> extern void pnv_pci_free_table(struct iommu_table *tbl);
>> extern void pnv_pci_init_p5ioc2_hub(struct device_node *np);
>> extern void pnv_pci_init_ioda_hub(struct device_node *np);
>


--
Alexey

2015-05-01 10:13:15

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 20/32] powerpc/powernv/ioda2: Introduce pnv_pci_create_table/pnv_pci_free_table

On 04/29/2015 02:39 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:44PM +1000, Alexey Kardashevskiy wrote:
>> This is a part of moving TCE table allocation into an iommu_ops
>> callback to support multiple IOMMU groups per one VFIO container.
>>
>> This moves a table creation window to the file with common powernv-pci
>> helpers as it does not do anything IODA2-specific.
>>
>> This adds pnv_pci_free_table() helper to release the actual TCE table.
>>
>> This enforces window size to be a power of two.
>>
>> This should cause no behavioural change.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> Reviewed-by: David Gibson <[email protected]>
>> ---
>> Changes:
>> v9:
>> * moved helpers to the common powernv pci.c file from pci-ioda.c
>> * moved bits from pnv_pci_create_table() to pnv_alloc_tce_table_pages()
>> ---
>> arch/powerpc/platforms/powernv/pci-ioda.c | 36 ++++++------------
>> arch/powerpc/platforms/powernv/pci.c | 61 +++++++++++++++++++++++++++++++
>> arch/powerpc/platforms/powernv/pci.h | 4 ++
>> 3 files changed, 76 insertions(+), 25 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index a80be34..b9b3773 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -1307,8 +1307,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
>> if (rc)
>> pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
>>
>> - iommu_reset_table(tbl, of_node_full_name(dev->dev.of_node));
>> - free_pages(addr, get_order(TCE32_TABLE_SIZE));
>> + pnv_pci_free_table(tbl);
>> }
>>
>> static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>> @@ -2039,10 +2038,7 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
>> static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> struct pnv_ioda_pe *pe)
>> {
>> - struct page *tce_mem = NULL;
>> - void *addr;
>> struct iommu_table *tbl = &pe->table_group.tables[0];
>> - unsigned int tce_table_size, end;
>> int64_t rc;
>>
>> /* We shouldn't already have a 32-bit DMA associated */
>> @@ -2053,29 +2049,20 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>
>> /* The PE will reserve all possible 32-bits space */
>> pe->tce32_seg = 0;
>> - end = (1 << ilog2(phb->ioda.m32_pci_base));
>> - tce_table_size = (end / 0x1000) * 8;
>> pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
>> - end);
>> + phb->ioda.m32_pci_base);
>>
>> - /* Allocate TCE table */
>> - tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
>> - get_order(tce_table_size));
>> - if (!tce_mem) {
>> - pe_err(pe, "Failed to allocate a 32-bit TCE memory\n");
>> - goto fail;
>> + rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
>> + 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base, tbl);
>> + if (rc) {
>> + pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
>> + return;
>> }
>> - addr = page_address(tce_mem);
>> - memset(addr, 0, tce_table_size);
>> -
>> - /* Setup iommu */
>> - tbl->it_table_group = &pe->table_group;
>> -
>> - /* Setup linux iommu table */
>> - pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
>> - IOMMU_PAGE_SHIFT_4K);
>>
>> tbl->it_ops = &pnv_ioda2_iommu_ops;
>> +
>> + /* Setup iommu */
>> + tbl->it_table_group = &pe->table_group;
>> iommu_init_table(tbl, phb->hose->node);
>> #ifdef CONFIG_IOMMU_API
>> pe->table_group.ops = &pnv_pci_ioda2_ops;
>> @@ -2121,8 +2108,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> fail:
>> if (pe->tce32_seg >= 0)
>> pe->tce32_seg = -1;
>> - if (tce_mem)
>> - __free_pages(tce_mem, get_order(tce_table_size));
>> + pnv_pci_free_table(tbl);
>> }
>>
>> static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index e8802ac..6bcfad5 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -20,7 +20,9 @@
>> #include <linux/io.h>
>> #include <linux/msi.h>
>> #include <linux/iommu.h>
>> +#include <linux/memblock.h>
>>
>> +#include <asm/mmzone.h>
>> #include <asm/sections.h>
>> #include <asm/io.h>
>> #include <asm/prom.h>
>> @@ -645,6 +647,65 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>> tbl->it_type = TCE_PCI;
>> }
>>
>> +static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
>> + unsigned long *tce_table_allocated)
>
> I'm a bit confused by the tce_table_allocated parameter. What's the
> circumstance where more memory is requested than required, and why
> does it matter to the caller?
>
>> +{
>> + struct page *tce_mem = NULL;
>> + __be64 *addr;
>> + unsigned order = max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT;
>> + unsigned long local_allocated = 1UL << (order + PAGE_SHIFT);
>> +
>> + tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
>> + if (!tce_mem) {
>> + pr_err("Failed to allocate a TCE memory, order=%d\n", order);
>> + return NULL;
>> + }
>> + addr = page_address(tce_mem);
>> + memset(addr, 0, local_allocated);
>> + *tce_table_allocated = local_allocated;
>> +
>> + return addr;
>> +}
>> +
>> +long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
>> + __u64 bus_offset, __u32 page_shift, __u64 window_size,
>> + struct iommu_table *tbl)
>
> The table_group parameter is redundant, isn't it? It must be equal to
> tbl->table_group, yes?
>
> Or would it make more sense for this function to set
> tbl->table_group?

I removed table_group from here.


> And for that matter wouldn't it make more sense for
> this to set it_size as well?


Missed this comment. It does set it_size by calling
pnv_pci_setup_iommu_table().



>> +{
>> + void *addr;
>> + unsigned long tce_table_allocated = 0;
>> + const unsigned window_shift = ilog2(window_size);
>> + unsigned entries_shift = window_shift - page_shift;
>> + unsigned table_shift = entries_shift + 3;
>> + const unsigned long tce_table_size = max(0x1000UL, 1UL << table_shift);
>
> So, here you round up to 4k, the in the alloc function you round up to
> PAGE_SIZE (which may or may not be the same). It's not clear to me why
> there are two rounds of rounding up.


@tce_table_size will later be programmed into IODA2 via OPAL, and OPAL will
reject a window if it is less than 4K. I'll rework the whole thing to just
align it to PAGE_SIZE as that is what it really is anyway.




--
Alexey

2015-05-01 11:26:59

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 28/32] powerpc/mmu: Add userspace-to-physical addresses translation cache

On 04/29/2015 05:01 PM, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:52PM +1000, Alexey Kardashevskiy wrote:
>> We are adding support for DMA memory pre-registration to be used in
>> conjunction with VFIO. The idea is that the userspace which is going to
>> run a guest may want to pre-register a user space memory region so
>> it all gets pinned once and never goes away. Having this done,
>> a hypervisor will not have to pin/unpin pages on every DMA map/unmap
>> request. This is going to help with multiple pinning of the same memory
>> and in-kernel acceleration of DMA requests.
>>
>> This adds a list of memory regions to mm_context_t. Each region consists
>> of a header and a list of physical addresses. This adds API to:
>> 1. register/unregister memory regions;
>> 2. do final cleanup (which puts all pre-registered pages);
>> 3. do userspace to physical address translation;
>> 4. manage a mapped pages counter; when it is zero, it is safe to
>> unregister the region.
>>
>> Multiple registration of the same region is allowed, kref is used to
>> track the number of registrations.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> ---
>> Changes:
>> v8:
>> * s/mm_iommu_table_group_mem_t/struct mm_iommu_table_group_mem_t/
>> * fixed error fallback look (s/[i]/[j]/)
>> ---
>> arch/powerpc/include/asm/mmu-hash64.h | 3 +
>> arch/powerpc/include/asm/mmu_context.h | 17 +++
>> arch/powerpc/mm/Makefile | 1 +
>> arch/powerpc/mm/mmu_context_hash64.c | 6 +
>> arch/powerpc/mm/mmu_context_hash64_iommu.c | 215 +++++++++++++++++++++++++++++
>> 5 files changed, 242 insertions(+)
>> create mode 100644 arch/powerpc/mm/mmu_context_hash64_iommu.c
>>
>> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
>> index 1da6a81..a82f534 100644
>> --- a/arch/powerpc/include/asm/mmu-hash64.h
>> +++ b/arch/powerpc/include/asm/mmu-hash64.h
>> @@ -536,6 +536,9 @@ typedef struct {
>> /* for 4K PTE fragment support */
>> void *pte_frag;
>> #endif
>> +#ifdef CONFIG_SPAPR_TCE_IOMMU
>> + struct list_head iommu_group_mem_list;
>> +#endif
>
> Urgh. I know I'm not one to talk, having done the hugepage crap in
> there, but man mm_context_t has grown to a bloated mess from orginally
> being just intended as a context ID integer :/.


Where else would I put it then?... The other way to go would be some global
map of pid <-> iommu_group_mem_list, which would need to be available from
both VFIO and KVM.


>> } mm_context_t;
>>
>>
>> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>> index 73382eb..d6116ca 100644
>> --- a/arch/powerpc/include/asm/mmu_context.h
>> +++ b/arch/powerpc/include/asm/mmu_context.h
>> @@ -16,6 +16,23 @@
>> */
>> extern int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
>> extern void destroy_context(struct mm_struct *mm);
>> +#ifdef CONFIG_SPAPR_TCE_IOMMU
>> +struct mm_iommu_table_group_mem_t;
>> +
>> +extern bool mm_iommu_preregistered(void);
>> +extern long mm_iommu_alloc(unsigned long ua, unsigned long entries,
>> + struct mm_iommu_table_group_mem_t **pmem);
>> +extern struct mm_iommu_table_group_mem_t *mm_iommu_get(unsigned long ua,
>> + unsigned long entries);
>> +extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
>> +extern void mm_iommu_cleanup(mm_context_t *ctx);
>> +extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
>> + unsigned long size);
>> +extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>> + unsigned long ua, unsigned long *hpa);
>> +extern long mm_iommu_mapped_update(struct mm_iommu_table_group_mem_t *mem,
>> + bool inc);
>> +#endif
>>
>> extern void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next);
>> extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
>> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
>> index 9c8770b..e216704 100644
>> --- a/arch/powerpc/mm/Makefile
>> +++ b/arch/powerpc/mm/Makefile
>> @@ -36,3 +36,4 @@ obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage-prot.o
>> obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
>> obj-$(CONFIG_HIGHMEM) += highmem.o
>> obj-$(CONFIG_PPC_COPRO_BASE) += copro_fault.o
>> +obj-$(CONFIG_SPAPR_TCE_IOMMU) += mmu_context_hash64_iommu.o
>> diff --git a/arch/powerpc/mm/mmu_context_hash64.c b/arch/powerpc/mm/mmu_context_hash64.c
>> index 178876ae..eb3080c 100644
>> --- a/arch/powerpc/mm/mmu_context_hash64.c
>> +++ b/arch/powerpc/mm/mmu_context_hash64.c
>> @@ -89,6 +89,9 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
>> #ifdef CONFIG_PPC_64K_PAGES
>> mm->context.pte_frag = NULL;
>> #endif
>> +#ifdef CONFIG_SPAPR_TCE_IOMMU
>> + INIT_LIST_HEAD_RCU(&mm->context.iommu_group_mem_list);
>> +#endif
>> return 0;
>> }
>>
>> @@ -132,6 +135,9 @@ static inline void destroy_pagetable_page(struct mm_struct *mm)
>>
>> void destroy_context(struct mm_struct *mm)
>> {
>> +#ifdef CONFIG_SPAPR_TCE_IOMMU
>> + mm_iommu_cleanup(&mm->context);
>> +#endif
>>
>> #ifdef CONFIG_PPC_ICSWX
>> drop_cop(mm->context.acop, mm);
>> diff --git a/arch/powerpc/mm/mmu_context_hash64_iommu.c b/arch/powerpc/mm/mmu_context_hash64_iommu.c
>> new file mode 100644
>> index 0000000..af7668c
>> --- /dev/null
>> +++ b/arch/powerpc/mm/mmu_context_hash64_iommu.c
>> @@ -0,0 +1,215 @@
>> +/*
>> + * IOMMU helpers in MMU context.
>> + *
>> + * Copyright (C) 2015 IBM Corp. <[email protected]>
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + *
>> + */
>> +
>> +#include <linux/sched.h>
>> +#include <linux/slab.h>
>> +#include <linux/rculist.h>
>> +#include <linux/vmalloc.h>
>> +#include <linux/kref.h>
>> +#include <asm/mmu_context.h>
>> +
>> +struct mm_iommu_table_group_mem_t {
>> + struct list_head next;
>> + struct rcu_head rcu;
>> + struct kref kref; /* one reference per VFIO container */
>> + atomic_t mapped; /* number of currently mapped pages */
>> + u64 ua; /* userspace address */
>> + u64 entries; /* number of entries in hpas[] */
>
> Maybe 'npages', since this is used to determine the range of user
> addresses covered, not just the number of entries in hpas.


Hm. Ok :)


>> + u64 *hpas; /* vmalloc'ed */
>> +};
>> +
>> +bool mm_iommu_preregistered(void)
>> +{
>> + if (!current || !current->mm)
>> + return false;
>> +
>> + return !list_empty(&current->mm->context.iommu_group_mem_list);
>> +}
>> +EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>> +
>> +long mm_iommu_alloc(unsigned long ua, unsigned long entries,
>> + struct mm_iommu_table_group_mem_t **pmem)
>> +{
>> + struct mm_iommu_table_group_mem_t *mem;
>> + long i, j;
>> + struct page *page = NULL;
>> +
>> + list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
>> + next) {
>> + if ((mem->ua == ua) && (mem->entries == entries))
>> + return -EBUSY;
>> +
>> + /* Overlap? */
>> + if ((mem->ua < (ua + (entries << PAGE_SHIFT))) &&
>> + (ua < (mem->ua + (mem->entries << PAGE_SHIFT))))
>> + return -EINVAL;
>> + }
>> +
>> + mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>> + if (!mem)
>> + return -ENOMEM;
>> +
>> + mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
>> + if (!mem->hpas) {
>> + kfree(mem);
>> + return -ENOMEM;
>> + }
>> +
>> + for (i = 0; i < entries; ++i) {
>> + if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
>> + 1/* pages */, 1/* iswrite */, &page)) {
>
> Do you really need to call gup() in a loop? It can do more than one
> page at a time..


Ufff. gup() returns the number of pages pinned or -errno if none. So if the
return value is positive but less than the requested number of pages, it is
still an error. Functions like this make me nervous :(


> That might work better if you kept a list of struct page *s instead of
> hpas.

I only need struct page* when releasing the registered area. In other cases I
just need a fast conversion from a userspace address to a host physical
address, including in real mode. Ideally I would use page_address(), which
works in real mode in my case but in general does not have to. Using
addresses rather than page structs makes it more explicit - I need an
address, I store an address, simple.

I can change to page structs if you think it makes more sense - should I?
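For reference, pinning the whole range with one gup call and treating a
short count as failure would look roughly like this (a sketch only, assuming
a temporary struct page* array is acceptable and ignoring the int-vs-long
width of the page count):

static long mm_iommu_pin(struct mm_iommu_table_group_mem_t *mem,
		unsigned long ua, unsigned long entries)
{
	struct page **pages;
	long i, pinned;

	pages = vzalloc(entries * sizeof(pages[0]));
	if (!pages)
		return -ENOMEM;

	pinned = get_user_pages_fast(ua, entries, 1 /* write */, pages);
	if (pinned != entries) {
		/* Partial success is still a failure: undo what was pinned */
		for (i = 0; i < pinned; ++i)
			put_page(pages[i]);
		vfree(pages);
		return -EFAULT;
	}

	for (i = 0; i < entries; ++i)
		mem->hpas[i] = page_to_pfn(pages[i]) << PAGE_SHIFT;

	vfree(pages);

	return 0;
}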




>> + for (j = 0; j < i; ++j)
>> + put_page(pfn_to_page(
>> + mem->hpas[j] >> PAGE_SHIFT));
>> + vfree(mem->hpas);
>> + kfree(mem);
>> + return -EFAULT;
>> + }
>> +
>> + mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
>> + }
>> +
>> + kref_init(&mem->kref);
>> + atomic_set(&mem->mapped, 0);
>> + mem->ua = ua;
>> + mem->entries = entries;
>> + *pmem = mem;
>> +
>> + list_add_rcu(&mem->next, &current->mm->context.iommu_group_mem_list);
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(mm_iommu_alloc);
>> +
>> +static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem)
>> +{
>> + long i;
>> + struct page *page = NULL;
>> +
>> + for (i = 0; i < mem->entries; ++i) {
>> + if (!mem->hpas[i])
>> + continue;
>> +
>> + page = pfn_to_page(mem->hpas[i] >> PAGE_SHIFT);
>> + if (!page)
>> + continue;
>> +
>> + put_page(page);
>> + mem->hpas[i] = 0;
>> + }
>> +}
>> +
>> +static void mm_iommu_free(struct rcu_head *head)
>> +{
>> + struct mm_iommu_table_group_mem_t *mem = container_of(head,
>> + struct mm_iommu_table_group_mem_t, rcu);
>> +
>> + mm_iommu_unpin(mem);
>> + vfree(mem->hpas);
>> + kfree(mem);
>> +}
>> +
>> +static void mm_iommu_release(struct kref *kref)
>> +{
>> + struct mm_iommu_table_group_mem_t *mem = container_of(kref,
>> + struct mm_iommu_table_group_mem_t, kref);
>> +
>> + list_del_rcu(&mem->next);
>> + call_rcu(&mem->rcu, mm_iommu_free);
>> +}
>> +
>> +struct mm_iommu_table_group_mem_t *mm_iommu_get(unsigned long ua,
>> + unsigned long entries)
>> +{
>> + struct mm_iommu_table_group_mem_t *mem;
>> +
>> + list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
>> + next) {
>> + if ((mem->ua == ua) && (mem->entries == entries)) {
>> + kref_get(&mem->kref);
>> + return mem;
>> + }
>> + }
>> +
>> + return NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(mm_iommu_get);
>> +
>> +long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem)
>> +{
>> + if (atomic_read(&mem->mapped))
>> + return -EBUSY;
>
> What prevents a race between the atomic_read() above and the release below?

Ouch. Nothing. And I cannot think of any nice fast solution here...
I could remove @mapped altogether and do kref_get/put(&mem->kref) instead; a
container would hold one reference too. And add a flag to
mm_iommu_table_group_mem_t to record whether mm_iommu_release has been called -
this way I will know that was the very last reference, otherwise I'll
return -EBUSY.

Or change mm_iommu_lookup() to do kref_get() and require every caller of it
to also call mm_iommu_put(), and only call mm_iommu_mapped_update() while the
reference is elevated. And change mm_iommu_put() to return a special code
if that was the very last put() (this would be checked by the
VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY handler only, others would not care).

Any ideas?

I am pretty sure there is something very cool (like RCU) which allows
avoiding locks in this situation, I am just too ignorant and do not know it :)
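Thinking aloud, one lock-free option might be to make the "no active
mappings" state a value that unregistration claims atomically - a rough
sketch only, assuming @mapped were initialised to 1 at registration time
(1 == "registered, no active mappings"):

long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem)
{
	/* Succeeds only if no mapping has elevated @mapped meanwhile */
	if (atomic_cmpxchg(&mem->mapped, 1, 0) != 1)
		return -EBUSY;

	kref_put(&mem->kref, mm_iommu_release);

	return 0;
}

/* The map side then refuses to use a region that is being torn down */
static bool mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem)
{
	return atomic_inc_not_zero(&mem->mapped);
}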


>> + kref_put(&mem->kref, mm_iommu_release);
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(mm_iommu_put);
>> +
>> +struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
>> + unsigned long size)
>> +{
>> + struct mm_iommu_table_group_mem_t *mem, *ret = NULL;
>> +
>> + list_for_each_entry_rcu(mem,
>> + &current->mm->context.iommu_group_mem_list,
>> + next) {
>> + if ((mem->ua <= ua) &&
>> + (ua + size <= mem->ua +
>> + (mem->entries << PAGE_SHIFT))) {
>> + ret = mem;
>> + break;
>> + }
>> + }
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(mm_iommu_lookup);
>> +
>> +long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>> + unsigned long ua, unsigned long *hpa)
>
> Return type should be int, it's just an error code.


Is it some generic rule that errors must always be "int"? I was just told
that gcc on PPC64 will generate an extra instruction to truncate a 64bit
long to a 32bit int, so I am just trying to use "long" everywhere. Very
simple, but still an optimization :)


>> +{
>> + const long entry = (ua - mem->ua) >> PAGE_SHIFT;
>> + u64 *va = &mem->hpas[entry];
>> +
>> + if (entry >= mem->entries)
>> + return -EFAULT;
>> +
>> + *hpa = *va | (ua & ~PAGE_MASK);
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
>> +
>> +long mm_iommu_mapped_update(struct mm_iommu_table_group_mem_t *mem, bool inc)
>> +{
>> + long ret = 0;
>> +
>> + if (inc)
>> + atomic_inc(&mem->mapped);
>> + else
>> + ret = atomic_dec_if_positive(&mem->mapped);
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(mm_iommu_mapped_update);
>
> I think this would be clearer as separate inc and dec functions.

Okay.


>> +
>> +void mm_iommu_cleanup(mm_context_t *ctx)
>> +{
>> + while (!list_empty(&ctx->iommu_group_mem_list)) {
>> + struct mm_iommu_table_group_mem_t *mem;
>> +
>> + mem = list_first_entry(&ctx->iommu_group_mem_list,
>> + struct mm_iommu_table_group_mem_t, next);
>> + mm_iommu_release(&mem->kref);
>> + }
>> +}
>


--
Alexey

2015-05-05 13:09:15

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

On Fri, May 01, 2015 at 04:05:24PM +1000, Alexey Kardashevskiy wrote:
> On 05/01/2015 02:33 PM, David Gibson wrote:
> >On Thu, Apr 30, 2015 at 07:33:09PM +1000, Alexey Kardashevskiy wrote:
> >>On 04/30/2015 05:22 PM, David Gibson wrote:
> >>>On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
> >>>>At the moment only one group per container is supported.
> >>>>POWER8 CPUs have more flexible design and allows naving 2 TCE tables per
> >>>>IOMMU group so we can relax this limitation and support multiple groups
> >>>>per container.
> >>>
> >>>It's not obvious why allowing multiple TCE tables per PE has any
> >>>pearing on allowing multiple groups per container.
> >>
> >>
> >>This patchset is a global TCE tables rework (patches 1..30, roughly) with 2
> >>outcomes:
> >>1. reusing the same IOMMU table for multiple groups - patch 31;
> >>2. allowing dynamic create/remove of IOMMU tables - patch 32.
> >>
> >>I can remove this one from the patchset and post it separately later but
> >>since 1..30 aim to support both 1) and 2), I'd think I better keep them all
> >>together (might explain some of changes I do in 1..30).
> >
> >The combined patchset is fine. My comment is because your commit
> >message says that multiple groups are possible *because* 2 TCE tables
> >per group are allowed, and it's not at all clear why one follows from
> >the other.
>
>
> Ah. That's wrong indeed, I'll fix it.
>
>
> >>>>This adds TCE table descriptors to a container and uses iommu_table_group_ops
> >>>>to create/set DMA windows on IOMMU groups so the same TCE tables will be
> >>>>shared between several IOMMU groups.
> >>>>
> >>>>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>>>[aw: for the vfio related changes]
> >>>>Acked-by: Alex Williamson <[email protected]>
> >>>>---
> >>>>Changes:
> >>>>v7:
> >>>>* updated doc
> >>>>---
> >>>> Documentation/vfio.txt | 8 +-
> >>>> drivers/vfio/vfio_iommu_spapr_tce.c | 268 ++++++++++++++++++++++++++----------
> >>>> 2 files changed, 199 insertions(+), 77 deletions(-)
> >>>>
> >>>>diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> >>>>index 94328c8..7dcf2b5 100644
> >>>>--- a/Documentation/vfio.txt
> >>>>+++ b/Documentation/vfio.txt
> >>>>@@ -289,10 +289,12 @@ PPC64 sPAPR implementation note
> >>>>
> >>>> This implementation has some specifics:
> >>>>
> >>>>-1) Only one IOMMU group per container is supported as an IOMMU group
> >>>>-represents the minimal entity which isolation can be guaranteed for and
> >>>>-groups are allocated statically, one per a Partitionable Endpoint (PE)
> >>>>+1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
> >>>>+container is supported as an IOMMU table is allocated at the boot time,
> >>>>+one table per a IOMMU group which is a Partitionable Endpoint (PE)
> >>>> (PE is often a PCI domain but not always).
> >>>
> >>>I thought the more fundamental problem was that different PEs tended
> >>>to use disjoint bus address ranges, so even by duplicating put_tce
> >>>across PEs you couldn't have a common address space.
> >>
> >>
> >>Sorry, I am not following you here.
> >>
> >>By duplicating put_tce, I can have multiple IOMMU groups on the same virtual
> >>PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple groups
> >>per container" does this, the address ranges will the same.
> >
> >Oh, ok. For some reason I thought that (at least on the older
> >machines) the different PEs used different and not easily changeable
> >DMA windows in bus addresses space.
>
>
> They do use different tables (which VFIO does not get to remove/create and
> uses these old helpers - iommu_take/release_ownership), correct. But all
> these windows are mapped at zero on a PE's PCI bus and nothing prevents me
> from updating all these tables with the same TCE values when handling
> H_PUT_TCE. Yes it is slow but it works (bit more details below).

Um.. I'm pretty sure that contradicts what Ben was saying on the
thread.

> >>What I cannot do on p5ioc2 is programming the same table to multiple
> >>physical PHBs (or I could but it is very different than IODA2 and pretty
> >>ugly and might not always be possible because I would have to allocate these
> >>pages from some common pool and face problems like fragmentation).
> >
> >So allowing multiple groups per container should be possible (at the
> >kernel rather than qemu level) by writing the same value to multiple
> >TCE tables. I guess its not worth doing for just the almost-obsolete
> >IOMMUs though.
>
>
> It is done at QEMU level though. As it works now, QEMU opens a group, walks
> through all existing containers and tries attaching a new group there. If it
> succeeded (x86 always; POWER8 after this patch), a TCE table is shared. If
> it failed, QEMU creates another container, attaches it to the same VFIO/PHB
> address space and attaches a group there.
>
> Then the only thing left is repeating ioctl() in vfio_container_ioctl() for
> every container in the VFIO address space; this is what that QEMU patch does
> (the first version of that patch called ioctl() only for the first container
> in the address space).
>
> From the kernel prospective there are 2 isolated containers; I'd like to
> keep it this way.
>
> btw thanks for the detailed review :)
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2015-05-05 13:09:24

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2

On Fri, May 01, 2015 at 04:27:47PM +1000, Alexey Kardashevskiy wrote:
> On 05/01/2015 03:23 PM, David Gibson wrote:
> >On Fri, May 01, 2015 at 02:35:23PM +1000, Alexey Kardashevskiy wrote:
> >>On 04/30/2015 04:55 PM, David Gibson wrote:
> >>>On Sat, Apr 25, 2015 at 10:14:53PM +1000, Alexey Kardashevskiy wrote:
> >>>>The existing implementation accounts the whole DMA window in
> >>>>the locked_vm counter. This is going to be worse with multiple
> >>>>containers and huge DMA windows. Also, real-time accounting would requite
> >>>>additional tracking of accounted pages due to the page size difference -
> >>>>IOMMU uses 4K pages and system uses 4K or 64K pages.
> >>>>
> >>>>Another issue is that actual pages pinning/unpinning happens on every
> >>>>DMA map/unmap request. This does not affect the performance much now as
> >>>>we spend way too much time on switching context between
> >>>>guest/userspace/host but this will start to matter when we add in-kernel
> >>>>DMA map/unmap acceleration.
> >>>>
> >>>>This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
> >>>>New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
> >>>>2 new ioctls to register/unregister DMA memory -
> >>>>VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
> >>>>which receive user space address and size of a memory region which
> >>>>needs to be pinned/unpinned and counted in locked_vm.
> >>>>New IOMMU splits physical pages pinning and TCE table update into 2 different
> >>>>operations. It requires 1) guest pages to be registered first 2) consequent
> >>>>map/unmap requests to work only with pre-registered memory.
> >>>>For the default single window case this means that the entire guest
> >>>>(instead of 2GB) needs to be pinned before using VFIO.
> >>>>When a huge DMA window is added, no additional pinning will be
> >>>>required, otherwise it would be guest RAM + 2GB.
> >>>>
> >>>>The new memory registration ioctls are not supported by
> >>>>VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
> >>>>will require memory to be preregistered in order to work.
> >>>>
> >>>>The accounting is done per the user process.
> >>>>
> >>>>This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
> >>>>can do with v1 or v2 IOMMUs.
> >>>>
> >>>>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>>>[aw: for the vfio related changes]
> >>>>Acked-by: Alex Williamson <[email protected]>
> >>>>---
> >>>>Changes:
> >>>>v9:
> >>>>* s/tce_get_hva_cached/tce_iommu_use_page_v2/
> >>>>
> >>>>v7:
> >>>>* now memory is registered per mm (i.e. process)
> >>>>* moved memory registration code to powerpc/mmu
> >>>>* merged "vfio: powerpc/spapr: Define v2 IOMMU" into this
> >>>>* limited new ioctls to v2 IOMMU
> >>>>* updated doc
> >>>>* unsupported ioctls return -ENOTTY instead of -EPERM
> >>>>
> >>>>v6:
> >>>>* tce_get_hva_cached() returns hva via a pointer
> >>>>
> >>>>v4:
> >>>>* updated docs
> >>>>* s/kzmalloc/vzalloc/
> >>>>* in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
> >>>>replaced offset with index
> >>>>* renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory
> >>>>and removed duplicating vfio_iommu_spapr_register_memory
> >>>>---
> >>>> Documentation/vfio.txt | 23 ++++
> >>>> drivers/vfio/vfio_iommu_spapr_tce.c | 230 +++++++++++++++++++++++++++++++++++-
> >>>> include/uapi/linux/vfio.h | 27 +++++
> >>>> 3 files changed, 274 insertions(+), 6 deletions(-)
> >>>>
> >>>>diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> >>>>index 96978ec..94328c8 100644
> >>>>--- a/Documentation/vfio.txt
> >>>>+++ b/Documentation/vfio.txt
> >>>>@@ -427,6 +427,29 @@ The code flow from the example above should be slightly changed:
> >>>>
> >>>> ....
> >>>>
> >>>>+5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
> >>>>+VFIO_IOMMU_DISABLE and implements 2 new ioctls:
> >>>>+VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
> >>>>+(which are unsupported in v1 IOMMU).
> >>>
> >>>A summary of the semantic differences between v1 and v2 would be nice.
> >>>At this point it's not really clear to me if there's a case for
> >>>creating v2, or if this could just be done by adding (optional)
> >>>functionality to v1.
> >>
> >>v1: memory preregistration is not supported; explicit enable/disable ioctls
> >>are required
> >>
> >>v2: memory preregistration is required; explicit enable/disable are
> >>prohibited (as they are not needed).
> >>
> >>Mixing these in one IOMMU type caused a lot of problems like should I
> >>increment locked_vm by the 32bit window size on enable() or not; what do I
> >>do about page pinning on map/unmap (check if it is from registered memory
> >>and do not pin?).
> >>
> >>Having 2 IOMMU models makes everything a lot simpler.
> >
> >Ok. Would it simplify it further if you made v2 only usable on IODA2
> >hardware?
>
>
> Very little. V2 addresses the memory pinning issue, which is handled the same
> way on IODA2 and older hardware, including KVM acceleration. Whether to enable
> DDW or not is handled just fine via extra properties in the GET_INFO
> ioctl().
>
> IODA2 and others are different in handling multiple groups per container but
> this does not require changes to userspace API.
>
> And remember, the only machine I can use 100% of time is POWER7/P5IOC2 so it
> is really useful if at least some bits of the patchset can be tested there;
> if it was a bit less different from IODA2, I would have even implemented DDW
> there too :)

Hm, ok.

> >>>>+PPC64 paravirtualized guests generate a lot of map/unmap requests,
> >>>>+and the handling of those includes pinning/unpinning pages and updating
> >>>>+mm::locked_vm counter to make sure we do not exceed the rlimit.
> >>>>+The v2 IOMMU splits accounting and pinning into separate operations:
> >>>>+
> >>>>+- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
> >>>>+receive a user space address and size of the block to be pinned.
> >>>>+Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to
> >>>>+be called with the exact address and size used for registering
> >>>>+the memory block. The userspace is not expected to call these often.
> >>>>+The ranges are stored in a linked list in a VFIO container.
> >>>>+
> >>>>+- VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
> >>>>+IOMMU table and do not do pinning; instead these check that the userspace
> >>>>+address is from pre-registered range.
> >>>>+
> >>>>+This separation helps in optimizing DMA for guests.
> >>>>+
> >>>> -------------------------------------------------------------------------------
> >>>>
> >>>> [1] VFIO was originally an acronym for "Virtual Function I/O" in its
> >>>>diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>>>index 892a584..4cfc2c1 100644
> >>>>--- a/drivers/vfio/vfio_iommu_spapr_tce.c
> >>>>+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>>
> >>>So, from things you said at other points, I thought the idea was that
> >>>this registration stuff could also be used on non-Power IOMMUs. Did I
> >>>misunderstand, or is that a possibility for the future?
> >>
> >>
> >>I never said a thing about non-PPC :) I seriously doubt any other arch has
> >>this hypervisor interface with H_PUT_TCE (maybe s390? :) ); for others
> >>there is no profit from memory preregistration as they (at least x86) map
> >>the entire guest before it starts, which is essentially the same as preregistration.
> >>
> >>
> >>btw later we may want to implement simple IOMMU v3 which will do pinning +
> >>locked_vm when mapping as x86 does, for http://dpdk.org/ - these things do
> >>not really have to bother with preregistration (even if it is just a single
> >>additional ioctl).
> >>
> >>
> >>
> >>>>@@ -21,6 +21,7 @@
> >>>> #include <linux/vfio.h>
> >>>> #include <asm/iommu.h>
> >>>> #include <asm/tce.h>
> >>>>+#include <asm/mmu_context.h>
> >>>>
> >>>> #define DRIVER_VERSION "0.1"
> >>>> #define DRIVER_AUTHOR "[email protected]"
> >>>>@@ -91,8 +92,58 @@ struct tce_container {
> >>>> struct iommu_group *grp;
> >>>> bool enabled;
> >>>> unsigned long locked_pages;
> >>>>+ bool v2;
> >>>> };
> >>>>
> >>>>+static long tce_unregister_pages(struct tce_container *container,
> >>>>+ __u64 vaddr, __u64 size)
> >>>>+{
> >>>>+ long ret;
> >>>>+ struct mm_iommu_table_group_mem_t *mem;
> >>>>+
> >>>>+ if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK))
> >>>>+ return -EINVAL;
> >>>>+
> >>>>+ mem = mm_iommu_get(vaddr, size >> PAGE_SHIFT);
> >>>>+ if (!mem)
> >>>>+ return -EINVAL;
> >>>>+
> >>>>+ ret = mm_iommu_put(mem); /* undo kref_get() from mm_iommu_get() */
> >>>>+ if (!ret)
> >>>>+ ret = mm_iommu_put(mem);
> >>>>+
> >>>>+ return ret;
> >>>>+}
> >>>>+
> >>>>+static long tce_register_pages(struct tce_container *container,
> >>>>+ __u64 vaddr, __u64 size)
> >>>>+{
> >>>>+ long ret = 0;
> >>>>+ struct mm_iommu_table_group_mem_t *mem;
> >>>>+ unsigned long entries = size >> PAGE_SHIFT;
> >>>>+
> >>>>+ if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK) ||
> >>>>+ ((vaddr + size) < vaddr))
> >>>>+ return -EINVAL;
> >>>>+
> >>>>+ mem = mm_iommu_get(vaddr, entries);
> >>>>+ if (!mem) {
> >>>>+ ret = try_increment_locked_vm(entries);
> >>>>+ if (ret)
> >>>>+ return ret;
> >>>>+
> >>>>+ ret = mm_iommu_alloc(vaddr, entries, &mem);
> >>>>+ if (ret) {
> >>>>+ decrement_locked_vm(entries);
> >>>>+ return ret;
> >>>>+ }
> >>>>+ }
> >>>>+
> >>>>+ container->enabled = true;
> >>>>+
> >>>>+ return 0;
> >>>>+}
> >>>
> >>>So requiring that registered regions get unregistered with exactly the
> >>>same addr/length is reasonable. I'm a bit less convinced that
> >>>disallowing overlaps is a good idea. What if two libraries in the
> >>>same process are trying to use VFIO - they may not know if the regions
> >>>they try to register are overlapping.
> >>
> >>
> >>Sorry, I do not understand. A library allocates RAM. A library is expected
> >>to register it via an additional ioctl, that's it. Another library allocates
> >>another chunk of memory and it won't overlap and the registered areas won't
> >>either.
> >
> >So the case I'm thinking is where the library does VFIO using a buffer
> >passed into it from the program at large. Another library does the
> >same.
> >
> >The main program, unaware of the VFIO shenanigans, passes different
> >parts of the same page to the 2 libraries.
> >
> >This is somewhat similar to the case of the horribly, horribly broken
> >semantics of POSIX file range locks (it's both hard to implement and
> >dangerous in the multi-library case similar to above).
>
>
> Ok. I'll implement x86-alike V3 SPAPR TCE IOMMU for these people, later :)
>
> V2 addresses issues caused by H_PUT_TCE + DDW RTAS interfaces.
>
>
>
> >>>> static bool tce_page_is_contained(struct page *page, unsigned page_shift)
> >>>> {
> >>>> /*
> >>>>@@ -205,7 +256,7 @@ static void *tce_iommu_open(unsigned long arg)
> >>>> {
> >>>> struct tce_container *container;
> >>>>
> >>>>- if (arg != VFIO_SPAPR_TCE_IOMMU) {
> >>>>+ if ((arg != VFIO_SPAPR_TCE_IOMMU) && (arg != VFIO_SPAPR_TCE_v2_IOMMU)) {
> >>>> pr_err("tce_vfio: Wrong IOMMU type\n");
> >>>> return ERR_PTR(-EINVAL);
> >>>> }
> >>>>@@ -215,6 +266,7 @@ static void *tce_iommu_open(unsigned long arg)
> >>>> return ERR_PTR(-ENOMEM);
> >>>>
> >>>> mutex_init(&container->lock);
> >>>>+ container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;
> >>>>
> >>>> return container;
> >>>> }
> >>>>@@ -243,6 +295,47 @@ static void tce_iommu_unuse_page(struct tce_container *container,
> >>>> put_page(page);
> >>>> }
> >>>>
> >>>>+static int tce_iommu_use_page_v2(unsigned long tce, unsigned long size,
> >>>>+ unsigned long *phpa, struct mm_iommu_table_group_mem_t **pmem)
> >>
> >>
> >>You suggested s/tce_get_hpa/tce_iommu_use_page/ but in this particular patch
> >>it is confusing as tce_iommu_unuse_page_v2() calls it to find the corresponding
> >>mm_iommu_table_group_mem_t by the userspace address of a page that is
> >>no longer being used.
> >>
> >>tce_iommu_use_page (without v2) does use the page but this one I'll rename
> >>back to tce_iommu_ua_to_hpa_v2(), is that ok?
> >
> >Sorry, I couldn't follow this comment.
>
>
> For V1 IOMMU, I used to have:
> tce_get_hpa() - this converted UA to linear address and did gup();
> tce_iommu_unuse_page() - this did put_page().
>
> You suggested (*) to rename the first one to tce_use_page() which makes sense.
>
> V2 introduces its own versions of use/unuse but these use preregistered
> memory and do not do gup()/put_page(). I named them:
> tce_get_hpa_cached()
> tce_iommu_unuse_page_v2()
>
> then, replaying your comment (*) on V2 IOMMU, I renamed tce_get_hpa_cached()
> to tce_iommu_use_page_v2(). And I do not like the result now (in the chunk
> below). I'll rename it to tce_iommu_ua_to_hpa_v2(), will it be ok?

Uh, I guess so. To me "use_page" suggests incrementing the reference
or locking it or something along those lines, so I think that name
should follow the "gup".

>
>
>
> >
> >>
> >>
> >>>>+{
> >>>>+ long ret = 0;
> >>>>+ struct mm_iommu_table_group_mem_t *mem;
> >>>>+
> >>>>+ mem = mm_iommu_lookup(tce, size);
> >>>>+ if (!mem)
> >>>>+ return -EINVAL;
> >>>>+
> >>>>+ ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
> >>>>+ if (ret)
> >>>>+ return -EINVAL;
> >>>>+
> >>>>+ *pmem = mem;
> >>>>+
> >>>>+ return 0;
> >>>>+}
> >>>>+
> >>>>+static void tce_iommu_unuse_page_v2(struct iommu_table *tbl,
> >>>>+ unsigned long entry)
> >>>>+{
> >>>>+ struct mm_iommu_table_group_mem_t *mem = NULL;
> >>>>+ int ret;
> >>>>+ unsigned long hpa = 0;
> >>>>+ unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
> >>>>+
> >>>>+ if (!pua || !current || !current->mm)
> >>>>+ return;
> >>>>+
> >>>>+ ret = tce_iommu_use_page_v2(*pua, IOMMU_PAGE_SIZE(tbl),
> >>>>+ &hpa, &mem);
> >>>>+ if (ret)
> >>>>+ pr_debug("%s: tce %lx at #%lx was not cached, ret=%d\n",
> >>>>+ __func__, *pua, entry, ret);
> >>>>+ if (mem)
> >>>>+ mm_iommu_mapped_update(mem, false);
> >>>>+
> >>>>+ *pua = 0;
> >>>>+}
> >>>>+
>
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

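To make the v1/v2 split discussed in this message concrete, here is a minimal userspace sketch of the v2 flow: preregister a memory block once (this is where pinning and locked_vm accounting happen), then MAP_DMA only writes TCEs. It assumes the uAPI as proposed in this series (VFIO_SPAPR_TCE_v2_IOMMU selected via VFIO_SET_IOMMU, VFIO_IOMMU_SPAPR_REGISTER_MEMORY) and a container fd that already has a group attached; sketch_preregister_and_map() is a hypothetical helper and error handling is omitted.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int sketch_preregister_and_map(int container_fd, void *buf,
				      uint64_t size, uint64_t iova)
{
	struct vfio_iommu_spapr_register_memory reg = {
		.argsz = sizeof(reg),
		.vaddr = (uintptr_t)buf,	/* must be page aligned */
		.size  = size,
	};
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uintptr_t)buf,
		.iova  = iova,
		.size  = size,
	};

	/* v2: no VFIO_IOMMU_ENABLE; pages are pinned and locked_vm is
	 * charged here, once per registered block. */
	if (ioctl(container_fd, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg) != 0)
		return -1;

	/* Map/unmap no longer pin anything; the kernel only checks that
	 * vaddr falls inside a preregistered block and updates the TCEs. */
	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}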


2015-05-05 13:09:30

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table

On Fri, May 01, 2015 at 04:53:08PM +1000, Alexey Kardashevskiy wrote:
> On 05/01/2015 03:12 PM, David Gibson wrote:
> >On Fri, May 01, 2015 at 02:10:58PM +1000, Alexey Kardashevskiy wrote:
> >>On 04/29/2015 04:40 PM, David Gibson wrote:
> >>>On Sat, Apr 25, 2015 at 10:14:51PM +1000, Alexey Kardashevskiy wrote:
> >>>>This adds a way for the IOMMU user to know how much a new table will
> >>>>use so it can be accounted in the locked_vm limit before allocation
> >>>>happens.
> >>>>
> >>>>This stores the allocated table size in pnv_pci_create_table()
> >>>>so the locked_vm counter can be updated correctly when a table is
> >>>>being disposed.
> >>>>
> >>>>This defines an iommu_table_group_ops callback to let VFIO know
> >>>>how much memory will be locked if a table is created.
> >>>>
> >>>>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>>>---
> >>>>Changes:
> >>>>v9:
> >>>>* reimplemented the whole patch
> >>>>---
> >>>> arch/powerpc/include/asm/iommu.h | 5 +++++
> >>>> arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++++++++
> >>>> arch/powerpc/platforms/powernv/pci.c | 36 +++++++++++++++++++++++++++++++
> >>>> arch/powerpc/platforms/powernv/pci.h | 2 ++
> >>>> 4 files changed, 57 insertions(+)
> >>>>
> >>>>diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >>>>index 1472de3..9844c106 100644
> >>>>--- a/arch/powerpc/include/asm/iommu.h
> >>>>+++ b/arch/powerpc/include/asm/iommu.h
> >>>>@@ -99,6 +99,7 @@ struct iommu_table {
> >>>> unsigned long it_size; /* Size of iommu table in entries */
> >>>> unsigned long it_indirect_levels;
> >>>> unsigned long it_level_size;
> >>>>+ unsigned long it_allocated_size;
> >>>> unsigned long it_offset; /* Offset into global table */
> >>>> unsigned long it_base; /* mapped address of tce table */
> >>>> unsigned long it_index; /* which iommu table this is */
> >>>>@@ -155,6 +156,10 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
> >>>> struct iommu_table_group;
> >>>>
> >>>> struct iommu_table_group_ops {
> >>>>+ unsigned long (*get_table_size)(
> >>>>+ __u32 page_shift,
> >>>>+ __u64 window_size,
> >>>>+ __u32 levels);
> >>>> long (*create_table)(struct iommu_table_group *table_group,
> >>>> int num,
> >>>> __u32 page_shift,
> >>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>index e0be556..7f548b4 100644
> >>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>@@ -2062,6 +2062,18 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
> >>>> }
> >>>>
> >>>> #ifdef CONFIG_IOMMU_API
> >>>>+static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift,
> >>>>+ __u64 window_size, __u32 levels)
> >>>>+{
> >>>>+ unsigned long ret = pnv_get_table_size(page_shift, window_size, levels);
> >>>>+
> >>>>+ if (!ret)
> >>>>+ return ret;
> >>>>+
> >>>>+ /* Add size of it_userspace */
> >>>>+ return ret + (window_size >> page_shift) * sizeof(unsigned long);
> >>>
> >>>This doesn't make much sense. The userspace view can't possibly be a
> >>>property of the specific low-level IOMMU model.
> >>
> >>
> >>This it_userspace thing is all about memory preregistration.
> >>
> >>I need some way to track how many actual mappings the
> >>mm_iommu_table_group_mem_t has in order to decide whether to allow
> >>unregistering or not.
> >>
> >>When I clear a TCE, I can read the old value, which is a host physical address
> >>that I cannot use to find the preregistered region and adjust the mappings
> >>counter; I can only use userspace addresses for this (not even guest
> >>physical addresses as it is VFIO and probably no KVM).
> >>
> >>So I have to keep userspace addresses somewhere, one per IOMMU page, and the
> >>iommu_table seems a natural place for this.
> >
> >Well.. sort of. But as noted elsewhere this pulls VFIO specific
> >constraints into a platform code structure. And whether you get this
> >table depends on the platform IOMMU type rather than on what VFIO
> >wants to do with it, which doesn't make sense.
> >
> >What might make more sense is an opaque pointer in iommu_table for use
> >by the table "owner" (in the take_ownership sense). The pointer would
> >be stored in iommu_table, but VFIO is responsible for populating and
> >managing its contents.
> >
> >Or you could just put the userspace mappings in the container.
> >Although you might want a different data structure in that case.
>
> Nope. I need this table in in-kernel acceleration to update the mappings
> counter per mm_iommu_table_group_mem_t. In KVM's real mode handlers, I only
> have IOMMU tables, not containers or groups. QEMU creates a guest view of
> the table (KVM_CREATE_SPAPR_TCE) specifying a LIOBN, and then attaches TCE
> tables to it via set of ioctls (one per IOMMU group) to VFIO KVM device.
>
> So if I call it it_opaque (instead of it_userspace), I will still need a
> common place (visible to VFIO and PowerKVM) to put this:
> #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry)

I think it should be in a VFIO header. If I'm understanding right
this part of the PowerKVM code is explicitly VFIO aware - that's kind
of the point.

> So far this place was arch/powerpc/include/asm/iommu.h and the iommu_table
> struct.
>
>
> >The other thing to bear in mind is that registered regions are likely
> >to be large contiguous blocks in user addresses, though obviously not
> >contiguous in physical addr. So you might be able to compactify this
> >information by storing it as a list of variable length blocks in
> >userspace address space, rather than a per-page address.
>
> It is 8 bytes per system page - 8/65536 = 0.00012 (or 26MB for 200GB guest)
> - very little overhead.
>
>
> >But.. isn't there a bigger problem here? As Paulus was pointing out,
> >there's nothing guaranteeing the page tables continue to contain the
> >same page as was there at gup() time.
>
> This can happen if the userspace remaps memory which it registered/mapped
> for DMA via VFIO, no? If so, then the userspace just should not do this, it
> is DMA, it cannot be moved like this. What am I missing here?
>
>
> >What's going to happen if you REGISTER a memory region, then mremap()
> >over it?
>
> The registered pages will remain pinned and PUT_TCE will use that region for
> translation (and this will fail as the userspace addresses changed).
>
> I do not see how it is different from the situation when the userspace
> mapped a page and mremap()ed it while it is DMA-mapped.

True, it's basically the same. Hrm, so what guarantees that a dma_map,
mremap(), dma_unmap sequence will unreference the correct pages?

> >Then attempt to PUT_TCE a page in the region? Or what if you
> >mremap() it to someplace else then try to PUT_TCE a page there?
>
> This will fail - a new userspace address has to be preregistered.
>
> >Or REGISTER it again in its new location?
>
> It will be pinned twice + some memory overhead to store the same host
> physical address(es) twice.
>
>
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

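As a back-of-the-envelope companion to the numbers in this message (8 bytes per TCE for the table, plus 8 bytes per IOMMU page for the userspace view), here is a sketch of how a caller could turn the window geometry into a locked_vm delta. It assumes a single-level table and kernel context (ALIGN, PAGE_SIZE); the real pnv_get_table_size() also accounts for indirect levels, so this is illustrative only and sketch_window_locked_pages() is a hypothetical name.

static unsigned long sketch_window_locked_pages(unsigned long window_size,
						unsigned int page_shift)
{
	unsigned long entries = window_size >> page_shift;
	unsigned long tce_bytes = entries * 8;			   /* hardware TCEs */
	unsigned long uas_bytes = entries * sizeof(unsigned long); /* it_userspace */

	/*
	 * Example: a 200GB window with 64K IOMMU pages is ~3.3M entries,
	 * i.e. ~25MB for the userspace view alone - the "26MB for a 200GB
	 * guest" figure quoted above.
	 */
	return (ALIGN(tce_bytes, PAGE_SIZE) + ALIGN(uas_bytes, PAGE_SIZE))
			>> PAGE_SHIFT;
}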


2015-05-05 13:09:53

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table

On Fri, May 01, 2015 at 05:12:45PM +1000, Alexey Kardashevskiy wrote:
> On 05/01/2015 02:23 PM, David Gibson wrote:
> >On Fri, May 01, 2015 at 02:01:17PM +1000, Alexey Kardashevskiy wrote:
> >>On 04/29/2015 04:31 PM, David Gibson wrote:
> >>>On Sat, Apr 25, 2015 at 10:14:50PM +1000, Alexey Kardashevskiy wrote:
> >>>>In order to support memory pre-registration, we need a way to track
> >>>>the use of every registered memory region and only allow unregistration
> >>>>if a region is not in use anymore. So we need a way to tell what
> >>>>region the just cleared TCE came from.
> >>>>
> >>>>This adds a userspace view of the TCE table into iommu_table struct.
> >>>>It contains userspace address, one per TCE entry. The table is only
> >>>>allocated when the ownership over an IOMMU group is taken which means
> >>>>it is only used from outside of the powernv code (such as VFIO).
> >>>>
> >>>>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>>>---
> >>>>Changes:
> >>>>v9:
> >>>>* fixed code flow in error cases added in v8
> >>>>
> >>>>v8:
> >>>>* added ENOMEM on failed vzalloc()
> >>>>---
> >>>> arch/powerpc/include/asm/iommu.h | 6 ++++++
> >>>> arch/powerpc/kernel/iommu.c | 18 ++++++++++++++++++
> >>>> arch/powerpc/platforms/powernv/pci-ioda.c | 22 ++++++++++++++++++++--
> >>>> 3 files changed, 44 insertions(+), 2 deletions(-)
> >>>>
> >>>>diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >>>>index 7694546..1472de3 100644
> >>>>--- a/arch/powerpc/include/asm/iommu.h
> >>>>+++ b/arch/powerpc/include/asm/iommu.h
> >>>>@@ -111,9 +111,15 @@ struct iommu_table {
> >>>> unsigned long *it_map; /* A simple allocation bitmap for now */
> >>>> unsigned long it_page_shift;/* table iommu page size */
> >>>> struct iommu_table_group *it_table_group;
> >>>>+ unsigned long *it_userspace; /* userspace view of the table */
> >>>
> >>>A single unsigned long doesn't seem like enough.
> >>
> >>Why single? This is an array.
> >
> >As in single per page.
>
>
> Sorry, I am not following you here.
> It is per IOMMU page. MAP/UNMAP work with IOMMU pages which are fully backed
> by either a system page or a huge page.
>
>
> >
> >>>How do you know
> >>>which process's address space this address refers to?
> >>
> >>It is the current task. Multiple userspaces cannot use the same container/tables.
> >
> >Where is that enforced?
>
>
> It is accessed from VFIO DMA map/unmap which are ioctls() on a container's
> fd, which is per process.

Usually, but what enforces that? If you open a container fd, then
fork(), and attempt to map from both parent and child, what happens?

> Same for KVM - when it registers IOMMU groups in
> KVM, fd's of opened IOMMU groups are passed there. Or I did not understand
> the question...
>
>
> >More to the point, that's a VFIO constraint, but it's here affecting
> >the design of a structure owned by the platform code.
>
> Right. But keeping in mind KVM, I cannot think of any better design here.
>
>
> >[snip]
> >>>> static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
> >>>>@@ -2062,12 +2071,21 @@ static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
> >>>> int nid = pe->phb->hose->node;
> >>>> __u64 bus_offset = num ? pe->tce_bypass_base : 0;
> >>>> long ret;
> >>>>+ unsigned long *uas, uas_cb = sizeof(*uas) * (window_size >> page_shift);
> >>>>+
> >>>>+ uas = vzalloc(uas_cb);
> >>>>+ if (!uas)
> >>>>+ return -ENOMEM;
> >>>
> >>>I don't see why this is allocated both here as well as in
> >>>take_ownership.
> >>
> >>Where else? The only alternative is vfio_iommu_spapr_tce but I really do not
> >>want to touch iommu_table fields there.
> >
> >Well, to put it another way, why isn't take_ownership calling create
> >itself (or at least a common helper)?
>
> I am trying to keep DDW stuff away from platform-oriented
> arch/powerpc/kernel/iommu.c whose main purpose is to implement
> iommu_alloc()&co. It already has
>
> I'd rather move it_userspace allocation completely to vfio_iommu_spapr_tce
> (should have done earlier, actually), would this be ok?

Yeah, that makes more sense to me.

> >Clearly the it_userspace table needs to have lifetime which matches
> >the TCE table itself, so there should be a single function that marks
> >the beginning of that joint lifetime.
>
>
> No. it_userspace lives as long as the platform code does not control the
> table. For IODA2 it is equal for the lifetime of the table, for IODA1/P5IOC2
> it is not.

Right, I was imprecise. I was thinking of the ownership change as an
end/beginning of lifetime even for IODA1, because the table has to be
fully cleared at that point, even though it's not actually
reallocated.

> >>>Isn't this function used for core-kernel users of the
> >>>iommu as well, in which case it shouldn't need the it_userspace.
> >>
> >>
> >>No. This is an iommu_table_group_ops callback which calls what the platform
> >>code calls (pnv_pci_create_table()) plus allocates this it_userspace thing.
> >>The callback is only called from VFIO.
> >
> >Ok.
> >
> >As touched on above it seems more like this should be owned by VFIO
> >code than the platform code.
>
> Agree now :) I'll move the allocation to VFIO. Thanks!
>
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

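For orientation, here is one possible shape of the per-entry userspace view being debated in this message; SKETCH_USERSPACE_ENTRY and sketch_record_ua are hypothetical stand-ins for the series' IOMMU_TABLE_USERSPACE_ENTRY and its callers, and where the array actually lives (iommu_table vs the VFIO container) is exactly the open question in the thread.

/* One unsigned long (userspace address) per IOMMU page, vzalloc'ed when
 * ownership of the group is taken, NULL while the platform owns the table. */
#define SKETCH_USERSPACE_ENTRY(tbl, entry) \
	((tbl)->it_userspace ? &(tbl)->it_userspace[(entry)] : NULL)

static void sketch_record_ua(struct iommu_table *tbl, unsigned long entry,
			     unsigned long ua)
{
	unsigned long *pua = SKETCH_USERSPACE_ENTRY(tbl, entry);

	/* Remember the userspace address so that clearing the TCE later can
	 * find the preregistered region and drop its mapped count. */
	if (pua)
		*pua = ua;
}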


2015-05-05 13:09:42

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 22/32] powerpc/powernv: Implement multilevel TCE tables

On Fri, May 01, 2015 at 07:48:49PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 03:04 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:46PM +1000, Alexey Kardashevskiy wrote:
> >>TCE tables might get too big in case of 4K IOMMU pages and DDW enabled
> >>on huge guests (hundreds of GB of RAM) so the kernel might be unable to
> >>allocate a contiguous chunk of physical memory to store the TCE table.
> >>
> >>To address this, POWER8 CPU (actually, IODA2) supports multi-level TCE tables,
> >>up to 5 levels which splits the table into a tree of smaller subtables.
> >>
> >>This adds multi-level TCE tables support to pnv_pci_create_table()
> >>and pnv_pci_free_table() helpers.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>---
> >>Changes:
> >>v9:
> >>* moved from ioda2 to common powernv pci code
> >>* fixed cleanup if allocation fails in a middle
> >>* removed check for the size - all boundary checks happen in the calling code
> >>anyway
> >>---
> >> arch/powerpc/include/asm/iommu.h | 2 +
> >> arch/powerpc/platforms/powernv/pci-ioda.c | 15 +++--
> >> arch/powerpc/platforms/powernv/pci.c | 94 +++++++++++++++++++++++++++++--
> >> arch/powerpc/platforms/powernv/pci.h | 4 +-
> >> 4 files changed, 104 insertions(+), 11 deletions(-)
> >>
> >>diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >>index 7e7ca0a..0f50ee2 100644
> >>--- a/arch/powerpc/include/asm/iommu.h
> >>+++ b/arch/powerpc/include/asm/iommu.h
> >>@@ -96,6 +96,8 @@ struct iommu_pool {
> >> struct iommu_table {
> >> unsigned long it_busno; /* Bus number this table belongs to */
> >> unsigned long it_size; /* Size of iommu table in entries */
> >>+ unsigned long it_indirect_levels;
> >>+ unsigned long it_level_size;
> >> unsigned long it_offset; /* Offset into global table */
> >> unsigned long it_base; /* mapped address of tce table */
> >> unsigned long it_index; /* which iommu table this is */
> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>index 59baa15..cc1d09c 100644
> >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>@@ -1967,13 +1967,17 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> >> table_group);
> >> struct pnv_phb *phb = pe->phb;
> >> int64_t rc;
> >>+ const unsigned long size = tbl->it_indirect_levels ?
> >>+ tbl->it_level_size : tbl->it_size;
> >> const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
> >> const __u64 win_size = tbl->it_size << tbl->it_page_shift;
> >>
> >> pe_info(pe, "Setting up window at %llx..%llx "
> >>- "pgsize=0x%x tablesize=0x%lx\n",
> >>+ "pgsize=0x%x tablesize=0x%lx "
> >>+ "levels=%d levelsize=%x\n",
> >> start_addr, start_addr + win_size - 1,
> >>- 1UL << tbl->it_page_shift, tbl->it_size << 3);
> >>+ 1UL << tbl->it_page_shift, tbl->it_size << 3,
> >>+ tbl->it_indirect_levels + 1, tbl->it_level_size << 3);
> >>
> >> tbl->it_table_group = &pe->table_group;
> >>
> >>@@ -1984,9 +1988,9 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> >> rc = opal_pci_map_pe_dma_window(phb->opal_id,
> >> pe->pe_number,
> >> pe->pe_number << 1,
> >>- 1,
> >>+ tbl->it_indirect_levels + 1,
> >> __pa(tbl->it_base),
> >>- tbl->it_size << 3,
> >>+ size << 3,
> >> 1ULL << tbl->it_page_shift);
> >> if (rc) {
> >> pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
> >>@@ -2099,7 +2103,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> >> phb->ioda.m32_pci_base);
> >>
> >> rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
> >>- 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base, tbl);
> >>+ 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base,
> >>+ POWERNV_IOMMU_DEFAULT_LEVELS, tbl);
> >> if (rc) {
> >> pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
> >> return;
> >>diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> >>index 6bcfad5..fc129c4 100644
> >>--- a/arch/powerpc/platforms/powernv/pci.c
> >>+++ b/arch/powerpc/platforms/powernv/pci.c
> >>@@ -46,6 +46,8 @@
> >> #define cfg_dbg(fmt...) do { } while(0)
> >> //#define cfg_dbg(fmt...) printk(fmt)
> >>
> >>+#define ROUND_UP(x, n) (((x) + (n) - 1ULL) & ~((n) - 1ULL))
> >
> >Use the existing ALIGN_UP macro instead of creating a new one.
>
> Ok. I knew it existed, it is just _ALIGN_UP (with an underscore) and
> PPC-only - this is why I did not find it :)

I'm pretty sure there's a generic one too. I think it's just plain
"ALIGN".

> >> #ifdef CONFIG_PCI_MSI
> >> static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
> >> {
> >>@@ -577,6 +579,19 @@ struct pci_ops pnv_pci_ops = {
> >> static __be64 *pnv_tce(struct iommu_table *tbl, long idx)
> >> {
> >> __be64 *tmp = ((__be64 *)tbl->it_base);
> >>+ int level = tbl->it_indirect_levels;
> >>+ const long shift = ilog2(tbl->it_level_size);
> >>+ unsigned long mask = (tbl->it_level_size - 1) << (level * shift);
> >>+
> >>+ while (level) {
> >>+ int n = (idx & mask) >> (level * shift);
> >>+ unsigned long tce = be64_to_cpu(tmp[n]);
> >>+
> >>+ tmp = __va(tce & ~(TCE_PCI_READ | TCE_PCI_WRITE));
> >>+ idx &= ~mask;
> >>+ mask >>= shift;
> >>+ --level;
> >>+ }
> >>
> >> return tmp + idx;
> >> }
> >>@@ -648,12 +663,18 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
> >> }
> >>
> >> static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
> >>+ unsigned levels, unsigned long limit,
> >> unsigned long *tce_table_allocated)
> >> {
> >> struct page *tce_mem = NULL;
> >>- __be64 *addr;
> >>+ __be64 *addr, *tmp;
> >> unsigned order = max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT;
> >> unsigned long local_allocated = 1UL << (order + PAGE_SHIFT);
> >>+ unsigned entries = 1UL << (shift - 3);
> >>+ long i;
> >>+
> >>+ if (limit == *tce_table_allocated)
> >>+ return NULL;
> >
> >If this is for what I think, it seems a bit unsafe. Shouldn't it be
> >>=, otherwise it could fail to trip if the limit isn't exactly a
> >>multiple of the bottom level allocation unit.
>
> Good point, will fix.
>
>
> >> tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
> >> if (!tce_mem) {
> >>@@ -662,14 +683,33 @@ static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
> >> }
> >> addr = page_address(tce_mem);
> >> memset(addr, 0, local_allocated);
> >>- *tce_table_allocated = local_allocated;
> >>+
> >>+ --levels;
> >>+ if (!levels) {
> >>+ /* Update tce_table_allocated with bottom level table size only */
> >>+ *tce_table_allocated += local_allocated;
> >>+ return addr;
> >>+ }
> >>+
> >>+ for (i = 0; i < entries; ++i) {
> >>+ tmp = pnv_alloc_tce_table_pages(nid, shift, levels, limit,
> >>+ tce_table_allocated);
> >
> >Urgh.. it's a limited depth so it *might* be ok, but recursion is
> >generally avoided in the kernel, because of the very limited stack
> >size.
>
>
> It is 5 levels max, 7 64bit values, so there should be room for it. Avoiding
> recursion here - I can do that but it is going to look ugly :-/

Yeah, I guess. Probably worth a comment noting why the recursion depth
is limited though.

>
>
> >>+ if (!tmp)
> >>+ break;
> >>+
> >>+ addr[i] = cpu_to_be64(__pa(tmp) |
> >>+ TCE_PCI_READ | TCE_PCI_WRITE);
> >>+ }
> >
> >It also seems like it would make sense for this function to set
> >it_indirect_levels and it_level_size, rather than leaving it to the
> >caller.
>
>
> Mmm. Sure? It calls itself in recursion, does not seem like it is the right
> place for setting up it_indirect_levels and it_level_size.

Yeah, ok, I hadn't properly thought through the recursion.

> >> return addr;
> >> }
> >>
> >>+static void pnv_free_tce_table_pages(unsigned long addr, unsigned long size,
> >>+ unsigned level);
> >>+
> >> long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
> >> __u64 bus_offset, __u32 page_shift, __u64 window_size,
> >>- struct iommu_table *tbl)
> >>+ __u32 levels, struct iommu_table *tbl)
> >> {
> >> void *addr;
> >> unsigned long tce_table_allocated = 0;
> >>@@ -678,16 +718,34 @@ long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
> >> unsigned table_shift = entries_shift + 3;
> >> const unsigned long tce_table_size = max(0x1000UL, 1UL << table_shift);
> >>
> >>+ if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS))
> >>+ return -EINVAL;
> >>+
> >> if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size))
> >> return -EINVAL;
> >>
> >>+ /* Adjust direct table size from window_size and levels */
> >>+ entries_shift = ROUND_UP(entries_shift, levels) / levels;
> >
> >ROUND_UP() only works if the second parameter is a power of 2. Is
> >that always true for levels?
> >
> >For division rounding up, the usual idiom is just ((a + (b - 1)) / b)
>
>
> Yes, I think this is what I actually wanted.
>
>
> >>+ table_shift = entries_shift + 3;
> >>+ table_shift = max_t(unsigned, table_shift, PAGE_SHIFT);
> >
> >Does the PAGE_SHIFT rounding make sense any more? I would have
> >thought you'd round the level size up to page size, rather than the
> >whole thing.
>
>
> At this point in the code @table_shift is level_shift but it is not that
> obvious :) I'll rework it. Thanks.
>
>
> >> /* Allocate TCE table */
> >> addr = pnv_alloc_tce_table_pages(nid, table_shift,
> >>- &tce_table_allocated);
> >>+ levels, tce_table_size, &tce_table_allocated);
> >>+ if (!addr)
> >>+ return -ENOMEM;
> >>+
> >>+ if (tce_table_size != tce_table_allocated) {
> >>+ pnv_free_tce_table_pages((unsigned long) addr,
> >>+ tbl->it_level_size, tbl->it_indirect_levels);
> >>+ return -ENOMEM;
> >>+ }
> >>
> >> /* Setup linux iommu table */
> >> pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, bus_offset,
> >> page_shift);
> >>+ tbl->it_level_size = 1ULL << (table_shift - 3);
> >>+ tbl->it_indirect_levels = levels - 1;
> >>
> >> pr_info("Created TCE table: window size = %08llx, "
> >> "tablesize = %lx (%lx), start @%08llx\n",
> >>@@ -697,12 +755,38 @@ long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
> >> return 0;
> >> }
> >>
> >>+static void pnv_free_tce_table_pages(unsigned long addr, unsigned long size,
> >>+ unsigned level)
> >>+{
> >>+ addr &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
> >>+
> >>+ if (level) {
> >>+ long i;
> >>+ u64 *tmp = (u64 *) addr;
> >>+
> >>+ for (i = 0; i < size; ++i) {
> >>+ unsigned long hpa = be64_to_cpu(tmp[i]);
> >>+
> >>+ if (!(hpa & (TCE_PCI_READ | TCE_PCI_WRITE)))
> >>+ continue;
> >>+
> >>+ pnv_free_tce_table_pages((unsigned long) __va(hpa),
> >>+ size, level - 1);
> >>+ }
> >>+ }
> >>+
> >>+ free_pages(addr, get_order(size << 3));
> >>+}
> >>+
> >> void pnv_pci_free_table(struct iommu_table *tbl)
> >> {
> >>+ const unsigned long size = tbl->it_indirect_levels ?
> >>+ tbl->it_level_size : tbl->it_size;
> >>+
> >> if (!tbl->it_size)
> >> return;
> >>
> >>- free_pages(tbl->it_base, get_order(tbl->it_size << 3));
> >>+ pnv_free_tce_table_pages(tbl->it_base, size, tbl->it_indirect_levels);
> >> iommu_reset_table(tbl, "pnv");
> >> }
> >>
> >>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> >>index e6cbbec..3d1ff584 100644
> >>--- a/arch/powerpc/platforms/powernv/pci.h
> >>+++ b/arch/powerpc/platforms/powernv/pci.h
> >>@@ -218,9 +218,11 @@ int pnv_pci_cfg_write(struct pci_dn *pdn,
> >> extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
> >> void *tce_mem, u64 tce_size,
> >> u64 dma_offset, unsigned page_shift);
> >>+#define POWERNV_IOMMU_DEFAULT_LEVELS 1
> >>+#define POWERNV_IOMMU_MAX_LEVELS 5
> >> extern long pnv_pci_create_table(struct iommu_table_group *table_group, int nid,
> >> __u64 bus_offset, __u32 page_shift, __u64 window_size,
> >>- struct iommu_table *tbl);
> >>+ __u32 levels, struct iommu_table *tbl);
> >> extern void pnv_pci_free_table(struct iommu_table *tbl);
> >> extern void pnv_pci_init_p5ioc2_hub(struct device_node *np);
> >> extern void pnv_pci_init_ioda_hub(struct device_node *np);
> >
>
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

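To spell out the rounding point raised in this message: the mask-based ROUND_UP only works when the divisor is a power of two, so splitting entries_shift across an arbitrary number of levels wants plain round-up division instead. A small sketch follows; the macro and function names are illustrative only (the kernel's own DIV_ROUND_UP does the same thing).

#define SKETCH_ROUND_UP_POW2(x, n)	(((x) + (n) - 1UL) & ~((n) - 1UL))	/* n must be 2^k */
#define SKETCH_DIV_ROUND_UP(a, b)	(((a) + (b) - 1) / (b))

static unsigned sketch_level_shift(unsigned entries_shift, unsigned levels)
{
	/* e.g. entries_shift = 20, levels = 3 -> 7 bits per level, since
	 * 3 * 7 = 21 >= 20; the power-of-two macro would be wrong for levels = 3. */
	return SKETCH_DIV_ROUND_UP(entries_shift, levels);
}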


2015-05-05 13:10:49

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH kernel v9 28/32] powerpc/mmu: Add userspace-to-physical addresses translation cache

On Fri, May 01, 2015 at 09:26:48PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 05:01 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:52PM +1000, Alexey Kardashevskiy wrote:
> >>We are adding support for DMA memory pre-registration to be used in
> >>conjunction with VFIO. The idea is that the userspace which is going to
> >>run a guest may want to pre-register a user space memory region so
> >>it all gets pinned once and never goes away. Having this done,
> >>a hypervisor will not have to pin/unpin pages on every DMA map/unmap
> >>request. This is going to help with multiple pinning of the same memory
> >>and in-kernel acceleration of DMA requests.
> >>
> >>This adds a list of memory regions to mm_context_t. Each region consists
> >>of a header and a list of physical addresses. This adds API to:
> >>1. register/unregister memory regions;
> >>2. do final cleanup (which puts all pre-registered pages);
> >>3. do userspace to physical address translation;
> >>4. manage a mapped pages counter; when it is zero, it is safe to
> >>unregister the region.
> >>
> >>Multiple registration of the same region is allowed, kref is used to
> >>track the number of registrations.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <[email protected]>
> >>---
> >>Changes:
> >>v8:
> >>* s/mm_iommu_table_group_mem_t/struct mm_iommu_table_group_mem_t/
> >>* fixed error fallback look (s/[i]/[j]/)
> >>---
> >> arch/powerpc/include/asm/mmu-hash64.h | 3 +
> >> arch/powerpc/include/asm/mmu_context.h | 17 +++
> >> arch/powerpc/mm/Makefile | 1 +
> >> arch/powerpc/mm/mmu_context_hash64.c | 6 +
> >> arch/powerpc/mm/mmu_context_hash64_iommu.c | 215 +++++++++++++++++++++++++++++
> >> 5 files changed, 242 insertions(+)
> >> create mode 100644 arch/powerpc/mm/mmu_context_hash64_iommu.c
> >>
> >>diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> >>index 1da6a81..a82f534 100644
> >>--- a/arch/powerpc/include/asm/mmu-hash64.h
> >>+++ b/arch/powerpc/include/asm/mmu-hash64.h
> >>@@ -536,6 +536,9 @@ typedef struct {
> >> /* for 4K PTE fragment support */
> >> void *pte_frag;
> >> #endif
> >>+#ifdef CONFIG_SPAPR_TCE_IOMMU
> >>+ struct list_head iommu_group_mem_list;
> >>+#endif
> >
> >Urgh. I know I'm not one to talk, having done the hugepage crap in
> >there, but man mm_context_t has grown to a bloated mess from originally
> >being just intended as a context ID integer :/.
>
>
> Where else to put it then?... The other way to go would be some global map
> of pid<->iommu_group_mem_list which needs to be available from both VFIO and
> KVM.

I'd suggest putting it as a new field in mm_struct, guarded by a
CONFIG_VFIO_PREREGISTER (or something) which you can make sure is
selected by CONFIG_SPAPR_TCE_IOMMU.

>
>
> >> } mm_context_t;
> >>
> >>
> >>diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> >>index 73382eb..d6116ca 100644
> >>--- a/arch/powerpc/include/asm/mmu_context.h
> >>+++ b/arch/powerpc/include/asm/mmu_context.h
> >>@@ -16,6 +16,23 @@
> >> */
> >> extern int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
> >> extern void destroy_context(struct mm_struct *mm);
> >>+#ifdef CONFIG_SPAPR_TCE_IOMMU
> >>+struct mm_iommu_table_group_mem_t;
> >>+
> >>+extern bool mm_iommu_preregistered(void);
> >>+extern long mm_iommu_alloc(unsigned long ua, unsigned long entries,
> >>+ struct mm_iommu_table_group_mem_t **pmem);
> >>+extern struct mm_iommu_table_group_mem_t *mm_iommu_get(unsigned long ua,
> >>+ unsigned long entries);
> >>+extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
> >>+extern void mm_iommu_cleanup(mm_context_t *ctx);
> >>+extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
> >>+ unsigned long size);
> >>+extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> >>+ unsigned long ua, unsigned long *hpa);
> >>+extern long mm_iommu_mapped_update(struct mm_iommu_table_group_mem_t *mem,
> >>+ bool inc);
> >>+#endif
> >>
> >> extern void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next);
> >> extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
> >>diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> >>index 9c8770b..e216704 100644
> >>--- a/arch/powerpc/mm/Makefile
> >>+++ b/arch/powerpc/mm/Makefile
> >>@@ -36,3 +36,4 @@ obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage-prot.o
> >> obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
> >> obj-$(CONFIG_HIGHMEM) += highmem.o
> >> obj-$(CONFIG_PPC_COPRO_BASE) += copro_fault.o
> >>+obj-$(CONFIG_SPAPR_TCE_IOMMU) += mmu_context_hash64_iommu.o
> >>diff --git a/arch/powerpc/mm/mmu_context_hash64.c b/arch/powerpc/mm/mmu_context_hash64.c
> >>index 178876ae..eb3080c 100644
> >>--- a/arch/powerpc/mm/mmu_context_hash64.c
> >>+++ b/arch/powerpc/mm/mmu_context_hash64.c
> >>@@ -89,6 +89,9 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
> >> #ifdef CONFIG_PPC_64K_PAGES
> >> mm->context.pte_frag = NULL;
> >> #endif
> >>+#ifdef CONFIG_SPAPR_TCE_IOMMU
> >>+ INIT_LIST_HEAD_RCU(&mm->context.iommu_group_mem_list);
> >>+#endif
> >> return 0;
> >> }
> >>
> >>@@ -132,6 +135,9 @@ static inline void destroy_pagetable_page(struct mm_struct *mm)
> >>
> >> void destroy_context(struct mm_struct *mm)
> >> {
> >>+#ifdef CONFIG_SPAPR_TCE_IOMMU
> >>+ mm_iommu_cleanup(&mm->context);
> >>+#endif
> >>
> >> #ifdef CONFIG_PPC_ICSWX
> >> drop_cop(mm->context.acop, mm);
> >>diff --git a/arch/powerpc/mm/mmu_context_hash64_iommu.c b/arch/powerpc/mm/mmu_context_hash64_iommu.c
> >>new file mode 100644
> >>index 0000000..af7668c
> >>--- /dev/null
> >>+++ b/arch/powerpc/mm/mmu_context_hash64_iommu.c
> >>@@ -0,0 +1,215 @@
> >>+/*
> >>+ * IOMMU helpers in MMU context.
> >>+ *
> >>+ * Copyright (C) 2015 IBM Corp. <[email protected]>
> >>+ *
> >>+ * This program is free software; you can redistribute it and/or
> >>+ * modify it under the terms of the GNU General Public License
> >>+ * as published by the Free Software Foundation; either version
> >>+ * 2 of the License, or (at your option) any later version.
> >>+ *
> >>+ */
> >>+
> >>+#include <linux/sched.h>
> >>+#include <linux/slab.h>
> >>+#include <linux/rculist.h>
> >>+#include <linux/vmalloc.h>
> >>+#include <linux/kref.h>
> >>+#include <asm/mmu_context.h>
> >>+
> >>+struct mm_iommu_table_group_mem_t {
> >>+ struct list_head next;
> >>+ struct rcu_head rcu;
> >>+ struct kref kref; /* one reference per VFIO container */
> >>+ atomic_t mapped; /* number of currently mapped pages */
> >>+ u64 ua; /* userspace address */
> >>+ u64 entries; /* number of entries in hpas[] */
> >
> >Maybe 'npages', since this is used to determine the range of user
> >addresses covered, not just the number of entries in hpas.
>
>
> Hm. Ok :)
>
>
> >>+ u64 *hpas; /* vmalloc'ed */
> >>+};
> >>+
> >>+bool mm_iommu_preregistered(void)
> >>+{
> >>+ if (!current || !current->mm)
> >>+ return false;
> >>+
> >>+ return !list_empty(&current->mm->context.iommu_group_mem_list);
> >>+}
> >>+EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
> >>+
> >>+long mm_iommu_alloc(unsigned long ua, unsigned long entries,
> >>+ struct mm_iommu_table_group_mem_t **pmem)
> >>+{
> >>+ struct mm_iommu_table_group_mem_t *mem;
> >>+ long i, j;
> >>+ struct page *page = NULL;
> >>+
> >>+ list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
> >>+ next) {
> >>+ if ((mem->ua == ua) && (mem->entries == entries))
> >>+ return -EBUSY;
> >>+
> >>+ /* Overlap? */
> >>+ if ((mem->ua < (ua + (entries << PAGE_SHIFT))) &&
> >>+ (ua < (mem->ua + (mem->entries << PAGE_SHIFT))))
> >>+ return -EINVAL;
> >>+ }
> >>+
> >>+ mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> >>+ if (!mem)
> >>+ return -ENOMEM;
> >>+
> >>+ mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
> >>+ if (!mem->hpas) {
> >>+ kfree(mem);
> >>+ return -ENOMEM;
> >>+ }
> >>+
> >>+ for (i = 0; i < entries; ++i) {
> >>+ if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
> >>+ 1/* pages */, 1/* iswrite */, &page)) {
> >
> >Do you really need to call gup() in a loop? It can do more than one
> >page at a time..
>
>
> Ufff. gup() returns the number of pages pinned or -errno if none. So if the
> return value is positive but less than the requested number of pages, it is
> still an error. Functions like this make me nervous :(
>
>
> >That might work better if you kept a list of struct page *s instead of
> >hpas.
>
> I only need struct page* when releasing the registered area. In other cases I
> just need fast conversion from an userspace address to a host physical
> address, including real mode. Ideally I would have to use page_address()
> which will work in real mode in my case but in general it does not have to.
> Using addresses rather than page structs makes it more explicit - I need an
> address, I store an address, simple.

Ok, you convinced me. And if you have to translate them each from
struct page to hpa at this point, then the gup() in a loop does make
as much sense as anything, so ok.

> I can change to page structs if you think it makes more sense, should I?
>
>
>
>
> >>+ for (j = 0; j < i; ++j)
> >>+ put_page(pfn_to_page(
> >>+ mem->hpas[j] >> PAGE_SHIFT));
> >>+ vfree(mem->hpas);
> >>+ kfree(mem);
> >>+ return -EFAULT;
> >>+ }
> >>+
> >>+ mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
> >>+ }
> >>+
> >>+ kref_init(&mem->kref);
> >>+ atomic_set(&mem->mapped, 0);
> >>+ mem->ua = ua;
> >>+ mem->entries = entries;
> >>+ *pmem = mem;
> >>+
> >>+ list_add_rcu(&mem->next, &current->mm->context.iommu_group_mem_list);
> >>+
> >>+ return 0;
> >>+}
> >>+EXPORT_SYMBOL_GPL(mm_iommu_alloc);
> >>+
> >>+static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem)
> >>+{
> >>+ long i;
> >>+ struct page *page = NULL;
> >>+
> >>+ for (i = 0; i < mem->entries; ++i) {
> >>+ if (!mem->hpas[i])
> >>+ continue;
> >>+
> >>+ page = pfn_to_page(mem->hpas[i] >> PAGE_SHIFT);
> >>+ if (!page)
> >>+ continue;
> >>+
> >>+ put_page(page);
> >>+ mem->hpas[i] = 0;
> >>+ }
> >>+}
> >>+
> >>+static void mm_iommu_free(struct rcu_head *head)
> >>+{
> >>+ struct mm_iommu_table_group_mem_t *mem = container_of(head,
> >>+ struct mm_iommu_table_group_mem_t, rcu);
> >>+
> >>+ mm_iommu_unpin(mem);
> >>+ vfree(mem->hpas);
> >>+ kfree(mem);
> >>+}
> >>+
> >>+static void mm_iommu_release(struct kref *kref)
> >>+{
> >>+ struct mm_iommu_table_group_mem_t *mem = container_of(kref,
> >>+ struct mm_iommu_table_group_mem_t, kref);
> >>+
> >>+ list_del_rcu(&mem->next);
> >>+ call_rcu(&mem->rcu, mm_iommu_free);
> >>+}
> >>+
> >>+struct mm_iommu_table_group_mem_t *mm_iommu_get(unsigned long ua,
> >>+ unsigned long entries)
> >>+{
> >>+ struct mm_iommu_table_group_mem_t *mem;
> >>+
> >>+ list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
> >>+ next) {
> >>+ if ((mem->ua == ua) && (mem->entries == entries)) {
> >>+ kref_get(&mem->kref);
> >>+ return mem;
> >>+ }
> >>+ }
> >>+
> >>+ return NULL;
> >>+}
> >>+EXPORT_SYMBOL_GPL(mm_iommu_get);
> >>+
> >>+long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem)
> >>+{
> >>+ if (atomic_read(&mem->mapped))
> >>+ return -EBUSY;
> >
> >What prevents a race between the atomic_read() above and the release below?
>
> Ouch. Nothing. And I cannot think of any nice fast solution here...
> I can remove @mapped altogether and do kref_get/put(&mem->kref) instead; a
> container will hold one reference too. And add a flag to
> mm_iommu_table_group_mem_t to know if mm_iommu_release has been called -
> this way I will know that was the very last reference, otherwise I'll return
> -EBUSY.
>
> Or change mm_iommu_lookup() to do kref_get() and require every caller of it
> also call mm_iommu_put() and only call mm_iommu_mapped_update() when the
> reference is elevated. And change mm_iommu_put() to return a special code if
> that was the very last put() (will be checked by
> VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY handler only, others would not care).
>
> Any ideas?
>
> I am pretty sure there is something very cool (like RCU) which allows
> avoiding locks in this situation, I am just too ignorant and do not know it
> :)

I can't quickly see an answer either, sorry.


> >>+ kref_put(&mem->kref, mm_iommu_release);
> >>+
> >>+ return 0;
> >>+}
> >>+EXPORT_SYMBOL_GPL(mm_iommu_put);
> >>+
> >>+struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
> >>+ unsigned long size)
> >>+{
> >>+ struct mm_iommu_table_group_mem_t *mem, *ret = NULL;
> >>+
> >>+ list_for_each_entry_rcu(mem,
> >>+ &current->mm->context.iommu_group_mem_list,
> >>+ next) {
> >>+ if ((mem->ua <= ua) &&
> >>+ (ua + size <= mem->ua +
> >>+ (mem->entries << PAGE_SHIFT))) {
> >>+ ret = mem;
> >>+ break;
> >>+ }
> >>+ }
> >>+
> >>+ return ret;
> >>+}
> >>+EXPORT_SYMBOL_GPL(mm_iommu_lookup);
> >>+
> >>+long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> >>+ unsigned long ua, unsigned long *hpa)
> >
> >Return type should be int, it's just an error code.
>
>
> Is it some generic rule that errors must always be "int"? I was just told
> that gcc on PPC64 will generate an extra instruction to cut 64bit long to
> 32bit int so I am just trying to use "long" everywhere. Very simple but
> still an optimization :)

Ok, I guess leave it. Probably makes little difference either way.

>
>
> >>+{
> >>+ const long entry = (ua - mem->ua) >> PAGE_SHIFT;
> >>+ u64 *va = &mem->hpas[entry];
> >>+
> >>+ if (entry >= mem->entries)
> >>+ return -EFAULT;
> >>+
> >>+ *hpa = *va | (ua & ~PAGE_MASK);
> >>+
> >>+ return 0;
> >>+}
> >>+EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
> >>+
> >>+long mm_iommu_mapped_update(struct mm_iommu_table_group_mem_t *mem, bool inc)
> >>+{
> >>+ long ret = 0;
> >>+
> >>+ if (inc)
> >>+ atomic_inc(&mem->mapped);
> >>+ else
> >>+ ret = atomic_dec_if_positive(&mem->mapped);
> >>+
> >>+ return ret;
> >>+}
> >>+EXPORT_SYMBOL_GPL(mm_iommu_mapped_update);
> >
> >I think this would be clearer as separate inc and dec functions.
>
> Okay.
>
>
> >>+
> >>+void mm_iommu_cleanup(mm_context_t *ctx)
> >>+{
> >>+ while (!list_empty(&ctx->iommu_group_mem_list)) {
> >>+ struct mm_iommu_table_group_mem_t *mem;
> >>+
> >>+ mem = list_first_entry(&ctx->iommu_group_mem_list,
> >>+ struct mm_iommu_table_group_mem_t, next);
> >>+ mm_iommu_release(&mem->kref);
> >>+ }
> >>+}
> >
>
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

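Following the suggestion in this message to split mm_iommu_mapped_update() into separate increment and decrement helpers, here is a sketch of what that could look like; the sketch_* names are hypothetical, it reuses the mm_iommu_table_group_mem_t from the quoted patch, and it does not attempt to solve the unregister race also discussed above.

static void sketch_mapped_inc(struct mm_iommu_table_group_mem_t *mem)
{
	atomic_inc(&mem->mapped);
}

static long sketch_mapped_dec(struct mm_iommu_table_group_mem_t *mem)
{
	/* Only decrements if the result stays non-negative; a negative
	 * return means the counter was already zero and nothing changed. */
	return atomic_dec_if_positive(&mem->mapped);
}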


2015-05-11 02:11:25

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table

On 05/05/2015 10:02 PM, David Gibson wrote:
> On Fri, May 01, 2015 at 05:12:45PM +1000, Alexey Kardashevskiy wrote:
>> On 05/01/2015 02:23 PM, David Gibson wrote:
>>> On Fri, May 01, 2015 at 02:01:17PM +1000, Alexey Kardashevskiy wrote:
>>>> On 04/29/2015 04:31 PM, David Gibson wrote:
>>>>> On Sat, Apr 25, 2015 at 10:14:50PM +1000, Alexey Kardashevskiy wrote:
>>>>>> In order to support memory pre-registration, we need a way to track
>>>>>> the use of every registered memory region and only allow unregistration
>>>>>> if a region is not in use anymore. So we need a way to tell from what
>>>>>> region the just cleared TCE was from.
>>>>>>
>>>>>> This adds a userspace view of the TCE table into iommu_table struct.
>>>>>> It contains userspace address, one per TCE entry. The table is only
>>>>>> allocated when the ownership over an IOMMU group is taken which means
>>>>>> it is only used from outside of the powernv code (such as VFIO).
>>>>>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>>>> ---
>>>>>> Changes:
>>>>>> v9:
>>>>>> * fixed code flow in error cases added in v8
>>>>>>
>>>>>> v8:
>>>>>> * added ENOMEM on failed vzalloc()
>>>>>> ---
>>>>>> arch/powerpc/include/asm/iommu.h | 6 ++++++
>>>>>> arch/powerpc/kernel/iommu.c | 18 ++++++++++++++++++
>>>>>> arch/powerpc/platforms/powernv/pci-ioda.c | 22 ++++++++++++++++++++--
>>>>>> 3 files changed, 44 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>>>>>> index 7694546..1472de3 100644
>>>>>> --- a/arch/powerpc/include/asm/iommu.h
>>>>>> +++ b/arch/powerpc/include/asm/iommu.h
>>>>>> @@ -111,9 +111,15 @@ struct iommu_table {
>>>>>> unsigned long *it_map; /* A simple allocation bitmap for now */
>>>>>> unsigned long it_page_shift;/* table iommu page size */
>>>>>> struct iommu_table_group *it_table_group;
>>>>>> + unsigned long *it_userspace; /* userspace view of the table */
>>>>>
>>>>> A single unsigned long doesn't seem like enough.
>>>>
>>>> Why single? This is an array.
>>>
>>> As in single per page.
>>
>>
>> Sorry, I am not following you here.
>> It is per IOMMU page. MAP/UNMAP work with IOMMU pages which are fully backed
>> with either system page or a huge page.
>>
>>
>>>
>>>>> How do you know
>>>>> which process's address space this address refers to?
>>>>
>>>> It is a current task. Multiple userspaces cannot use the same container/tables.
>>>
>>> Where is that enforced?
>>
>>
>> It is accessed from VFIO DMA map/unmap which are ioctls() to a container's
>> fd which is per a process.
>
> Usually, but what enforces that. If you open a container fd, then
> fork(), and attempt to map from both parent and child, what happens?


vfio_group_fops::open() checks if the group is already opened, and I want
to believe open() is called from fork() for the new fd so no mapping can happen
later.


--
Alexey

2015-05-11 02:24:48

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table

On 05/05/2015 09:58 PM, David Gibson wrote:
> On Fri, May 01, 2015 at 04:53:08PM +1000, Alexey Kardashevskiy wrote:
>> On 05/01/2015 03:12 PM, David Gibson wrote:
>>> On Fri, May 01, 2015 at 02:10:58PM +1000, Alexey Kardashevskiy wrote:
>>>> On 04/29/2015 04:40 PM, David Gibson wrote:
>>>>> On Sat, Apr 25, 2015 at 10:14:51PM +1000, Alexey Kardashevskiy wrote:
>>>>>> This adds a way for the IOMMU user to know how much memory a new table
>>>>>> will use so it can be accounted in the locked_vm limit before allocation
>>>>>> happens.
>>>>>>
>>>>>> This stores the allocated table size in pnv_pci_create_table()
>>>>>> so the locked_vm counter can be updated correctly when a table is
>>>>>> being disposed.
>>>>>>
>>>>>> This defines an iommu_table_group_ops callback to let VFIO know
>>>>>> how much memory will be locked if a table is created.
>>>>>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>>>> ---
>>>>>> Changes:
>>>>>> v9:
>>>>>> * reimplemented the whole patch
>>>>>> ---
>>>>>> arch/powerpc/include/asm/iommu.h | 5 +++++
>>>>>> arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++++++++
>>>>>> arch/powerpc/platforms/powernv/pci.c | 36 +++++++++++++++++++++++++++++++
>>>>>> arch/powerpc/platforms/powernv/pci.h | 2 ++
>>>>>> 4 files changed, 57 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>>>>>> index 1472de3..9844c106 100644
>>>>>> --- a/arch/powerpc/include/asm/iommu.h
>>>>>> +++ b/arch/powerpc/include/asm/iommu.h
>>>>>> @@ -99,6 +99,7 @@ struct iommu_table {
>>>>>> unsigned long it_size; /* Size of iommu table in entries */
>>>>>> unsigned long it_indirect_levels;
>>>>>> unsigned long it_level_size;
>>>>>> + unsigned long it_allocated_size;
>>>>>> unsigned long it_offset; /* Offset into global table */
>>>>>> unsigned long it_base; /* mapped address of tce table */
>>>>>> unsigned long it_index; /* which iommu table this is */
>>>>>> @@ -155,6 +156,10 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
>>>>>> struct iommu_table_group;
>>>>>>
>>>>>> struct iommu_table_group_ops {
>>>>>> + unsigned long (*get_table_size)(
>>>>>> + __u32 page_shift,
>>>>>> + __u64 window_size,
>>>>>> + __u32 levels);
>>>>>> long (*create_table)(struct iommu_table_group *table_group,
>>>>>> int num,
>>>>>> __u32 page_shift,
>>>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>> index e0be556..7f548b4 100644
>>>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>> @@ -2062,6 +2062,18 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
>>>>>> }
>>>>>>
>>>>>> #ifdef CONFIG_IOMMU_API
>>>>>> +static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift,
>>>>>> + __u64 window_size, __u32 levels)
>>>>>> +{
>>>>>> + unsigned long ret = pnv_get_table_size(page_shift, window_size, levels);
>>>>>> +
>>>>>> + if (!ret)
>>>>>> + return ret;
>>>>>> +
>>>>>> + /* Add size of it_userspace */
>>>>>> + return ret + (window_size >> page_shift) * sizeof(unsigned long);
>>>>>
>>>>> This doesn't make much sense. The userspace view can't possibly be a
>>>>> property of the specific low-level IOMMU model.
>>>>
>>>>
>>>> This it_userspace thing is all about memory preregistration.
>>>>
>>>> I need some way to track how many actual mappings the
>>>> mm_iommu_table_group_mem_t has in order to decide whether to allow
>>>> unregistering or not.
>>>>
>>>> When I clear TCE, I can read the old value which is host physical address
>>>> which I cannot use to find the preregistered region and adjust the mappings
>>>> counter; I can only use userspace addresses for this (not even guest
>>>> physical addresses as it is VFIO and probably no KVM).
>>>>
>>>> So I have to keep userspace addresses somewhere, one per IOMMU page, and the
>>>> iommu_table seems a natural place for this.
>>>
>>> Well.. sort of. But as noted elsewhere this pulls VFIO specific
>>> constraints into a platform code structure. And whether you get this
>>> table depends on the platform IOMMU type rather than on what VFIO
>>> wants to do with it, which doesn't make sense.
>>>
>>> What might make more sense is an opaque pointer in iommu_table for use
>>> by the table "owner" (in the take_ownership sense). The pointer would
>>> be stored in iommu_table, but VFIO is responsible for populating and
>>> managing its contents.
>>>
>>> Or you could just put the userspace mappings in the container.
>>> Although you might want a different data structure in that case.
>>
>> Nope. I need this table in in-kernel acceleration to update the mappings
>> counter per mm_iommu_table_group_mem_t. In KVM's real mode handlers, I only
>> have IOMMU tables, not containers or groups. QEMU creates a guest view of
>> the table (KVM_CREATE_SPAPR_TCE) specifying a LIOBN, and then attaches TCE
>> tables to it via set of ioctls (one per IOMMU group) to VFIO KVM device.
>>
>> So if I call it it_opaque (instead of it_userspace), I will still need a
>> common place (visible to VFIO and PowerKVM) for this to put:
>> #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry)
>
> I think it should be in a VFIO header. If I'm understanding right
> this part of the PowerKVM code is explicitly VFIO aware - that's kind
> of the point.

Well, there are two points against it:

1. arch/powerpc/kvm/book3s_64_vio_hv.c and arch/powerpc/kvm/book3s_64_vio.c
do not include any vfio headers now (all the required liobn<->iommu
hooking bits are in virt/kvm/vfio.c), and I kind of like it that way.

2. It seems like a good idea to me to keep an accessor close to what it
provides access to. Since I cannot move it_userspace somewhere else, I
should not move IOMMU_TABLE_USERSPACE_ENTRY() either; a sketch of what such
an accessor could look like follows below.
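
For illustration only, a minimal sketch of what such an accessor could look
like, assuming it_userspace is an array indexed by the TCE entry number
relative to it_offset; the exact macro body in the patchset may differ:

/* Hypothetical accessor; the real definition would live next to
 * struct iommu_table in arch/powerpc/include/asm/iommu.h */
#define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
	((tbl)->it_userspace ? \
		&((tbl)->it_userspace[(entry) - (tbl)->it_offset]) : NULL)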



>> So far this place was arch/powerpc/include/asm/iommu.h and the iommu_table
>> struct.
>>
>>
>>> The other thing to bear in mind is that registered regions are likely
>>> to be large contiguous blocks in user addresses, though obviously not
>>> contiguous in physical addr. So you might be able to compactify this
>>> information by storing it as a list of variable length blocks in
>>> userspace address space, rather than a per-page address..
>>
>> It is 8 bytes per system page - 8/65536 = 0.00012 (or 26MB for 200GB guest)
>> - very little overhead.
>>
>>
>>> But.. isn't there a bigger problem here. As Paulus was pointing out,
>>> there's nothing guaranteeing the page tables continue to contain the
>>> same page as was there at gup() time.
>>
>> This can happen if the userspace remaps memory which it registered/mapped
>> for DMA via VFIO, no? If so, then the userspace just should not do this, it
>> is DMA, it cannot be moved like this. What am I missing here?
>>
>>
>>> What's going to happen if you REGISTER a memory region, then mremap()
>>> over it?
>>
>> The registered pages will remain pinned and PUT_TCE will use that region for
>> translation (and this will fail as the userspace addresses changed).
>>
>> I do not see how it is different from the situation when the userspace
>> mapped a page and mremap()ed it while it is DMA-mapped.
>
> True, it's basically the same. Hrm, so what guarantees a dma_map,
> mremap() dma_unmap will unreference the correct pages.

The original page will remain pinned, the wrong one will be unpinned, access
to it will produce an EEH, and the process exit will do the cleanup. What is
the problem here? The inability for userspace to dma_map() and then remap()?
I do not think x86 (or anything else) can or should cope with this well; it
is DMA, you just cannot do certain things when you work with DMA...
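
For illustration, a hedged userspace sketch of the map/remap sequence under
discussion. The container and group setup is elided, error checking is
dropped, VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA and the
vfio_iommu_type1_dma_map/unmap structures are the generic VFIO uAPI, and how
the sPAPR backend reacts to the remap is exactly the open question here, not
something this snippet settles:

#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* container_fd is an already configured VFIO container */
static void map_then_remap(int container_fd)
{
	size_t sz = 1UL << 24;
	void *buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
			 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

	struct vfio_iommu_type1_dma_map map;
	memset(&map, 0, sizeof(map));
	map.argsz = sizeof(map);
	map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
	map.vaddr = (uintptr_t)buf;
	map.iova = 0;
	map.size = sz;
	ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map); /* pages pinned here */

	/* Replace the mapping in place: the DMA mapping still refers to the
	 * pages that were pinned at MAP_DMA time, not whatever backs buf now */
	mmap(buf, sz, PROT_READ | PROT_WRITE,
	     MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);

	struct vfio_iommu_type1_dma_unmap unmap;
	memset(&unmap, 0, sizeof(unmap));
	unmap.argsz = sizeof(unmap);
	unmap.iova = 0;
	unmap.size = sz;
	ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
}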



>>> Then attempt to PUT_TCE a page in the region? Or what if you
>>> mremap() it to someplace else then try to PUT_TCE a page there?
>>
>> This will fail - a new userspace address has to be preregistered.
>>
>>> Or REGISTER it again in its new location?
>>
>> It will be pinned twice + some memory overhead to store the same host
>> physical address(es) twice.
>>
>>
>>
>


--
Alexey

2015-05-11 02:26:27

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

On 05/05/2015 09:50 PM, David Gibson wrote:
> On Fri, May 01, 2015 at 04:05:24PM +1000, Alexey Kardashevskiy wrote:
>> On 05/01/2015 02:33 PM, David Gibson wrote:
>>> On Thu, Apr 30, 2015 at 07:33:09PM +1000, Alexey Kardashevskiy wrote:
>>>> On 04/30/2015 05:22 PM, David Gibson wrote:
>>>>> On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
>>>>>> At the moment only one group per container is supported.
>>>>>> POWER8 CPUs have a more flexible design and allow having 2 TCE tables per
>>>>>> IOMMU group so we can relax this limitation and support multiple groups
>>>>>> per container.
>>>>>
>>>>> It's not obvious why allowing multiple TCE tables per PE has any
>>>>> bearing on allowing multiple groups per container.
>>>>
>>>>
>>>> This patchset is a global TCE tables rework (patches 1..30, roughly) with 2
>>>> outcomes:
>>>> 1. reusing the same IOMMU table for multiple groups - patch 31;
>>>> 2. allowing dynamic create/remove of IOMMU tables - patch 32.
>>>>
>>>> I can remove this one from the patchset and post it separately later but
>>>> since 1..30 aim to support both 1) and 2), I'd think I better keep them all
>>>> together (might explain some of changes I do in 1..30).
>>>
>>> The combined patchset is fine. My comment is because your commit
>>> message says that multiple groups are possible *because* 2 TCE tables
>>> per group are allowed, and it's not at all clear why one follows from
>>> the other.
>>
>>
>> Ah. That's wrong indeed, I'll fix it.
>>
>>
>>>>>> This adds TCE table descriptors to a container and uses iommu_table_group_ops
>>>>>> to create/set DMA windows on IOMMU groups so the same TCE tables will be
>>>>>> shared between several IOMMU groups.
>>>>>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>>>> [aw: for the vfio related changes]
>>>>>> Acked-by: Alex Williamson <[email protected]>
>>>>>> ---
>>>>>> Changes:
>>>>>> v7:
>>>>>> * updated doc
>>>>>> ---
>>>>>> Documentation/vfio.txt | 8 +-
>>>>>> drivers/vfio/vfio_iommu_spapr_tce.c | 268 ++++++++++++++++++++++++++----------
>>>>>> 2 files changed, 199 insertions(+), 77 deletions(-)
>>>>>>
>>>>>> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
>>>>>> index 94328c8..7dcf2b5 100644
>>>>>> --- a/Documentation/vfio.txt
>>>>>> +++ b/Documentation/vfio.txt
>>>>>> @@ -289,10 +289,12 @@ PPC64 sPAPR implementation note
>>>>>>
>>>>>> This implementation has some specifics:
>>>>>>
>>>>>> -1) Only one IOMMU group per container is supported as an IOMMU group
>>>>>> -represents the minimal entity which isolation can be guaranteed for and
>>>>>> -groups are allocated statically, one per a Partitionable Endpoint (PE)
>>>>>> +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
>>>>>> +container is supported as an IOMMU table is allocated at the boot time,
>>>>>> +one table per a IOMMU group which is a Partitionable Endpoint (PE)
>>>>>> (PE is often a PCI domain but not always).
>>>>>
>>>>> I thought the more fundamental problem was that different PEs tended
>>>>> to use disjoint bus address ranges, so even by duplicating put_tce
>>>>> across PEs you couldn't have a common address space.
>>>>
>>>>
>>>> Sorry, I am not following you here.
>>>>
>>>> By duplicating put_tce, I can have multiple IOMMU groups on the same virtual
>>>> PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple groups
>>>> per container" does this, the address ranges will be the same.
>>>
>>> Oh, ok. For some reason I thought that (at least on the older
>>> machines) the different PEs used different and not easily changeable
>>> DMA windows in bus addresses space.
>>
>>
>> They do use different tables (which VFIO does not get to remove/create and
>> uses these old helpers - iommu_take/release_ownership), correct. But all
>> these windows are mapped at zero on a PE's PCI bus and nothing prevents me
>> from updating all these tables with the same TCE values when handling
>> H_PUT_TCE. Yes it is slow but it works (bit more details below).
>
> Um.. I'm pretty sure that contradicts what Ben was saying on the
> thread.


True, it does contradict; I do not know why he said what he said :)



--
Alexey

2015-05-11 04:52:59

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table

On 05/11/2015 12:11 PM, Alexey Kardashevskiy wrote:
> On 05/05/2015 10:02 PM, David Gibson wrote:
>> On Fri, May 01, 2015 at 05:12:45PM +1000, Alexey Kardashevskiy wrote:
>>> On 05/01/2015 02:23 PM, David Gibson wrote:
>>>> On Fri, May 01, 2015 at 02:01:17PM +1000, Alexey Kardashevskiy wrote:
>>>>> On 04/29/2015 04:31 PM, David Gibson wrote:
>>>>>> On Sat, Apr 25, 2015 at 10:14:50PM +1000, Alexey Kardashevskiy wrote:
>>>>>>> In order to support memory pre-registration, we need a way to track
>>>>>>> the use of every registered memory region and only allow unregistration
>>>>>>> if a region is not in use anymore. So we need a way to tell which
>>>>>>> region the just cleared TCE came from.
>>>>>>>
>>>>>>> This adds a userspace view of the TCE table into iommu_table struct.
>>>>>>> It contains userspace address, one per TCE entry. The table is only
>>>>>>> allocated when the ownership over an IOMMU group is taken which means
>>>>>>> it is only used from outside of the powernv code (such as VFIO).
>>>>>>>
>>>>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>>>>> ---
>>>>>>> Changes:
>>>>>>> v9:
>>>>>>> * fixed code flow in error cases added in v8
>>>>>>>
>>>>>>> v8:
>>>>>>> * added ENOMEM on failed vzalloc()
>>>>>>> ---
>>>>>>> arch/powerpc/include/asm/iommu.h | 6 ++++++
>>>>>>> arch/powerpc/kernel/iommu.c | 18 ++++++++++++++++++
>>>>>>> arch/powerpc/platforms/powernv/pci-ioda.c | 22 ++++++++++++++++++++--
>>>>>>> 3 files changed, 44 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/powerpc/include/asm/iommu.h
>>>>>>> b/arch/powerpc/include/asm/iommu.h
>>>>>>> index 7694546..1472de3 100644
>>>>>>> --- a/arch/powerpc/include/asm/iommu.h
>>>>>>> +++ b/arch/powerpc/include/asm/iommu.h
>>>>>>> @@ -111,9 +111,15 @@ struct iommu_table {
>>>>>>> unsigned long *it_map; /* A simple allocation bitmap for
>>>>>>> now */
>>>>>>> unsigned long it_page_shift;/* table iommu page size */
>>>>>>> struct iommu_table_group *it_table_group;
>>>>>>> + unsigned long *it_userspace; /* userspace view of the table */
>>>>>>
>>>>>> A single unsigned long doesn't seem like enough.
>>>>>
>>>>> Why single? This is an array.
>>>>
>>>> As in single per page.
>>>
>>>
>>> Sorry, I am not following you here.
>>> It is per IOMMU page. MAP/UNMAP work with IOMMU pages which are fully
>>> backed
>>> with either system page or a huge page.
>>>
>>>
>>>>
>>>>>> How do you know
>>>>>> which process's address space this address refers to?
>>>>>
>>>>> It is a current task. Multiple userspaces cannot use the same
>>>>> container/tables.
>>>>
>>>> Where is that enforced?
>>>
>>>
>>> It is accessed from VFIO DMA map/unmap which are ioctls() to a container's
>>> fd which is per a process.
>>
>> Usually, but what enforces that. If you open a container fd, then
>> fork(), and attempt to map from both parent and child, what happens?
>
>
> vfio_group_fops::open() checks if the group is already opened, and I want
> to believe open() is called from fork() for new fd so no mapping can happen
> later.

I am wrong here. Nothing prevents multiple userspaces from using the same
container. It still does not seem really dangerous as, in order to use VFIO,
someone with root privilege has to set the right permissions on /dev/vfio*
first anyway, and that person knows what QEMU does and does not do :)

I could add a pid into iommu_table, next to it_userspace, and fail when
another pid tries to change the it_userspace table (a sketch follows below).
I am not sure I want to do this check in real mode though (performance). Or
make sure somehow that fork() closes the container and group fd's (but how?).
In the worst case, the wrong userspace page will be put and there will be
random backtraces in the host kernel. What would you do?
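
Purely as a sketch of that idea, with hypothetical field and helper names
that are not part of the patchset, the check could look roughly like this
(relies on <linux/sched.h> for current/task_pid_nr()):

/* Hypothetical: record the pid that attached the userspace view */
struct iommu_table {
	/* ... existing fields ... */
	unsigned long *it_userspace;	/* userspace view of the table */
	pid_t it_userspace_pid;		/* hypothetical: owner of it_userspace */
};

/* Called on the DMA map/unmap path before touching it_userspace */
static int iommu_check_userspace_owner(struct iommu_table *tbl)
{
	if (tbl->it_userspace_pid != task_pid_nr(current))
		return -EPERM;	/* another process is using this container */
	return 0;
}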


--
Alexey