2023-07-14 03:06:59

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: VFIO (PCI) and write combine mapping of BARs

Hi Folks !

I'd like to revive an old discussion as we (Amazon Linux) have been
getting asks for it.

What's the best interface to provide the option of write combine mmap's
of BARs via VFIO ?

The problem isn't so much the low level implementation, we just have to
play with the pgprot, the question is more around what API to present
to control this.

One trivial way would be to have an ioctl to set a flag for a given
region/BAR that cause subsequent mmap's to use write-combine. We would
have to keep a bitmap for the "legacy" regions, and use a flag in
struct vfio_pci_region for the others.

One potentially better way is to make it strictly an attribute of
vfio_pci_region, along with an ioctl that creates a "subregion". The
idea here is that we would have an ioctl to create a region from an
existing region dynamically, which represents a subset of the original
region (typically a BAR), with potentially different attributes (or we
keep the attribute get/set separate).

I like the latter more because it will allow to more easily define that
portions of a BAR can need different attributes without causing
state/race issues between setting the attribute and mmap.

This will also enable other attributes than write-combine if/when the
need arises.

Any better idea ? thoughs ? objections ?

This is still quite specific to PCI, but so is the entire regions
mechanism, so I don't see an easy path to something more generic at
this stage.

Cheers,
Ben.



2023-07-14 08:07:42

by Lorenzo Pieralisi

[permalink] [raw]
Subject: Re: VFIO (PCI) and write combine mapping of BARs

[+Catalin, Marc, Jason]

On Fri, Jul 14, 2023 at 12:32:49PM +1000, Benjamin Herrenschmidt wrote:
> Hi Folks !
>
> I'd like to revive an old discussion as we (Amazon Linux) have been
> getting asks for it.
>
> What's the best interface to provide the option of write combine mmap's
> of BARs via VFIO ?

There is an ongoing thread on this topic that we should use to
bring this discussion to completion:

https://lore.kernel.org/linux-arm-kernel/[email protected]
>
> The problem isn't so much the low level implementation, we just have to
> play with the pgprot, the question is more around what API to present
> to control this.
>
> One trivial way would be to have an ioctl to set a flag for a given
> region/BAR that cause subsequent mmap's to use write-combine. We would
> have to keep a bitmap for the "legacy" regions, and use a flag in
> struct vfio_pci_region for the others.
>
> One potentially better way is to make it strictly an attribute of
> vfio_pci_region, along with an ioctl that creates a "subregion". The
> idea here is that we would have an ioctl to create a region from an
> existing region dynamically, which represents a subset of the original
> region (typically a BAR), with potentially different attributes (or we
> keep the attribute get/set separate).
>
> I like the latter more because it will allow to more easily define that
> portions of a BAR can need different attributes without causing
> state/race issues between setting the attribute and mmap.
>
> This will also enable other attributes than write-combine if/when the
> need arises.
>
> Any better idea ? thoughs ? objections ?
>
> This is still quite specific to PCI, but so is the entire regions
> mechanism, so I don't see an easy path to something more generic at
> this stage.
>
> Cheers,
> Ben.
>

2023-07-14 12:59:00

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: VFIO (PCI) and write combine mapping of BARs

On Fri, Jul 14, 2023 at 09:13:27AM +0200, Lorenzo Pieralisi wrote:
> [+Catalin, Marc, Jason]
>
> On Fri, Jul 14, 2023 at 12:32:49PM +1000, Benjamin Herrenschmidt wrote:
> > Hi Folks !
> >
> > I'd like to revive an old discussion as we (Amazon Linux) have been
> > getting asks for it.
> >
> > What's the best interface to provide the option of write combine mmap's
> > of BARs via VFIO ?
>
> There is an ongoing thread on this topic that we should use to
> bring this discussion to completion:
>
> https://lore.kernel.org/linux-arm-kernel/[email protected]

There are two topics here

1) Make ARM KVM allow the VM to select WC for its MMIO. This has
evolved in a way that is not related to VFIO

2) Allow VFIO to create mmaps with WC for non-VM use cases like DPDK.

We have a draft patch for #1, and I think a general understanding with
ARM folks that this is the right direction.

2 is more like what this email talks about - providing mmaps with
specific flags.

Benjamin, which are you interested in?

> > The problem isn't so much the low level implementation, we just have to
> > play with the pgprot, the question is more around what API to present
> > to control this.

Assuming this is for #2, I think VFIO has fallen into a bit of a trap
by allowing userspace to form the mmap offset. I've seen this happen
in other subsystems too. It seems like a good idea then you realize
you need more stuff in the mmap space and become sad.

Typically the way out is to covert the mmap offset into a cookie where
userspace issues some ioctl and then the ioctl returns an opaque mmap
offset to use.

eg in the vfio context you'd do some 'prepare region for mmap' ioctl
where you could specify flags. The kernel would encode the flags in
the cookie and then mmap would do the right thing. Adding more stuff
is done by enhancing the prepare ioctl.

Legacy mmap offsets are kept working.

> > This is still quite specific to PCI, but so is the entire regions
> > mechanism, so I don't see an easy path to something more generic at
> > this stage.

Regions are general, but the encoding of the mmap cookie has various
PCI semantics when used with the PCI interface..

We'd want the same ability with platform devices too, for instance.

Jason

2023-07-25 06:47:13

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: VFIO (PCI) and write combine mapping of BARs

On Fri, 2023-07-14 at 09:37 -0300, Jason Gunthorpe wrote:
>
> There are two topics here
>
> 1) Make ARM KVM allow the VM to select WC for its MMIO. This has
>    evolved in a way that is not related to VFIO
>
> 2) Allow VFIO to create mmaps with WC for non-VM use cases like DPDK.
>
> We have a draft patch for #1, and I think a general understanding with
> ARM folks that this is the right direction.
>
> 2 is more like what this email talks about - providing mmaps with
> specific flags.
>
> Benjamin, which are you interested in?

Sorry for the delay, got caught up.... The customer request we have
(and what I was indeed talking about) is 2. That said, when running in
a VM, 2 won't do much without 1.

> > > The problem isn't so much the low level implementation, we just have to
> > > play with the pgprot, the question is more around what API to present
> > > to control this.
>
> Assuming this is for #2, I think VFIO has fallen into a bit of a trap
> by allowing userspace to form the mmap offset. I've seen this happen
> in other subsystems too. It seems like a good idea then you realize
> you need more stuff in the mmap space and become sad.
>
> Typically the way out is to covert the mmap offset into a cookie where
> userspace issues some ioctl and then the ioctl returns an opaque mmap
> offset to use.
>
> eg in the vfio context you'd do some 'prepare region for mmap' ioctl
> where you could specify flags. The kernel would encode the flags in
> the cookie and then mmap would do the right thing. Adding more stuff
> is done by enhancing the prepare ioctl.
>
> Legacy mmap offsets are kept working.

This indeed what I have in mind. IE. VFIO has legacy regions and add-on
regions though the latter is currently only exploited by some drivers
that create their own add-on regions. My proposal is to add an ioctl to
create them from userspace as "children" of an existing driver-provided
region, allowing to set different attributes for mmap.

> > > This is still quite specific to PCI, but so is the entire regions
> > > mechanism, so I don't see an easy path to something more generic at
> > > this stage.
>
> Regions are general, but the encoding of the mmap cookie has various
> PCI semantics when used with the PCI interface..
>
> We'd want the same ability with platform devices too, for instance.

In the current VFIO the implementation is *entirely* in vfio_pci_core
for PCI and entirely in vfio_platform_common.c for platform, so while
the same ioctls could be imagined to create sub-regions, it would have
to be completely implemented twice unless we do a lot of heavy lifting
to move some of that region stuff into common code.

But yes, appart from that, no objection :-)

Cheers,
Ben.

2023-07-25 11:50:07

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: VFIO (PCI) and write combine mapping of BARs

On Tue, Jul 25, 2023 at 04:15:39PM +1000, Benjamin Herrenschmidt wrote:
> > Assuming this is for #2, I think VFIO has fallen into a bit of a trap
> > by allowing userspace to form the mmap offset. I've seen this happen
> > in other subsystems too. It seems like a good idea then you realize
> > you need more stuff in the mmap space and become sad.
> >
> > Typically the way out is to covert the mmap offset into a cookie where
> > userspace issues some ioctl and then the ioctl returns an opaque mmap
> > offset to use.
> >
> > eg in the vfio context you'd do some 'prepare region for mmap' ioctl
> > where you could specify flags. The kernel would encode the flags in
> > the cookie and then mmap would do the right thing. Adding more stuff
> > is done by enhancing the prepare ioctl.
> >
> > Legacy mmap offsets are kept working.
>
> This indeed what I have in mind. IE. VFIO has legacy regions and add-on
> regions though the latter is currently only exploited by some drivers
> that create their own add-on regions. My proposal is to add an ioctl to
> create them from userspace as "children" of an existing driver-provided
> region, allowing to set different attributes for mmap.

I wouldn't call it children, you are just getting a different mmap
cookie for the same region object.

> In the current VFIO the implementation is *entirely* in vfio_pci_core
> for PCI and entirely in vfio_platform_common.c for platform, so while
> the same ioctls could be imagined to create sub-regions, it would have
> to be completely implemented twice unless we do a lot of heavy lifting
> to move some of that region stuff into common code.

The machinery for managing the mmap cookies should be in common code

Jason

2023-07-26 01:57:32

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: VFIO (PCI) and write combine mapping of BARs

On Tue, 2023-07-25 at 08:39 -0300, Jason Gunthorpe wrote:
> On Tue, Jul 25, 2023 at 04:15:39PM +1000, Benjamin Herrenschmidt wrote:
> > > Assuming this is for #2, I think VFIO has fallen into a bit of a trap
> > > by allowing userspace to form the mmap offset. I've seen this happen
> > > in other subsystems too. It seems like a good idea then you realize
> > > you need more stuff in the mmap space and become sad.
> > >
> > > Typically the way out is to covert the mmap offset into a cookie where
> > > userspace issues some ioctl and then the ioctl returns an opaque mmap
> > > offset to use.
> > >
> > > eg in the vfio context you'd do some 'prepare region for mmap' ioctl
> > > where you could specify flags. The kernel would encode the flags in
> > > the cookie and then mmap would do the right thing. Adding more stuff
> > > is done by enhancing the prepare ioctl.
> > >
> > > Legacy mmap offsets are kept working.
> >
> > This indeed what I have in mind. IE. VFIO has legacy regions and add-on
> > regions though the latter is currently only exploited by some drivers
> > that create their own add-on regions. My proposal is to add an ioctl to
> > create them from userspace as "children" of an existing driver-provided
> > region, allowing to set different attributes for mmap.
>
> I wouldn't call it children, you are just getting a different mmap
> cookie for the same region object.

I though they could be subsets but that might be overkill.

> > In the current VFIO the implementation is *entirely* in vfio_pci_core
> > for PCI and entirely in vfio_platform_common.c for platform, so while
> > the same ioctls could be imagined to create sub-regions, it would have
> > to be completely implemented twice unless we do a lot of heavy lifting
> > to move some of that region stuff into common code.
>
> The machinery for managing the mmap cookies should be in common code

Ok. I'll whip up a POC within vfio_pci only intially to test the
concept and to agree on the API, then look at how we can clean all that
up.

Cheers,
Ben.