2010-04-01 16:30:42

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On 04/01/2010 03:08 AM, Tom Lyon wrote:
> uio_pci_generic has previously been discussed on the KVM list, but this patch
> has nothing to do with KVM, so it is also going to LKML.
>

(needs to go to lkml even if it was for kvm)

> The point of this patch is to beef up the uio_pci_generic driver so that a
> non-privileged user process can run a user level driver for most PCIe
> devices. This can only be safe if there is an IOMMU in the system with
> per-device domains. Privileged users (CAP_SYS_RAWIO) are allowed if there is
> no IOMMU.
>
> Specifically, I seek to allow low-latency user level network drivers (non
> tcp/ip) which directly access SR-IOV style virtual network adapters, for use
> with packages such as OpenMPI.
>
> Key areas of change:
> - ioctl extensions to allow registration and dma mapping of memory regions,
> with lock accounting
> - support for mmu notifier driven de-mapping
>

Note that current iommus/devices don't support restart-on-fault dma, so
userspace drivers will have to lock memory so that it is not swapped
out. I don't think this prevents page migration, though.

> - support for MSI and MSI-X interrupts (the intel 82599 VFs support only
> MSI-X)
>

How does a userspace program receive those interrupts?

> - allowing interrupt enabling and device register mapping all
> through /dev/uio* so that permissions may be granted just by chmod
> on /dev/uio*
>

That was always broken with the sysfs interface.

--
error compiling committee.c: too many arguments to function


2010-04-01 15:41:54

by Tom Lyon

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On Thursday 01 April 2010 02:09:09 am Avi Kivity wrote:
> On 04/01/2010 03:08 AM, Tom Lyon wrote:
> > uio_pci_generic has previously been discussed on the KVM list, but this
> > patch has nothing to do with KVM, so it is also going to LKML.
>
> (needs to go to lkml even if it was for kvm)
>
> > The point of this patch is to beef up the uio_pci_generic driver so that
> > a non-privileged user process can run a user level driver for most PCIe
> > devices. This can only be safe if there is an IOMMU in the system with
> > per-device domains. Privileged users (CAP_SYS_RAWIO) are allowed if
> > there is no IOMMU.
> >
> > Specifically, I seek to allow low-latency user level network drivers (non
> > tcp/ip) which directly access SR-IOV style virtual network adapters, for
> > use with packages such as OpenMPI.
> >
> > Key areas of change:
> > - ioctl extensions to allow registration and dma mapping of memory
> > regions, with lock accounting
> > - support for mmu notifier driven de-mapping
>
> Note that current iommus/devices don't support restart-on-fault dma, so
> userspace drivers will have to lock memory so that it is not swapped
> out. I don't think this prevents page migration, though.
The driver provides a way to lock memory for DMA; the mmu notifier support is
to catch things when the user accidentally frees locked pages.

> > - support for MSI and MSI-X interrupts (the intel 82599 VFs support only
> > MSI-X)
>
> How does a userspace program receive those interrupts?
Same as other UIO drivers - by read()ing an event counter.

2010-04-01 16:09:31

by Tom Lyon

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On Thursday 01 April 2010 08:54:14 am Avi Kivity wrote:
> On 04/01/2010 06:39 PM, Tom Lyon wrote:
> >>> - support for MSI and MSI-X interrupts (the intel 82599 VFs support
> >>> only MSI-X)
> >>
> >> How does a userspace program receive those interrupts?
> >
> > Same as other UIO drivers - by read()ing an event counter.
>
> IIRC the usual event counter is /dev/uioX, what's your event counter now?
Exact same mechanism.

>
> kvm really wants the event counter to be an eventfd, that allows hooking
> it directly to kvm (which can inject an interrupt on an eventfd_signal),
> can you adapt your patch to do this?

My patch does not currently go anywhere near the read/fd logic of /dev/uioX.
I think a separate patch would be appropriate.

2010-04-01 16:20:36

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On 04/01/2010 06:39 PM, Tom Lyon wrote:
>
>
>>> - support for MSI and MSI-X interrupts (the intel 82599 VFs support only
>>> MSI-X)
>>>
>> How does a userspace program receive those interrupts?
>>
> Same as other UIO drivers - by read()ing an event counter.
>

IIRC the usual event counter is /dev/uioX, what's your event counter now?

kvm really wants the event counter to be an eventfd, that allows hooking
it directly to kvm (which can inject an interrupt on an eventfd_signal),
can you adapt your patch to do this?

--
error compiling committee.c: too many arguments to function

2010-04-01 16:39:55

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On 04/01/2010 07:06 PM, Tom Lyon wrote:
> On Thursday 01 April 2010 08:54:14 am Avi Kivity wrote:
>
>> On 04/01/2010 06:39 PM, Tom Lyon wrote:
>>
>>>>> - support for MSI and MSI-X interrupts (the intel 82599 VFs support
>>>>> only MSI-X)
>>>>>
>>>> How does a userspace program receive those interrupts?
>>>>
>>> Same as other UIO drivers - by read()ing an event counter.
>>>
>> IIRC the usual event counter is /dev/uioX, what's your event counter now?
>>
> Exact same mechanism.
>

But there are multiple msi-x interrupts, how do you know which one
triggered?

>> kvm really wants the event counter to be an eventfd, that allows hooking
>> it directly to kvm (which can inject an interrupt on an eventfd_signal),
>> can you adapt your patch to do this?
>>
> My patch does not currently go anywhere near the read/fd logic of /dev/uioX.
> I think a separate patch would be appropriate.
>

Sure.

--
error compiling committee.c: too many arguments to function

2010-04-01 19:27:26

by Tom Lyon

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On Thursday 01 April 2010 09:10:57 am Avi Kivity wrote:
> On 04/01/2010 07:06 PM, Tom Lyon wrote:
> > On Thursday 01 April 2010 08:54:14 am Avi Kivity wrote:
> >> On 04/01/2010 06:39 PM, Tom Lyon wrote:
> >>>>> - support for MSI and MSI-X interrupts (the intel 82599 VFs support
> >>>>> only MSI-X)
> >>>>
> >>>> How does a userspace program receive those interrupts?
> >>>
> >>> Same as other UIO drivers - by read()ing an event counter.
> >>
> >> IIRC the usual event counter is /dev/uioX, what's your event counter
> >> now?
> >
> > Exact same mechanism.
>
> But there are multiple msi-x interrupts, how do you know which one
> triggered?

You don't. This would suck for KVM, I guess, but we'd need major rework of the
generic UIO stuff to have a separate event channel for each MSI-X.

For my purposes, collapsing all the MSI-Xs into one MSI-look-alike is fine,
because I'd be using MSI anyways if I could. The weird Intel 82599 VF only
supports MSI-X.

So one big question is - do we expand the whole UIO framework for KVM
requirements, or do we split off either KVM or non-VM into a separate driver?
Hans or Greg - care to opine?



2010-04-01 20:21:06

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On Thu, Apr 01, 2010 at 12:24:45PM -0700, Tom Lyon wrote:

> For my purposes, collapsing all the MSI-Xs into one MSI-look-alike is fine,
> because I'd be using MSI anyways if I could. The weird Intel 82599 VF only
> supports MSI-X.

For KVM this is not fine. The device should look in the guest as it
looks in the host. Some devices might only support MSI-X and thus the
drivers for it only search for MSI-X and get confused when they only
find MSI.

> So one big question is - do we expand the whole UIO framework for KVM
> requirements, or do we split off either KVM or non-VM into a separate driver?
> Hans or Greg - care to opine?

We should definitly work towards a single implementation. The KVM device
passthrough requirements are not very different from that of userspace
device access.

Joerg

2010-04-01 21:29:48

by Tom Lyon

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On Thursday 01 April 2010 08:54:14 am Avi Kivity wrote:
> On 04/01/2010 06:39 PM, Tom Lyon wrote:
> >>> - support for MSI and MSI-X interrupts (the intel 82599 VFs support
> >>> only MSI-X)
> >>
> >> How does a userspace program receive those interrupts?
> >
> > Same as other UIO drivers - by read()ing an event counter.
>
> IIRC the usual event counter is /dev/uioX, what's your event counter now?
>
> kvm really wants the event counter to be an eventfd, that allows hooking
> it directly to kvm (which can inject an interrupt on an eventfd_signal),
> can you adapt your patch to do this?

I looked further into eventfds - they seem the perfect solution for the
MSI/MSI-X interrupts. Will include in V2.

2010-04-02 06:43:57

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On 04/01/2010 10:24 PM, Tom Lyon wrote:
>
>> But there are multiple msi-x interrupts, how do you know which one
>> triggered?
>>
> You don't. This would suck for KVM, I guess, but we'd need major rework of the
> generic UIO stuff to have a separate event channel for each MSI-X.
>

Doesn't it suck for non-kvm in the same way? Multiple vectors are there
for a reason. For example, if you have a multiqueue NIC, you'd have to
process all queues instead of just the one that triggered.

> For my purposes, collapsing all the MSI-Xs into one MSI-look-alike is fine,
> because I'd be using MSI anyways if I could. The weird Intel 82599 VF only
> supports MSI-X.
>
> So one big question is - do we expand the whole UIO framework for KVM
> requirements, or do we split off either KVM or non-VM into a separate driver?
> Hans or Greg - care to opine?
>

Currently kvm does device assignment with its own code, I'd like to
unify it with uio, not split it off.

Separate notifications for msi-x interrupts are just as useful for uio
as they are for kvm.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2010-04-02 06:44:20

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On 04/02/2010 12:27 AM, Tom Lyon wrote:
>
>> kvm really wants the event counter to be an eventfd, that allows hooking
>> it directly to kvm (which can inject an interrupt on an eventfd_signal),
>> can you adapt your patch to do this?
>>
> I looked further into eventfds - they seem the perfect solution for the
> MSI/MSI-X interrupts. Will include in V2.
>

They are indeed. Thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2010-04-02 17:05:25

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On Fri, Apr 02, 2010 at 09:43:35AM +0300, Avi Kivity wrote:
> On 04/01/2010 10:24 PM, Tom Lyon wrote:
>>
>>> But there are multiple msi-x interrupts, how do you know which one
>>> triggered?
>>>
>> You don't. This would suck for KVM, I guess, but we'd need major rework of the
>> generic UIO stuff to have a separate event channel for each MSI-X.
>>
>
> Doesn't it suck for non-kvm in the same way? Multiple vectors are there
> for a reason. For example, if you have a multiqueue NIC, you'd have to
> process all queues instead of just the one that triggered.
>
>> For my purposes, collapsing all the MSI-Xs into one MSI-look-alike is fine,
>> because I'd be using MSI anyways if I could. The weird Intel 82599 VF only
>> supports MSI-X.
>>
>> So one big question is - do we expand the whole UIO framework for KVM
>> requirements, or do we split off either KVM or non-VM into a separate driver?
>> Hans or Greg - care to opine?
>>
>
> Currently kvm does device assignment with its own code, I'd like to unify
> it with uio, not split it off.
>
> Separate notifications for msi-x interrupts are just as useful for uio as
> they are for kvm.

I agree, there should not be a difference here for KVM vs. the "normal"
version.

thanks,

greg k-h

2010-04-09 09:58:35

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On 04/02/2010 08:05 PM, Greg KH wrote:
>
>> Currently kvm does device assignment with its own code, I'd like to unify
>> it with uio, not split it off.
>>
>> Separate notifications for msi-x interrupts are just as useful for uio as
>> they are for kvm.
>>
> I agree, there should not be a difference here for KVM vs. the "normal"
> version.
>

Just so you know what you got into, here are the kvm requirements:

- msi interrupts delivered via eventfd (these allow us to inject
interrupts from uio to a guest without going through userspace)
- nonlinear iommu mapping (i.e. map discontiguous ranges of the device
address space into ranges of the virtual address space)
- dynamic iommu mapping (support guest memory hotplug)
- unprivileged operation once an admin has assigned a device (my
preferred implementation is to have all operations go through an fd,
which can be passed via SCM_RIGHTS from a privileged application that
opens the file)
- access to all config space, but BARs must be translated so userspace
cannot attack the host
- some mechanism which allows us to affine device interrupts with their
target vcpus (eventually, this is vague)
- anything mst might add
- a pony

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2010-04-09 16:37:09

by Tom Lyon

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On Friday 09 April 2010 02:58:19 am Avi Kivity wrote:
> On 04/02/2010 08:05 PM, Greg KH wrote:
> >
> >> Currently kvm does device assignment with its own code, I'd like to unify
> >> it with uio, not split it off.
> >>
> >> Separate notifications for msi-x interrupts are just as useful for uio as
> >> they are for kvm.
> >>
> > I agree, there should not be a difference here for KVM vs. the "normal"
> > version.
> >
>
> Just so you know what you got into, here are the kvm requirements:
>
> - msi interrupts delivered via eventfd (these allow us to inject
> interrupts from uio to a guest without going through userspace)
Check.
> - nonlinear iommu mapping (i.e. map discontiguous ranges of the device
> address space into ranges of the virtual address space)
Check.
> - dynamic iommu mapping (support guest memory hotplug)
Check.
> - unprivileged operation once an admin has assigned a device (my
> preferred implementation is to have all operations go through an fd,
> which can be passed via SCM_RIGHTS from a privileged application that
> opens the file)
Check.
> - access to all config space, but BARs must be translated so userspace
> cannot attack the host
Please elaborate. All of PCI config? All of PCIe config? Seems like a huge mess.
> - some mechanism which allows us to affine device interrupts with their
> target vcpus (eventually, this is vague)
Do-able.
> - anything mst might add
mst?
> - a pony
Rainbow or glitter?

The 'check' items are already done, not fully tested; probably available next week.
Can we leave the others for future patches? Please? And I definitely need help with
the PCI config stuff.

2010-04-09 16:48:05

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for?non-privileged processes

On Fri, Apr 09, 2010 at 09:34:16AM -0700, Tom Lyon wrote:
> The 'check' items are already done, not fully tested; probably available next week.
> Can we leave the others for future patches? Please? And I definitely need help with
> the PCI config stuff.

Yeah, go in small steps forward. Just post again what you have next
week. We can add more functionality step by step.

Joerg

2010-04-09 17:43:52

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

On 04/09/2010 07:34 PM, Tom Lyon wrote:
>> - access to all config space, but BARs must be translated so userspace
>> cannot attack the host
>>
> Please elaborate. All of PCI config? All of PCIe config? Seems like a huge mess.
>

Yes. Anything a guest's device driver may want to access.

> The 'check' items are already done, not fully tested; probably available next week.
> Can we leave the others for future patches? Please?

Hey, I was expecting we'd have to do all of this. The requirements list
was to get the uio maintainers confirmation that this is going in an
acceptable direction.

We can definitely proceed incrementally.

> And I definitely need help with
> the PCI config stuff.
>

Sure.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2010-04-09 20:06:17

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

* Avi Kivity ([email protected]) wrote:
> On 04/02/2010 08:05 PM, Greg KH wrote:
> - access to all config space, but BARs must be translated so
> userspace cannot attack the host

Specifically, intermediated access to config space. For example, need
to know about MSI/MSI-X updates in config space.

thanks,
-chris

2010-04-09 20:09:32

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for?non-privileged processes

* Tom Lyon ([email protected]) wrote:
> On Friday 09 April 2010 02:58:19 am Avi Kivity wrote:
> > - access to all config space, but BARs must be translated so userspace
> > cannot attack the host
> Please elaborate. All of PCI config? All of PCIe config? Seems like a huge mess.

All of config space, but not raw access to all bits. So the MSI/MSI-X
capability writes need to be intermediated. There's bits in the
header too. And it's not just PCI, it's extended config space as well,
drivers may care about finding their whizzybang PCIe capability and doing
someting with it (and worse...they are allowed to put device specific
registers there, and worse yet...they do!).

thanks,
-chris