2020-11-05 07:11:12

by Alex Williamson

[permalink] [raw]
Subject: Re: [RFC, v0 1/3] vfio/platform: add support for msi

On Thu, 5 Nov 2020 11:32:55 +0530
Vikas Gupta <[email protected]> wrote:

> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 2f313a238a8f..aab051e8338d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -203,6 +203,7 @@ struct vfio_device_info {
> #define VFIO_DEVICE_FLAGS_AP (1 << 5) /* vfio-ap device */
> #define VFIO_DEVICE_FLAGS_FSL_MC (1 << 6) /* vfio-fsl-mc device */
> #define VFIO_DEVICE_FLAGS_CAPS (1 << 7) /* Info supports caps */
> +#define VFIO_DEVICE_FLAGS_MSI (1 << 8) /* Device supports msi */
> __u32 num_regions; /* Max region index + 1 */
> __u32 num_irqs; /* Max IRQ index + 1 */
> __u32 cap_offset; /* Offset within info struct of first cap */

This doesn't make any sense to me, MSIs are just edge triggered
interrupts to userspace, so why isn't this fully described via
VFIO_DEVICE_GET_IRQ_INFO? If we do need something new to describe it,
this seems incomplete, which indexes are MSI (IRQ_INFO can describe
that)? We also already support MSI with vfio-pci, so a global flag for
the device advertising this still seems wrong. Thanks,

Alex

2020-11-06 02:57:00

by Vikas Gupta

[permalink] [raw]
Subject: Re: [RFC, v0 1/3] vfio/platform: add support for msi

Hi Alex,

On Thu, Nov 5, 2020 at 12:38 PM Alex Williamson
<[email protected]> wrote:
>
> On Thu, 5 Nov 2020 11:32:55 +0530
> Vikas Gupta <[email protected]> wrote:
>
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 2f313a238a8f..aab051e8338d 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -203,6 +203,7 @@ struct vfio_device_info {
> > #define VFIO_DEVICE_FLAGS_AP (1 << 5) /* vfio-ap device */
> > #define VFIO_DEVICE_FLAGS_FSL_MC (1 << 6) /* vfio-fsl-mc device */
> > #define VFIO_DEVICE_FLAGS_CAPS (1 << 7) /* Info supports caps */
> > +#define VFIO_DEVICE_FLAGS_MSI (1 << 8) /* Device supports msi */
> > __u32 num_regions; /* Max region index + 1 */
> > __u32 num_irqs; /* Max IRQ index + 1 */
> > __u32 cap_offset; /* Offset within info struct of first cap */
>
> This doesn't make any sense to me, MSIs are just edge triggered
> interrupts to userspace, so why isn't this fully described via
> VFIO_DEVICE_GET_IRQ_INFO? If we do need something new to describe it,
> this seems incomplete, which indexes are MSI (IRQ_INFO can describe
> that)? We also already support MSI with vfio-pci, so a global flag for
> the device advertising this still seems wrong. Thanks,
>
> Alex
>
Since VFIO platform uses indexes for IRQ numbers so I think MSI(s)
cannot be described using indexes.
In the patch set there is no difference between MSI and normal
interrupt for VFIO_DEVICE_GET_IRQ_INFO.
The patch set adds MSI(s), say as an extension, to the normal
interrupts and handled accordingly. Do you see this is a violation? If
yes, then we`ll think of other possible ways to support MSI for the
platform devices.
Macro VFIO_DEVICE_FLAGS_MSI can be changed to any other name if it
collides with an already supported vfio-pci or if not necessary, we
can remove this flag.

Thanks,
Vikas


Attachments:
smime.p7s (4.07 kB)
S/MIME Cryptographic Signature

2020-11-06 03:14:00

by Alex Williamson

[permalink] [raw]
Subject: Re: [RFC, v0 1/3] vfio/platform: add support for msi

On Fri, 6 Nov 2020 08:24:26 +0530
Vikas Gupta <[email protected]> wrote:

> Hi Alex,
>
> On Thu, Nov 5, 2020 at 12:38 PM Alex Williamson
> <[email protected]> wrote:
> >
> > On Thu, 5 Nov 2020 11:32:55 +0530
> > Vikas Gupta <[email protected]> wrote:
> >
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index 2f313a238a8f..aab051e8338d 100644
> > > --- a/include/uapi/linux/vfio.h
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -203,6 +203,7 @@ struct vfio_device_info {
> > > #define VFIO_DEVICE_FLAGS_AP (1 << 5) /* vfio-ap device */
> > > #define VFIO_DEVICE_FLAGS_FSL_MC (1 << 6) /* vfio-fsl-mc device */
> > > #define VFIO_DEVICE_FLAGS_CAPS (1 << 7) /* Info supports caps */
> > > +#define VFIO_DEVICE_FLAGS_MSI (1 << 8) /* Device supports msi */
> > > __u32 num_regions; /* Max region index + 1 */
> > > __u32 num_irqs; /* Max IRQ index + 1 */
> > > __u32 cap_offset; /* Offset within info struct of first cap */
> >
> > This doesn't make any sense to me, MSIs are just edge triggered
> > interrupts to userspace, so why isn't this fully described via
> > VFIO_DEVICE_GET_IRQ_INFO? If we do need something new to describe it,
> > this seems incomplete, which indexes are MSI (IRQ_INFO can describe
> > that)? We also already support MSI with vfio-pci, so a global flag for
> > the device advertising this still seems wrong. Thanks,
> >
> > Alex
> >
> Since VFIO platform uses indexes for IRQ numbers so I think MSI(s)
> cannot be described using indexes.

That would be news for vfio-pci which has been describing MSIs with
sub-indexes within indexes since vfio started.

> In the patch set there is no difference between MSI and normal
> interrupt for VFIO_DEVICE_GET_IRQ_INFO.

Then what exactly is a global device flag indicating? Does it indicate
all IRQs are MSI?

> The patch set adds MSI(s), say as an extension, to the normal
> interrupts and handled accordingly.

So we have both "normal" IRQs and MSIs? How does the user know which
indexes are which?

> Do you see this is a violation? If

Seems pretty unclear and dubious use of a global device flag.

> yes, then we`ll think of other possible ways to support MSI for the
> platform devices.
> Macro VFIO_DEVICE_FLAGS_MSI can be changed to any other name if it
> collides with an already supported vfio-pci or if not necessary, we
> can remove this flag.

If nothing else you're using a global flag to describe a platform
device specific augmentation. We've recently added capabilities on the
device info return that would be more appropriate for this, but
fundamentally I don't understand why the irq info isn't sufficient.
Thanks,

Alex

2020-11-09 06:43:21

by Vikas Gupta

[permalink] [raw]
Subject: Re: [RFC, v0 1/3] vfio/platform: add support for msi

Hi Alex,

On Fri, Nov 6, 2020 at 8:42 AM Alex Williamson
<[email protected]> wrote:
>
> On Fri, 6 Nov 2020 08:24:26 +0530
> Vikas Gupta <[email protected]> wrote:
>
> > Hi Alex,
> >
> > On Thu, Nov 5, 2020 at 12:38 PM Alex Williamson
> > <[email protected]> wrote:
> > >
> > > On Thu, 5 Nov 2020 11:32:55 +0530
> > > Vikas Gupta <[email protected]> wrote:
> > >
> > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > > index 2f313a238a8f..aab051e8338d 100644
> > > > --- a/include/uapi/linux/vfio.h
> > > > +++ b/include/uapi/linux/vfio.h
> > > > @@ -203,6 +203,7 @@ struct vfio_device_info {
> > > > #define VFIO_DEVICE_FLAGS_AP (1 << 5) /* vfio-ap device */
> > > > #define VFIO_DEVICE_FLAGS_FSL_MC (1 << 6) /* vfio-fsl-mc device */
> > > > #define VFIO_DEVICE_FLAGS_CAPS (1 << 7) /* Info supports caps */
> > > > +#define VFIO_DEVICE_FLAGS_MSI (1 << 8) /* Device supports msi */
> > > > __u32 num_regions; /* Max region index + 1 */
> > > > __u32 num_irqs; /* Max IRQ index + 1 */
> > > > __u32 cap_offset; /* Offset within info struct of first cap */
> > >
> > > This doesn't make any sense to me, MSIs are just edge triggered
> > > interrupts to userspace, so why isn't this fully described via
> > > VFIO_DEVICE_GET_IRQ_INFO? If we do need something new to describe it,
> > > this seems incomplete, which indexes are MSI (IRQ_INFO can describe
> > > that)? We also already support MSI with vfio-pci, so a global flag for
> > > the device advertising this still seems wrong. Thanks,
> > >
> > > Alex
> > >
> > Since VFIO platform uses indexes for IRQ numbers so I think MSI(s)
> > cannot be described using indexes.
>
> That would be news for vfio-pci which has been describing MSIs with
> sub-indexes within indexes since vfio started.
>
> > In the patch set there is no difference between MSI and normal
> > interrupt for VFIO_DEVICE_GET_IRQ_INFO.
>
> Then what exactly is a global device flag indicating? Does it indicate
> all IRQs are MSI?

No, it's not indicating that all are MSI.
The rationale behind adding the flag to tell user-space that platform
device supports MSI as well. As you mentioned recently added
capabilities can help on this, I`ll go through that.

>
> > The patch set adds MSI(s), say as an extension, to the normal
> > interrupts and handled accordingly.
>
> So we have both "normal" IRQs and MSIs? How does the user know which
> indexes are which?

With this patch set, I think this is missing and user space cannot
know that particular index is MSI interrupt.
For platform devices there is no such mechanism, like index and
sub-indexes to differentiate between legacy, MSI or MSIX as it’s there
in PCI.
I believe for a particular IRQ index if the flag
VFIO_IRQ_INFO_NORESIZE is used then user space can know which IRQ
index has MSI(s). Does it make sense?
Suggestions on this would be helpful.

Thanks,
Vikas
>
> > Do you see this is a violation? If
>
> Seems pretty unclear and dubious use of a global device flag.
>
> > yes, then we`ll think of other possible ways to support MSI for the
> > platform devices.
> > Macro VFIO_DEVICE_FLAGS_MSI can be changed to any other name if it
> > collides with an already supported vfio-pci or if not necessary, we
> > can remove this flag.
>
> If nothing else you're using a global flag to describe a platform
> device specific augmentation. We've recently added capabilities on the
> device info return that would be more appropriate for this, but
> fundamentally I don't understand why the irq info isn't sufficient.
> Thanks,
>
> Alex
>


Attachments:
smime.p7s (4.07 kB)
S/MIME Cryptographic Signature

2020-11-09 15:10:01

by Eric Auger

[permalink] [raw]
Subject: Re: [RFC, v0 1/3] vfio/platform: add support for msi

Hi Vikas,

On 11/6/20 3:54 AM, Vikas Gupta wrote:
> Hi Alex,
>
> On Thu, Nov 5, 2020 at 12:38 PM Alex Williamson
> <[email protected]> wrote:
>>
>> On Thu, 5 Nov 2020 11:32:55 +0530
>> Vikas Gupta <[email protected]> wrote:
>>
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index 2f313a238a8f..aab051e8338d 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -203,6 +203,7 @@ struct vfio_device_info {
>>> #define VFIO_DEVICE_FLAGS_AP (1 << 5) /* vfio-ap device */
>>> #define VFIO_DEVICE_FLAGS_FSL_MC (1 << 6) /* vfio-fsl-mc device */
>>> #define VFIO_DEVICE_FLAGS_CAPS (1 << 7) /* Info supports caps */
>>> +#define VFIO_DEVICE_FLAGS_MSI (1 << 8) /* Device supports msi */
>>> __u32 num_regions; /* Max region index + 1 */
>>> __u32 num_irqs; /* Max IRQ index + 1 */
>>> __u32 cap_offset; /* Offset within info struct of first cap */
>>
>> This doesn't make any sense to me, MSIs are just edge triggered
>> interrupts to userspace, so why isn't this fully described via
>> VFIO_DEVICE_GET_IRQ_INFO? If we do need something new to describe it,
>> this seems incomplete, which indexes are MSI (IRQ_INFO can describe
>> that)? We also already support MSI with vfio-pci, so a global flag for
>> the device advertising this still seems wrong. Thanks,
>>
>> Alex
>>
> Since VFIO platform uses indexes for IRQ numbers so I think MSI(s)
> cannot be described using indexes.
> In the patch set there is no difference between MSI and normal
> interrupt for VFIO_DEVICE_GET_IRQ_INFO.
in vfio_platform_irq_init() we first iterate on normal interrupts using
get_irq(). Can't we add an MSI index at the end of this list with
vdev->irqs[i].count > 1 and set vdev->num_irqs accordingly?

Thanks

Eric
> The patch set adds MSI(s), say as an extension, to the normal
> interrupts and handled accordingly. Do you see this is a violation? If
> yes, then we`ll think of other possible ways to support MSI for the
> platform devices.
> Macro VFIO_DEVICE_FLAGS_MSI can be changed to any other name if it
> collides with an already supported vfio-pci or if not necessary, we
> can remove this flag.
>
> Thanks,
> Vikas
>

2020-11-09 15:20:10

by Eric Auger

[permalink] [raw]
Subject: Re: [RFC, v0 1/3] vfio/platform: add support for msi

Hi Vikas,

On 11/9/20 7:41 AM, Vikas Gupta wrote:
> Hi Alex,
>
> On Fri, Nov 6, 2020 at 8:42 AM Alex Williamson
> <[email protected]> wrote:
>>
>> On Fri, 6 Nov 2020 08:24:26 +0530
>> Vikas Gupta <[email protected]> wrote:
>>
>>> Hi Alex,
>>>
>>> On Thu, Nov 5, 2020 at 12:38 PM Alex Williamson
>>> <[email protected]> wrote:
>>>>
>>>> On Thu, 5 Nov 2020 11:32:55 +0530
>>>> Vikas Gupta <[email protected]> wrote:
>>>>
>>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>>> index 2f313a238a8f..aab051e8338d 100644
>>>>> --- a/include/uapi/linux/vfio.h
>>>>> +++ b/include/uapi/linux/vfio.h
>>>>> @@ -203,6 +203,7 @@ struct vfio_device_info {
>>>>> #define VFIO_DEVICE_FLAGS_AP (1 << 5) /* vfio-ap device */
>>>>> #define VFIO_DEVICE_FLAGS_FSL_MC (1 << 6) /* vfio-fsl-mc device */
>>>>> #define VFIO_DEVICE_FLAGS_CAPS (1 << 7) /* Info supports caps */
>>>>> +#define VFIO_DEVICE_FLAGS_MSI (1 << 8) /* Device supports msi */
>>>>> __u32 num_regions; /* Max region index + 1 */
>>>>> __u32 num_irqs; /* Max IRQ index + 1 */
>>>>> __u32 cap_offset; /* Offset within info struct of first cap */
>>>>
>>>> This doesn't make any sense to me, MSIs are just edge triggered
>>>> interrupts to userspace, so why isn't this fully described via
>>>> VFIO_DEVICE_GET_IRQ_INFO? If we do need something new to describe it,
>>>> this seems incomplete, which indexes are MSI (IRQ_INFO can describe
>>>> that)? We also already support MSI with vfio-pci, so a global flag for
>>>> the device advertising this still seems wrong. Thanks,
>>>>
>>>> Alex
>>>>
>>> Since VFIO platform uses indexes for IRQ numbers so I think MSI(s)
>>> cannot be described using indexes.
>>
>> That would be news for vfio-pci which has been describing MSIs with
>> sub-indexes within indexes since vfio started.
>>
>>> In the patch set there is no difference between MSI and normal
>>> interrupt for VFIO_DEVICE_GET_IRQ_INFO.
>>
>> Then what exactly is a global device flag indicating? Does it indicate
>> all IRQs are MSI?
>
> No, it's not indicating that all are MSI.
> The rationale behind adding the flag to tell user-space that platform
> device supports MSI as well. As you mentioned recently added
> capabilities can help on this, I`ll go through that.
>
>>
>>> The patch set adds MSI(s), say as an extension, to the normal
>>> interrupts and handled accordingly.
>>
>> So we have both "normal" IRQs and MSIs? How does the user know which
>> indexes are which?
>
> With this patch set, I think this is missing and user space cannot
> know that particular index is MSI interrupt.
> For platform devices there is no such mechanism, like index and
> sub-indexes to differentiate between legacy, MSI or MSIX as it’s there
> in PCI.
Wht can't you use the count field (as per vfio_pci_get_irq_count())?
> I believe for a particular IRQ index if the flag
> VFIO_IRQ_INFO_NORESIZE is used then user space can know which IRQ
> index has MSI(s). Does it make sense?
I don't think it is the same semantics.

Thanks

Eric
> Suggestions on this would be helpful.
>
> Thanks,
> Vikas
>>
>>> Do you see this is a violation? If
>>
>> Seems pretty unclear and dubious use of a global device flag.
>>
>>> yes, then we`ll think of other possible ways to support MSI for the
>>> platform devices.
>>> Macro VFIO_DEVICE_FLAGS_MSI can be changed to any other name if it
>>> collides with an already supported vfio-pci or if not necessary, we
>>> can remove this flag.
>>
>> If nothing else you're using a global flag to describe a platform
>> device specific augmentation. We've recently added capabilities on the
>> device info return that would be more appropriate for this, but
>> fundamentally I don't understand why the irq info isn't sufficient.
>> Thanks,
>>
>> Alex
>>

2020-11-09 15:31:34

by Alex Williamson

[permalink] [raw]
Subject: Re: [RFC, v0 1/3] vfio/platform: add support for msi

On Mon, 9 Nov 2020 12:11:15 +0530
Vikas Gupta <[email protected]> wrote:

> Hi Alex,
>
> On Fri, Nov 6, 2020 at 8:42 AM Alex Williamson
> <[email protected]> wrote:
> >
> > On Fri, 6 Nov 2020 08:24:26 +0530
> > Vikas Gupta <[email protected]> wrote:
> >
> > > Hi Alex,
> > >
> > > On Thu, Nov 5, 2020 at 12:38 PM Alex Williamson
> > > <[email protected]> wrote:
> > > >
> > > > On Thu, 5 Nov 2020 11:32:55 +0530
> > > > Vikas Gupta <[email protected]> wrote:
> > > >
> > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > > > index 2f313a238a8f..aab051e8338d 100644
> > > > > --- a/include/uapi/linux/vfio.h
> > > > > +++ b/include/uapi/linux/vfio.h
> > > > > @@ -203,6 +203,7 @@ struct vfio_device_info {
> > > > > #define VFIO_DEVICE_FLAGS_AP (1 << 5) /* vfio-ap device */
> > > > > #define VFIO_DEVICE_FLAGS_FSL_MC (1 << 6) /* vfio-fsl-mc device */
> > > > > #define VFIO_DEVICE_FLAGS_CAPS (1 << 7) /* Info supports caps */
> > > > > +#define VFIO_DEVICE_FLAGS_MSI (1 << 8) /* Device supports msi */
> > > > > __u32 num_regions; /* Max region index + 1 */
> > > > > __u32 num_irqs; /* Max IRQ index + 1 */
> > > > > __u32 cap_offset; /* Offset within info struct of first cap */
> > > >
> > > > This doesn't make any sense to me, MSIs are just edge triggered
> > > > interrupts to userspace, so why isn't this fully described via
> > > > VFIO_DEVICE_GET_IRQ_INFO? If we do need something new to describe it,
> > > > this seems incomplete, which indexes are MSI (IRQ_INFO can describe
> > > > that)? We also already support MSI with vfio-pci, so a global flag for
> > > > the device advertising this still seems wrong. Thanks,
> > > >
> > > > Alex
> > > >
> > > Since VFIO platform uses indexes for IRQ numbers so I think MSI(s)
> > > cannot be described using indexes.
> >
> > That would be news for vfio-pci which has been describing MSIs with
> > sub-indexes within indexes since vfio started.
> >
> > > In the patch set there is no difference between MSI and normal
> > > interrupt for VFIO_DEVICE_GET_IRQ_INFO.
> >
> > Then what exactly is a global device flag indicating? Does it indicate
> > all IRQs are MSI?
>
> No, it's not indicating that all are MSI.
> The rationale behind adding the flag to tell user-space that platform
> device supports MSI as well. As you mentioned recently added
> capabilities can help on this, I`ll go through that.


It still seems questionable to me to use a device info capability to
describe an interrupt index specific feature. The scope seems wrong.
Why does userspace need to know that this IRQ is MSI rather than
indicating it's simply an edge triggered interrupt? That can be done
using only vfio_irq_info.flags.


> > > The patch set adds MSI(s), say as an extension, to the normal
> > > interrupts and handled accordingly.
> >
> > So we have both "normal" IRQs and MSIs? How does the user know which
> > indexes are which?
>
> With this patch set, I think this is missing and user space cannot
> know that particular index is MSI interrupt.
> For platform devices there is no such mechanism, like index and
> sub-indexes to differentiate between legacy, MSI or MSIX as it’s there
> in PCI.

Indexes and sub-indexes are a grouping mechanism of vfio to describe
related interrupts. That terminology doesn't exist on PCI either, it's
meant to be used generically. It's left to the vfio bus driver how
userspace associates a given index to a device feature.

> I believe for a particular IRQ index if the flag
> VFIO_IRQ_INFO_NORESIZE is used then user space can know which IRQ
> index has MSI(s). Does it make sense?


No, no-resize is an implementation detail, not an indication of the
interrupt mechanism. It's still not clear to me why it's important to
expose to userspace that a given interrupt is MSI versus simply
exposing it as an edge interrupt (ie. automasked = false). If it is
necessary, the most direct approach might be to expose a capability
extension in the vfio_irq_info structure to describe it. Even then
though, I don't think simply exposing a index as MSI is very
meaningful. What is userspace intended to do differently based on this
information? Thanks,

Alex

2020-11-10 11:04:14

by Vikas Gupta

[permalink] [raw]
Subject: Re: [RFC, v0 1/3] vfio/platform: add support for msi

Hi Eric,

On Mon, Nov 9, 2020 at 8:35 PM Auger Eric <[email protected]> wrote:
>
> Hi Vikas,
>
> On 11/6/20 3:54 AM, Vikas Gupta wrote:
> > Hi Alex,
> >
> > On Thu, Nov 5, 2020 at 12:38 PM Alex Williamson
> > <[email protected]> wrote:
> >>
> >> On Thu, 5 Nov 2020 11:32:55 +0530
> >> Vikas Gupta <[email protected]> wrote:
> >>
> >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>> index 2f313a238a8f..aab051e8338d 100644
> >>> --- a/include/uapi/linux/vfio.h
> >>> +++ b/include/uapi/linux/vfio.h
> >>> @@ -203,6 +203,7 @@ struct vfio_device_info {
> >>> #define VFIO_DEVICE_FLAGS_AP (1 << 5) /* vfio-ap device */
> >>> #define VFIO_DEVICE_FLAGS_FSL_MC (1 << 6) /* vfio-fsl-mc device */
> >>> #define VFIO_DEVICE_FLAGS_CAPS (1 << 7) /* Info supports caps */
> >>> +#define VFIO_DEVICE_FLAGS_MSI (1 << 8) /* Device supports msi */
> >>> __u32 num_regions; /* Max region index + 1 */
> >>> __u32 num_irqs; /* Max IRQ index + 1 */
> >>> __u32 cap_offset; /* Offset within info struct of first cap */
> >>
> >> This doesn't make any sense to me, MSIs are just edge triggered
> >> interrupts to userspace, so why isn't this fully described via
> >> VFIO_DEVICE_GET_IRQ_INFO? If we do need something new to describe it,
> >> this seems incomplete, which indexes are MSI (IRQ_INFO can describe
> >> that)? We also already support MSI with vfio-pci, so a global flag for
> >> the device advertising this still seems wrong. Thanks,
> >>
> >> Alex
> >>
> > Since VFIO platform uses indexes for IRQ numbers so I think MSI(s)
> > cannot be described using indexes.
> > In the patch set there is no difference between MSI and normal
> > interrupt for VFIO_DEVICE_GET_IRQ_INFO.
> in vfio_platform_irq_init() we first iterate on normal interrupts using
> get_irq(). Can't we add an MSI index at the end of this list with
> vdev->irqs[i].count > 1 and set vdev->num_irqs accordingly?
Yes, I think MSI can be added to the end of list with setting
vdev->irqs[i].count > 1.
I`ll consider changing in the next patch set.
Thanks,
Vikas
>
> Thanks
>
> Eric
> > The patch set adds MSI(s), say as an extension, to the normal
> > interrupts and handled accordingly. Do you see this is a violation? If
> > yes, then we`ll think of other possible ways to support MSI for the
> > platform devices.
> > Macro VFIO_DEVICE_FLAGS_MSI can be changed to any other name if it
> > collides with an already supported vfio-pci or if not necessary, we
> > can remove this flag.
> >
> > Thanks,
> > Vikas
> >
>


Attachments:
smime.p7s (4.07 kB)
S/MIME Cryptographic Signature

2020-11-10 11:08:49

by Vikas Gupta

[permalink] [raw]
Subject: Re: [RFC, v0 1/3] vfio/platform: add support for msi

Hi Alex,

On Mon, Nov 9, 2020 at 8:58 PM Alex Williamson
<[email protected]> wrote:
>
> On Mon, 9 Nov 2020 12:11:15 +0530
> Vikas Gupta <[email protected]> wrote:
>
> > Hi Alex,
> >
> > On Fri, Nov 6, 2020 at 8:42 AM Alex Williamson
> > <[email protected]> wrote:
> > >
> > > On Fri, 6 Nov 2020 08:24:26 +0530
> > > Vikas Gupta <[email protected]> wrote:
> > >
> > > > Hi Alex,
> > > >
> > > > On Thu, Nov 5, 2020 at 12:38 PM Alex Williamson
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Thu, 5 Nov 2020 11:32:55 +0530
> > > > > Vikas Gupta <[email protected]> wrote:
> > > > >
> > > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > > > > index 2f313a238a8f..aab051e8338d 100644
> > > > > > --- a/include/uapi/linux/vfio.h
> > > > > > +++ b/include/uapi/linux/vfio.h
> > > > > > @@ -203,6 +203,7 @@ struct vfio_device_info {
> > > > > > #define VFIO_DEVICE_FLAGS_AP (1 << 5) /* vfio-ap device */
> > > > > > #define VFIO_DEVICE_FLAGS_FSL_MC (1 << 6) /* vfio-fsl-mc device */
> > > > > > #define VFIO_DEVICE_FLAGS_CAPS (1 << 7) /* Info supports caps */
> > > > > > +#define VFIO_DEVICE_FLAGS_MSI (1 << 8) /* Device supports msi */
> > > > > > __u32 num_regions; /* Max region index + 1 */
> > > > > > __u32 num_irqs; /* Max IRQ index + 1 */
> > > > > > __u32 cap_offset; /* Offset within info struct of first cap */
> > > > >
> > > > > This doesn't make any sense to me, MSIs are just edge triggered
> > > > > interrupts to userspace, so why isn't this fully described via
> > > > > VFIO_DEVICE_GET_IRQ_INFO? If we do need something new to describe it,
> > > > > this seems incomplete, which indexes are MSI (IRQ_INFO can describe
> > > > > that)? We also already support MSI with vfio-pci, so a global flag for
> > > > > the device advertising this still seems wrong. Thanks,
> > > > >
> > > > > Alex
> > > > >
> > > > Since VFIO platform uses indexes for IRQ numbers so I think MSI(s)
> > > > cannot be described using indexes.
> > >
> > > That would be news for vfio-pci which has been describing MSIs with
> > > sub-indexes within indexes since vfio started.
> > >
> > > > In the patch set there is no difference between MSI and normal
> > > > interrupt for VFIO_DEVICE_GET_IRQ_INFO.
> > >
> > > Then what exactly is a global device flag indicating? Does it indicate
> > > all IRQs are MSI?
> >
> > No, it's not indicating that all are MSI.
> > The rationale behind adding the flag to tell user-space that platform
> > device supports MSI as well. As you mentioned recently added
> > capabilities can help on this, I`ll go through that.
>
>
> It still seems questionable to me to use a device info capability to
> describe an interrupt index specific feature. The scope seems wrong.
> Why does userspace need to know that this IRQ is MSI rather than
> indicating it's simply an edge triggered interrupt? That can be done
> using only vfio_irq_info.flags.

Ok. In the next patch set I`ll remove the device flag (VFIO_DEVICE_FLAGS_MSI) as
vfio_irq_info.flags should have enough information for edge triggered interrupt.

>
>
> > > > The patch set adds MSI(s), say as an extension, to the normal
> > > > interrupts and handled accordingly.
> > >
> > > So we have both "normal" IRQs and MSIs? How does the user know which
> > > indexes are which?
> >
> > With this patch set, I think this is missing and user space cannot
> > know that particular index is MSI interrupt.
> > For platform devices there is no such mechanism, like index and
> > sub-indexes to differentiate between legacy, MSI or MSIX as it’s there
> > in PCI.
>
> Indexes and sub-indexes are a grouping mechanism of vfio to describe
> related interrupts. That terminology doesn't exist on PCI either, it's
> meant to be used generically. It's left to the vfio bus driver how
> userspace associates a given index to a device feature.
>
> > I believe for a particular IRQ index if the flag
> > VFIO_IRQ_INFO_NORESIZE is used then user space can know which IRQ
> > index has MSI(s). Does it make sense?
>
>
> No, no-resize is an implementation detail, not an indication of the
> interrupt mechanism. It's still not clear to me why it's important to
> expose to userspace that a given interrupt is MSI versus simply
> exposing it as an edge interrupt (ie. automasked = false). If it is
> necessary, the most direct approach might be to expose a capability
> extension in the vfio_irq_info structure to describe it. Even then
> though, I don't think simply exposing a index as MSI is very
> meaningful. What is userspace intended to do differently based on this
> information? Thanks,
The current patch set is not setting VFIO_IRQ_INFO_AUTOMASKED
(automasked=false) for MSIs so I believe this much is information
enough for user space to know that this is an edge triggered
interrupt.
I agree that exposing an index as MSI is not meaningful as user space
has nothing special to do with this information.
>
> Alex
>


Attachments:
smime.p7s (4.07 kB)
S/MIME Cryptographic Signature

2020-11-12 18:42:13

by Eric Auger

[permalink] [raw]
Subject: Re: [RFC, v1 0/3] msi support for platform devices

Hi Vikas,

On 11/12/20 6:58 PM, Vikas Gupta wrote:
> This RFC adds support for MSI for platform devices.
> a) MSI(s) is/are added in addition to the normal interrupts.
> b) The vendor specific MSI configuration can be done using
> callbacks which is implemented as msi module.
> c) Adds a msi handling module for the Broadcom platform devices.
>
> Changes from:
> -------------
> v0 to v1:
> i) Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
> ii) Add MSI(s) at the end of the irq list of platform IRQs.
> MSI(s) with first entry of MSI block has count and flag
> information.
> IRQ list: Allocation for IRQs + MSIs are allocated as below
> Example: if there are 'n' IRQs and 'k' MSIs
> -------------------------------------------------------
> |IRQ-0|IRQ-1|....|IRQ-n|MSI-0|MSI-1|MSI-2|......|MSI-k|
> -------------------------------------------------------
I have not taken time yet to look at your series, but to me you should have
|IRQ-0|IRQ-1|....|IRQ-n|MSI|MSIX
then for setting a given MSIX (i) you would select the MSIx index and
then set start=i count=1.
to me individual MSIs are encoded in the subindex and not in the index.
The index just selects the "type" of interrupt.

For PCI you just have:
VFIO_PCI_INTX_IRQ_INDEX,
VFIO_PCI_MSI_IRQ_INDEX, -> MSI index and then you play with
start/count
VFIO_PCI_MSIX_IRQ_INDEX,
VFIO_PCI_ERR_IRQ_INDEX,
VFIO_PCI_REQ_IRQ_INDEX,

(include/uapi/linux/vfio.h)

Thanks

Eric
> MSI-0 will have count=k set and flags set accordingly.
>
> Vikas Gupta (3):
> vfio/platform: add support for msi
> vfio/platform: change cleanup order
> vfio/platform: add Broadcom msi module
>
> drivers/vfio/platform/Kconfig | 1 +
> drivers/vfio/platform/Makefile | 1 +
> drivers/vfio/platform/msi/Kconfig | 9 +
> drivers/vfio/platform/msi/Makefile | 2 +
> .../vfio/platform/msi/vfio_platform_bcmplt.c | 74 ++++++
> drivers/vfio/platform/vfio_platform_common.c | 86 ++++++-
> drivers/vfio/platform/vfio_platform_irq.c | 238 +++++++++++++++++-
> drivers/vfio/platform/vfio_platform_private.h | 23 ++
> 8 files changed, 419 insertions(+), 15 deletions(-)
> create mode 100644 drivers/vfio/platform/msi/Kconfig
> create mode 100644 drivers/vfio/platform/msi/Makefile
> create mode 100644 drivers/vfio/platform/msi/vfio_platform_bcmplt.c
>

2020-11-13 17:27:09

by Vikas Gupta

[permalink] [raw]
Subject: Re: [RFC, v1 0/3] msi support for platform devices

Hi Eric,

On Fri, Nov 13, 2020 at 12:10 AM Auger Eric <[email protected]> wrote:
>
> Hi Vikas,
>
> On 11/12/20 6:58 PM, Vikas Gupta wrote:
> > This RFC adds support for MSI for platform devices.
> > a) MSI(s) is/are added in addition to the normal interrupts.
> > b) The vendor specific MSI configuration can be done using
> > callbacks which is implemented as msi module.
> > c) Adds a msi handling module for the Broadcom platform devices.
> >
> > Changes from:
> > -------------
> > v0 to v1:
> > i) Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
> > ii) Add MSI(s) at the end of the irq list of platform IRQs.
> > MSI(s) with first entry of MSI block has count and flag
> > information.
> > IRQ list: Allocation for IRQs + MSIs are allocated as below
> > Example: if there are 'n' IRQs and 'k' MSIs
> > -------------------------------------------------------
> > |IRQ-0|IRQ-1|....|IRQ-n|MSI-0|MSI-1|MSI-2|......|MSI-k|
> > -------------------------------------------------------
> I have not taken time yet to look at your series, but to me you should have
> |IRQ-0|IRQ-1|....|IRQ-n|MSI|MSIX
> then for setting a given MSIX (i) you would select the MSIx index and
> then set start=i count=1.

As per your suggestion, we should have, if there are n-IRQs, k-MSIXs
and m-MSIs, allocation of IRQs should be done as below

|IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX|
| |
|
|MSIX0||MSIX1||MSXI2|....|MSIX-(k-1)|
|MSI0||MSI1||MSI2|....|MSI-(m-1)|
With this implementation user space can know that, at indexes n and
n+1, edge triggered interrupts are present.
We may add an element in vfio_platform_irq itself to allocate MSIs/MSIXs
struct vfio_platform_irq{
.....
.....
struct vfio_platform_irq *block; => this points to the block
allocation for MSIs/MSIXs and all msi/msix are type of IRQs.
};
OR
Another structure can be defined in 'vfio_pci_private.h'
struct vfio_msi_ctx {
struct eventfd_ctx *trigger;
char *name;
};
and
struct vfio_platform_irq {
.....
.....
struct vfio_msi_ctx *block; => this points to the block allocation
for MSIs/MSIXs
};
Which of the above two options sounds OK to you? Please suggest.

> to me individual MSIs are encoded in the subindex and not in the index.
> The index just selects the "type" of interrupt.
>
> For PCI you just have:
> VFIO_PCI_INTX_IRQ_INDEX,
> VFIO_PCI_MSI_IRQ_INDEX, -> MSI index and then you play with
> start/count
> VFIO_PCI_MSIX_IRQ_INDEX,
> VFIO_PCI_ERR_IRQ_INDEX,
> VFIO_PCI_REQ_IRQ_INDEX,
>
> (include/uapi/linux/vfio.h)

In pci case, type of interrupts is fixed so they can be 'indexed' by
these enums but for VFIO platform user space will need to iterate all
(num_irqs) indexes to know at which indexes edge triggered interrupts
are present.

Thanks,
Vikas
>
> Thanks
>
> Eric
> > MSI-0 will have count=k set and flags set accordingly.
> >
> > Vikas Gupta (3):
> > vfio/platform: add support for msi
> > vfio/platform: change cleanup order
> > vfio/platform: add Broadcom msi module
> >
> > drivers/vfio/platform/Kconfig | 1 +
> > drivers/vfio/platform/Makefile | 1 +
> > drivers/vfio/platform/msi/Kconfig | 9 +
> > drivers/vfio/platform/msi/Makefile | 2 +
> > .../vfio/platform/msi/vfio_platform_bcmplt.c | 74 ++++++
> > drivers/vfio/platform/vfio_platform_common.c | 86 ++++++-
> > drivers/vfio/platform/vfio_platform_irq.c | 238 +++++++++++++++++-
> > drivers/vfio/platform/vfio_platform_private.h | 23 ++
> > 8 files changed, 419 insertions(+), 15 deletions(-)
> > create mode 100644 drivers/vfio/platform/msi/Kconfig
> > create mode 100644 drivers/vfio/platform/msi/Makefile
> > create mode 100644 drivers/vfio/platform/msi/vfio_platform_bcmplt.c
> >
>


Attachments:
smime.p7s (4.07 kB)
S/MIME Cryptographic Signature

2020-11-16 13:18:18

by Eric Auger

[permalink] [raw]
Subject: Re: [RFC, v1 0/3] msi support for platform devices

Hi Vikas,

On 11/13/20 6:24 PM, Vikas Gupta wrote:
> Hi Eric,
>
> On Fri, Nov 13, 2020 at 12:10 AM Auger Eric <[email protected]> wrote:
>>
>> Hi Vikas,
>>
>> On 11/12/20 6:58 PM, Vikas Gupta wrote:
>>> This RFC adds support for MSI for platform devices.
>>> a) MSI(s) is/are added in addition to the normal interrupts.
>>> b) The vendor specific MSI configuration can be done using
>>> callbacks which is implemented as msi module.
>>> c) Adds a msi handling module for the Broadcom platform devices.
>>>
>>> Changes from:
>>> -------------
>>> v0 to v1:
>>> i) Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
>>> ii) Add MSI(s) at the end of the irq list of platform IRQs.
>>> MSI(s) with first entry of MSI block has count and flag
>>> information.
>>> IRQ list: Allocation for IRQs + MSIs are allocated as below
>>> Example: if there are 'n' IRQs and 'k' MSIs
>>> -------------------------------------------------------
>>> |IRQ-0|IRQ-1|....|IRQ-n|MSI-0|MSI-1|MSI-2|......|MSI-k|
>>> -------------------------------------------------------
>> I have not taken time yet to look at your series, but to me you should have
>> |IRQ-0|IRQ-1|....|IRQ-n|MSI|MSIX
>> then for setting a given MSIX (i) you would select the MSIx index and
>> then set start=i count=1.
>
> As per your suggestion, we should have, if there are n-IRQs, k-MSIXs
> and m-MSIs, allocation of IRQs should be done as below
>
> |IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX|
> | |
> |
> |MSIX0||MSIX1||MSXI2|....|MSIX-(k-1)|
> |MSI0||MSI1||MSI2|....|MSI-(m-1)|
No I really meant this list of indices: IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX|
and potentially later on IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX| ERR| REQ
if ERR/REQ were to be added.

I think the userspace could query the total number of indices using
VFIO_DEVICE_GET_INFO and retrieve num_irqs (corresponding to the n wire
interrupts + MSI index + MSIX index)

Then userspace can loop on all the indices using
VFIO_DEVICE_GET_IRQ_INFO. For each index it uses count to determine the
first indices related to wire interrupts (count = 1). Then comes the MSI
index, and after the MSI index. If any of those is supported, count >1,
otherwise count=0. The only thing I am dubious about is can the device
use a single MSI/MSIX? Because my hypothesis here is we use count to
discriminate between wire first indices and other indices.



> With this implementation user space can know that, at indexes n and
> n+1, edge triggered interrupts are present.
note wired interrupts can also be edge ones.
> We may add an element in vfio_platform_irq itself to allocate MSIs/MSIXs
> struct vfio_platform_irq{
> .....
> .....
> struct vfio_platform_irq *block; => this points to the block
> allocation for MSIs/MSIXs and all msi/msix are type of IRQs.As wired interrupts and MSI interrupts coexist, I would store in vdev an
array of wired interrupts (the existing vdev->irqs) and a new array for
MSI(x) as done in the PCI code.

vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL);

Does it make sense?

> };
> OR
> Another structure can be defined in 'vfio_pci_private.h'
> struct vfio_msi_ctx {
> struct eventfd_ctx *trigger;
> char *name;
> };
> and
> struct vfio_platform_irq {
> .....
> .....
> struct vfio_msi_ctx *block; => this points to the block allocation
> for MSIs/MSIXs
> };
> Which of the above two options sounds OK to you? Please suggest.
>
>> to me individual MSIs are encoded in the subindex and not in the index.
>> The index just selects the "type" of interrupt.
>>
>> For PCI you just have:
>> VFIO_PCI_INTX_IRQ_INDEX,
>> VFIO_PCI_MSI_IRQ_INDEX, -> MSI index and then you play with
>> start/count
>> VFIO_PCI_MSIX_IRQ_INDEX,
>> VFIO_PCI_ERR_IRQ_INDEX,
>> VFIO_PCI_REQ_IRQ_INDEX,
>>
>> (include/uapi/linux/vfio.h)
>
> In pci case, type of interrupts is fixed so they can be 'indexed' by
> these enums but for VFIO platform user space will need to iterate all
> (num_irqs) indexes to know at which indexes edge triggered interrupts
> are present.
indeed, but can't you loop over all indices looking until count !=1? At
this point you know if have finished emurating the wires. Holds if
MSI(x) count !=1 of course.

Thanks

Eric

>
> Thanks,
> Vikas
>>
>> Thanks
>>
>> Eric
>>> MSI-0 will have count=k set and flags set accordingly.
>>>
>>> Vikas Gupta (3):
>>> vfio/platform: add support for msi
>>> vfio/platform: change cleanup order
>>> vfio/platform: add Broadcom msi module
>>>
>>> drivers/vfio/platform/Kconfig | 1 +
>>> drivers/vfio/platform/Makefile | 1 +
>>> drivers/vfio/platform/msi/Kconfig | 9 +
>>> drivers/vfio/platform/msi/Makefile | 2 +
>>> .../vfio/platform/msi/vfio_platform_bcmplt.c | 74 ++++++
>>> drivers/vfio/platform/vfio_platform_common.c | 86 ++++++-
>>> drivers/vfio/platform/vfio_platform_irq.c | 238 +++++++++++++++++-
>>> drivers/vfio/platform/vfio_platform_private.h | 23 ++
>>> 8 files changed, 419 insertions(+), 15 deletions(-)
>>> create mode 100644 drivers/vfio/platform/msi/Kconfig
>>> create mode 100644 drivers/vfio/platform/msi/Makefile
>>> create mode 100644 drivers/vfio/platform/msi/vfio_platform_bcmplt.c
>>>
>>

2020-11-17 06:27:55

by Vikas Gupta

[permalink] [raw]
Subject: Re: [RFC, v1 0/3] msi support for platform devices

Hi Eric,

On Mon, Nov 16, 2020 at 6:44 PM Auger Eric <[email protected]> wrote:
>
> Hi Vikas,
>
> On 11/13/20 6:24 PM, Vikas Gupta wrote:
> > Hi Eric,
> >
> > On Fri, Nov 13, 2020 at 12:10 AM Auger Eric <[email protected]> wrote:
> >>
> >> Hi Vikas,
> >>
> >> On 11/12/20 6:58 PM, Vikas Gupta wrote:
> >>> This RFC adds support for MSI for platform devices.
> >>> a) MSI(s) is/are added in addition to the normal interrupts.
> >>> b) The vendor specific MSI configuration can be done using
> >>> callbacks which is implemented as msi module.
> >>> c) Adds a msi handling module for the Broadcom platform devices.
> >>>
> >>> Changes from:
> >>> -------------
> >>> v0 to v1:
> >>> i) Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
> >>> ii) Add MSI(s) at the end of the irq list of platform IRQs.
> >>> MSI(s) with first entry of MSI block has count and flag
> >>> information.
> >>> IRQ list: Allocation for IRQs + MSIs are allocated as below
> >>> Example: if there are 'n' IRQs and 'k' MSIs
> >>> -------------------------------------------------------
> >>> |IRQ-0|IRQ-1|....|IRQ-n|MSI-0|MSI-1|MSI-2|......|MSI-k|
> >>> -------------------------------------------------------
> >> I have not taken time yet to look at your series, but to me you should have
> >> |IRQ-0|IRQ-1|....|IRQ-n|MSI|MSIX
> >> then for setting a given MSIX (i) you would select the MSIx index and
> >> then set start=i count=1.
> >
> > As per your suggestion, we should have, if there are n-IRQs, k-MSIXs
> > and m-MSIs, allocation of IRQs should be done as below
> >
> > |IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX|
> > | |
> > |
> > |MSIX0||MSIX1||MSXI2|....|MSIX-(k-1)|
> > |MSI0||MSI1||MSI2|....|MSI-(m-1)|
> No I really meant this list of indices: IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX|
> and potentially later on IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX| ERR| REQ
> if ERR/REQ were to be added.
I agree on this. Actually the map I drew incorrectly above but wanted
to demonstrate the same. It was a child-parent relationship for MSI
and its members and similarly for MSIX as well.
>
> I think the userspace could query the total number of indices using
> VFIO_DEVICE_GET_INFO and retrieve num_irqs (corresponding to the n wire
> interrupts + MSI index + MSIX index)
>
> Then userspace can loop on all the indices using
> VFIO_DEVICE_GET_IRQ_INFO. For each index it uses count to determine the
> first indices related to wire interrupts (count = 1). Then comes the MSI
> index, and after the MSI index. If any of those is supported, count >1,
> otherwise count=0. The only thing I am dubious about is can the device
> use a single MSI/MSIX? Because my hypothesis here is we use count to
> discriminate between wire first indices and other indices.
I believe count can be one as well, especially for ERR/REQ as you
mentioned above.I think we can not rely on the count > 1. Now, this is
blocking and we are not left with options unless we consider adding
more enums in flags in vfio_irq_info to tell userspace that particular
index is wired, MSI, MSIX etc. for the platform device.
What do you think?
>
>
>
> > With this implementation user space can know that, at indexes n and
> > n+1, edge triggered interrupts are present.
> note wired interrupts can also be edge ones.
> > We may add an element in vfio_platform_irq itself to allocate MSIs/MSIXs
> > struct vfio_platform_irq{
> > .....
> > .....
> > struct vfio_platform_irq *block; => this points to the block
> > allocation for MSIs/MSIXs and all msi/msix are type of IRQs.As wired interrupts and MSI interrupts coexist, I would store in vdev an
> array of wired interrupts (the existing vdev->irqs) and a new array for
> MSI(x) as done in the PCI code.
>
> vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL);
>
> Does it make sense?
Yes, we can use similar kinds of allocations.

Thanks,
Vikas
>
> > };
> > OR
> > Another structure can be defined in 'vfio_pci_private.h'
> > struct vfio_msi_ctx {
> > struct eventfd_ctx *trigger;
> > char *name;
> > };
> > and
> > struct vfio_platform_irq {
> > .....
> > .....
> > struct vfio_msi_ctx *block; => this points to the block allocation
> > for MSIs/MSIXs
> > };
> > Which of the above two options sounds OK to you? Please suggest.
> >
> >> to me individual MSIs are encoded in the subindex and not in the index.
> >> The index just selects the "type" of interrupt.
> >>
> >> For PCI you just have:
> >> VFIO_PCI_INTX_IRQ_INDEX,
> >> VFIO_PCI_MSI_IRQ_INDEX, -> MSI index and then you play with
> >> start/count
> >> VFIO_PCI_MSIX_IRQ_INDEX,
> >> VFIO_PCI_ERR_IRQ_INDEX,
> >> VFIO_PCI_REQ_IRQ_INDEX,
> >>
> >> (include/uapi/linux/vfio.h)
> >
> > In pci case, type of interrupts is fixed so they can be 'indexed' by
> > these enums but for VFIO platform user space will need to iterate all
> > (num_irqs) indexes to know at which indexes edge triggered interrupts
> > are present.
> indeed, but can't you loop over all indices looking until count !=1? At
> this point you know if have finished emurating the wires. Holds if
> MSI(x) count !=1 of course.
>
> Thanks
>
> Eric
>
> >
> > Thanks,
> > Vikas
> >>
> >> Thanks
> >>
> >> Eric
> >>> MSI-0 will have count=k set and flags set accordingly.
> >>>
> >>> Vikas Gupta (3):
> >>> vfio/platform: add support for msi
> >>> vfio/platform: change cleanup order
> >>> vfio/platform: add Broadcom msi module
> >>>
> >>> drivers/vfio/platform/Kconfig | 1 +
> >>> drivers/vfio/platform/Makefile | 1 +
> >>> drivers/vfio/platform/msi/Kconfig | 9 +
> >>> drivers/vfio/platform/msi/Makefile | 2 +
> >>> .../vfio/platform/msi/vfio_platform_bcmplt.c | 74 ++++++
> >>> drivers/vfio/platform/vfio_platform_common.c | 86 ++++++-
> >>> drivers/vfio/platform/vfio_platform_irq.c | 238 +++++++++++++++++-
> >>> drivers/vfio/platform/vfio_platform_private.h | 23 ++
> >>> 8 files changed, 419 insertions(+), 15 deletions(-)
> >>> create mode 100644 drivers/vfio/platform/msi/Kconfig
> >>> create mode 100644 drivers/vfio/platform/msi/Makefile
> >>> create mode 100644 drivers/vfio/platform/msi/vfio_platform_bcmplt.c
> >>>
> >>
>


Attachments:
smime.p7s (4.06 kB)
S/MIME Cryptographic Signature

2020-11-17 08:07:41

by Eric Auger

[permalink] [raw]
Subject: Re: [RFC, v1 0/3] msi support for platform devices

Hi Vikas,

On 11/17/20 7:25 AM, Vikas Gupta wrote:
> Hi Eric,
>
> On Mon, Nov 16, 2020 at 6:44 PM Auger Eric <[email protected]> wrote:
>>
>> Hi Vikas,
>>
>> On 11/13/20 6:24 PM, Vikas Gupta wrote:
>>> Hi Eric,
>>>
>>> On Fri, Nov 13, 2020 at 12:10 AM Auger Eric <[email protected]> wrote:
>>>>
>>>> Hi Vikas,
>>>>
>>>> On 11/12/20 6:58 PM, Vikas Gupta wrote:
>>>>> This RFC adds support for MSI for platform devices.
>>>>> a) MSI(s) is/are added in addition to the normal interrupts.
>>>>> b) The vendor specific MSI configuration can be done using
>>>>> callbacks which is implemented as msi module.
>>>>> c) Adds a msi handling module for the Broadcom platform devices.
>>>>>
>>>>> Changes from:
>>>>> -------------
>>>>> v0 to v1:
>>>>> i) Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
>>>>> ii) Add MSI(s) at the end of the irq list of platform IRQs.
>>>>> MSI(s) with first entry of MSI block has count and flag
>>>>> information.
>>>>> IRQ list: Allocation for IRQs + MSIs are allocated as below
>>>>> Example: if there are 'n' IRQs and 'k' MSIs
>>>>> -------------------------------------------------------
>>>>> |IRQ-0|IRQ-1|....|IRQ-n|MSI-0|MSI-1|MSI-2|......|MSI-k|
>>>>> -------------------------------------------------------
>>>> I have not taken time yet to look at your series, but to me you should have
>>>> |IRQ-0|IRQ-1|....|IRQ-n|MSI|MSIX
>>>> then for setting a given MSIX (i) you would select the MSIx index and
>>>> then set start=i count=1.
>>>
>>> As per your suggestion, we should have, if there are n-IRQs, k-MSIXs
>>> and m-MSIs, allocation of IRQs should be done as below
>>>
>>> |IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX|
>>> | |
>>> |
>>> |MSIX0||MSIX1||MSXI2|....|MSIX-(k-1)|
>>> |MSI0||MSI1||MSI2|....|MSI-(m-1)|
>> No I really meant this list of indices: IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX|
>> and potentially later on IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX| ERR| REQ
>> if ERR/REQ were to be added.
> I agree on this. Actually the map I drew incorrectly above but wanted
> to demonstrate the same. It was a child-parent relationship for MSI
> and its members and similarly for MSIX as well.
>>
>> I think the userspace could query the total number of indices using
>> VFIO_DEVICE_GET_INFO and retrieve num_irqs (corresponding to the n wire
>> interrupts + MSI index + MSIX index)
>>
>> Then userspace can loop on all the indices using
>> VFIO_DEVICE_GET_IRQ_INFO. For each index it uses count to determine the
>> first indices related to wire interrupts (count = 1). Then comes the MSI
>> index, and after the MSI index. If any of those is supported, count >1,
>> otherwise count=0. The only thing I am dubious about is can the device
>> use a single MSI/MSIX? Because my hypothesis here is we use count to
>> discriminate between wire first indices and other indices.
> I believe count can be one as well, especially for ERR/REQ as you
> mentioned above.
Given ERR and REQ indices would follow MSI and MSIX ones, MSI index
could be recognized by the first index whose count != 1. But indeed I am
not sure the number of supported vectors cannot be 1. In your case it is
induced by the size of the ring so it is OK but for other devices this
may be different.

I think we can not rely on the count > 1. Now, this is
> blocking and we are not left with options unless we consider adding
> more enums in flags in vfio_irq_info to tell userspace that particular
> index is wired, MSI, MSIX etc. for the platform device.
> What do you think?
If count is not reliable to discriminate the first n wired interrupts
from the subsequen MSI and MSIx index, Alex suggested to add a
capability extension in the vfio_irq_info structure. Something similar
to what was done for vfio_region_info.

Such kind of thing was attempted in
https://lore.kernel.org/kvmarm/[email protected]/T/#u

` [PATCH v11 07/13] vfio: Use capability chains to handle device
specific irq
` [PATCH v11 08/13] vfio/pci: Add framework for custom interrupt indices
` [PATCH v11 09/13] vfio: Add new IRQ for DMA fault reporting

Note this has not been reviewed yet.

Thanks

Eric

>>
>>
>>
>>> With this implementation user space can know that, at indexes n and
>>> n+1, edge triggered interrupts are present.
>> note wired interrupts can also be edge ones.
>>> We may add an element in vfio_platform_irq itself to allocate MSIs/MSIXs
>>> struct vfio_platform_irq{
>>> .....
>>> .....
>>> struct vfio_platform_irq *block; => this points to the block
>>> allocation for MSIs/MSIXs and all msi/msix are type of IRQs.As wired interrupts and MSI interrupts coexist, I would store in vdev an
>> array of wired interrupts (the existing vdev->irqs) and a new array for
>> MSI(x) as done in the PCI code.
>>
>> vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL);
>>
>> Does it make sense?
> Yes, we can use similar kinds of allocations.
>
> Thanks,
> Vikas
>>
>>> };
>>> OR
>>> Another structure can be defined in 'vfio_pci_private.h'
>>> struct vfio_msi_ctx {
>>> struct eventfd_ctx *trigger;
>>> char *name;
>>> };
>>> and
>>> struct vfio_platform_irq {
>>> .....
>>> .....
>>> struct vfio_msi_ctx *block; => this points to the block allocation
>>> for MSIs/MSIXs
>>> };
>>> Which of the above two options sounds OK to you? Please suggest.
>>>
>>>> to me individual MSIs are encoded in the subindex and not in the index.
>>>> The index just selects the "type" of interrupt.
>>>>
>>>> For PCI you just have:
>>>> VFIO_PCI_INTX_IRQ_INDEX,
>>>> VFIO_PCI_MSI_IRQ_INDEX, -> MSI index and then you play with
>>>> start/count
>>>> VFIO_PCI_MSIX_IRQ_INDEX,
>>>> VFIO_PCI_ERR_IRQ_INDEX,
>>>> VFIO_PCI_REQ_IRQ_INDEX,
>>>>
>>>> (include/uapi/linux/vfio.h)
>>>
>>> In pci case, type of interrupts is fixed so they can be 'indexed' by
>>> these enums but for VFIO platform user space will need to iterate all
>>> (num_irqs) indexes to know at which indexes edge triggered interrupts
>>> are present.
>> indeed, but can't you loop over all indices looking until count !=1? At
>> this point you know if have finished emurating the wires. Holds if
>> MSI(x) count !=1 of course.
>>
>> Thanks
>>
>> Eric
>>
>>>
>>> Thanks,
>>> Vikas
>>>>
>>>> Thanks
>>>>
>>>> Eric
>>>>> MSI-0 will have count=k set and flags set accordingly.
>>>>>
>>>>> Vikas Gupta (3):
>>>>> vfio/platform: add support for msi
>>>>> vfio/platform: change cleanup order
>>>>> vfio/platform: add Broadcom msi module
>>>>>
>>>>> drivers/vfio/platform/Kconfig | 1 +
>>>>> drivers/vfio/platform/Makefile | 1 +
>>>>> drivers/vfio/platform/msi/Kconfig | 9 +
>>>>> drivers/vfio/platform/msi/Makefile | 2 +
>>>>> .../vfio/platform/msi/vfio_platform_bcmplt.c | 74 ++++++
>>>>> drivers/vfio/platform/vfio_platform_common.c | 86 ++++++-
>>>>> drivers/vfio/platform/vfio_platform_irq.c | 238 +++++++++++++++++-
>>>>> drivers/vfio/platform/vfio_platform_private.h | 23 ++
>>>>> 8 files changed, 419 insertions(+), 15 deletions(-)
>>>>> create mode 100644 drivers/vfio/platform/msi/Kconfig
>>>>> create mode 100644 drivers/vfio/platform/msi/Makefile
>>>>> create mode 100644 drivers/vfio/platform/msi/vfio_platform_bcmplt.c
>>>>>
>>>>
>>

2020-11-17 08:28:35

by Eric Auger

[permalink] [raw]
Subject: Re: [RFC, v1 0/3] msi support for platform devices

Hi Vikas,

On 11/17/20 9:05 AM, Auger Eric wrote:
> Hi Vikas,
>
> On 11/17/20 7:25 AM, Vikas Gupta wrote:
>> Hi Eric,
>>
>> On Mon, Nov 16, 2020 at 6:44 PM Auger Eric <[email protected]> wrote:
>>>
>>> Hi Vikas,
>>>
>>> On 11/13/20 6:24 PM, Vikas Gupta wrote:
>>>> Hi Eric,
>>>>
>>>> On Fri, Nov 13, 2020 at 12:10 AM Auger Eric <[email protected]> wrote:
>>>>>
>>>>> Hi Vikas,
>>>>>
>>>>> On 11/12/20 6:58 PM, Vikas Gupta wrote:
>>>>>> This RFC adds support for MSI for platform devices.
>>>>>> a) MSI(s) is/are added in addition to the normal interrupts.
>>>>>> b) The vendor specific MSI configuration can be done using
>>>>>> callbacks which is implemented as msi module.
>>>>>> c) Adds a msi handling module for the Broadcom platform devices.
>>>>>>
>>>>>> Changes from:
>>>>>> -------------
>>>>>> v0 to v1:
>>>>>> i) Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
>>>>>> ii) Add MSI(s) at the end of the irq list of platform IRQs.
>>>>>> MSI(s) with first entry of MSI block has count and flag
>>>>>> information.
>>>>>> IRQ list: Allocation for IRQs + MSIs are allocated as below
>>>>>> Example: if there are 'n' IRQs and 'k' MSIs
>>>>>> -------------------------------------------------------
>>>>>> |IRQ-0|IRQ-1|....|IRQ-n|MSI-0|MSI-1|MSI-2|......|MSI-k|
>>>>>> -------------------------------------------------------
>>>>> I have not taken time yet to look at your series, but to me you should have
>>>>> |IRQ-0|IRQ-1|....|IRQ-n|MSI|MSIX
>>>>> then for setting a given MSIX (i) you would select the MSIx index and
>>>>> then set start=i count=1.
>>>>
>>>> As per your suggestion, we should have, if there are n-IRQs, k-MSIXs
>>>> and m-MSIs, allocation of IRQs should be done as below
>>>>
>>>> |IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX|
>>>> | |
>>>> |
>>>> |MSIX0||MSIX1||MSXI2|....|MSIX-(k-1)|
>>>> |MSI0||MSI1||MSI2|....|MSI-(m-1)|
>>> No I really meant this list of indices: IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX|
>>> and potentially later on IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX| ERR| REQ
>>> if ERR/REQ were to be added.
>> I agree on this. Actually the map I drew incorrectly above but wanted
>> to demonstrate the same. It was a child-parent relationship for MSI
>> and its members and similarly for MSIX as well.
>>>
>>> I think the userspace could query the total number of indices using
>>> VFIO_DEVICE_GET_INFO and retrieve num_irqs (corresponding to the n wire
>>> interrupts + MSI index + MSIX index)
>>>
>>> Then userspace can loop on all the indices using
>>> VFIO_DEVICE_GET_IRQ_INFO. For each index it uses count to determine the
>>> first indices related to wire interrupts (count = 1). Then comes the MSI
>>> index, and after the MSI index. If any of those is supported, count >1,
>>> otherwise count=0. The only thing I am dubious about is can the device
>>> use a single MSI/MSIX? Because my hypothesis here is we use count to
>>> discriminate between wire first indices and other indices.
>> I believe count can be one as well, especially for ERR/REQ as you
>> mentioned above.
> Given ERR and REQ indices would follow MSI and MSIX ones, MSI index
> could be recognized by the first index whose count != 1. But indeed I am
> not sure the number of supported vectors cannot be 1. In your case it is
> induced by the size of the ring so it is OK but for other devices this
> may be different.
>
> I think we can not rely on the count > 1. Now, this is
>> blocking and we are not left with options unless we consider adding
>> more enums in flags in vfio_irq_info to tell userspace that particular
>> index is wired, MSI, MSIX etc. for the platform device.
>> What do you think?
> If count is not reliable to discriminate the first n wired interrupts
> from the subsequen MSI and MSIx index, Alex suggested to add a
> capability extension in the vfio_irq_info structure. Something similar
> to what was done for vfio_region_info.
>
> Such kind of thing was attempted in
> https://lore.kernel.org/kvmarm/[email protected]/T/#u
>
> ` [PATCH v11 07/13] vfio: Use capability chains to handle device
> specific irq
> ` [PATCH v11 08/13] vfio/pci: Add framework for custom interrupt indices
> ` [PATCH v11 09/13] vfio: Add new IRQ for DMA fault reporting

By the way I was mentionning MSI/MSIx in my previous reply but, as Alex
pointed out, with platform device only a single MSI index does make
sense, no?

Thanks

Eric
>
> Note this has not been reviewed yet.
>
> Thanks
>
> Eric
>
>>>
>>>
>>>
>>>> With this implementation user space can know that, at indexes n and
>>>> n+1, edge triggered interrupts are present.
>>> note wired interrupts can also be edge ones.
>>>> We may add an element in vfio_platform_irq itself to allocate MSIs/MSIXs
>>>> struct vfio_platform_irq{
>>>> .....
>>>> .....
>>>> struct vfio_platform_irq *block; => this points to the block
>>>> allocation for MSIs/MSIXs and all msi/msix are type of IRQs.As wired interrupts and MSI interrupts coexist, I would store in vdev an
>>> array of wired interrupts (the existing vdev->irqs) and a new array for
>>> MSI(x) as done in the PCI code.
>>>
>>> vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL);
>>>
>>> Does it make sense?
>> Yes, we can use similar kinds of allocations.
>>
>> Thanks,
>> Vikas
>>>
>>>> };
>>>> OR
>>>> Another structure can be defined in 'vfio_pci_private.h'
>>>> struct vfio_msi_ctx {
>>>> struct eventfd_ctx *trigger;
>>>> char *name;
>>>> };
>>>> and
>>>> struct vfio_platform_irq {
>>>> .....
>>>> .....
>>>> struct vfio_msi_ctx *block; => this points to the block allocation
>>>> for MSIs/MSIXs
>>>> };
>>>> Which of the above two options sounds OK to you? Please suggest.
>>>>
>>>>> to me individual MSIs are encoded in the subindex and not in the index.
>>>>> The index just selects the "type" of interrupt.
>>>>>
>>>>> For PCI you just have:
>>>>> VFIO_PCI_INTX_IRQ_INDEX,
>>>>> VFIO_PCI_MSI_IRQ_INDEX, -> MSI index and then you play with
>>>>> start/count
>>>>> VFIO_PCI_MSIX_IRQ_INDEX,
>>>>> VFIO_PCI_ERR_IRQ_INDEX,
>>>>> VFIO_PCI_REQ_IRQ_INDEX,
>>>>>
>>>>> (include/uapi/linux/vfio.h)
>>>>
>>>> In pci case, type of interrupts is fixed so they can be 'indexed' by
>>>> these enums but for VFIO platform user space will need to iterate all
>>>> (num_irqs) indexes to know at which indexes edge triggered interrupts
>>>> are present.
>>> indeed, but can't you loop over all indices looking until count !=1? At
>>> this point you know if have finished emurating the wires. Holds if
>>> MSI(x) count !=1 of course.
>>>
>>> Thanks
>>>
>>> Eric
>>>
>>>>
>>>> Thanks,
>>>> Vikas
>>>>>
>>>>> Thanks
>>>>>
>>>>> Eric
>>>>>> MSI-0 will have count=k set and flags set accordingly.
>>>>>>
>>>>>> Vikas Gupta (3):
>>>>>> vfio/platform: add support for msi
>>>>>> vfio/platform: change cleanup order
>>>>>> vfio/platform: add Broadcom msi module
>>>>>>
>>>>>> drivers/vfio/platform/Kconfig | 1 +
>>>>>> drivers/vfio/platform/Makefile | 1 +
>>>>>> drivers/vfio/platform/msi/Kconfig | 9 +
>>>>>> drivers/vfio/platform/msi/Makefile | 2 +
>>>>>> .../vfio/platform/msi/vfio_platform_bcmplt.c | 74 ++++++
>>>>>> drivers/vfio/platform/vfio_platform_common.c | 86 ++++++-
>>>>>> drivers/vfio/platform/vfio_platform_irq.c | 238 +++++++++++++++++-
>>>>>> drivers/vfio/platform/vfio_platform_private.h | 23 ++
>>>>>> 8 files changed, 419 insertions(+), 15 deletions(-)
>>>>>> create mode 100644 drivers/vfio/platform/msi/Kconfig
>>>>>> create mode 100644 drivers/vfio/platform/msi/Makefile
>>>>>> create mode 100644 drivers/vfio/platform/msi/vfio_platform_bcmplt.c
>>>>>>
>>>>>
>>>

2020-11-17 16:38:58

by Vikas Gupta

[permalink] [raw]
Subject: Re: [RFC, v1 0/3] msi support for platform devices

Hi Eric,

On Tue, Nov 17, 2020 at 1:55 PM Auger Eric <[email protected]> wrote:
>
> Hi Vikas,
>
> On 11/17/20 9:05 AM, Auger Eric wrote:
> > Hi Vikas,
> >
> > On 11/17/20 7:25 AM, Vikas Gupta wrote:
> >> Hi Eric,
> >>
> >> On Mon, Nov 16, 2020 at 6:44 PM Auger Eric <[email protected]> wrote:
> >>>
> >>> Hi Vikas,
> >>>
> >>> On 11/13/20 6:24 PM, Vikas Gupta wrote:
> >>>> Hi Eric,
> >>>>
> >>>> On Fri, Nov 13, 2020 at 12:10 AM Auger Eric <[email protected]> wrote:
> >>>>>
> >>>>> Hi Vikas,
> >>>>>
> >>>>> On 11/12/20 6:58 PM, Vikas Gupta wrote:
> >>>>>> This RFC adds support for MSI for platform devices.
> >>>>>> a) MSI(s) is/are added in addition to the normal interrupts.
> >>>>>> b) The vendor specific MSI configuration can be done using
> >>>>>> callbacks which is implemented as msi module.
> >>>>>> c) Adds a msi handling module for the Broadcom platform devices.
> >>>>>>
> >>>>>> Changes from:
> >>>>>> -------------
> >>>>>> v0 to v1:
> >>>>>> i) Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
> >>>>>> ii) Add MSI(s) at the end of the irq list of platform IRQs.
> >>>>>> MSI(s) with first entry of MSI block has count and flag
> >>>>>> information.
> >>>>>> IRQ list: Allocation for IRQs + MSIs are allocated as below
> >>>>>> Example: if there are 'n' IRQs and 'k' MSIs
> >>>>>> -------------------------------------------------------
> >>>>>> |IRQ-0|IRQ-1|....|IRQ-n|MSI-0|MSI-1|MSI-2|......|MSI-k|
> >>>>>> -------------------------------------------------------
> >>>>> I have not taken time yet to look at your series, but to me you should have
> >>>>> |IRQ-0|IRQ-1|....|IRQ-n|MSI|MSIX
> >>>>> then for setting a given MSIX (i) you would select the MSIx index and
> >>>>> then set start=i count=1.
> >>>>
> >>>> As per your suggestion, we should have, if there are n-IRQs, k-MSIXs
> >>>> and m-MSIs, allocation of IRQs should be done as below
> >>>>
> >>>> |IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX|
> >>>> | |
> >>>> |
> >>>> |MSIX0||MSIX1||MSXI2|....|MSIX-(k-1)|
> >>>> |MSI0||MSI1||MSI2|....|MSI-(m-1)|
> >>> No I really meant this list of indices: IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX|
> >>> and potentially later on IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX| ERR| REQ
> >>> if ERR/REQ were to be added.
> >> I agree on this. Actually the map I drew incorrectly above but wanted
> >> to demonstrate the same. It was a child-parent relationship for MSI
> >> and its members and similarly for MSIX as well.
> >>>
> >>> I think the userspace could query the total number of indices using
> >>> VFIO_DEVICE_GET_INFO and retrieve num_irqs (corresponding to the n wire
> >>> interrupts + MSI index + MSIX index)
> >>>
> >>> Then userspace can loop on all the indices using
> >>> VFIO_DEVICE_GET_IRQ_INFO. For each index it uses count to determine the
> >>> first indices related to wire interrupts (count = 1). Then comes the MSI
> >>> index, and after the MSI index. If any of those is supported, count >1,
> >>> otherwise count=0. The only thing I am dubious about is can the device
> >>> use a single MSI/MSIX? Because my hypothesis here is we use count to
> >>> discriminate between wire first indices and other indices.
> >> I believe count can be one as well, especially for ERR/REQ as you
> >> mentioned above.
> > Given ERR and REQ indices would follow MSI and MSIX ones, MSI index
> > could be recognized by the first index whose count != 1. But indeed I am
> > not sure the number of supported vectors cannot be 1. In your case it is
> > induced by the size of the ring so it is OK but for other devices this
> > may be different.
> >
> > I think we can not rely on the count > 1. Now, this is
> >> blocking and we are not left with options unless we consider adding
> >> more enums in flags in vfio_irq_info to tell userspace that particular
> >> index is wired, MSI, MSIX etc. for the platform device.
> >> What do you think?
> > If count is not reliable to discriminate the first n wired interrupts
> > from the subsequen MSI and MSIx index, Alex suggested to add a
> > capability extension in the vfio_irq_info structure. Something similar
> > to what was done for vfio_region_info.
> >
> > Such kind of thing was attempted in
> > https://lore.kernel.org/kvmarm/[email protected]/T/#u
> >
> > ` [PATCH v11 07/13] vfio: Use capability chains to handle device
> > specific irq
> > ` [PATCH v11 08/13] vfio/pci: Add framework for custom interrupt indices
> > ` [PATCH v11 09/13] vfio: Add new IRQ for DMA fault reporting
>
> By the way I was mentionning MSI/MSIx in my previous reply but, as Alex
> pointed out, with platform device only a single MSI index does make
> sense, no?
Yes, I think single MSI should be OK.
This single MSI index should be implemented as ext_irqs, similar to,
as you implemented in the mentioned patch. Is my understanding
correct?
Thanks,
Vikas
>
> Thanks
>
> Eric
> >
> > Note this has not been reviewed yet.
> >
> > Thanks
> >
> > Eric
> >
> >>>
> >>>
> >>>
> >>>> With this implementation user space can know that, at indexes n and
> >>>> n+1, edge triggered interrupts are present.
> >>> note wired interrupts can also be edge ones.
> >>>> We may add an element in vfio_platform_irq itself to allocate MSIs/MSIXs
> >>>> struct vfio_platform_irq{
> >>>> .....
> >>>> .....
> >>>> struct vfio_platform_irq *block; => this points to the block
> >>>> allocation for MSIs/MSIXs and all msi/msix are type of IRQs.As wired interrupts and MSI interrupts coexist, I would store in vdev an
> >>> array of wired interrupts (the existing vdev->irqs) and a new array for
> >>> MSI(x) as done in the PCI code.
> >>>
> >>> vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL);
> >>>
> >>> Does it make sense?
> >> Yes, we can use similar kinds of allocations.
> >>
> >> Thanks,
> >> Vikas
> >>>
> >>>> };
> >>>> OR
> >>>> Another structure can be defined in 'vfio_pci_private.h'
> >>>> struct vfio_msi_ctx {
> >>>> struct eventfd_ctx *trigger;
> >>>> char *name;
> >>>> };
> >>>> and
> >>>> struct vfio_platform_irq {
> >>>> .....
> >>>> .....
> >>>> struct vfio_msi_ctx *block; => this points to the block allocation
> >>>> for MSIs/MSIXs
> >>>> };
> >>>> Which of the above two options sounds OK to you? Please suggest.
> >>>>
> >>>>> to me individual MSIs are encoded in the subindex and not in the index.
> >>>>> The index just selects the "type" of interrupt.
> >>>>>
> >>>>> For PCI you just have:
> >>>>> VFIO_PCI_INTX_IRQ_INDEX,
> >>>>> VFIO_PCI_MSI_IRQ_INDEX, -> MSI index and then you play with
> >>>>> start/count
> >>>>> VFIO_PCI_MSIX_IRQ_INDEX,
> >>>>> VFIO_PCI_ERR_IRQ_INDEX,
> >>>>> VFIO_PCI_REQ_IRQ_INDEX,
> >>>>>
> >>>>> (include/uapi/linux/vfio.h)
> >>>>
> >>>> In pci case, type of interrupts is fixed so they can be 'indexed' by
> >>>> these enums but for VFIO platform user space will need to iterate all
> >>>> (num_irqs) indexes to know at which indexes edge triggered interrupts
> >>>> are present.
> >>> indeed, but can't you loop over all indices looking until count !=1? At
> >>> this point you know if have finished emurating the wires. Holds if
> >>> MSI(x) count !=1 of course.
> >>>
> >>> Thanks
> >>>
> >>> Eric
> >>>
> >>>>
> >>>> Thanks,
> >>>> Vikas
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> Eric
> >>>>>> MSI-0 will have count=k set and flags set accordingly.
> >>>>>>
> >>>>>> Vikas Gupta (3):
> >>>>>> vfio/platform: add support for msi
> >>>>>> vfio/platform: change cleanup order
> >>>>>> vfio/platform: add Broadcom msi module
> >>>>>>
> >>>>>> drivers/vfio/platform/Kconfig | 1 +
> >>>>>> drivers/vfio/platform/Makefile | 1 +
> >>>>>> drivers/vfio/platform/msi/Kconfig | 9 +
> >>>>>> drivers/vfio/platform/msi/Makefile | 2 +
> >>>>>> .../vfio/platform/msi/vfio_platform_bcmplt.c | 74 ++++++
> >>>>>> drivers/vfio/platform/vfio_platform_common.c | 86 ++++++-
> >>>>>> drivers/vfio/platform/vfio_platform_irq.c | 238 +++++++++++++++++-
> >>>>>> drivers/vfio/platform/vfio_platform_private.h | 23 ++
> >>>>>> 8 files changed, 419 insertions(+), 15 deletions(-)
> >>>>>> create mode 100644 drivers/vfio/platform/msi/Kconfig
> >>>>>> create mode 100644 drivers/vfio/platform/msi/Makefile
> >>>>>> create mode 100644 drivers/vfio/platform/msi/vfio_platform_bcmplt.c
> >>>>>>
> >>>>>
> >>>
>


Attachments:
smime.p7s (4.07 kB)
S/MIME Cryptographic Signature

2020-11-18 11:04:57

by Eric Auger

[permalink] [raw]
Subject: Re: [RFC, v1 0/3] msi support for platform devices

Hi Vikas,

On 11/17/20 5:36 PM, Vikas Gupta wrote:
> Hi Eric,
>
> On Tue, Nov 17, 2020 at 1:55 PM Auger Eric <[email protected]> wrote:
>>
>> Hi Vikas,
>>
>> On 11/17/20 9:05 AM, Auger Eric wrote:
>>> Hi Vikas,
>>>
>>> On 11/17/20 7:25 AM, Vikas Gupta wrote:
>>>> Hi Eric,
>>>>
>>>> On Mon, Nov 16, 2020 at 6:44 PM Auger Eric <[email protected]> wrote:
>>>>>
>>>>> Hi Vikas,
>>>>>
>>>>> On 11/13/20 6:24 PM, Vikas Gupta wrote:
>>>>>> Hi Eric,
>>>>>>
>>>>>> On Fri, Nov 13, 2020 at 12:10 AM Auger Eric <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi Vikas,
>>>>>>>
>>>>>>> On 11/12/20 6:58 PM, Vikas Gupta wrote:
>>>>>>>> This RFC adds support for MSI for platform devices.
>>>>>>>> a) MSI(s) is/are added in addition to the normal interrupts.
>>>>>>>> b) The vendor specific MSI configuration can be done using
>>>>>>>> callbacks which is implemented as msi module.
>>>>>>>> c) Adds a msi handling module for the Broadcom platform devices.
>>>>>>>>
>>>>>>>> Changes from:
>>>>>>>> -------------
>>>>>>>> v0 to v1:
>>>>>>>> i) Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
>>>>>>>> ii) Add MSI(s) at the end of the irq list of platform IRQs.
>>>>>>>> MSI(s) with first entry of MSI block has count and flag
>>>>>>>> information.
>>>>>>>> IRQ list: Allocation for IRQs + MSIs are allocated as below
>>>>>>>> Example: if there are 'n' IRQs and 'k' MSIs
>>>>>>>> -------------------------------------------------------
>>>>>>>> |IRQ-0|IRQ-1|....|IRQ-n|MSI-0|MSI-1|MSI-2|......|MSI-k|
>>>>>>>> -------------------------------------------------------
>>>>>>> I have not taken time yet to look at your series, but to me you should have
>>>>>>> |IRQ-0|IRQ-1|....|IRQ-n|MSI|MSIX
>>>>>>> then for setting a given MSIX (i) you would select the MSIx index and
>>>>>>> then set start=i count=1.
>>>>>>
>>>>>> As per your suggestion, we should have, if there are n-IRQs, k-MSIXs
>>>>>> and m-MSIs, allocation of IRQs should be done as below
>>>>>>
>>>>>> |IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX|
>>>>>> | |
>>>>>> |
>>>>>> |MSIX0||MSIX1||MSXI2|....|MSIX-(k-1)|
>>>>>> |MSI0||MSI1||MSI2|....|MSI-(m-1)|
>>>>> No I really meant this list of indices: IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX|
>>>>> and potentially later on IRQ0|IRQ1|......|IRQ-(n-1)|MSI|MSIX| ERR| REQ
>>>>> if ERR/REQ were to be added.
>>>> I agree on this. Actually the map I drew incorrectly above but wanted
>>>> to demonstrate the same. It was a child-parent relationship for MSI
>>>> and its members and similarly for MSIX as well.
>>>>>
>>>>> I think the userspace could query the total number of indices using
>>>>> VFIO_DEVICE_GET_INFO and retrieve num_irqs (corresponding to the n wire
>>>>> interrupts + MSI index + MSIX index)
>>>>>
>>>>> Then userspace can loop on all the indices using
>>>>> VFIO_DEVICE_GET_IRQ_INFO. For each index it uses count to determine the
>>>>> first indices related to wire interrupts (count = 1). Then comes the MSI
>>>>> index, and after the MSI index. If any of those is supported, count >1,
>>>>> otherwise count=0. The only thing I am dubious about is can the device
>>>>> use a single MSI/MSIX? Because my hypothesis here is we use count to
>>>>> discriminate between wire first indices and other indices.
>>>> I believe count can be one as well, especially for ERR/REQ as you
>>>> mentioned above.
>>> Given ERR and REQ indices would follow MSI and MSIX ones, MSI index
>>> could be recognized by the first index whose count != 1. But indeed I am
>>> not sure the number of supported vectors cannot be 1. In your case it is
>>> induced by the size of the ring so it is OK but for other devices this
>>> may be different.
>>>
>>> I think we can not rely on the count > 1. Now, this is
>>>> blocking and we are not left with options unless we consider adding
>>>> more enums in flags in vfio_irq_info to tell userspace that particular
>>>> index is wired, MSI, MSIX etc. for the platform device.
>>>> What do you think?
>>> If count is not reliable to discriminate the first n wired interrupts
>>> from the subsequen MSI and MSIx index, Alex suggested to add a
>>> capability extension in the vfio_irq_info structure. Something similar
>>> to what was done for vfio_region_info.
>>>
>>> Such kind of thing was attempted in
>>> https://lore.kernel.org/kvmarm/[email protected]/T/#u
>>>
>>> ` [PATCH v11 07/13] vfio: Use capability chains to handle device
>>> specific irq
>>> ` [PATCH v11 08/13] vfio/pci: Add framework for custom interrupt indices
>>> ` [PATCH v11 09/13] vfio: Add new IRQ for DMA fault reporting
>>
>> By the way I was mentionning MSI/MSIx in my previous reply but, as Alex
>> pointed out, with platform device only a single MSI index does make
>> sense, no?
> Yes, I think single MSI should be OK.
> This single MSI index should be implemented as ext_irqs, similar to,
> as you implemented in the mentioned patch. Is my understanding
> correct?
Yes, if count !=1 cannot be used to detect the MSI index, I think using
a capability would do the job and this is aligned with last Alex'
suggestion.

Thanks

Eric
> Thanks,
> Vikas
>>
>> Thanks
>>
>> Eric
>>>
>>> Note this has not been reviewed yet.
>>>
>>> Thanks
>>>
>>> Eric
>>>
>>>>>
>>>>>
>>>>>
>>>>>> With this implementation user space can know that, at indexes n and
>>>>>> n+1, edge triggered interrupts are present.
>>>>> note wired interrupts can also be edge ones.
>>>>>> We may add an element in vfio_platform_irq itself to allocate MSIs/MSIXs
>>>>>> struct vfio_platform_irq{
>>>>>> .....
>>>>>> .....
>>>>>> struct vfio_platform_irq *block; => this points to the block
>>>>>> allocation for MSIs/MSIXs and all msi/msix are type of IRQs.As wired interrupts and MSI interrupts coexist, I would store in vdev an
>>>>> array of wired interrupts (the existing vdev->irqs) and a new array for
>>>>> MSI(x) as done in the PCI code.
>>>>>
>>>>> vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL);
>>>>>
>>>>> Does it make sense?
>>>> Yes, we can use similar kinds of allocations.
>>>>
>>>> Thanks,
>>>> Vikas
>>>>>
>>>>>> };
>>>>>> OR
>>>>>> Another structure can be defined in 'vfio_pci_private.h'
>>>>>> struct vfio_msi_ctx {
>>>>>> struct eventfd_ctx *trigger;
>>>>>> char *name;
>>>>>> };
>>>>>> and
>>>>>> struct vfio_platform_irq {
>>>>>> .....
>>>>>> .....
>>>>>> struct vfio_msi_ctx *block; => this points to the block allocation
>>>>>> for MSIs/MSIXs
>>>>>> };
>>>>>> Which of the above two options sounds OK to you? Please suggest.
>>>>>>
>>>>>>> to me individual MSIs are encoded in the subindex and not in the index.
>>>>>>> The index just selects the "type" of interrupt.
>>>>>>>
>>>>>>> For PCI you just have:
>>>>>>> VFIO_PCI_INTX_IRQ_INDEX,
>>>>>>> VFIO_PCI_MSI_IRQ_INDEX, -> MSI index and then you play with
>>>>>>> start/count
>>>>>>> VFIO_PCI_MSIX_IRQ_INDEX,
>>>>>>> VFIO_PCI_ERR_IRQ_INDEX,
>>>>>>> VFIO_PCI_REQ_IRQ_INDEX,
>>>>>>>
>>>>>>> (include/uapi/linux/vfio.h)
>>>>>>
>>>>>> In pci case, type of interrupts is fixed so they can be 'indexed' by
>>>>>> these enums but for VFIO platform user space will need to iterate all
>>>>>> (num_irqs) indexes to know at which indexes edge triggered interrupts
>>>>>> are present.
>>>>> indeed, but can't you loop over all indices looking until count !=1? At
>>>>> this point you know if have finished emurating the wires. Holds if
>>>>> MSI(x) count !=1 of course.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Eric
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Vikas
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Eric
>>>>>>>> MSI-0 will have count=k set and flags set accordingly.
>>>>>>>>
>>>>>>>> Vikas Gupta (3):
>>>>>>>> vfio/platform: add support for msi
>>>>>>>> vfio/platform: change cleanup order
>>>>>>>> vfio/platform: add Broadcom msi module
>>>>>>>>
>>>>>>>> drivers/vfio/platform/Kconfig | 1 +
>>>>>>>> drivers/vfio/platform/Makefile | 1 +
>>>>>>>> drivers/vfio/platform/msi/Kconfig | 9 +
>>>>>>>> drivers/vfio/platform/msi/Makefile | 2 +
>>>>>>>> .../vfio/platform/msi/vfio_platform_bcmplt.c | 74 ++++++
>>>>>>>> drivers/vfio/platform/vfio_platform_common.c | 86 ++++++-
>>>>>>>> drivers/vfio/platform/vfio_platform_irq.c | 238 +++++++++++++++++-
>>>>>>>> drivers/vfio/platform/vfio_platform_private.h | 23 ++
>>>>>>>> 8 files changed, 419 insertions(+), 15 deletions(-)
>>>>>>>> create mode 100644 drivers/vfio/platform/msi/Kconfig
>>>>>>>> create mode 100644 drivers/vfio/platform/msi/Makefile
>>>>>>>> create mode 100644 drivers/vfio/platform/msi/vfio_platform_bcmplt.c
>>>>>>>>
>>>>>>>
>>>>>
>>

2020-12-02 14:49:04

by Eric Auger

[permalink] [raw]
Subject: Re: [RFC v2 1/1] vfio/platform: add support for msi

Hi Vikas,

On 11/24/20 5:16 PM, Vikas Gupta wrote:
> MSI support for platform devices.
>
> Signed-off-by: Vikas Gupta <[email protected]>
> ---
> drivers/vfio/platform/vfio_platform_common.c | 99 ++++++-
> drivers/vfio/platform/vfio_platform_irq.c | 260 +++++++++++++++++-
> drivers/vfio/platform/vfio_platform_private.h | 16 ++
> include/uapi/linux/vfio.h | 43 +++
> 4 files changed, 401 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
> index c0771a9567fb..b0bfc0f4ee1f 100644
> --- a/drivers/vfio/platform/vfio_platform_common.c
> +++ b/drivers/vfio/platform/vfio_platform_common.c
> @@ -16,6 +16,7 @@
> #include <linux/types.h>
> #include <linux/uaccess.h>
> #include <linux/vfio.h>
> +#include <linux/nospec.h>
>
> #include "vfio_platform_private.h"
>
> @@ -344,9 +345,16 @@ static long vfio_platform_ioctl(void *device_data,
>
> } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
> struct vfio_irq_info info;
> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> + struct vfio_irq_info_cap_msi *msi_info = NULL;
> + unsigned long capsz;
> + int ext_irq_index = vdev->num_irqs - vdev->num_ext_irqs;
>
> minsz = offsetofend(struct vfio_irq_info, count);
>
> + /* For backward compatibility, cannot require this */
> + capsz = offsetofend(struct vfio_irq_info, cap_offset);
> +
> if (copy_from_user(&info, (void __user *)arg, minsz))
> return -EFAULT;
>
> @@ -356,9 +364,89 @@ static long vfio_platform_ioctl(void *device_data,
> if (info.index >= vdev->num_irqs)
> return -EINVAL;
>
> + if (info.argsz >= capsz)
> + minsz = capsz;
> +
> + if (info.index == ext_irq_index) {
nit: n case we add new ext indices afterwards, I would check info.index
- ext_irq_index against an VFIO_EXT_IRQ_MSI define.
> + struct vfio_irq_info_cap_type cap_type = {
> + .header.id = VFIO_IRQ_INFO_CAP_TYPE,
> + .header.version = 1 };
> + int i;
> + int ret;
> + int num_msgs;
> + size_t msi_info_size;
> + struct vfio_platform_irq *irq;
nit: I think generally the opposite order (length) is chosen. This also
would better match the existing style in this file
> +
> + info.index = array_index_nospec(info.index,
> + vdev->num_irqs);
> +
> + irq = &vdev->irqs[info.index];
> +
> + info.flags = irq->flags;
I think this can be removed given [*]
> + cap_type.type = irq->type;
> + cap_type.subtype = irq->subtype;
> +
> + ret = vfio_info_add_capability(&caps,
> + &cap_type.header,
> + sizeof(cap_type));
> + if (ret)
> + return ret;
> +
> + num_msgs = irq->num_ctx;
do would want to return the cap even if !num_ctx?
> +
> + msi_info_size = struct_size(msi_info, msgs, num_msgs);
> +
> + msi_info = kzalloc(msi_info_size, GFP_KERNEL);
> + if (!msi_info) {
> + kfree(caps.buf);
> + return -ENOMEM;
> + }
> +
> + msi_info->header.id = VFIO_IRQ_INFO_CAP_MSI_DESCS;
> + msi_info->header.version = 1;
> + msi_info->nr_msgs = num_msgs;
> +
> + for (i = 0; i < num_msgs; i++) {
> + struct vfio_irq_ctx *ctx = &irq->ctx[i];
> +
> + msi_info->msgs[i].addr_lo = ctx->msg.address_lo;
> + msi_info->msgs[i].addr_hi = ctx->msg.address_hi;
> + msi_info->msgs[i].data = ctx->msg.data;
> + }
> +
> + ret = vfio_info_add_capability(&caps, &msi_info->header,
> + msi_info_size);
> + if (ret) {
> + kfree(msi_info);
> + kfree(caps.buf);
> + return ret;
> + }
> + }
> +
> info.flags = vdev->irqs[info.index].flags;
[*]
> info.count = vdev->irqs[info.index].count;
>
> + if (caps.size) {
> + info.flags |= VFIO_IRQ_INFO_FLAG_CAPS;
> + if (info.argsz < sizeof(info) + caps.size) {
> + info.argsz = sizeof(info) + caps.size;
> + info.cap_offset = 0;
> + } else {
> + vfio_info_cap_shift(&caps, sizeof(info));
> + if (copy_to_user((void __user *)arg +
> + sizeof(info), caps.buf,
> + caps.size)) {
> + kfree(msi_info);
> + kfree(caps.buf);
> + return -EFAULT;
> + }
> + info.cap_offset = sizeof(info);
> + }
> +
> + kfree(msi_info);
> + kfree(caps.buf);
> + }
> +
> return copy_to_user((void __user *)arg, &info, minsz) ?
> -EFAULT : 0;
>
> @@ -366,6 +454,7 @@ static long vfio_platform_ioctl(void *device_data,
> struct vfio_irq_set hdr;
> u8 *data = NULL;
> int ret = 0;
> + int max;
> size_t data_size = 0;
>
> minsz = offsetofend(struct vfio_irq_set, count);
> @@ -373,8 +462,14 @@ static long vfio_platform_ioctl(void *device_data,
> if (copy_from_user(&hdr, (void __user *)arg, minsz))
> return -EFAULT;
>
> - ret = vfio_set_irqs_validate_and_prepare(&hdr, vdev->num_irqs,
> - vdev->num_irqs, &data_size);
> + if (hdr.index >= vdev->num_irqs)
> + return -EINVAL;
> +
> + max = vdev->irqs[hdr.index].count;
> +
> + ret = vfio_set_irqs_validate_and_prepare(&hdr, max,
> + vdev->num_irqs,
> + &data_size);
> if (ret)
> return ret;
>
> diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
> index c5b09ec0a3c9..4066223e5b2e 100644
> --- a/drivers/vfio/platform/vfio_platform_irq.c
> +++ b/drivers/vfio/platform/vfio_platform_irq.c
> @@ -8,10 +8,12 @@
>
> #include <linux/eventfd.h>
> #include <linux/interrupt.h>
> +#include <linux/eventfd.h>
> #include <linux/slab.h>
> #include <linux/types.h>
> #include <linux/vfio.h>
> #include <linux/irq.h>
> +#include <linux/msi.h>
>
> #include "vfio_platform_private.h"
>
> @@ -253,6 +255,195 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
> return 0;
> }
>
> +/* MSI/MSIX */
> +static irqreturn_t vfio_msihandler(int irq, void *arg)
> +{
> + struct eventfd_ctx *trigger = arg;
> +
> + eventfd_signal(trigger, 1);
> + return IRQ_HANDLED;
> +}
> +
> +static void msi_write(struct msi_desc *desc, struct msi_msg *msg)
> +{
> + int i;
> + struct vfio_platform_irq *irq;
> + u16 index = desc->platform.msi_index;
> + struct device *dev = msi_desc_to_dev(desc);
> + struct vfio_device *device = dev_get_drvdata(dev);
> + struct vfio_platform_device *vdev = (struct vfio_platform_device *)
> + vfio_device_data(device);
> +
> + for (i = 0; i < vdev->num_irqs; i++)
> + if (vdev->irqs[i].type == VFIO_IRQ_TYPE_MSI)
> + irq = &vdev->irqs[i];
> +
> + irq->ctx[index].msg.address_lo = msg->address_lo;
> + irq->ctx[index].msg.address_hi = msg->address_hi;
> + irq->ctx[index].msg.data = msg->data;
> +}
> +
> +static int vfio_msi_enable(struct vfio_platform_device *vdev,
> + struct vfio_platform_irq *irq, int nvec)
> +{
> + int ret;
> + int msi_idx = 0;
> + struct msi_desc *desc;
> + struct device *dev = vdev->device;
> +
> + irq->ctx = kcalloc(nvec, sizeof(struct vfio_irq_ctx), GFP_KERNEL);
> + if (!irq->ctx)
> + return -ENOMEM;
> +
> + /* Allocate platform MSIs */
> + ret = platform_msi_domain_alloc_irqs(dev, nvec, msi_write);
> + if (ret < 0) {
> + kfree(irq->ctx);
> + return ret;
> + }
> +
> + for_each_msi_entry(desc, dev) {
> + irq->ctx[msi_idx].hwirq = desc->irq;
> + msi_idx++;
> + }
> +
> + irq->num_ctx = nvec;
> + irq->config_msi = 1;
> +
> + return 0;
> +}
> +
> +static int vfio_msi_set_vector_signal(struct vfio_platform_irq *irq,
> + int vector, int fd)
> +{
> + struct eventfd_ctx *trigger;
> + int irq_num, ret;
> +
> + if (vector < 0 || vector >= irq->num_ctx)
> + return -EINVAL;
> +
> + irq_num = irq->ctx[vector].hwirq;
> +
> + if (irq->ctx[vector].trigger) {
> + free_irq(irq_num, irq->ctx[vector].trigger);
> + kfree(irq->ctx[vector].name);
> + eventfd_ctx_put(irq->ctx[vector].trigger);
> + irq->ctx[vector].trigger = NULL;
> + }
> +
> + if (fd < 0)
> + return 0;
> +
> + irq->ctx[vector].name = kasprintf(GFP_KERNEL,
> + "vfio-msi[%d]", vector);
> + if (!irq->ctx[vector].name)
> + return -ENOMEM;
> +
> + trigger = eventfd_ctx_fdget(fd);
> + if (IS_ERR(trigger)) {
> + kfree(irq->ctx[vector].name);
> + return PTR_ERR(trigger);
> + }
> +
> + ret = request_irq(irq_num, vfio_msihandler, 0,
> + irq->ctx[vector].name, trigger);
> + if (ret) {
> + kfree(irq->ctx[vector].name);
> + eventfd_ctx_put(trigger);
> + return ret;
> + }
> +
> + irq->ctx[vector].trigger = trigger;
> +
> + return 0;
> +}
> +
> +static int vfio_msi_set_block(struct vfio_platform_irq *irq, unsigned int start,
> + unsigned int count, int32_t *fds)
> +{
> + int i, j, ret = 0;
> +
> + if (start >= irq->num_ctx || start + count > irq->num_ctx)
> + return -EINVAL;
> +
> + for (i = 0, j = start; i < count && !ret; i++, j++) {
> + int fd = fds ? fds[i] : -1;
> +
> + ret = vfio_msi_set_vector_signal(irq, j, fd);
> + }
> +
> + if (ret) {
> + for (--j; j >= (int)start; j--)
> + vfio_msi_set_vector_signal(irq, j, -1);
> + }
> +
> + return ret;
> +}
> +
> +static void vfio_msi_disable(struct vfio_platform_device *vdev,
> + struct vfio_platform_irq *irq)
> +{
> + struct device *dev = vdev->device;
> +
> + vfio_msi_set_block(irq, 0, irq->num_ctx, NULL);
> +
> + platform_msi_domain_free_irqs(dev);
> +
> + irq->config_msi = 0;
> + irq->num_ctx = 0;
> +
> + kfree(irq->ctx);
> +}
> +
> +static int vfio_set_msi_trigger(struct vfio_platform_device *vdev,
> + unsigned int index, unsigned int start,
> + unsigned int count, uint32_t flags, void *data)
> +{
> + int i;
> + struct vfio_platform_irq *irq = &vdev->irqs[index];
> +
> + if (start + count > irq->count)
> + return -EINVAL;
> +
> + if (!count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
> + vfio_msi_disable(vdev, irq);
> + return 0;
> + }
> +
> + if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
> + s32 *fds = data;
> + int ret;
> +
> + if (irq->config_msi)
> + return vfio_msi_set_block(irq, start, count,
> + fds);
> + ret = vfio_msi_enable(vdev, irq, start + count);
> + if (ret)
> + return ret;
> +
> + ret = vfio_msi_set_block(irq, start, count, fds);
> + if (ret)
> + vfio_msi_disable(vdev, irq);
> +
> + return ret;
> + }
> +
> + for (i = start; i < start + count; i++) {
> + if (!irq->ctx[i].trigger)
> + continue;
> + if (flags & VFIO_IRQ_SET_DATA_NONE) {
> + eventfd_signal(irq->ctx[i].trigger, 1);
> + } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
> + u8 *bools = data;
> +
> + if (bools[i - start])
> + eventfd_signal(irq->ctx[i].trigger, 1);
> + }
> + }
> +
> + return 0;
> +}
> +
> int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
> uint32_t flags, unsigned index, unsigned start,
> unsigned count, void *data)
> @@ -261,16 +452,29 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
> unsigned start, unsigned count, uint32_t flags,
> void *data) = NULL;
>
> - switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> - case VFIO_IRQ_SET_ACTION_MASK:
> - func = vfio_platform_set_irq_mask;
> - break;
> - case VFIO_IRQ_SET_ACTION_UNMASK:
> - func = vfio_platform_set_irq_unmask;
> - break;
> - case VFIO_IRQ_SET_ACTION_TRIGGER:
> - func = vfio_platform_set_irq_trigger;
> - break;
> + struct vfio_platform_irq *irq = &vdev->irqs[index];
> +
> + if (irq->type == VFIO_IRQ_TYPE_MSI) {
> + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> + case VFIO_IRQ_SET_ACTION_MASK:
> + case VFIO_IRQ_SET_ACTION_UNMASK:
> + break;
> + case VFIO_IRQ_SET_ACTION_TRIGGER:
> + func = vfio_set_msi_trigger;
> + break;
> + }
> + } else {
> + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> + case VFIO_IRQ_SET_ACTION_MASK:
> + func = vfio_platform_set_irq_mask;
> + break;
> + case VFIO_IRQ_SET_ACTION_UNMASK:
> + func = vfio_platform_set_irq_unmask;
> + break;
> + case VFIO_IRQ_SET_ACTION_TRIGGER:
> + func = vfio_platform_set_irq_trigger;
> + break;
> + }
> }
>
> if (!func)
> @@ -281,12 +485,21 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
>
> int vfio_platform_irq_init(struct vfio_platform_device *vdev)
> {
> - int cnt = 0, i;
> + int i;
> + int cnt = 0;
> + int num_irqs;
> + struct device *dev = vdev->device;
>
> while (vdev->get_irq(vdev, cnt) >= 0)
> cnt++;
>
> - vdev->irqs = kcalloc(cnt, sizeof(struct vfio_platform_irq), GFP_KERNEL);
> + num_irqs = cnt;
> +
> + if (dev->msi_domain)
> + num_irqs++;
> +
> + vdev->irqs = kcalloc(num_irqs, sizeof(struct vfio_platform_irq),
> + GFP_KERNEL);
> if (!vdev->irqs)
> return -ENOMEM;
>
> @@ -309,7 +522,19 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
> vdev->irqs[i].masked = false;
> }
>
> - vdev->num_irqs = cnt;
> + /*
> + * MSI block is added at last index and its an ext irq
it is
> + */
> + if (dev->msi_domain) {
> + vdev->irqs[i].flags = VFIO_IRQ_INFO_EVENTFD;
> + vdev->irqs[i].count = NR_IRQS;
why NR_IRQS?
> + vdev->irqs[i].hwirq = 0;
> + vdev->irqs[i].masked = false;
> + vdev->irqs[i].type = VFIO_IRQ_TYPE_MSI;
> + vdev->num_ext_irqs = 1;
> + }
> +
> + vdev->num_irqs = num_irqs;
>
> return 0;
> err:
> @@ -321,8 +546,13 @@ void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
> {
> int i;
>
> - for (i = 0; i < vdev->num_irqs; i++)
> - vfio_set_trigger(vdev, i, -1, NULL);
> + for (i = 0; i < vdev->num_irqs; i++) {
> + if (vdev->irqs[i].type == VFIO_IRQ_TYPE_MSI)
> + vfio_set_msi_trigger(vdev, i, 0, 0,
> + VFIO_IRQ_SET_DATA_NONE, NULL);
> + else
> + vfio_set_trigger(vdev, i, -1, NULL);
> + }
>
> vdev->num_irqs = 0;
> kfree(vdev->irqs);
> diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
> index 289089910643..7bbb05988705 100644
> --- a/drivers/vfio/platform/vfio_platform_private.h
> +++ b/drivers/vfio/platform/vfio_platform_private.h
> @@ -9,6 +9,7 @@
>
> #include <linux/types.h>
> #include <linux/interrupt.h>
> +#include <linux/msi.h>
>
> #define VFIO_PLATFORM_OFFSET_SHIFT 40
> #define VFIO_PLATFORM_OFFSET_MASK (((u64)(1) << VFIO_PLATFORM_OFFSET_SHIFT) - 1)
> @@ -19,9 +20,18 @@
> #define VFIO_PLATFORM_INDEX_TO_OFFSET(index) \
> ((u64)(index) << VFIO_PLATFORM_OFFSET_SHIFT)
>
> +struct vfio_irq_ctx {
> + int hwirq;
> + char *name;
> + struct msi_msg msg;
> + struct eventfd_ctx *trigger;
> +};
> +
> struct vfio_platform_irq {
> u32 flags;
> u32 count;
> + int num_ctx;
> + struct vfio_irq_ctx *ctx;
> int hwirq;
> char *name;
> struct eventfd_ctx *trigger;
> @@ -29,6 +39,11 @@ struct vfio_platform_irq {
> spinlock_t lock;
> struct virqfd *unmask;
> struct virqfd *mask;
> +
> + /* for extended irqs */
> + u32 type;
> + u32 subtype;
> + int config_msi;
> };
>
> struct vfio_platform_region {
> @@ -46,6 +61,7 @@ struct vfio_platform_device {
> u32 num_regions;
> struct vfio_platform_irq *irqs;
> u32 num_irqs;
> + int num_ext_irqs;
> int refcnt;
> struct mutex igate;
> struct module *parent_module;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 2f313a238a8f..598d1c944283 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -697,11 +697,54 @@ struct vfio_irq_info {
> #define VFIO_IRQ_INFO_MASKABLE (1 << 1)
> #define VFIO_IRQ_INFO_AUTOMASKED (1 << 2)
> #define VFIO_IRQ_INFO_NORESIZE (1 << 3)
> +#define VFIO_IRQ_INFO_FLAG_CAPS (1 << 4) /* Info supports caps */
> __u32 index; /* IRQ index */
> __u32 count; /* Number of IRQs within this index */
> + __u32 cap_offset; /* Offset within info struct of first cap */
> };
> #define VFIO_DEVICE_GET_IRQ_INFO _IO(VFIO_TYPE, VFIO_BASE + 9)
>
> +/*
> + * The irq type capability allows IRQs unique to a specific device or
> + * class of devices to be exposed.
> + *
> + * The structures below define version 1 of this capability.
> + */
> +#define VFIO_IRQ_INFO_CAP_TYPE 3
> +
> +struct vfio_irq_info_cap_type {
> + struct vfio_info_cap_header header;
> + __u32 type; /* global per bus driver */
> + __u32 subtype; /* type specific */
> +};
> +
> +/*
> + * List of IRQ types, global per bus driver.
> + * If you introduce a new type, please add it here.
> + */
> +
> +/* Non PCI devices having MSI(s) support */
> +#define VFIO_IRQ_TYPE_MSI (1)
> +
> +/*
> + * The msi capability allows the user to use the msi msg to
> + * configure the iova for the msi configuration.
> + * The structures below define version 1 of this capability.
> + */
> +#define VFIO_IRQ_INFO_CAP_MSI_DESCS 4
> +
> +struct vfio_irq_msi_msg {
> + __u32 addr_lo;
> + __u32 addr_hi;
> + __u32 data;
> +};
> +
> +struct vfio_irq_info_cap_msi {
> + struct vfio_info_cap_header header;
> + __u32 nr_msgs;
I think you should align a __u32 reserved field to have a 64b alignment
> + struct vfio_irq_msi_msg msgs[];
Please can you clarify why this cap is needed versus your prior approach.
> +};
> +
> /**
> * VFIO_DEVICE_SET_IRQS - _IOW(VFIO_TYPE, VFIO_BASE + 10, struct vfio_irq_set)
> *
>
Thanks

Eric

2020-12-02 14:49:32

by Eric Auger

[permalink] [raw]
Subject: Re: [RFC, v2 0/1] msi support for platform devices

Hi Vikas,
On 11/24/20 5:16 PM, Vikas Gupta wrote:
> This RFC adds support for MSI for platform devices.
> MSI block is added as an ext irq along with the existing
> wired interrupt implementation.
>
> Changes from:
> -------------
> v1 to v2:
> 1) IRQ allocation has been implemented as below:
> ----------------------------
> |IRQ-0|IRQ-1|....|IRQ-n|MSI|
> ----------------------------
> MSI block has msi contexts and its implemneted
it is implemented
> as ext irq.
>
> 2) Removed vendor specific module for msi handling so
> previously patch2 and patch3 are not required.
>
> 3) MSI related data is exported to userspace using 'caps'.
> Please note VFIO_IRQ_INFO_CAP_TYPE in include/uapi/linux/vfio.h implementation
> is taken from the Eric`s patch
> https://patchwork.kernel.org/project/kvm/patch/[email protected]/
So do you mean that by exposing the vectors, now you do not need the msi
module anymore?


Thanks

Eric
>
>
> v0 to v1:
> i) Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
> ii) Add MSI(s) at the end of the irq list of platform IRQs.
> MSI(s) with first entry of MSI block has count and flag
> information.
> IRQ list: Allocation for IRQs + MSIs are allocated as below
> Example: if there are 'n' IRQs and 'k' MSIs
> -------------------------------------------------------
> |IRQ-0|IRQ-1|....|IRQ-n|MSI-0|MSI-1|MSI-2|......|MSI-k|
> -------------------------------------------------------
> MSI-0 will have count=k set and flags set accordingly.
>
> Vikas Gupta (1):
> vfio/platform: add support for msi
>
> drivers/vfio/platform/vfio_platform_common.c | 99 ++++++-
> drivers/vfio/platform/vfio_platform_irq.c | 260 +++++++++++++++++-
> drivers/vfio/platform/vfio_platform_private.h | 16 ++
> include/uapi/linux/vfio.h | 43 +++
> 4 files changed, 401 insertions(+), 17 deletions(-)
>

2020-12-03 14:41:59

by Vikas Gupta

[permalink] [raw]
Subject: Re: [RFC, v2 0/1] msi support for platform devices

HI Eric,

On Wed, Dec 2, 2020 at 8:13 PM Auger Eric <[email protected]> wrote:
>
> Hi Vikas,
> On 11/24/20 5:16 PM, Vikas Gupta wrote:
> > This RFC adds support for MSI for platform devices.
> > MSI block is added as an ext irq along with the existing
> > wired interrupt implementation.
> >
> > Changes from:
> > -------------
> > v1 to v2:
> > 1) IRQ allocation has been implemented as below:
> > ----------------------------
> > |IRQ-0|IRQ-1|....|IRQ-n|MSI|
> > ----------------------------
> > MSI block has msi contexts and its implemneted
> it is implemented
> > as ext irq.
> >
> > 2) Removed vendor specific module for msi handling so
> > previously patch2 and patch3 are not required.
> >
> > 3) MSI related data is exported to userspace using 'caps'.
> > Please note VFIO_IRQ_INFO_CAP_TYPE in include/uapi/linux/vfio.h implementation
> > is taken from the Eric`s patch
> > https://patchwork.kernel.org/project/kvm/patch/[email protected]/
> So do you mean that by exposing the vectors, now you do not need the msi
> module anymore?
Yes, with the support of caps we can expose the MSI related data to
userspace and userspace can configure the device, which previous
patches were doing in the kernel module.

Thanks,
Vikas
>
>
> Thanks
>
> Eric
> >
> >
> > v0 to v1:
> > i) Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
> > ii) Add MSI(s) at the end of the irq list of platform IRQs.
> > MSI(s) with first entry of MSI block has count and flag
> > information.
> > IRQ list: Allocation for IRQs + MSIs are allocated as below
> > Example: if there are 'n' IRQs and 'k' MSIs
> > -------------------------------------------------------
> > |IRQ-0|IRQ-1|....|IRQ-n|MSI-0|MSI-1|MSI-2|......|MSI-k|
> > -------------------------------------------------------
> > MSI-0 will have count=k set and flags set accordingly.
> >
> > Vikas Gupta (1):
> > vfio/platform: add support for msi
> >
> > drivers/vfio/platform/vfio_platform_common.c | 99 ++++++-
> > drivers/vfio/platform/vfio_platform_irq.c | 260 +++++++++++++++++-
> > drivers/vfio/platform/vfio_platform_private.h | 16 ++
> > include/uapi/linux/vfio.h | 43 +++
> > 4 files changed, 401 insertions(+), 17 deletions(-)
> >
>


Attachments:
smime.p7s (4.07 kB)
S/MIME Cryptographic Signature

2020-12-03 14:56:35

by Vikas Gupta

[permalink] [raw]
Subject: Re: [RFC v2 1/1] vfio/platform: add support for msi

Hi Eric,

On Wed, Dec 2, 2020 at 8:14 PM Auger Eric <[email protected]> wrote:
>
> Hi Vikas,
>
> On 11/24/20 5:16 PM, Vikas Gupta wrote:
> > MSI support for platform devices.
> >
> > Signed-off-by: Vikas Gupta <[email protected]>
> > ---
> > drivers/vfio/platform/vfio_platform_common.c | 99 ++++++-
> > drivers/vfio/platform/vfio_platform_irq.c | 260 +++++++++++++++++-
> > drivers/vfio/platform/vfio_platform_private.h | 16 ++
> > include/uapi/linux/vfio.h | 43 +++
> > 4 files changed, 401 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
> > index c0771a9567fb..b0bfc0f4ee1f 100644
> > --- a/drivers/vfio/platform/vfio_platform_common.c
> > +++ b/drivers/vfio/platform/vfio_platform_common.c
> > @@ -16,6 +16,7 @@
> > #include <linux/types.h>
> > #include <linux/uaccess.h>
> > #include <linux/vfio.h>
> > +#include <linux/nospec.h>
> >
> > #include "vfio_platform_private.h"
> >
> > @@ -344,9 +345,16 @@ static long vfio_platform_ioctl(void *device_data,
> >
> > } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
> > struct vfio_irq_info info;
> > + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> > + struct vfio_irq_info_cap_msi *msi_info = NULL;
> > + unsigned long capsz;
> > + int ext_irq_index = vdev->num_irqs - vdev->num_ext_irqs;
> >
> > minsz = offsetofend(struct vfio_irq_info, count);
> >
> > + /* For backward compatibility, cannot require this */
> > + capsz = offsetofend(struct vfio_irq_info, cap_offset);
> > +
> > if (copy_from_user(&info, (void __user *)arg, minsz))
> > return -EFAULT;
> >
> > @@ -356,9 +364,89 @@ static long vfio_platform_ioctl(void *device_data,
> > if (info.index >= vdev->num_irqs)
> > return -EINVAL;
> >
> > + if (info.argsz >= capsz)
> > + minsz = capsz;
> > +
> > + if (info.index == ext_irq_index) {
> nit: n case we add new ext indices afterwards, I would check info.index
> - ext_irq_index against an VFIO_EXT_IRQ_MSI define.
> > + struct vfio_irq_info_cap_type cap_type = {
> > + .header.id = VFIO_IRQ_INFO_CAP_TYPE,
> > + .header.version = 1 };
> > + int i;
> > + int ret;
> > + int num_msgs;
> > + size_t msi_info_size;
> > + struct vfio_platform_irq *irq;
> nit: I think generally the opposite order (length) is chosen. This also
> would better match the existing style in this file
Ok will fix it
> > +
> > + info.index = array_index_nospec(info.index,
> > + vdev->num_irqs);
> > +
> > + irq = &vdev->irqs[info.index];
> > +
> > + info.flags = irq->flags;
> I think this can be removed given [*]
Sure.
> > + cap_type.type = irq->type;
> > + cap_type.subtype = irq->subtype;
> > +
> > + ret = vfio_info_add_capability(&caps,
> > + &cap_type.header,
> > + sizeof(cap_type));
> > + if (ret)
> > + return ret;
> > +
> > + num_msgs = irq->num_ctx;
> do would want to return the cap even if !num_ctx?
If num_ctx = 0 then VFIO_IRQ_INFO_CAP_MSI_DESCS should not be written.
I`ll take care of same.
> > +
> > + msi_info_size = struct_size(msi_info, msgs, num_msgs);
> > +
> > + msi_info = kzalloc(msi_info_size, GFP_KERNEL);
> > + if (!msi_info) {
> > + kfree(caps.buf);
> > + return -ENOMEM;
> > + }
> > +
> > + msi_info->header.id = VFIO_IRQ_INFO_CAP_MSI_DESCS;
> > + msi_info->header.version = 1;
> > + msi_info->nr_msgs = num_msgs;
> > +
> > + for (i = 0; i < num_msgs; i++) {
> > + struct vfio_irq_ctx *ctx = &irq->ctx[i];
> > +
> > + msi_info->msgs[i].addr_lo = ctx->msg.address_lo;
> > + msi_info->msgs[i].addr_hi = ctx->msg.address_hi;
> > + msi_info->msgs[i].data = ctx->msg.data;
> > + }
> > +
> > + ret = vfio_info_add_capability(&caps, &msi_info->header,
> > + msi_info_size);
> > + if (ret) {
> > + kfree(msi_info);
> > + kfree(caps.buf);
> > + return ret;
> > + }
> > + }
> > +
> > info.flags = vdev->irqs[info.index].flags;
> [*]
Will fix it.
> > info.count = vdev->irqs[info.index].count;
> >
> > + if (caps.size) {
> > + info.flags |= VFIO_IRQ_INFO_FLAG_CAPS;
> > + if (info.argsz < sizeof(info) + caps.size) {
> > + info.argsz = sizeof(info) + caps.size;
> > + info.cap_offset = 0;
> > + } else {
> > + vfio_info_cap_shift(&caps, sizeof(info));
> > + if (copy_to_user((void __user *)arg +
> > + sizeof(info), caps.buf,
> > + caps.size)) {
> > + kfree(msi_info);
> > + kfree(caps.buf);
> > + return -EFAULT;
> > + }
> > + info.cap_offset = sizeof(info);
> > + }
> > +
> > + kfree(msi_info);
> > + kfree(caps.buf);
> > + }
> > +
> > return copy_to_user((void __user *)arg, &info, minsz) ?
> > -EFAULT : 0;
> >
> > @@ -366,6 +454,7 @@ static long vfio_platform_ioctl(void *device_data,
> > struct vfio_irq_set hdr;
> > u8 *data = NULL;
> > int ret = 0;
> > + int max;
> > size_t data_size = 0;
> >
> > minsz = offsetofend(struct vfio_irq_set, count);
> > @@ -373,8 +462,14 @@ static long vfio_platform_ioctl(void *device_data,
> > if (copy_from_user(&hdr, (void __user *)arg, minsz))
> > return -EFAULT;
> >
> > - ret = vfio_set_irqs_validate_and_prepare(&hdr, vdev->num_irqs,
> > - vdev->num_irqs, &data_size);
> > + if (hdr.index >= vdev->num_irqs)
> > + return -EINVAL;
> > +
> > + max = vdev->irqs[hdr.index].count;
> > +
> > + ret = vfio_set_irqs_validate_and_prepare(&hdr, max,
> > + vdev->num_irqs,
> > + &data_size);
> > if (ret)
> > return ret;
> >
> > diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
> > index c5b09ec0a3c9..4066223e5b2e 100644
> > --- a/drivers/vfio/platform/vfio_platform_irq.c
> > +++ b/drivers/vfio/platform/vfio_platform_irq.c
> > @@ -8,10 +8,12 @@
> >
> > #include <linux/eventfd.h>
> > #include <linux/interrupt.h>
> > +#include <linux/eventfd.h>
> > #include <linux/slab.h>
> > #include <linux/types.h>
> > #include <linux/vfio.h>
> > #include <linux/irq.h>
> > +#include <linux/msi.h>
> >
> > #include "vfio_platform_private.h"
> >
> > @@ -253,6 +255,195 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
> > return 0;
> > }
> >
> > +/* MSI/MSIX */
> > +static irqreturn_t vfio_msihandler(int irq, void *arg)
> > +{
> > + struct eventfd_ctx *trigger = arg;
> > +
> > + eventfd_signal(trigger, 1);
> > + return IRQ_HANDLED;
> > +}
> > +
> > +static void msi_write(struct msi_desc *desc, struct msi_msg *msg)
> > +{
> > + int i;
> > + struct vfio_platform_irq *irq;
> > + u16 index = desc->platform.msi_index;
> > + struct device *dev = msi_desc_to_dev(desc);
> > + struct vfio_device *device = dev_get_drvdata(dev);
> > + struct vfio_platform_device *vdev = (struct vfio_platform_device *)
> > + vfio_device_data(device);
> > +
> > + for (i = 0; i < vdev->num_irqs; i++)
> > + if (vdev->irqs[i].type == VFIO_IRQ_TYPE_MSI)
> > + irq = &vdev->irqs[i];
> > +
> > + irq->ctx[index].msg.address_lo = msg->address_lo;
> > + irq->ctx[index].msg.address_hi = msg->address_hi;
> > + irq->ctx[index].msg.data = msg->data;
> > +}
> > +
> > +static int vfio_msi_enable(struct vfio_platform_device *vdev,
> > + struct vfio_platform_irq *irq, int nvec)
> > +{
> > + int ret;
> > + int msi_idx = 0;
> > + struct msi_desc *desc;
> > + struct device *dev = vdev->device;
> > +
> > + irq->ctx = kcalloc(nvec, sizeof(struct vfio_irq_ctx), GFP_KERNEL);
> > + if (!irq->ctx)
> > + return -ENOMEM;
> > +
> > + /* Allocate platform MSIs */
> > + ret = platform_msi_domain_alloc_irqs(dev, nvec, msi_write);
> > + if (ret < 0) {
> > + kfree(irq->ctx);
> > + return ret;
> > + }
> > +
> > + for_each_msi_entry(desc, dev) {
> > + irq->ctx[msi_idx].hwirq = desc->irq;
> > + msi_idx++;
> > + }
> > +
> > + irq->num_ctx = nvec;
> > + irq->config_msi = 1;
> > +
> > + return 0;
> > +}
> > +
> > +static int vfio_msi_set_vector_signal(struct vfio_platform_irq *irq,
> > + int vector, int fd)
> > +{
> > + struct eventfd_ctx *trigger;
> > + int irq_num, ret;
> > +
> > + if (vector < 0 || vector >= irq->num_ctx)
> > + return -EINVAL;
> > +
> > + irq_num = irq->ctx[vector].hwirq;
> > +
> > + if (irq->ctx[vector].trigger) {
> > + free_irq(irq_num, irq->ctx[vector].trigger);
> > + kfree(irq->ctx[vector].name);
> > + eventfd_ctx_put(irq->ctx[vector].trigger);
> > + irq->ctx[vector].trigger = NULL;
> > + }
> > +
> > + if (fd < 0)
> > + return 0;
> > +
> > + irq->ctx[vector].name = kasprintf(GFP_KERNEL,
> > + "vfio-msi[%d]", vector);
> > + if (!irq->ctx[vector].name)
> > + return -ENOMEM;
> > +
> > + trigger = eventfd_ctx_fdget(fd);
> > + if (IS_ERR(trigger)) {
> > + kfree(irq->ctx[vector].name);
> > + return PTR_ERR(trigger);
> > + }
> > +
> > + ret = request_irq(irq_num, vfio_msihandler, 0,
> > + irq->ctx[vector].name, trigger);
> > + if (ret) {
> > + kfree(irq->ctx[vector].name);
> > + eventfd_ctx_put(trigger);
> > + return ret;
> > + }
> > +
> > + irq->ctx[vector].trigger = trigger;
> > +
> > + return 0;
> > +}
> > +
> > +static int vfio_msi_set_block(struct vfio_platform_irq *irq, unsigned int start,
> > + unsigned int count, int32_t *fds)
> > +{
> > + int i, j, ret = 0;
> > +
> > + if (start >= irq->num_ctx || start + count > irq->num_ctx)
> > + return -EINVAL;
> > +
> > + for (i = 0, j = start; i < count && !ret; i++, j++) {
> > + int fd = fds ? fds[i] : -1;
> > +
> > + ret = vfio_msi_set_vector_signal(irq, j, fd);
> > + }
> > +
> > + if (ret) {
> > + for (--j; j >= (int)start; j--)
> > + vfio_msi_set_vector_signal(irq, j, -1);
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +static void vfio_msi_disable(struct vfio_platform_device *vdev,
> > + struct vfio_platform_irq *irq)
> > +{
> > + struct device *dev = vdev->device;
> > +
> > + vfio_msi_set_block(irq, 0, irq->num_ctx, NULL);
> > +
> > + platform_msi_domain_free_irqs(dev);
> > +
> > + irq->config_msi = 0;
> > + irq->num_ctx = 0;
> > +
> > + kfree(irq->ctx);
> > +}
> > +
> > +static int vfio_set_msi_trigger(struct vfio_platform_device *vdev,
> > + unsigned int index, unsigned int start,
> > + unsigned int count, uint32_t flags, void *data)
> > +{
> > + int i;
> > + struct vfio_platform_irq *irq = &vdev->irqs[index];
> > +
> > + if (start + count > irq->count)
> > + return -EINVAL;
> > +
> > + if (!count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
> > + vfio_msi_disable(vdev, irq);
> > + return 0;
> > + }
> > +
> > + if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
> > + s32 *fds = data;
> > + int ret;
> > +
> > + if (irq->config_msi)
> > + return vfio_msi_set_block(irq, start, count,
> > + fds);
> > + ret = vfio_msi_enable(vdev, irq, start + count);
> > + if (ret)
> > + return ret;
> > +
> > + ret = vfio_msi_set_block(irq, start, count, fds);
> > + if (ret)
> > + vfio_msi_disable(vdev, irq);
> > +
> > + return ret;
> > + }
> > +
> > + for (i = start; i < start + count; i++) {
> > + if (!irq->ctx[i].trigger)
> > + continue;
> > + if (flags & VFIO_IRQ_SET_DATA_NONE) {
> > + eventfd_signal(irq->ctx[i].trigger, 1);
> > + } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
> > + u8 *bools = data;
> > +
> > + if (bools[i - start])
> > + eventfd_signal(irq->ctx[i].trigger, 1);
> > + }
> > + }
> > +
> > + return 0;
> > +}
> > +
> > int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
> > uint32_t flags, unsigned index, unsigned start,
> > unsigned count, void *data)
> > @@ -261,16 +452,29 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
> > unsigned start, unsigned count, uint32_t flags,
> > void *data) = NULL;
> >
> > - switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> > - case VFIO_IRQ_SET_ACTION_MASK:
> > - func = vfio_platform_set_irq_mask;
> > - break;
> > - case VFIO_IRQ_SET_ACTION_UNMASK:
> > - func = vfio_platform_set_irq_unmask;
> > - break;
> > - case VFIO_IRQ_SET_ACTION_TRIGGER:
> > - func = vfio_platform_set_irq_trigger;
> > - break;
> > + struct vfio_platform_irq *irq = &vdev->irqs[index];
> > +
> > + if (irq->type == VFIO_IRQ_TYPE_MSI) {
> > + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> > + case VFIO_IRQ_SET_ACTION_MASK:
> > + case VFIO_IRQ_SET_ACTION_UNMASK:
> > + break;
> > + case VFIO_IRQ_SET_ACTION_TRIGGER:
> > + func = vfio_set_msi_trigger;
> > + break;
> > + }
> > + } else {
> > + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> > + case VFIO_IRQ_SET_ACTION_MASK:
> > + func = vfio_platform_set_irq_mask;
> > + break;
> > + case VFIO_IRQ_SET_ACTION_UNMASK:
> > + func = vfio_platform_set_irq_unmask;
> > + break;
> > + case VFIO_IRQ_SET_ACTION_TRIGGER:
> > + func = vfio_platform_set_irq_trigger;
> > + break;
> > + }
> > }
> >
> > if (!func)
> > @@ -281,12 +485,21 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
> >
> > int vfio_platform_irq_init(struct vfio_platform_device *vdev)
> > {
> > - int cnt = 0, i;
> > + int i;
> > + int cnt = 0;
> > + int num_irqs;
> > + struct device *dev = vdev->device;
> >
> > while (vdev->get_irq(vdev, cnt) >= 0)
> > cnt++;
> >
> > - vdev->irqs = kcalloc(cnt, sizeof(struct vfio_platform_irq), GFP_KERNEL);
> > + num_irqs = cnt;
> > +
> > + if (dev->msi_domain)
> > + num_irqs++;
> > +
> > + vdev->irqs = kcalloc(num_irqs, sizeof(struct vfio_platform_irq),
> > + GFP_KERNEL);
> > if (!vdev->irqs)
> > return -ENOMEM;
> >
> > @@ -309,7 +522,19 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
> > vdev->irqs[i].masked = false;
> > }
> >
> > - vdev->num_irqs = cnt;
> > + /*
> > + * MSI block is added at last index and its an ext irq
> it is
> > + */
> > + if (dev->msi_domain) {
> > + vdev->irqs[i].flags = VFIO_IRQ_INFO_EVENTFD;
> > + vdev->irqs[i].count = NR_IRQS;
> why NR_IRQS?
Since different devices can have different numbers of MSI(s) so we
need to initialize with max possible values. Can you please suggest if
this value does not seem appropriate?
> > + vdev->irqs[i].hwirq = 0;
> > + vdev->irqs[i].masked = false;
> > + vdev->irqs[i].type = VFIO_IRQ_TYPE_MSI;
> > + vdev->num_ext_irqs = 1;
> > + }
> > +
> > + vdev->num_irqs = num_irqs;
> >
> > return 0;
> > err:
> > @@ -321,8 +546,13 @@ void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
> > {
> > int i;
> >
> > - for (i = 0; i < vdev->num_irqs; i++)
> > - vfio_set_trigger(vdev, i, -1, NULL);
> > + for (i = 0; i < vdev->num_irqs; i++) {
> > + if (vdev->irqs[i].type == VFIO_IRQ_TYPE_MSI)
> > + vfio_set_msi_trigger(vdev, i, 0, 0,
> > + VFIO_IRQ_SET_DATA_NONE, NULL);
> > + else
> > + vfio_set_trigger(vdev, i, -1, NULL);
> > + }
> >
> > vdev->num_irqs = 0;
> > kfree(vdev->irqs);
> > diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
> > index 289089910643..7bbb05988705 100644
> > --- a/drivers/vfio/platform/vfio_platform_private.h
> > +++ b/drivers/vfio/platform/vfio_platform_private.h
> > @@ -9,6 +9,7 @@
> >
> > #include <linux/types.h>
> > #include <linux/interrupt.h>
> > +#include <linux/msi.h>
> >
> > #define VFIO_PLATFORM_OFFSET_SHIFT 40
> > #define VFIO_PLATFORM_OFFSET_MASK (((u64)(1) << VFIO_PLATFORM_OFFSET_SHIFT) - 1)
> > @@ -19,9 +20,18 @@
> > #define VFIO_PLATFORM_INDEX_TO_OFFSET(index) \
> > ((u64)(index) << VFIO_PLATFORM_OFFSET_SHIFT)
> >
> > +struct vfio_irq_ctx {
> > + int hwirq;
> > + char *name;
> > + struct msi_msg msg;
> > + struct eventfd_ctx *trigger;
> > +};
> > +
> > struct vfio_platform_irq {
> > u32 flags;
> > u32 count;
> > + int num_ctx;
> > + struct vfio_irq_ctx *ctx;
> > int hwirq;
> > char *name;
> > struct eventfd_ctx *trigger;
> > @@ -29,6 +39,11 @@ struct vfio_platform_irq {
> > spinlock_t lock;
> > struct virqfd *unmask;
> > struct virqfd *mask;
> > +
> > + /* for extended irqs */
> > + u32 type;
> > + u32 subtype;
> > + int config_msi;
> > };
> >
> > struct vfio_platform_region {
> > @@ -46,6 +61,7 @@ struct vfio_platform_device {
> > u32 num_regions;
> > struct vfio_platform_irq *irqs;
> > u32 num_irqs;
> > + int num_ext_irqs;
> > int refcnt;
> > struct mutex igate;
> > struct module *parent_module;
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 2f313a238a8f..598d1c944283 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -697,11 +697,54 @@ struct vfio_irq_info {
> > #define VFIO_IRQ_INFO_MASKABLE (1 << 1)
> > #define VFIO_IRQ_INFO_AUTOMASKED (1 << 2)
> > #define VFIO_IRQ_INFO_NORESIZE (1 << 3)
> > +#define VFIO_IRQ_INFO_FLAG_CAPS (1 << 4) /* Info supports caps */
> > __u32 index; /* IRQ index */
> > __u32 count; /* Number of IRQs within this index */
> > + __u32 cap_offset; /* Offset within info struct of first cap */
> > };
> > #define VFIO_DEVICE_GET_IRQ_INFO _IO(VFIO_TYPE, VFIO_BASE + 9)
> >
> > +/*
> > + * The irq type capability allows IRQs unique to a specific device or
> > + * class of devices to be exposed.
> > + *
> > + * The structures below define version 1 of this capability.
> > + */
> > +#define VFIO_IRQ_INFO_CAP_TYPE 3
> > +
> > +struct vfio_irq_info_cap_type {
> > + struct vfio_info_cap_header header;
> > + __u32 type; /* global per bus driver */
> > + __u32 subtype; /* type specific */
> > +};
> > +
> > +/*
> > + * List of IRQ types, global per bus driver.
> > + * If you introduce a new type, please add it here.
> > + */
> > +
> > +/* Non PCI devices having MSI(s) support */
> > +#define VFIO_IRQ_TYPE_MSI (1)
> > +
> > +/*
> > + * The msi capability allows the user to use the msi msg to
> > + * configure the iova for the msi configuration.
> > + * The structures below define version 1 of this capability.
> > + */
> > +#define VFIO_IRQ_INFO_CAP_MSI_DESCS 4
> > +
> > +struct vfio_irq_msi_msg {
> > + __u32 addr_lo;
> > + __u32 addr_hi;
> > + __u32 data;
> > +};
> > +
> > +struct vfio_irq_info_cap_msi {
> > + struct vfio_info_cap_header header;
> > + __u32 nr_msgs;
> I think you should align a __u32 reserved field to have a 64b alignment
Sure.
> > + struct vfio_irq_msi_msg msgs[];
> Please can you clarify why this cap is needed versus your prior approach.
In the previous patch, the reset module was configuring the device
with MSI msg/data now since the module is not available, user space
needs to have this data.
With this approach userspace just needs the pairs <msg and ctx >
associated with the MSI(s) and it can choose to configure the MSI(s)
sources accordingly.
Let me know if this approach does not look appropriate.

Thanks,
Vikas
> > +};
> > +
> > /**
> > * VFIO_DEVICE_SET_IRQS - _IOW(VFIO_TYPE, VFIO_BASE + 10, struct vfio_irq_set)
> > *
> >
> Thanks
>
> Eric
>


Attachments:
smime.p7s (4.07 kB)
S/MIME Cryptographic Signature

2020-12-07 20:50:46

by Eric Auger

[permalink] [raw]
Subject: Re: [RFC v2 1/1] vfio/platform: add support for msi

Hi Vikas,

On 12/3/20 3:50 PM, Vikas Gupta wrote:
> Hi Eric,
>
> On Wed, Dec 2, 2020 at 8:14 PM Auger Eric <[email protected]> wrote:
>>
>> Hi Vikas,
>>
>> On 11/24/20 5:16 PM, Vikas Gupta wrote:
>>> MSI support for platform devices.
>>>
>>> Signed-off-by: Vikas Gupta <[email protected]>
>>> ---
>>> drivers/vfio/platform/vfio_platform_common.c | 99 ++++++-
>>> drivers/vfio/platform/vfio_platform_irq.c | 260 +++++++++++++++++-
>>> drivers/vfio/platform/vfio_platform_private.h | 16 ++
>>> include/uapi/linux/vfio.h | 43 +++
>>> 4 files changed, 401 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
>>> index c0771a9567fb..b0bfc0f4ee1f 100644
>>> --- a/drivers/vfio/platform/vfio_platform_common.c
>>> +++ b/drivers/vfio/platform/vfio_platform_common.c
>>> @@ -16,6 +16,7 @@
>>> #include <linux/types.h>
>>> #include <linux/uaccess.h>
>>> #include <linux/vfio.h>
>>> +#include <linux/nospec.h>
>>>
>>> #include "vfio_platform_private.h"
>>>
>>> @@ -344,9 +345,16 @@ static long vfio_platform_ioctl(void *device_data,
>>>
>>> } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
>>> struct vfio_irq_info info;
>>> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>>> + struct vfio_irq_info_cap_msi *msi_info = NULL;
>>> + unsigned long capsz;
>>> + int ext_irq_index = vdev->num_irqs - vdev->num_ext_irqs;
>>>
>>> minsz = offsetofend(struct vfio_irq_info, count);
>>>
>>> + /* For backward compatibility, cannot require this */
>>> + capsz = offsetofend(struct vfio_irq_info, cap_offset);
>>> +
>>> if (copy_from_user(&info, (void __user *)arg, minsz))
>>> return -EFAULT;
>>>
>>> @@ -356,9 +364,89 @@ static long vfio_platform_ioctl(void *device_data,
>>> if (info.index >= vdev->num_irqs)
>>> return -EINVAL;
>>>
>>> + if (info.argsz >= capsz)
>>> + minsz = capsz;
>>> +
>>> + if (info.index == ext_irq_index) {
>> nit: n case we add new ext indices afterwards, I would check info.index
>> - ext_irq_index against an VFIO_EXT_IRQ_MSI define.
>>> + struct vfio_irq_info_cap_type cap_type = {
>>> + .header.id = VFIO_IRQ_INFO_CAP_TYPE,
>>> + .header.version = 1 };
>>> + int i;
>>> + int ret;
>>> + int num_msgs;
>>> + size_t msi_info_size;
>>> + struct vfio_platform_irq *irq;
>> nit: I think generally the opposite order (length) is chosen. This also
>> would better match the existing style in this file
> Ok will fix it
>>> +
>>> + info.index = array_index_nospec(info.index,
>>> + vdev->num_irqs);
>>> +
>>> + irq = &vdev->irqs[info.index];
>>> +
>>> + info.flags = irq->flags;
>> I think this can be removed given [*]
> Sure.
>>> + cap_type.type = irq->type;
>>> + cap_type.subtype = irq->subtype;
>>> +
>>> + ret = vfio_info_add_capability(&caps,
>>> + &cap_type.header,
>>> + sizeof(cap_type));
>>> + if (ret)
>>> + return ret;
>>> +
>>> + num_msgs = irq->num_ctx;
>> do would want to return the cap even if !num_ctx?
> If num_ctx = 0 then VFIO_IRQ_INFO_CAP_MSI_DESCS should not be written.
> I`ll take care of same.
>>> +
>>> + msi_info_size = struct_size(msi_info, msgs, num_msgs);
>>> +
>>> + msi_info = kzalloc(msi_info_size, GFP_KERNEL);
>>> + if (!msi_info) {
>>> + kfree(caps.buf);
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + msi_info->header.id = VFIO_IRQ_INFO_CAP_MSI_DESCS;
>>> + msi_info->header.version = 1;
>>> + msi_info->nr_msgs = num_msgs;
>>> +
>>> + for (i = 0; i < num_msgs; i++) {
>>> + struct vfio_irq_ctx *ctx = &irq->ctx[i];
>>> +
>>> + msi_info->msgs[i].addr_lo = ctx->msg.address_lo;
>>> + msi_info->msgs[i].addr_hi = ctx->msg.address_hi;
>>> + msi_info->msgs[i].data = ctx->msg.data;
>>> + }
>>> +
>>> + ret = vfio_info_add_capability(&caps, &msi_info->header,
>>> + msi_info_size);
>>> + if (ret) {
>>> + kfree(msi_info);
>>> + kfree(caps.buf);
>>> + return ret;
>>> + }
>>> + }
>>> +
>>> info.flags = vdev->irqs[info.index].flags;
>> [*]
> Will fix it.
>>> info.count = vdev->irqs[info.index].count;
>>>
>>> + if (caps.size) {
>>> + info.flags |= VFIO_IRQ_INFO_FLAG_CAPS;
>>> + if (info.argsz < sizeof(info) + caps.size) {
>>> + info.argsz = sizeof(info) + caps.size;
>>> + info.cap_offset = 0;
>>> + } else {
>>> + vfio_info_cap_shift(&caps, sizeof(info));
>>> + if (copy_to_user((void __user *)arg +
>>> + sizeof(info), caps.buf,
>>> + caps.size)) {
>>> + kfree(msi_info);
>>> + kfree(caps.buf);
>>> + return -EFAULT;
>>> + }
>>> + info.cap_offset = sizeof(info);
>>> + }
>>> +
>>> + kfree(msi_info);
>>> + kfree(caps.buf);
>>> + }
>>> +
>>> return copy_to_user((void __user *)arg, &info, minsz) ?
>>> -EFAULT : 0;
>>>
>>> @@ -366,6 +454,7 @@ static long vfio_platform_ioctl(void *device_data,
>>> struct vfio_irq_set hdr;
>>> u8 *data = NULL;
>>> int ret = 0;
>>> + int max;
>>> size_t data_size = 0;
>>>
>>> minsz = offsetofend(struct vfio_irq_set, count);
>>> @@ -373,8 +462,14 @@ static long vfio_platform_ioctl(void *device_data,
>>> if (copy_from_user(&hdr, (void __user *)arg, minsz))
>>> return -EFAULT;
>>>
>>> - ret = vfio_set_irqs_validate_and_prepare(&hdr, vdev->num_irqs,
>>> - vdev->num_irqs, &data_size);
>>> + if (hdr.index >= vdev->num_irqs)
>>> + return -EINVAL;
>>> +
>>> + max = vdev->irqs[hdr.index].count;
>>> +
>>> + ret = vfio_set_irqs_validate_and_prepare(&hdr, max,
>>> + vdev->num_irqs,
>>> + &data_size);
>>> if (ret)
>>> return ret;
>>>
>>> diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
>>> index c5b09ec0a3c9..4066223e5b2e 100644
>>> --- a/drivers/vfio/platform/vfio_platform_irq.c
>>> +++ b/drivers/vfio/platform/vfio_platform_irq.c
>>> @@ -8,10 +8,12 @@
>>>
>>> #include <linux/eventfd.h>
>>> #include <linux/interrupt.h>
>>> +#include <linux/eventfd.h>
>>> #include <linux/slab.h>
>>> #include <linux/types.h>
>>> #include <linux/vfio.h>
>>> #include <linux/irq.h>
>>> +#include <linux/msi.h>
>>>
>>> #include "vfio_platform_private.h"
>>>
>>> @@ -253,6 +255,195 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
>>> return 0;
>>> }
>>>
>>> +/* MSI/MSIX */
>>> +static irqreturn_t vfio_msihandler(int irq, void *arg)
>>> +{
>>> + struct eventfd_ctx *trigger = arg;
>>> +
>>> + eventfd_signal(trigger, 1);
>>> + return IRQ_HANDLED;
>>> +}
>>> +
>>> +static void msi_write(struct msi_desc *desc, struct msi_msg *msg)
>>> +{
>>> + int i;
>>> + struct vfio_platform_irq *irq;
>>> + u16 index = desc->platform.msi_index;
>>> + struct device *dev = msi_desc_to_dev(desc);
>>> + struct vfio_device *device = dev_get_drvdata(dev);
>>> + struct vfio_platform_device *vdev = (struct vfio_platform_device *)
>>> + vfio_device_data(device);
>>> +
>>> + for (i = 0; i < vdev->num_irqs; i++)
>>> + if (vdev->irqs[i].type == VFIO_IRQ_TYPE_MSI)
>>> + irq = &vdev->irqs[i];
>>> +
>>> + irq->ctx[index].msg.address_lo = msg->address_lo;
>>> + irq->ctx[index].msg.address_hi = msg->address_hi;
>>> + irq->ctx[index].msg.data = msg->data;
>>> +}
>>> +
>>> +static int vfio_msi_enable(struct vfio_platform_device *vdev,
>>> + struct vfio_platform_irq *irq, int nvec)
>>> +{
>>> + int ret;
>>> + int msi_idx = 0;
>>> + struct msi_desc *desc;
>>> + struct device *dev = vdev->device;
>>> +
>>> + irq->ctx = kcalloc(nvec, sizeof(struct vfio_irq_ctx), GFP_KERNEL);
>>> + if (!irq->ctx)
>>> + return -ENOMEM;
>>> +
>>> + /* Allocate platform MSIs */
>>> + ret = platform_msi_domain_alloc_irqs(dev, nvec, msi_write);
>>> + if (ret < 0) {
>>> + kfree(irq->ctx);
>>> + return ret;
>>> + }
>>> +
>>> + for_each_msi_entry(desc, dev) {
>>> + irq->ctx[msi_idx].hwirq = desc->irq;
>>> + msi_idx++;
>>> + }
>>> +
>>> + irq->num_ctx = nvec;
>>> + irq->config_msi = 1;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int vfio_msi_set_vector_signal(struct vfio_platform_irq *irq,
>>> + int vector, int fd)
>>> +{
>>> + struct eventfd_ctx *trigger;
>>> + int irq_num, ret;
>>> +
>>> + if (vector < 0 || vector >= irq->num_ctx)
>>> + return -EINVAL;
>>> +
>>> + irq_num = irq->ctx[vector].hwirq;
>>> +
>>> + if (irq->ctx[vector].trigger) {
>>> + free_irq(irq_num, irq->ctx[vector].trigger);
>>> + kfree(irq->ctx[vector].name);
>>> + eventfd_ctx_put(irq->ctx[vector].trigger);
>>> + irq->ctx[vector].trigger = NULL;
>>> + }
>>> +
>>> + if (fd < 0)
>>> + return 0;
>>> +
>>> + irq->ctx[vector].name = kasprintf(GFP_KERNEL,
>>> + "vfio-msi[%d]", vector);
>>> + if (!irq->ctx[vector].name)
>>> + return -ENOMEM;
>>> +
>>> + trigger = eventfd_ctx_fdget(fd);
>>> + if (IS_ERR(trigger)) {
>>> + kfree(irq->ctx[vector].name);
>>> + return PTR_ERR(trigger);
>>> + }
>>> +
>>> + ret = request_irq(irq_num, vfio_msihandler, 0,
>>> + irq->ctx[vector].name, trigger);
>>> + if (ret) {
>>> + kfree(irq->ctx[vector].name);
>>> + eventfd_ctx_put(trigger);
>>> + return ret;
>>> + }
>>> +
>>> + irq->ctx[vector].trigger = trigger;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int vfio_msi_set_block(struct vfio_platform_irq *irq, unsigned int start,
>>> + unsigned int count, int32_t *fds)
>>> +{
>>> + int i, j, ret = 0;
>>> +
>>> + if (start >= irq->num_ctx || start + count > irq->num_ctx)
>>> + return -EINVAL;
>>> +
>>> + for (i = 0, j = start; i < count && !ret; i++, j++) {
>>> + int fd = fds ? fds[i] : -1;
>>> +
>>> + ret = vfio_msi_set_vector_signal(irq, j, fd);
>>> + }
>>> +
>>> + if (ret) {
>>> + for (--j; j >= (int)start; j--)
>>> + vfio_msi_set_vector_signal(irq, j, -1);
>>> + }
>>> +
>>> + return ret;
>>> +}
>>> +
>>> +static void vfio_msi_disable(struct vfio_platform_device *vdev,
>>> + struct vfio_platform_irq *irq)
>>> +{
>>> + struct device *dev = vdev->device;
>>> +
>>> + vfio_msi_set_block(irq, 0, irq->num_ctx, NULL);
>>> +
>>> + platform_msi_domain_free_irqs(dev);
>>> +
>>> + irq->config_msi = 0;
>>> + irq->num_ctx = 0;
>>> +
>>> + kfree(irq->ctx);
>>> +}
>>> +
>>> +static int vfio_set_msi_trigger(struct vfio_platform_device *vdev,
>>> + unsigned int index, unsigned int start,
>>> + unsigned int count, uint32_t flags, void *data)
>>> +{
>>> + int i;
>>> + struct vfio_platform_irq *irq = &vdev->irqs[index];
>>> +
>>> + if (start + count > irq->count)
>>> + return -EINVAL;
>>> +
>>> + if (!count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
>>> + vfio_msi_disable(vdev, irq);
>>> + return 0;
>>> + }
>>> +
>>> + if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
>>> + s32 *fds = data;
>>> + int ret;
>>> +
>>> + if (irq->config_msi)
>>> + return vfio_msi_set_block(irq, start, count,
>>> + fds);
>>> + ret = vfio_msi_enable(vdev, irq, start + count);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + ret = vfio_msi_set_block(irq, start, count, fds);
>>> + if (ret)
>>> + vfio_msi_disable(vdev, irq);
>>> +
>>> + return ret;
>>> + }
>>> +
>>> + for (i = start; i < start + count; i++) {
>>> + if (!irq->ctx[i].trigger)
>>> + continue;
>>> + if (flags & VFIO_IRQ_SET_DATA_NONE) {
>>> + eventfd_signal(irq->ctx[i].trigger, 1);
>>> + } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
>>> + u8 *bools = data;
>>> +
>>> + if (bools[i - start])
>>> + eventfd_signal(irq->ctx[i].trigger, 1);
>>> + }
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
>>> uint32_t flags, unsigned index, unsigned start,
>>> unsigned count, void *data)
>>> @@ -261,16 +452,29 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
>>> unsigned start, unsigned count, uint32_t flags,
>>> void *data) = NULL;
>>>
>>> - switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>>> - case VFIO_IRQ_SET_ACTION_MASK:
>>> - func = vfio_platform_set_irq_mask;
>>> - break;
>>> - case VFIO_IRQ_SET_ACTION_UNMASK:
>>> - func = vfio_platform_set_irq_unmask;
>>> - break;
>>> - case VFIO_IRQ_SET_ACTION_TRIGGER:
>>> - func = vfio_platform_set_irq_trigger;
>>> - break;
>>> + struct vfio_platform_irq *irq = &vdev->irqs[index];
>>> +
>>> + if (irq->type == VFIO_IRQ_TYPE_MSI) {
>>> + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>>> + case VFIO_IRQ_SET_ACTION_MASK:
>>> + case VFIO_IRQ_SET_ACTION_UNMASK:
>>> + break;
>>> + case VFIO_IRQ_SET_ACTION_TRIGGER:
>>> + func = vfio_set_msi_trigger;
>>> + break;
>>> + }
>>> + } else {
>>> + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>>> + case VFIO_IRQ_SET_ACTION_MASK:
>>> + func = vfio_platform_set_irq_mask;
>>> + break;
>>> + case VFIO_IRQ_SET_ACTION_UNMASK:
>>> + func = vfio_platform_set_irq_unmask;
>>> + break;
>>> + case VFIO_IRQ_SET_ACTION_TRIGGER:
>>> + func = vfio_platform_set_irq_trigger;
>>> + break;
>>> + }
>>> }
>>>
>>> if (!func)
>>> @@ -281,12 +485,21 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
>>>
>>> int vfio_platform_irq_init(struct vfio_platform_device *vdev)
>>> {
>>> - int cnt = 0, i;
>>> + int i;
>>> + int cnt = 0;
>>> + int num_irqs;
>>> + struct device *dev = vdev->device;
>>>
>>> while (vdev->get_irq(vdev, cnt) >= 0)
>>> cnt++;
>>>
>>> - vdev->irqs = kcalloc(cnt, sizeof(struct vfio_platform_irq), GFP_KERNEL);
>>> + num_irqs = cnt;
>>> +
>>> + if (dev->msi_domain)
>>> + num_irqs++;
>>> +
>>> + vdev->irqs = kcalloc(num_irqs, sizeof(struct vfio_platform_irq),
>>> + GFP_KERNEL);
>>> if (!vdev->irqs)
>>> return -ENOMEM;
>>>
>>> @@ -309,7 +522,19 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
>>> vdev->irqs[i].masked = false;
>>> }
>>>
>>> - vdev->num_irqs = cnt;
>>> + /*
>>> + * MSI block is added at last index and its an ext irq
>> it is
>>> + */
>>> + if (dev->msi_domain) {
>>> + vdev->irqs[i].flags = VFIO_IRQ_INFO_EVENTFD;
>>> + vdev->irqs[i].count = NR_IRQS;
>> why NR_IRQS?
> Since different devices can have different numbers of MSI(s) so we
> need to initialize with max possible values. Can you please suggest if
> this value does not seem appropriate?
As opposed to PCIe, the userspace has no real way to guess how many
vectors can be set (what vfio_pci_get_irq_count does). This also means
we do not fully implement the original API as we are not able to report
an accurate value for .count. How will the user determine how many
vectors he can use?
>>> + vdev->irqs[i].hwirq = 0;
>>> + vdev->irqs[i].masked = false;
>>> + vdev->irqs[i].type = VFIO_IRQ_TYPE_MSI;
>>> + vdev->num_ext_irqs = 1;
>>> + }
>>> +
>>> + vdev->num_irqs = num_irqs;
>>>
>>> return 0;
>>> err:
>>> @@ -321,8 +546,13 @@ void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
>>> {
>>> int i;
>>>
>>> - for (i = 0; i < vdev->num_irqs; i++)
>>> - vfio_set_trigger(vdev, i, -1, NULL);
>>> + for (i = 0; i < vdev->num_irqs; i++) {
>>> + if (vdev->irqs[i].type == VFIO_IRQ_TYPE_MSI)
>>> + vfio_set_msi_trigger(vdev, i, 0, 0,
>>> + VFIO_IRQ_SET_DATA_NONE, NULL);
>>> + else
>>> + vfio_set_trigger(vdev, i, -1, NULL);
>>> + }
>>>
>>> vdev->num_irqs = 0;
>>> kfree(vdev->irqs);
>>> diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
>>> index 289089910643..7bbb05988705 100644
>>> --- a/drivers/vfio/platform/vfio_platform_private.h
>>> +++ b/drivers/vfio/platform/vfio_platform_private.h
>>> @@ -9,6 +9,7 @@
>>>
>>> #include <linux/types.h>
>>> #include <linux/interrupt.h>
>>> +#include <linux/msi.h>
>>>
>>> #define VFIO_PLATFORM_OFFSET_SHIFT 40
>>> #define VFIO_PLATFORM_OFFSET_MASK (((u64)(1) << VFIO_PLATFORM_OFFSET_SHIFT) - 1)
>>> @@ -19,9 +20,18 @@
>>> #define VFIO_PLATFORM_INDEX_TO_OFFSET(index) \
>>> ((u64)(index) << VFIO_PLATFORM_OFFSET_SHIFT)
>>>
>>> +struct vfio_irq_ctx {
>>> + int hwirq;
>>> + char *name;
>>> + struct msi_msg msg;
>>> + struct eventfd_ctx *trigger;
>>> +};
>>> +
>>> struct vfio_platform_irq {
>>> u32 flags;
>>> u32 count;
>>> + int num_ctx;
>>> + struct vfio_irq_ctx *ctx;
>>> int hwirq;
>>> char *name;
>>> struct eventfd_ctx *trigger;
>>> @@ -29,6 +39,11 @@ struct vfio_platform_irq {
>>> spinlock_t lock;
>>> struct virqfd *unmask;
>>> struct virqfd *mask;
>>> +
>>> + /* for extended irqs */
>>> + u32 type;
>>> + u32 subtype;
>>> + int config_msi;
>>> };
>>>
>>> struct vfio_platform_region {
>>> @@ -46,6 +61,7 @@ struct vfio_platform_device {
>>> u32 num_regions;
>>> struct vfio_platform_irq *irqs;
>>> u32 num_irqs;
>>> + int num_ext_irqs;
>>> int refcnt;
>>> struct mutex igate;
>>> struct module *parent_module;
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index 2f313a238a8f..598d1c944283 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -697,11 +697,54 @@ struct vfio_irq_info {
>>> #define VFIO_IRQ_INFO_MASKABLE (1 << 1)
>>> #define VFIO_IRQ_INFO_AUTOMASKED (1 << 2)
>>> #define VFIO_IRQ_INFO_NORESIZE (1 << 3)
>>> +#define VFIO_IRQ_INFO_FLAG_CAPS (1 << 4) /* Info supports caps */
>>> __u32 index; /* IRQ index */
>>> __u32 count; /* Number of IRQs within this index */
>>> + __u32 cap_offset; /* Offset within info struct of first cap */
>>> };
>>> #define VFIO_DEVICE_GET_IRQ_INFO _IO(VFIO_TYPE, VFIO_BASE + 9)
>>>
>>> +/*
>>> + * The irq type capability allows IRQs unique to a specific device or
>>> + * class of devices to be exposed.
>>> + *
>>> + * The structures below define version 1 of this capability.
>>> + */
>>> +#define VFIO_IRQ_INFO_CAP_TYPE 3
>>> +
>>> +struct vfio_irq_info_cap_type {
>>> + struct vfio_info_cap_header header;
>>> + __u32 type; /* global per bus driver */
>>> + __u32 subtype; /* type specific */
>>> +};
>>> +
>>> +/*
>>> + * List of IRQ types, global per bus driver.
>>> + * If you introduce a new type, please add it here.
>>> + */
>>> +
>>> +/* Non PCI devices having MSI(s) support */
>>> +#define VFIO_IRQ_TYPE_MSI (1)
>>> +
>>> +/*
>>> + * The msi capability allows the user to use the msi msg to
>>> + * configure the iova for the msi configuration.
>>> + * The structures below define version 1 of this capability.
>>> + */
>>> +#define VFIO_IRQ_INFO_CAP_MSI_DESCS 4
>>> +
>>> +struct vfio_irq_msi_msg {
>>> + __u32 addr_lo;
>>> + __u32 addr_hi;
>>> + __u32 data;
>>> +};
>>> +
>>> +struct vfio_irq_info_cap_msi {
>>> + struct vfio_info_cap_header header;
>>> + __u32 nr_msgs;
>> I think you should align a __u32 reserved field to have a 64b alignment
> Sure.
>>> + struct vfio_irq_msi_msg msgs[];
>> Please can you clarify why this cap is needed versus your prior approach.
> In the previous patch, the reset module was configuring the device
> with MSI msg/data now since the module is not available, user space
> needs to have this data.
> With this approach userspace just needs the pairs <msg and ctx >
> associated with the MSI(s) and it can choose to configure the MSI(s)
> sources accordingly.
> Let me know if this approach does not look appropriate.
This comes to the question I asked in my previous email, ie. could you
give some more info about the expected MSI setup sequence? May be the
opportunity to enhance the commit message ;-)

Thanks

Eric
>
> Thanks,
> Vikas
>>> +};
>>> +
>>> /**
>>> * VFIO_DEVICE_SET_IRQS - _IOW(VFIO_TYPE, VFIO_BASE + 10, struct vfio_irq_set)
>>> *
>>>
>> Thanks
>>
>> Eric
>>

2020-12-10 07:46:52

by Vikas Gupta

[permalink] [raw]
Subject: Re: [RFC v2 1/1] vfio/platform: add support for msi

HI Eric,

On Tue, Dec 8, 2020 at 2:13 AM Auger Eric <[email protected]> wrote:
>
> Hi Vikas,
>
> On 12/3/20 3:50 PM, Vikas Gupta wrote:
> > Hi Eric,
> >
> > On Wed, Dec 2, 2020 at 8:14 PM Auger Eric <[email protected]> wrote:
> >>
> >> Hi Vikas,
> >>
> >> On 11/24/20 5:16 PM, Vikas Gupta wrote:
> >>> MSI support for platform devices.
> >>>
> >>> Signed-off-by: Vikas Gupta <[email protected]>
> >>> ---
> >>> drivers/vfio/platform/vfio_platform_common.c | 99 ++++++-
> >>> drivers/vfio/platform/vfio_platform_irq.c | 260 +++++++++++++++++-
> >>> drivers/vfio/platform/vfio_platform_private.h | 16 ++
> >>> include/uapi/linux/vfio.h | 43 +++
> >>> 4 files changed, 401 insertions(+), 17 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
> >>> index c0771a9567fb..b0bfc0f4ee1f 100644
> >>> --- a/drivers/vfio/platform/vfio_platform_common.c
> >>> +++ b/drivers/vfio/platform/vfio_platform_common.c
> >>> @@ -16,6 +16,7 @@
> >>> #include <linux/types.h>
> >>> #include <linux/uaccess.h>
> >>> #include <linux/vfio.h>
> >>> +#include <linux/nospec.h>
> >>>
> >>> #include "vfio_platform_private.h"
> >>>
> >>> @@ -344,9 +345,16 @@ static long vfio_platform_ioctl(void *device_data,
> >>>
> >>> } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
> >>> struct vfio_irq_info info;
> >>> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> >>> + struct vfio_irq_info_cap_msi *msi_info = NULL;
> >>> + unsigned long capsz;
> >>> + int ext_irq_index = vdev->num_irqs - vdev->num_ext_irqs;
> >>>
> >>> minsz = offsetofend(struct vfio_irq_info, count);
> >>>
> >>> + /* For backward compatibility, cannot require this */
> >>> + capsz = offsetofend(struct vfio_irq_info, cap_offset);
> >>> +
> >>> if (copy_from_user(&info, (void __user *)arg, minsz))
> >>> return -EFAULT;
> >>>
> >>> @@ -356,9 +364,89 @@ static long vfio_platform_ioctl(void *device_data,
> >>> if (info.index >= vdev->num_irqs)
> >>> return -EINVAL;
> >>>
> >>> + if (info.argsz >= capsz)
> >>> + minsz = capsz;
> >>> +
> >>> + if (info.index == ext_irq_index) {
> >> nit: n case we add new ext indices afterwards, I would check info.index
> >> - ext_irq_index against an VFIO_EXT_IRQ_MSI define.
> >>> + struct vfio_irq_info_cap_type cap_type = {
> >>> + .header.id = VFIO_IRQ_INFO_CAP_TYPE,
> >>> + .header.version = 1 };
> >>> + int i;
> >>> + int ret;
> >>> + int num_msgs;
> >>> + size_t msi_info_size;
> >>> + struct vfio_platform_irq *irq;
> >> nit: I think generally the opposite order (length) is chosen. This also
> >> would better match the existing style in this file
> > Ok will fix it
> >>> +
> >>> + info.index = array_index_nospec(info.index,
> >>> + vdev->num_irqs);
> >>> +
> >>> + irq = &vdev->irqs[info.index];
> >>> +
> >>> + info.flags = irq->flags;
> >> I think this can be removed given [*]
> > Sure.
> >>> + cap_type.type = irq->type;
> >>> + cap_type.subtype = irq->subtype;
> >>> +
> >>> + ret = vfio_info_add_capability(&caps,
> >>> + &cap_type.header,
> >>> + sizeof(cap_type));
> >>> + if (ret)
> >>> + return ret;
> >>> +
> >>> + num_msgs = irq->num_ctx;
> >> do would want to return the cap even if !num_ctx?
> > If num_ctx = 0 then VFIO_IRQ_INFO_CAP_MSI_DESCS should not be written.
> > I`ll take care of same.
> >>> +
> >>> + msi_info_size = struct_size(msi_info, msgs, num_msgs);
> >>> +
> >>> + msi_info = kzalloc(msi_info_size, GFP_KERNEL);
> >>> + if (!msi_info) {
> >>> + kfree(caps.buf);
> >>> + return -ENOMEM;
> >>> + }
> >>> +
> >>> + msi_info->header.id = VFIO_IRQ_INFO_CAP_MSI_DESCS;
> >>> + msi_info->header.version = 1;
> >>> + msi_info->nr_msgs = num_msgs;
> >>> +
> >>> + for (i = 0; i < num_msgs; i++) {
> >>> + struct vfio_irq_ctx *ctx = &irq->ctx[i];
> >>> +
> >>> + msi_info->msgs[i].addr_lo = ctx->msg.address_lo;
> >>> + msi_info->msgs[i].addr_hi = ctx->msg.address_hi;
> >>> + msi_info->msgs[i].data = ctx->msg.data;
> >>> + }
> >>> +
> >>> + ret = vfio_info_add_capability(&caps, &msi_info->header,
> >>> + msi_info_size);
> >>> + if (ret) {
> >>> + kfree(msi_info);
> >>> + kfree(caps.buf);
> >>> + return ret;
> >>> + }
> >>> + }
> >>> +
> >>> info.flags = vdev->irqs[info.index].flags;
> >> [*]
> > Will fix it.
> >>> info.count = vdev->irqs[info.index].count;
> >>>
> >>> + if (caps.size) {
> >>> + info.flags |= VFIO_IRQ_INFO_FLAG_CAPS;
> >>> + if (info.argsz < sizeof(info) + caps.size) {
> >>> + info.argsz = sizeof(info) + caps.size;
> >>> + info.cap_offset = 0;
> >>> + } else {
> >>> + vfio_info_cap_shift(&caps, sizeof(info));
> >>> + if (copy_to_user((void __user *)arg +
> >>> + sizeof(info), caps.buf,
> >>> + caps.size)) {
> >>> + kfree(msi_info);
> >>> + kfree(caps.buf);
> >>> + return -EFAULT;
> >>> + }
> >>> + info.cap_offset = sizeof(info);
> >>> + }
> >>> +
> >>> + kfree(msi_info);
> >>> + kfree(caps.buf);
> >>> + }
> >>> +
> >>> return copy_to_user((void __user *)arg, &info, minsz) ?
> >>> -EFAULT : 0;
> >>>
> >>> @@ -366,6 +454,7 @@ static long vfio_platform_ioctl(void *device_data,
> >>> struct vfio_irq_set hdr;
> >>> u8 *data = NULL;
> >>> int ret = 0;
> >>> + int max;
> >>> size_t data_size = 0;
> >>>
> >>> minsz = offsetofend(struct vfio_irq_set, count);
> >>> @@ -373,8 +462,14 @@ static long vfio_platform_ioctl(void *device_data,
> >>> if (copy_from_user(&hdr, (void __user *)arg, minsz))
> >>> return -EFAULT;
> >>>
> >>> - ret = vfio_set_irqs_validate_and_prepare(&hdr, vdev->num_irqs,
> >>> - vdev->num_irqs, &data_size);
> >>> + if (hdr.index >= vdev->num_irqs)
> >>> + return -EINVAL;
> >>> +
> >>> + max = vdev->irqs[hdr.index].count;
> >>> +
> >>> + ret = vfio_set_irqs_validate_and_prepare(&hdr, max,
> >>> + vdev->num_irqs,
> >>> + &data_size);
> >>> if (ret)
> >>> return ret;
> >>>
> >>> diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
> >>> index c5b09ec0a3c9..4066223e5b2e 100644
> >>> --- a/drivers/vfio/platform/vfio_platform_irq.c
> >>> +++ b/drivers/vfio/platform/vfio_platform_irq.c
> >>> @@ -8,10 +8,12 @@
> >>>
> >>> #include <linux/eventfd.h>
> >>> #include <linux/interrupt.h>
> >>> +#include <linux/eventfd.h>
> >>> #include <linux/slab.h>
> >>> #include <linux/types.h>
> >>> #include <linux/vfio.h>
> >>> #include <linux/irq.h>
> >>> +#include <linux/msi.h>
> >>>
> >>> #include "vfio_platform_private.h"
> >>>
> >>> @@ -253,6 +255,195 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
> >>> return 0;
> >>> }
> >>>
> >>> +/* MSI/MSIX */
> >>> +static irqreturn_t vfio_msihandler(int irq, void *arg)
> >>> +{
> >>> + struct eventfd_ctx *trigger = arg;
> >>> +
> >>> + eventfd_signal(trigger, 1);
> >>> + return IRQ_HANDLED;
> >>> +}
> >>> +
> >>> +static void msi_write(struct msi_desc *desc, struct msi_msg *msg)
> >>> +{
> >>> + int i;
> >>> + struct vfio_platform_irq *irq;
> >>> + u16 index = desc->platform.msi_index;
> >>> + struct device *dev = msi_desc_to_dev(desc);
> >>> + struct vfio_device *device = dev_get_drvdata(dev);
> >>> + struct vfio_platform_device *vdev = (struct vfio_platform_device *)
> >>> + vfio_device_data(device);
> >>> +
> >>> + for (i = 0; i < vdev->num_irqs; i++)
> >>> + if (vdev->irqs[i].type == VFIO_IRQ_TYPE_MSI)
> >>> + irq = &vdev->irqs[i];
> >>> +
> >>> + irq->ctx[index].msg.address_lo = msg->address_lo;
> >>> + irq->ctx[index].msg.address_hi = msg->address_hi;
> >>> + irq->ctx[index].msg.data = msg->data;
> >>> +}
> >>> +
> >>> +static int vfio_msi_enable(struct vfio_platform_device *vdev,
> >>> + struct vfio_platform_irq *irq, int nvec)
> >>> +{
> >>> + int ret;
> >>> + int msi_idx = 0;
> >>> + struct msi_desc *desc;
> >>> + struct device *dev = vdev->device;
> >>> +
> >>> + irq->ctx = kcalloc(nvec, sizeof(struct vfio_irq_ctx), GFP_KERNEL);
> >>> + if (!irq->ctx)
> >>> + return -ENOMEM;
> >>> +
> >>> + /* Allocate platform MSIs */
> >>> + ret = platform_msi_domain_alloc_irqs(dev, nvec, msi_write);
> >>> + if (ret < 0) {
> >>> + kfree(irq->ctx);
> >>> + return ret;
> >>> + }
> >>> +
> >>> + for_each_msi_entry(desc, dev) {
> >>> + irq->ctx[msi_idx].hwirq = desc->irq;
> >>> + msi_idx++;
> >>> + }
> >>> +
> >>> + irq->num_ctx = nvec;
> >>> + irq->config_msi = 1;
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static int vfio_msi_set_vector_signal(struct vfio_platform_irq *irq,
> >>> + int vector, int fd)
> >>> +{
> >>> + struct eventfd_ctx *trigger;
> >>> + int irq_num, ret;
> >>> +
> >>> + if (vector < 0 || vector >= irq->num_ctx)
> >>> + return -EINVAL;
> >>> +
> >>> + irq_num = irq->ctx[vector].hwirq;
> >>> +
> >>> + if (irq->ctx[vector].trigger) {
> >>> + free_irq(irq_num, irq->ctx[vector].trigger);
> >>> + kfree(irq->ctx[vector].name);
> >>> + eventfd_ctx_put(irq->ctx[vector].trigger);
> >>> + irq->ctx[vector].trigger = NULL;
> >>> + }
> >>> +
> >>> + if (fd < 0)
> >>> + return 0;
> >>> +
> >>> + irq->ctx[vector].name = kasprintf(GFP_KERNEL,
> >>> + "vfio-msi[%d]", vector);
> >>> + if (!irq->ctx[vector].name)
> >>> + return -ENOMEM;
> >>> +
> >>> + trigger = eventfd_ctx_fdget(fd);
> >>> + if (IS_ERR(trigger)) {
> >>> + kfree(irq->ctx[vector].name);
> >>> + return PTR_ERR(trigger);
> >>> + }
> >>> +
> >>> + ret = request_irq(irq_num, vfio_msihandler, 0,
> >>> + irq->ctx[vector].name, trigger);
> >>> + if (ret) {
> >>> + kfree(irq->ctx[vector].name);
> >>> + eventfd_ctx_put(trigger);
> >>> + return ret;
> >>> + }
> >>> +
> >>> + irq->ctx[vector].trigger = trigger;
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static int vfio_msi_set_block(struct vfio_platform_irq *irq, unsigned int start,
> >>> + unsigned int count, int32_t *fds)
> >>> +{
> >>> + int i, j, ret = 0;
> >>> +
> >>> + if (start >= irq->num_ctx || start + count > irq->num_ctx)
> >>> + return -EINVAL;
> >>> +
> >>> + for (i = 0, j = start; i < count && !ret; i++, j++) {
> >>> + int fd = fds ? fds[i] : -1;
> >>> +
> >>> + ret = vfio_msi_set_vector_signal(irq, j, fd);
> >>> + }
> >>> +
> >>> + if (ret) {
> >>> + for (--j; j >= (int)start; j--)
> >>> + vfio_msi_set_vector_signal(irq, j, -1);
> >>> + }
> >>> +
> >>> + return ret;
> >>> +}
> >>> +
> >>> +static void vfio_msi_disable(struct vfio_platform_device *vdev,
> >>> + struct vfio_platform_irq *irq)
> >>> +{
> >>> + struct device *dev = vdev->device;
> >>> +
> >>> + vfio_msi_set_block(irq, 0, irq->num_ctx, NULL);
> >>> +
> >>> + platform_msi_domain_free_irqs(dev);
> >>> +
> >>> + irq->config_msi = 0;
> >>> + irq->num_ctx = 0;
> >>> +
> >>> + kfree(irq->ctx);
> >>> +}
> >>> +
> >>> +static int vfio_set_msi_trigger(struct vfio_platform_device *vdev,
> >>> + unsigned int index, unsigned int start,
> >>> + unsigned int count, uint32_t flags, void *data)
> >>> +{
> >>> + int i;
> >>> + struct vfio_platform_irq *irq = &vdev->irqs[index];
> >>> +
> >>> + if (start + count > irq->count)
> >>> + return -EINVAL;
> >>> +
> >>> + if (!count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
> >>> + vfio_msi_disable(vdev, irq);
> >>> + return 0;
> >>> + }
> >>> +
> >>> + if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
> >>> + s32 *fds = data;
> >>> + int ret;
> >>> +
> >>> + if (irq->config_msi)
> >>> + return vfio_msi_set_block(irq, start, count,
> >>> + fds);
> >>> + ret = vfio_msi_enable(vdev, irq, start + count);
> >>> + if (ret)
> >>> + return ret;
> >>> +
> >>> + ret = vfio_msi_set_block(irq, start, count, fds);
> >>> + if (ret)
> >>> + vfio_msi_disable(vdev, irq);
> >>> +
> >>> + return ret;
> >>> + }
> >>> +
> >>> + for (i = start; i < start + count; i++) {
> >>> + if (!irq->ctx[i].trigger)
> >>> + continue;
> >>> + if (flags & VFIO_IRQ_SET_DATA_NONE) {
> >>> + eventfd_signal(irq->ctx[i].trigger, 1);
> >>> + } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
> >>> + u8 *bools = data;
> >>> +
> >>> + if (bools[i - start])
> >>> + eventfd_signal(irq->ctx[i].trigger, 1);
> >>> + }
> >>> + }
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
> >>> uint32_t flags, unsigned index, unsigned start,
> >>> unsigned count, void *data)
> >>> @@ -261,16 +452,29 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
> >>> unsigned start, unsigned count, uint32_t flags,
> >>> void *data) = NULL;
> >>>
> >>> - switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> >>> - case VFIO_IRQ_SET_ACTION_MASK:
> >>> - func = vfio_platform_set_irq_mask;
> >>> - break;
> >>> - case VFIO_IRQ_SET_ACTION_UNMASK:
> >>> - func = vfio_platform_set_irq_unmask;
> >>> - break;
> >>> - case VFIO_IRQ_SET_ACTION_TRIGGER:
> >>> - func = vfio_platform_set_irq_trigger;
> >>> - break;
> >>> + struct vfio_platform_irq *irq = &vdev->irqs[index];
> >>> +
> >>> + if (irq->type == VFIO_IRQ_TYPE_MSI) {
> >>> + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> >>> + case VFIO_IRQ_SET_ACTION_MASK:
> >>> + case VFIO_IRQ_SET_ACTION_UNMASK:
> >>> + break;
> >>> + case VFIO_IRQ_SET_ACTION_TRIGGER:
> >>> + func = vfio_set_msi_trigger;
> >>> + break;
> >>> + }
> >>> + } else {
> >>> + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> >>> + case VFIO_IRQ_SET_ACTION_MASK:
> >>> + func = vfio_platform_set_irq_mask;
> >>> + break;
> >>> + case VFIO_IRQ_SET_ACTION_UNMASK:
> >>> + func = vfio_platform_set_irq_unmask;
> >>> + break;
> >>> + case VFIO_IRQ_SET_ACTION_TRIGGER:
> >>> + func = vfio_platform_set_irq_trigger;
> >>> + break;
> >>> + }
> >>> }
> >>>
> >>> if (!func)
> >>> @@ -281,12 +485,21 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
> >>>
> >>> int vfio_platform_irq_init(struct vfio_platform_device *vdev)
> >>> {
> >>> - int cnt = 0, i;
> >>> + int i;
> >>> + int cnt = 0;
> >>> + int num_irqs;
> >>> + struct device *dev = vdev->device;
> >>>
> >>> while (vdev->get_irq(vdev, cnt) >= 0)
> >>> cnt++;
> >>>
> >>> - vdev->irqs = kcalloc(cnt, sizeof(struct vfio_platform_irq), GFP_KERNEL);
> >>> + num_irqs = cnt;
> >>> +
> >>> + if (dev->msi_domain)
> >>> + num_irqs++;
> >>> +
> >>> + vdev->irqs = kcalloc(num_irqs, sizeof(struct vfio_platform_irq),
> >>> + GFP_KERNEL);
> >>> if (!vdev->irqs)
> >>> return -ENOMEM;
> >>>
> >>> @@ -309,7 +522,19 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
> >>> vdev->irqs[i].masked = false;
> >>> }
> >>>
> >>> - vdev->num_irqs = cnt;
> >>> + /*
> >>> + * MSI block is added at last index and its an ext irq
> >> it is
> >>> + */
> >>> + if (dev->msi_domain) {
> >>> + vdev->irqs[i].flags = VFIO_IRQ_INFO_EVENTFD;
> >>> + vdev->irqs[i].count = NR_IRQS;
> >> why NR_IRQS?
> > Since different devices can have different numbers of MSI(s) so we
> > need to initialize with max possible values. Can you please suggest if
> > this value does not seem appropriate?
> As opposed to PCIe, the userspace has no real way to guess how many
> vectors can be set (what vfio_pci_get_irq_count does). This also means
> we do not fully implement the original API as we are not able to report
> an accurate value for .count. How will the user determine how many
> vectors he can use?

I believe user space will know how many MSIs platform device can have.
We have assigned NR_IRQS because that’s what maximum number of
interrupt can be supported on a specific platform. In case user space
errantly tries to allocate any number of MSIs in kernel
platform_msi_domain_alloc_irqs should fail.
Do you think this approach can create an issue as .count is not
exactly the same as MSIs are existing with device? Is it necessary to
have a PCIe kind of approach as PCIe is standard but platform devices
do not have standards?

> >>> + vdev->irqs[i].hwirq = 0;
> >>> + vdev->irqs[i].masked = false;
> >>> + vdev->irqs[i].type = VFIO_IRQ_TYPE_MSI;
> >>> + vdev->num_ext_irqs = 1;
> >>> + }
> >>> +
> >>> + vdev->num_irqs = num_irqs;
> >>>
> >>> return 0;
> >>> err:
> >>> @@ -321,8 +546,13 @@ void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
> >>> {
> >>> int i;
> >>>
> >>> - for (i = 0; i < vdev->num_irqs; i++)
> >>> - vfio_set_trigger(vdev, i, -1, NULL);
> >>> + for (i = 0; i < vdev->num_irqs; i++) {
> >>> + if (vdev->irqs[i].type == VFIO_IRQ_TYPE_MSI)
> >>> + vfio_set_msi_trigger(vdev, i, 0, 0,
> >>> + VFIO_IRQ_SET_DATA_NONE, NULL);
> >>> + else
> >>> + vfio_set_trigger(vdev, i, -1, NULL);
> >>> + }
> >>>
> >>> vdev->num_irqs = 0;
> >>> kfree(vdev->irqs);
> >>> diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
> >>> index 289089910643..7bbb05988705 100644
> >>> --- a/drivers/vfio/platform/vfio_platform_private.h
> >>> +++ b/drivers/vfio/platform/vfio_platform_private.h
> >>> @@ -9,6 +9,7 @@
> >>>
> >>> #include <linux/types.h>
> >>> #include <linux/interrupt.h>
> >>> +#include <linux/msi.h>
> >>>
> >>> #define VFIO_PLATFORM_OFFSET_SHIFT 40
> >>> #define VFIO_PLATFORM_OFFSET_MASK (((u64)(1) << VFIO_PLATFORM_OFFSET_SHIFT) - 1)
> >>> @@ -19,9 +20,18 @@
> >>> #define VFIO_PLATFORM_INDEX_TO_OFFSET(index) \
> >>> ((u64)(index) << VFIO_PLATFORM_OFFSET_SHIFT)
> >>>
> >>> +struct vfio_irq_ctx {
> >>> + int hwirq;
> >>> + char *name;
> >>> + struct msi_msg msg;
> >>> + struct eventfd_ctx *trigger;
> >>> +};
> >>> +
> >>> struct vfio_platform_irq {
> >>> u32 flags;
> >>> u32 count;
> >>> + int num_ctx;
> >>> + struct vfio_irq_ctx *ctx;
> >>> int hwirq;
> >>> char *name;
> >>> struct eventfd_ctx *trigger;
> >>> @@ -29,6 +39,11 @@ struct vfio_platform_irq {
> >>> spinlock_t lock;
> >>> struct virqfd *unmask;
> >>> struct virqfd *mask;
> >>> +
> >>> + /* for extended irqs */
> >>> + u32 type;
> >>> + u32 subtype;
> >>> + int config_msi;
> >>> };
> >>>
> >>> struct vfio_platform_region {
> >>> @@ -46,6 +61,7 @@ struct vfio_platform_device {
> >>> u32 num_regions;
> >>> struct vfio_platform_irq *irqs;
> >>> u32 num_irqs;
> >>> + int num_ext_irqs;
> >>> int refcnt;
> >>> struct mutex igate;
> >>> struct module *parent_module;
> >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>> index 2f313a238a8f..598d1c944283 100644
> >>> --- a/include/uapi/linux/vfio.h
> >>> +++ b/include/uapi/linux/vfio.h
> >>> @@ -697,11 +697,54 @@ struct vfio_irq_info {
> >>> #define VFIO_IRQ_INFO_MASKABLE (1 << 1)
> >>> #define VFIO_IRQ_INFO_AUTOMASKED (1 << 2)
> >>> #define VFIO_IRQ_INFO_NORESIZE (1 << 3)
> >>> +#define VFIO_IRQ_INFO_FLAG_CAPS (1 << 4) /* Info supports caps */
> >>> __u32 index; /* IRQ index */
> >>> __u32 count; /* Number of IRQs within this index */
> >>> + __u32 cap_offset; /* Offset within info struct of first cap */
> >>> };
> >>> #define VFIO_DEVICE_GET_IRQ_INFO _IO(VFIO_TYPE, VFIO_BASE + 9)
> >>>
> >>> +/*
> >>> + * The irq type capability allows IRQs unique to a specific device or
> >>> + * class of devices to be exposed.
> >>> + *
> >>> + * The structures below define version 1 of this capability.
> >>> + */
> >>> +#define VFIO_IRQ_INFO_CAP_TYPE 3
> >>> +
> >>> +struct vfio_irq_info_cap_type {
> >>> + struct vfio_info_cap_header header;
> >>> + __u32 type; /* global per bus driver */
> >>> + __u32 subtype; /* type specific */
> >>> +};
> >>> +
> >>> +/*
> >>> + * List of IRQ types, global per bus driver.
> >>> + * If you introduce a new type, please add it here.
> >>> + */
> >>> +
> >>> +/* Non PCI devices having MSI(s) support */
> >>> +#define VFIO_IRQ_TYPE_MSI (1)
> >>> +
> >>> +/*
> >>> + * The msi capability allows the user to use the msi msg to
> >>> + * configure the iova for the msi configuration.
> >>> + * The structures below define version 1 of this capability.
> >>> + */
> >>> +#define VFIO_IRQ_INFO_CAP_MSI_DESCS 4
> >>> +
> >>> +struct vfio_irq_msi_msg {
> >>> + __u32 addr_lo;
> >>> + __u32 addr_hi;
> >>> + __u32 data;
> >>> +};
> >>> +
> >>> +struct vfio_irq_info_cap_msi {
> >>> + struct vfio_info_cap_header header;
> >>> + __u32 nr_msgs;
> >> I think you should align a __u32 reserved field to have a 64b alignment
> > Sure.
> >>> + struct vfio_irq_msi_msg msgs[];
> >> Please can you clarify why this cap is needed versus your prior approach.
> > In the previous patch, the reset module was configuring the device
> > with MSI msg/data now since the module is not available, user space
> > needs to have this data.
> > With this approach userspace just needs the pairs <msg and ctx >
> > associated with the MSI(s) and it can choose to configure the MSI(s)
> > sources accordingly.
> > Let me know if this approach does not look appropriate.
> This comes to the question I asked in my previous email, ie. could you
> give some more info about the expected MSI setup sequence? May be the
> opportunity to enhance the commit message ;-)

With our proposal, user space needs to know how many MSI sources there
are as we want user space to map msi-msg(s) to specific MSI sources.
Let us assume there are max ‘n’ MSI sources a platform device have,
user-space requests ‘n’ msi(s) allocation in the kernel which in turn
gets ‘n’ msi_msg with help of cap (VFIO_IRQ_INFO_CAP_MSI_DESCS). User
space is free to map/configure msi_msg->data value to any particular
source. Since msi_msg->data is one to one mapped to eventfd,
user-space knows that particular eventfd corresponds to the same MSI
source which it has configured with the msi_msg->data value.

Steps:
a) User space allocates nvec in kernel. Ioctl VFIO_DEVICE_SET_IRQS
b) Kernel allocates nvec and does associated IRQ allocation, eventfd etc.
c) User space gets the msi_msgs VFIO_DEVICE_GET_IRQ_INFO.
d) User space configures the particular MSI source with msi_msg info.
Do these steps look OK?
We can add more information where this cap is introduced in the file
uapi/linux/vfio.h.

Thanks,
Vikas
>
> Thanks
>
> Eric
> >
> > Thanks,
> > Vikas
> >>> +};
> >>> +
> >>> /**
> >>> * VFIO_DEVICE_SET_IRQS - _IOW(VFIO_TYPE, VFIO_BASE + 10, struct vfio_irq_set)
> >>> *
> >>>
> >> Thanks
> >>
> >> Eric
> >>
>

--
This electronic communication and the information and any files transmitted
with it, or attached to it, are confidential and are intended solely for
the use of the individual or entity to whom it is addressed and may contain
information that is confidential, legally privileged, protected by privacy
laws, or otherwise restricted from disclosure to anyone else. If you are
not the intended recipient or the person responsible for delivering the
e-mail to the intended recipient, you are hereby notified that any use,
copying, distributing, dissemination, forwarding, printing, or copying of
this e-mail is strictly prohibited. If you received this e-mail in error,
please return the e-mail to the sender, delete it from your computer, and
destroy any printed copy of it.


Attachments:
smime.p7s (4.07 kB)
S/MIME Cryptographic Signature

2020-12-11 12:35:17

by Eric Auger

[permalink] [raw]
Subject: Re: [RFC v2 1/1] vfio/platform: add support for msi

Hi Vikas,

On 12/10/20 8:34 AM, Vikas Gupta wrote:
> HI Eric,
>
> On Tue, Dec 8, 2020 at 2:13 AM Auger Eric <[email protected]> wrote:
>>
>> Hi Vikas,
>>
>> On 12/3/20 3:50 PM, Vikas Gupta wrote:
>>> Hi Eric,
>>>
>>> On Wed, Dec 2, 2020 at 8:14 PM Auger Eric <[email protected]> wrote:
>>>>
>>>> Hi Vikas,
>>>>
>>>> On 11/24/20 5:16 PM, Vikas Gupta wrote:
>>>>> MSI support for platform devices.
>>>>>
>>>>> Signed-off-by: Vikas Gupta <[email protected]>
>>>>> ---
>>>>> drivers/vfio/platform/vfio_platform_common.c | 99 ++++++-
>>>>> drivers/vfio/platform/vfio_platform_irq.c | 260 +++++++++++++++++-
>>>>> drivers/vfio/platform/vfio_platform_private.h | 16 ++
>>>>> include/uapi/linux/vfio.h | 43 +++
>>>>> 4 files changed, 401 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
>>>>> index c0771a9567fb..b0bfc0f4ee1f 100644
>>>>> --- a/drivers/vfio/platform/vfio_platform_common.c
>>>>> +++ b/drivers/vfio/platform/vfio_platform_common.c
>>>>> @@ -16,6 +16,7 @@
>>>>> #include <linux/types.h>
>>>>> #include <linux/uaccess.h>
>>>>> #include <linux/vfio.h>
>>>>> +#include <linux/nospec.h>
>>>>>
>>>>> #include "vfio_platform_private.h"
>>>>>
>>>>> @@ -344,9 +345,16 @@ static long vfio_platform_ioctl(void *device_data,
>>>>>
>>>>> } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
>>>>> struct vfio_irq_info info;
>>>>> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>>>>> + struct vfio_irq_info_cap_msi *msi_info = NULL;
>>>>> + unsigned long capsz;
>>>>> + int ext_irq_index = vdev->num_irqs - vdev->num_ext_irqs;
>>>>>
>>>>> minsz = offsetofend(struct vfio_irq_info, count);
>>>>>
>>>>> + /* For backward compatibility, cannot require this */
>>>>> + capsz = offsetofend(struct vfio_irq_info, cap_offset);
>>>>> +
>>>>> if (copy_from_user(&info, (void __user *)arg, minsz))
>>>>> return -EFAULT;
>>>>>
>>>>> @@ -356,9 +364,89 @@ static long vfio_platform_ioctl(void *device_data,
>>>>> if (info.index >= vdev->num_irqs)
>>>>> return -EINVAL;
>>>>>
>>>>> + if (info.argsz >= capsz)
>>>>> + minsz = capsz;
>>>>> +
>>>>> + if (info.index == ext_irq_index) {
>>>> nit: n case we add new ext indices afterwards, I would check info.index
>>>> - ext_irq_index against an VFIO_EXT_IRQ_MSI define.
>>>>> + struct vfio_irq_info_cap_type cap_type = {
>>>>> + .header.id = VFIO_IRQ_INFO_CAP_TYPE,
>>>>> + .header.version = 1 };
>>>>> + int i;
>>>>> + int ret;
>>>>> + int num_msgs;
>>>>> + size_t msi_info_size;
>>>>> + struct vfio_platform_irq *irq;
>>>> nit: I think generally the opposite order (length) is chosen. This also
>>>> would better match the existing style in this file
>>> Ok will fix it
>>>>> +
>>>>> + info.index = array_index_nospec(info.index,
>>>>> + vdev->num_irqs);
>>>>> +
>>>>> + irq = &vdev->irqs[info.index];
>>>>> +
>>>>> + info.flags = irq->flags;
>>>> I think this can be removed given [*]
>>> Sure.
>>>>> + cap_type.type = irq->type;
>>>>> + cap_type.subtype = irq->subtype;
>>>>> +
>>>>> + ret = vfio_info_add_capability(&caps,
>>>>> + &cap_type.header,
>>>>> + sizeof(cap_type));
>>>>> + if (ret)
>>>>> + return ret;
>>>>> +
>>>>> + num_msgs = irq->num_ctx;
>>>> do would want to return the cap even if !num_ctx?
>>> If num_ctx = 0 then VFIO_IRQ_INFO_CAP_MSI_DESCS should not be written.
>>> I`ll take care of same.
>>>>> +
>>>>> + msi_info_size = struct_size(msi_info, msgs, num_msgs);
>>>>> +
>>>>> + msi_info = kzalloc(msi_info_size, GFP_KERNEL);
>>>>> + if (!msi_info) {
>>>>> + kfree(caps.buf);
>>>>> + return -ENOMEM;
>>>>> + }
>>>>> +
>>>>> + msi_info->header.id = VFIO_IRQ_INFO_CAP_MSI_DESCS;
>>>>> + msi_info->header.version = 1;
>>>>> + msi_info->nr_msgs = num_msgs;
>>>>> +
>>>>> + for (i = 0; i < num_msgs; i++) {
>>>>> + struct vfio_irq_ctx *ctx = &irq->ctx[i];
>>>>> +
>>>>> + msi_info->msgs[i].addr_lo = ctx->msg.address_lo;
>>>>> + msi_info->msgs[i].addr_hi = ctx->msg.address_hi;
>>>>> + msi_info->msgs[i].data = ctx->msg.data;
>>>>> + }
>>>>> +
>>>>> + ret = vfio_info_add_capability(&caps, &msi_info->header,
>>>>> + msi_info_size);
>>>>> + if (ret) {
>>>>> + kfree(msi_info);
>>>>> + kfree(caps.buf);
>>>>> + return ret;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> info.flags = vdev->irqs[info.index].flags;
>>>> [*]
>>> Will fix it.
>>>>> info.count = vdev->irqs[info.index].count;
>>>>>
>>>>> + if (caps.size) {
>>>>> + info.flags |= VFIO_IRQ_INFO_FLAG_CAPS;
>>>>> + if (info.argsz < sizeof(info) + caps.size) {
>>>>> + info.argsz = sizeof(info) + caps.size;
>>>>> + info.cap_offset = 0;
>>>>> + } else {
>>>>> + vfio_info_cap_shift(&caps, sizeof(info));
>>>>> + if (copy_to_user((void __user *)arg +
>>>>> + sizeof(info), caps.buf,
>>>>> + caps.size)) {
>>>>> + kfree(msi_info);
>>>>> + kfree(caps.buf);
>>>>> + return -EFAULT;
>>>>> + }
>>>>> + info.cap_offset = sizeof(info);
>>>>> + }
>>>>> +
>>>>> + kfree(msi_info);
>>>>> + kfree(caps.buf);
>>>>> + }
>>>>> +
>>>>> return copy_to_user((void __user *)arg, &info, minsz) ?
>>>>> -EFAULT : 0;
>>>>>
>>>>> @@ -366,6 +454,7 @@ static long vfio_platform_ioctl(void *device_data,
>>>>> struct vfio_irq_set hdr;
>>>>> u8 *data = NULL;
>>>>> int ret = 0;
>>>>> + int max;
>>>>> size_t data_size = 0;
>>>>>
>>>>> minsz = offsetofend(struct vfio_irq_set, count);
>>>>> @@ -373,8 +462,14 @@ static long vfio_platform_ioctl(void *device_data,
>>>>> if (copy_from_user(&hdr, (void __user *)arg, minsz))
>>>>> return -EFAULT;
>>>>>
>>>>> - ret = vfio_set_irqs_validate_and_prepare(&hdr, vdev->num_irqs,
>>>>> - vdev->num_irqs, &data_size);
>>>>> + if (hdr.index >= vdev->num_irqs)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + max = vdev->irqs[hdr.index].count;
>>>>> +
>>>>> + ret = vfio_set_irqs_validate_and_prepare(&hdr, max,
>>>>> + vdev->num_irqs,
>>>>> + &data_size);
>>>>> if (ret)
>>>>> return ret;
>>>>>
>>>>> diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
>>>>> index c5b09ec0a3c9..4066223e5b2e 100644
>>>>> --- a/drivers/vfio/platform/vfio_platform_irq.c
>>>>> +++ b/drivers/vfio/platform/vfio_platform_irq.c
>>>>> @@ -8,10 +8,12 @@
>>>>>
>>>>> #include <linux/eventfd.h>
>>>>> #include <linux/interrupt.h>
>>>>> +#include <linux/eventfd.h>
>>>>> #include <linux/slab.h>
>>>>> #include <linux/types.h>
>>>>> #include <linux/vfio.h>
>>>>> #include <linux/irq.h>
>>>>> +#include <linux/msi.h>
>>>>>
>>>>> #include "vfio_platform_private.h"
>>>>>
>>>>> @@ -253,6 +255,195 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +/* MSI/MSIX */
>>>>> +static irqreturn_t vfio_msihandler(int irq, void *arg)
>>>>> +{
>>>>> + struct eventfd_ctx *trigger = arg;
>>>>> +
>>>>> + eventfd_signal(trigger, 1);
>>>>> + return IRQ_HANDLED;
>>>>> +}
>>>>> +
>>>>> +static void msi_write(struct msi_desc *desc, struct msi_msg *msg)
>>>>> +{
>>>>> + int i;
>>>>> + struct vfio_platform_irq *irq;
>>>>> + u16 index = desc->platform.msi_index;
>>>>> + struct device *dev = msi_desc_to_dev(desc);
>>>>> + struct vfio_device *device = dev_get_drvdata(dev);
>>>>> + struct vfio_platform_device *vdev = (struct vfio_platform_device *)
>>>>> + vfio_device_data(device);
>>>>> +
>>>>> + for (i = 0; i < vdev->num_irqs; i++)
>>>>> + if (vdev->irqs[i].type == VFIO_IRQ_TYPE_MSI)
>>>>> + irq = &vdev->irqs[i];
>>>>> +
>>>>> + irq->ctx[index].msg.address_lo = msg->address_lo;
>>>>> + irq->ctx[index].msg.address_hi = msg->address_hi;
>>>>> + irq->ctx[index].msg.data = msg->data;
>>>>> +}
>>>>> +
>>>>> +static int vfio_msi_enable(struct vfio_platform_device *vdev,
>>>>> + struct vfio_platform_irq *irq, int nvec)
>>>>> +{
>>>>> + int ret;
>>>>> + int msi_idx = 0;
>>>>> + struct msi_desc *desc;
>>>>> + struct device *dev = vdev->device;
>>>>> +
>>>>> + irq->ctx = kcalloc(nvec, sizeof(struct vfio_irq_ctx), GFP_KERNEL);
>>>>> + if (!irq->ctx)
>>>>> + return -ENOMEM;
>>>>> +
>>>>> + /* Allocate platform MSIs */
>>>>> + ret = platform_msi_domain_alloc_irqs(dev, nvec, msi_write);
>>>>> + if (ret < 0) {
>>>>> + kfree(irq->ctx);
>>>>> + return ret;
>>>>> + }
>>>>> +
>>>>> + for_each_msi_entry(desc, dev) {
>>>>> + irq->ctx[msi_idx].hwirq = desc->irq;
>>>>> + msi_idx++;
>>>>> + }
>>>>> +
>>>>> + irq->num_ctx = nvec;
>>>>> + irq->config_msi = 1;
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +static int vfio_msi_set_vector_signal(struct vfio_platform_irq *irq,
>>>>> + int vector, int fd)
>>>>> +{
>>>>> + struct eventfd_ctx *trigger;
>>>>> + int irq_num, ret;
>>>>> +
>>>>> + if (vector < 0 || vector >= irq->num_ctx)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + irq_num = irq->ctx[vector].hwirq;
>>>>> +
>>>>> + if (irq->ctx[vector].trigger) {
>>>>> + free_irq(irq_num, irq->ctx[vector].trigger);
>>>>> + kfree(irq->ctx[vector].name);
>>>>> + eventfd_ctx_put(irq->ctx[vector].trigger);
>>>>> + irq->ctx[vector].trigger = NULL;
>>>>> + }
>>>>> +
>>>>> + if (fd < 0)
>>>>> + return 0;
>>>>> +
>>>>> + irq->ctx[vector].name = kasprintf(GFP_KERNEL,
>>>>> + "vfio-msi[%d]", vector);
>>>>> + if (!irq->ctx[vector].name)
>>>>> + return -ENOMEM;
>>>>> +
>>>>> + trigger = eventfd_ctx_fdget(fd);
>>>>> + if (IS_ERR(trigger)) {
>>>>> + kfree(irq->ctx[vector].name);
>>>>> + return PTR_ERR(trigger);
>>>>> + }
>>>>> +
>>>>> + ret = request_irq(irq_num, vfio_msihandler, 0,
>>>>> + irq->ctx[vector].name, trigger);
>>>>> + if (ret) {
>>>>> + kfree(irq->ctx[vector].name);
>>>>> + eventfd_ctx_put(trigger);
>>>>> + return ret;
>>>>> + }
>>>>> +
>>>>> + irq->ctx[vector].trigger = trigger;
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +static int vfio_msi_set_block(struct vfio_platform_irq *irq, unsigned int start,
>>>>> + unsigned int count, int32_t *fds)
>>>>> +{
>>>>> + int i, j, ret = 0;
>>>>> +
>>>>> + if (start >= irq->num_ctx || start + count > irq->num_ctx)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + for (i = 0, j = start; i < count && !ret; i++, j++) {
>>>>> + int fd = fds ? fds[i] : -1;
>>>>> +
>>>>> + ret = vfio_msi_set_vector_signal(irq, j, fd);
>>>>> + }
>>>>> +
>>>>> + if (ret) {
>>>>> + for (--j; j >= (int)start; j--)
>>>>> + vfio_msi_set_vector_signal(irq, j, -1);
>>>>> + }
>>>>> +
>>>>> + return ret;
>>>>> +}
>>>>> +
>>>>> +static void vfio_msi_disable(struct vfio_platform_device *vdev,
>>>>> + struct vfio_platform_irq *irq)
>>>>> +{
>>>>> + struct device *dev = vdev->device;
>>>>> +
>>>>> + vfio_msi_set_block(irq, 0, irq->num_ctx, NULL);
>>>>> +
>>>>> + platform_msi_domain_free_irqs(dev);
>>>>> +
>>>>> + irq->config_msi = 0;
>>>>> + irq->num_ctx = 0;
>>>>> +
>>>>> + kfree(irq->ctx);
>>>>> +}
>>>>> +
>>>>> +static int vfio_set_msi_trigger(struct vfio_platform_device *vdev,
>>>>> + unsigned int index, unsigned int start,
>>>>> + unsigned int count, uint32_t flags, void *data)
>>>>> +{
>>>>> + int i;
>>>>> + struct vfio_platform_irq *irq = &vdev->irqs[index];
>>>>> +
>>>>> + if (start + count > irq->count)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + if (!count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
>>>>> + vfio_msi_disable(vdev, irq);
>>>>> + return 0;
>>>>> + }
>>>>> +
>>>>> + if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
>>>>> + s32 *fds = data;
>>>>> + int ret;
>>>>> +
>>>>> + if (irq->config_msi)
>>>>> + return vfio_msi_set_block(irq, start, count,
>>>>> + fds);
>>>>> + ret = vfio_msi_enable(vdev, irq, start + count);
>>>>> + if (ret)
>>>>> + return ret;
>>>>> +
>>>>> + ret = vfio_msi_set_block(irq, start, count, fds);
>>>>> + if (ret)
>>>>> + vfio_msi_disable(vdev, irq);
>>>>> +
>>>>> + return ret;
>>>>> + }
>>>>> +
>>>>> + for (i = start; i < start + count; i++) {
>>>>> + if (!irq->ctx[i].trigger)
>>>>> + continue;
>>>>> + if (flags & VFIO_IRQ_SET_DATA_NONE) {
>>>>> + eventfd_signal(irq->ctx[i].trigger, 1);
>>>>> + } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
>>>>> + u8 *bools = data;
>>>>> +
>>>>> + if (bools[i - start])
>>>>> + eventfd_signal(irq->ctx[i].trigger, 1);
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
>>>>> uint32_t flags, unsigned index, unsigned start,
>>>>> unsigned count, void *data)
>>>>> @@ -261,16 +452,29 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
>>>>> unsigned start, unsigned count, uint32_t flags,
>>>>> void *data) = NULL;
>>>>>
>>>>> - switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>>>>> - case VFIO_IRQ_SET_ACTION_MASK:
>>>>> - func = vfio_platform_set_irq_mask;
>>>>> - break;
>>>>> - case VFIO_IRQ_SET_ACTION_UNMASK:
>>>>> - func = vfio_platform_set_irq_unmask;
>>>>> - break;
>>>>> - case VFIO_IRQ_SET_ACTION_TRIGGER:
>>>>> - func = vfio_platform_set_irq_trigger;
>>>>> - break;
>>>>> + struct vfio_platform_irq *irq = &vdev->irqs[index];
>>>>> +
>>>>> + if (irq->type == VFIO_IRQ_TYPE_MSI) {
>>>>> + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>>>>> + case VFIO_IRQ_SET_ACTION_MASK:
>>>>> + case VFIO_IRQ_SET_ACTION_UNMASK:
>>>>> + break;
>>>>> + case VFIO_IRQ_SET_ACTION_TRIGGER:
>>>>> + func = vfio_set_msi_trigger;
>>>>> + break;
>>>>> + }
>>>>> + } else {
>>>>> + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>>>>> + case VFIO_IRQ_SET_ACTION_MASK:
>>>>> + func = vfio_platform_set_irq_mask;
>>>>> + break;
>>>>> + case VFIO_IRQ_SET_ACTION_UNMASK:
>>>>> + func = vfio_platform_set_irq_unmask;
>>>>> + break;
>>>>> + case VFIO_IRQ_SET_ACTION_TRIGGER:
>>>>> + func = vfio_platform_set_irq_trigger;
>>>>> + break;
>>>>> + }
>>>>> }
>>>>>
>>>>> if (!func)
>>>>> @@ -281,12 +485,21 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
>>>>>
>>>>> int vfio_platform_irq_init(struct vfio_platform_device *vdev)
>>>>> {
>>>>> - int cnt = 0, i;
>>>>> + int i;
>>>>> + int cnt = 0;
>>>>> + int num_irqs;
>>>>> + struct device *dev = vdev->device;
>>>>>
>>>>> while (vdev->get_irq(vdev, cnt) >= 0)
>>>>> cnt++;
>>>>>
>>>>> - vdev->irqs = kcalloc(cnt, sizeof(struct vfio_platform_irq), GFP_KERNEL);
>>>>> + num_irqs = cnt;
>>>>> +
>>>>> + if (dev->msi_domain)
>>>>> + num_irqs++;
>>>>> +
>>>>> + vdev->irqs = kcalloc(num_irqs, sizeof(struct vfio_platform_irq),
>>>>> + GFP_KERNEL);
>>>>> if (!vdev->irqs)
>>>>> return -ENOMEM;
>>>>>
>>>>> @@ -309,7 +522,19 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
>>>>> vdev->irqs[i].masked = false;
>>>>> }
>>>>>
>>>>> - vdev->num_irqs = cnt;
>>>>> + /*
>>>>> + * MSI block is added at last index and its an ext irq
>>>> it is
>>>>> + */
>>>>> + if (dev->msi_domain) {
>>>>> + vdev->irqs[i].flags = VFIO_IRQ_INFO_EVENTFD;
>>>>> + vdev->irqs[i].count = NR_IRQS;
>>>> why NR_IRQS?
>>> Since different devices can have different numbers of MSI(s) so we
>>> need to initialize with max possible values. Can you please suggest if
>>> this value does not seem appropriate?
>> As opposed to PCIe, the userspace has no real way to guess how many
>> vectors can be set (what vfio_pci_get_irq_count does). This also means
>> we do not fully implement the original API as we are not able to report
>> an accurate value for .count. How will the user determine how many
>> vectors he can use?
>
> I believe user space will know how many MSIs platform device can have.
> We have assigned NR_IRQS because that’s what maximum number of
> interrupt can be supported on a specific platform. In case user space
> errantly tries to allocate any number of MSIs in kernel
> platform_msi_domain_alloc_irqs should fail.
> Do you think this approach can create an issue as .count is not
> exactly the same as MSIs are existing with device? Is it necessary to
> have a PCIe kind of approach as PCIe is standard but platform devices
> do not have standards?
I had a short discussion about this with Alex and I think we came to the
conclusion that is an issue not to be able to report accurately the
number of supported vectors. Assuming that the userspace knows the
number of vectors does not really look as a valid assumption and does
not really fit the vfio spirit.

The previous approach beased on a kernel module allowed to report their
actual number. Even if it is heavier it allows to fill the info supposed
to be returned by the GET_IRQ ioctl and we think it is better.

Thanks

Eric
>
>>>>> + vdev->irqs[i].hwirq = 0;
>>>>> + vdev->irqs[i].masked = false;
>>>>> + vdev->irqs[i].type = VFIO_IRQ_TYPE_MSI;
>>>>> + vdev->num_ext_irqs = 1;
>>>>> + }
>>>>> +
>>>>> + vdev->num_irqs = num_irqs;
>>>>>
>>>>> return 0;
>>>>> err:
>>>>> @@ -321,8 +546,13 @@ void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
>>>>> {
>>>>> int i;
>>>>>
>>>>> - for (i = 0; i < vdev->num_irqs; i++)
>>>>> - vfio_set_trigger(vdev, i, -1, NULL);
>>>>> + for (i = 0; i < vdev->num_irqs; i++) {
>>>>> + if (vdev->irqs[i].type == VFIO_IRQ_TYPE_MSI)
>>>>> + vfio_set_msi_trigger(vdev, i, 0, 0,
>>>>> + VFIO_IRQ_SET_DATA_NONE, NULL);
>>>>> + else
>>>>> + vfio_set_trigger(vdev, i, -1, NULL);
>>>>> + }
>>>>>
>>>>> vdev->num_irqs = 0;
>>>>> kfree(vdev->irqs);
>>>>> diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
>>>>> index 289089910643..7bbb05988705 100644
>>>>> --- a/drivers/vfio/platform/vfio_platform_private.h
>>>>> +++ b/drivers/vfio/platform/vfio_platform_private.h
>>>>> @@ -9,6 +9,7 @@
>>>>>
>>>>> #include <linux/types.h>
>>>>> #include <linux/interrupt.h>
>>>>> +#include <linux/msi.h>
>>>>>
>>>>> #define VFIO_PLATFORM_OFFSET_SHIFT 40
>>>>> #define VFIO_PLATFORM_OFFSET_MASK (((u64)(1) << VFIO_PLATFORM_OFFSET_SHIFT) - 1)
>>>>> @@ -19,9 +20,18 @@
>>>>> #define VFIO_PLATFORM_INDEX_TO_OFFSET(index) \
>>>>> ((u64)(index) << VFIO_PLATFORM_OFFSET_SHIFT)
>>>>>
>>>>> +struct vfio_irq_ctx {
>>>>> + int hwirq;
>>>>> + char *name;
>>>>> + struct msi_msg msg;
>>>>> + struct eventfd_ctx *trigger;
>>>>> +};
>>>>> +
>>>>> struct vfio_platform_irq {
>>>>> u32 flags;
>>>>> u32 count;
>>>>> + int num_ctx;
>>>>> + struct vfio_irq_ctx *ctx;
>>>>> int hwirq;
>>>>> char *name;
>>>>> struct eventfd_ctx *trigger;
>>>>> @@ -29,6 +39,11 @@ struct vfio_platform_irq {
>>>>> spinlock_t lock;
>>>>> struct virqfd *unmask;
>>>>> struct virqfd *mask;
>>>>> +
>>>>> + /* for extended irqs */
>>>>> + u32 type;
>>>>> + u32 subtype;
>>>>> + int config_msi;
>>>>> };
>>>>>
>>>>> struct vfio_platform_region {
>>>>> @@ -46,6 +61,7 @@ struct vfio_platform_device {
>>>>> u32 num_regions;
>>>>> struct vfio_platform_irq *irqs;
>>>>> u32 num_irqs;
>>>>> + int num_ext_irqs;
>>>>> int refcnt;
>>>>> struct mutex igate;
>>>>> struct module *parent_module;
>>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>>> index 2f313a238a8f..598d1c944283 100644
>>>>> --- a/include/uapi/linux/vfio.h
>>>>> +++ b/include/uapi/linux/vfio.h
>>>>> @@ -697,11 +697,54 @@ struct vfio_irq_info {
>>>>> #define VFIO_IRQ_INFO_MASKABLE (1 << 1)
>>>>> #define VFIO_IRQ_INFO_AUTOMASKED (1 << 2)
>>>>> #define VFIO_IRQ_INFO_NORESIZE (1 << 3)
>>>>> +#define VFIO_IRQ_INFO_FLAG_CAPS (1 << 4) /* Info supports caps */
>>>>> __u32 index; /* IRQ index */
>>>>> __u32 count; /* Number of IRQs within this index */
>>>>> + __u32 cap_offset; /* Offset within info struct of first cap */
>>>>> };
>>>>> #define VFIO_DEVICE_GET_IRQ_INFO _IO(VFIO_TYPE, VFIO_BASE + 9)
>>>>>
>>>>> +/*
>>>>> + * The irq type capability allows IRQs unique to a specific device or
>>>>> + * class of devices to be exposed.
>>>>> + *
>>>>> + * The structures below define version 1 of this capability.
>>>>> + */
>>>>> +#define VFIO_IRQ_INFO_CAP_TYPE 3
>>>>> +
>>>>> +struct vfio_irq_info_cap_type {
>>>>> + struct vfio_info_cap_header header;
>>>>> + __u32 type; /* global per bus driver */
>>>>> + __u32 subtype; /* type specific */
>>>>> +};
>>>>> +
>>>>> +/*
>>>>> + * List of IRQ types, global per bus driver.
>>>>> + * If you introduce a new type, please add it here.
>>>>> + */
>>>>> +
>>>>> +/* Non PCI devices having MSI(s) support */
>>>>> +#define VFIO_IRQ_TYPE_MSI (1)
>>>>> +
>>>>> +/*
>>>>> + * The msi capability allows the user to use the msi msg to
>>>>> + * configure the iova for the msi configuration.
>>>>> + * The structures below define version 1 of this capability.
>>>>> + */
>>>>> +#define VFIO_IRQ_INFO_CAP_MSI_DESCS 4
>>>>> +
>>>>> +struct vfio_irq_msi_msg {
>>>>> + __u32 addr_lo;
>>>>> + __u32 addr_hi;
>>>>> + __u32 data;
>>>>> +};
>>>>> +
>>>>> +struct vfio_irq_info_cap_msi {
>>>>> + struct vfio_info_cap_header header;
>>>>> + __u32 nr_msgs;
>>>> I think you should align a __u32 reserved field to have a 64b alignment
>>> Sure.
>>>>> + struct vfio_irq_msi_msg msgs[];
>>>> Please can you clarify why this cap is needed versus your prior approach.
>>> In the previous patch, the reset module was configuring the device
>>> with MSI msg/data now since the module is not available, user space
>>> needs to have this data.
>>> With this approach userspace just needs the pairs <msg and ctx >
>>> associated with the MSI(s) and it can choose to configure the MSI(s)
>>> sources accordingly.
>>> Let me know if this approach does not look appropriate.
>> This comes to the question I asked in my previous email, ie. could you
>> give some more info about the expected MSI setup sequence? May be the
>> opportunity to enhance the commit message ;-)
>
> With our proposal, user space needs to know how many MSI sources there
> are as we want user space to map msi-msg(s) to specific MSI sources.
> Let us assume there are max ‘n’ MSI sources a platform device have,
> user-space requests ‘n’ msi(s) allocation in the kernel which in turn
> gets ‘n’ msi_msg with help of cap (VFIO_IRQ_INFO_CAP_MSI_DESCS). User
> space is free to map/configure msi_msg->data value to any particular
> source. Since msi_msg->data is one to one mapped to eventfd,
> user-space knows that particular eventfd corresponds to the same MSI
> source which it has configured with the msi_msg->data value.
>
> Steps:
> a) User space allocates nvec in kernel. Ioctl VFIO_DEVICE_SET_IRQS
> b) Kernel allocates nvec and does associated IRQ allocation, eventfd etc.
> c) User space gets the msi_msgs VFIO_DEVICE_GET_IRQ_INFO.
> d) User space configures the particular MSI source with msi_msg info.
> Do these steps look OK?
> We can add more information where this cap is introduced in the file
> uapi/linux/vfio.h.
>
> Thanks,
> Vikas
>>
>> Thanks
>>
>> Eric
>>>
>>> Thanks,
>>> Vikas
>>>>> +};
>>>>> +
>>>>> /**
>>>>> * VFIO_DEVICE_SET_IRQS - _IOW(VFIO_TYPE, VFIO_BASE + 10, struct vfio_irq_set)
>>>>> *
>>>>>
>>>> Thanks
>>>>
>>>> Eric
>>>>
>>
>