2020-09-30 19:59:38

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] dmaengine: idxd: ims setup for the vdcm

On Tue, Sep 15 2020 at 16:28, Dave Jiang wrote:
> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
> index a39392157dc2..115a8f49aab3 100644
> --- a/drivers/dma/Kconfig
> +++ b/drivers/dma/Kconfig
> @@ -301,6 +301,7 @@ config INTEL_IDXD_MDEV
> depends on INTEL_IDXD
> depends on VFIO_MDEV
> depends on VFIO_MDEV_DEVICE
> + depends on IMS_MSI_ARRAY

select?

> int idxd_mdev_host_init(struct idxd_device *idxd)
> {
> struct device *dev = &idxd->pdev->dev;

> + ims_info.max_slots = idxd->ims_size;
> + ims_info.slots = idxd->reg_base + idxd->ims_offset;
> + dev->msi_domain =
> pci_ims_array_create_msi_irq_domain(idxd->pdev, &ims_info);

1) creating the domain can fail and checking the return code is overrated

2) dev_set_msi_domain() exists for a reason. If we change any of this in
struct device then we can chase all the open coded mess in drivers
like this.

Also can you please explain how this is supposed to work?

idxd->pdev is the host PCI device. So why are you overwriting the MSI
domain of the underlying host device? This works by chance because you
allocate the regular MSIX interrupts for the host device _before_
invoking this.

IIRC, I provided you ASCII art to show how all of this is supposed to be
structured...

> int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx)
> {
> int rc = -1;
> @@ -44,15 +46,63 @@ int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx)
> return rc;
> }
>
> +#define IMS_PASID_ENABLE 0x8
> int vidxd_disable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx)

Yet more unreadable glue. The coding style of this stuff is horrible.

> {
> - /* PLACEHOLDER */
> + struct mdev_device *mdev = vidxd->vdev.mdev;
> + struct device *dev = mdev_dev(mdev);
> + unsigned int ims_offset;
> + struct idxd_device *idxd = vidxd->idxd;
> + u32 val;
> +
> + /*
> + * Current implementation limits to 1 WQ for the vdev and therefore
> + * also only 1 IMS interrupt for that vdev.
> + */
> + if (ims_idx >= VIDXD_MAX_WQS) {
> + dev_warn(dev, "ims_idx greater than vidxd allowed: %d\n", ims_idx);

This warning text makes no sense whatsoever.

> + return -EINVAL;
> + }
> +
> + ims_offset = idxd->ims_offset + vidxd->ims_index[ims_idx] * 0x10;
> + val = ioread32(idxd->reg_base + ims_offset + 12);
> + val &= ~IMS_PASID_ENABLE;
> + iowrite32(val, idxd->reg_base + ims_offset + 12);

*0x10 + 12 !?!?

Reusing struct ims_slot from the irq chip driver would not be convoluted
enough, right?

Aside of that this is fiddling in the IMS storage array behind the irq
chips back without any comment here and a big fat comment about the
shared usage of ims_slot::ctrl in the irq chip driver.

This is kernel programming, not the obfuscated C code contest.

> + /* Setup the PASID filtering */
> + pasid = idxd_get_mdev_pasid(mdev);
> +
> + if (pasid >= 0) {
> + ims_offset = idxd->ims_offset + vidxd->ims_index[ims_idx] * 0x10;
> + val = ioread32(idxd->reg_base + ims_offset + 12);
> + val |= IMS_PASID_ENABLE | (pasid << 12) | (val & 0x7);
> + iowrite32(val, idxd->reg_base + ims_offset + 12);

More magic numbers and more fiddling in the IMS slot.

> + } else {
> + dev_warn(dev, "pasid setup failed for ims entry %lld\n", vidxd->ims_index[ims_idx]);
> + return -ENXIO;
> + }
> +
> return 0;
> }
>
> @@ -839,12 +889,43 @@ static void vidxd_wq_disable(struct vdcm_idxd *vidxd, int wq_id_mask)
>
> void vidxd_free_ims_entries(struct vdcm_idxd *vidxd)
> {
> - /* PLACEHOLDER */
> + struct irq_domain *irq_domain;
> + struct mdev_device *mdev = vidxd->vdev.mdev;
> + struct device *dev = mdev_dev(mdev);
> + int i;
> +
> + for (i = 0; i < VIDXD_MAX_MSIX_VECS - 1; i++)
> + vidxd->ims_index[i] = -1;
> +
> + irq_domain = vidxd->idxd->pdev->dev.msi_domain;

See above.

> + msi_domain_free_irqs(irq_domain, dev);

> int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)
> {
> - /* PLACEHOLDER */
> + struct irq_domain *irq_domain;
> + struct idxd_device *idxd = vidxd->idxd;
> + struct mdev_device *mdev = vidxd->vdev.mdev;
> + struct device *dev = mdev_dev(mdev);
> + int vecs = VIDXD_MAX_MSIX_VECS - 1;
> + struct msi_desc *entry;
> + struct ims_irq_entry *irq_entry;
> + int rc, i = 0;
> +
> + irq_domain = idxd->pdev->dev.msi_domain;

Ditto.

> + rc = msi_domain_alloc_irqs(irq_domain, dev, vecs);
> + if (rc < 0)
> + return rc;
> +
> + for_each_msi_entry(entry, dev) {
> + irq_entry = &vidxd->irq_entries[i];
> + irq_entry->vidxd = vidxd;
> + irq_entry->int_src = i;

Redundant information because it's the index in the array. What for?

> + irq_entry->irq = entry->irq;
> + vidxd->ims_index[i] = entry->device_msi.hwirq;

The point of having two arrays to store related information is?

It's at least orders of magnitudes better than the previous trainwreck,
but oh well...

Thanks,

tglx


2020-10-07 21:57:53

by Dave Jiang

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] dmaengine: idxd: ims setup for the vdcm



On 9/30/2020 12:57 PM, Thomas Gleixner wrote:
> On Tue, Sep 15 2020 at 16:28, Dave Jiang wrote:
>> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
>> index a39392157dc2..115a8f49aab3 100644
>> --- a/drivers/dma/Kconfig
>> +++ b/drivers/dma/Kconfig
>> @@ -301,6 +301,7 @@ config INTEL_IDXD_MDEV
>> depends on INTEL_IDXD
>> depends on VFIO_MDEV
>> depends on VFIO_MDEV_DEVICE
>> + depends on IMS_MSI_ARRAY
>
> select?

Will fix

>
>> int idxd_mdev_host_init(struct idxd_device *idxd)
>> {
>> struct device *dev = &idxd->pdev->dev;
>
>> + ims_info.max_slots = idxd->ims_size;
>> + ims_info.slots = idxd->reg_base + idxd->ims_offset;
>> + dev->msi_domain =
>> pci_ims_array_create_msi_irq_domain(idxd->pdev, &ims_info);
>
> 1) creating the domain can fail and checking the return code is overrated
>
> 2) dev_set_msi_domain() exists for a reason. If we change any of this in
> struct device then we can chase all the open coded mess in drivers
> like this.
>
> Also can you please explain how this is supposed to work?
>
> idxd->pdev is the host PCI device. So why are you overwriting the MSI
> domain of the underlying host device? This works by chance because you
> allocate the regular MSIX interrupts for the host device _before_
> invoking this.
>
> IIRC, I provided you ASCII art to show how all of this is supposed to be
> structured...

Yes. I see now that the implementation is wrong. In the updated code for next rev, I'm saving the domain to the idxd driver context.

ims_info.max_slots = idxd->ims_size;
ims_info.slots = idxd->reg_base + idxd->ims_offset;
idxd->ims_domain = pci_ims_array_create_msi_irq_domain(idxd->pdev, &ims_info);

dev_set_msi_domain() will be called with mdev later on when mdevs are being created.

struct device *dev = mdev_dev(mdev);

irq_domain = idxd->ims_domain;
dev_set_msi_domain(dev, irq_domain);
rc = msi_domain_alloc_irqs(irq_domain, dev, vecs);
if (rc < 0)
return rc;

for_each_msi_entry(entry, dev) {
irq_entry = &vidxd->irq_entries[i];
irq_entry->vidxd = vidxd;
irq_entry->entry = entry;
irq_entry->id = i;
i++;
}


>
>> int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx)
>> {
>> int rc = -1;
>> @@ -44,15 +46,63 @@ int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx)
>> return rc;
>> }
>>
>> +#define IMS_PASID_ENABLE 0x8
>> int vidxd_disable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx)
>
> Yet more unreadable glue. The coding style of this stuff is horrible.
>
>> {
>> - /* PLACEHOLDER */
>> + struct mdev_device *mdev = vidxd->vdev.mdev;
>> + struct device *dev = mdev_dev(mdev);
>> + unsigned int ims_offset;
>> + struct idxd_device *idxd = vidxd->idxd;
>> + u32 val;
>> +
>> + /*
>> + * Current implementation limits to 1 WQ for the vdev and therefore
>> + * also only 1 IMS interrupt for that vdev.
>> + */
>> + if (ims_idx >= VIDXD_MAX_WQS) {
>> + dev_warn(dev, "ims_idx greater than vidxd allowed: %d\n", ims_idx);
>
> This warning text makes no sense whatsoever.
>
>> + return -EINVAL;
>> + }
>> +
>> + ims_offset = idxd->ims_offset + vidxd->ims_index[ims_idx] * 0x10;
>> + val = ioread32(idxd->reg_base + ims_offset + 12);
>> + val &= ~IMS_PASID_ENABLE;
>> + iowrite32(val, idxd->reg_base + ims_offset + 12);
>
> *0x10 + 12 !?!?
>
> Reusing struct ims_slot from the irq chip driver would not be convoluted
> enough, right?

Yes. Fixing that.

>
> Aside of that this is fiddling in the IMS storage array behind the irq
> chips back without any comment here and a big fat comment about the
> shared usage of ims_slot::ctrl in the irq chip driver.
>

This is to program the pasid fields in the IMS table entry. Was thinking the pasid fields may be considered device specific so didn't attempt to add the support to the core code.

> This is kernel programming, not the obfuscated C code contest.
>
>> + /* Setup the PASID filtering */
>> + pasid = idxd_get_mdev_pasid(mdev);
>> +
>> + if (pasid >= 0) {
>> + ims_offset = idxd->ims_offset + vidxd->ims_index[ims_idx] * 0x10;
>> + val = ioread32(idxd->reg_base + ims_offset + 12);
>> + val |= IMS_PASID_ENABLE | (pasid << 12) | (val & 0x7);
>> + iowrite32(val, idxd->reg_base + ims_offset + 12);
>
> More magic numbers and more fiddling in the IMS slot.
>
>> + } else {
>> + dev_warn(dev, "pasid setup failed for ims entry %lld\n", vidxd->ims_index[ims_idx]);
>> + return -ENXIO;
>> + }
>> +
>> return 0;
>> }
>>
>> @@ -839,12 +889,43 @@ static void vidxd_wq_disable(struct vdcm_idxd *vidxd, int wq_id_mask)
>>
>> void vidxd_free_ims_entries(struct vdcm_idxd *vidxd)
>> {
>> - /* PLACEHOLDER */
>> + struct irq_domain *irq_domain;
>> + struct mdev_device *mdev = vidxd->vdev.mdev;
>> + struct device *dev = mdev_dev(mdev);
>> + int i;
>> +
>> + for (i = 0; i < VIDXD_MAX_MSIX_VECS - 1; i++)
>> + vidxd->ims_index[i] = -1;
>> +
>> + irq_domain = vidxd->idxd->pdev->dev.msi_domain;
>
> See above.

struct device *dev = mdev_dev(mdev);

irq_domain = dev_get_msi_domain(dev);

>
>> + msi_domain_free_irqs(irq_domain, dev);
>
>> int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)
>> {
>> - /* PLACEHOLDER */
>> + struct irq_domain *irq_domain;
>> + struct idxd_device *idxd = vidxd->idxd;
>> + struct mdev_device *mdev = vidxd->vdev.mdev;
>> + struct device *dev = mdev_dev(mdev);
>> + int vecs = VIDXD_MAX_MSIX_VECS - 1;
>> + struct msi_desc *entry;
>> + struct ims_irq_entry *irq_entry;
>> + int rc, i = 0;
>> +
>> + irq_domain = idxd->pdev->dev.msi_domain;
>
> Ditto.
>
>> + rc = msi_domain_alloc_irqs(irq_domain, dev, vecs);
>> + if (rc < 0)
>> + return rc;
>> +
>> + for_each_msi_entry(entry, dev) {
>> + irq_entry = &vidxd->irq_entries[i];
>> + irq_entry->vidxd = vidxd;
>> + irq_entry->int_src = i;
>
> Redundant information because it's the index in the array. What for?

Yes. I'm setting a ptr to the entry in order to retrieve the needed info. No more duplication.

>
>> + irq_entry->irq = entry->irq;
>> + vidxd->ims_index[i] = entry->device_msi.hwirq;
>
> The point of having two arrays to store related information is?
>
> It's at least orders of magnitudes better than the previous trainwreck,
> but oh well...
>
> Thanks,
>
> tglx
>

2020-10-08 08:58:21

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] dmaengine: idxd: ims setup for the vdcm

On Wed, Oct 07 2020 at 14:54, Dave Jiang wrote:
> On 9/30/2020 12:57 PM, Thomas Gleixner wrote:
>> Aside of that this is fiddling in the IMS storage array behind the irq
>> chips back without any comment here and a big fat comment about the
>> shared usage of ims_slot::ctrl in the irq chip driver.
>>
> This is to program the pasid fields in the IMS table entry. Was
> thinking the pasid fields may be considered device specific so didn't
> attempt to add the support to the core code.

Well, the problem is that this is not really irq chip functionality.

But the PASID programming needs to touch the IMS storage which is also
touched by the irq chip.

This might be correct as is, but without a big fat comment explaining
WHY it is safe to do so without any form of serialization this is just
voodoo and unreviewable.

Can you please explain when the PASID is programmed and what the state
of the interrupt is at that point? Is this a one off setup operation or
does this happen dynamically at random points during runtime?

This needs to be clarified first.

Thanks,

tglx


2020-10-08 18:25:19

by Dave Jiang

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] dmaengine: idxd: ims setup for the vdcm



On 10/8/2020 12:39 AM, Thomas Gleixner wrote:
> On Wed, Oct 07 2020 at 14:54, Dave Jiang wrote:
>> On 9/30/2020 12:57 PM, Thomas Gleixner wrote:
>>> Aside of that this is fiddling in the IMS storage array behind the irq
>>> chips back without any comment here and a big fat comment about the
>>> shared usage of ims_slot::ctrl in the irq chip driver.
>>>
>> This is to program the pasid fields in the IMS table entry. Was
>> thinking the pasid fields may be considered device specific so didn't
>> attempt to add the support to the core code.
>
> Well, the problem is that this is not really irq chip functionality.
>
> But the PASID programming needs to touch the IMS storage which is also
> touched by the irq chip.
>
> This might be correct as is, but without a big fat comment explaining
> WHY it is safe to do so without any form of serialization this is just
> voodoo and unreviewable.
>
> Can you please explain when the PASID is programmed and what the state
> of the interrupt is at that point? Is this a one off setup operation or
> does this happen dynamically at random points during runtime?

I will put in comments for the function to explain why and when we modify the
pasid field for the IMS entry. Programming of the pasid is done right before we
request irq. And the clearing is done after we free the irq. We will not be
touching the IMS field at runtime. So the touching of the entry should be safe.

>
> This needs to be clarified first.
>
> Thanks,
>
> tglx
>
>

2020-10-09 00:37:20

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] dmaengine: idxd: ims setup for the vdcm

Dave,

On Thu, Oct 08 2020 at 09:51, Dave Jiang wrote:
> On 10/8/2020 12:39 AM, Thomas Gleixner wrote:
>> On Wed, Oct 07 2020 at 14:54, Dave Jiang wrote:
>>> On 9/30/2020 12:57 PM, Thomas Gleixner wrote:
>>>> Aside of that this is fiddling in the IMS storage array behind the irq
>>>> chips back without any comment here and a big fat comment about the
>>>> shared usage of ims_slot::ctrl in the irq chip driver.
>>>>
>>> This is to program the pasid fields in the IMS table entry. Was
>>> thinking the pasid fields may be considered device specific so didn't
>>> attempt to add the support to the core code.
>>
>> Well, the problem is that this is not really irq chip functionality.
>>
>> But the PASID programming needs to touch the IMS storage which is also
>> touched by the irq chip.
>>
>> This might be correct as is, but without a big fat comment explaining
>> WHY it is safe to do so without any form of serialization this is just
>> voodoo and unreviewable.
>>
>> Can you please explain when the PASID is programmed and what the state
>> of the interrupt is at that point? Is this a one off setup operation or
>> does this happen dynamically at random points during runtime?
>
> I will put in comments for the function to explain why and when we modify the
> pasid field for the IMS entry. Programming of the pasid is done right before we
> request irq. And the clearing is done after we free the irq. We will not be
> touching the IMS field at runtime. So the touching of the entry should be safe.

Thanks for clarifying that.

Thinking more about it, that very same thing will be needed for any
other IMS device and of course this is not going to end well because
some driver will fiddle with the PASID at the wrong time.

Find below an infrastructure patch which adds ASID support to the core
and a delta patch for the IMS irq chip. All of it neither compiled nor
tested.

Setting IMS_PASID_ENABLE might have other conditions which I don't know
of, so the IMS part might be completely wrong, but you get the idea.

Thanks,

tglx

8<-----------
Subject: genirq: Provide irq_set_asid()
From: Thomas Gleixner <[email protected]>
Date: Thu, 08 Oct 2020 23:32:16 +0200

Provide storage and a setter for an Address Space IDentifier.

The identifier is stored in the top level irq_data and it only can be
modified when the interrupt is not requested.

Add the necessary storage and helper functions and validate that interrupts
which require an ASID have one assigned.

Signed-off-by: Thomas Gleixner <[email protected]>
---
include/linux/irq.h | 8 ++++++++
kernel/irq/internals.h | 10 ++++++++++
kernel/irq/irqdesc.c | 1 +
kernel/irq/manage.c | 40 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 59 insertions(+)

--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -161,6 +161,7 @@ struct irq_common_data {
* @mask: precomputed bitmask for accessing the chip registers
* @irq: interrupt number
* @hwirq: hardware interrupt number, local to the interrupt domain
+ * @asid: Optional address space ID for this interrupt.
* @common: point to data shared by all irqchips
* @chip: low level interrupt hardware access
* @domain: Interrupt translation domain; responsible for mapping
@@ -174,6 +175,7 @@ struct irq_data {
u32 mask;
unsigned int irq;
unsigned long hwirq;
+ u32 asid;
struct irq_common_data *common;
struct irq_chip *chip;
struct irq_domain *domain;
@@ -183,6 +185,8 @@ struct irq_data {
void *chip_data;
};

+#define IRQ_INVALID_ASID U32_MAX
+
/*
* Bit masks for irq_common_data.state_use_accessors
*
@@ -555,6 +559,8 @@ struct irq_chip {
* IRQCHIP_EOI_THREADED: Chip requires eoi() on unmask in threaded mode
* IRQCHIP_SUPPORTS_LEVEL_MSI Chip can provide two doorbells for Level MSIs
* IRQCHIP_SUPPORTS_NMI: Chip can deliver NMIs, only for root irqchips
+ * IRQCHIP_REQUIRES_ASID: Interrupt chip requires a valid Address Space
+ * IDentifier
*/
enum {
IRQCHIP_SET_TYPE_MASKED = (1 << 0),
@@ -566,6 +572,7 @@ enum {
IRQCHIP_EOI_THREADED = (1 << 6),
IRQCHIP_SUPPORTS_LEVEL_MSI = (1 << 7),
IRQCHIP_SUPPORTS_NMI = (1 << 8),
+ IRQCHIP_REQUIRES_ASID = (1 << 9),
};

#include <linux/irqdesc.h>
@@ -792,6 +799,7 @@ extern int irq_set_irq_type(unsigned int
extern int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry);
extern int irq_set_msi_desc_off(unsigned int irq_base, unsigned int irq_offset,
struct msi_desc *entry);
+extern int irq_set_asid(unsigned int irq, u32 asid);
extern struct irq_data *irq_get_irq_data(unsigned int irq);

static inline struct irq_chip *irq_get_chip(unsigned int irq)
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -281,6 +281,16 @@ static inline void
irq_pm_remove_action(struct irq_desc *desc, struct irqaction *action) { }
#endif

+static inline bool irq_requires_asid(struct irq_desc *desc)
+{
+ return desc->irq_data.chip->flags & IRQCHIP_REQUIRES_ASID;
+}
+
+static inline bool irq_invalid_asid(struct irq_desc *desc)
+{
+ return desc->irq_data.asid == IRQ_INVALID_ASID;
+}
+
#ifdef CONFIG_IRQ_TIMINGS

#define IRQ_TIMINGS_SHIFT 5
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -115,6 +115,7 @@ static void desc_set_defaults(unsigned i
irq_settings_clr_and_set(desc, ~0, _IRQ_DEFAULT_INIT_FLAGS);
irqd_set(&desc->irq_data, IRQD_IRQ_DISABLED);
irqd_set(&desc->irq_data, IRQD_IRQ_MASKED);
+ desc->irq_data.asid = IRQ_INVALID_ASID;
desc->handle_irq = handle_bad_irq;
desc->depth = 1;
desc->irq_count = 0;
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1520,6 +1520,22 @@ static int
}

/*
+ * If the topmost interrupt chip requires that a valid Address
+ * Space IDentifier is set, validate that the interrupt is not
+ * shared and that a valid ASID is set.
+ */
+ if (irq_requires_asid(desc)) {
+ if (shared || irq_invalid_asid(desc)) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+ /*
+ * CHECKME: Is there a way to figure out that the ASID
+ * makes sense in the context of the caller?
+ */
+ }
+
+ /*
* Setup the thread mask for this irqaction for ONESHOT. For
* !ONESHOT irqs the thread mask is 0 so we can avoid a
* conditional in irq_wake_thread().
@@ -1848,6 +1864,8 @@ static struct irqaction *__free_irq(stru
*/
raw_spin_lock_irqsave(&desc->lock, flags);
irq_domain_deactivate_irq(&desc->irq_data);
+ /* Invalidate the ASID associated to this interrupt */
+ desc->irq_data.asid = IRQ_INVALID_ASID;
raw_spin_unlock_irqrestore(&desc->lock, flags);

irq_release_resources(desc);
@@ -2752,3 +2770,25 @@ int irq_set_irqchip_state(unsigned int i
return err;
}
EXPORT_SYMBOL_GPL(irq_set_irqchip_state);
+
+int irq_set_asid(unsigned int irq, u32 asid)
+{
+ struct irq_desc *desc;
+ struct irq_data *data;
+ unsigned long flags;
+ int err = -EBUSY;
+
+ desc = irq_get_desc_buslock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
+ if (!desc)
+ return -EINVAL;
+
+ /* If the interrupt is active changig ASID is a NONO */
+ if (!desc->action) {
+ data = irq_desc_get_irq_data(desc);
+ data->asid = asid;
+ }
+
+ irq_put_desc_busunlock(desc, flags);
+ return err;
+}
+EXPORT_SYMBOL_GPL(irq_set_asid);

8<------------------

Subject: irqchip/ims: PASID support
From: Thomas Gleixner <[email protected]>
Date: Fri, 09 Oct 2020 01:02:09 +0200

NOT-Signed-off-by: Thomas Gleixner <[email protected]>
---
drivers/irqchip/irq-ims-msi.c | 29 ++++++++++++++++++++++-------
include/linux/irqchip/irq-ims-msi.h | 2 ++
2 files changed, 24 insertions(+), 7 deletions(-)

--- a/drivers/irqchip/irq-ims-msi.c
+++ b/drivers/irqchip/irq-ims-msi.c
@@ -18,14 +18,26 @@ struct ims_array_data {
unsigned long map[0];
};

+#define IMS_ASID_SHIFT 12
+
+static inline u32 ims_ctrl_val(struct irq_data *data, u32 ctrl)
+{
+ return ctrl | (data->asid) << 12 | IMS_PASID_ENABLE;
+}
+
+static inline void iowrite32_and_flush(u32 value, void __iomem *addr)
+{
+ iowrite32(value, addr);
+ ioread32(addr);
+}
+
static void ims_array_mask_irq(struct irq_data *data)
{
struct msi_desc *desc = irq_data_get_msi_desc(data);
struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
u32 __iomem *ctrl = &slot->ctrl;

- iowrite32(ioread32(ctrl) | IMS_VECTOR_CTRL_MASK, ctrl);
- ioread32(ctrl); /* Flush write to device */
+ iowrite32_and_flush(ims_ctrl_val(data, IMS_VECTOR_CTRL_MASK), ctrl);
}

static void ims_array_unmask_irq(struct irq_data *data)
@@ -34,7 +46,10 @@ static void ims_array_unmask_irq(struct
struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
u32 __iomem *ctrl = &slot->ctrl;

- iowrite32(ioread32(ctrl) & ~IMS_VECTOR_CTRL_MASK, ctrl);
+ if (!WARN_ON_ONCE(data->asid == IRQ_INVALID_ASID))
+ return;
+
+ iowrite32_and_flush(ims_ctrl_val(data, 0), ctrl);
}

static void ims_array_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
@@ -44,8 +59,7 @@ static void ims_array_write_msi_msg(stru

iowrite32(msg->address_lo, &slot->address_lo);
iowrite32(msg->address_hi, &slot->address_hi);
- iowrite32(msg->data, &slot->data);
- ioread32(slot);
+ iowrite32_and_flush(msg->data, &slot->data);
}

static const struct irq_chip ims_array_msi_controller = {
@@ -54,7 +68,8 @@ static const struct irq_chip ims_array_m
.irq_unmask = ims_array_unmask_irq,
.irq_write_msi_msg = ims_array_write_msi_msg,
.irq_retrigger = irq_chip_retrigger_hierarchy,
- .flags = IRQCHIP_SKIP_SET_WAKE,
+ .flags = IRQCHIP_SKIP_SET_WAKE |
+ IRQCHIP_REQUIRES_ASID,
};

static void ims_array_reset_slot(struct ims_slot __iomem *slot)
@@ -62,7 +77,7 @@ static void ims_array_reset_slot(struct
iowrite32(0, &slot->address_lo);
iowrite32(0, &slot->address_hi);
iowrite32(0, &slot->data);
- iowrite32(0, &slot->ctrl);
+ iowrite32_and_flush(IMS_VECTOR_CTRL_MASK, &slot->ctrl);
}

static void ims_array_free_msi_store(struct irq_domain *domain,
--- a/include/linux/irqchip/irq-ims-msi.h
+++ b/include/linux/irqchip/irq-ims-msi.h
@@ -25,6 +25,8 @@ struct ims_slot {

/* Bit to mask the interrupt in ims_hw_slot::ctrl */
#define IMS_VECTOR_CTRL_MASK 0x01
+/* Bit to enable PASID in ims_hw_slot::ctrl */
+#define IMS_PASID_ENABLE 0x08

/**
* struct ims_array_info - Information to create an IMS array domain

2020-10-09 00:49:05

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] dmaengine: idxd: ims setup for the vdcm

On Fri, Oct 09, 2020 at 01:17:38AM +0200, Thomas Gleixner wrote:
> Dave,
>
> On Thu, Oct 08 2020 at 09:51, Dave Jiang wrote:
> > On 10/8/2020 12:39 AM, Thomas Gleixner wrote:
> >> On Wed, Oct 07 2020 at 14:54, Dave Jiang wrote:
> >>> On 9/30/2020 12:57 PM, Thomas Gleixner wrote:
> >>>> Aside of that this is fiddling in the IMS storage array behind the irq
> >>>> chips back without any comment here and a big fat comment about the
> >>>> shared usage of ims_slot::ctrl in the irq chip driver.
> >>>>
> >>> This is to program the pasid fields in the IMS table entry. Was
> >>> thinking the pasid fields may be considered device specific so didn't
> >>> attempt to add the support to the core code.
> >>
> >> Well, the problem is that this is not really irq chip functionality.
> >>
> >> But the PASID programming needs to touch the IMS storage which is also
> >> touched by the irq chip.
> >>
> >> This might be correct as is, but without a big fat comment explaining
> >> WHY it is safe to do so without any form of serialization this is just
> >> voodoo and unreviewable.
> >>
> >> Can you please explain when the PASID is programmed and what the state
> >> of the interrupt is at that point? Is this a one off setup operation or
> >> does this happen dynamically at random points during runtime?
> >
> > I will put in comments for the function to explain why and when we modify the
> > pasid field for the IMS entry. Programming of the pasid is done right before we
> > request irq. And the clearing is done after we free the irq. We will not be
> > touching the IMS field at runtime. So the touching of the entry should be safe.
>
> Thanks for clarifying that.
>
> Thinking more about it, that very same thing will be needed for any
> other IMS device and of course this is not going to end well because
> some driver will fiddle with the PASID at the wrong time.

Why? This looks like some quirk of the IDXD HW where it just randomly
put PASID along with the IRQ mask register. Probably because PASID is
not the full 32 bits.

AFAIK the PASID is not tagged on the MemWr TLP triggering the
interrupt, so it really is unrelated to the irq.

I think the ioread to get the PASID is rather ugly, it should pluck
the PASID out of some driver specific data structure with proper
locking, and thus use the sleepable version of the irqchip?

This is really not that different from what I was describing for queue
contexts - the queue context needs to be assigned to the irq # before
it can be used in the irq chip other wise there is no idea where to
write the msg to. Just like pasid here.

Jason

2020-10-09 01:24:08

by Ashok Raj

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] dmaengine: idxd: ims setup for the vdcm

Hi Jason

On Thu, Oct 08, 2020 at 08:32:10PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 09, 2020 at 01:17:38AM +0200, Thomas Gleixner wrote:
> > Dave,
> >
> > On Thu, Oct 08 2020 at 09:51, Dave Jiang wrote:
> > > On 10/8/2020 12:39 AM, Thomas Gleixner wrote:
> > >> On Wed, Oct 07 2020 at 14:54, Dave Jiang wrote:
> > >>> On 9/30/2020 12:57 PM, Thomas Gleixner wrote:
> > >>>> Aside of that this is fiddling in the IMS storage array behind the irq
> > >>>> chips back without any comment here and a big fat comment about the
> > >>>> shared usage of ims_slot::ctrl in the irq chip driver.
> > >>>>
> > >>> This is to program the pasid fields in the IMS table entry. Was
> > >>> thinking the pasid fields may be considered device specific so didn't
> > >>> attempt to add the support to the core code.
> > >>
> > >> Well, the problem is that this is not really irq chip functionality.
> > >>
> > >> But the PASID programming needs to touch the IMS storage which is also
> > >> touched by the irq chip.
> > >>
> > >> This might be correct as is, but without a big fat comment explaining
> > >> WHY it is safe to do so without any form of serialization this is just
> > >> voodoo and unreviewable.
> > >>
> > >> Can you please explain when the PASID is programmed and what the state
> > >> of the interrupt is at that point? Is this a one off setup operation or
> > >> does this happen dynamically at random points during runtime?
> > >
> > > I will put in comments for the function to explain why and when we modify the
> > > pasid field for the IMS entry. Programming of the pasid is done right before we
> > > request irq. And the clearing is done after we free the irq. We will not be
> > > touching the IMS field at runtime. So the touching of the entry should be safe.
> >
> > Thanks for clarifying that.
> >
> > Thinking more about it, that very same thing will be needed for any
> > other IMS device and of course this is not going to end well because
> > some driver will fiddle with the PASID at the wrong time.
>
> Why? This looks like some quirk of the IDXD HW where it just randomly
> put PASID along with the IRQ mask register. Probably because PASID is
> not the full 32 bits.

Not randomly put there Jason :-).. There is a good reason for it. I'm sure
Dave must have responded already. ENQCMD for DSA has the interrupt handle
on which the notification should be sent. Since the data from from user
space HW will verify if the PASID for IMS entry matches what is there in
the descriptor.

Check description in section 9.2.2.1 of the DSA specification, when PASID
enable is 1, this field is checked against the PASID field of the
descriptor. Also check Section 5.4 and Interrupt Virtualization 7.3.3 for
more info.

>
> AFAIK the PASID is not tagged on the MemWr TLP triggering the
> interrupt, so it really is unrelated to the irq.

Correct, the purpose is not to send PASID prefix for interrupt tranactions.

>
> I think the ioread to get the PASID is rather ugly, it should pluck

Where do you see the ioread? I suppose idxd driver will fill in from the
aux_domain default PASID. Not reading from the device IMS entry.

> the PASID out of some driver specific data structure with proper
> locking, and thus use the sleepable version of the irqchip?
>
> This is really not that different from what I was describing for queue
> contexts - the queue context needs to be assigned to the irq # before
> it can be used in the irq chip other wise there is no idea where to
> write the msg to. Just like pasid here.

Sorry, I don't follow you on this.. you mean context in hardware or user
context that holds interrupt addr/data values?

2020-10-09 02:39:12

by Dave Jiang

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] dmaengine: idxd: ims setup for the vdcm



On 10/8/2020 4:32 PM, Jason Gunthorpe wrote:
> On Fri, Oct 09, 2020 at 01:17:38AM +0200, Thomas Gleixner wrote:
>> Dave,
>>
>> On Thu, Oct 08 2020 at 09:51, Dave Jiang wrote:
>>> On 10/8/2020 12:39 AM, Thomas Gleixner wrote:
>>>> On Wed, Oct 07 2020 at 14:54, Dave Jiang wrote:
>>>>> On 9/30/2020 12:57 PM, Thomas Gleixner wrote:
>>>>>> Aside of that this is fiddling in the IMS storage array behind the irq
>>>>>> chips back without any comment here and a big fat comment about the
>>>>>> shared usage of ims_slot::ctrl in the irq chip driver.
>>>>>>
>>>>> This is to program the pasid fields in the IMS table entry. Was
>>>>> thinking the pasid fields may be considered device specific so didn't
>>>>> attempt to add the support to the core code.
>>>>
>>>> Well, the problem is that this is not really irq chip functionality.
>>>>
>>>> But the PASID programming needs to touch the IMS storage which is also
>>>> touched by the irq chip.
>>>>
>>>> This might be correct as is, but without a big fat comment explaining
>>>> WHY it is safe to do so without any form of serialization this is just
>>>> voodoo and unreviewable.
>>>>
>>>> Can you please explain when the PASID is programmed and what the state
>>>> of the interrupt is at that point? Is this a one off setup operation or
>>>> does this happen dynamically at random points during runtime?
>>>
>>> I will put in comments for the function to explain why and when we modify the
>>> pasid field for the IMS entry. Programming of the pasid is done right before we
>>> request irq. And the clearing is done after we free the irq. We will not be
>>> touching the IMS field at runtime. So the touching of the entry should be safe.
>>
>> Thanks for clarifying that.
>>
>> Thinking more about it, that very same thing will be needed for any
>> other IMS device and of course this is not going to end well because
>> some driver will fiddle with the PASID at the wrong time.
>
> Why? This looks like some quirk of the IDXD HW where it just randomly
> put PASID along with the IRQ mask register. Probably because PASID is
> not the full 32 bits.

The hardware checks that the PASID in the descriptor matches the PASID in the
IMS entry, to prevent user-mode software from arbitrarily choosing any interrupt
vector it wants. User mode software has to request an IMS entry from the kernel
driver and the driver fills in the PASID in the IMS so that only that process
can use that IMS entry.

>
> AFAIK the PASID is not tagged on the MemWr TLP triggering the
> interrupt, so it really is unrelated to the irq.
>
> I think the ioread to get the PASID is rather ugly, it should pluck
> the PASID out of some driver specific data structure with proper
> locking, and thus use the sleepable version of the irqchip?
>
> This is really not that different from what I was describing for queue
> contexts - the queue context needs to be assigned to the irq # before
> it can be used in the irq chip other wise there is no idea where to
> write the msg to. Just like pasid here.
>
> Jason
>

2020-10-09 14:46:00

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] dmaengine: idxd: ims setup for the vdcm

On Thu, Oct 08 2020 at 20:32, Jason Gunthorpe wrote:
> On Fri, Oct 09, 2020 at 01:17:38AM +0200, Thomas Gleixner wrote:
>> Thinking more about it, that very same thing will be needed for any
>> other IMS device and of course this is not going to end well because
>> some driver will fiddle with the PASID at the wrong time.
>
> Why? This looks like some quirk of the IDXD HW where it just randomly
> put PASID along with the IRQ mask register. Probably because PASID is
> not the full 32 bits.
>
> AFAIK the PASID is not tagged on the MemWr TLP triggering the
> interrupt, so it really is unrelated to the irq.

Right you are. With brain awake this does not make sense.

> I think the ioread to get the PASID is rather ugly, it should pluck
> the PASID out of some driver specific data structure with proper
> locking, and thus use the sleepable version of the irqchip?
>
> This is really not that different from what I was describing for queue
> contexts - the queue context needs to be assigned to the irq # before
> it can be used in the irq chip other wise there is no idea where to
> write the msg to. Just like pasid here.

Not really. In the IDXD case the storage is known when the host device
and the irq domain is initialized which is not the case for your variant
and it neither needs to send a magic command to the device to update the
data.

Due to the already explained racy behaviour of MSI affinity changes when
interrupt remapping is disabled, that async update would just not work
and I really don't want to see the resulting tinkering to paper over the
problem. TBH, even if that problem would not exist, my faith in driver
writers getting any of this correct is close to zero and I rather spend
a few brain cycles now than staring at the creative mess later.

So putting the brainfart of yesterday aside, we can simply treat this as
auxiliary data.

The patch below implements an interface for that and adds support to the
IMS driver. That interface is properly serialized and does not put any
restrictions on when this is called. (I know another interrupt chip
which could be simplified that way).

All the IDXD driver has to do is:

auxval = ims_ctrl_pasid_aux(pasid, enabled);
irq_set_auxdata(irqnr, IMS_AUXDATA_CONTROL_WORD, auxval);

I agree that irq_set_auxdata() is not the most elegant thing, but the
alternative solutions I looked at are just worse.

For now I just kept the data in the IMS storage which still requires an
ioread(), but these interrupts are not frequently masked and unmasked
during normal operation so performance is not a concern and caching that
value is an orthogonal optimization if someone cares.

Thanks,

tglx

8<-------------

drivers/irqchip/irq-ims-msi.c | 35 +++++++++++++++++++++++++++++------
include/linux/interrupt.h | 2 ++
include/linux/irq.h | 4 ++++
include/linux/irqchip/irq-ims-msi.h | 25 ++++++++++++++++++++++++-
kernel/irq/manage.c | 32 ++++++++++++++++++++++++++++++++
5 files changed, 91 insertions(+), 7 deletions(-)

--- a/drivers/irqchip/irq-ims-msi.c
+++ b/drivers/irqchip/irq-ims-msi.c
@@ -18,14 +18,19 @@ struct ims_array_data {
unsigned long map[0];
};

+static inline void iowrite32_and_flush(u32 value, void __iomem *addr)
+{
+ iowrite32(value, addr);
+ ioread32(addr);
+}
+
static void ims_array_mask_irq(struct irq_data *data)
{
struct msi_desc *desc = irq_data_get_msi_desc(data);
struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
u32 __iomem *ctrl = &slot->ctrl;

- iowrite32(ioread32(ctrl) | IMS_VECTOR_CTRL_MASK, ctrl);
- ioread32(ctrl); /* Flush write to device */
+ iowrite32_and_flush(ioread32(ctrl) | IMS_CTRL_VECTOR_MASKBIT, ctrl);
}

static void ims_array_unmask_irq(struct irq_data *data)
@@ -34,7 +39,7 @@ static void ims_array_unmask_irq(struct
struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
u32 __iomem *ctrl = &slot->ctrl;

- iowrite32(ioread32(ctrl) & ~IMS_VECTOR_CTRL_MASK, ctrl);
+ iowrite32_and_flush(ioread32(ctrl) & ~IMS_CTRL_VECTOR_MASKBIT, ctrl);
}

static void ims_array_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
@@ -44,8 +49,24 @@ static void ims_array_write_msi_msg(stru

iowrite32(msg->address_lo, &slot->address_lo);
iowrite32(msg->address_hi, &slot->address_hi);
- iowrite32(msg->data, &slot->data);
- ioread32(slot);
+ iowrite32_and_flush(msg->data, &slot->data);
+}
+
+static int ims_array_set_auxdata(struct irq_data *data, unsigned int which,
+ u64 auxval)
+{
+ struct msi_desc *desc = irq_data_get_msi_desc(data);
+ struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
+ u32 val, __iomem *ctrl = &slot->ctrl;
+
+ if (which != IMS_AUXDATA_CONTROL_WORD)
+ return -EINVAL;
+ if (auxval & ~(u64)IMS_CONTROL_WORD_AUXMASK)
+ return -EINVAL;
+
+ val = ioread32(ctrl) & IMS_CONTROL_WORD_IRQMASK;
+ iowrite32_and_flush(val | (u32) auxval, ctrl);
+ return 0;
}

static const struct irq_chip ims_array_msi_controller = {
@@ -53,6 +74,7 @@ static const struct irq_chip ims_array_m
.irq_mask = ims_array_mask_irq,
.irq_unmask = ims_array_unmask_irq,
.irq_write_msi_msg = ims_array_write_msi_msg,
+ .irq_set_auxdata = ims_array_set_auxdata,
.irq_retrigger = irq_chip_retrigger_hierarchy,
.flags = IRQCHIP_SKIP_SET_WAKE,
};
@@ -62,7 +84,7 @@ static void ims_array_reset_slot(struct
iowrite32(0, &slot->address_lo);
iowrite32(0, &slot->address_hi);
iowrite32(0, &slot->data);
- iowrite32(0, &slot->ctrl);
+ iowrite32_and_flush(IMS_CTRL_VECTOR_MASKBIT, &slot->ctrl);
}

static void ims_array_free_msi_store(struct irq_domain *domain,
@@ -97,6 +119,7 @@ static int ims_array_alloc_msi_store(str
goto fail;
set_bit(idx, ims->map);
entry->device_msi.priv_iomem = &ims->info.slots[idx];
+ ims_array_reset_slot(entry->device_msi.priv_iomem);
entry->device_msi.hwirq = idx;
}
return 0;
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -487,6 +487,8 @@ extern int irq_get_irqchip_state(unsigne
extern int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
bool state);

+int irq_set_auxdata(unsigned int irq, unsigned int which, u64 val);
+
#ifdef CONFIG_IRQ_FORCED_THREADING
# ifdef CONFIG_PREEMPT_RT
# define force_irqthreads (true)
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -481,6 +481,8 @@ static inline irq_hw_number_t irqd_to_hw
* irq_request_resources
* @irq_compose_msi_msg: optional to compose message content for MSI
* @irq_write_msi_msg: optional to write message content for MSI
+ * @irq_set_auxdata: Optional function to update auxiliary data e.g. in
+ * shared registers
* @irq_get_irqchip_state: return the internal state of an interrupt
* @irq_set_irqchip_state: set the internal state of a interrupt
* @irq_set_vcpu_affinity: optional to target a vCPU in a virtual machine
@@ -528,6 +530,8 @@ struct irq_chip {
void (*irq_compose_msi_msg)(struct irq_data *data, struct msi_msg *msg);
void (*irq_write_msi_msg)(struct irq_data *data, struct msi_msg *msg);

+ int (*irq_set_auxdata)(struct irq_data *data, unsigned int which, u64 auxval);
+
int (*irq_get_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool *state);
int (*irq_set_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool state);

--- a/include/linux/irqchip/irq-ims-msi.h
+++ b/include/linux/irqchip/irq-ims-msi.h
@@ -5,6 +5,7 @@
#define _LINUX_IRQCHIP_IRQ_IMS_MSI_H

#include <linux/types.h>
+#include <linux/bits.h>

/**
* ims_hw_slot - The hardware layout of an IMS based MSI message
@@ -23,8 +24,30 @@ struct ims_slot {
u32 ctrl;
} __packed;

+/*
+ * The IMS control word utilizes bit 0-2 for interrupt control. The remaining
+ * bits can contain auxiliary data.
+ */
+#define IMS_CONTROL_WORD_IRQMASK GENMASK(2, 0)
+#define IMS_CONTROL_WORD_AUXMASK GENMASK(31, 3)
+
/* Bit to mask the interrupt in ims_hw_slot::ctrl */
-#define IMS_VECTOR_CTRL_MASK 0x01
+#define IMS_CTRL_VECTOR_MASKBIT BIT(0)
+
+/* Auxiliary control word data related defines */
+enum {
+ IMS_AUXDATA_CONTROL_WORD,
+};
+
+#define IMS_CTRL_PASID_ENABLE BIT(3)
+#define IMS_CTRL_PASID_SHIFT 12
+
+static inline u32 ims_ctrl_pasid_aux(unsigned int pasid, bool enable)
+{
+ u32 auxval = pasid << IMS_CTRL_PASID_SHIFT;
+
+ return enable ? auxval | IMS_CTRL_PASID_ENABLE : auxval;
+}

/**
* struct ims_array_info - Information to create an IMS array domain
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -2752,3 +2752,35 @@ int irq_set_irqchip_state(unsigned int i
return err;
}
EXPORT_SYMBOL_GPL(irq_set_irqchip_state);
+
+/**
+ * irq_set_auxdata - Set auxiliary data
+ * @irq: Interrupt to update
+ * @which: Selector which data to update
+ * @auxval: Auxiliary data value
+ *
+ * Function to update auxiliary data for an interrupt, e.g. to update data
+ * which is stored in a shared register or data storage (e.g. IMS).
+ */
+int irq_set_auxdata(unsigned int irq, unsigned int which, u64 val)
+{
+ struct irq_desc *desc;
+ struct irq_data *data;
+ unsigned long flags;
+ int res = -ENODEV;
+
+ desc = irq_get_desc_buslock(irq, &flags, 0);
+ if (!desc)
+ return -EINVAL;
+
+ for (data = &desc->irq_data; data; data = irqd_get_parent_data(data)) {
+ if (data->chip->irq_set_auxdata) {
+ res = data->chip->irq_set_auxdata(data, which, val);
+ break;
+ }
+ }
+
+ irq_put_desc_busunlock(desc, flags);
+ return res;
+}
+EXPORT_SYMBOL_GPL(irq_set_auxdata);

2020-10-09 18:34:40

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] dmaengine: idxd: ims setup for the vdcm

On Thu, Oct 08, 2020 at 06:22:31PM -0700, Raj, Ashok wrote:

> Not randomly put there Jason :-).. There is a good reason for it.

Sure the PASID value being associated with the IRQ make sense, but
combining that register with the interrupt mask is just a compltely
random thing to do.

If this HW was using MSI-X PASID would have been given its own
register.

Jason

2020-10-09 21:15:35

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] dmaengine: idxd: ims setup for the vdcm

On Fri, Oct 09, 2020 at 04:44:27PM +0200, Thomas Gleixner wrote:
> > This is really not that different from what I was describing for queue
> > contexts - the queue context needs to be assigned to the irq # before
> > it can be used in the irq chip other wise there is no idea where to
> > write the msg to. Just like pasid here.
>
> Not really. In the IDXD case the storage is known when the host device
> and the irq domain is initialized which is not the case for your variant
> and it neither needs to send a magic command to the device to update the
> data.

I mean, needing the PASID vs needing the memory address before the IRQ
can be use are basically the same issue. Data needs to be attached to
the IRQ before it can be programmed.. In this case programming with
the wrong PASID could lead to a security issue.

> All the IDXD driver has to do is:
>
> auxval = ims_ctrl_pasid_aux(pasid, enabled);
> irq_set_auxdata(irqnr, IMS_AUXDATA_CONTROL_WORD, auxval);
>
> I agree that irq_set_auxdata() is not the most elegant thing, but the
> alternative solutions I looked at are just worse.

It seems reasonable, but quite an obfuscated way to tell a driver they
need to hold irq_get_desc_buslock() when touching data shared with the
irqchip ops.. Not that I have a better suggestion

Jason