2015-05-25 05:37:59

by Wu, Feng

Subject: [v7 0/8] Add VT-d Posted-Interrupts support - IOMMU part

VT-d Posted-Interrupts is an enhancement to CPU-side Posted-Interrupts.
With VT-d Posted-Interrupts enabled, external interrupts from
directly assigned devices can be delivered to guests without VMM
intervention while the guest is running in non-root mode.

The VT-d Posted-Interrupts specification can be found at the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

This series was part of http://thread.gmane.org/gmane.linux.kernel.iommu/7708. To make things clearer, the IOMMU part is sent out separately here.

This patch set is based on the latest x86/apic branch of the tip tree.

The whole series, which contains multiple components, is divided into three parts:
- Prerequisite changes to the irq subsystem (already merged into the x86/apic branch of the tip tree)
- IOMMU part (this series)
- KVM and VFIO parts (to be sent out once the first two parts are accepted)

v6->v7:
* Add a static inline helper function, set_irq_posting_cap(), to set
the PI capability.
* Add some comments for the new member "ir_data->irte_pi_entry".

v5->v6:
* Extend 'struct irte' for VT-d Posted-Interrupts, combine remapped
and posted mode into one irte structure.

v4->v5:
* Abstract modify_irte() to accept two formats of IRTE.

v3->v4:
* Change capability to an int flags variable instead of a function call.
* Handle the hotplug case for VT-d PI.

Feng Wu (7):
iommu: Add new member capability to struct irq_remap_ops
iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
iommu, x86: Add cap_pi_support() to detect VT-d PI capability
iommu, x86: Setup Posted-Interrupts capability for Intel iommu
iommu, x86: define irq_remapping_cap()
iommu, x86: Properly handler PI for IOMMU hotplug

Thomas Gleixner (1):
iommu: dmar: Extend struct irte for VT-d Posted-Interrupts

arch/x86/include/asm/irq_remapping.h | 11 +++++
drivers/iommu/intel_irq_remapping.c | 84 +++++++++++++++++++++++++++++++++++-
drivers/iommu/irq_remapping.c | 11 +++++
drivers/iommu/irq_remapping.h | 6 +++
include/linux/dmar.h | 70 +++++++++++++++++++++++-------
include/linux/intel-iommu.h | 1 +
6 files changed, 167 insertions(+), 16 deletions(-)

--
2.1.0


2015-05-25 05:40:10

by Wu, Feng

Subject: [v7 1/8] iommu: Add new member capability to struct irq_remap_ops

This patch adds a new member, capability, to struct irq_remap_ops.
It can be used to check whether certain features, such as VT-d
Posted-Interrupts, are supported.

Signed-off-by: Feng Wu <[email protected]>
Reviewed-by: Jiang Liu <[email protected]>
---
arch/x86/include/asm/irq_remapping.h | 4 ++++
drivers/iommu/irq_remapping.h | 3 +++
2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 78974fb..0953723 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -31,6 +31,10 @@ struct irq_alloc_info;

#ifdef CONFIG_IRQ_REMAP

+enum irq_remap_cap {
+ IRQ_POSTING_CAP = 0,
+};
+
extern void set_irq_remapping_broken(void);
extern int irq_remapping_prepare(void);
extern int irq_remapping_enable(void);
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 91d5a11..b6ca30d 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -35,6 +35,9 @@ extern int no_x2apic_optout;
extern int irq_remapping_enabled;

struct irq_remap_ops {
+ /* The supported capabilities */
+ int capability;
+
/* Initializes hardware and makes it ready for remapping interrupts */
int (*prepare)(void);

--
2.1.0

2015-05-25 05:38:07

by Wu, Feng

Subject: [v7 2/8] iommu: dmar: Extend struct irte for VT-d Posted-Interrupts

From: Thomas Gleixner <[email protected]>

The IRTE (Interrupt Remapping Table Entry) is either an entry for
remapped or for posted interrupts. The hardware distinguishes between
remapped and posted entries by bit 15 in the low 64 bits of the
IRTE: if it is cleared, the entry is remapped; if it is set, the entry is posted.

The two formats share some common fields, while other fields have
different meanings depending on the posted bit.

Extend struct irte to handle the differences between remap and posted
mode by having three structs in the unions:

- Shared
- Remapped
- Posted
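As a rough illustration of the layout described above, the sketch below decodes a raw IRTE with explicit shifts and masks instead of the bitfields in struct irte (bitfield ordering is implementation-defined in C; the kernel relies on GCC's LSB-first allocation on x86). The helper names and IRTE_* constants are hypothetical; the bit positions are taken from the field comments in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the struct irte comments (hypothetical constant names). */
#define IRTE_PST_BIT      15	/* low word: 1 = posted, 0 = remapped */
#define IRTE_VECTOR_SHIFT 16	/* low word bits 16-23 */
#define IRTE_PDA_L_SHIFT  38	/* low word bits 38-63 hold address bits 6-31 */
#define IRTE_PDA_L_BITS   26

/* Does the low 64-bit word describe a posted entry? */
static inline bool irte_is_posted(uint64_t low)
{
	return (low >> IRTE_PST_BIT) & 1;
}

/* Vector field, shared by both formats (bits 16-23). */
static inline uint8_t irte_vector(uint64_t low)
{
	return (low >> IRTE_VECTOR_SHIFT) & 0xff;
}

/* Rebuild the PI descriptor address from pda_l (low word) and pda_h
 * (high word bits 32-63, i.e. IRTE bits 96-127). */
static inline uint64_t irte_pi_desc_addr(uint64_t low, uint64_t high)
{
	uint64_t pda_l = (low >> IRTE_PDA_L_SHIFT) &
			 ((1ULL << IRTE_PDA_L_BITS) - 1);
	uint64_t pda_h = high >> 32;

	return (pda_h << 32) | (pda_l << (32 - IRTE_PDA_L_BITS));
}
```

Shifts and masks make the bit numbering explicit and portable, at the cost of being more verbose than the kernel's bitfields.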

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Feng Wu <[email protected]>
---
include/linux/dmar.h | 70 +++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 55 insertions(+), 15 deletions(-)

diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 8473756..0dbcabc 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -185,33 +185,73 @@ static inline int dmar_device_remove(void *handle)

struct irte {
union {
+ /* Shared between remapped and posted mode */
struct {
- __u64 present : 1,
- fpd : 1,
- dst_mode : 1,
- redir_hint : 1,
- trigger_mode : 1,
- dlvry_mode : 3,
- avail : 4,
- __reserved_1 : 4,
- vector : 8,
- __reserved_2 : 8,
- dest_id : 32;
+ __u64 present : 1, /* 0 */
+ fpd : 1, /* 1 */
+ __res0 : 6, /* 2 - 7 */
+ avail : 4, /* 8 - 11 */
+ __res1 : 3, /* 12 - 14 */
+ pst : 1, /* 15 */
+ vector : 8, /* 16 - 23 */
+ __res2 : 40; /* 24 - 63 */
+ };
+
+ /* Remapped mode */
+ struct {
+ __u64 r_present : 1, /* 0 */
+ r_fpd : 1, /* 1 */
+ dst_mode : 1, /* 2 */
+ redir_hint : 1, /* 3 */
+ trigger_mode : 1, /* 4 */
+ dlvry_mode : 3, /* 5 - 7 */
+ r_avail : 4, /* 8 - 11 */
+ r_res0 : 4, /* 12 - 15 */
+ r_vector : 8, /* 16 - 23 */
+ r_res1 : 8, /* 24 - 31 */
+ dest_id : 32; /* 32 - 63 */
+ };
+
+ /* Posted mode */
+ struct {
+ __u64 p_present : 1, /* 0 */
+ p_fpd : 1, /* 1 */
+ p_res0 : 6, /* 2 - 7 */
+ p_avail : 4, /* 8 - 11 */
+ p_res1 : 2, /* 12 - 13 */
+ p_urgent : 1, /* 14 */
+ p_pst : 1, /* 15 */
+ p_vector : 8, /* 16 - 23 */
+ p_res2 : 14, /* 24 - 37 */
+ pda_l : 26; /* 38 - 63 */
};
__u64 low;
};

union {
+ /* Shared between remapped and posted mode */
struct {
- __u64 sid : 16,
- sq : 2,
- svt : 2,
- __reserved_3 : 44;
+ __u64 sid : 16, /* 64 - 79 */
+ sq : 2, /* 80 - 81 */
+ svt : 2, /* 82 - 83 */
+ __res3 : 44; /* 84 - 127 */
+ };
+
+ /* Posted mode */
+ struct {
+ __u64 p_sid : 16, /* 64 - 79 */
+ p_sq : 2, /* 80 - 81 */
+ p_svt : 2, /* 82 - 83 */
+ p_res3 : 12, /* 84 - 95 */
+ pda_h : 32; /* 96 - 127 */
};
__u64 high;
};
};

+#define PDA_LOW_BIT 26
+#define PDA_HIGH_BIT 32
+
enum {
IRQ_REMAP_XAPIC_MODE,
IRQ_REMAP_X2APIC_MODE,
--
2.1.0

2015-05-25 05:38:09

by Wu, Feng

Subject: [v7 3/8] iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip

Implement irq_set_vcpu_affinity for intel_ir_chip.
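The address handling in intel_ir_set_vcpu_affinity() stores bits 6-31 of the PI descriptor address in pda_l and bits 32-63 in pda_h. A minimal sketch of that split and its inverse, assuming a 64-byte-aligned descriptor (bits 0-5 have no field, so they are dropped). struct pda_fields, pda_split() and pda_join() are hypothetical names; 64-bit constants replace -1UL so the masks do not depend on the width of unsigned long.

```c
#include <stdint.h>

#define PDA_LOW_BIT  26	/* width of pda_l, as defined in this series */
#define PDA_HIGH_BIT 32	/* width of pda_h */

/* Hypothetical stand-in for the two posted-mode address fields. */
struct pda_fields {
	uint32_t pda_l;	/* descriptor address bits 6-31 */
	uint32_t pda_h;	/* descriptor address bits 32-63 */
};

/* Same shift-and-mask as the patch, using unsigned 64-bit masks. */
static inline struct pda_fields pda_split(uint64_t pi_desc_addr)
{
	struct pda_fields f;

	f.pda_l = (pi_desc_addr >> (32 - PDA_LOW_BIT)) &
		  ~(~0ULL << PDA_LOW_BIT);
	f.pda_h = (pi_desc_addr >> 32) & ~(~0ULL << PDA_HIGH_BIT);
	return f;
}

/* Inverse: address bits 0-5 come back as zero. */
static inline uint64_t pda_join(struct pda_fields f)
{
	return ((uint64_t)f.pda_h << 32) |
	       ((uint64_t)f.pda_l << (32 - PDA_LOW_BIT));
}
```

A round trip through pda_split() and pda_join() returns the original address only if it is 64-byte aligned; otherwise the low 6 bits are lost.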

Signed-off-by: Feng Wu <[email protected]>
Reviewed-by: Jiang Liu <[email protected]>
Acked-by: David Woodhouse <[email protected]>
---
arch/x86/include/asm/irq_remapping.h | 5 ++++
drivers/iommu/intel_irq_remapping.c | 46 ++++++++++++++++++++++++++++++++++++
2 files changed, 51 insertions(+)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 0953723..202e040 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -57,6 +57,11 @@ static inline struct irq_domain *arch_get_ir_parent_domain(void)
return x86_vector_domain;
}

+struct vcpu_data {
+ u64 pi_desc_addr; /* Physical address of PI Descriptor */
+ u32 vector; /* Guest vector of the interrupt */
+};
+
#else /* CONFIG_IRQ_REMAP */

static inline void set_irq_remapping_broken(void) { }
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 8fad71c..1955b09 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -42,6 +42,7 @@ struct irq_2_iommu {
struct intel_ir_data {
struct irq_2_iommu irq_2_iommu;
struct irte irte_entry;
+ struct irte irte_pi_entry;
union {
struct msi_msg msi_entry;
};
@@ -1013,10 +1014,55 @@ static void intel_ir_compose_msi_msg(struct irq_data *irq_data,
*msg = ir_data->msi_entry;
}

+static int intel_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
+{
+ struct intel_ir_data *ir_data = data->chip_data;
+ struct irte *irte_pi = &ir_data->irte_pi_entry;
+ struct vcpu_data *vcpu_pi_info;
+
+ /* stop posting interrupts, back to remapping mode */
+ if (!vcpu_info) {
+ modify_irte(&ir_data->irq_2_iommu, &ir_data->irte_entry);
+ } else {
+ vcpu_pi_info = (struct vcpu_data *)vcpu_info;
+
+ /*
+ * "ir_data->irte_entry" holds the remapped format of the IRTE.
+ * As a cached IRTE, it is still updated when the affinity is
+ * set, even in posted mode. This makes it possible to switch
+ * back from posted mode to remapped mode simply by writing
+ * "ir_data->irte_entry" to the hardware. The posted format of
+ * the IRTE is therefore stored in a separate new member,
+ * "ir_data->irte_pi_entry", so that "ir_data->irte_entry" is
+ * not corrupted.
+ */
+ memcpy(irte_pi, &ir_data->irte_entry, sizeof(struct irte));
+
+ irte_pi->p_urgent = 0;
+ irte_pi->p_vector = vcpu_pi_info->vector;
+ irte_pi->pda_l = (vcpu_pi_info->pi_desc_addr >>
+ (32 - PDA_LOW_BIT)) & ~(-1UL << PDA_LOW_BIT);
+ irte_pi->pda_h = (vcpu_pi_info->pi_desc_addr >> 32) &
+ ~(-1UL << PDA_HIGH_BIT);
+
+ irte_pi->p_res0 = 0;
+ irte_pi->p_res1 = 0;
+ irte_pi->p_res2 = 0;
+ irte_pi->p_res3 = 0;
+
+ irte_pi->p_pst = 1;
+
+ modify_irte(&ir_data->irq_2_iommu, irte_pi);
+ }
+
+ return 0;
+}
+
static struct irq_chip intel_ir_chip = {
.irq_ack = ir_ack_apic_edge,
.irq_set_affinity = intel_ir_set_affinity,
.irq_compose_msi_msg = intel_ir_compose_msi_msg,
+ .irq_set_vcpu_affinity = intel_ir_set_vcpu_affinity,
};

static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
--
2.1.0

2015-05-25 05:39:48

by Wu, Feng

Subject: [v7 4/8] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

We don't need to migrate the irqs for VT-d Posted-Interrupts here.
When 'pst' is set in the IRTE, the associated irq is posted to the
guest instead of being remapped. The destination of the
interrupt is set in the Posted-Interrupts Descriptor, and migration
happens during vCPU scheduling.

However, we still update the cached IRTE here, so that it can be used
when switching back to remapping mode.

Signed-off-by: Feng Wu <[email protected]>
Reviewed-by: Jiang Liu <[email protected]>
Acked-by: David Woodhouse <[email protected]>
---
drivers/iommu/intel_irq_remapping.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 1955b09..646f4cf 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -994,7 +994,10 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
*/
irte->vector = cfg->vector;
irte->dest_id = IRTE_DEST(cfg->dest_apicid);
- modify_irte(&ir_data->irq_2_iommu, irte);
+
+ /* We don't need to modify irte if the interrupt is for posting. */
+ if (irte->pst != 1)
+ modify_irte(&ir_data->irq_2_iommu, irte);

/*
* After this point, all the interrupts will start arriving
--
2.1.0

2015-05-25 05:39:27

by Wu, Feng

Subject: [v7 5/8] iommu, x86: Add cap_pi_support() to detect VT-d PI capability

Add a helper function to detect the VT-d Posted-Interrupts capability.
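The new macro extracts bit 59 of the 64-bit Capability Register value (iommu->cap). A typed sketch of the same decoding, with hypothetical function and constant names, shows how a raw register value maps to the flag; taking a uint64_t parameter guarantees the shift by 59 is performed on a 64-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the "Decoding Capability Register" macros. */
#define CAP_PI_BIT         59	/* Posted-Interrupts support (this patch) */
#define CAP_READ_DRAIN_BIT 55	/* existing cap_read_drain() */

/* Typed equivalent of cap_pi_support(). */
static inline bool cap_pi_supported(uint64_t cap)
{
	return (cap >> CAP_PI_BIT) & 1;
}

/* Typed equivalent of cap_read_drain(). */
static inline bool cap_read_drain_supported(uint64_t cap)
{
	return (cap >> CAP_READ_DRAIN_BIT) & 1;
}
```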

Signed-off-by: Feng Wu <[email protected]>
Reviewed-by: Jiang Liu <[email protected]>
Acked-by: David Woodhouse <[email protected]>
---
include/linux/intel-iommu.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 0af9b03..0c251be 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -87,6 +87,7 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
/*
* Decoding Capability Register
*/
+#define cap_pi_support(c) (((c) >> 59) & 1)
#define cap_read_drain(c) (((c) >> 55) & 1)
#define cap_write_drain(c) (((c) >> 54) & 1)
#define cap_max_amask_val(c) (((c) >> 48) & 0x3f)
--
2.1.0

2015-05-25 05:38:12

by Wu, Feng

Subject: [v7 6/8] iommu, x86: Setup Posted-Interrupts capability for Intel iommu

Set the Posted-Interrupts capability for the Intel IOMMU when IR is
enabled, and clear it when IR is disabled.
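set_irq_posting_cap() first sets the IRQ_POSTING_CAP bit and then clears it if any IOMMU unit lacks PI support, so posting is advertised only when every unit can do it. A user-space model of that logic, in which posting_capability() and the caps[] array are hypothetical stand-ins for the function and the units visited by for_each_iommu():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IRQ_POSTING_CAP 0			/* bit index, as in enum irq_remap_cap */
#define cap_pi_support(c) (((c) >> 59) & 1)	/* from include/linux/intel-iommu.h */

/* Returns the updated capability mask; other bits are preserved. */
static int posting_capability(int capability, bool disable_irq_post,
			      const uint64_t *caps, size_t n)
{
	size_t i;

	if (disable_irq_post)
		return capability;	/* left untouched, as in the patch */

	capability |= 1 << IRQ_POSTING_CAP;
	for (i = 0; i < n; i++) {
		if (!cap_pi_support(caps[i])) {
			/* One unit without PI disables posting for all. */
			capability &= ~(1 << IRQ_POSTING_CAP);
			break;
		}
	}
	return capability;
}
```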

Signed-off-by: Feng Wu <[email protected]>
---
drivers/iommu/intel_irq_remapping.c | 30 ++++++++++++++++++++++++++++++
drivers/iommu/irq_remapping.c | 2 ++
drivers/iommu/irq_remapping.h | 3 +++
3 files changed, 35 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 646f4cf..9f7f378 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -572,6 +572,26 @@ error:
return -ENODEV;
}

+/*
+ * Set Posted-Interrupts capability.
+ */
+static inline void set_irq_posting_cap(void)
+{
+ struct dmar_drhd_unit *drhd;
+ struct intel_iommu *iommu;
+
+ if (!disable_irq_post) {
+ intel_irq_remap_ops.capability |= 1 << IRQ_POSTING_CAP;
+
+ for_each_iommu(iommu, drhd)
+ if (!cap_pi_support(iommu->cap)) {
+ intel_irq_remap_ops.capability &=
+ ~(1 << IRQ_POSTING_CAP);
+ break;
+ }
+ }
+}
+
static int __init intel_enable_irq_remapping(void)
{
struct dmar_drhd_unit *drhd;
@@ -647,6 +667,8 @@ static int __init intel_enable_irq_remapping(void)

irq_remapping_enabled = 1;

+ set_irq_posting_cap();
+
pr_info("Enabled IRQ remapping in %s mode\n", eim ? "x2apic" : "xapic");

return eim ? IRQ_REMAP_X2APIC_MODE : IRQ_REMAP_XAPIC_MODE;
@@ -847,6 +869,12 @@ static void disable_irq_remapping(void)

iommu_disable_irq_remapping(iommu);
}
+
+ /*
+ * Clear Posted-Interrupts capability.
+ */
+ if (!disable_irq_post)
+ intel_irq_remap_ops.capability &= ~(1 << IRQ_POSTING_CAP);
}

static int reenable_irq_remapping(int eim)
@@ -874,6 +902,8 @@ static int reenable_irq_remapping(int eim)
if (!setup)
goto error;

+ set_irq_posting_cap();
+
return 0;

error:
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index fc78b0d..ed605a9 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -22,6 +22,8 @@ int irq_remap_broken;
int disable_sourceid_checking;
int no_x2apic_optout;

+int disable_irq_post = 1;
+
static int disable_irq_remap;
static struct irq_remap_ops *remap_ops;

diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index b6ca30d..039c7af 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -34,6 +34,8 @@ extern int disable_sourceid_checking;
extern int no_x2apic_optout;
extern int irq_remapping_enabled;

+extern int disable_irq_post;
+
struct irq_remap_ops {
/* The supported capabilities */
int capability;
@@ -69,6 +71,7 @@ extern void ir_ack_apic_edge(struct irq_data *data);

#define irq_remapping_enabled 0
#define irq_remap_broken 0
+#define disable_irq_post 1

#endif /* CONFIG_IRQ_REMAP */

--
2.1.0

2015-05-25 05:38:15

by Wu, Feng

Subject: [v7 7/8] iommu, x86: define irq_remapping_cap()

This patch adds a new interface, irq_remapping_cap(), to detect
whether irq remapping supports new features, such as VT-d
Posted-Interrupts. We export this function so that KVM
code can check for the capability and use the mechanism properly.
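irq_remapping_cap() reports a capability only when remapping ops are registered, posting has not been disabled, and the requested bit is set in remap_ops->capability. A self-contained sketch with trimmed-down stand-ins for the kernel types and globals (the struct here keeps only the new member). Note that disable_irq_post is initialized to 1 in this series, so posting is reported as unavailable until that flag is cleared.

```c
#include <stdbool.h>
#include <stddef.h>

enum irq_remap_cap {
	IRQ_POSTING_CAP = 0,
};

/* Stand-in for struct irq_remap_ops: only the capability member. */
struct irq_remap_ops {
	int capability;
};

/* Stand-ins for the globals in drivers/iommu/irq_remapping.c. */
static struct irq_remap_ops *remap_ops;
static int disable_irq_post = 1;

static bool irq_remapping_cap(enum irq_remap_cap cap)
{
	if (!remap_ops || disable_irq_post)
		return false;

	return remap_ops->capability & (1 << cap);
}
```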

Signed-off-by: Feng Wu <[email protected]>
Reviewed-by: Jiang Liu <[email protected]>
---
arch/x86/include/asm/irq_remapping.h | 2 ++
drivers/iommu/irq_remapping.c | 9 +++++++++
2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 202e040..61aa8ad 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -35,6 +35,7 @@ enum irq_remap_cap {
IRQ_POSTING_CAP = 0,
};

+extern bool irq_remapping_cap(enum irq_remap_cap cap);
extern void set_irq_remapping_broken(void);
extern int irq_remapping_prepare(void);
extern int irq_remapping_enable(void);
@@ -64,6 +65,7 @@ struct vcpu_data {

#else /* CONFIG_IRQ_REMAP */

+static inline bool irq_remapping_cap(enum irq_remap_cap cap) { return false; }
static inline void set_irq_remapping_broken(void) { }
static inline int irq_remapping_prepare(void) { return -ENODEV; }
static inline int irq_remapping_enable(void) { return -ENODEV; }
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index ed605a9..2d99930 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -81,6 +81,15 @@ void set_irq_remapping_broken(void)
irq_remap_broken = 1;
}

+bool irq_remapping_cap(enum irq_remap_cap cap)
+{
+ if (!remap_ops || disable_irq_post)
+ return 0;
+
+ return (remap_ops->capability & (1 << cap));
+}
+EXPORT_SYMBOL_GPL(irq_remapping_cap);
+
int __init irq_remapping_prepare(void)
{
if (disable_irq_remap)
--
2.1.0

2015-05-25 05:38:18

by Wu, Feng

Subject: [v7 8/8] iommu, x86: Properly handler PI for IOMMU hotplug

Return an error when inserting a new IOMMU that doesn't support PI
while PI is in use.
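The hotplug check reduces to a single predicate: refuse a unit without PI support while posting is advertised. A sketch isolating it, where check_hotplug_pi() is a hypothetical helper and posting_in_use stands in for irq_remapping_cap(IRQ_POSTING_CAP):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define cap_pi_support(c) (((c) >> 59) & 1)	/* from include/linux/intel-iommu.h */

/* Returns 0 if the unit may be hot-added, -EBUSY otherwise. */
static int check_hotplug_pi(bool posting_in_use, uint64_t new_unit_cap)
{
	if (posting_in_use && !cap_pi_support(new_unit_cap))
		return -EBUSY;	/* PI is in use but this unit cannot post */
	return 0;
}
```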

Signed-off-by: Feng Wu <[email protected]>
---
drivers/iommu/intel_irq_remapping.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 9f7f378..79ca56e 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1354,6 +1354,9 @@ int dmar_ir_hotplug(struct dmar_drhd_unit *dmaru, bool insert)
return -EINVAL;
if (!ecap_ir_support(iommu->ecap))
return 0;
+ if (irq_remapping_cap(IRQ_POSTING_CAP) &&
+ !cap_pi_support(iommu->cap))
+ return -EBUSY;

if (insert) {
if (!iommu->ir_table)
--
2.1.0

2015-05-25 08:38:33

by Thomas Gleixner

Subject: Re: [v7 4/8] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

On Mon, 25 May 2015, Feng Wu wrote:

> We don't need to migrate the irqs for VT-d Posted-Interrupts here.
> When 'pst' is set in IRTE, the associated irq will be posted to
> guests instead of interrupt remapping. The destination of the
> interrupt is set in Posted-Interrupts Descriptor, and the migration
> happens during vCPU scheduling.
>
> However, we still update the cached irte here, which can be used
> when changing back to remapping mode.
>
> Signed-off-by: Feng Wu <[email protected]>
> Reviewed-by: Jiang Liu <[email protected]>
> Acked-by: David Woodhouse <[email protected]>
> ---
> drivers/iommu/intel_irq_remapping.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> index 1955b09..646f4cf 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -994,7 +994,10 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
> */
> irte->vector = cfg->vector;
> irte->dest_id = IRTE_DEST(cfg->dest_apicid);
> - modify_irte(&ir_data->irq_2_iommu, irte);
> +
> + /* We don't need to modify irte if the interrupt is for posting. */
> + if (irte->pst != 1)
> + modify_irte(&ir_data->irq_2_iommu, irte);

I don't think this is correct. ir_data->irte_entry contains the non
posted version, which has pst == 0.

You need some other way to store whether you are in posted mode or
not.
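The problem can be reproduced with a small model. After the switch to posted mode the hardware holds the posted entry, but the v7 intel_ir_set_affinity() tests pst on the cached remapped copy, which is always 0, so the next affinity change overwrites the posted entry. The types and functions below are hypothetical simplifications that keep only pst and the vector.

```c
#include <stdbool.h>

/* Minimal IRTE model: only the posted bit and a vector. */
struct irte_model {
	unsigned int pst;
	unsigned int vector;
};

struct ir_data_model {
	struct irte_model irte_entry;		/* cached remapped format, pst == 0 */
	struct irte_model irte_pi_entry;	/* posted format */
	struct irte_model hw;			/* entry currently in the IOMMU table */
};

/* Like patch 3: build the posted copy and write it to hardware. */
static void go_posted(struct ir_data_model *d, unsigned int guest_vector)
{
	d->irte_pi_entry = d->irte_entry;
	d->irte_pi_entry.vector = guest_vector;
	d->irte_pi_entry.pst = 1;
	d->hw = d->irte_pi_entry;
}

/* Like patch 4: the pst test is made on the cached entry. */
static void set_affinity_v7(struct ir_data_model *d, unsigned int host_vector)
{
	struct irte_model *irte = &d->irte_entry;

	irte->vector = host_vector;
	if (irte->pst != 1)	/* always true: the cache is never posted */
		d->hw = *irte;	/* clobbers the posted entry */
}
```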

Thanks,

tglx

2015-05-26 02:53:16

by Wu, Feng

Subject: RE: [v7 4/8] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts



> -----Original Message-----
> From: Thomas Gleixner [mailto:[email protected]]
> Sent: Monday, May 25, 2015 4:38 PM
> To: Wu, Feng
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [v7 4/8] iommu, x86: No need to migrating irq for VT-d
> Posted-Interrupts
>
> On Mon, 25 May 2015, Feng Wu wrote:
>
> > We don't need to migrate the irqs for VT-d Posted-Interrupts here.
> > When 'pst' is set in IRTE, the associated irq will be posted to
> > guests instead of interrupt remapping. The destination of the
> > interrupt is set in Posted-Interrupts Descriptor, and the migration
> > happens during vCPU scheduling.
> >
> > However, we still update the cached irte here, which can be used
> > when changing back to remapping mode.
> >
> > Signed-off-by: Feng Wu <[email protected]>
> > Reviewed-by: Jiang Liu <[email protected]>
> > Acked-by: David Woodhouse <[email protected]>
> > ---
> > drivers/iommu/intel_irq_remapping.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/intel_irq_remapping.c
> b/drivers/iommu/intel_irq_remapping.c
> > index 1955b09..646f4cf 100644
> > --- a/drivers/iommu/intel_irq_remapping.c
> > +++ b/drivers/iommu/intel_irq_remapping.c
> > @@ -994,7 +994,10 @@ intel_ir_set_affinity(struct irq_data *data, const
> struct cpumask *mask,
> > */
> > irte->vector = cfg->vector;
> > irte->dest_id = IRTE_DEST(cfg->dest_apicid);
> > - modify_irte(&ir_data->irq_2_iommu, irte);
> > +
> > + /* We don't need to modify irte if the interrupt is for posting. */
> > + if (irte->pst != 1)
> > + modify_irte(&ir_data->irq_2_iommu, irte);
>
> I don't think this is correct. ir_data->irte_entry contains the non
> posted version, which has pst == 0.
>
> You need some other way to store whether you are in posted mode or
> not.

Yes, this seems incorrect. Thank you for pointing it out. After thinking
about it more, I think I can do it this way:
#1. Check the 'pst' field in hardware.
#2. If 'pst' is 1, don't update the IRTE in hardware.

However, the check and the update must then be protected by the same
spinlock, 'irq_2_ir_lock'; otherwise, a race condition may occur.

Based on this idea, I have two possible solutions. Which one do you think
is better, or do you have a better suggestion? Any comments would be
highly appreciated!

Solution 1:
Introduce a new function, test_and_modify_irte(), which intel_ir_set_affinity()
calls in place of the original modify_irte().
Here are the changes:

+static int test_and_modify_irte(struct irq_2_iommu *irq_iommu,
+ struct irte *irte_modified)
+{
+ struct intel_iommu *iommu;
+ unsigned long flags;
+ struct irte *irte;
+ int rc = 0, index;
+
+ if (!irq_iommu)
+ return -1;
+
+ raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
+
+ iommu = irq_iommu->iommu;
+
+ index = irq_iommu->irte_index + irq_iommu->sub_handle;
+ irte = &iommu->ir_table->base[index];
+
+ if (irte->pst)
+ goto unlock;
+
+ set_64bit(&irte->low, irte_modified->low);
+ set_64bit(&irte->high, irte_modified->high);
+ __iommu_flush_cache(iommu, irte, sizeof(*irte));
+
+ rc = qi_flush_iec(iommu, index, 0);
+unlock:
+ raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);
+
+ return rc;
+}
+

Solution 2:
Instead of introducing a new function, add a flag to the original modify_irte()
to indicate whether it should check 'pst' and return before updating the real
hardware, and pass true for return_on_pst in intel_ir_set_affinity().
Here are the changes:
static int modify_irte(struct irq_2_iommu *irq_iommu,
- struct irte *irte_modified)
+ struct irte *irte_modified,
+ bool return_on_pst)
{
struct intel_iommu *iommu;
unsigned long flags;
@@ -140,11 +173,15 @@ static int modify_irte(struct irq_2_iommu *irq_iommu,
index = irq_iommu->irte_index + irq_iommu->sub_handle;
irte = &iommu->ir_table->base[index];

+ if (return_on_pst && irte->pst)
+ goto unlock;
+
set_64bit(&irte->low, irte_modified->low);
set_64bit(&irte->high, irte_modified->high);
__iommu_flush_cache(iommu, irte, sizeof(*irte));

rc = qi_flush_iec(iommu, index, 0);
+unlock:
raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);

return rc;

Thanks,
Feng

>
> Thanks,
>
> tglx

2015-05-26 10:00:52

by Thomas Gleixner

Subject: RE: [v7 4/8] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

On Tue, 26 May 2015, Wu, Feng wrote:
> > On Mon, 25 May 2015, Feng Wu wrote:
> > > +
> > > + /* We don't need to modify irte if the interrupt is for posting. */
> > > + if (irte->pst != 1)
> > > + modify_irte(&ir_data->irq_2_iommu, irte);
> >
> > I don't think this is correct. ir_data->irte_entry contains the non
> > posted version, which has pst == 0.
> >
> > You need some other way to store whether you are in posted mode or
> > not.
>
> Yes, seems this is incorrect. Thank you for pointing this out. After more
> thinking about this, I think I can do it this way:
> #1. Check the 'pst' field in hardware
> #2. If 'pst' is 1, we don't update the IRTE in hardware.
>
> However, the question is the check and update operations should be protected
> by the same spinlock ' irq_2_ir_lock ', otherwise, race condition may happen.

Why?

set_affinity() and vcpu_set_affinity() are serialized via
irq_desc->lock. And vcpu_set_affinity() is the only way to switch from
and to posted mode.

So all you need is a field in intel_ir_data which captures whether
posted is enabled or not.
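A sketch of the suggested approach as a simplified single-threaded model (no locking is modeled, since both paths are serialized by irq_desc->lock in the kernel): a posted flag, set and cleared only in the vCPU-affinity path, decides whether set_affinity may write the hardware entry, while the cached remapped entry is always refreshed. The field name posted, the model types, and the "negative vector means no vcpu_info" convention are hypothetical.

```c
#include <stdbool.h>

struct irte_model {
	unsigned int pst;
	unsigned int vector;
};

/* Model of struct intel_ir_data with the suggested flag added. */
struct ir_data_model {
	struct irte_model irte_entry;		/* cached remapped format */
	struct irte_model irte_pi_entry;	/* posted format */
	bool posted;				/* is posting enabled? */
	struct irte_model hw;			/* entry in the IOMMU table */
};

/* Models intel_ir_set_vcpu_affinity(); guest_vector < 0 means no vcpu_info. */
static void set_vcpu_affinity(struct ir_data_model *d, int guest_vector)
{
	if (guest_vector < 0) {
		d->hw = d->irte_entry;		/* back to remapped mode */
		d->posted = false;
	} else {
		d->irte_pi_entry = d->irte_entry;
		d->irte_pi_entry.vector = (unsigned int)guest_vector;
		d->irte_pi_entry.pst = 1;
		d->hw = d->irte_pi_entry;
		d->posted = true;
	}
}

/* Models intel_ir_set_affinity(): refresh the cache, write hardware only if not posted. */
static void set_affinity(struct ir_data_model *d, unsigned int host_vector)
{
	d->irte_entry.vector = host_vector;
	if (!d->posted)
		d->hw = d->irte_entry;
}
```

Because the flag lives in software, the check needs no read of the hardware table and no extra lock beyond the one already held.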

Thanks,

tglx

2015-05-26 14:02:06

by Wu, Feng

Subject: RE: [v7 4/8] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts



> -----Original Message-----
> From: Thomas Gleixner [mailto:[email protected]]
> Sent: Tuesday, May 26, 2015 6:00 PM
> To: Wu, Feng
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: RE: [v7 4/8] iommu, x86: No need to migrating irq for VT-d
> Posted-Interrupts
>
> On Tue, 26 May 2015, Wu, Feng wrote:
> > > On Mon, 25 May 2015, Feng Wu wrote:
> > > > +
> > > > + /* We don't need to modify irte if the interrupt is for posting. */
> > > > + if (irte->pst != 1)
> > > > + modify_irte(&ir_data->irq_2_iommu, irte);
> > >
> > > I don't think this is correct. ir_data->irte_entry contains the non
> > > posted version, which has pst == 0.
> > >
> > > You need some other way to store whether you are in posted mode or
> > > not.
> >
> > Yes, seems this is incorrect. Thank you for pointing this out. After more
> > thinking about this, I think I can do it this way:
> > #1. Check the 'pst' field in hardware
> > #2. If 'pst' is 1, we don't update the IRTE in hardware.
> >
> > However, the question is the check and update operations should be
> protected
> > by the same spinlock ' irq_2_ir_lock ', otherwise, race condition may happen.
>
> Why?
>
> set_affinity() and vcpu_set_affinity() are serialized via
> irq_desc->lock. And vcpu_set_affinity() is the only way to switch from
> and to posted mode.

Oh, yes, I hadn't noticed that both are serialized by that lock. In that case,
I can just add a field as you suggested below. Thanks for the comments!

Thanks,
Feng

>
> So all you need is a field in intel_irq_data which captures whether
> posted is enabled or not.
>
> Thanks,
>
> tglx