From: "Wu, Feng"
To: Thomas Gleixner
Cc: "joro@8bytes.org", "dwmw2@infradead.org", "jiang.liu@linux.intel.com",
    "iommu@lists.linux-foundation.org", "linux-kernel@vger.kernel.org",
    "Wu, Feng"
Subject: RE: [v7 4/8] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
Date: Tue, 26 May 2015 02:53:00 +0000

> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Monday, May 25, 2015 4:38 PM
> To: Wu, Feng
> Cc: joro@8bytes.org; dwmw2@infradead.org; jiang.liu@linux.intel.com;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [v7 4/8] iommu, x86: No need to migrating irq for VT-d
> Posted-Interrupts
>
> On Mon, 25 May 2015, Feng Wu wrote:
>
> > We don't need to migrate the irqs for VT-d Posted-Interrupts here.
> > When 'pst' is set in IRTE, the associated irq will be posted to
> > guests instead of interrupt remapping.
> > The destination of the interrupt is set in the Posted-Interrupts
> > Descriptor, and the migration happens during vCPU scheduling.
> >
> > However, we still update the cached irte here, which can be used
> > when changing back to remapping mode.
> >
> > Signed-off-by: Feng Wu
> > Reviewed-by: Jiang Liu
> > Acked-by: David Woodhouse
> > ---
> >  drivers/iommu/intel_irq_remapping.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> > index 1955b09..646f4cf 100644
> > --- a/drivers/iommu/intel_irq_remapping.c
> > +++ b/drivers/iommu/intel_irq_remapping.c
> > @@ -994,7 +994,10 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
> >  	 */
> >  	irte->vector = cfg->vector;
> >  	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
> > -	modify_irte(&ir_data->irq_2_iommu, irte);
> > +
> > +	/* We don't need to modify irte if the interrupt is for posting. */
> > +	if (irte->pst != 1)
> > +		modify_irte(&ir_data->irq_2_iommu, irte);
>
> I don't think this is correct. ir_data->irte_entry contains the non
> posted version, which has pst == 0.
>
> You need some other way to store whether you are in posted mode or
> not.

Yes, this does seem incorrect. Thank you for pointing it out.

After thinking about this some more, I believe it can be done this way:

#1. Check the 'pst' field in hardware.
#2. If 'pst' is 1, do not update the IRTE in hardware.

The catch is that the check and the update must be protected by the same
spinlock, 'irq_2_ir_lock'; otherwise a race condition may occur.

Based on the above idea, I have two solutions. Which one do you think is
better, or do you have another suggestion? Comments would be highly
appreciated!

Solution 1: Introduce a new function, test_and_modify_irte(), which is
called by intel_ir_set_affinity() in place of the original modify_irte().
Here are the changes:

+static int test_and_modify_irte(struct irq_2_iommu *irq_iommu,
+				struct irte *irte_modified)
+{
+	struct intel_iommu *iommu;
+	unsigned long flags;
+	struct irte *irte;
+	int rc = 0, index;	/* rc stays 0 on the posted early-exit path */
+
+	if (!irq_iommu)
+		return -1;
+
+	raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
+
+	iommu = irq_iommu->iommu;
+
+	index = irq_iommu->irte_index + irq_iommu->sub_handle;
+	irte = &iommu->ir_table->base[index];
+
+	/* The entry is posted; leave the hardware IRTE untouched. */
+	if (irte->pst)
+		goto unlock;
+
+	set_64bit(&irte->low, irte_modified->low);
+	set_64bit(&irte->high, irte_modified->high);
+	__iommu_flush_cache(iommu, irte, sizeof(*irte));
+
+	rc = qi_flush_iec(iommu, index, 0);
+unlock:
+	raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);
+
+	return rc;
+}
+

Solution 2: Instead of introducing a new function, add a flag to the
original modify_irte() indicating whether it should check 'pst' and
return before updating the real hardware, and pass true for
'return_on_pst' from intel_ir_set_affinity().

Here are the changes:

 static int modify_irte(struct irq_2_iommu *irq_iommu,
-		       struct irte *irte_modified)
+		       struct irte *irte_modified,
+		       bool return_on_pst)
 {
 	struct intel_iommu *iommu;
 	unsigned long flags;
@@ -140,11 +173,15 @@ static int modify_irte(struct irq_2_iommu *irq_iommu,
 	index = irq_iommu->irte_index + irq_iommu->sub_handle;
 	irte = &iommu->ir_table->base[index];
 
+	if (return_on_pst && irte->pst)
+		goto unlock;
+
 	set_64bit(&irte->low, irte_modified->low);
 	set_64bit(&irte->high, irte_modified->high);
 	__iommu_flush_cache(iommu, irte, sizeof(*irte));
 
 	rc = qi_flush_iec(iommu, index, 0);
+unlock:
 	raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);
 
 	return rc;

Thanks,
Feng

>
> Thanks,
>
> 	tglx