2013-03-21 01:33:50

by Takao Indoh

[permalink] [raw]
Subject: [PATCH] intel-iommu: Synchronize gcmd value with global command register

This patch synchronize iommu->gcmd value with global command register so
that bits in the register could be handled correctly.

iommu_disable_irq_remapping() should disable only IR(Interrupt
Remapping), but in the case of kdump, it also disables QI(Queued
Invalidation) and DMAR(DMA remapping) at the same time.

At boot time, iommu_disable_irq_remapping() is called via
intel_enable_irq_remapping().

static void iommu_disable_irq_remapping(struct intel_iommu *iommu)
{
(snip)
iommu->gcmd &= ~DMA_GCMD_IRE;
writel(iommu->gcmd, iommu->reg + DMAR_GCMD_REG);

In this function, clearing IRE bit in iommu->gcmd and writing it to
global command register. But initial value of iommu->gcmd is zero, so
this writel means clearing all bits in global command register.

In the case of second kernel(kdump kernel) boot, IR/QI/DMAR are already
enabled because it was used in first kernel, so they are disabled at the
same time in this function.

In this patch iommu->gcmd is initialized so that its value could be
synchronized with global command register. By this,
iommu_disable_irq_remapping() disables only IR, and QI is disabled in
dmar_disable_qi(). This patch also changes init_dmars() to disable DMAR
before initializing it if already enabled.

Signed-off-by: Takao Indoh <[email protected]>
---
drivers/iommu/dmar.c | 11 ++++++++++-
drivers/iommu/intel-iommu.c | 12 ++++++++++++
2 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index e5cdaf8..9f69352 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -645,7 +645,7 @@ out:
int alloc_iommu(struct dmar_drhd_unit *drhd)
{
struct intel_iommu *iommu;
- u32 ver;
+ u32 ver, sts;
static int iommu_allocated = 0;
int agaw = 0;
int msagaw = 0;
@@ -695,6 +695,15 @@ int alloc_iommu(struct dmar_drhd_unit *drhd)
(unsigned long long)iommu->cap,
(unsigned long long)iommu->ecap);

+ /* Reflect status in gcmd */
+ sts = readl(iommu->reg + DMAR_GSTS_REG);
+ if (sts & DMA_GSTS_IRES)
+ iommu->gcmd |= DMA_GCMD_IRE;
+ if (sts & DMA_GSTS_TES)
+ iommu->gcmd |= DMA_GCMD_TE;
+ if (sts & DMA_GSTS_QIES)
+ iommu->gcmd |= DMA_GCMD_QIE;
+
raw_spin_lock_init(&iommu->register_lock);

drhd->iommu = iommu;
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0099667..3437046 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2515,6 +2515,18 @@ static int __init init_dmars(void)
}

/*
+ * Disable translation if already enabled prior to OS handover.
+ */
+ for_each_drhd_unit(drhd) {
+ if (drhd->ignored)
+ continue;
+
+ iommu = drhd->iommu;
+ if (iommu->gcmd & DMA_GCMD_TE)
+ iommu_disable_translation(iommu);
+ }
+
+ /*
* Start from the sane iommu hardware state.
*/
for_each_drhd_unit(drhd) {
--
1.7.1


2013-03-26 14:46:34

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

On Thu, Mar 21, 2013 at 10:32:36AM +0900, Takao Indoh wrote:
> In this function, clearing IRE bit in iommu->gcmd and writing it to
> global command register. But initial value of iommu->gcmd is zero, so
> this writel means clearing all bits in global command register.

Seems weird. Why is the value of gcmd zero in your case? The usage of
this register is well encapsulated by the different parts of the VT-d
driver. There are other places which enable/disable translation and qpi
the same way it is done with interrupt remapping. So it looks to me that
it is unlikely that gcmd is really zero in your case.

Can you explain that more and also describe what the actual misbehavior
is you are trying to fix here?

Thanks,

Joerg

2013-03-27 05:03:12

by Takao Indoh

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

(2013/03/26 23:46), Joerg Roedel wrote:
> On Thu, Mar 21, 2013 at 10:32:36AM +0900, Takao Indoh wrote:
>> In this function, clearing IRE bit in iommu->gcmd and writing it to
>> global command register. But initial value of iommu->gcmd is zero, so
>> this writel means clearing all bits in global command register.
>
> Seems weird. Why is the value of gcmd zero in your case? The usage of
> this register is well encapsulated by the different parts of the VT-d
> driver. There are other places which enable/disable translation and qpi
> the same way it is done with interrupt remapping. So it looks to me that
> it is unlikely that gcmd is really zero in your case.
>
> Can you explain that more and also describe what the actual misbehavior
> is you are trying to fix here?

Sure.
At first, please see the debug patch below.

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index af8904d..3ffb029 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -484,12 +484,15 @@ static void iommu_disable_irq_remapping(struct intel_iommu *iommu)
if (!(sts & DMA_GSTS_IRES))
goto end;

+ printk("DEBUG1: %08x\n", sts);
+
iommu->gcmd &= ~DMA_GCMD_IRE;
writel(iommu->gcmd, iommu->reg + DMAR_GCMD_REG);

IOMMU_WAIT_OP(iommu, DMAR_GSTS_REG,
readl, !(sts & DMA_GSTS_IRES), sts);

+ printk("DEBUG2: %08x\n", sts);
end:
raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
}


This is the result in *kdump* kernel(second kernel).

DEBUG1: c7000000
DEBUG2: 41000000

After writel, TES/QIES/IRES is disabled. I think only IRES should be
disabled here because this function is "iommu_disable_irq_remapping".
TES and QIES should be disabled by iommu_disable_translation() and
dmar_disable_qi() respectively.

This is what I found and what I am trying to fix. Next, let's see what
happened at boot time. Again, I'm talking about *kdump* kernel boot
time.

1. dmar_table_init() is called, and intel_iommu structure is allocated in
alloc_iommu().

int alloc_iommu(struct dmar_drhd_unit *drhd)
{
struct intel_iommu *iommu;
(snip)
iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);

iommu->gcmd is zero here.


2. intel_enable_irq_remapping() is called, and interrupt remapping is
initialized.

static int __init intel_enable_irq_remapping(void)
{
(snip)
for_each_drhd_unit(drhd) {
struct intel_iommu *iommu = drhd->iommu;
(snip)
iommu_disable_irq_remapping(iommu);


iommu_disable_irq_remapping is called here. Note that iommu->gcmd is
still zero because anyone doesn't touch it yet.


static void iommu_disable_irq_remapping(struct intel_iommu *iommu)
{
(snip)
sts = dmar_readq(iommu->reg + DMAR_GSTS_REG);
if (!(sts & DMA_GSTS_IRES))
goto end;

iommu->gcmd &= ~DMA_GCMD_IRE;
writel(iommu->gcmd, iommu->reg + DMAR_GCMD_REG);

The purpose of this code is clearing IRE bit of global command
register to disable interrupt remapping, right?

But as I wrote above, iommu->gcmd is always zero here at boot time. So
this code means claring *all* bit of global command register. As the
result of this, both of TE and QIE are also disabled.

The root cause of this problem is mismatch between iommu->gcmd and
global command register in the case of kdump. At boot time, initial
value of iommu->gcmd is zero as I wrote above, but actual global command
register is *not* zero because some bits like IRE/TE/QIE are already set
in *first* kernel. Therefore this patch synchronize them to fix this
problem.

Did I answer your question?

Thanks,
Takao Indoh

2013-03-27 10:31:26

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

On Wed, Mar 27, 2013 at 02:02:44PM +0900, Takao Indoh wrote:
> The root cause of this problem is mismatch between iommu->gcmd and
> global command register in the case of kdump. At boot time, initial
> value of iommu->gcmd is zero as I wrote above, but actual global command
> register is *not* zero because some bits like IRE/TE/QIE are already set
> in *first* kernel. Therefore this patch synchronize them to fix this
> problem.

Ok, I understand, but I still don't see why this is a problem. There is
no point for the kdump kernel to preserve hardware state from the
previous kernel. So I think the way it is implemented is correct.


Joerg

2013-04-01 05:46:00

by Takao Indoh

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

(2013/03/27 19:31), Joerg Roedel wrote:
> On Wed, Mar 27, 2013 at 02:02:44PM +0900, Takao Indoh wrote:
>> The root cause of this problem is mismatch between iommu->gcmd and
>> global command register in the case of kdump. At boot time, initial
>> value of iommu->gcmd is zero as I wrote above, but actual global command
>> register is *not* zero because some bits like IRE/TE/QIE are already set
>> in *first* kernel. Therefore this patch synchronize them to fix this
>> problem.
>
> Ok, I understand, but I still don't see why this is a problem. There is
> no point for the kdump kernel to preserve hardware state from the
> previous kernel. So I think the way it is implemented is correct.

Sorry, my explanation was misleading. The purpose is not preserving
hardware state, but cleaning up hardware status at appropriate place.

kdump kernel boots up without power-on-reset, so we have to disable
hardware functions like DMAR before initializing them.

<Current flow on kdump boot>
enable_IR
intel_enable_irq_remapping
iommu_disable_irq_remapping <== IRES/QIES/TES disabled here
dmar_disable_qi <== do nothing
dmar_enable_qi <== QIES enabled
intel_setup_irq_remapping <== IRES enabled

intel_iommu_init
init_dmars
iommu_enable_translation <== TES enabled


<New flow on kdump boot after applying patch>
enable_IR
intel_enable_irq_remapping
iommu_disable_irq_remapping <== IRES disabled
dmar_disable_qi <== QIES disabled
dmar_enable_qi <== QIES enabled
intel_setup_irq_remapping <== IRES enabled

intel_iommu_init
init_dmars
iommu_disable_translation <== TES disabled (added by patch)
iommu_enable_translation <== TES enabled

I want to change flow like this to avoid misunderstanding. In that sense,
this patch is a kind of cleanup rather than fixing problem.

Backgroud:
I'm working on kdump problem with iommu and investigated when DMAR and
IR are disabled in kdump kernel boot. As for IR, I can easily find the
code in intel_enable_irq_remapping() since there is good comment for
that.

/*
* Disable intr remapping and queued invalidation, if already
* enabled prior to OS handover.
*/
iommu_disable_irq_remapping(iommu);

dmar_disable_qi(iommu);

However, there is no code to disable DMAR on startup, but it is actually
disabled somewhere during bootup. And finally I found out is is done in
intel_enable_irq_remapping(). This is very misleading. Why
"iommu_disable_irq_remapping" is disabling bits other than IR? After
applying patch, it is easy to understand because it disables only IR, as
the name suggests.

Thanks,
Takao Indoh

2013-04-02 14:05:52

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

On Mon, Apr 01, 2013 at 02:45:18PM +0900, Takao Indoh wrote:
> <Current flow on kdump boot>
> enable_IR
> intel_enable_irq_remapping
> iommu_disable_irq_remapping <== IRES/QIES/TES disabled here
> dmar_disable_qi <== do nothing
> dmar_enable_qi <== QIES enabled
> intel_setup_irq_remapping <== IRES enabled

But what we want to do here in the kdumo case is to disable translation
too, right? Because the former kernel might have translation and
irq-remapping enabled and the kdump kernel might be compiled without
support for dma-remapping. So if we don't disable translation here too
the kdump kernel is unable to do DMA.

I agree that disabling translation should be a bit more explicit instead
of the current code.


Joerg

2013-04-03 07:12:34

by Takao Indoh

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

(2013/04/02 23:05), Joerg Roedel wrote:
> On Mon, Apr 01, 2013 at 02:45:18PM +0900, Takao Indoh wrote:
>> <Current flow on kdump boot>
>> enable_IR
>> intel_enable_irq_remapping
>> iommu_disable_irq_remapping <== IRES/QIES/TES disabled here
>> dmar_disable_qi <== do nothing
>> dmar_enable_qi <== QIES enabled
>> intel_setup_irq_remapping <== IRES enabled
>
> But what we want to do here in the kdumo case is to disable translation
> too, right? Because the former kernel might have translation and
> irq-remapping enabled and the kdump kernel might be compiled without
> support for dma-remapping. So if we don't disable translation here too
> the kdump kernel is unable to do DMA.

Yeah, you are right. I forgot such a case.

To be honest, I also expected the side effect of this patch. As I wrote
in the previous mail, I'm working on kdump problem with iommu, that is,
ongoing DMA causes DMAR fault in 2nd kernel and sometimes kdump fails
due to this fault. What we have to do is stopping DMA transaction
before DMA-remapping is disabled in 2nd kernel. To do this, we need to
reset devices before intel_enable_irq_remapping() is called, but it is
very difficult because struct pci_dev is not prepared in this stage.
After applying this patch, DMA-remapping is disabled in init_dmars where
struct pci_dev is ready, so it's easier to handle. For example we can
add pci quirk to reset devices.

So, disabling translation in 2nd kernel is necessary, and it's better if
we can do it after struct pci_dev is prepared. (after subsys_initcall?)
Any idea?

Thanks,
Takao Indoh

>
> I agree that disabling translation should be a bit more explicit instead
> of the current code.
>
>
> Joerg
>

2013-04-03 08:24:59

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

On Wed, 2013-04-03 at 16:11 +0900, Takao Indoh wrote:
> (2013/04/02 23:05), Joerg Roedel wrote:
> > On Mon, Apr 01, 2013 at 02:45:18PM +0900, Takao Indoh wrote:
> >> <Current flow on kdump boot>
> >> enable_IR
> >> intel_enable_irq_remapping
> >> iommu_disable_irq_remapping <== IRES/QIES/TES disabled here
> >> dmar_disable_qi <== do nothing
> >> dmar_enable_qi <== QIES enabled
> >> intel_setup_irq_remapping <== IRES enabled
> >
> > But what we want to do here in the kdumo case is to disable translation
> > too, right? Because the former kernel might have translation and
> > irq-remapping enabled and the kdump kernel might be compiled without
> > support for dma-remapping. So if we don't disable translation here too
> > the kdump kernel is unable to do DMA.
>
> Yeah, you are right. I forgot such a case.

If you disable translation and there's some device still doing DMA, it's
going to scribble over random areas of memory. You really want to have
translation enabled and all the page tables *cleared*, during kexec. I
think it's fair to insist that the secondary kernel should use the IOMMU
if the first one did.

> To be honest, I also expected the side effect of this patch. As I wrote
> in the previous mail, I'm working on kdump problem with iommu, that is,
> ongoing DMA causes DMAR fault in 2nd kernel and sometimes kdump fails
> due to this fault.

Here you've lost me. The DMAR fault is caught and reported, and how does
this lead to a kdump failure? Are you using dodgy hardware that just
keeps *trying* after an abort, and floods the system with a storm of
DMAR faults? We've occasionally spoken about working around such a
problem by setting a bit to make subsequent faults *silent*. Would that
work?

> What we have to do is stopping DMA transaction
> before DMA-remapping is disabled in 2nd kernel.

The IOMMU is there to stop DMA transactions. That is its *job*. :)

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation


Attachments:
smime.p7s (6.03 kB)

2013-04-04 05:48:46

by Takao Indoh

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

(2013/04/03 17:24), David Woodhouse wrote:
> On Wed, 2013-04-03 at 16:11 +0900, Takao Indoh wrote:
>> (2013/04/02 23:05), Joerg Roedel wrote:
>>> On Mon, Apr 01, 2013 at 02:45:18PM +0900, Takao Indoh wrote:
>>>> <Current flow on kdump boot>
>>>> enable_IR
>>>> intel_enable_irq_remapping
>>>> iommu_disable_irq_remapping <== IRES/QIES/TES disabled here
>>>> dmar_disable_qi <== do nothing
>>>> dmar_enable_qi <== QIES enabled
>>>> intel_setup_irq_remapping <== IRES enabled
>>>
>>> But what we want to do here in the kdumo case is to disable translation
>>> too, right? Because the former kernel might have translation and
>>> irq-remapping enabled and the kdump kernel might be compiled without
>>> support for dma-remapping. So if we don't disable translation here too
>>> the kdump kernel is unable to do DMA.
>>
>> Yeah, you are right. I forgot such a case.
>
> If you disable translation and there's some device still doing DMA, it's
> going to scribble over random areas of memory. You really want to have
> translation enabled and all the page tables *cleared*, during kexec. I
> think it's fair to insist that the secondary kernel should use the IOMMU
> if the first one did.
>
>> To be honest, I also expected the side effect of this patch. As I wrote
>> in the previous mail, I'm working on kdump problem with iommu, that is,
>> ongoing DMA causes DMAR fault in 2nd kernel and sometimes kdump fails
>> due to this fault.
>
> Here you've lost me. The DMAR fault is caught and reported, and how does
> this lead to a kdump failure? Are you using dodgy hardware that just
> keeps *trying* after an abort, and floods the system with a storm of
> DMAR faults? We've occasionally spoken about working around such a
> problem by setting a bit to make subsequent faults *silent*. Would that
> work?

There are several cases.
- DMAR fault messages floods and second kernel does not boot. Recently I
saw similar report. https://lkml.org/lkml/2013/3/8/120
- igb driver detectes error on linkup and kdump via network fails.
- On a certain platform, though kdump itself works, PCIe error like
Unexpected Completion is detected and it gets hardware degraded.

Thanks,
Takao Indoh


>
>> What we have to do is stopping DMA transaction
>> before DMA-remapping is disabled in 2nd kernel.
>
> The IOMMU is there to stop DMA transactions. That is its *job*. :)
>

2013-04-04 14:25:04

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

On Thu, 2013-04-04 at 14:48 +0900, Takao Indoh wrote:
>
> - DMAR fault messages floods and second kernel does not boot. Recently I
> saw similar report. https://lkml.org/lkml/2013/3/8/120

Right. So the fix for that is to make the subsequent errors silent,
until/unless we actually get a request to create a mapping for the given
device.

> - igb driver detectes error on linkup and kdump via network fails.

That's a driver bug, IIRC. It was failing to completely reset the
hardware. It's fixed now, isn't it?

> - On a certain platform, though kdump itself works, PCIe error like
> Unexpected Completion is detected and it gets hardware degraded.

More information required.

--
dwmw2


Attachments:
smime.p7s (6.03 kB)

2013-04-05 11:06:20

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

On Wed, Apr 03, 2013 at 09:24:39AM +0100, David Woodhouse wrote:
> On Wed, 2013-04-03 at 16:11 +0900, Takao Indoh wrote:
> > Yeah, you are right. I forgot such a case.
>
> If you disable translation and there's some device still doing DMA, it's
> going to scribble over random areas of memory. You really want to have
> translation enabled and all the page tables *cleared*, during kexec. I
> think it's fair to insist that the secondary kernel should use the IOMMU
> if the first one did.

Do we really need to insist on that? The IOMMU initialization on x86
happens after the kernel scanned and enumerated the PCI bus. While doing
this the kernel (at least it should) disables all devices it finds. So
when the IOMMU init code runs we should be safe from any in-flight DMA
and can either disable translation or re-initialize it for the kdump
kernel. Until then translation needs to stay enabled of course, so that
the old page-tables are still used and in-flight DMA doesn't corrupt
any data.


Joerg

2013-04-08 08:57:35

by Takao Indoh

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

(2013/04/04 23:24), David Woodhouse wrote:
> On Thu, 2013-04-04 at 14:48 +0900, Takao Indoh wrote:
>>
>> - DMAR fault messages floods and second kernel does not boot. Recently I
>> saw similar report. https://lkml.org/lkml/2013/3/8/120
>
> Right. So the fix for that is to make the subsequent errors silent,
> until/unless we actually get a request to create a mapping for the given
> device.
>
>> - igb driver detectes error on linkup and kdump via network fails.
>
> That's a driver bug, IIRC. It was failing to completely reset the
> hardware. It's fixed now, isn't it?

No, it can be reproduced with latest kernel(3.9.0-rc6).

>
>> - On a certain platform, though kdump itself works, PCIe error like
>> Unexpected Completion is detected and it gets hardware degraded.
>
> More information required.

When I tested intel_iommu on a certain machine, the following error
message was logged in its firmware, and I/O board got abnormal status.
05:00.0 is igb, so I think this was caused by DMA error on igb. This
occurs before igb driver loading, so this cannot be fixed in driver.

PCI: Unexpected Completion Bus: 5 Device: 0x00 Function: 0x00

Anyway, I'm thinking we should introduce something framework to clean
all devices to stop DMA at boot time rather than dealing with the
problem in each driver. And one of the way I found is resetting devcies
by PCIe layer. If DMAR is disabled in init_dmars(), we can have a
chance to handle devices to stop DMA in PCI layer, like qci-quirk. This
is one of the reason why I propose this patch.

Thanks,
Takao Indoh

2013-04-10 04:48:00

by Takao Indoh

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

(2013/04/05 20:06), Joerg Roedel wrote:
> On Wed, Apr 03, 2013 at 09:24:39AM +0100, David Woodhouse wrote:
>> On Wed, 2013-04-03 at 16:11 +0900, Takao Indoh wrote:
>>> Yeah, you are right. I forgot such a case.
>>
>> If you disable translation and there's some device still doing DMA, it's
>> going to scribble over random areas of memory. You really want to have
>> translation enabled and all the page tables *cleared*, during kexec. I
>> think it's fair to insist that the secondary kernel should use the IOMMU
>> if the first one did.
>
> Do we really need to insist on that? The IOMMU initialization on x86
> happens after the kernel scanned and enumerated the PCI bus. While doing
> this the kernel (at least it should) disables all devices it finds. So
> when the IOMMU init code runs we should be safe from any in-flight DMA
> and can either disable translation or re-initialize it for the kdump
> kernel. Until then translation needs to stay enabled of course, so that
> the old page-tables are still used and in-flight DMA doesn't corrupt
> any data.

So we should do in this order, right?
(1) PCI initialization. Stop all ongoing DMA here.
(2) Disable translation if already enable.
(3) Make pgtable and enable translation.

Thanks,
Takao Indoh

2013-04-15 09:01:11

by Takao Indoh

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

(2013/04/10 13:47), Takao Indoh wrote:
> (2013/04/05 20:06), Joerg Roedel wrote:
>> On Wed, Apr 03, 2013 at 09:24:39AM +0100, David Woodhouse wrote:
>>> On Wed, 2013-04-03 at 16:11 +0900, Takao Indoh wrote:
>>>> Yeah, you are right. I forgot such a case.
>>>
>>> If you disable translation and there's some device still doing DMA, it's
>>> going to scribble over random areas of memory. You really want to have
>>> translation enabled and all the page tables *cleared*, during kexec. I
>>> think it's fair to insist that the secondary kernel should use the IOMMU
>>> if the first one did.
>>
>> Do we really need to insist on that? The IOMMU initialization on x86
>> happens after the kernel scanned and enumerated the PCI bus. While doing
>> this the kernel (at least it should) disables all devices it finds. So
>> when the IOMMU init code runs we should be safe from any in-flight DMA
>> and can either disable translation or re-initialize it for the kdump
>> kernel. Until then translation needs to stay enabled of course, so that
>> the old page-tables are still used and in-flight DMA doesn't corrupt
>> any data.
>
> So we should do in this order, right?
> (1) PCI initialization. Stop all ongoing DMA here.
> (2) Disable translation if already enable.
> (3) Make pgtable and enable translation.

Joerg, David,

On DMAR initialization during kdump boot, do you guys agree to change
order like this?

Current order:
(1) Disable translation
(2) PCI initialization
(3) Make pgtable and enable translation.

Order I'm proposing:
(1) PCI initialization
(2) Disable translation
(3) Make pgtable and enable translation.

The purpose to change the behavior is to stop DMA in PCI layer.

As Joerg said, if we need to consider the case that kdump kernel is
compiled without dma-remapping(CONFIG_INTEL_IOMMU is off?), I can update
my patch to handle such a case properly by adding ifdef
CONFIG_INTEL_IOMMU.

About how to stop the DMA on PCI layer, I'll post another mail to
discuss it to iommu and pci list.

Thanks,
Takao Indoh

2013-04-15 10:18:34

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

On Mon, Apr 15, 2013 at 06:00:13PM +0900, Takao Indoh wrote:
> On DMAR initialization during kdump boot, do you guys agree to change
> order like this?
>
> Current order:
> (1) Disable translation
> (2) PCI initialization
> (3) Make pgtable and enable translation.
>
> Order I'm proposing:
> (1) PCI initialization
> (2) Disable translation
> (3) Make pgtable and enable translation.

I think this should work. In fact, translation only needs to be disabled
while the IOMMU hardware is reprogrammed to the new data structures the
kdump kernel set up.

> As Joerg said, if we need to consider the case that kdump kernel is
> compiled without dma-remapping(CONFIG_INTEL_IOMMU is off?), I can update
> my patch to handle such a case properly by adding ifdef
> CONFIG_INTEL_IOMMU.

Thinking about it, this case is a bit more special. If the kdump kernel
has no IOMMU driver at all there is also no code to disable it. So
unless we want to force the kdump kernel to have the driver when the
first kernel had it, I think we also want some quirk (depending on
!CONFIG_INTEL_IOMMU) that disables translation at boot.

I know, its complicated :)


Joerg

2013-04-17 08:49:12

by Takao Indoh

[permalink] [raw]
Subject: Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

(2013/04/15 19:18), Joerg Roedel wrote:
> On Mon, Apr 15, 2013 at 06:00:13PM +0900, Takao Indoh wrote:
>> On DMAR initialization during kdump boot, do you guys agree to change
>> order like this?
>>
>> Current order:
>> (1) Disable translation
>> (2) PCI initialization
>> (3) Make pgtable and enable translation.
>>
>> Order I'm proposing:
>> (1) PCI initialization
>> (2) Disable translation
>> (3) Make pgtable and enable translation.
>
> I think this should work. In fact, translation only needs to be disabled
> while the IOMMU hardware is reprogrammed to the new data structures the
> kdump kernel set up.

Ok, I agree.

>
>> As Joerg said, if we need to consider the case that kdump kernel is
>> compiled without dma-remapping(CONFIG_INTEL_IOMMU is off?), I can update
>> my patch to handle such a case properly by adding ifdef
>> CONFIG_INTEL_IOMMU.
>
> Thinking about it, this case is a bit more special. If the kdump kernel
> has no IOMMU driver at all there is also no code to disable it. So
> unless we want to force the kdump kernel to have the driver when the
> first kernel had it, I think we also want some quirk (depending on
> !CONFIG_INTEL_IOMMU) that disables translation at boot.
>
> I know, its complicated :)

yeah complicated. hmm..., this isn't limited only to iommu case, there
are many problems which are caused by difference of config between first
kernel and second one. So, to make code simple, I think we don't need to
care such a exceptional case. I mean, I think it is a matter of user
and user has to compile second kernel with dma-remapping if user want to
use it in first kernel.

Thanks,
Takao Indoh