2007-06-22 19:18:15

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] x86-64: disable the GART before allocate aperture

[PATCH] x86-64: disable the GART before allocate aperture

For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G RAM
installed. when mem is allocated for GART, it will do the memset for clear.
and for kexec case, the first kernel already enable that, the memset in second
kernel will cause the system restart. So disable that at first before we try to
allocate mem for it.

Signed-off-by: Yinghai Lu <[email protected]>

diff --git a/arch/x86_64/kernel/aperture.c b/arch/x86_64/kernel/aperture.c
index a3d450d..7a1b072 100644
--- a/arch/x86_64/kernel/aperture.c
+++ b/arch/x86_64/kernel/aperture.c
@@ -270,6 +270,19 @@ void __init iommu_hole_init(void)
printk("This costs you %d MB of RAM\n",
32 << fallback_aper_order);

+ /*
+ * disable it in case it is enabled before, esp for kexec/kdump,
+ * previous kernel already enable that. otherwise memset called
+ * by allocate_aperture/__alloc_bootmem_nopanic cause restart.
+ */
+ for (num = 24; num < 32; num++) {
+ if (!early_is_k8_nb(read_pci_config(0, num, 3, 0x00)))
+ continue;
+
+ write_pci_config(0, num, 3, 0x90, 0);
+ write_pci_config(0, num, 3, 0x94, 0);
+ }
+
aper_order = fallback_aper_order;
aper_alloc = allocate_aperture();
if (!aper_alloc) {


2007-06-22 19:31:45

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

On Fri, Jun 22, 2007 at 12:19:15PM -0700, Yinghai Lu wrote:
> [PATCH] x86-64: disable the GART before allocate aperture
>
> For K8 system: 4G RAM with memory hole remapping enabled, or more
> than 4G RAM installed. when mem is allocated for GART, it will do
> the memset for clear. and for kexec case, the first kernel already
> enable that, the memset in second kernel will cause the system
> restart. So disable that at first before we try to allocate mem for
> it.

Why does the memset in the second kernel cause a system restart?

Cheers,
Muli

2007-06-22 19:38:24

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

On 6/22/07, Muli Ben-Yehuda <[email protected]> wrote:
> On Fri, Jun 22, 2007 at 12:19:15PM -0700, Yinghai Lu wrote:
> > [PATCH] x86-64: disable the GART before allocate aperture
> >
> > For K8 system: 4G RAM with memory hole remapping enabled, or more
> > than 4G RAM installed. when mem is allocated for GART, it will do
> > the memset for clear. and for kexec case, the first kernel already
> > enable that, the memset in second kernel will cause the system
> > restart. So disable that at first before we try to allocate mem for
> > it.
>
> Why does the memset in the second kernel cause a system restart?

the mem for aperture is still used by GART. or the translation is still enabled.

YH

2007-06-22 19:48:27

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

Muli Ben-Yehuda wrote:
> On Fri, Jun 22, 2007 at 12:19:15PM -0700, Yinghai Lu wrote:
>> [PATCH] x86-64: disable the GART before allocate aperture
>>
>> For K8 system: 4G RAM with memory hole remapping enabled, or more
>> than 4G RAM installed. when mem is allocated for GART, it will do
>> the memset for clear. and for kexec case, the first kernel already
>> enable that, the memset in second kernel will cause the system
>> restart. So disable that at first before we try to allocate mem for
>> it.
>
> Why does the memset in the second kernel cause a system restart?
>
GART is enabled by first kernel, and some driver is using that for DMA.

YH

2007-06-22 20:28:33

by Alan

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

On Fri, 22 Jun 2007 12:31:24 -0700
Muli Ben-Yehuda <[email protected]> wrote:

> On Fri, Jun 22, 2007 at 12:19:15PM -0700, Yinghai Lu wrote:
> > [PATCH] x86-64: disable the GART before allocate aperture
> >
> > For K8 system: 4G RAM with memory hole remapping enabled, or more
> > than 4G RAM installed. when mem is allocated for GART, it will do
> > the memset for clear. and for kexec case, the first kernel already
> > enable that, the memset in second kernel will cause the system
> > restart. So disable that at first before we try to allocate mem for
> > it.
>
> Why does the memset in the second kernel cause a system restart?

You've got mapped live gart pages from the previous kernel. Even if you
disable the gart before a memset you may well have the video card using
gart translations and possibly live IOMMU mappings for devices using it
via bus mastering - and those will cause you MCE exceptions with a
corrupt cpu context flag (ie not nicely recoverable).

Alan

2007-06-22 21:29:32

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

On Friday 22 June 2007 22:33, Alan Cox wrote:
> On Fri, 22 Jun 2007 12:31:24 -0700
>
> Muli Ben-Yehuda <[email protected]> wrote:
> > On Fri, Jun 22, 2007 at 12:19:15PM -0700, Yinghai Lu wrote:
> > > [PATCH] x86-64: disable the GART before allocate aperture
> > >
> > > For K8 system: 4G RAM with memory hole remapping enabled, or more
> > > than 4G RAM installed. when mem is allocated for GART, it will do
> > > the memset for clear. and for kexec case, the first kernel already
> > > enable that, the memset in second kernel will cause the system
> > > restart. So disable that at first before we try to allocate mem for
> > > it.
> >
> > Why does the memset in the second kernel cause a system restart?
>
> You've got mapped live gart pages from the previous kernel. Even if you
> disable the gart before a memset

It's probably too late then. It could also interfere with other operations.
If anything the GART should be disabled during kexec shutdown. Perhaps we just
need a suitable suspend function that does that. Eric, any preferences?

> you may well have the video card using
> gart translations and possibly live IOMMU mappings for devices using it
> via bus mastering - and those will cause you MCE exceptions with a
> corrupt cpu context flag (ie not nicely recoverable).

We disable those machine checks on K8 because they're not fully reliable.

-Andi

2007-06-22 21:33:44

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

Alan Cox <[email protected]> writes:

> You've got mapped live gart pages from the previous kernel. Even if you
> disable the gart before a memset you may well have the video card using
> gart translations and possibly live IOMMU mappings for devices using it
> via bus mastering - and those will cause you MCE exceptions with a
> corrupt cpu context flag (ie not nicely recoverable).

The original plan (which we have not followed up on). Was to reserve
a chunk of any iommu for the kexec on panic kernel. Then to just
have the second kernel use that unused chunk. This is how we
treat the normal memory space and it seems a nice and simple approach
to this kind of problem.

For a normal kexec we should shut everything down before the kernel
transition so it should not be an issue.

YH do you think you can look at simply reserving a portion of the iommu?
And having the kexec on panic kernel use the reserved portion?

Eric

2007-06-22 21:36:53

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

Andi Kleen wrote:
>
> It's probably too late then. It could also interfere with other operations.
> If anything the GART should be disabled during kexec shutdown. Perhaps we just
> need a suitable suspend function that does that. Eric, any preferences?

how about kdump? do we have chance to call that suspend func?

YH

2007-06-22 21:42:00

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

Andi Kleen <[email protected]> writes:

> On Friday 22 June 2007 22:33, Alan Cox wrote:
>> You've got mapped live gart pages from the previous kernel. Even if you
>> disable the gart before a memset
>
> It's probably too late then. It could also interfere with other operations.
> If anything the GART should be disabled during kexec shutdown. Perhaps we just
> need a suitable suspend function that does that. Eric, any preferences?

Well it would be a shutdown method not a suspend method. The authors of
the suspend code thought sharing code with the reboot and the rmmod case
didn't make sense.

For a normal kexec into another kernel I think this makes sense.

>> you may well have the video card using
>> gart translations and possibly live IOMMU mappings for devices using it
>> via bus mastering - and those will cause you MCE exceptions with a
>> corrupt cpu context flag (ie not nicely recoverable).
>
> We disable those machine checks on K8 because they're not fully reliable.

For the kexec on panic not shutting the hardware down if at all possible is
the ideal. There I think we need a mode to not touch a chunk of the iommu
and reserve it for the kexec on panic kernel (or perhaps just use the
swiotlb code if we need bounce buffers at all).

Eric

2007-06-22 21:46:31

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

On Fri, Jun 22, 2007 at 03:32:53PM -0600, Eric W. Biederman wrote:
> Alan Cox <[email protected]> writes:
>
> > You've got mapped live gart pages from the previous kernel. Even
> > if you disable the gart before a memset you may well have the
> > video card using gart translations and possibly live IOMMU
> > mappings for devices using it via bus mastering - and those will
> > cause you MCE exceptions with a corrupt cpu context flag (ie not
> > nicely recoverable).
>
> The original plan (which we have not followed up on). Was to
> reserve a chunk of any iommu for the kexec on panic kernel. Then to
> just have the second kernel use that unused chunk. This is how we
> treat the normal memory space and it seems a nice and simple
> approach to this kind of problem.
>
> For a normal kexec we should shut everything down before the kernel
> transition so it should not be an issue.
>
> YH do you think you can look at simply reserving a portion of the
> iommu? And having the kexec on panic kernel use the reserved
> portion?

How would this work with an isolation capable IOMMU which has
different address spaces for different devices? (e.g., Calgary which
is in mainline, Intel's VT-d which is coming soon, etc).

Cheers,
Muli

2007-06-22 21:59:28

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

On 6/22/07, Eric W. Biederman <[email protected]> wrote:
> Alan Cox <[email protected]> writes:
>
> For a normal kexec we should shut everything down before the kernel
> transition so it should not be an issue.
>
> YH do you think you can look at simply reserving a portion of the iommu?
> And having the kexec on panic kernel use the reserved portion?
>
two copy region: one for first kernel, and one for second kernel? it
should work.
first kernel is using [64M, 128M), and the second will get assign to
[64M,128M) again.
when it try to memset to clear that region it will cause restart.
in that region, only first 256K can not touched, even read. rest could
be accessed.

YH

2007-06-22 22:15:04

by Alan

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

> YH do you think you can look at simply reserving a portion of the iommu?
> And having the kexec on panic kernel use the reserved portion?

How about simply reserving all of it for the base kernel and using soft
iommu for the panic kernel, its hardly high performance criticial at this
point.

2007-06-22 22:33:05

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

Alan Cox <[email protected]> writes:

>> YH do you think you can look at simply reserving a portion of the iommu?
>> And having the kexec on panic kernel use the reserved portion?
>
> How about simply reserving all of it for the base kernel and using soft
> iommu for the panic kernel, its hardly high performance criticial at this
> point.

The original design came from thinking about systems where using the iommu
was mandatory. I think we almost always reserve memory below 1G for the kexec
on panic kernel so it really shouldn't be an issue in that case. Except
we need to pass an option to force not using the iommu. I don't think
noiommu or swiotlb is going to make any real difference.

So I'm totally in favor of turning off features if we don't need them and we
don't take a tremendous performance hit. (People get grumpy when writing
all of memory to disk takes completely unreasonable amounts of time).

Eric

2007-06-22 22:41:35

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

Eric W. Biederman wrote:
> The original design came from thinking about systems where using the iommu
> was mandatory. I think we almost always reserve memory below 1G for the kexec
> on panic kernel so it really shouldn't be an issue in that case. Except
> we need to pass an option to force not using the iommu. I don't think
> noiommu or swiotlb is going to make any real difference.
>
> So I'm totally in favor of turning off features if we don't need them and we
> don't take a tremendous performance hit. (People get grumpy when writing
> all of memory to disk takes completely unreasonable amounts of time).


So you prefer to
add diable_gart in shutdown or suspend func and let kexec to use swiotlb comand line?

YH

2007-06-22 22:51:00

by Alan

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

On Fri, 22 Jun 2007 15:43:00 -0700
Yinghai Lu <[email protected]> wrote:

> Eric W. Biederman wrote:
> > The original design came from thinking about systems where using the iommu
> > was mandatory. I think we almost always reserve memory below 1G for the kexec
> > on panic kernel so it really shouldn't be an issue in that case. Except
> > we need to pass an option to force not using the iommu. I don't think
> > noiommu or swiotlb is going to make any real difference.
> >
> > So I'm totally in favor of turning off features if we don't need them and we
> > don't take a tremendous performance hit. (People get grumpy when writing
> > all of memory to disk takes completely unreasonable amounts of time).
>
>
> So you prefer to
> add diable_gart in shutdown or suspend func and let kexec to use swiotlb comand line?

Don't disable it, just don't touch it or any of its mappings. Leave it
*alone*, and use swiotlb. That'll maximise the ability to recover stuff
from the kexec kernel (since for one you may want to dump the gart when a
3d app goes kerblam)

Alan

2007-06-22 22:55:42

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

Alan Cox wrote:

> Don't disable it, just don't touch it or any of its mappings. Leave it
> *alone*, and use swiotlb. That'll maximise the ability to recover stuff
> from the kexec kernel (since for one you may want to dump the gart when a
> 3d app goes kerblam)

How about LinuxBIOS + Kernel ===> Final kernel path?
someday EFI(PEI) + Kernel ===> Final kernel path need that too.

or the normal kexec path still could have clean shutdown.

YH

2007-06-22 22:59:13

by Alan

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

On Fri, 22 Jun 2007 15:57:08 -0700
Yinghai Lu <[email protected]> wrote:

> Alan Cox wrote:
>
> > Don't disable it, just don't touch it or any of its mappings. Leave it
> > *alone*, and use swiotlb. That'll maximise the ability to recover stuff
> > from the kexec kernel (since for one you may want to dump the gart when a
> > 3d app goes kerblam)
>
> How about LinuxBIOS + Kernel ===> Final kernel path?
> someday EFI(PEI) + Kernel ===> Final kernel path need that too.
>
> or the normal kexec path still could have clean shutdown.

The kexec path for kdump should be swiotlb and leave the GART alone as
you are dumping as much state as you can and leaving stuff as is when
possible. The new-kernel case you shut everything down so you can shut
down the GART in the old kernel and re-initialise it in the new one

Alan

2007-06-22 23:15:07

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

Alan Cox <[email protected]> writes:

> On Fri, 22 Jun 2007 15:57:08 -0700
> Yinghai Lu <[email protected]> wrote:
>
>> Alan Cox wrote:
>>
>> > Don't disable it, just don't touch it or any of its mappings. Leave it
>> > *alone*, and use swiotlb. That'll maximise the ability to recover stuff
>> > from the kexec kernel (since for one you may want to dump the gart when a
>> > 3d app goes kerblam)
>>
>> How about LinuxBIOS + Kernel ===> Final kernel path?
>> someday EFI(PEI) + Kernel ===> Final kernel path need that too.
>>
>> or the normal kexec path still could have clean shutdown.
>
> The kexec path for kdump should be swiotlb and leave the GART alone as
> you are dumping as much state as you can and leaving stuff as is when
> possible. The new-kernel case you shut everything down so you can shut
> down the GART in the old kernel and re-initialise it in the new one

Agreed.

Eric

2007-06-23 00:14:22

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

On Saturday 23 June 2007 00:19:51 Alan Cox wrote:
> > YH do you think you can look at simply reserving a portion of the iommu?
> > And having the kexec on panic kernel use the reserved portion?
>
> How about simply reserving all of it for the base kernel and using soft
> iommu for the panic kernel, its hardly high performance criticial at this
> point.

The kdump kernel should be normally all <4GB anyways. You won't
need any IOMMU for its IO unless you O_DIRECT/sendfile out of /proc/kcore.
Just don't do that (but I suspect it won't work anyways)

If it's not then swiotlb will also not work because it won't get
any memory <4GB.

But I doubt this was YH's problem - the panic kernel memory
is always reserved and there shouldn't be any ongoing DMAs in this
area anyways. And what happens outside the kdump kernel shouldn't matter.

I suspect he rather saw problems with non kdump kexec where we
can just shut down the GART properly beforehand.

-Andi

2007-06-23 00:26:27

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

Andi Kleen wrote:
> On Saturday 23 June 2007 00:19:51 Alan Cox wrote:
> The kdump kernel should be normally all <4GB anyways. You won't
> need any IOMMU for its IO unless you O_DIRECT/sendfile out of /proc/kcore.
> Just don't do that (but I suspect it won't work anyways)
>
> If it's not then swiotlb will also not work because it won't get
> any memory <4GB.
>
> But I doubt this was YH's problem - the panic kernel memory
> is always reserved and there shouldn't be any ongoing DMAs in this
> area anyways. And what happens outside the kdump kernel shouldn't matter.
>
> I suspect he rather saw problems with non kdump kexec where we
> can just shut down the GART properly beforehand.

current I only test kexec only. So clean shut GART in first kernel will help.

where is hook for shutdown? add one in dma_ops?

YH

2007-06-23 00:38:43

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture


> current I only test kexec only. So clean shut GART in first kernel will help.
>
> where is hook for shutdown? add one in dma_ops?

The low level code could just register its own shutdown handler.
No need to go through dma_ops I think.

-Andi

2007-06-23 02:33:39

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] x86-64: disable the GART in shutdown

[PATCH] x86-64: disable the GART in shutdown

For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G RAM
installed. when mem is allocated for GART, it will do the memset for clear.
and for kexec case, the first kernel already enable that, the memset in second
kernel will cause the system restart. solution will be:
in second kernel: disable that at first before we try to allocate mem for it.
or in the first kernel: do disable that before shutdown.

Signed-off-by: Yinghai Lu <[email protected]>

arch/x86_64/kernel/pci-dma.c | 7 +++++++
arch/x86_64/kernel/pci-gart.c | 20 ++++++++++++++++++++
arch/x86_64/kernel/reboot.c | 4 ++++
include/asm-x86_64/proto.h | 2 ++
4 files changed, 33 insertions(+)
diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
index ae091cd..4904d5f 100644
--- a/arch/x86_64/kernel/pci-gart.c
+++ b/arch/x86_64/kernel/pci-gart.c
@@ -571,6 +571,26 @@ static const struct dma_mapping_ops gart_dma_ops = {
.unmap_sg = gart_unmap_sg,
};

+void gart_iommu_shutdown(void)
+{
+ struct pci_dev *dev;
+ int i;
+
+ if (dma_ops != &gart_dma_ops)
+ return;
+
+ for (i = 0; i < num_k8_northbridges; i++) {
+ u32 ctl;
+
+ dev = k8_northbridges[i];
+ pci_read_config_dword(dev, 0x90, &ctl);
+
+ ctl &= ~1;
+
+ pci_write_config_dword(dev, 0x90, ctl);
+ }
+}
+
void __init gart_iommu_init(void)
{
struct agp_kern_info info;
diff --git a/arch/x86_64/kernel/pci-dma.c b/arch/x86_64/kernel/pci-dma.c
index 9f80aad..64f2ab3 100644
--- a/arch/x86_64/kernel/pci-dma.c
+++ b/arch/x86_64/kernel/pci-dma.c
@@ -322,6 +322,13 @@ static int __init pci_iommu_init(void)
return 0;
}

+void pci_iommu_shutdown(void)
+{
+#ifdef CONFIG_IOMMU
+ gart_iommu_shutdown();
+#endif
+}
+
#ifdef CONFIG_PCI
/* Many VIA bridges seem to corrupt data for DAC. Disable it here */

diff --git a/arch/x86_64/kernel/reboot.c b/arch/x86_64/kernel/reboot.c
index 7503068..e6e65c2 100644
--- a/arch/x86_64/kernel/reboot.c
+++ b/arch/x86_64/kernel/reboot.c
@@ -16,6 +16,7 @@
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
#include <asm/apic.h>
+#include <asm/proto.h>

/*
* Power off function, if any
@@ -81,6 +82,7 @@ static inline void kb_wait(void)
void machine_shutdown(void)
{
unsigned long flags;
+
/* Stop the cpus and apics */
#ifdef CONFIG_SMP
int reboot_cpu_id;
@@ -111,6 +113,8 @@ void machine_shutdown(void)
disable_IO_APIC();

local_irq_restore(flags);
+
+ pci_iommu_shutdown();
}

void machine_emergency_restart(void)
diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
index 85255db..fb9f73e 100644
--- a/include/asm-x86_64/proto.h
+++ b/include/asm-x86_64/proto.h
@@ -85,11 +85,13 @@ extern int exception_trace;
extern unsigned cpu_khz;
extern unsigned tsc_khz;

+extern void pci_iommu_shutdown(void);
extern void no_iommu_init(void);
extern int force_iommu, no_iommu;
extern int iommu_detected;
#ifdef CONFIG_IOMMU
extern void gart_iommu_init(void);
+extern void gart_iommu_shutdown(void);
extern void __init gart_parse_options(char *);
extern void iommu_hole_init(void);
extern int fallback_aper_order;

2007-06-23 09:03:30

by Alan

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

> But I doubt this was YH's problem - the panic kernel memory
> is always reserved and there shouldn't be any ongoing DMAs in this
> area anyways. And what happens outside the kdump kernel shouldn't matter.

In the kdump case it looks like there is still DMA going on through the
GART on some systems when kdump zaps it.

2007-06-23 10:24:15

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

On Fri, Jun 22, 2007 at 05:27:50PM -0700, Yinghai Lu wrote:
> Andi Kleen wrote:
> >On Saturday 23 June 2007 00:19:51 Alan Cox wrote:
> >The kdump kernel should be normally all <4GB anyways. You won't
> >need any IOMMU for its IO unless you O_DIRECT/sendfile out of /proc/kcore.
> >Just don't do that (but I suspect it won't work anyways)
> >
> >If it's not then swiotlb will also not work because it won't get
> >any memory <4GB.
> >
> >But I doubt this was YH's problem - the panic kernel memory
> >is always reserved and there shouldn't be any ongoing DMAs in this
> >area anyways. And what happens outside the kdump kernel shouldn't matter.
> >
> >I suspect he rather saw problems with non kdump kexec where we
> >can just shut down the GART properly beforehand.
>
> current I only test kexec only. So clean shut GART in first kernel will
> help.
>
> where is hook for shutdown? add one in dma_ops?

I was wondering the same thing. I think it should hook into the
standard device model. It definitely should *not* be in the dma_ops
(it has nothing to do with DMA...) - if needed for this or something
else we should add a struct iommu_ops which includes a dma_ops member.

Cheers,
Muli

2007-06-23 10:39:59

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART in shutdown

On Fri, Jun 22, 2007 at 07:34:59PM -0700, Yinghai Lu wrote:
> [PATCH] x86-64: disable the GART in shutdown
>
> For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G RAM
> installed. when mem is allocated for GART, it will do the memset for clear.
> and for kexec case, the first kernel already enable that, the memset in second
> kernel will cause the system restart. solution will be:
> in second kernel: disable that at first before we try to allocate mem for it.
> or in the first kernel: do disable that before shutdown.
>
> Signed-off-by: Yinghai Lu <[email protected]>
>

[snip]

> diff --git a/arch/x86_64/kernel/pci-dma.c b/arch/x86_64/kernel/pci-dma.c
> index 9f80aad..64f2ab3 100644
> --- a/arch/x86_64/kernel/pci-dma.c
> +++ b/arch/x86_64/kernel/pci-dma.c
> @@ -322,6 +322,13 @@ static int __init pci_iommu_init(void)
> return 0;
> }
>
> +void pci_iommu_shutdown(void)
> +{
> +#ifdef CONFIG_IOMMU
> + gart_iommu_shutdown();
> +#endif
> +}

I'm going to need exactly the same hook fro Calgary, as well Intel for
VT-d, and AMD for their upcoming IOMMU, etc. How about we do

struct iommu_ops {
struct dma_ops {
...
}
void (*shutdown)(void);
}

And then pci_iommu_shutdown() becomes

if (iommu_ops->shutdown)
iommu_ops->shutdown();

?

Cheers,
Muli

2007-06-23 11:01:26

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART in shutdown


> I'm going to need exactly the same hook fro Calgary, as well Intel for
> VT-d, and AMD for their upcoming IOMMU, etc. How about we do
>
> struct iommu_ops {
> struct dma_ops {
> ...
> }
> void (*shutdown)(void);
> }
>
> And then pci_iommu_shutdown() becomes
>
> if (iommu_ops->shutdown)
> iommu_ops->shutdown();

I think it's cleaner if everybody registers their own shutdown handler in sysfs
Don't see any value in going through a generic layer because there is no
shared code.

-Andi


2007-06-23 11:08:41

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART in shutdown

On Saturday 23 June 2007 04:34:59 Yinghai Lu wrote:
> [PATCH] x86-64: disable the GART in shutdown
>
> For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G RAM
> installed. when mem is allocated for GART, it will do the memset for clear.
> and for kexec case, the first kernel already enable that, the memset in second
> kernel will cause the system restart. solution will be:
> in second kernel: disable that at first before we try to allocate mem for it.
> or in the first kernel: do disable that before shutdown.

I thought we agreed it wasn't needed for the kdump case? Or rather
the only way to fix kdump was to reserve or better not use GART. Just disabling
it is not enough because it could be eventually reenabled and any
in flight DMAs could hit innocent memory.

For kexec it would be sufficient to do it in a platform device hook.

Also I'm a little uneasy in doing PCI access that late in reboot.
While there is luckily no bridge involved here it is still risky
after possibly other PCI hardware has been disabled.

So my suggestion would be:

(1) implement sysfs platform device based shutdown hooks (preferable
before PCI shutdown)
(2) make sure GART is never enabled in kernels that are all < 4GB (as
in the kdump kernel). But what to do about AGP?

-Andi

2007-06-23 11:09:39

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART in shutdown

On Sat, Jun 23, 2007 at 12:59:17PM +0200, Andi Kleen wrote:
>
> > I'm going to need exactly the same hook fro Calgary, as well Intel for
> > VT-d, and AMD for their upcoming IOMMU, etc. How about we do
> >
> > struct iommu_ops {
> > struct dma_ops {
> > ...
> > }
> > void (*shutdown)(void);
> > }
> >
> > And then pci_iommu_shutdown() becomes
> >
> > if (iommu_ops->shutdown)
> > iommu_ops->shutdown();
>
> I think it's cleaner if everybody registers their own shutdown
> handler in sysfs Don't see any value in going through a generic
> layer because there is no shared code.

Going through the shared layer makes it obvious to new IOMMU
implementers that they will need to implement a shutdown hook and
takes roughly the same ammount of work. I think that's a net win.

Cheers,
Muli

2007-06-23 11:12:39

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture

On Sat, Jun 23, 2007 at 02:14:01AM +0200, Andi Kleen wrote:
> On Saturday 23 June 2007 00:19:51 Alan Cox wrote:
> > > YH do you think you can look at simply reserving a portion of the iommu?
> > > And having the kexec on panic kernel use the reserved portion?
> >
> > How about simply reserving all of it for the base kernel and using soft
> > iommu for the panic kernel, its hardly high performance criticial at this
> > point.
>
> The kdump kernel should be normally all <4GB anyways. You won't
> need any IOMMU for its IO unless you O_DIRECT/sendfile out of /proc/kcore.
> Just don't do that (but I suspect it won't work anyways)
>

Hi Andi,

Yes, kdump kernel is generally <4GB . Won't I require IOMMU while I am
copying the high memory contents in second kernel (lets say 16 GB of memory
and destination device is not capable of addressing anything more than 4G
for DMA operation)?

> If it's not then swiotlb will also not work because it won't get
> any memory <4GB.
>

Will using IOMMU instead of swiotlb give noticeable perfomance boost.
One of the goals is to automatically save the crash dump as soon as
possible and boot back into production kernel to reduce system down
time and reduce availability.

Right now I am trvelling. After OLS will try to come up with a patch
for reserving a couple of IOMMU entries for kdump purposes. Only issue
is that any boot time reservation requirements make it impossible to
enable a feature at run time without rebooting the system.

Thanks
Vivek

2007-06-23 13:15:07

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART before allocate aperture


> Yes, kdump kernel is generally <4GB . Won't I require IOMMU while I am
> copying the high memory contents in second kernel (lets say 16 GB of memory
> and destination device is not capable of addressing anything more than 4G
> for DMA operation)?

It cannot happen; you don't support mmap on vmcore. Also even if it
was possible it would likely work for networking/block IO because
they support bouncing using their own mechanism.

When you read() from vmcore to write the data out you will always use the CPU
to copy and then do IO from the copied data.

> > If it's not then swiotlb will also not work because it won't get
> > any memory <4GB.
>
> Will using IOMMU instead of swiotlb give noticeable perfomance boost.

The difference is not that dramatic and less and less IO devices need
it anyways. And if you're <4GB you will never need it.

-Andi

2007-06-23 16:53:30

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART in shutdown

> On Fri, 22 Jun 2007 19:34:59 -0700 Yinghai Lu <[email protected]> wrote:
> [PATCH] x86-64: disable the GART in shutdown
>
> For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G RAM
> installed. when mem is allocated for GART, it will do the memset for clear.
> and for kexec case, the first kernel already enable that, the memset in second
> kernel will cause the system restart. solution will be:
> in second kernel: disable that at first before we try to allocate mem for it.
> or in the first kernel: do disable that before shutdown.
>

The patch seems to do some slightly inappropriate things.

>
> arch/x86_64/kernel/pci-dma.c | 7 +++++++
> arch/x86_64/kernel/pci-gart.c | 20 ++++++++++++++++++++
> arch/x86_64/kernel/reboot.c | 4 ++++
> include/asm-x86_64/proto.h | 2 ++
> 4 files changed, 33 insertions(+)
> diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
> index ae091cd..4904d5f 100644
> --- a/arch/x86_64/kernel/pci-gart.c
> +++ b/arch/x86_64/kernel/pci-gart.c
> @@ -571,6 +571,26 @@ static const struct dma_mapping_ops gart_dma_ops = {
> .unmap_sg = gart_unmap_sg,
> };
>
> +void gart_iommu_shutdown(void)
> +{
> + struct pci_dev *dev;
> + int i;
> +
> + if (dma_ops != &gart_dma_ops)
> + return;
> +
> + for (i = 0; i < num_k8_northbridges; i++) {
> + u32 ctl;
> +
> + dev = k8_northbridges[i];
> + pci_read_config_dword(dev, 0x90, &ctl);
> +
> + ctl &= ~1;
> +
> + pci_write_config_dword(dev, 0x90, ctl);
> + }
> +}

OK.

> void __init gart_iommu_init(void)
> {
> struct agp_kern_info info;
> diff --git a/arch/x86_64/kernel/pci-dma.c b/arch/x86_64/kernel/pci-dma.c
> index 9f80aad..64f2ab3 100644
> --- a/arch/x86_64/kernel/pci-dma.c
> +++ b/arch/x86_64/kernel/pci-dma.c
> @@ -322,6 +322,13 @@ static int __init pci_iommu_init(void)
> return 0;
> }
>
> +void pci_iommu_shutdown(void)
> +{
> +#ifdef CONFIG_IOMMU
> + gart_iommu_shutdown();
> +#endif
> +}

It'd be better to avoid the ifdef-in-C by providing a stub function in a
header file if !CONFIG_IOMMU. That's quite standard kernel practice and
might help avoid some other problems...

> #ifdef CONFIG_PCI
> /* Many VIA bridges seem to corrupt data for DAC. Disable it here */
>
> diff --git a/arch/x86_64/kernel/reboot.c b/arch/x86_64/kernel/reboot.c
> index 7503068..e6e65c2 100644
> --- a/arch/x86_64/kernel/reboot.c
> +++ b/arch/x86_64/kernel/reboot.c
> @@ -16,6 +16,7 @@
> #include <asm/pgtable.h>
> #include <asm/tlbflush.h>
> #include <asm/apic.h>
> +#include <asm/proto.h>
>
> /*
> * Power off function, if any
> @@ -81,6 +82,7 @@ static inline void kb_wait(void)
> void machine_shutdown(void)
> {
> unsigned long flags;
> +
> /* Stop the cpus and apics */
> #ifdef CONFIG_SMP
> int reboot_cpu_id;
> @@ -111,6 +113,8 @@ void machine_shutdown(void)
> disable_IO_APIC();
>
> local_irq_restore(flags);
> +
> + pci_iommu_shutdown();
> }

And does the kernel work OK if CONFIG_PCI=n? I think so from inspection,
by luck.

> void machine_emergency_restart(void)
> diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
> index 85255db..fb9f73e 100644
> --- a/include/asm-x86_64/proto.h
> +++ b/include/asm-x86_64/proto.h
> @@ -85,11 +85,13 @@ extern int exception_trace;
> extern unsigned cpu_khz;
> extern unsigned tsc_khz;
>
> +extern void pci_iommu_shutdown(void);

This is the only pci-related function which is declared in proto.h. I
suspect you chose the wrong header file for this declaration.


> extern void no_iommu_init(void);
> extern int force_iommu, no_iommu;
> extern int iommu_detected;
> #ifdef CONFIG_IOMMU
> extern void gart_iommu_init(void);
> +extern void gart_iommu_shutdown(void);
> extern void __init gart_parse_options(char *);
> extern void iommu_hole_init(void);
> extern int fallback_aper_order;

2007-06-25 00:18:19

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART in shutdown

On 6/23/07, Andi Kleen <[email protected]> wrote:

> So my suggestion would be:
>
> (1) implement sysfs platform device based shutdown hooks (preferable
> before PCI shutdown)

BTW, it would be better if DMA ram range was registered there too.

> (2) make sure GART is never enabled in kernels that are all < 4GB (as
> in the kdump kernel). But what to do about AGP?
yeah, AGP bridge will enable GART already.

YH

2007-06-25 00:22:40

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART in shutdown

On 6/23/07, Andrew Morton <[email protected]> wrote:
> > void __init gart_iommu_init(void)
> > {
> > struct agp_kern_info info;
> > diff --git a/arch/x86_64/kernel/pci-dma.c b/arch/x86_64/kernel/pci-dma.c
> > index 9f80aad..64f2ab3 100644
> > --- a/arch/x86_64/kernel/pci-dma.c
> > +++ b/arch/x86_64/kernel/pci-dma.c
> > @@ -322,6 +322,13 @@ static int __init pci_iommu_init(void)
> > return 0;
> > }
> >
> > +void pci_iommu_shutdown(void)
> > +{
> > +#ifdef CONFIG_IOMMU
> > + gart_iommu_shutdown();
> > +#endif
> > +}
>
> It'd be better to avoid the ifdef-in-C by providing a stub function in a
> header file if !CONFIG_IOMMU. That's quite standard kernel practice and
> might help avoid some other problems...

move ifdef into gart_iommu_shudown()?

>
> > #ifdef CONFIG_PCI
> > /* Many VIA bridges seem to corrupt data for DAC. Disable it here */
> >
> > diff --git a/arch/x86_64/kernel/reboot.c b/arch/x86_64/kernel/reboot.c
> > index 7503068..e6e65c2 100644
> > --- a/arch/x86_64/kernel/reboot.c
> > +++ b/arch/x86_64/kernel/reboot.c
> > @@ -16,6 +16,7 @@
> > #include <asm/pgtable.h>
> > #include <asm/tlbflush.h>
> > #include <asm/apic.h>
> > +#include <asm/proto.h>
> >
> > /*
> > * Power off function, if any
> > @@ -81,6 +82,7 @@ static inline void kb_wait(void)
> > void machine_shutdown(void)
> > {
> > unsigned long flags;
> > +
> > /* Stop the cpus and apics */
> > #ifdef CONFIG_SMP
> > int reboot_cpu_id;
> > @@ -111,6 +113,8 @@ void machine_shutdown(void)
> > disable_IO_APIC();
> >
> > local_irq_restore(flags);
> > +
> > + pci_iommu_shutdown();
> > }
>
> And does the kernel work OK if CONFIG_PCI=n? I think so from inspection,
> by luck.

x86_64 always have that PCI enabled?

>
> > void machine_emergency_restart(void)
> > diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
> > index 85255db..fb9f73e 100644
> > --- a/include/asm-x86_64/proto.h
> > +++ b/include/asm-x86_64/proto.h
> > @@ -85,11 +85,13 @@ extern int exception_trace;
> > extern unsigned cpu_khz;
> > extern unsigned tsc_khz;
> >
> > +extern void pci_iommu_shutdown(void);
>
> This is the only pci-related function which is declared in proto.h. I
> suspect you chose the wrong header file for this declaration.
i tried to put that into dma_mapping.h, but htat will need more header
files before that to make the compiler happy, may need to find another
header file for it.

YH

2007-06-25 02:11:39

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART in shutdown

On Sun, Jun 24, 2007 at 05:22:30PM -0700, Yinghai Lu wrote:
> On 6/23/07, Andrew Morton <[email protected]> wrote:
> >> void __init gart_iommu_init(void)
> >> {
> >> struct agp_kern_info info;
> >> diff --git a/arch/x86_64/kernel/pci-dma.c b/arch/x86_64/kernel/pci-dma.c
> >> index 9f80aad..64f2ab3 100644
> >> --- a/arch/x86_64/kernel/pci-dma.c
> >> +++ b/arch/x86_64/kernel/pci-dma.c
> >> @@ -322,6 +322,13 @@ static int __init pci_iommu_init(void)
> >> return 0;
> >> }
> >>
> >> +void pci_iommu_shutdown(void)
> >> +{
> >> +#ifdef CONFIG_IOMMU
> >> + gart_iommu_shutdown();
> >> +#endif
> >> +}
> >
> >It'd be better to avoid the ifdef-in-C by providing a stub function in a
> >header file if !CONFIG_IOMMU. That's quite standard kernel practice and
> >might help avoid some other problems...
>
> move ifdef into gart_iommu_shudown()?

What Andrew means is
(in a header file)

#ifdef CONFIG_IOMMU
extern void gart_iommu_shutdown(void);
#else
static inline void gart_iommu_shutdown(void)
{
}
#endif /* CONFIG_IOMMU */

Cheers,
Muli

2007-06-25 19:32:33

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] x86-64: disable the GART in shutdown v2

[PATCH] x86-64: disable the GART in shutdown

For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G RAM
installed. when using kexec to load second kernel. In the second kernel,
when mem is allocated for GART, it will do the memset for clear, it will cause
restart, because some device still used that for dma.
solution will be:
in second kernel: disable that at first before we try to allocate mem for it.
or in the first kernel: do disable that before shutdown.
Andi/Eric/Alan prefer to second one for clean shutdown in first kernel.
Andi also point out need to consider to AGP enable but mem less 4G case too.

Signed-off-by: Yinghai Lu <[email protected]>

arch/x86_64/kernel/pci-dma.c | 5 +++++
arch/x86_64/kernel/pci-gart.c | 21 +++++++++++++++++++++
arch/x86_64/kernel/reboot.c | 4 ++++
include/asm-x86_64/proto.h | 7 +++++++
4 files changed, 37 insertions(+)
diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
index ae091cd..6c4fe16 100644
--- a/arch/x86_64/kernel/pci-gart.c
+++ b/arch/x86_64/kernel/pci-gart.c
@@ -571,6 +571,27 @@ static const struct dma_mapping_ops gart_dma_ops = {
.unmap_sg = gart_unmap_sg,
};

+void gart_iommu_shutdown(void)
+{
+ struct pci_dev *dev;
+ int i;
+
+
+ if (noagp && (dma_ops != &gart_dma_ops))
+ return;
+
+ for (i = 0; i < num_k8_northbridges; i++) {
+ u32 ctl;
+
+ dev = k8_northbridges[i];
+ pci_read_config_dword(dev, 0x90, &ctl);
+
+ ctl &= ~1;
+
+ pci_write_config_dword(dev, 0x90, ctl);
+ }
+}
+
void __init gart_iommu_init(void)
{
struct agp_kern_info info;
diff --git a/arch/x86_64/kernel/pci-dma.c b/arch/x86_64/kernel/pci-dma.c
index 9f80aad..b406b54 100644
--- a/arch/x86_64/kernel/pci-dma.c
+++ b/arch/x86_64/kernel/pci-dma.c
@@ -322,6 +322,11 @@ static int __init pci_iommu_init(void)
return 0;
}

+void pci_iommu_shutdown(void)
+{
+ gart_iommu_shutdown();
+}
+
#ifdef CONFIG_PCI
/* Many VIA bridges seem to corrupt data for DAC. Disable it here */

diff --git a/arch/x86_64/kernel/reboot.c b/arch/x86_64/kernel/reboot.c
index 7503068..e6e65c2 100644
--- a/arch/x86_64/kernel/reboot.c
+++ b/arch/x86_64/kernel/reboot.c
@@ -16,6 +16,7 @@
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
#include <asm/apic.h>
+#include <asm/proto.h>

/*
* Power off function, if any
@@ -81,6 +82,7 @@ static inline void kb_wait(void)
void machine_shutdown(void)
{
unsigned long flags;
+
/* Stop the cpus and apics */
#ifdef CONFIG_SMP
int reboot_cpu_id;
@@ -111,6 +113,8 @@ void machine_shutdown(void)
disable_IO_APIC();

local_irq_restore(flags);
+
+ pci_iommu_shutdown();
}

void machine_emergency_restart(void)
diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
index 85255db..b70aa0c 100644
--- a/include/asm-x86_64/proto.h
+++ b/include/asm-x86_64/proto.h
@@ -85,11 +85,13 @@ extern int exception_trace;
extern unsigned cpu_khz;
extern unsigned tsc_khz;

+extern void pci_iommu_shutdown(void);
extern void no_iommu_init(void);
extern int force_iommu, no_iommu;
extern int iommu_detected;
#ifdef CONFIG_IOMMU
extern void gart_iommu_init(void);
+extern void gart_iommu_shutdown(void);
extern void __init gart_parse_options(char *);
extern void iommu_hole_init(void);
extern int fallback_aper_order;
@@ -101,6 +103,11 @@ extern int fix_aperture;
#else
#define iommu_aperture 0
#define iommu_aperture_allowed 0
+
+static inline void gart_iommu_shutdown(void)
+{
+}
+
#endif

extern int reboot_force;

2007-06-25 19:41:46

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART in shutdown v2

On Mon, Jun 25, 2007 at 12:34:03PM -0700, Yinghai Lu wrote:

> [PATCH] x86-64: disable the GART in shutdown
>
> For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G RAM
> installed. when using kexec to load second kernel. In the second kernel,
> when mem is allocated for GART, it will do the memset for clear, it will cause
> restart, because some device still used that for dma.
> solution will be:
> in second kernel: disable that at first before we try to allocate mem for it.
> or in the first kernel: do disable that before shutdown.
> Andi/Eric/Alan prefer to second one for clean shutdown in first kernel.
> Andi also point out need to consider to AGP enable but mem less 4G case too.
>
> Signed-off-by: Yinghai Lu <[email protected]>
>
> arch/x86_64/kernel/pci-dma.c | 5 +++++
> arch/x86_64/kernel/pci-gart.c | 21 +++++++++++++++++++++
> arch/x86_64/kernel/reboot.c | 4 ++++
> include/asm-x86_64/proto.h | 7 +++++++
> 4 files changed, 37 insertions(+)
> diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
> index ae091cd..6c4fe16 100644
> --- a/arch/x86_64/kernel/pci-gart.c
> +++ b/arch/x86_64/kernel/pci-gart.c
> @@ -571,6 +571,27 @@ static const struct dma_mapping_ops gart_dma_ops = {
> .unmap_sg = gart_unmap_sg,
> };
>
> +void gart_iommu_shutdown(void)
> +{
> + struct pci_dev *dev;
> + int i;
> +

extra blank line

> +
> + if (noagp && (dma_ops != &gart_dma_ops))
> + return;

noagp? did you mean 'no_agp'?

> +
> + for (i = 0; i < num_k8_northbridges; i++) {
> + u32 ctl;
> +
> + dev = k8_northbridges[i];
> + pci_read_config_dword(dev, 0x90, &ctl);
> +
> + ctl &= ~1;
> +
> + pci_write_config_dword(dev, 0x90, ctl);
> + }
> +}
> +
> void __init gart_iommu_init(void)
> {
> struct agp_kern_info info;
> diff --git a/arch/x86_64/kernel/pci-dma.c b/arch/x86_64/kernel/pci-dma.c
> index 9f80aad..b406b54 100644
> --- a/arch/x86_64/kernel/pci-dma.c
> +++ b/arch/x86_64/kernel/pci-dma.c
> @@ -322,6 +322,11 @@ static int __init pci_iommu_init(void)
> return 0;
> }
>
> +void pci_iommu_shutdown(void)
> +{
> + gart_iommu_shutdown();

I really dislike this. Here's how this function is going to look in a
few months:

void pci_iommu_shutdown(void)
{
gart_iommu_shutdown();

calgary_iommu_shutdown();

vtd_iommu_shutdown();

amd_iommu_shutdown();

/* etc, ad nauseam */
}

Where all of these are no-ops, except one. Now what's wrong with this
picture?

> diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
> index 85255db..b70aa0c 100644
> --- a/include/asm-x86_64/proto.h
> +++ b/include/asm-x86_64/proto.h
> @@ -85,11 +85,13 @@ extern int exception_trace;
> extern unsigned cpu_khz;
> extern unsigned tsc_khz;
>
> +extern void pci_iommu_shutdown(void);
> extern void no_iommu_init(void);
> extern int force_iommu, no_iommu;
> extern int iommu_detected;
> #ifdef CONFIG_IOMMU
> extern void gart_iommu_init(void);
> +extern void gart_iommu_shutdown(void);
> extern void __init gart_parse_options(char *);
> extern void iommu_hole_init(void);
> extern int fallback_aper_order;
> @@ -101,6 +103,11 @@ extern int fix_aperture;
> #else
> #define iommu_aperture 0
> #define iommu_aperture_allowed 0
> +
> +static inline void gart_iommu_shutdown(void)
> +{
> +}
> +

I suggest include/asm-x86_64/iommu.h for this. proto.h doesn't have
anything to do with it.

Cheers,
Muli

2007-06-25 19:51:17

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART in shutdown v2

Muli Ben-Yehuda wrote:
>> diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
>> index ae091cd..6c4fe16 100644
>> --- a/arch/x86_64/kernel/pci-gart.c
>> +++ b/arch/x86_64/kernel/pci-gart.c
>> @@ -571,6 +571,27 @@ static const struct dma_mapping_ops gart_dma_ops = {
>> .unmap_sg = gart_unmap_sg,
>> };
>>
>> +void gart_iommu_shutdown(void)
>> +{
>> + struct pci_dev *dev;
>> + int i;
>> +
>
> extra blank line
>
>> +
>> + if (noagp && (dma_ops != &gart_dma_ops))
>> + return;
>
> noagp? did you mean 'no_agp'?
>
I will fix it.
>> +
>> + for (i = 0; i < num_k8_northbridges; i++) {
>> + u32 ctl;
>> +
>> + dev = k8_northbridges[i];
>> + pci_read_config_dword(dev, 0x90, &ctl);
>> +
>> + ctl &= ~1;
>> +
>> + pci_write_config_dword(dev, 0x90, ctl);
>> + }
>> +}
>> +
>> void __init gart_iommu_init(void)
>> {
>> struct agp_kern_info info;
>> diff --git a/arch/x86_64/kernel/pci-dma.c b/arch/x86_64/kernel/pci-dma.c
>> index 9f80aad..b406b54 100644
>> --- a/arch/x86_64/kernel/pci-dma.c
>> +++ b/arch/x86_64/kernel/pci-dma.c
>> @@ -322,6 +322,11 @@ static int __init pci_iommu_init(void)
>> return 0;
>> }
>>
>> +void pci_iommu_shutdown(void)
>> +{
>> + gart_iommu_shutdown();
>
> I really dislike this. Here's how this function is going to look in a
> few months:
>
> void pci_iommu_shutdown(void)
> {
> gart_iommu_shutdown();
>
> calgary_iommu_shutdown();
>
> vtd_iommu_shutdown();
>
> amd_iommu_shutdown();
>
> /* etc, ad nauseam */
> }
>
> Where all of these are no-ops, except one. Now what's wrong with this
> picture?
>
>> diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
>> index 85255db..b70aa0c 100644
>> --- a/include/asm-x86_64/proto.h
>> +++ b/include/asm-x86_64/proto.h
>> @@ -85,11 +85,13 @@ extern int exception_trace;
>> extern unsigned cpu_khz;
>> extern unsigned tsc_khz;
>>
>> +extern void pci_iommu_shutdown(void);
>> extern void no_iommu_init(void);
>> extern int force_iommu, no_iommu;
>> extern int iommu_detected;
>> #ifdef CONFIG_IOMMU
>> extern void gart_iommu_init(void);
>> +extern void gart_iommu_shutdown(void);
>> extern void __init gart_parse_options(char *);
>> extern void iommu_hole_init(void);
>> extern int fallback_aper_order;
>> @@ -101,6 +103,11 @@ extern int fix_aperture;
>> #else
>> #define iommu_aperture 0
>> #define iommu_aperture_allowed 0
>> +
>> +static inline void gart_iommu_shutdown(void)
>> +{
>> +}
>> +
>
> I suggest include/asm-x86_64/iommu.h for this. proto.h doesn't have
> anything to do with it.

move iommu releated to iommu.h and add iommu_ops struct?

YH

2007-06-25 19:56:36

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [PATCH] x86-64: disable the GART in shutdown v2

On Mon, Jun 25, 2007 at 12:52:48PM -0700, Yinghai Lu wrote:

> >I suggest include/asm-x86_64/iommu.h for this. proto.h doesn't have
> >anything to do with it.
>
> move iommu releated to iommu.h and add iommu_ops struct?

That's how I would do it, to complement dma_mapping.h.

Cheers,
Muli

2007-06-25 21:48:34

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 1/2] x86-64: disable the GART in shutdown

[PATCH 1/2] x86-64: disable the GART in shutdown

For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G RAM
installed. when using kexec to load second kernel. In the second kernel,
when mem is allocated for GART, it will do the memset for clear, it will cause
restart, because some device still used that for dma.
solution will be:
in second kernel: disable that at first before we try to allocate mem for it.
or in the first kernel: do disable that before shutdown.
Andi/Eric/Alan prefer to second one for clean shutdown in first kernel.
Andi also point out need to consider to AGP enable but mem less 4G case too.

Signed-off-by: Yinghai Lu <[email protected]>

arch/x86_64/kernel/pci-dma.c | 5 +++++
arch/x86_64/kernel/pci-gart.c | 21 +++++++++++++++++++++
arch/x86_64/kernel/reboot.c | 4 ++++
include/asm-x86_64/proto.h | 7 +++++++
4 files changed, 37 insertions(+)

diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
index ae091cd..6c4fe16 100644
--- a/arch/x86_64/kernel/pci-gart.c
+++ b/arch/x86_64/kernel/pci-gart.c
@@ -571,6 +571,26 @@ static const struct dma_mapping_ops gart_dma_ops = {
.unmap_sg = gart_unmap_sg,
};

+void gart_iommu_shutdown(void)
+{
+ struct pci_dev *dev;
+ int i;
+
+ if (no_agp && (dma_ops != &gart_dma_ops))
+ return;
+
+ for (i = 0; i < num_k8_northbridges; i++) {
+ u32 ctl;
+
+ dev = k8_northbridges[i];
+ pci_read_config_dword(dev, 0x90, &ctl);
+
+ ctl &= ~1;
+
+ pci_write_config_dword(dev, 0x90, ctl);
+ }
+}
+
void __init gart_iommu_init(void)
{
struct agp_kern_info info;
diff --git a/arch/x86_64/kernel/pci-dma.c b/arch/x86_64/kernel/pci-dma.c
index 9f80aad..b406b54 100644
--- a/arch/x86_64/kernel/pci-dma.c
+++ b/arch/x86_64/kernel/pci-dma.c
@@ -322,6 +322,11 @@ static int __init pci_iommu_init(void)
return 0;
}

+void pci_iommu_shutdown(void)
+{
+ gart_iommu_shutdown();
+}
+
#ifdef CONFIG_PCI
/* Many VIA bridges seem to corrupt data for DAC. Disable it here */

diff --git a/arch/x86_64/kernel/reboot.c b/arch/x86_64/kernel/reboot.c
index 7503068..e6e65c2 100644
--- a/arch/x86_64/kernel/reboot.c
+++ b/arch/x86_64/kernel/reboot.c
@@ -16,6 +16,7 @@
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
#include <asm/apic.h>
+#include <asm/proto.h>

/*
* Power off function, if any
@@ -81,6 +82,7 @@ static inline void kb_wait(void)
void machine_shutdown(void)
{
unsigned long flags;
+
/* Stop the cpus and apics */
#ifdef CONFIG_SMP
int reboot_cpu_id;
@@ -111,6 +113,8 @@ void machine_shutdown(void)
disable_IO_APIC();

local_irq_restore(flags);
+
+ pci_iommu_shutdown();
}

void machine_emergency_restart(void)
diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
index 85255db..b70aa0c 100644
--- a/include/asm-x86_64/proto.h
+++ b/include/asm-x86_64/proto.h
@@ -85,11 +85,13 @@ extern int exception_trace;
extern unsigned cpu_khz;
extern unsigned tsc_khz;

+extern void pci_iommu_shutdown(void);
extern void no_iommu_init(void);
extern int force_iommu, no_iommu;
extern int iommu_detected;
#ifdef CONFIG_IOMMU
extern void gart_iommu_init(void);
+extern void gart_iommu_shutdown(void);
extern void __init gart_parse_options(char *);
extern void iommu_hole_init(void);
extern int fallback_aper_order;
@@ -101,6 +103,11 @@ extern int fix_aperture;
#else
#define iommu_aperture 0
#define iommu_aperture_allowed 0
+
+static inline void gart_iommu_shutdown(void)
+{
+}
+
#endif

extern int reboot_force;

2007-06-25 21:48:48

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 2/2] x86_84: move iommu declaration from proto to iommu.h

[PATCH 2/2] x86_84: move iommu declaration from proto to iommu.h

Signed-off-by: Yinghai Lu <[email protected]>

arch/x86_64/kernel/aperture.c | 2 +-
arch/x86_64/kernel/early-quirks.c | 1 +
arch/x86_64/kernel/pci-dma.c | 2 +-
arch/x86_64/kernel/pci-gart.c | 1 +
arch/x86_64/kernel/pci-nommu.c | 2 +-
arch/x86_64/kernel/pci-swiotlb.c | 2 +-
arch/x86_64/kernel/reboot.c | 2 +-
include/asm-x86_64/iommu.h | 29 +++++++++++++++++++++++++++++
include/asm-x86_64/proto.h | 25 -------------------------
9 files changed, 36 insertions(+), 30 deletions(-)
diff --git a/arch/x86_64/kernel/aperture.c b/arch/x86_64/kernel/aperture.c
index a3d450d..3b443c8 100644
--- a/arch/x86_64/kernel/aperture.c
+++ b/arch/x86_64/kernel/aperture.c
@@ -20,7 +20,7 @@
#include <linux/ioport.h>
#include <asm/e820.h>
#include <asm/io.h>
-#include <asm/proto.h>
+#include <asm/iommu.h>
#include <asm/pci-direct.h>
#include <asm/dma.h>
#include <asm/k8.h>
diff --git a/arch/x86_64/kernel/early-quirks.c b/arch/x86_64/kernel/early-quirks.c
index 990d9c2..13aa4fd 100644
--- a/arch/x86_64/kernel/early-quirks.c
+++ b/arch/x86_64/kernel/early-quirks.c
@@ -14,6 +14,7 @@
#include <linux/pci_ids.h>
#include <asm/pci-direct.h>
#include <asm/proto.h>
+#include <asm/iommu.h>
#include <asm/dma.h>

static void __init via_bugs(void)
diff --git a/arch/x86_64/kernel/pci-dma.c b/arch/x86_64/kernel/pci-dma.c
index 9f80aad..46c1a57 100644
--- a/arch/x86_64/kernel/pci-dma.c
+++ b/arch/x86_64/kernel/pci-dma.c
@@ -8,7 +8,7 @@
#include <linux/pci.h>
#include <linux/module.h>
#include <asm/io.h>
-#include <asm/proto.h>
+#include <asm/iommu.h>
#include <asm/calgary.h>

int iommu_merge __read_mostly = 0;
diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
index ae091cd..6e44fc5 100644
--- a/arch/x86_64/kernel/pci-gart.c
+++ b/arch/x86_64/kernel/pci-gart.c
@@ -28,6 +28,7 @@
#include <asm/mtrr.h>
#include <asm/pgtable.h>
#include <asm/proto.h>
+#include <asm/iommu.h>
#include <asm/cacheflush.h>
#include <asm/swiotlb.h>
#include <asm/dma.h>
diff --git a/arch/x86_64/kernel/pci-nommu.c b/arch/x86_64/kernel/pci-nommu.c
index 6dade0c..617949e 100644
--- a/arch/x86_64/kernel/pci-nommu.c
+++ b/arch/x86_64/kernel/pci-nommu.c
@@ -6,7 +6,7 @@
#include <linux/string.h>
#include <linux/dma-mapping.h>

-#include <asm/proto.h>
+#include <asm/iommu.h>
#include <asm/processor.h>
#include <asm/dma.h>

diff --git a/arch/x86_64/kernel/pci-swiotlb.c b/arch/x86_64/kernel/pci-swiotlb.c
index 4b4569a..b2f405e 100644
--- a/arch/x86_64/kernel/pci-swiotlb.c
+++ b/arch/x86_64/kernel/pci-swiotlb.c
@@ -5,7 +5,7 @@
#include <linux/module.h>
#include <linux/dma-mapping.h>

-#include <asm/proto.h>
+#include <asm/iommu.h>
#include <asm/swiotlb.h>
#include <asm/dma.h>

diff --git a/arch/x86_64/kernel/reboot.c b/arch/x86_64/kernel/reboot.c
index 7503068..368db2b 100644
--- a/arch/x86_64/kernel/reboot.c
+++ b/arch/x86_64/kernel/reboot.c
@@ -16,7 +16,7 @@
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
#include <asm/apic.h>
-#include <asm/proto.h>
+#include <asm/iommu.h>

/*
* Power off function, if any
diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
index b70aa0c..d6e3225 100644
--- a/include/asm-x86_64/proto.h
+++ b/include/asm-x86_64/proto.h
@@ -85,31 +85,6 @@ extern int exception_trace;
extern unsigned cpu_khz;
extern unsigned tsc_khz;

-extern void pci_iommu_shutdown(void);
-extern void no_iommu_init(void);
-extern int force_iommu, no_iommu;
-extern int iommu_detected;
-#ifdef CONFIG_IOMMU
-extern void gart_iommu_init(void);
-extern void gart_iommu_shutdown(void);
-extern void __init gart_parse_options(char *);
-extern void iommu_hole_init(void);
-extern int fallback_aper_order;
-extern int fallback_aper_force;
-extern int iommu_aperture;
-extern int iommu_aperture_allowed;
-extern int iommu_aperture_disabled;
-extern int fix_aperture;
-#else
-#define iommu_aperture 0
-#define iommu_aperture_allowed 0
-
-static inline void gart_iommu_shutdown(void)
-{
-}
-
-#endif
-
extern int reboot_force;
extern int notsc_setup(char *);


diff --git a/include/asm-x86_64/iommu.h b/include/asm-x86_64/iommu.h
new file mode 100644
index 0000000..5af471f
--- /dev/null
+++ b/include/asm-x86_64/iommu.h
@@ -0,0 +1,29 @@
+#ifndef _ASM_X8664_IOMMU_H
+#define _ASM_X8664_IOMMU_H 1
+
+extern void pci_iommu_shutdown(void);
+extern void no_iommu_init(void);
+extern int force_iommu, no_iommu;
+extern int iommu_detected;
+#ifdef CONFIG_IOMMU
+extern void gart_iommu_init(void);
+extern void gart_iommu_shutdown(void);
+extern void __init gart_parse_options(char *);
+extern void iommu_hole_init(void);
+extern int fallback_aper_order;
+extern int fallback_aper_force;
+extern int iommu_aperture;
+extern int iommu_aperture_allowed;
+extern int iommu_aperture_disabled;
+extern int fix_aperture;
+#else
+#define iommu_aperture 0
+#define iommu_aperture_allowed 0
+
+static inline void gart_iommu_shutdown(void)
+{
+}
+
+#endif
+
+#endif

2007-06-26 11:43:44

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86_84: move iommu declaration from proto to iommu.h

On Mon, Jun 25, 2007 at 02:49:26PM -0700, Yinghai Lu wrote:

> [PATCH 2/2] x86_84: move iommu declaration from proto to iommu.h

Sorry for the hassle, but this patch should come first and then your
current first patch becomes much simpler and does not touch proto.h
needlessly.

Also, I still think an iommu_ops is a better approach. I'll put
something together when I add the Calgary shutdown bits next week
after OLS.

Cheers,
Muli