2009-11-10 10:50:06

by FUJITA Tomonori

[permalink] [raw]
Subject: [PATCH v2 0/9] x86: handle HW IOMMU initialization failure gracefully

This patchset is against tip/master.

The first version is:

http://marc.info/?l=linux-kernel&m=125671300920411&w=2

The changes since v1 are:

- replaced Chris' bootmem patches with the 6/9 patch to implement
free_bootmem_late in a simple way (thanks to Pekka).

- fixed the bug to break 'iommu=soft' boot opiton (found by Joerg).

- moved iommu_init_noop() to x86_init.c

- added Muli's Acked-by to Calgary patch.


==
arch/ia64/kernel/pci-swiotlb.c | 4 +-
arch/powerpc/kernel/setup_32.c | 2 +-
arch/powerpc/kernel/setup_64.c | 2 +-
arch/x86/include/asm/amd_iommu.h | 2 -
arch/x86/include/asm/calgary.h | 2 -
arch/x86/include/asm/gart.h | 5 +---
arch/x86/include/asm/iommu.h | 1 -
arch/x86/include/asm/x86_init.h | 9 +++++++
arch/x86/kernel/amd_iommu.c | 2 +-
arch/x86/kernel/amd_iommu_init.c | 19 +++-----------
arch/x86/kernel/aperture_64.c | 4 ++-
arch/x86/kernel/pci-calgary_64.c | 19 ++++-----------
arch/x86/kernel/pci-dma.c | 27 ++++++++++-----------
arch/x86/kernel/pci-gart_64.c | 16 ++++-------
arch/x86/kernel/pci-nommu.c | 9 -------
arch/x86/kernel/pci-swiotlb.c | 10 +++----
arch/x86/kernel/x86_init.c | 5 ++++
drivers/pci/dmar.c | 7 ++++-
drivers/pci/intel-iommu.c | 4 +-
include/linux/bootmem.h | 1 +
include/linux/dmar.h | 10 -------
include/linux/swiotlb.h | 5 ++-
lib/swiotlb.c | 49 +++++++++++++++++++++++++++++++------
mm/bootmem.c | 24 ++++++++++++++++++
24 files changed, 131 insertions(+), 107 deletions(-)


2009-11-10 10:52:00

by FUJITA Tomonori

[permalink] [raw]
Subject: [PATCH -v2 1/9] add iommu_init to x86_init_ops

We call the detections functions of all the IOMMUs then all their
initialization functions. The latter is pointless since we don't
detect multiple different IOMMUs. What we need to do is calling the
initialization function of the detected IOMMU.

This adds iommu_init hook to x86_init_ops so if an IOMMU detection
function can set its initialization function to the hook.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
arch/x86/include/asm/x86_init.h | 9 +++++++++
arch/x86/kernel/pci-dma.c | 2 ++
arch/x86/kernel/x86_init.c | 5 +++++
3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 66008ed..d8e7145 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -91,6 +91,14 @@ struct x86_init_timers {
};

/**
+ * struct x86_init_iommu - platform specific iommu setup
+ * @iommu_init: platform specific iommu setup
+ */
+struct x86_init_iommu {
+ int (*iommu_init)(void);
+};
+
+/**
* struct x86_init_ops - functions for platform specific setup
*
*/
@@ -101,6 +109,7 @@ struct x86_init_ops {
struct x86_init_oem oem;
struct x86_init_paging paging;
struct x86_init_timers timers;
+ struct x86_init_iommu iommu;
};

/**
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 839d49a..a13478d 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -292,6 +292,8 @@ static int __init pci_iommu_init(void)
dma_debug_add_bus(&pci_bus_type);
#endif

+ x86_init.iommu.iommu_init();
+
calgary_iommu_init();

intel_iommu_init();
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index bc9b230..c46984d 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -19,6 +19,7 @@
void __cpuinit x86_init_noop(void) { }
void __init x86_init_uint_noop(unsigned int unused) { }
void __init x86_init_pgd_noop(pgd_t *unused) { }
+int __init iommu_init_noop(void) { return 0; }

/*
* The platform setup functions are preset with the default functions
@@ -63,6 +64,10 @@ struct x86_init_ops x86_init __initdata = {
.tsc_pre_init = x86_init_noop,
.timer_init = hpet_time_init,
},
+
+ .iommu = {
+ .iommu_init = iommu_init_noop,
+ },
};

struct x86_cpuinit_ops x86_cpuinit __cpuinitdata = {
--
1.5.6.5

2009-11-10 10:50:13

by FUJITA Tomonori

[permalink] [raw]
Subject: [PATCH -v2 2/9] Calgary: convert detect_calgary to use iommu_init hook

This changes detect_calgary() to set init_calgary() to iommu_init hook
if detect_calgary() finds the Calgary IOMMU.

We can kill the code to check if we found the IOMMU in init_calgary()
since detect_calgary() sets init_calgary() only when it found the
IOMMU.

Signed-off-by: FUJITA Tomonori <[email protected]>
Acked-by: Muli Ben-Yehuda <[email protected]>
---
arch/x86/include/asm/calgary.h | 2 --
arch/x86/kernel/pci-calgary_64.c | 11 +++++------
arch/x86/kernel/pci-dma.c | 2 --
3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/calgary.h b/arch/x86/include/asm/calgary.h
index b03bedb..0918654 100644
--- a/arch/x86/include/asm/calgary.h
+++ b/arch/x86/include/asm/calgary.h
@@ -62,10 +62,8 @@ struct cal_chipset_ops {
extern int use_calgary;

#ifdef CONFIG_CALGARY_IOMMU
-extern int calgary_iommu_init(void);
extern void detect_calgary(void);
#else
-static inline int calgary_iommu_init(void) { return 1; }
static inline void detect_calgary(void) { return; }
#endif

diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
index 971a3be..47bd419 100644
--- a/arch/x86/kernel/pci-calgary_64.c
+++ b/arch/x86/kernel/pci-calgary_64.c
@@ -46,6 +46,7 @@
#include <asm/dma.h>
#include <asm/rio.h>
#include <asm/bios_ebda.h>
+#include <asm/x86_init.h>

#ifdef CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT
int use_calgary __read_mostly = 1;
@@ -1344,6 +1345,8 @@ static void __init get_tce_space_from_tar(void)
return;
}

+int __init calgary_iommu_init(void);
+
void __init detect_calgary(void)
{
int bus;
@@ -1445,6 +1448,8 @@ void __init detect_calgary(void)
/* swiotlb for devices that aren't behind the Calgary. */
if (max_pfn > MAX_DMA32_PFN)
swiotlb = 1;
+
+ x86_init.iommu.iommu_init = calgary_iommu_init;
}
return;

@@ -1461,12 +1466,6 @@ int __init calgary_iommu_init(void)
{
int ret;

- if (no_iommu || (swiotlb && !calgary_detected))
- return -ENODEV;
-
- if (!calgary_detected)
- return -ENODEV;
-
/* ok, we're trying to use Calgary - let's roll */
printk(KERN_INFO "PCI-DMA: Using Calgary IOMMU\n");

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index a13478d..0224da8 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -294,8 +294,6 @@ static int __init pci_iommu_init(void)

x86_init.iommu.iommu_init();

- calgary_iommu_init();
-
intel_iommu_init();

amd_iommu_init();
--
1.5.6.5

2009-11-10 10:50:13

by FUJITA Tomonori

[permalink] [raw]
Subject: [PATCH -v2 3/9] GART: convert gart_iommu_hole_init to use iommu_init hook

This changes gart_iommu_hole_init() to set gart_iommu_init() to
iommu_init hook if gart_iommu_hole_init() finds the GART IOMMU.

We can kill the code to check if we found the IOMMU in
gart_iommu_init() since gart_iommu_hole_init() sets gart_iommu_init()
only when it found the IOMMU.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
arch/x86/include/asm/gart.h | 5 +----
arch/x86/kernel/aperture_64.c | 2 ++
arch/x86/kernel/pci-dma.c | 2 --
arch/x86/kernel/pci-gart_64.c | 15 +++++----------
4 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/gart.h b/arch/x86/include/asm/gart.h
index 4fdd5b3..4ac5b0f 100644
--- a/arch/x86/include/asm/gart.h
+++ b/arch/x86/include/asm/gart.h
@@ -35,7 +35,7 @@ extern int gart_iommu_aperture_allowed;
extern int gart_iommu_aperture_disabled;

extern void early_gart_iommu_check(void);
-extern void gart_iommu_init(void);
+extern int gart_iommu_init(void);
extern void __init gart_parse_options(char *);
extern void gart_iommu_hole_init(void);

@@ -47,9 +47,6 @@ extern void gart_iommu_hole_init(void);
static inline void early_gart_iommu_check(void)
{
}
-static inline void gart_iommu_init(void)
-{
-}
static inline void gart_parse_options(char *options)
{
}
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 128111d..03933cf 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -28,6 +28,7 @@
#include <asm/pci-direct.h>
#include <asm/dma.h>
#include <asm/k8.h>
+#include <asm/x86_init.h>

int gart_iommu_aperture;
int gart_iommu_aperture_disabled __initdata;
@@ -400,6 +401,7 @@ void __init gart_iommu_hole_init(void)

iommu_detected = 1;
gart_iommu_aperture = 1;
+ x86_init.iommu.iommu_init = gart_iommu_init;

aper_order = (read_pci_config(bus, slot, 3, AMD64_GARTAPERTURECTL) >> 1) & 7;
aper_size = (32 * 1024 * 1024) << aper_order;
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 0224da8..ecde854 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -298,8 +298,6 @@ static int __init pci_iommu_init(void)

amd_iommu_init();

- gart_iommu_init();
-
no_iommu_init();
return 0;
}
diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
index eb46ab3..0410bd3 100644
--- a/arch/x86/kernel/pci-gart_64.c
+++ b/arch/x86/kernel/pci-gart_64.c
@@ -709,7 +709,7 @@ static void gart_iommu_shutdown(void)
}
}

-void __init gart_iommu_init(void)
+int __init gart_iommu_init(void)
{
struct agp_kern_info info;
unsigned long iommu_start;
@@ -719,7 +719,7 @@ void __init gart_iommu_init(void)
long i;

if (cache_k8_northbridges() < 0 || num_k8_northbridges == 0)
- return;
+ return 0;

#ifndef CONFIG_AGP_AMD64
no_agp = 1;
@@ -731,13 +731,6 @@ void __init gart_iommu_init(void)
(agp_copy_info(agp_bridge, &info) < 0);
#endif

- if (swiotlb)
- return;
-
- /* Did we detect a different HW IOMMU? */
- if (iommu_detected && !gart_iommu_aperture)
- return;
-
if (no_iommu ||
(!force_iommu && max_pfn <= MAX_DMA32_PFN) ||
!gart_iommu_aperture ||
@@ -747,7 +740,7 @@ void __init gart_iommu_init(void)
"but GART IOMMU not available.\n");
printk(KERN_WARNING "falling back to iommu=soft.\n");
}
- return;
+ return 0;
}

/* need to map that range */
@@ -840,6 +833,8 @@ void __init gart_iommu_init(void)
flush_gart();
dma_ops = &gart_dma_ops;
x86_platform.iommu_shutdown = gart_iommu_shutdown;
+
+ return 0;
}

void __init gart_parse_options(char *p)
--
1.5.6.5

2009-11-10 10:49:58

by FUJITA Tomonori

[permalink] [raw]
Subject: [PATCH -v2 4/9] amd_iommu: convert amd_iommu_detect to use iommu_init hook

This changes amd_iommu_detect() to set amd_iommu_init to
iommu_init hook if amd_iommu_detect() finds the AMD IOMMU.

We can kill the code to check if we found the IOMMU in
amd_iommu_init() since amd_iommu_detect() sets amd_iommu_init() only
when it found the IOMMU.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
arch/x86/include/asm/amd_iommu.h | 2 --
arch/x86/kernel/amd_iommu_init.c | 17 +++--------------
arch/x86/kernel/pci-dma.c | 2 --
3 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/amd_iommu.h b/arch/x86/include/asm/amd_iommu.h
index 3604669..b8ef2ee 100644
--- a/arch/x86/include/asm/amd_iommu.h
+++ b/arch/x86/include/asm/amd_iommu.h
@@ -23,7 +23,6 @@
#include <linux/irqreturn.h>

#ifdef CONFIG_AMD_IOMMU
-extern int amd_iommu_init(void);
extern int amd_iommu_init_dma_ops(void);
extern int amd_iommu_init_passthrough(void);
extern void amd_iommu_detect(void);
@@ -32,7 +31,6 @@ extern void amd_iommu_flush_all_domains(void);
extern void amd_iommu_flush_all_devices(void);
extern void amd_iommu_apply_erratum_63(u16 devid);
#else
-static inline int amd_iommu_init(void) { return -ENODEV; }
static inline void amd_iommu_detect(void) { }
#endif

diff --git a/arch/x86/kernel/amd_iommu_init.c b/arch/x86/kernel/amd_iommu_init.c
index 6acd43e..c41aabd 100644
--- a/arch/x86/kernel/amd_iommu_init.c
+++ b/arch/x86/kernel/amd_iommu_init.c
@@ -29,6 +29,7 @@
#include <asm/amd_iommu.h>
#include <asm/iommu.h>
#include <asm/gart.h>
+#include <asm/x86_init.h>

/*
* definitions for the ACPI scanning code
@@ -1176,19 +1177,10 @@ static struct sys_device device_amd_iommu = {
* functions. Finally it prints some information about AMD IOMMUs and
* the driver state and enables the hardware.
*/
-int __init amd_iommu_init(void)
+static int __init amd_iommu_init(void)
{
int i, ret = 0;

-
- if (no_iommu) {
- printk(KERN_INFO "AMD-Vi disabled by kernel command line\n");
- return 0;
- }
-
- if (!amd_iommu_detected)
- return -ENODEV;
-
/*
* First parse ACPI tables to find the largest Bus/Dev/Func
* we need to handle. Upon this information the shared data
@@ -1344,10 +1336,7 @@ void __init amd_iommu_detect(void)
if (acpi_table_parse("IVRS", early_amd_iommu_detect) == 0) {
iommu_detected = 1;
amd_iommu_detected = 1;
-#ifdef CONFIG_GART_IOMMU
- gart_iommu_aperture_disabled = 1;
- gart_iommu_aperture = 0;
-#endif
+ x86_init.iommu.iommu_init = amd_iommu_init;
}
}

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index ecde854..5ca44a9 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -296,8 +296,6 @@ static int __init pci_iommu_init(void)

intel_iommu_init();

- amd_iommu_init();
-
no_iommu_init();
return 0;
}
--
1.5.6.5

2009-11-10 10:51:46

by FUJITA Tomonori

[permalink] [raw]
Subject: [PATCH -v2 5/9] intel-iommu: convert detect_intel_iommu to use iommu_init hook

This changes detect_intel_iommu() to set intel_iommu_init() to
iommu_init hook if detect_intel_iommu() finds the IOMMU.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
arch/x86/kernel/pci-dma.c | 2 --
drivers/pci/dmar.c | 4 ++++
include/linux/dmar.h | 10 ----------
3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 5ca44a9..bed05e2 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -294,8 +294,6 @@ static int __init pci_iommu_init(void)

x86_init.iommu.iommu_init();

- intel_iommu_init();
-
no_iommu_init();
return 0;
}
diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index 22b02c6..bce9cd7 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -617,6 +617,10 @@ void __init detect_intel_iommu(void)
!dmar_disabled)
iommu_detected = 1;
#endif
+#ifdef CONFIG_X86
+ if (ret)
+ x86_init.iommu.iommu_init = intel_iommu_init;
+#endif
}
early_acpi_os_unmap_memory(dmar_tbl, dmar_tbl_size);
dmar_tbl = NULL;
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index a05cd1c..d814d7d 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -218,16 +218,6 @@ struct dmar_atsr_unit {
u8 include_all:1; /* include all ports */
};

-/* Intel DMAR initialization functions */
extern int intel_iommu_init(void);
-#else
-static inline int intel_iommu_init(void)
-{
-#ifdef CONFIG_INTR_REMAP
- return dmar_dev_scope_init();
-#else
- return -ENODEV;
#endif
-}
-#endif /* !CONFIG_DMAR */
#endif /* __DMAR_H__ */
--
1.5.6.5

2009-11-10 10:50:53

by FUJITA Tomonori

[permalink] [raw]
Subject: [PATCH -v2 6/9] bootmem: add free_bootmem_late

Add a new function for freeing bootmem after the bootmem allocator has
been released and the unreserved pages given to the page allocator.
This allows us to reserve bootmem and then release it if we later
discover it was not needed.

Signed-off-by: FUJITA Tomonori <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Acked-by: Pekka Enberg <[email protected]>
---
include/linux/bootmem.h | 1 +
mm/bootmem.c | 24 ++++++++++++++++++++++++
2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index dd97fb8..b10ec49 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -53,6 +53,7 @@ extern void free_bootmem_node(pg_data_t *pgdat,
unsigned long addr,
unsigned long size);
extern void free_bootmem(unsigned long addr, unsigned long size);
+extern void free_bootmem_late(unsigned long addr, unsigned long size);

/*
* Flags for reserve_bootmem (also if CONFIG_HAVE_ARCH_BOOTMEM_NODE,
diff --git a/mm/bootmem.c b/mm/bootmem.c
index ca92991..30f1702 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -143,6 +143,30 @@ unsigned long __init init_bootmem(unsigned long start, unsigned long pages)
return init_bootmem_core(NODE_DATA(0)->bdata, start, 0, pages);
}

+/*
+ * free_bootmem_late - free bootmem pages directly to page allocator
+ * @addr: starting address of the range
+ * @size: size of the range in bytes
+ *
+ * This is only useful when the bootmem allocator has already been torn
+ * down, but we are still initializing the system. Pages are given directly
+ * to the page allocator, no bootmem metadata is updated because it is gone.
+ */
+void __init free_bootmem_late(unsigned long addr, unsigned long size)
+{
+ unsigned long cursor, end;
+
+ kmemleak_free_part(__va(addr), size);
+
+ cursor = PFN_UP(addr);
+ end = PFN_DOWN(addr + size);
+
+ for (; cursor < end; cursor++) {
+ __free_pages_bootmem(pfn_to_page(cursor), 0);
+ totalram_pages++;
+ }
+}
+
static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
{
int aligned;
--
1.5.6.5

2009-11-10 10:50:46

by FUJITA Tomonori

[permalink] [raw]
Subject: [PATCH -v2 7/9] swiotlb: add swiotlb_free function

swiotlb_free function frees all allocated memory for swiotlb.

We need to initialize swiotlb before IOMMU initialization (x86 and
powerpc needs to allocate memory from bootmem allocator). If IOMMU
initialization is successful, we need to free swiotlb resource (don't
want to waste 64MB).

Signed-off-by: FUJITA Tomonori <[email protected]>
---
include/linux/swiotlb.h | 1 +
lib/swiotlb.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 73b1f1c..51b6cc2 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -88,4 +88,5 @@ swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr);
extern int
swiotlb_dma_supported(struct device *hwdev, u64 mask);

+extern void __init swiotlb_free(void);
#endif /* __LINUX_SWIOTLB_H */
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index ac25cd2..eee512b 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -97,6 +97,8 @@ static phys_addr_t *io_tlb_orig_addr;
*/
static DEFINE_SPINLOCK(io_tlb_lock);

+static int late_alloc;
+
static int __init
setup_io_tlb_npages(char *str)
{
@@ -262,6 +264,8 @@ swiotlb_late_init_with_default_size(size_t default_size)

swiotlb_print_info(bytes);

+ late_alloc = 1;
+
return 0;

cleanup4:
@@ -281,6 +285,32 @@ cleanup1:
return -ENOMEM;
}

+void __init swiotlb_free(void)
+{
+ if (!io_tlb_overflow_buffer)
+ return;
+
+ if (late_alloc) {
+ free_pages((unsigned long)io_tlb_overflow_buffer,
+ get_order(io_tlb_overflow));
+ free_pages((unsigned long)io_tlb_orig_addr,
+ get_order(io_tlb_nslabs * sizeof(phys_addr_t)));
+ free_pages((unsigned long)io_tlb_list, get_order(io_tlb_nslabs *
+ sizeof(int)));
+ free_pages((unsigned long)io_tlb_start,
+ get_order(io_tlb_nslabs << IO_TLB_SHIFT));
+ } else {
+ free_bootmem_late(__pa(io_tlb_overflow_buffer),
+ io_tlb_overflow);
+ free_bootmem_late(__pa(io_tlb_orig_addr),
+ io_tlb_nslabs * sizeof(phys_addr_t));
+ free_bootmem_late(__pa(io_tlb_list),
+ io_tlb_nslabs * sizeof(int));
+ free_bootmem_late(__pa(io_tlb_start),
+ io_tlb_nslabs << IO_TLB_SHIFT);
+ }
+}
+
static int is_swiotlb_buffer(phys_addr_t paddr)
{
return paddr >= virt_to_phys(io_tlb_start) &&
--
1.5.6.5

2009-11-10 10:50:14

by FUJITA Tomonori

[permalink] [raw]
Subject: [PATCH -v2 8/9] swiotlb: export swiotlb_print_info

This enables us to avoid printing swiotlb memory info when we
initialize swiotlb. After swiotlb initialization, we could find that
we dont' need swiotlb.

This patch removes the code to print swiotlb memory info in
swiotlb_init() and exports the function to do that.

Cc: [email protected]
Cc: [email protected]
Signed-off-by: FUJITA Tomonori <[email protected]>
---
arch/ia64/kernel/pci-swiotlb.c | 4 ++--
arch/powerpc/kernel/setup_32.c | 2 +-
arch/powerpc/kernel/setup_64.c | 2 +-
arch/x86/kernel/pci-swiotlb.c | 3 +--
include/linux/swiotlb.h | 4 ++--
lib/swiotlb.c | 15 ++++++++-------
6 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/ia64/kernel/pci-swiotlb.c b/arch/ia64/kernel/pci-swiotlb.c
index 285aae8..53292ab 100644
--- a/arch/ia64/kernel/pci-swiotlb.c
+++ b/arch/ia64/kernel/pci-swiotlb.c
@@ -41,7 +41,7 @@ struct dma_map_ops swiotlb_dma_ops = {
void __init swiotlb_dma_init(void)
{
dma_ops = &swiotlb_dma_ops;
- swiotlb_init();
+ swiotlb_init(1);
}

void __init pci_swiotlb_init(void)
@@ -51,7 +51,7 @@ void __init pci_swiotlb_init(void)
swiotlb = 1;
printk(KERN_INFO "PCI-DMA: Re-initialize machine vector.\n");
machvec_init("dig");
- swiotlb_init();
+ swiotlb_init(1);
dma_ops = &swiotlb_dma_ops;
#else
panic("Unable to find Intel IOMMU");
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 53bcf3d..b152de3 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -345,7 +345,7 @@ void __init setup_arch(char **cmdline_p)

#ifdef CONFIG_SWIOTLB
if (ppc_swiotlb_enable)
- swiotlb_init();
+ swiotlb_init(1);
#endif

paging_init();
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 04f638d..df2c9e9 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -550,7 +550,7 @@ void __init setup_arch(char **cmdline_p)

#ifdef CONFIG_SWIOTLB
if (ppc_swiotlb_enable)
- swiotlb_init();
+ swiotlb_init(1);
#endif

paging_init();
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index aaa6b78..ea20ef7 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -52,8 +52,7 @@ void __init pci_swiotlb_init(void)
if (swiotlb_force)
swiotlb = 1;
if (swiotlb) {
- printk(KERN_INFO "PCI-DMA: Using software bounce buffering for IO (SWIOTLB)\n");
- swiotlb_init();
+ swiotlb_init(0);
dma_ops = &swiotlb_dma_ops;
}
}
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 51b6cc2..6b70068 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -20,8 +20,7 @@ struct scatterlist;
*/
#define IO_TLB_SHIFT 11

-extern void
-swiotlb_init(void);
+extern void swiotlb_init(int verbose);

extern void
*swiotlb_alloc_coherent(struct device *hwdev, size_t size,
@@ -89,4 +88,5 @@ extern int
swiotlb_dma_supported(struct device *hwdev, u64 mask);

extern void __init swiotlb_free(void);
+extern void swiotlb_print_info(void);
#endif /* __LINUX_SWIOTLB_H */
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index eee512b..0c12d7c 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -123,8 +123,9 @@ static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
return phys_to_dma(hwdev, virt_to_phys(address));
}

-static void swiotlb_print_info(unsigned long bytes)
+void swiotlb_print_info(void)
{
+ unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
phys_addr_t pstart, pend;

pstart = virt_to_phys(io_tlb_start);
@@ -142,7 +143,7 @@ static void swiotlb_print_info(unsigned long bytes)
* structures for the software IO TLB used to implement the DMA API.
*/
void __init
-swiotlb_init_with_default_size(size_t default_size)
+swiotlb_init_with_default_size(size_t default_size, int verbose)
{
unsigned long i, bytes;

@@ -178,14 +179,14 @@ swiotlb_init_with_default_size(size_t default_size)
io_tlb_overflow_buffer = alloc_bootmem_low(io_tlb_overflow);
if (!io_tlb_overflow_buffer)
panic("Cannot allocate SWIOTLB overflow buffer!\n");
-
- swiotlb_print_info(bytes);
+ if (verbose)
+ swiotlb_print_info();
}

void __init
-swiotlb_init(void)
+swiotlb_init(int verbose)
{
- swiotlb_init_with_default_size(64 * (1<<20)); /* default to 64MB */
+ swiotlb_init_with_default_size(64 * (1<<20), verbose); /* default to 64MB */
}

/*
@@ -262,7 +263,7 @@ swiotlb_late_init_with_default_size(size_t default_size)
if (!io_tlb_overflow_buffer)
goto cleanup4;

- swiotlb_print_info(bytes);
+ swiotlb_print_info();

late_alloc = 1;

--
1.5.6.5

2009-11-10 10:50:08

by FUJITA Tomonori

[permalink] [raw]
Subject: [PATCH -v2 9/9] x86: handle HW IOMMU initialization failure gracely

If HW IOMMU initialization fails (Intel VT-d often does typically due
to BIOS bugs), we fall back to nommu. It doesn't work for the majority
since nowadays we have more than 4GB memory so we must use swiotlb
instead of nommu.

The problem is that it's too late to initialize swiotlb when HW IOMMU
initialization fails. We need to allocate swiotlb memory earlier from
bootmem allocator. Chris explained the issue in detail:

http://marc.info/?l=linux-kernel&m=125657444317079&w=2

The current x86 IOMMU initialization sequence is too complicated and
handling the above issue makes it more hacky.

This patch changes x86 IOMMU initialization sequence to handle the
above issue cleanly.

The new x86 IOMMU initialization sequence are:

1. initializing swiotlb (and setting swiotlb to 1) in the case of
(max_pfn > MAX_DMA32_PFN && !no_iommu). dma_ops is set to
swiotlb_dma_ops or nommu_dma_ops. if swiotlb usage is forced by the
boot option, we finish here.

2. calling the detection functions of all the IOMMUs

3. the detection function sets x86_init.iommu.iommu_init to the IOMMU
initialization function (so we can avoid calling the initialization
functions of all the IOMMUs needlessly).

4. if the IOMMU initialization function doesn't need to swiotlb then
sets swiotlb to zero (e.g. the initialization is sucessful).

5. if we find that swiotlb is set to zero, we free swiotlb resource.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
arch/x86/include/asm/iommu.h | 1 -
arch/x86/kernel/amd_iommu.c | 2 +-
arch/x86/kernel/amd_iommu_init.c | 2 +-
arch/x86/kernel/aperture_64.c | 2 +-
arch/x86/kernel/pci-calgary_64.c | 10 +---------
arch/x86/kernel/pci-dma.c | 21 +++++++++++++--------
arch/x86/kernel/pci-gart_64.c | 1 +
arch/x86/kernel/pci-nommu.c | 9 ---------
arch/x86/kernel/pci-swiotlb.c | 7 +++----
drivers/pci/dmar.c | 3 +--
drivers/pci/intel-iommu.c | 4 ++--
lib/swiotlb.c | 4 +++-
12 files changed, 27 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index 878b307..df42a71 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -2,7 +2,6 @@
#define _ASM_X86_IOMMU_H

static inline void iommu_shutdown_noop(void) {}
-extern void no_iommu_init(void);
extern struct dma_map_ops nommu_dma_ops;
extern int force_iommu, no_iommu;
extern int iommu_detected;
diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c
index 0285521..66237fd 100644
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -2110,8 +2110,8 @@ int __init amd_iommu_init_dma_ops(void)
prealloc_protection_domains();

iommu_detected = 1;
- force_iommu = 1;
bad_dma_address = 0;
+ swiotlb = 0;
#ifdef CONFIG_GART_IOMMU
gart_iommu_aperture_disabled = 1;
gart_iommu_aperture = 0;
diff --git a/arch/x86/kernel/amd_iommu_init.c b/arch/x86/kernel/amd_iommu_init.c
index c41aabd..0d4581e 100644
--- a/arch/x86/kernel/amd_iommu_init.c
+++ b/arch/x86/kernel/amd_iommu_init.c
@@ -1330,7 +1330,7 @@ static int __init early_amd_iommu_detect(struct acpi_table_header *table)

void __init amd_iommu_detect(void)
{
- if (swiotlb || no_iommu || (iommu_detected && !gart_iommu_aperture))
+ if (no_iommu || (iommu_detected && !gart_iommu_aperture))
return;

if (acpi_table_parse("IVRS", early_amd_iommu_detect) == 0) {
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 03933cf..e0dfb68 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -458,7 +458,7 @@ out:

if (aper_alloc) {
/* Got the aperture from the AGP bridge */
- } else if (swiotlb && !valid_agp) {
+ } else if (!valid_agp) {
/* Do nothing */
} else if ((!no_iommu && max_pfn > MAX_DMA32_PFN) ||
force_iommu ||
diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
index 47bd419..833f491 100644
--- a/arch/x86/kernel/pci-calgary_64.c
+++ b/arch/x86/kernel/pci-calgary_64.c
@@ -1360,7 +1360,7 @@ void __init detect_calgary(void)
* if the user specified iommu=off or iommu=soft or we found
* another HW IOMMU already, bail out.
*/
- if (swiotlb || no_iommu || iommu_detected)
+ if (no_iommu || iommu_detected)
return;

if (!use_calgary)
@@ -1445,10 +1445,6 @@ void __init detect_calgary(void)
printk(KERN_INFO "PCI-DMA: Calgary TCE table spec is %d\n",
specified_table_size);

- /* swiotlb for devices that aren't behind the Calgary. */
- if (max_pfn > MAX_DMA32_PFN)
- swiotlb = 1;
-
x86_init.iommu.iommu_init = calgary_iommu_init;
}
return;
@@ -1476,11 +1472,7 @@ int __init calgary_iommu_init(void)
return ret;
}

- force_iommu = 1;
bad_dma_address = 0x0;
- /* dma_ops is set to swiotlb or nommu */
- if (!dma_ops)
- dma_ops = &nommu_dma_ops;

return 0;
}
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index bed05e2..a234e63 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -124,24 +124,24 @@ static void __init dma32_free_bootmem(void)

void __init pci_iommu_alloc(void)
{
+ /* swiotlb is forced by the boot option */
+ int use_swiotlb = swiotlb;
#ifdef CONFIG_X86_64
/* free the range so iommu could get some range less than 4G */
dma32_free_bootmem();
#endif
+ pci_swiotlb_init();
+ if (use_swiotlb)
+ return;

- /*
- * The order of these functions is important for
- * fall-back/fail-over reasons
- */
gart_iommu_hole_init();

detect_calgary();

detect_intel_iommu();

+ /* needs to be called after gart_iommu_hole_init */
amd_iommu_detect();
-
- pci_swiotlb_init();
}

void *dma_generic_alloc_coherent(struct device *dev, size_t size,
@@ -291,10 +291,15 @@ static int __init pci_iommu_init(void)
#ifdef CONFIG_PCI
dma_debug_add_bus(&pci_bus_type);
#endif
-
x86_init.iommu.iommu_init();

- no_iommu_init();
+ if (swiotlb) {
+ printk(KERN_INFO "PCI-DMA: "
+ "Using software bounce buffering for IO (SWIOTLB)\n");
+ swiotlb_print_info();
+ } else
+ swiotlb_free();
+
return 0;
}
/* Must execute after PCI subsystem */
diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
index 0410bd3..919182e 100644
--- a/arch/x86/kernel/pci-gart_64.c
+++ b/arch/x86/kernel/pci-gart_64.c
@@ -833,6 +833,7 @@ int __init gart_iommu_init(void)
flush_gart();
dma_ops = &gart_dma_ops;
x86_platform.iommu_shutdown = gart_iommu_shutdown;
+ swiotlb = 0;

return 0;
}
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index a3933d4..875e382 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -103,12 +103,3 @@ struct dma_map_ops nommu_dma_ops = {
.sync_sg_for_device = nommu_sync_sg_for_device,
.is_phys = 1,
};
-
-void __init no_iommu_init(void)
-{
- if (dma_ops)
- return;
-
- force_iommu = 0; /* no HW IOMMU */
- dma_ops = &nommu_dma_ops;
-}
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index ea20ef7..17ce422 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -46,13 +46,12 @@ void __init pci_swiotlb_init(void)
{
/* don't initialize swiotlb if iommu=off (no_iommu=1) */
#ifdef CONFIG_X86_64
- if ((!iommu_detected && !no_iommu && max_pfn > MAX_DMA32_PFN))
+ if (!no_iommu && max_pfn > MAX_DMA32_PFN)
swiotlb = 1;
#endif
- if (swiotlb_force)
- swiotlb = 1;
if (swiotlb) {
swiotlb_init(0);
dma_ops = &swiotlb_dma_ops;
- }
+ } else
+ dma_ops = &nommu_dma_ops;
}
diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index bce9cd7..4373996 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -613,8 +613,7 @@ void __init detect_intel_iommu(void)
"x2apic and Intr-remapping.\n");
#endif
#ifdef CONFIG_DMAR
- if (ret && !no_iommu && !iommu_detected && !swiotlb &&
- !dmar_disabled)
+ if (ret && !no_iommu && !iommu_detected && !dmar_disabled)
iommu_detected = 1;
#endif
#ifdef CONFIG_X86
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index b1e97e6..c7a8bcf 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -3231,7 +3231,7 @@ int __init intel_iommu_init(void)
* Check the need for DMA-remapping initialization now.
* Above initialization will also be used by Interrupt-remapping.
*/
- if (no_iommu || swiotlb || dmar_disabled)
+ if (no_iommu || dmar_disabled)
return -ENODEV;

iommu_init_mempool();
@@ -3252,7 +3252,7 @@ int __init intel_iommu_init(void)
"PCI-DMA: Intel(R) Virtualization Technology for Directed I/O\n");

init_timer(&unmap_timer);
- force_iommu = 1;
+ swiotlb = 0;
dma_ops = &intel_dma_ops;

init_iommu_sysfs();
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 0c12d7c..e6755a0 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -109,8 +109,10 @@ setup_io_tlb_npages(char *str)
}
if (*str == ',')
++str;
- if (!strcmp(str, "force"))
+ if (!strcmp(str, "force")) {
swiotlb_force = 1;
+ swiotlb = 1;
+ }
return 1;
}
__setup("swiotlb=", setup_io_tlb_npages);
--
1.5.6.5

2009-11-10 11:12:31

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -v2 5/9] intel-iommu: convert detect_intel_iommu to use iommu_init hook


* FUJITA Tomonori <[email protected]> wrote:

> This changes detect_intel_iommu() to set intel_iommu_init() to
> iommu_init hook if detect_intel_iommu() finds the IOMMU.
>
> Signed-off-by: FUJITA Tomonori <[email protected]>
> ---
> arch/x86/kernel/pci-dma.c | 2 --
> drivers/pci/dmar.c | 4 ++++
> include/linux/dmar.h | 10 ----------
> 3 files changed, 4 insertions(+), 12 deletions(-)

FYI, the !CONFIG_DMAR case needed the fix below. (Wrt. the -ENODEV: it
doesnt matter right now as we dont check the result of ->iommu_init(),
but i kept it consistent with device initialization principles.)

Ingo

diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index d814d7d..df0cfb3 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -219,5 +219,8 @@ struct dmar_atsr_unit {
};

extern int intel_iommu_init(void);
-#endif
+#else /* !CONFIG_DMAR: */
+static inline int intel_iommu_init(void) { return -ENODEV; }
+#endif /* CONFIG_DMAR */
+
#endif /* __DMAR_H__ */

2009-11-10 11:19:25

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2 0/9] x86: handle HW IOMMU initialization failure gracefully


* FUJITA Tomonori <[email protected]> wrote:

> This patchset is against tip/master.
>
> The first version is:
>
> http://marc.info/?l=linux-kernel&m=125671300920411&w=2
>
> The changes since v1 are:
>
> - replaced Chris' bootmem patches with the 6/9 patch to implement
> free_bootmem_late in a simple way (thanks to Pekka).
>
> - fixed the bug to break 'iommu=soft' boot opiton (found by Joerg).
>
> - moved iommu_init_noop() to x86_init.c
>
> - added Muli's Acked-by to Calgary patch.
>
>
> ==
> arch/ia64/kernel/pci-swiotlb.c | 4 +-
> arch/powerpc/kernel/setup_32.c | 2 +-
> arch/powerpc/kernel/setup_64.c | 2 +-
> arch/x86/include/asm/amd_iommu.h | 2 -
> arch/x86/include/asm/calgary.h | 2 -
> arch/x86/include/asm/gart.h | 5 +---
> arch/x86/include/asm/iommu.h | 1 -
> arch/x86/include/asm/x86_init.h | 9 +++++++
> arch/x86/kernel/amd_iommu.c | 2 +-
> arch/x86/kernel/amd_iommu_init.c | 19 +++-----------
> arch/x86/kernel/aperture_64.c | 4 ++-
> arch/x86/kernel/pci-calgary_64.c | 19 ++++-----------
> arch/x86/kernel/pci-dma.c | 27 ++++++++++-----------
> arch/x86/kernel/pci-gart_64.c | 16 ++++-------
> arch/x86/kernel/pci-nommu.c | 9 -------
> arch/x86/kernel/pci-swiotlb.c | 10 +++----
> arch/x86/kernel/x86_init.c | 5 ++++
> drivers/pci/dmar.c | 7 ++++-
> drivers/pci/intel-iommu.c | 4 +-
> include/linux/bootmem.h | 1 +
> include/linux/dmar.h | 10 -------
> include/linux/swiotlb.h | 5 ++-
> lib/swiotlb.c | 49 +++++++++++++++++++++++++++++++------
> mm/bootmem.c | 24 ++++++++++++++++++
> 24 files changed, 131 insertions(+), 107 deletions(-)

Nice changes! I've applied them to tip:core/iommu (with the small build
fix i mentioned in the previous mail) and will push them out later
today.

Thanks,

Ingo

2009-11-10 11:27:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -v2 9/9] x86: handle HW IOMMU initialization failure gracely


this patch needed the fixlet below. (it could be cleaned up later
perhaps to not need the #ifdef - but wanted to keep it simple for now.)

Ingo

------------->
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index c7a8bcf..43d755a 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -3252,7 +3252,9 @@ int __init intel_iommu_init(void)
"PCI-DMA: Intel(R) Virtualization Technology for Directed I/O\n");

init_timer(&unmap_timer);
+#ifdef CONFIG_SWIOTLB
swiotlb = 0;
+#endif
dma_ops = &intel_dma_ops;

init_iommu_sysfs();

2009-11-10 11:28:07

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -v2 7/9] swiotlb: add swiotlb_free function


this patch needed the fix below, for the !CONFIG_SWIOTLB case.

Ingo

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 6b70068..eb9bdb4 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -87,6 +87,11 @@ swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr);
extern int
swiotlb_dma_supported(struct device *hwdev, u64 mask);

+#ifdef CONFIG_SWIOTLB
extern void __init swiotlb_free(void);
+#else
+static inline void swiotlb_free(void) { }
+#endif
+
extern void swiotlb_print_info(void);
#endif /* __LINUX_SWIOTLB_H */

2009-11-10 11:56:24

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2 0/9] x86: handle HW IOMMU initialization failure gracefully


hm, one of the swiotlb patches caused this bootup crash:

[ 8.645490] initcall ali_init+0x0/0x40 returned 0 after 61 usecs
[ 8.651501] calling amd_init+0x0/0x16 @ 1
[ 8.655620] pata_amd 0000:00:06.0: version 0.4.1
[ 8.660286] BUG: unable to handle kernel NULL pointer dereference at 00000034
[ 8.663572] IP: [<c100617b>] dma_supported+0x3b/0xa4
[ 8.663572] *pde = 00000000
[ 8.663572] Oops: 0000 [#1] SMP
[ 8.663572] last sysfs file:
[ 8.663572]
[ 8.663572] Pid: 1, comm: swapper Not tainted (2.6.32-rc6-tip-02065-gadf90fa-dirty #344) System Product Name
[ 8.663572] EIP: 0060:[<c100617b>] EFLAGS: 00010246 CPU: 0
[ 8.663572] EIP is at dma_supported+0x3b/0xa4
[ 8.663572] EAX: f6ce4058 EBX: 00000000 ECX: 00000000 EDX: ffffffff
[ 8.663572] ESI: 00000000 EDI: f6ce4000 EBP: f6cbbe1c ESP: f6cbbe10
[ 8.663572] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 8.663572] Process swapper (pid: 1, ti=f6cba000 task=f6cc0000 task.ti=f6cba000)
[ 8.663572] Stack:
[ 8.663572] c158154e f608c000 ffffffff f6cbbe30 c13e1023 f609ab8c f6ce4058 00000000
[ 8.663572] <0> f6cbbe60 c1581aa7 00000002 f609ab8c 00000003 00000000 f6ce4000 00000010
[ 8.663572] <0> f6ce4000 00000000 f609ab8c f6ce4000 f6cbbe7c c1581c20 f6cbbe8c f6ce4058
[ 8.663572] Call Trace:
[ 8.663572] [<c158154e>] ? ata_port_desc+0x4e/0x53
[ 8.663572] [<c13e1023>] ? pci_set_dma_mask+0x23/0x3b
[ 8.663572] [<c1581aa7>] ? ata_pci_bmdma_init+0x32/0x133
[ 8.663572] [<c1581c20>] ? ata_pci_sff_prepare_host+0x78/0xaf
[ 8.663572] [<c1581cd1>] ? ata_pci_sff_init_one+0x7a/0xe7
[ 8.663572] [<c1595897>] ? amd_init_one+0xec/0xf5
[ 8.663572] [<c13e2a24>] ? local_pci_probe+0x13/0x15
[ 8.663572] [<c13e31f0>] ? pci_device_probe+0x48/0x6b
[ 8.663572] [<c1519665>] ? really_probe+0x8e/0x165
[ 8.663572] [<c1519788>] ? driver_probe_device+0x4c/0x82
[ 8.663572] [<c1519806>] ? __driver_attach+0x48/0x64
[ 8.663572] [<c1518ef0>] ? bus_for_each_dev+0x42/0x6c
[ 8.663572] [<c1519491>] ? driver_attach+0x19/0x1b
[ 8.663572] [<c15197be>] ? __driver_attach+0x0/0x64
[ 8.663572] [<c15188b0>] ? bus_add_driver+0xfe/0x23a
[ 8.663572] [<c13ca19f>] ? kset_find_obj+0x23/0x4f
[ 8.663572] [<c1519a6b>] ? driver_register+0x7e/0xe5
[ 8.663572] [<c21e3c31>] ? amd_init+0x0/0x16
[ 8.663572] [<c13e33c7>] ? __pci_register_driver+0x3d/0x9a
[ 8.663572] [<c21e3c31>] ? amd_init+0x0/0x16
[ 8.663572] [<c21e3c45>] ? amd_init+0x14/0x16
[ 8.663572] [<c1001143>] ? do_one_initcall+0x51/0x136
[ 8.663572] [<c21b7324>] ? do_basic_setup+0x47/0x57
[ 8.663572] [<c21b73ae>] ? kernel_init+0x7a/0xbb
[ 8.663572] [<c21b7334>] ? kernel_init+0x0/0xbb
[ 8.663572] [<c1002e97>] ? kernel_thread_helper+0x7/0x10
[ 8.663572] Code: 2b c2 83 f9 00 76 24 83 3d f8 38 1b c2 00 7e 1b 8b 58 08 e8 a4 fe 50 00 53 50 68 2e 67 ee c1 e8 16 f9 ad 00 31 db 83 c4 0c eb 62 <8b> 5b 34 85 db 74 06 ff d3 89 c3 eb 55 83 f9 00 77 0a 31 db 81
[ 8.663572] EIP: [<c100617b>] dma_supported+0x3b/0xa4 SS:ESP 0068:f6cbbe10
[ 8.663572] CR2: 0000000000000034

config attached.

Ingo


Attachments:
(No filename) (3.06 kB)
config (65.44 kB)
Download all attachments

2009-11-10 12:05:52

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH -v2 6/9] bootmem: add free_bootmem_late

On Tue, Nov 10, 2009 at 07:46:17PM +0900, FUJITA Tomonori wrote:
> Add a new function for freeing bootmem after the bootmem allocator has
> been released and the unreserved pages given to the page allocator.
> This allows us to reserve bootmem and then release it if we later
> discover it was not needed.
>
> Signed-off-by: FUJITA Tomonori <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Acked-by: Pekka Enberg <[email protected]>
> ---
> include/linux/bootmem.h | 1 +
> mm/bootmem.c | 24 ++++++++++++++++++++++++
> 2 files changed, 25 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
> index dd97fb8..b10ec49 100644
> --- a/include/linux/bootmem.h
> +++ b/include/linux/bootmem.h
> @@ -53,6 +53,7 @@ extern void free_bootmem_node(pg_data_t *pgdat,
> unsigned long addr,
> unsigned long size);
> extern void free_bootmem(unsigned long addr, unsigned long size);
> +extern void free_bootmem_late(unsigned long addr, unsigned long size);
>
> /*
> * Flags for reserve_bootmem (also if CONFIG_HAVE_ARCH_BOOTMEM_NODE,
> diff --git a/mm/bootmem.c b/mm/bootmem.c
> index ca92991..30f1702 100644
> --- a/mm/bootmem.c
> +++ b/mm/bootmem.c
> @@ -143,6 +143,30 @@ unsigned long __init init_bootmem(unsigned long start, unsigned long pages)
> return init_bootmem_core(NODE_DATA(0)->bdata, start, 0, pages);
> }
>
> +/*
> + * free_bootmem_late - free bootmem pages directly to page allocator
> + * @addr: starting address of the range
> + * @size: size of the range in bytes
> + *
> + * This is only useful when the bootmem allocator has already been torn
> + * down, but we are still initializing the system. Pages are given directly
> + * to the page allocator, no bootmem metadata is updated because it is gone.
> + */
> +void __init free_bootmem_late(unsigned long addr, unsigned long size)
> +{
> + unsigned long cursor, end;
> +
> + kmemleak_free_part(__va(addr), size);
> +
> + cursor = PFN_UP(addr);
> + end = PFN_DOWN(addr + size);
> +
> + for (; cursor < end; cursor++) {
> + __free_pages_bootmem(pfn_to_page(cursor), 0);
> + totalram_pages++;
> + }
> +}

I find it a bit weird that free_all_bootmem() callers have to take
care of totalram_pages while this function does the accounting on its
own.

And I think the function is more logically placed right below
free_bootmem(), like you did in the header.

These are no show-stoppers for me, though, and otherwise the patch
looks simple and straight-forward. Feel free to add

Reviewed-by: Johannes Weiner <[email protected]>

Thanks!

2009-11-10 12:35:51

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH v2 0/9] x86: handle HW IOMMU initialization failure gracefully

On Tue, 10 Nov 2009 12:55:55 +0100
Ingo Molnar <[email protected]> wrote:

>
> hm, one of the swiotlb patches caused this bootup crash:

Sorry about that. I totally forgot about x86_32.

the following patch fixes this? Sorry that I'm away from the workplace
and can't test it.

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index a234e63..63eebee 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -129,6 +129,8 @@ void __init pci_iommu_alloc(void)
#ifdef CONFIG_X86_64
/* free the range so iommu could get some range less than 4G */
dma32_free_bootmem();
+#else
+ dma_ops = &nommu_dma_ops;
#endif
pci_swiotlb_init();
if (use_swiotlb)

2009-11-10 13:23:20

by FUJITA Tomonori

[permalink] [raw]
Subject: [tip:core/iommu] x86: Add iommu_init to x86_init_ops

Commit-ID: d07c1be0693e0902d743160b8b638585b808f8ac
Gitweb: http://git.kernel.org/tip/d07c1be0693e0902d743160b8b638585b808f8ac
Author: FUJITA Tomonori <[email protected]>
AuthorDate: Tue, 10 Nov 2009 19:46:12 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 10 Nov 2009 12:31:07 +0100

x86: Add iommu_init to x86_init_ops

We call the detections functions of all the IOMMUs then all
their initialization functions. The latter is pointless since we
don't detect multiple different IOMMUs. What we need to do is
calling the initialization function of the detected IOMMU.

This adds iommu_init hook to x86_init_ops so if an IOMMU
detection function can set its initialization function to the
hook.

Signed-off-by: FUJITA Tomonori <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/x86_init.h | 9 +++++++++
arch/x86/kernel/pci-dma.c | 2 ++
arch/x86/kernel/x86_init.c | 5 +++++
3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 66008ed..d8e7145 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -91,6 +91,14 @@ struct x86_init_timers {
};

/**
+ * struct x86_init_iommu - platform specific iommu setup
+ * @iommu_init: platform specific iommu setup
+ */
+struct x86_init_iommu {
+ int (*iommu_init)(void);
+};
+
+/**
* struct x86_init_ops - functions for platform specific setup
*
*/
@@ -101,6 +109,7 @@ struct x86_init_ops {
struct x86_init_oem oem;
struct x86_init_paging paging;
struct x86_init_timers timers;
+ struct x86_init_iommu iommu;
};

/**
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 839d49a..a13478d 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -292,6 +292,8 @@ static int __init pci_iommu_init(void)
dma_debug_add_bus(&pci_bus_type);
#endif

+ x86_init.iommu.iommu_init();
+
calgary_iommu_init();

intel_iommu_init();
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index bc9b230..c46984d 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -19,6 +19,7 @@
void __cpuinit x86_init_noop(void) { }
void __init x86_init_uint_noop(unsigned int unused) { }
void __init x86_init_pgd_noop(pgd_t *unused) { }
+int __init iommu_init_noop(void) { return 0; }

/*
* The platform setup functions are preset with the default functions
@@ -63,6 +64,10 @@ struct x86_init_ops x86_init __initdata = {
.tsc_pre_init = x86_init_noop,
.timer_init = hpet_time_init,
},
+
+ .iommu = {
+ .iommu_init = iommu_init_noop,
+ },
};

struct x86_cpuinit_ops x86_cpuinit __cpuinitdata = {

2009-11-10 13:23:27

by FUJITA Tomonori

[permalink] [raw]
Subject: [tip:core/iommu] x86: Calgary: Convert detect_calgary() to use iommu_init hook

Commit-ID: d7b9f7be216b04ff9d108f856bc03d96e7b3439c
Gitweb: http://git.kernel.org/tip/d7b9f7be216b04ff9d108f856bc03d96e7b3439c
Author: FUJITA Tomonori <[email protected]>
AuthorDate: Tue, 10 Nov 2009 19:46:13 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 10 Nov 2009 12:31:15 +0100

x86: Calgary: Convert detect_calgary() to use iommu_init hook

This changes detect_calgary() to set init_calgary() to
iommu_init hook if detect_calgary() finds the Calgary IOMMU.

We can kill the code to check if we found the IOMMU in
init_calgary() since detect_calgary() sets init_calgary() only
when it found the IOMMU.

Signed-off-by: FUJITA Tomonori <[email protected]>
Acked-by: Muli Ben-Yehuda <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/calgary.h | 2 --
arch/x86/kernel/pci-calgary_64.c | 11 +++++------
arch/x86/kernel/pci-dma.c | 2 --
3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/calgary.h b/arch/x86/include/asm/calgary.h
index b03bedb..0918654 100644
--- a/arch/x86/include/asm/calgary.h
+++ b/arch/x86/include/asm/calgary.h
@@ -62,10 +62,8 @@ struct cal_chipset_ops {
extern int use_calgary;

#ifdef CONFIG_CALGARY_IOMMU
-extern int calgary_iommu_init(void);
extern void detect_calgary(void);
#else
-static inline int calgary_iommu_init(void) { return 1; }
static inline void detect_calgary(void) { return; }
#endif

diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
index 971a3be..47bd419 100644
--- a/arch/x86/kernel/pci-calgary_64.c
+++ b/arch/x86/kernel/pci-calgary_64.c
@@ -46,6 +46,7 @@
#include <asm/dma.h>
#include <asm/rio.h>
#include <asm/bios_ebda.h>
+#include <asm/x86_init.h>

#ifdef CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT
int use_calgary __read_mostly = 1;
@@ -1344,6 +1345,8 @@ static void __init get_tce_space_from_tar(void)
return;
}

+int __init calgary_iommu_init(void);
+
void __init detect_calgary(void)
{
int bus;
@@ -1445,6 +1448,8 @@ void __init detect_calgary(void)
/* swiotlb for devices that aren't behind the Calgary. */
if (max_pfn > MAX_DMA32_PFN)
swiotlb = 1;
+
+ x86_init.iommu.iommu_init = calgary_iommu_init;
}
return;

@@ -1461,12 +1466,6 @@ int __init calgary_iommu_init(void)
{
int ret;

- if (no_iommu || (swiotlb && !calgary_detected))
- return -ENODEV;
-
- if (!calgary_detected)
- return -ENODEV;
-
/* ok, we're trying to use Calgary - let's roll */
printk(KERN_INFO "PCI-DMA: Using Calgary IOMMU\n");

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index a13478d..0224da8 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -294,8 +294,6 @@ static int __init pci_iommu_init(void)

x86_init.iommu.iommu_init();

- calgary_iommu_init();
-
intel_iommu_init();

amd_iommu_init();

2009-11-10 13:23:33

by FUJITA Tomonori

[permalink] [raw]
Subject: [tip:core/iommu] x86: GART: Convert gart_iommu_hole_init() to use iommu_init hook

Commit-ID: de957628ce7c84764ff41331111036b3ae5bad0f
Gitweb: http://git.kernel.org/tip/de957628ce7c84764ff41331111036b3ae5bad0f
Author: FUJITA Tomonori <[email protected]>
AuthorDate: Tue, 10 Nov 2009 19:46:14 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 10 Nov 2009 12:31:23 +0100

x86: GART: Convert gart_iommu_hole_init() to use iommu_init hook

This changes gart_iommu_hole_init() to set gart_iommu_init() to
iommu_init hook if gart_iommu_hole_init() finds the GART IOMMU.

We can kill the code to check if we found the IOMMU in
gart_iommu_init() since gart_iommu_hole_init() sets
gart_iommu_init() only when it found the IOMMU.

Signed-off-by: FUJITA Tomonori <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/gart.h | 5 +----
arch/x86/kernel/aperture_64.c | 2 ++
arch/x86/kernel/pci-dma.c | 2 --
arch/x86/kernel/pci-gart_64.c | 15 +++++----------
4 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/gart.h b/arch/x86/include/asm/gart.h
index 4fdd5b3..4ac5b0f 100644
--- a/arch/x86/include/asm/gart.h
+++ b/arch/x86/include/asm/gart.h
@@ -35,7 +35,7 @@ extern int gart_iommu_aperture_allowed;
extern int gart_iommu_aperture_disabled;

extern void early_gart_iommu_check(void);
-extern void gart_iommu_init(void);
+extern int gart_iommu_init(void);
extern void __init gart_parse_options(char *);
extern void gart_iommu_hole_init(void);

@@ -47,9 +47,6 @@ extern void gart_iommu_hole_init(void);
static inline void early_gart_iommu_check(void)
{
}
-static inline void gart_iommu_init(void)
-{
-}
static inline void gart_parse_options(char *options)
{
}
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 128111d..03933cf 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -28,6 +28,7 @@
#include <asm/pci-direct.h>
#include <asm/dma.h>
#include <asm/k8.h>
+#include <asm/x86_init.h>

int gart_iommu_aperture;
int gart_iommu_aperture_disabled __initdata;
@@ -400,6 +401,7 @@ void __init gart_iommu_hole_init(void)

iommu_detected = 1;
gart_iommu_aperture = 1;
+ x86_init.iommu.iommu_init = gart_iommu_init;

aper_order = (read_pci_config(bus, slot, 3, AMD64_GARTAPERTURECTL) >> 1) & 7;
aper_size = (32 * 1024 * 1024) << aper_order;
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 0224da8..ecde854 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -298,8 +298,6 @@ static int __init pci_iommu_init(void)

amd_iommu_init();

- gart_iommu_init();
-
no_iommu_init();
return 0;
}
diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
index eb46ab3..0410bd3 100644
--- a/arch/x86/kernel/pci-gart_64.c
+++ b/arch/x86/kernel/pci-gart_64.c
@@ -709,7 +709,7 @@ static void gart_iommu_shutdown(void)
}
}

-void __init gart_iommu_init(void)
+int __init gart_iommu_init(void)
{
struct agp_kern_info info;
unsigned long iommu_start;
@@ -719,7 +719,7 @@ void __init gart_iommu_init(void)
long i;

if (cache_k8_northbridges() < 0 || num_k8_northbridges == 0)
- return;
+ return 0;

#ifndef CONFIG_AGP_AMD64
no_agp = 1;
@@ -731,13 +731,6 @@ void __init gart_iommu_init(void)
(agp_copy_info(agp_bridge, &info) < 0);
#endif

- if (swiotlb)
- return;
-
- /* Did we detect a different HW IOMMU? */
- if (iommu_detected && !gart_iommu_aperture)
- return;
-
if (no_iommu ||
(!force_iommu && max_pfn <= MAX_DMA32_PFN) ||
!gart_iommu_aperture ||
@@ -747,7 +740,7 @@ void __init gart_iommu_init(void)
"but GART IOMMU not available.\n");
printk(KERN_WARNING "falling back to iommu=soft.\n");
}
- return;
+ return 0;
}

/* need to map that range */
@@ -840,6 +833,8 @@ void __init gart_iommu_init(void)
flush_gart();
dma_ops = &gart_dma_ops;
x86_platform.iommu_shutdown = gart_iommu_shutdown;
+
+ return 0;
}

void __init gart_parse_options(char *p)

2009-11-10 13:24:03

by FUJITA Tomonori

[permalink] [raw]
Subject: [tip:core/iommu] x86: amd_iommu: Convert amd_iommu_detect() to use iommu_init hook

Commit-ID: ea1b0d3945c7374849235b6ecaea1191ee1d9d50
Gitweb: http://git.kernel.org/tip/ea1b0d3945c7374849235b6ecaea1191ee1d9d50
Author: FUJITA Tomonori <[email protected]>
AuthorDate: Tue, 10 Nov 2009 19:46:15 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 10 Nov 2009 12:31:30 +0100

x86: amd_iommu: Convert amd_iommu_detect() to use iommu_init hook

This changes amd_iommu_detect() to set amd_iommu_init to
iommu_init hook if amd_iommu_detect() finds the AMD IOMMU.

We can kill the code to check if we found the IOMMU in
amd_iommu_init() since amd_iommu_detect() sets amd_iommu_init()
only when it found the IOMMU.

Signed-off-by: FUJITA Tomonori <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/amd_iommu.h | 2 --
arch/x86/kernel/amd_iommu_init.c | 17 +++--------------
arch/x86/kernel/pci-dma.c | 2 --
3 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/amd_iommu.h b/arch/x86/include/asm/amd_iommu.h
index 3604669..b8ef2ee 100644
--- a/arch/x86/include/asm/amd_iommu.h
+++ b/arch/x86/include/asm/amd_iommu.h
@@ -23,7 +23,6 @@
#include <linux/irqreturn.h>

#ifdef CONFIG_AMD_IOMMU
-extern int amd_iommu_init(void);
extern int amd_iommu_init_dma_ops(void);
extern int amd_iommu_init_passthrough(void);
extern void amd_iommu_detect(void);
@@ -32,7 +31,6 @@ extern void amd_iommu_flush_all_domains(void);
extern void amd_iommu_flush_all_devices(void);
extern void amd_iommu_apply_erratum_63(u16 devid);
#else
-static inline int amd_iommu_init(void) { return -ENODEV; }
static inline void amd_iommu_detect(void) { }
#endif

diff --git a/arch/x86/kernel/amd_iommu_init.c b/arch/x86/kernel/amd_iommu_init.c
index 6acd43e..c41aabd 100644
--- a/arch/x86/kernel/amd_iommu_init.c
+++ b/arch/x86/kernel/amd_iommu_init.c
@@ -29,6 +29,7 @@
#include <asm/amd_iommu.h>
#include <asm/iommu.h>
#include <asm/gart.h>
+#include <asm/x86_init.h>

/*
* definitions for the ACPI scanning code
@@ -1176,19 +1177,10 @@ static struct sys_device device_amd_iommu = {
* functions. Finally it prints some information about AMD IOMMUs and
* the driver state and enables the hardware.
*/
-int __init amd_iommu_init(void)
+static int __init amd_iommu_init(void)
{
int i, ret = 0;

-
- if (no_iommu) {
- printk(KERN_INFO "AMD-Vi disabled by kernel command line\n");
- return 0;
- }
-
- if (!amd_iommu_detected)
- return -ENODEV;
-
/*
* First parse ACPI tables to find the largest Bus/Dev/Func
* we need to handle. Upon this information the shared data
@@ -1344,10 +1336,7 @@ void __init amd_iommu_detect(void)
if (acpi_table_parse("IVRS", early_amd_iommu_detect) == 0) {
iommu_detected = 1;
amd_iommu_detected = 1;
-#ifdef CONFIG_GART_IOMMU
- gart_iommu_aperture_disabled = 1;
- gart_iommu_aperture = 0;
-#endif
+ x86_init.iommu.iommu_init = amd_iommu_init;
}
}

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index ecde854..5ca44a9 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -296,8 +296,6 @@ static int __init pci_iommu_init(void)

intel_iommu_init();

- amd_iommu_init();
-
no_iommu_init();
return 0;
}

2009-11-10 13:24:12

by FUJITA Tomonori

[permalink] [raw]
Subject: [tip:core/iommu] x86: intel-iommu: Convert detect_intel_iommu to use iommu_init hook

Commit-ID: 9d5ce73a64be2be8112147a3e0b551ad9cd1247b
Gitweb: http://git.kernel.org/tip/9d5ce73a64be2be8112147a3e0b551ad9cd1247b
Author: FUJITA Tomonori <[email protected]>
AuthorDate: Tue, 10 Nov 2009 19:46:16 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 10 Nov 2009 12:31:36 +0100

x86: intel-iommu: Convert detect_intel_iommu to use iommu_init hook

This changes detect_intel_iommu() to set intel_iommu_init() to
iommu_init hook if detect_intel_iommu() finds the IOMMU.

Signed-off-by: FUJITA Tomonori <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
[ -v2: build fix for the !CONFIG_DMAR case ]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/pci-dma.c | 2 --
drivers/pci/dmar.c | 4 ++++
include/linux/dmar.h | 15 ++++-----------
3 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 5ca44a9..bed05e2 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -294,8 +294,6 @@ static int __init pci_iommu_init(void)

x86_init.iommu.iommu_init();

- intel_iommu_init();
-
no_iommu_init();
return 0;
}
diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index 22b02c6..bce9cd7 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -617,6 +617,10 @@ void __init detect_intel_iommu(void)
!dmar_disabled)
iommu_detected = 1;
#endif
+#ifdef CONFIG_X86
+ if (ret)
+ x86_init.iommu.iommu_init = intel_iommu_init;
+#endif
}
early_acpi_os_unmap_memory(dmar_tbl, dmar_tbl_size);
dmar_tbl = NULL;
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 4a2b162..5de4c9e 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -208,16 +208,9 @@ struct dmar_atsr_unit {
u8 include_all:1; /* include all ports */
};

-/* Intel DMAR initialization functions */
extern int intel_iommu_init(void);
-#else
-static inline int intel_iommu_init(void)
-{
-#ifdef CONFIG_INTR_REMAP
- return dmar_dev_scope_init();
-#else
- return -ENODEV;
-#endif
-}
-#endif /* !CONFIG_DMAR */
+#else /* !CONFIG_DMAR: */
+static inline int intel_iommu_init(void) { return -ENODEV; }
+#endif /* CONFIG_DMAR */
+
#endif /* __DMAR_H__ */

2009-11-10 13:24:26

by FUJITA Tomonori

[permalink] [raw]
Subject: [tip:core/iommu] bootmem: Add free_bootmem_late()

Commit-ID: 9f993ac3f708b661207ed7de521f245586217a68
Gitweb: http://git.kernel.org/tip/9f993ac3f708b661207ed7de521f245586217a68
Author: FUJITA Tomonori <[email protected]>
AuthorDate: Tue, 10 Nov 2009 19:46:17 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 10 Nov 2009 12:31:43 +0100

bootmem: Add free_bootmem_late()

Add a new function for freeing bootmem after the bootmem
allocator has been released and the unreserved pages given to
the page allocator.

This allows us to reserve bootmem and then release it if we
later discover it was not needed.

( This new API will be used by the swiotlb code to recover
a significant amount of RAM (64MB). )

Signed-off-by: FUJITA Tomonori <[email protected]>
Acked-by: Pekka Enberg <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/bootmem.h | 1 +
mm/bootmem.c | 24 ++++++++++++++++++++++++
2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index dd97fb8..b10ec49 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -53,6 +53,7 @@ extern void free_bootmem_node(pg_data_t *pgdat,
unsigned long addr,
unsigned long size);
extern void free_bootmem(unsigned long addr, unsigned long size);
+extern void free_bootmem_late(unsigned long addr, unsigned long size);

/*
* Flags for reserve_bootmem (also if CONFIG_HAVE_ARCH_BOOTMEM_NODE,
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 555d5d2..d1dc23c 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -143,6 +143,30 @@ unsigned long __init init_bootmem(unsigned long start, unsigned long pages)
return init_bootmem_core(NODE_DATA(0)->bdata, start, 0, pages);
}

+/*
+ * free_bootmem_late - free bootmem pages directly to page allocator
+ * @addr: starting address of the range
+ * @size: size of the range in bytes
+ *
+ * This is only useful when the bootmem allocator has already been torn
+ * down, but we are still initializing the system. Pages are given directly
+ * to the page allocator, no bootmem metadata is updated because it is gone.
+ */
+void __init free_bootmem_late(unsigned long addr, unsigned long size)
+{
+ unsigned long cursor, end;
+
+ kmemleak_free_part(__va(addr), size);
+
+ cursor = PFN_UP(addr);
+ end = PFN_DOWN(addr + size);
+
+ for (; cursor < end; cursor++) {
+ __free_pages_bootmem(pfn_to_page(cursor), 0);
+ totalram_pages++;
+ }
+}
+
static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
{
int aligned;

2009-11-10 13:24:55

by FUJITA Tomonori

[permalink] [raw]
Subject: [tip:core/iommu] swiotlb: Add swiotlb_free() function

Commit-ID: 5740afdb68abadc473fd5392df733558a58c1254
Gitweb: http://git.kernel.org/tip/5740afdb68abadc473fd5392df733558a58c1254
Author: FUJITA Tomonori <[email protected]>
AuthorDate: Tue, 10 Nov 2009 19:46:18 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 10 Nov 2009 12:31:52 +0100

swiotlb: Add swiotlb_free() function

swiotlb_free() function frees all allocated memory for swiotlb.

We need to initialize swiotlb before IOMMU initialization (x86
and powerpc needs to allocate memory from bootmem allocator). If
IOMMU initialization is successful, we need to free swiotlb
resource (don't want to waste 64MB).

Signed-off-by: FUJITA Tomonori <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
[ -v2: build fix for the !CONFIG_SWIOTLB case ]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/swiotlb.h | 6 ++++++
lib/swiotlb.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 73b1f1c..59bafa6 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -88,4 +88,10 @@ swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr);
extern int
swiotlb_dma_supported(struct device *hwdev, u64 mask);

+#ifdef CONFIG_SWIOTLB
+extern void __init swiotlb_free(void);
+#else
+static inline void swiotlb_free(void) { }
+#endif
+
#endif /* __LINUX_SWIOTLB_H */
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index ac25cd2..eee512b 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -97,6 +97,8 @@ static phys_addr_t *io_tlb_orig_addr;
*/
static DEFINE_SPINLOCK(io_tlb_lock);

+static int late_alloc;
+
static int __init
setup_io_tlb_npages(char *str)
{
@@ -262,6 +264,8 @@ swiotlb_late_init_with_default_size(size_t default_size)

swiotlb_print_info(bytes);

+ late_alloc = 1;
+
return 0;

cleanup4:
@@ -281,6 +285,32 @@ cleanup1:
return -ENOMEM;
}

+void __init swiotlb_free(void)
+{
+ if (!io_tlb_overflow_buffer)
+ return;
+
+ if (late_alloc) {
+ free_pages((unsigned long)io_tlb_overflow_buffer,
+ get_order(io_tlb_overflow));
+ free_pages((unsigned long)io_tlb_orig_addr,
+ get_order(io_tlb_nslabs * sizeof(phys_addr_t)));
+ free_pages((unsigned long)io_tlb_list, get_order(io_tlb_nslabs *
+ sizeof(int)));
+ free_pages((unsigned long)io_tlb_start,
+ get_order(io_tlb_nslabs << IO_TLB_SHIFT));
+ } else {
+ free_bootmem_late(__pa(io_tlb_overflow_buffer),
+ io_tlb_overflow);
+ free_bootmem_late(__pa(io_tlb_orig_addr),
+ io_tlb_nslabs * sizeof(phys_addr_t));
+ free_bootmem_late(__pa(io_tlb_list),
+ io_tlb_nslabs * sizeof(int));
+ free_bootmem_late(__pa(io_tlb_start),
+ io_tlb_nslabs << IO_TLB_SHIFT);
+ }
+}
+
static int is_swiotlb_buffer(phys_addr_t paddr)
{
return paddr >= virt_to_phys(io_tlb_start) &&

2009-11-10 13:25:04

by FUJITA Tomonori

[permalink] [raw]
Subject: [tip:core/iommu] swiotlb: Defer swiotlb init printing, export swiotlb_print_info()

Commit-ID: ad32e8cb86e7894aac51c8963eaa9f36bb8a4e14
Gitweb: http://git.kernel.org/tip/ad32e8cb86e7894aac51c8963eaa9f36bb8a4e14
Author: FUJITA Tomonori <[email protected]>
AuthorDate: Tue, 10 Nov 2009 19:46:19 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 10 Nov 2009 12:32:00 +0100

swiotlb: Defer swiotlb init printing, export swiotlb_print_info()

This enables us to avoid printing swiotlb memory info when we
initialize swiotlb. After swiotlb initialization, we could find
that we don't need swiotlb.

This patch removes the code to print swiotlb memory info in
swiotlb_init() and exports the function to do that.

Signed-off-by: FUJITA Tomonori <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
[ -v2: merge up conflict ]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/ia64/kernel/pci-swiotlb.c | 4 ++--
arch/powerpc/kernel/setup_32.c | 2 +-
arch/powerpc/kernel/setup_64.c | 2 +-
arch/x86/kernel/pci-swiotlb.c | 3 +--
include/linux/swiotlb.h | 4 ++--
lib/swiotlb.c | 15 ++++++++-------
6 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/ia64/kernel/pci-swiotlb.c b/arch/ia64/kernel/pci-swiotlb.c
index 285aae8..53292ab 100644
--- a/arch/ia64/kernel/pci-swiotlb.c
+++ b/arch/ia64/kernel/pci-swiotlb.c
@@ -41,7 +41,7 @@ struct dma_map_ops swiotlb_dma_ops = {
void __init swiotlb_dma_init(void)
{
dma_ops = &swiotlb_dma_ops;
- swiotlb_init();
+ swiotlb_init(1);
}

void __init pci_swiotlb_init(void)
@@ -51,7 +51,7 @@ void __init pci_swiotlb_init(void)
swiotlb = 1;
printk(KERN_INFO "PCI-DMA: Re-initialize machine vector.\n");
machvec_init("dig");
- swiotlb_init();
+ swiotlb_init(1);
dma_ops = &swiotlb_dma_ops;
#else
panic("Unable to find Intel IOMMU");
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 53bcf3d..b152de3 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -345,7 +345,7 @@ void __init setup_arch(char **cmdline_p)

#ifdef CONFIG_SWIOTLB
if (ppc_swiotlb_enable)
- swiotlb_init();
+ swiotlb_init(1);
#endif

paging_init();
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 04f638d..df2c9e9 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -550,7 +550,7 @@ void __init setup_arch(char **cmdline_p)

#ifdef CONFIG_SWIOTLB
if (ppc_swiotlb_enable)
- swiotlb_init();
+ swiotlb_init(1);
#endif

paging_init();
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index aaa6b78..ea20ef7 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -52,8 +52,7 @@ void __init pci_swiotlb_init(void)
if (swiotlb_force)
swiotlb = 1;
if (swiotlb) {
- printk(KERN_INFO "PCI-DMA: Using software bounce buffering for IO (SWIOTLB)\n");
- swiotlb_init();
+ swiotlb_init(0);
dma_ops = &swiotlb_dma_ops;
}
}
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 59bafa6..eb9bdb4 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -20,8 +20,7 @@ struct scatterlist;
*/
#define IO_TLB_SHIFT 11

-extern void
-swiotlb_init(void);
+extern void swiotlb_init(int verbose);

extern void
*swiotlb_alloc_coherent(struct device *hwdev, size_t size,
@@ -94,4 +93,5 @@ extern void __init swiotlb_free(void);
static inline void swiotlb_free(void) { }
#endif

+extern void swiotlb_print_info(void);
#endif /* __LINUX_SWIOTLB_H */
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index eee512b..0c12d7c 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -123,8 +123,9 @@ static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
return phys_to_dma(hwdev, virt_to_phys(address));
}

-static void swiotlb_print_info(unsigned long bytes)
+void swiotlb_print_info(void)
{
+ unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
phys_addr_t pstart, pend;

pstart = virt_to_phys(io_tlb_start);
@@ -142,7 +143,7 @@ static void swiotlb_print_info(unsigned long bytes)
* structures for the software IO TLB used to implement the DMA API.
*/
void __init
-swiotlb_init_with_default_size(size_t default_size)
+swiotlb_init_with_default_size(size_t default_size, int verbose)
{
unsigned long i, bytes;

@@ -178,14 +179,14 @@ swiotlb_init_with_default_size(size_t default_size)
io_tlb_overflow_buffer = alloc_bootmem_low(io_tlb_overflow);
if (!io_tlb_overflow_buffer)
panic("Cannot allocate SWIOTLB overflow buffer!\n");
-
- swiotlb_print_info(bytes);
+ if (verbose)
+ swiotlb_print_info();
}

void __init
-swiotlb_init(void)
+swiotlb_init(int verbose)
{
- swiotlb_init_with_default_size(64 * (1<<20)); /* default to 64MB */
+ swiotlb_init_with_default_size(64 * (1<<20), verbose); /* default to 64MB */
}

/*
@@ -262,7 +263,7 @@ swiotlb_late_init_with_default_size(size_t default_size)
if (!io_tlb_overflow_buffer)
goto cleanup4;

- swiotlb_print_info(bytes);
+ swiotlb_print_info();

late_alloc = 1;

2009-11-10 13:25:23

by FUJITA Tomonori

[permalink] [raw]
Subject: [tip:core/iommu] x86: Handle HW IOMMU initialization failure gracefully

Commit-ID: 75f1cdf1dda92cae037ec848ae63690d91913eac
Gitweb: http://git.kernel.org/tip/75f1cdf1dda92cae037ec848ae63690d91913eac
Author: FUJITA Tomonori <[email protected]>
AuthorDate: Tue, 10 Nov 2009 19:46:20 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 10 Nov 2009 12:32:07 +0100

x86: Handle HW IOMMU initialization failure gracefully

If HW IOMMU initialization fails (Intel VT-d often does this,
typically due to BIOS bugs), we fall back to nommu. It doesn't
work for the majority since nowadays we have more than 4GB
memory so we must use swiotlb instead of nommu.

The problem is that it's too late to initialize swiotlb when HW
IOMMU initialization fails. We need to allocate swiotlb memory
earlier from bootmem allocator. Chris explained the issue in
detail:

http://marc.info/?l=linux-kernel&m=125657444317079&w=2

The current x86 IOMMU initialization sequence is too complicated
and handling the above issue makes it more hacky.

This patch changes x86 IOMMU initialization sequence to handle
the above issue cleanly.

The new x86 IOMMU initialization sequence are:

1. we initialize the swiotlb (and setting swiotlb to 1) in the case
of (max_pfn > MAX_DMA32_PFN && !no_iommu). dma_ops is set to
swiotlb_dma_ops or nommu_dma_ops. if swiotlb usage is forced by
the boot option, we finish here.

2. we call the detection functions of all the IOMMUs

3. the detection function sets x86_init.iommu.iommu_init to the
IOMMU initialization function (so we can avoid calling the
initialization functions of all the IOMMUs needlessly).

4. if the IOMMU initialization function doesn't need to swiotlb
then sets swiotlb to zero (e.g. the initialization is
sucessful).

5. if we find that swiotlb is set to zero, we free swiotlb
resource.

Signed-off-by: FUJITA Tomonori <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/iommu.h | 1 -
arch/x86/kernel/amd_iommu.c | 2 +-
arch/x86/kernel/amd_iommu_init.c | 2 +-
arch/x86/kernel/aperture_64.c | 2 +-
arch/x86/kernel/pci-calgary_64.c | 10 +---------
arch/x86/kernel/pci-dma.c | 21 +++++++++++++--------
arch/x86/kernel/pci-gart_64.c | 1 +
arch/x86/kernel/pci-nommu.c | 9 ---------
arch/x86/kernel/pci-swiotlb.c | 7 +++----
drivers/pci/dmar.c | 3 +--
drivers/pci/intel-iommu.c | 6 ++++--
lib/swiotlb.c | 4 +++-
12 files changed, 29 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index 878b307..df42a71 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -2,7 +2,6 @@
#define _ASM_X86_IOMMU_H

static inline void iommu_shutdown_noop(void) {}
-extern void no_iommu_init(void);
extern struct dma_map_ops nommu_dma_ops;
extern int force_iommu, no_iommu;
extern int iommu_detected;
diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c
index 0285521..66237fd 100644
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -2110,8 +2110,8 @@ int __init amd_iommu_init_dma_ops(void)
prealloc_protection_domains();

iommu_detected = 1;
- force_iommu = 1;
bad_dma_address = 0;
+ swiotlb = 0;
#ifdef CONFIG_GART_IOMMU
gart_iommu_aperture_disabled = 1;
gart_iommu_aperture = 0;
diff --git a/arch/x86/kernel/amd_iommu_init.c b/arch/x86/kernel/amd_iommu_init.c
index c41aabd..0d4581e 100644
--- a/arch/x86/kernel/amd_iommu_init.c
+++ b/arch/x86/kernel/amd_iommu_init.c
@@ -1330,7 +1330,7 @@ static int __init early_amd_iommu_detect(struct acpi_table_header *table)

void __init amd_iommu_detect(void)
{
- if (swiotlb || no_iommu || (iommu_detected && !gart_iommu_aperture))
+ if (no_iommu || (iommu_detected && !gart_iommu_aperture))
return;

if (acpi_table_parse("IVRS", early_amd_iommu_detect) == 0) {
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 03933cf..e0dfb68 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -458,7 +458,7 @@ out:

if (aper_alloc) {
/* Got the aperture from the AGP bridge */
- } else if (swiotlb && !valid_agp) {
+ } else if (!valid_agp) {
/* Do nothing */
} else if ((!no_iommu && max_pfn > MAX_DMA32_PFN) ||
force_iommu ||
diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
index 47bd419..833f491 100644
--- a/arch/x86/kernel/pci-calgary_64.c
+++ b/arch/x86/kernel/pci-calgary_64.c
@@ -1360,7 +1360,7 @@ void __init detect_calgary(void)
* if the user specified iommu=off or iommu=soft or we found
* another HW IOMMU already, bail out.
*/
- if (swiotlb || no_iommu || iommu_detected)
+ if (no_iommu || iommu_detected)
return;

if (!use_calgary)
@@ -1445,10 +1445,6 @@ void __init detect_calgary(void)
printk(KERN_INFO "PCI-DMA: Calgary TCE table spec is %d\n",
specified_table_size);

- /* swiotlb for devices that aren't behind the Calgary. */
- if (max_pfn > MAX_DMA32_PFN)
- swiotlb = 1;
-
x86_init.iommu.iommu_init = calgary_iommu_init;
}
return;
@@ -1476,11 +1472,7 @@ int __init calgary_iommu_init(void)
return ret;
}

- force_iommu = 1;
bad_dma_address = 0x0;
- /* dma_ops is set to swiotlb or nommu */
- if (!dma_ops)
- dma_ops = &nommu_dma_ops;

return 0;
}
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index bed05e2..a234e63 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -124,24 +124,24 @@ static void __init dma32_free_bootmem(void)

void __init pci_iommu_alloc(void)
{
+ /* swiotlb is forced by the boot option */
+ int use_swiotlb = swiotlb;
#ifdef CONFIG_X86_64
/* free the range so iommu could get some range less than 4G */
dma32_free_bootmem();
#endif
+ pci_swiotlb_init();
+ if (use_swiotlb)
+ return;

- /*
- * The order of these functions is important for
- * fall-back/fail-over reasons
- */
gart_iommu_hole_init();

detect_calgary();

detect_intel_iommu();

+ /* needs to be called after gart_iommu_hole_init */
amd_iommu_detect();
-
- pci_swiotlb_init();
}

void *dma_generic_alloc_coherent(struct device *dev, size_t size,
@@ -291,10 +291,15 @@ static int __init pci_iommu_init(void)
#ifdef CONFIG_PCI
dma_debug_add_bus(&pci_bus_type);
#endif
-
x86_init.iommu.iommu_init();

- no_iommu_init();
+ if (swiotlb) {
+ printk(KERN_INFO "PCI-DMA: "
+ "Using software bounce buffering for IO (SWIOTLB)\n");
+ swiotlb_print_info();
+ } else
+ swiotlb_free();
+
return 0;
}
/* Must execute after PCI subsystem */
diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
index 0410bd3..919182e 100644
--- a/arch/x86/kernel/pci-gart_64.c
+++ b/arch/x86/kernel/pci-gart_64.c
@@ -833,6 +833,7 @@ int __init gart_iommu_init(void)
flush_gart();
dma_ops = &gart_dma_ops;
x86_platform.iommu_shutdown = gart_iommu_shutdown;
+ swiotlb = 0;

return 0;
}
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index a3933d4..875e382 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -103,12 +103,3 @@ struct dma_map_ops nommu_dma_ops = {
.sync_sg_for_device = nommu_sync_sg_for_device,
.is_phys = 1,
};
-
-void __init no_iommu_init(void)
-{
- if (dma_ops)
- return;
-
- force_iommu = 0; /* no HW IOMMU */
- dma_ops = &nommu_dma_ops;
-}
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index ea20ef7..17ce422 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -46,13 +46,12 @@ void __init pci_swiotlb_init(void)
{
/* don't initialize swiotlb if iommu=off (no_iommu=1) */
#ifdef CONFIG_X86_64
- if ((!iommu_detected && !no_iommu && max_pfn > MAX_DMA32_PFN))
+ if (!no_iommu && max_pfn > MAX_DMA32_PFN)
swiotlb = 1;
#endif
- if (swiotlb_force)
- swiotlb = 1;
if (swiotlb) {
swiotlb_init(0);
dma_ops = &swiotlb_dma_ops;
- }
+ } else
+ dma_ops = &nommu_dma_ops;
}
diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index bce9cd7..4373996 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -613,8 +613,7 @@ void __init detect_intel_iommu(void)
"x2apic and Intr-remapping.\n");
#endif
#ifdef CONFIG_DMAR
- if (ret && !no_iommu && !iommu_detected && !swiotlb &&
- !dmar_disabled)
+ if (ret && !no_iommu && !iommu_detected && !dmar_disabled)
iommu_detected = 1;
#endif
#ifdef CONFIG_X86
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index b1e97e6..43d755a 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -3231,7 +3231,7 @@ int __init intel_iommu_init(void)
* Check the need for DMA-remapping initialization now.
* Above initialization will also be used by Interrupt-remapping.
*/
- if (no_iommu || swiotlb || dmar_disabled)
+ if (no_iommu || dmar_disabled)
return -ENODEV;

iommu_init_mempool();
@@ -3252,7 +3252,9 @@ int __init intel_iommu_init(void)
"PCI-DMA: Intel(R) Virtualization Technology for Directed I/O\n");

init_timer(&unmap_timer);
- force_iommu = 1;
+#ifdef CONFIG_SWIOTLB
+ swiotlb = 0;
+#endif
dma_ops = &intel_dma_ops;

init_iommu_sysfs();
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 0c12d7c..e6755a0 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -109,8 +109,10 @@ setup_io_tlb_npages(char *str)
}
if (*str == ',')
++str;
- if (!strcmp(str, "force"))
+ if (!strcmp(str, "force")) {
swiotlb_force = 1;
+ swiotlb = 1;
+ }
return 1;
}
__setup("swiotlb=", setup_io_tlb_npages);

2009-11-10 13:43:14

by Ingo Molnar

[permalink] [raw]
Subject: [tip:core/iommu] x86: Add iommu_init to x86_init_ops, fix build

Commit-ID: b4941a9a606f0131559cc040b64e8437ac7b32c5
Gitweb: http://git.kernel.org/tip/b4941a9a606f0131559cc040b64e8437ac7b32c5
Author: Ingo Molnar <[email protected]>
AuthorDate: Tue, 10 Nov 2009 14:37:58 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 10 Nov 2009 14:37:58 +0100

x86: Add iommu_init to x86_init_ops, fix build

Most of the time x86_init.h is included in pci-dma.c - but not always,
leading to this rare build failure:

arch/x86/kernel/pci-dma.c:296: error: 'x86_init' undeclared (first use in this function)

So include asm/x86_init.h explicitly.

Cc: FUJITA Tomonori <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/pci-dma.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 63eebee..f79870e 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -11,6 +11,7 @@
#include <asm/gart.h>
#include <asm/calgary.h>
#include <asm/amd_iommu.h>
+#include <asm/x86_init.h>

static int forbid_dac __read_mostly;

2009-11-11 23:57:59

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH -v2 6/9] bootmem: add free_bootmem_late

On Tue, 10 Nov 2009 19:46:17 +0900
FUJITA Tomonori <[email protected]> wrote:

> Add a new function for freeing bootmem after the bootmem allocator has
> been released and the unreserved pages given to the page allocator.
> This allows us to reserve bootmem and then release it if we later
> discover it was not needed.

OK, but the new function has no callers and curious people will be
wondering what happened to patches 1, 2, 3, 4, 5, 7, 8 and 9!

What's the plan here?

2009-11-12 07:47:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -v2 6/9] bootmem: add free_bootmem_late


* Andrew Morton <[email protected]> wrote:

> On Tue, 10 Nov 2009 19:46:17 +0900
> FUJITA Tomonori <[email protected]> wrote:
>
> > Add a new function for freeing bootmem after the bootmem allocator has
> > been released and the unreserved pages given to the page allocator.
> > This allows us to reserve bootmem and then release it if we later
> > discover it was not needed.
>
> OK, but the new function has no callers and curious people will be
> wondering what happened to patches 1, 2, 3, 4, 5, 7, 8 and 9!

They were posted to lkml alongside patch 6 - see this thread on lkml:

[PATCH v2 0/9] x86: handle HW IOMMU initialization failure gracefully

Ingo

2009-11-14 12:52:20

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH -v2 6/9] bootmem: add free_bootmem_late

Thanks for reviewing,

On Tue, 10 Nov 2009 13:00:53 +0100
Johannes Weiner <[email protected]> wrote:

> I find it a bit weird that free_all_bootmem() callers have to take
> care of totalram_pages while this function does the accounting on its
> own.

Ah, it might be consistent to make the callers take care of
totalram_pages like free_all_bootmem.


> And I think the function is more logically placed right below
> free_bootmem(), like you did in the header.

However, if we do the above, we have:

unsigned long free_bootmem_late(unsigned long addr, unsigned long
size)

Which looks inconsistent a bit with free_bootmem()?

2009-11-22 03:17:32

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH v2 0/9] x86: handle HW IOMMU initialization failure gracefully

On Tue, Nov 10, 2009 at 3:19 AM, Ingo Molnar <[email protected]> wrote:
>
> * FUJITA Tomonori <[email protected]> wrote:
>
>> This patchset is against tip/master.
>>
>> The first version is:
>>
>> http://marc.info/?l=linux-kernel&m=125671300920411&w=2
>>
>> The changes since v1 are:
>>
>> - replaced Chris' bootmem patches with the 6/9 patch to implement
>> ? free_bootmem_late in a simple way (thanks to Pekka).
>>
>> - fixed the bug to break 'iommu=soft' boot opiton (found by Joerg).
>>
>> - moved iommu_init_noop() to x86_init.c
>>
>> - added Muli's Acked-by to Calgary patch.
>>
>>
>> ==
>> ?arch/ia64/kernel/pci-swiotlb.c ? | ? ?4 +-
>> ?arch/powerpc/kernel/setup_32.c ? | ? ?2 +-
>> ?arch/powerpc/kernel/setup_64.c ? | ? ?2 +-
>> ?arch/x86/include/asm/amd_iommu.h | ? ?2 -
>> ?arch/x86/include/asm/calgary.h ? | ? ?2 -
>> ?arch/x86/include/asm/gart.h ? ? ?| ? ?5 +---
>> ?arch/x86/include/asm/iommu.h ? ? | ? ?1 -
>> ?arch/x86/include/asm/x86_init.h ?| ? ?9 +++++++
>> ?arch/x86/kernel/amd_iommu.c ? ? ?| ? ?2 +-
>> ?arch/x86/kernel/amd_iommu_init.c | ? 19 +++-----------
>> ?arch/x86/kernel/aperture_64.c ? ?| ? ?4 ++-
>> ?arch/x86/kernel/pci-calgary_64.c | ? 19 ++++-----------
>> ?arch/x86/kernel/pci-dma.c ? ? ? ?| ? 27 ++++++++++-----------
>> ?arch/x86/kernel/pci-gart_64.c ? ?| ? 16 ++++-------
>> ?arch/x86/kernel/pci-nommu.c ? ? ?| ? ?9 -------
>> ?arch/x86/kernel/pci-swiotlb.c ? ?| ? 10 +++----
>> ?arch/x86/kernel/x86_init.c ? ? ? | ? ?5 ++++
>> ?drivers/pci/dmar.c ? ? ? ? ? ? ? | ? ?7 ++++-
>> ?drivers/pci/intel-iommu.c ? ? ? ?| ? ?4 +-
>> ?include/linux/bootmem.h ? ? ? ? ?| ? ?1 +
>> ?include/linux/dmar.h ? ? ? ? ? ? | ? 10 -------
>> ?include/linux/swiotlb.h ? ? ? ? ?| ? ?5 ++-
>> ?lib/swiotlb.c ? ? ? ? ? ? ? ? ? ?| ? 49 +++++++++++++++++++++++++++++++------
>> ?mm/bootmem.c ? ? ? ? ? ? ? ? ? ? | ? 24 ++++++++++++++++++
>> ?24 files changed, 131 insertions(+), 107 deletions(-)
>
> Nice changes! I've applied them to tip:core/iommu (with the small build
> fix i mentioned in the previous mail) and will push them out later
> today.
>

this patch set will break some AMD system

amd 64 systems that
1. do not have AGP
2. do not have IOMMU
3. mem > 4g
4. BIOS do not allocate correct gart in NB.

will leave them to use SWIOTLB forcely.

following code that allocate some RAM as aperture will be called
anymore for them
} else if (!valid_agp) {
/* Do nothing */
} else if ((!no_iommu && max_pfn > MAX_DMA32_PFN) ||
force_iommu ||
valid_agp ||
fallback_aper_force) {
printk(KERN_INFO
"Your BIOS doesn't leave a aperture memory hole\n");
printk(KERN_INFO
"Please enable the IOMMU option in the BIOS setup\n");
printk(KERN_INFO
"This costs you %d MB of RAM\n",
32 << fallback_aper_order);

aper_order = fallback_aper_order;
aper_alloc = allocate_aperture();
if (!aper_alloc) {
/*
* Could disable AGP and IOMMU here, but it's
* probably not worth it. But the later users
* cannot deal with bad apertures and turning
* on the aperture over memory causes very
* strange problems, so it's better to panic
* early.
*/
panic("Not enough memory for aperture");
}

also in

void __init pci_iommu_alloc(void)
{
#ifdef CONFIG_X86_64
/* free the range so iommu could get some range less than 4G */
dma32_free_bootmem();
#endif
if (pci_swiotlb_init())
return;

gart_iommu_hole_init();

detect_calgary();

detect_intel_iommu();

/* needs to be called after gart_iommu_hole_init */
amd_iommu_detect();
}

pci_swiotlb_init is stealing the preallocate range that is
gart_iommu_hole workaround.

so please move back swiotlb_init to the end.

there is no point to allocate swiotlb at first and then free it if
some iommu is there.

YH

2009-11-22 04:26:38

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] x86: fix gart iommu using for amd 64 bit system


after
|commit 75f1cdf1dda92cae037ec848ae63690d91913eac
|Author: FUJITA Tomonori <[email protected]>
|Date: Tue Nov 10 19:46:20 2009 +0900
|
| x86: Handle HW IOMMU initialization failure gracefully
|
| If HW IOMMU initialization fails (Intel VT-d often does this,
| typically due to BIOS bugs), we fall back to nommu. It doesn't
| work for the majority since nowadays we have more than 4GB
| memory so we must use swiotlb instead of nommu.
...

amd 64 systems that
1. do not have AGP
2. do not have IOMMU
3. mem > 4g
4. BIOS do not allocate correct gart in NB.
will leave them to use SWIOTLB forcely.

also in pci_iommu_alloc()
pci_swiotlb_init is stealing the preallocate range that is for
gart_iommu_hole workaround.

so restore the sequence...

will get back
[ 0.000000] Your BIOS doesn't leave a aperture memory hole
[ 0.000000] Please enable the IOMMU option in the BIOS setup
[ 0.000000] This costs you 64 MB of RAM
[ 0.000000] Mapping aperture over 65536 KB of RAM @ 20000000

Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/kernel/aperture_64.c | 2 +-
arch/x86/kernel/pci-dma.c | 5 +++--
2 files changed, 4 insertions(+), 3 deletions(-)

Index: linux-2.6/arch/x86/kernel/aperture_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/aperture_64.c
+++ linux-2.6/arch/x86/kernel/aperture_64.c
@@ -458,7 +458,7 @@ out:

if (aper_alloc) {
/* Got the aperture from the AGP bridge */
- } else if (!valid_agp) {
+ } else if (swiotlb && !valid_agp) {
/* Do nothing */
} else if ((!no_iommu && max_pfn > MAX_DMA32_PFN) ||
force_iommu ||
Index: linux-2.6/arch/x86/kernel/pci-dma.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-dma.c
+++ linux-2.6/arch/x86/kernel/pci-dma.c
@@ -124,11 +124,12 @@ void __init pci_iommu_alloc(void)
/* free the range so iommu could get some range less than 4G */
dma32_free_bootmem();
#endif
+ if (!swiotlb_force)
+ gart_iommu_hole_init();
+
if (pci_swiotlb_init())
return;

- gart_iommu_hole_init();
-
detect_calgary();

detect_intel_iommu();

2009-11-22 05:20:55

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] x86: fix gart iommu using for amd 64 bit system -v2


after
|commit 75f1cdf1dda92cae037ec848ae63690d91913eac
|Author: FUJITA Tomonori <[email protected]>
|Date: Tue Nov 10 19:46:20 2009 +0900
|
| x86: Handle HW IOMMU initialization failure gracefully
|
| If HW IOMMU initialization fails (Intel VT-d often does this,
| typically due to BIOS bugs), we fall back to nommu. It doesn't
| work for the majority since nowadays we have more than 4GB
| memory so we must use swiotlb instead of nommu.
...

amd 64 systems that
1. do not have AGP
2. do not have IOMMU
3. mem > 4g
4. BIOS do not allocate correct gart in NB.
will leave them to use SWIOTLB forcely.

also in pci_iommu_alloc()
pci_swiotlb_init is stealing the preallocate range that is for
gart_iommu_hole workaround.

so restore the sequence...

will get back
[ 0.000000] Your BIOS doesn't leave a aperture memory hole
[ 0.000000] Please enable the IOMMU option in the BIOS setup
[ 0.000000] This costs you 64 MB of RAM
[ 0.000000] Mapping aperture over 65536 KB of RAM @ 20000000
...
[ 12.702822] calling pci_iommu_init+0x0/0x31 @ 1
[ 12.708728] PCI-DMA: Disabling AGP.
[ 12.712812] PCI-DMA: aperture base @ 20000000 size 65536 KB
[ 12.718384] PCI-DMA: using GART IOMMU.
[ 12.722139] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
[ 12.736944] initcall pci_iommu_init+0x0/0x31 returned 0 after 28802 usecs

-v2: call shutdown when no agp is there

Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/kernel/aperture_64.c | 2 +-
arch/x86/kernel/pci-dma.c | 5 +++--
arch/x86/kernel/pci-gart_64.c | 3 ++-
3 files changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6/arch/x86/kernel/aperture_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/aperture_64.c
+++ linux-2.6/arch/x86/kernel/aperture_64.c
@@ -458,7 +458,7 @@ out:

if (aper_alloc) {
/* Got the aperture from the AGP bridge */
- } else if (!valid_agp) {
+ } else if (swiotlb && !valid_agp) {
/* Do nothing */
} else if ((!no_iommu && max_pfn > MAX_DMA32_PFN) ||
force_iommu ||
Index: linux-2.6/arch/x86/kernel/pci-dma.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-dma.c
+++ linux-2.6/arch/x86/kernel/pci-dma.c
@@ -124,11 +124,12 @@ void __init pci_iommu_alloc(void)
/* free the range so iommu could get some range less than 4G */
dma32_free_bootmem();
#endif
+ if (!swiotlb_force)
+ gart_iommu_hole_init();
+
if (pci_swiotlb_init())
return;

- gart_iommu_hole_init();
-
detect_calgary();

detect_intel_iommu();
Index: linux-2.6/arch/x86/kernel/pci-gart_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-gart_64.c
+++ linux-2.6/arch/x86/kernel/pci-gart_64.c
@@ -710,7 +710,8 @@ static void gart_iommu_shutdown(void)
struct pci_dev *dev;
int i;

- if (no_agp)
+ /* don't shutdown it if there is AGP installed */
+ if (!no_agp)
return;

for (i = 0; i < num_k8_northbridges; i++) {

2009-11-24 08:47:36

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system

On Sat, 21 Nov 2009 20:24:52 -0800
Yinghai Lu <[email protected]> wrote:

> amd 64 systems that
> 1. do not have AGP
> 2. do not have IOMMU
> 3. mem > 4g
> 4. BIOS do not allocate correct gart in NB.
> will leave them to use SWIOTLB forcely.

Sorry, I forgot about this GART workaround.


> Index: linux-2.6/arch/x86/kernel/pci-dma.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/pci-dma.c
> +++ linux-2.6/arch/x86/kernel/pci-dma.c
> @@ -124,11 +124,12 @@ void __init pci_iommu_alloc(void)
> /* free the range so iommu could get some range less than 4G */
> dma32_free_bootmem();
> #endif
> + if (!swiotlb_force)
> + gart_iommu_hole_init();
> +
> if (pci_swiotlb_init())
> return;
>
> - gart_iommu_hole_init();
> -
> detect_calgary();
>
> detect_intel_iommu();

I prefer to detect all the IOMMU drivers in a consistent way;
detecting only GART before swioltb doesn't look nice.

As we did before, we could detect all the IOMMU driver before
swiotlb. However, I think that it's better to simply change
pci_swiotlb_init() not to steal the preallocate GART workaround memory.

btw, initializing swiotlb before IOMMU detection is useful to GART
too? If GART can't allocate the workaround memory, then the kernel
panic now. We can use swiotlb instead in that case?

=
From: FUJITA Tomonori <[email protected]>
Subject: [PATCH] x86: stop swiotlb stealing the GART workaround memory

swiotlb wrongly uses the GART workaround memory (for bad BIOS) that
dma32_reserv_bootmem allocates. We need to initialize swiotlb before
dma32_free_bootmem().

Signed-off-by: FUJITA Tomonori <[email protected]>
---
arch/x86/kernel/pci-dma.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index afcc58b..26fe2cd 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -120,11 +120,15 @@ static void __init dma32_free_bootmem(void)

void __init pci_iommu_alloc(void)
{
+ int use_swiotlb;
+
+ use_swiotlb = pci_swiotlb_init();
+
#ifdef CONFIG_X86_64
/* free the range so iommu could get some range less than 4G */
dma32_free_bootmem();
#endif
- if (pci_swiotlb_init())
+ if (use_swiotlb)
return;

gart_iommu_hole_init();
--
1.5.6.5

2009-11-24 09:20:56

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system

FUJITA Tomonori wrote:
> On Sat, 21 Nov 2009 20:24:52 -0800
> Yinghai Lu <[email protected]> wrote:
>
>> amd 64 systems that
>> 1. do not have AGP
>> 2. do not have IOMMU
>> 3. mem > 4g
>> 4. BIOS do not allocate correct gart in NB.
>> will leave them to use SWIOTLB forcely.
>
> Sorry, I forgot about this GART workaround.
>
>
>> Index: linux-2.6/arch/x86/kernel/pci-dma.c
>> ===================================================================
>> --- linux-2.6.orig/arch/x86/kernel/pci-dma.c
>> +++ linux-2.6/arch/x86/kernel/pci-dma.c
>> @@ -124,11 +124,12 @@ void __init pci_iommu_alloc(void)
>> /* free the range so iommu could get some range less than 4G */
>> dma32_free_bootmem();
>> #endif
>> + if (!swiotlb_force)
>> + gart_iommu_hole_init();
>> +
>> if (pci_swiotlb_init())
>> return;
>>
>> - gart_iommu_hole_init();
>> -
>> detect_calgary();
>>
>> detect_intel_iommu();
>
> I prefer to detect all the IOMMU drivers in a consistent way;
> detecting only GART before swioltb doesn't look nice.
>
> As we did before, we could detect all the IOMMU driver before
> swiotlb. However, I think that it's better to simply change
> pci_swiotlb_init() not to steal the preallocate GART workaround memory.
>
> btw, initializing swiotlb before IOMMU detection is useful to GART
> too? If GART can't allocate the workaround memory, then the kernel
> panic now. We can use swiotlb instead in that case?
>
> =
> From: FUJITA Tomonori <[email protected]>
> Subject: [PATCH] x86: stop swiotlb stealing the GART workaround memory
>
> swiotlb wrongly uses the GART workaround memory (for bad BIOS) that
> dma32_reserv_bootmem allocates. We need to initialize swiotlb before
> dma32_free_bootmem().
>
> Signed-off-by: FUJITA Tomonori <[email protected]>
> ---
> arch/x86/kernel/pci-dma.c | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index afcc58b..26fe2cd 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -120,11 +120,15 @@ static void __init dma32_free_bootmem(void)
>
> void __init pci_iommu_alloc(void)
> {
> + int use_swiotlb;
> +
> + use_swiotlb = pci_swiotlb_init();
> +
> #ifdef CONFIG_X86_64
> /* free the range so iommu could get some range less than 4G */
> dma32_free_bootmem();
> #endif
> - if (pci_swiotlb_init())
> + if (use_swiotlb)
> return;
>
> gart_iommu_hole_init();

pci_swiotlb_init need be called after dma32_free_bootmem
otherwise it could fail for system with lots of memory and numa=off

so dma32_free_bootmem will work for gart_iommu_hole_init and pci_swiotlb_init..
make sure they can get RAM < 4g.

also in gart_iommu_hole_init we need check swiotlb and !valid_agp...
- } else if (!valid_agp) {
+ } else if (swiotlb && !valid_agp) {

otherwise the the workaround will be skipped...

YH

2009-11-24 09:36:21

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system

On Tue, 24 Nov 2009 01:19:21 -0800
Yinghai Lu <[email protected]> wrote:

> FUJITA Tomonori wrote:
> > On Sat, 21 Nov 2009 20:24:52 -0800
> > Yinghai Lu <[email protected]> wrote:
> >
> >> amd 64 systems that
> >> 1. do not have AGP
> >> 2. do not have IOMMU
> >> 3. mem > 4g
> >> 4. BIOS do not allocate correct gart in NB.
> >> will leave them to use SWIOTLB forcely.
> >
> > Sorry, I forgot about this GART workaround.
> >
> >
> >> Index: linux-2.6/arch/x86/kernel/pci-dma.c
> >> ===================================================================
> >> --- linux-2.6.orig/arch/x86/kernel/pci-dma.c
> >> +++ linux-2.6/arch/x86/kernel/pci-dma.c
> >> @@ -124,11 +124,12 @@ void __init pci_iommu_alloc(void)
> >> /* free the range so iommu could get some range less than 4G */
> >> dma32_free_bootmem();
> >> #endif
> >> + if (!swiotlb_force)
> >> + gart_iommu_hole_init();
> >> +
> >> if (pci_swiotlb_init())
> >> return;
> >>
> >> - gart_iommu_hole_init();
> >> -
> >> detect_calgary();
> >>
> >> detect_intel_iommu();
> >
> > I prefer to detect all the IOMMU drivers in a consistent way;
> > detecting only GART before swioltb doesn't look nice.
> >
> > As we did before, we could detect all the IOMMU driver before
> > swiotlb. However, I think that it's better to simply change
> > pci_swiotlb_init() not to steal the preallocate GART workaround memory.
> >
> > btw, initializing swiotlb before IOMMU detection is useful to GART
> > too? If GART can't allocate the workaround memory, then the kernel
> > panic now. We can use swiotlb instead in that case?
> >
> > =
> > From: FUJITA Tomonori <[email protected]>
> > Subject: [PATCH] x86: stop swiotlb stealing the GART workaround memory
> >
> > swiotlb wrongly uses the GART workaround memory (for bad BIOS) that
> > dma32_reserv_bootmem allocates. We need to initialize swiotlb before
> > dma32_free_bootmem().
> >
> > Signed-off-by: FUJITA Tomonori <[email protected]>
> > ---
> > arch/x86/kernel/pci-dma.c | 6 +++++-
> > 1 files changed, 5 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> > index afcc58b..26fe2cd 100644
> > --- a/arch/x86/kernel/pci-dma.c
> > +++ b/arch/x86/kernel/pci-dma.c
> > @@ -120,11 +120,15 @@ static void __init dma32_free_bootmem(void)
> >
> > void __init pci_iommu_alloc(void)
> > {
> > + int use_swiotlb;
> > +
> > + use_swiotlb = pci_swiotlb_init();
> > +
> > #ifdef CONFIG_X86_64
> > /* free the range so iommu could get some range less than 4G */
> > dma32_free_bootmem();
> > #endif
> > - if (pci_swiotlb_init())
> > + if (use_swiotlb)
> > return;
> >
> > gart_iommu_hole_init();
>
> pci_swiotlb_init need be called after dma32_free_bootmem
> otherwise it could fail for system with lots of memory and numa=off

Why? swiotlb needs just 64MB (and some) in DMA32.

swiotlb had been worked fine without that hack until 2.6.26?

Only broken GART needs that hack (and I think that it's better to move
that hack to GART code).


> so dma32_free_bootmem will work for gart_iommu_hole_init and pci_swiotlb_init..
> make sure they can get RAM < 4g.
>
> also in gart_iommu_hole_init we need check swiotlb and !valid_agp...
> - } else if (!valid_agp) {
> + } else if (swiotlb && !valid_agp) {
>
> otherwise the the workaround will be skipped...
>
> YH
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2009-11-24 09:49:56

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system

FUJITA Tomonori wrote:
> On Tue, 24 Nov 2009 01:19:21 -0800
> Yinghai Lu <[email protected]> wrote:
>
>> FUJITA Tomonori wrote:
>>> On Sat, 21 Nov 2009 20:24:52 -0800
>>> Yinghai Lu <[email protected]> wrote:
>>>
>>>> amd 64 systems that
>>>> 1. do not have AGP
>>>> 2. do not have IOMMU
>>>> 3. mem > 4g
>>>> 4. BIOS do not allocate correct gart in NB.
>>>> will leave them to use SWIOTLB forcely.
>>> Sorry, I forgot about this GART workaround.
>>>
>>>
>>>> Index: linux-2.6/arch/x86/kernel/pci-dma.c
>>>> ===================================================================
>>>> --- linux-2.6.orig/arch/x86/kernel/pci-dma.c
>>>> +++ linux-2.6/arch/x86/kernel/pci-dma.c
>>>> @@ -124,11 +124,12 @@ void __init pci_iommu_alloc(void)
>>>> /* free the range so iommu could get some range less than 4G */
>>>> dma32_free_bootmem();
>>>> #endif
>>>> + if (!swiotlb_force)
>>>> + gart_iommu_hole_init();
>>>> +
>>>> if (pci_swiotlb_init())
>>>> return;
>>>>
>>>> - gart_iommu_hole_init();
>>>> -
>>>> detect_calgary();
>>>>
>>>> detect_intel_iommu();
>>> I prefer to detect all the IOMMU drivers in a consistent way;
>>> detecting only GART before swioltb doesn't look nice.
>>>
>>> As we did before, we could detect all the IOMMU driver before
>>> swiotlb. However, I think that it's better to simply change
>>> pci_swiotlb_init() not to steal the preallocate GART workaround memory.
>>>
>>> btw, initializing swiotlb before IOMMU detection is useful to GART
>>> too? If GART can't allocate the workaround memory, then the kernel
>>> panic now. We can use swiotlb instead in that case?
>>>
>>> =
>>> From: FUJITA Tomonori <[email protected]>
>>> Subject: [PATCH] x86: stop swiotlb stealing the GART workaround memory
>>>
>>> swiotlb wrongly uses the GART workaround memory (for bad BIOS) that
>>> dma32_reserv_bootmem allocates. We need to initialize swiotlb before
>>> dma32_free_bootmem().
>>>
>>> Signed-off-by: FUJITA Tomonori <[email protected]>
>>> ---
>>> arch/x86/kernel/pci-dma.c | 6 +++++-
>>> 1 files changed, 5 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
>>> index afcc58b..26fe2cd 100644
>>> --- a/arch/x86/kernel/pci-dma.c
>>> +++ b/arch/x86/kernel/pci-dma.c
>>> @@ -120,11 +120,15 @@ static void __init dma32_free_bootmem(void)
>>>
>>> void __init pci_iommu_alloc(void)
>>> {
>>> + int use_swiotlb;
>>> +
>>> + use_swiotlb = pci_swiotlb_init();
>>> +
>>> #ifdef CONFIG_X86_64
>>> /* free the range so iommu could get some range less than 4G */
>>> dma32_free_bootmem();
>>> #endif
>>> - if (pci_swiotlb_init())
>>> + if (use_swiotlb)
>>> return;
>>>
>>> gart_iommu_hole_init();
>> pci_swiotlb_init need be called after dma32_free_bootmem
>> otherwise it could fail for system with lots of memory and numa=off
>
> Why? swiotlb needs just 64MB (and some) in DMA32.
>
> swiotlb had been worked fine without that hack until 2.6.26?
>
> Only broken GART needs that hack (and I think that it's better to move
> that hack to GART code).

gart iommu workaround will allocate 64M by default.

fail config: like system with 512g and sparse mem, and vmmap.

YH

2009-11-24 10:32:05

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system

On Tue, 24 Nov 2009 01:48:54 -0800
Yinghai Lu <[email protected]> wrote:

> FUJITA Tomonori wrote:
> > On Tue, 24 Nov 2009 01:19:21 -0800
> > Yinghai Lu <[email protected]> wrote:
> >
> >> FUJITA Tomonori wrote:
> >>> On Sat, 21 Nov 2009 20:24:52 -0800
> >>> Yinghai Lu <[email protected]> wrote:
> >>>
> >>>> amd 64 systems that
> >>>> 1. do not have AGP
> >>>> 2. do not have IOMMU
> >>>> 3. mem > 4g
> >>>> 4. BIOS do not allocate correct gart in NB.
> >>>> will leave them to use SWIOTLB forcely.
> >>> Sorry, I forgot about this GART workaround.
> >>>
> >>>
> >>>> Index: linux-2.6/arch/x86/kernel/pci-dma.c
> >>>> ===================================================================
> >>>> --- linux-2.6.orig/arch/x86/kernel/pci-dma.c
> >>>> +++ linux-2.6/arch/x86/kernel/pci-dma.c
> >>>> @@ -124,11 +124,12 @@ void __init pci_iommu_alloc(void)
> >>>> /* free the range so iommu could get some range less than 4G */
> >>>> dma32_free_bootmem();
> >>>> #endif
> >>>> + if (!swiotlb_force)
> >>>> + gart_iommu_hole_init();
> >>>> +
> >>>> if (pci_swiotlb_init())
> >>>> return;
> >>>>
> >>>> - gart_iommu_hole_init();
> >>>> -
> >>>> detect_calgary();
> >>>>
> >>>> detect_intel_iommu();
> >>> I prefer to detect all the IOMMU drivers in a consistent way;
> >>> detecting only GART before swioltb doesn't look nice.
> >>>
> >>> As we did before, we could detect all the IOMMU driver before
> >>> swiotlb. However, I think that it's better to simply change
> >>> pci_swiotlb_init() not to steal the preallocate GART workaround memory.
> >>>
> >>> btw, initializing swiotlb before IOMMU detection is useful to GART
> >>> too? If GART can't allocate the workaround memory, then the kernel
> >>> panic now. We can use swiotlb instead in that case?
> >>>
> >>> =
> >>> From: FUJITA Tomonori <[email protected]>
> >>> Subject: [PATCH] x86: stop swiotlb stealing the GART workaround memory
> >>>
> >>> swiotlb wrongly uses the GART workaround memory (for bad BIOS) that
> >>> dma32_reserv_bootmem allocates. We need to initialize swiotlb before
> >>> dma32_free_bootmem().
> >>>
> >>> Signed-off-by: FUJITA Tomonori <[email protected]>
> >>> ---
> >>> arch/x86/kernel/pci-dma.c | 6 +++++-
> >>> 1 files changed, 5 insertions(+), 1 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> >>> index afcc58b..26fe2cd 100644
> >>> --- a/arch/x86/kernel/pci-dma.c
> >>> +++ b/arch/x86/kernel/pci-dma.c
> >>> @@ -120,11 +120,15 @@ static void __init dma32_free_bootmem(void)
> >>>
> >>> void __init pci_iommu_alloc(void)
> >>> {
> >>> + int use_swiotlb;
> >>> +
> >>> + use_swiotlb = pci_swiotlb_init();
> >>> +
> >>> #ifdef CONFIG_X86_64
> >>> /* free the range so iommu could get some range less than 4G */
> >>> dma32_free_bootmem();
> >>> #endif
> >>> - if (pci_swiotlb_init())
> >>> + if (use_swiotlb)
> >>> return;
> >>>
> >>> gart_iommu_hole_init();
> >> pci_swiotlb_init need be called after dma32_free_bootmem
> >> otherwise it could fail for system with lots of memory and numa=off
> >
> > Why? swiotlb needs just 64MB (and some) in DMA32.
> >
> > swiotlb had been worked fine without that hack until 2.6.26?
> >
> > Only broken GART needs that hack (and I think that it's better to move
> > that hack to GART code).
>
> gart iommu workaround will allocate 64M by default.
>
> fail config: like system with 512g and sparse mem, and vmmap.

Well, some systems could fail but they can use swiotlb. If users
configures their systems not to use swioltb by boot option, it's fair
that systems don't work.

We are talking about a rare broken GART BIOS issue. I think that it's
much better to remove this hack completely. As VT-d does, we can use
swiotlb instead of this GART specific hack. I think that that's saner
approach to handle broken IOMMU hardware.

2009-11-24 10:43:44

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system

FUJITA Tomonori wrote:
> On Tue, 24 Nov 2009 01:48:54 -0800
> Yinghai Lu <[email protected]> wrote:
>
>> FUJITA Tomonori wrote:
>>> On Tue, 24 Nov 2009 01:19:21 -0800
>>> Yinghai Lu <[email protected]> wrote:
>>>
>>>> FUJITA Tomonori wrote:
>>>>> On Sat, 21 Nov 2009 20:24:52 -0800
>>>>> Yinghai Lu <[email protected]> wrote:
>>>>>
>>>>>> amd 64 systems that
>>>>>> 1. do not have AGP
>>>>>> 2. do not have IOMMU
>>>>>> 3. mem > 4g
>>>>>> 4. BIOS do not allocate correct gart in NB.
>>>>>> will leave them to use SWIOTLB forcely.
>>>>> Sorry, I forgot about this GART workaround.
>>>>>
>>>>>
>>>>>> Index: linux-2.6/arch/x86/kernel/pci-dma.c
>>>>>> ===================================================================
>>>>>> --- linux-2.6.orig/arch/x86/kernel/pci-dma.c
>>>>>> +++ linux-2.6/arch/x86/kernel/pci-dma.c
>>>>>> @@ -124,11 +124,12 @@ void __init pci_iommu_alloc(void)
>>>>>> /* free the range so iommu could get some range less than 4G */
>>>>>> dma32_free_bootmem();
>>>>>> #endif
>>>>>> + if (!swiotlb_force)
>>>>>> + gart_iommu_hole_init();
>>>>>> +
>>>>>> if (pci_swiotlb_init())
>>>>>> return;
>>>>>>
>>>>>> - gart_iommu_hole_init();
>>>>>> -
>>>>>> detect_calgary();
>>>>>>
>>>>>> detect_intel_iommu();
>>>>> I prefer to detect all the IOMMU drivers in a consistent way;
>>>>> detecting only GART before swioltb doesn't look nice.
>>>>>
>>>>> As we did before, we could detect all the IOMMU driver before
>>>>> swiotlb. However, I think that it's better to simply change
>>>>> pci_swiotlb_init() not to steal the preallocate GART workaround memory.
>>>>>
>>>>> btw, initializing swiotlb before IOMMU detection is useful to GART
>>>>> too? If GART can't allocate the workaround memory, then the kernel
>>>>> panic now. We can use swiotlb instead in that case?
>>>>>
>>>>> =
>>>>> From: FUJITA Tomonori <[email protected]>
>>>>> Subject: [PATCH] x86: stop swiotlb stealing the GART workaround memory
>>>>>
>>>>> swiotlb wrongly uses the GART workaround memory (for bad BIOS) that
>>>>> dma32_reserv_bootmem allocates. We need to initialize swiotlb before
>>>>> dma32_free_bootmem().
>>>>>
>>>>> Signed-off-by: FUJITA Tomonori <[email protected]>
>>>>> ---
>>>>> arch/x86/kernel/pci-dma.c | 6 +++++-
>>>>> 1 files changed, 5 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
>>>>> index afcc58b..26fe2cd 100644
>>>>> --- a/arch/x86/kernel/pci-dma.c
>>>>> +++ b/arch/x86/kernel/pci-dma.c
>>>>> @@ -120,11 +120,15 @@ static void __init dma32_free_bootmem(void)
>>>>>
>>>>> void __init pci_iommu_alloc(void)
>>>>> {
>>>>> + int use_swiotlb;
>>>>> +
>>>>> + use_swiotlb = pci_swiotlb_init();
>>>>> +
>>>>> #ifdef CONFIG_X86_64
>>>>> /* free the range so iommu could get some range less than 4G */
>>>>> dma32_free_bootmem();
>>>>> #endif
>>>>> - if (pci_swiotlb_init())
>>>>> + if (use_swiotlb)
>>>>> return;
>>>>>
>>>>> gart_iommu_hole_init();
>>>> pci_swiotlb_init need be called after dma32_free_bootmem
>>>> otherwise it could fail for system with lots of memory and numa=off
>>> Why? swiotlb needs just 64MB (and some) in DMA32.
>>>
>>> swiotlb had been worked fine without that hack until 2.6.26?
>>>
>>> Only broken GART needs that hack (and I think that it's better to move
>>> that hack to GART code).
>> gart iommu workaround will allocate 64M by default.
>>
>> fail config: like system with 512g and sparse mem, and vmmap.
>
> Well, some systems could fail but they can use swiotlb. If users
> configures their systems not to use swioltb by boot option, it's fair
> that systems don't work.
i mean when 512g and numa = off, even swiotlb could have chance not to get that 64M ram
>
> We are talking about a rare broken GART BIOS issue. I think that it's
> much better to remove this hack completely. As VT-d does, we can use
> swiotlb instead of this GART specific hack. I think that that's saner
> approach to handle broken IOMMU hardware.

your second patch could break swiotlb...

at GART iommu workaround : it is not rare. a lot of systems have this problem.
and that workaround works well for several years.
don't think user will like to use SWIOTLB instead of swiotlb.

that gart iommu hardware is not broken, just those BIOS guys just forget to program it.

here intel vt-d is different, it seems can not make it work just program some hardware register...

please check my v2 patch, to see if it breaks your setup.

YH

2009-11-24 13:51:25

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system

On Tue, 24 Nov 2009 02:42:26 -0800
Yinghai Lu <[email protected]> wrote:

> at GART iommu workaround : it is not rare. a lot of systems have this problem.
> and that workaround works well for several years.

swiotlb has worked too.


> don't think user will like to use SWIOTLB instead of swiotlb.

I doubt that users care about a way to fix their problems.


> that gart iommu hardware is not broken, just those BIOS guys just forget to program it.
>
> here intel vt-d is different, it seems can not make it work just program some hardware register...

Because BIOS is broken, IIRC. Both cases are pretty similar.

> please check my v2 patch, to see if it breaks your setup.

As I said, adding another trick only for GART is not good. And using
force_iommu in pci-dma.c is confusing since force_iommu should be used
only in X86_64.

The following patch works for you?


diff --git a/arch/x86/include/asm/swiotlb.h b/arch/x86/include/asm/swiotlb.h
index 87ffcb1..8085277 100644
--- a/arch/x86/include/asm/swiotlb.h
+++ b/arch/x86/include/asm/swiotlb.h
@@ -5,13 +5,17 @@

#ifdef CONFIG_SWIOTLB
extern int swiotlb;
-extern int pci_swiotlb_init(void);
+extern int __init pci_swiotlb_detect(void);
+extern void __init pci_swiotlb_init(void);
#else
#define swiotlb 0
-static inline int pci_swiotlb_init(void)
+static inline int pci_swiotlb_detect(void)
{
return 0;
}
+static inline void pci_swiotlb_init(void)
+{
+}
#endif

static inline void dma_mark_clean(void *addr, size_t size) {}
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index e0dfb68..750dd8d 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -401,6 +401,7 @@ void __init gart_iommu_hole_init(void)

iommu_detected = 1;
gart_iommu_aperture = 1;
+ swiotlb = 0;
x86_init.iommu.iommu_init = gart_iommu_init;

aper_order = (read_pci_config(bus, slot, 3, AMD64_GARTAPERTURECTL) >> 1) & 7;
@@ -458,7 +459,7 @@ out:

if (aper_alloc) {
/* Got the aperture from the AGP bridge */
- } else if (!valid_agp) {
+ } else if (swiotlb && !valid_agp) {
/* Do nothing */
} else if ((!no_iommu && max_pfn > MAX_DMA32_PFN) ||
force_iommu ||
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index afcc58b..75e14e2 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -124,8 +124,8 @@ void __init pci_iommu_alloc(void)
/* free the range so iommu could get some range less than 4G */
dma32_free_bootmem();
#endif
- if (pci_swiotlb_init())
- return;
+ if (pci_swiotlb_detect())
+ goto out;

gart_iommu_hole_init();

@@ -135,6 +135,8 @@ void __init pci_iommu_alloc(void)

/* needs to be called after gart_iommu_hole_init */
amd_iommu_detect();
+out:
+ pci_swiotlb_init();
}

void *dma_generic_alloc_coherent(struct device *dev, size_t size,
diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
index e6a0d40..2358409 100644
--- a/arch/x86/kernel/pci-gart_64.c
+++ b/arch/x86/kernel/pci-gart_64.c
@@ -847,7 +847,6 @@ int __init gart_iommu_init(void)
flush_gart();
dma_ops = &gart_dma_ops;
x86_platform.iommu_shutdown = gart_iommu_shutdown;
- swiotlb = 0;

return 0;
}
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index e36e71d..6971ba5 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -43,12 +43,12 @@ static struct dma_map_ops swiotlb_dma_ops = {
};

/*
- * pci_swiotlb_init - initialize swiotlb if necessary
+ * pci_swiotlb_init - set swiotlb to 1 if necessary
*
* This returns non-zero if we are forced to use swiotlb (by the boot
* option).
*/
-int __init pci_swiotlb_init(void)
+int __init pci_swiotlb_detect(void)
{
/* don't initialize swiotlb if iommu=off (no_iommu=1) */
#ifdef CONFIG_X86_64
@@ -58,10 +58,13 @@ int __init pci_swiotlb_init(void)
if (swiotlb_force)
swiotlb = 1;

+ return swiotlb_force;
+}
+
+void __init pci_swiotlb_init(void)
+{
if (swiotlb) {
swiotlb_init(0);
dma_ops = &swiotlb_dma_ops;
}
-
- return swiotlb_force;
}

2009-11-24 15:26:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system


* FUJITA Tomonori <[email protected]> wrote:

> > don't think user will like to use SWIOTLB instead of swiotlb.
>
> I doubt that users care about a way to fix their problems.

As far as swiotlb vs. GART goes, arguably a non-bouncing hw based
solution (GART) is superior to a bouncing one (swiotlb), right?

ngo

2009-11-24 18:30:06

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system

Ingo Molnar wrote:
> * FUJITA Tomonori <[email protected]> wrote:
>
>>> don't think user will like to use SWIOTLB instead of swiotlb.
>> I doubt that users care about a way to fix their problems.
>
> As far as swiotlb vs. GART goes, arguably a non-bouncing hw based
> solution (GART) is superior to a bouncing one (swiotlb), right?
>

yes, should use GART if possible.

YH

2009-11-24 18:51:03

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system

FUJITA Tomonori wrote:
> On Tue, 24 Nov 2009 02:42:26 -0800
> Yinghai Lu <[email protected]> wrote:
>
>> at GART iommu workaround : it is not rare. a lot of systems have this problem.
>> and that workaround works well for several years.
>
> swiotlb has worked too.
>
>
>> don't think user will like to use SWIOTLB instead of swiotlb.
>
> I doubt that users care about a way to fix their problems.
>
>
>> that gart iommu hardware is not broken, just those BIOS guys just forget to program it.
>>
>> here intel vt-d is different, it seems can not make it work just program some hardware register...
>
> Because BIOS is broken, IIRC. Both cases are pretty similar.
>
>> please check my v2 patch, to see if it breaks your setup.
>
> As I said, adding another trick only for GART is not good. And using
> force_iommu in pci-dma.c is confusing since force_iommu should be used
> only in X86_64.
>
> The following patch works for you?
>
>
> diff --git a/arch/x86/include/asm/swiotlb.h b/arch/x86/include/asm/swiotlb.h
> index 87ffcb1..8085277 100644
> --- a/arch/x86/include/asm/swiotlb.h
> +++ b/arch/x86/include/asm/swiotlb.h
> @@ -5,13 +5,17 @@
>
> #ifdef CONFIG_SWIOTLB
> extern int swiotlb;
> -extern int pci_swiotlb_init(void);
> +extern int __init pci_swiotlb_detect(void);
> +extern void __init pci_swiotlb_init(void);
> #else
> #define swiotlb 0
> -static inline int pci_swiotlb_init(void)
> +static inline int pci_swiotlb_detect(void)
> {
> return 0;
> }
> +static inline void pci_swiotlb_init(void)
> +{
> +}
> #endif
>
> static inline void dma_mark_clean(void *addr, size_t size) {}
> diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
> index e0dfb68..750dd8d 100644
> --- a/arch/x86/kernel/aperture_64.c
> +++ b/arch/x86/kernel/aperture_64.c
> @@ -401,6 +401,7 @@ void __init gart_iommu_hole_init(void)
>
> iommu_detected = 1;
> gart_iommu_aperture = 1;
> + swiotlb = 0;
> x86_init.iommu.iommu_init = gart_iommu_init;
>
> aper_order = (read_pci_config(bus, slot, 3, AMD64_GARTAPERTURECTL) >> 1) & 7;
> @@ -458,7 +459,7 @@ out:
>
> if (aper_alloc) {
> /* Got the aperture from the AGP bridge */
> - } else if (!valid_agp) {
> + } else if (swiotlb && !valid_agp) {
> /* Do nothing */

still have some problem here.
because you set swiotlb to 0 before. it will break
iommu=soft

when amd 64bit system, ram > 4g, and no AGP bridge, and BIOS doesn't set gart.
if the user set iommu=soft.
old way, kernel will use swiotlb.

so instead set swiotlb before, it seems you should restore
iommu_detected
gart_iommu_aperture
swiotlb = 0;
x86_init.iommu.iommu_init
to make swiotlb happy later.

> } else if ((!no_iommu && max_pfn > MAX_DMA32_PFN) ||
> force_iommu ||
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index afcc58b..75e14e2 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -124,8 +124,8 @@ void __init pci_iommu_alloc(void)
> /* free the range so iommu could get some range less than 4G */
> dma32_free_bootmem();
> #endif
> - if (pci_swiotlb_init())
> - return;
> + if (pci_swiotlb_detect())
> + goto out;
>
> gart_iommu_hole_init();
>
> @@ -135,6 +135,8 @@ void __init pci_iommu_alloc(void)
>
> /* needs to be called after gart_iommu_hole_init */
> amd_iommu_detect();
> +out:
> + pci_swiotlb_init();
> }
>
> void *dma_generic_alloc_coherent(struct device *dev, size_t size,
> diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
> index e6a0d40..2358409 100644
> --- a/arch/x86/kernel/pci-gart_64.c
> +++ b/arch/x86/kernel/pci-gart_64.c
> @@ -847,7 +847,6 @@ int __init gart_iommu_init(void)
> flush_gart();
> dma_ops = &gart_dma_ops;
> x86_platform.iommu_shutdown = gart_iommu_shutdown;
> - swiotlb = 0;
>
> return 0;
> }
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> index e36e71d..6971ba5 100644
> --- a/arch/x86/kernel/pci-swiotlb.c
> +++ b/arch/x86/kernel/pci-swiotlb.c
> @@ -43,12 +43,12 @@ static struct dma_map_ops swiotlb_dma_ops = {
> };
>
> /*
> - * pci_swiotlb_init - initialize swiotlb if necessary
> + * pci_swiotlb_init - set swiotlb to 1 if necessary

pci_swiotlb_detect ?

YH

2009-11-24 23:53:46

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system

On Tue, 24 Nov 2009 10:48:32 -0800
Yinghai Lu <[email protected]> wrote:

> > if (aper_alloc) {
> > /* Got the aperture from the AGP bridge */
> > - } else if (!valid_agp) {
> > + } else if (swiotlb && !valid_agp) {
> > /* Do nothing */
>
> still have some problem here.
> because you set swiotlb to 0 before. it will break
> iommu=soft

Oops, thanks. However, it's not GART specific issue.

'iommu=soft' is already broken on all the configurations. It should be
fixed separately. I've just sent a fix.


> > void *dma_generic_alloc_coherent(struct device *dev, size_t size,
> > diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
> > index e6a0d40..2358409 100644
> > --- a/arch/x86/kernel/pci-gart_64.c
> > +++ b/arch/x86/kernel/pci-gart_64.c
> > @@ -847,7 +847,6 @@ int __init gart_iommu_init(void)
> > flush_gart();
> > dma_ops = &gart_dma_ops;
> > x86_platform.iommu_shutdown = gart_iommu_shutdown;
> > - swiotlb = 0;
> >
> > return 0;
> > }
> > diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> > index e36e71d..6971ba5 100644
> > --- a/arch/x86/kernel/pci-swiotlb.c
> > +++ b/arch/x86/kernel/pci-swiotlb.c
> > @@ -43,12 +43,12 @@ static struct dma_map_ops swiotlb_dma_ops = {
> > };
> >
> > /*
> > - * pci_swiotlb_init - initialize swiotlb if necessary
> > + * pci_swiotlb_init - set swiotlb to 1 if necessary
>
> pci_swiotlb_detect ?

I'll fix in the next version.

2009-11-25 00:03:12

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system

On Tue, 24 Nov 2009 16:26:45 +0100
Ingo Molnar <[email protected]> wrote:

>
> * FUJITA Tomonori <[email protected]> wrote:
>
> > > don't think user will like to use SWIOTLB instead of swiotlb.
> >
> > I doubt that users care about a way to fix their problems.
>
> As far as swiotlb vs. GART goes, arguably a non-bouncing hw based
> solution (GART) is superior to a bouncing one (swiotlb), right?

Yeah, but GART is not a real IOMMU that always does address
translation. If users want performance, they need 64bit DMA capable
devices to skip GART. With such devices, GART and SWIOTLB are same.

So this hacky GART workaround is useful for only people who use non
64bit DMA devices and broken BIOS with GART. IMO, it's not worth
having the workaround.

2009-11-25 07:47:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system


* FUJITA Tomonori <[email protected]> wrote:

> On Tue, 24 Nov 2009 16:26:45 +0100
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * FUJITA Tomonori <[email protected]> wrote:
> >
> > > > don't think user will like to use SWIOTLB instead of swiotlb.
> > >
> > > I doubt that users care about a way to fix their problems.
> >
> > As far as swiotlb vs. GART goes, arguably a non-bouncing hw based
> > solution (GART) is superior to a bouncing one (swiotlb), right?
>
> Yeah, but GART is not a real IOMMU that always does address
> translation. If users want performance, they need 64bit DMA capable
> devices to skip GART. With such devices, GART and SWIOTLB are same.
>
> So this hacky GART workaround is useful for only people who use non
> 64bit DMA devices and broken BIOS with GART. IMO, it's not worth
> having the workaround.

Well, what Yinghai is saying that for that (nonzero) subset of people
this change is a regression, and his patch fixes that. We dont do
regressions so this needs to be addressed.

Thanks,

Ingo

2009-11-25 07:57:26

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH] x86: fix gart iommu using for amd 64 bit system

On Wed, 25 Nov 2009 08:25:04 +0100
Ingo Molnar <[email protected]> wrote:

>
> * FUJITA Tomonori <[email protected]> wrote:
>
> > On Tue, 24 Nov 2009 16:26:45 +0100
> > Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * FUJITA Tomonori <[email protected]> wrote:
> > >
> > > > > don't think user will like to use SWIOTLB instead of swiotlb.
> > > >
> > > > I doubt that users care about a way to fix their problems.
> > >
> > > As far as swiotlb vs. GART goes, arguably a non-bouncing hw based
> > > solution (GART) is superior to a bouncing one (swiotlb), right?
> >
> > Yeah, but GART is not a real IOMMU that always does address
> > translation. If users want performance, they need 64bit DMA capable
> > devices to skip GART. With such devices, GART and SWIOTLB are same.
> >
> > So this hacky GART workaround is useful for only people who use non
> > 64bit DMA devices and broken BIOS with GART. IMO, it's not worth
> > having the workaround.
>
> Well, what Yinghai is saying that for that (nonzero) subset of people
> this change is a regression, and his patch fixes that. We dont do
> regressions so this needs to be addressed.

OK, I guess that I have to wait until nobody cares about GART.

Anyway, my patch fixes his issue without any regression:

http://marc.info/?l=linux-kernel&m=125907069703595&w=2

I'll post the second version of it after you merge the following fix:

http://marc.info/?l=linux-kernel&m=125910650500523&w=2


Thanks,