2015-02-09 22:45:52

by Toshi Kani

Subject: [PATCH v2 0/7] Kernel huge I/O mapping support

ioremap() and its related interfaces are used to create I/O
mappings to memory-mapped I/O devices. The mapping sizes of
traditional I/O devices are relatively small. Non-volatile
memory (NVM), however, is already many gigabytes in size and
will soon reach terabytes. Creating such large I/O mappings
with 4KB pages is not very efficient.

This patchset extends the ioremap() interfaces to transparently
create I/O mappings with huge pages whenever possible. ioremap()
continues to use 4KB mappings when a huge page does not fit into
a requested range. No change is necessary to drivers using
ioremap(). Note, however, that a requested physical address must
be aligned to a huge page size (1GB or 2MB on x86) for a huge
page mapping to be used. Kernel huge I/O mappings will improve
the performance of NVM and other devices with large memory, and
will also reduce the time needed to create their mappings.

On x86, the huge I/O mapping may not be used when a target range is
covered by multiple MTRRs with different memory types. The caller
must make a separate request for each MTRR range, or the huge I/O
mapping can be disabled with the kernel boot option "nohugeiomap".
The details of this issue are described in the email below; this
patchset takes option C) in favor of simplicity, since MTRRs are
a legacy feature.
https://lkml.org/lkml/2015/2/5/638

The patchset introduces the following configs:
HUGE_IOMAP - When selected (default Y), enables huge I/O mappings.
Requires HAVE_ARCH_HUGE_VMAP to be set.
HAVE_ARCH_HUGE_VMAP - Indicates that the arch supports huge KVA mappings.
Requires X86_PAE to be set on X86_32.

Patches 1-4 change common files to support huge I/O mappings. There
is no functional change until HUGE_IOMAP is set in patch 7.

Patches 5 and 6 implement the HAVE_ARCH_HUGE_VMAP and HUGE_IOMAP
functions on x86, and select HAVE_ARCH_HUGE_VMAP on x86.

Patch 7 adds HUGE_IOMAP to Kconfig, which is set to Y by default on
x86.

---
v2:
- Addressed review comments from Andrew Morton.
- Changed HAVE_ARCH_HUGE_VMAP to require X86_PAE set on X86_32.
- Documented an x86 restriction with multiple MTRRs with different
memory types.

---
Toshi Kani (7):
1/7 mm: Change __get_vm_area_node() to use fls_long()
2/7 lib: Add huge I/O map capability interfaces
3/7 mm: Change ioremap to set up huge I/O mappings
4/7 mm: Change vunmap to tear down huge KVA mappings
5/7 x86, mm: Support huge KVA mappings on x86
6/7 x86, mm: Support huge I/O mappings on x86
7/7 mm: Add config HUGE_IOMAP to enable huge I/O mappings

---
Documentation/kernel-parameters.txt | 2 ++
arch/Kconfig | 3 +++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/page_types.h | 8 ++++++
arch/x86/mm/ioremap.c | 26 ++++++++++++++++--
arch/x86/mm/pgtable.c | 34 +++++++++++++++++++++++
include/asm-generic/pgtable.h | 12 +++++++++
include/linux/io.h | 7 +++++
init/main.c | 2 ++
lib/ioremap.c | 54 +++++++++++++++++++++++++++++++++++++
mm/Kconfig | 11 ++++++++
mm/vmalloc.c | 8 +++++-
12 files changed, 165 insertions(+), 3 deletions(-)


2015-02-09 22:45:57

by Toshi Kani

Subject: [PATCH v2 1/7] mm: Change __get_vm_area_node() to use fls_long()

__get_vm_area_node() takes an unsigned long size, which is a 64-bit
value on a 64-bit kernel. However, fls(size) simply ignores the
upper 32 bits. Change it to use fls_long() so that the size is
handled properly.

Signed-off-by: Toshi Kani <[email protected]>
---
mm/vmalloc.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 39c3388..40ea214 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -29,6 +29,7 @@
#include <linux/atomic.h>
#include <linux/compiler.h>
#include <linux/llist.h>
+#include <linux/bitops.h>

#include <asm/uaccess.h>
#include <asm/tlbflush.h>
@@ -1314,7 +1315,8 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,

BUG_ON(in_interrupt());
if (flags & VM_IOREMAP)
- align = 1ul << clamp(fls(size), PAGE_SHIFT, IOREMAP_MAX_ORDER);
+ align = 1ul << clamp_t(int, fls_long(size),
+ PAGE_SHIFT, IOREMAP_MAX_ORDER);

size = PAGE_ALIGN(size);
if (unlikely(!size))

2015-02-09 22:45:59

by Toshi Kani

Subject: [PATCH v2 2/7] lib: Add huge I/O map capability interfaces

Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which
return 1 when huge I/O mappings at the pud and pmd levels,
respectively, are enabled in the kernel.

ioremap_huge_init() calls arch_ioremap_pud_supported() and
arch_ioremap_pmd_supported() to initialize these capabilities.

A new kernel boot option, "nohugeiomap", is also added so that
users can disable the huge I/O map capabilities when necessary.

Signed-off-by: Toshi Kani <[email protected]>
---
Documentation/kernel-parameters.txt | 2 ++
include/linux/io.h | 7 ++++++
init/main.c | 2 ++
lib/ioremap.c | 38 +++++++++++++++++++++++++++++++++++
4 files changed, 49 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 176d4fe..1872b46 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2304,6 +2304,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
register save and restore. The kernel will only save
legacy floating-point registers on task switch.

+ nohugeiomap [KNL,x86] Disable kernel huge I/O mappings.
+
noxsave [BUGS=X86] Disables x86 extended register state save
and restore using xsave. The kernel will fallback to
enabling legacy floating-point and sse state.
diff --git a/include/linux/io.h b/include/linux/io.h
index fa02e55..9acc588a 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -38,6 +38,13 @@ static inline int ioremap_page_range(unsigned long addr, unsigned long end,
}
#endif

+void __init ioremap_huge_init(void);
+
+#ifdef CONFIG_HUGE_IOMAP
+int arch_ioremap_pud_supported(void);
+int arch_ioremap_pmd_supported(void);
+#endif
+
/*
* Managed iomap interface
*/
diff --git a/init/main.c b/init/main.c
index 61b99376..9f871ac 100644
--- a/init/main.c
+++ b/init/main.c
@@ -80,6 +80,7 @@
#include <linux/list.h>
#include <linux/integrity.h>
#include <linux/proc_ns.h>
+#include <linux/io.h>

#include <asm/io.h>
#include <asm/bugs.h>
@@ -497,6 +498,7 @@ static void __init mm_init(void)
percpu_init_late();
pgtable_init();
vmalloc_init();
+ ioremap_huge_init();
}

asmlinkage __visible void __init start_kernel(void)
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 0c9216c..cafd83e 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -13,6 +13,44 @@
#include <asm/cacheflush.h>
#include <asm/pgtable.h>

+#ifdef CONFIG_HUGE_IOMAP
+int __read_mostly ioremap_pud_capable;
+int __read_mostly ioremap_pmd_capable;
+int __read_mostly ioremap_huge_disabled;
+
+static int __init set_nohugeiomap(char *str)
+{
+ ioremap_huge_disabled = 1;
+ return 0;
+}
+early_param("nohugeiomap", set_nohugeiomap);
+
+void __init ioremap_huge_init(void)
+{
+ if (!ioremap_huge_disabled) {
+ if (arch_ioremap_pud_supported())
+ ioremap_pud_capable = 1;
+ if (arch_ioremap_pmd_supported())
+ ioremap_pmd_capable = 1;
+ }
+}
+
+static inline int ioremap_pud_enabled(void)
+{
+ return ioremap_pud_capable;
+}
+
+static inline int ioremap_pmd_enabled(void)
+{
+ return ioremap_pmd_capable;
+}
+
+#else /* !CONFIG_HUGE_IOMAP */
+void __init ioremap_huge_init(void) { }
+static inline int ioremap_pud_enabled(void) { return 0; }
+static inline int ioremap_pmd_enabled(void) { return 0; }
+#endif /* CONFIG_HUGE_IOMAP */
+
static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
{

2015-02-09 22:46:02

by Toshi Kani

Subject: [PATCH v2 3/7] mm: Change ioremap to set up huge I/O mappings

Change ioremap_pud_range() and ioremap_pmd_range() to set up
kernel huge I/O mappings when the capability is enabled and the
request meets the required conditions: both the virtual and
physical addresses are aligned, and the range covers the full
mapping size.

The changes are only enabled when both CONFIG_HUGE_IOMAP and
CONFIG_HAVE_ARCH_HUGE_VMAP are defined.

Signed-off-by: Toshi Kani <[email protected]>
---
arch/Kconfig | 3 +++
include/asm-generic/pgtable.h | 8 ++++++++
lib/ioremap.c | 16 ++++++++++++++++
3 files changed, 27 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 05d7a8a..55c4440 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -446,6 +446,9 @@ config HAVE_IRQ_TIME_ACCOUNTING
config HAVE_ARCH_TRANSPARENT_HUGEPAGE
bool

+config HAVE_ARCH_HUGE_VMAP
+ bool
+
config HAVE_ARCH_SOFT_DIRTY
bool

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 177d597..7dc3838 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -847,4 +847,12 @@ static inline void pmdp_set_numa(struct mm_struct *mm, unsigned long addr,
#define io_remap_pfn_range remap_pfn_range
#endif

+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
+void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
+#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static inline void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot) { }
+static inline void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot) { }
+#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
+
#endif /* _ASM_GENERIC_PGTABLE_H */
diff --git a/lib/ioremap.c b/lib/ioremap.c
index cafd83e..c447832 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -81,6 +81,14 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
return -ENOMEM;
do {
next = pmd_addr_end(addr, end);
+
+ if (ioremap_pmd_enabled() &&
+ ((next - addr) == PMD_SIZE) &&
+ IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
+ pmd_set_huge(pmd, phys_addr + addr, prot);
+ continue;
+ }
+
if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
return -ENOMEM;
} while (pmd++, addr = next, addr != end);
@@ -99,6 +107,14 @@ static inline int ioremap_pud_range(pgd_t *pgd, unsigned long addr,
return -ENOMEM;
do {
next = pud_addr_end(addr, end);
+
+ if (ioremap_pud_enabled() &&
+ ((next - addr) == PUD_SIZE) &&
+ IS_ALIGNED(phys_addr + addr, PUD_SIZE)) {
+ pud_set_huge(pud, phys_addr + addr, prot);
+ continue;
+ }
+
if (ioremap_pmd_range(pud, addr, next, phys_addr + addr, prot))
return -ENOMEM;
} while (pud++, addr = next, addr != end);

2015-02-09 22:46:52

by Toshi Kani

Subject: [PATCH v2 4/7] mm: Change vunmap to tear down huge KVA mappings

Change vunmap_pmd_range() and vunmap_pud_range() to tear down
huge KVA mappings when they are set.

These changes are only enabled when CONFIG_HAVE_ARCH_HUGE_VMAP
is defined.

Signed-off-by: Toshi Kani <[email protected]>
---
include/asm-generic/pgtable.h | 4 ++++
mm/vmalloc.c | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 7dc3838..1204ea6 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -850,9 +850,13 @@ static inline void pmdp_set_numa(struct mm_struct *mm, unsigned long addr,
#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
+int pud_clear_huge(pud_t *pud);
+int pmd_clear_huge(pmd_t *pmd);
#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
static inline void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot) { }
static inline void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot) { }
+static inline int pud_clear_huge(pud_t *pud) { return 0; }
+static inline int pmd_clear_huge(pmd_t *pmd) { return 0; }
#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */

#endif /* _ASM_GENERIC_PGTABLE_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 40ea214..dd53a9d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -75,6 +75,8 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end)
pmd = pmd_offset(pud, addr);
do {
next = pmd_addr_end(addr, end);
+ if (pmd_clear_huge(pmd))
+ continue;
if (pmd_none_or_clear_bad(pmd))
continue;
vunmap_pte_range(pmd, addr, next);
@@ -89,6 +91,8 @@ static void vunmap_pud_range(pgd_t *pgd, unsigned long addr, unsigned long end)
pud = pud_offset(pgd, addr);
do {
next = pud_addr_end(addr, end);
+ if (pud_clear_huge(pud))
+ continue;
if (pud_none_or_clear_bad(pud))
continue;
vunmap_pmd_range(pud, addr, next);

2015-02-09 22:46:05

by Toshi Kani

Subject: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86

Implement huge KVA mapping interfaces on x86. Select
HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
Without X86_PAE set, the X86_32 kernel has the 2-level page
tables and cannot provide the huge KVA mappings.

Signed-off-by: Toshi Kani <[email protected]>
---
arch/x86/Kconfig | 1 +
arch/x86/mm/pgtable.c | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0dc9d01..a79e286 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -97,6 +97,7 @@ config X86
select IRQ_FORCED_THREADING
select HAVE_BPF_JIT if X86_64
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
+ select HAVE_ARCH_HUGE_VMAP if X86_64 || (X86_32 && X86_PAE)
select ARCH_HAS_SG_CHAIN
select CLKEVT_I8253
select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 6fb6927..e495432 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -481,3 +481,37 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
{
__native_set_fixmap(idx, pfn_pte(phys >> PAGE_SHIFT, flags));
}
+
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
+{
+ set_pte((pte_t *)pud, pfn_pte(
+ (u64)addr >> PAGE_SHIFT,
+ __pgprot(pgprot_val(prot) | _PAGE_PSE)));
+}
+
+void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
+{
+ set_pte((pte_t *)pmd, pfn_pte(
+ (u64)addr >> PAGE_SHIFT,
+ __pgprot(pgprot_val(prot) | _PAGE_PSE)));
+}
+
+int pud_clear_huge(pud_t *pud)
+{
+ if (pud_large(*pud)) {
+ pud_clear(pud);
+ return 1;
+ }
+ return 0;
+}
+
+int pmd_clear_huge(pmd_t *pmd)
+{
+ if (pmd_large(*pmd)) {
+ pmd_clear(pmd);
+ return 1;
+ }
+ return 0;
+}
+#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */

2015-02-09 22:46:33

by Toshi Kani

Subject: [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86

This patch implements huge I/O mapping capability interfaces on x86.

IOREMAP_MAX_ORDER is defined as PUD_SHIFT on X86_64 and PMD_SHIFT
on X86_32. When an arch does not define IOREMAP_MAX_ORDER, the
generic value in <linux/vmalloc.h> is used.

On x86, the huge I/O mapping may not be used when a target range is
covered by multiple MTRRs with different memory types. The caller
must make a separate request for each MTRR range, or the huge I/O
mapping can be disabled with the kernel boot option "nohugeiomap".
The details of this issue are described in the email below; this
patch takes option C) in favor of simplicity, since MTRRs are a
legacy feature.
https://lkml.org/lkml/2015/2/5/638

Signed-off-by: Toshi Kani <[email protected]>
---
arch/x86/include/asm/page_types.h | 8 ++++++++
arch/x86/mm/ioremap.c | 26 ++++++++++++++++++++++++--
2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index f97fbe3..246426c 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -38,6 +38,14 @@

#define __START_KERNEL (__START_KERNEL_map + __PHYSICAL_START)

+#ifdef CONFIG_HUGE_IOMAP
+#ifdef CONFIG_X86_64
+#define IOREMAP_MAX_ORDER (PUD_SHIFT)
+#else
+#define IOREMAP_MAX_ORDER (PMD_SHIFT)
+#endif
+#endif /* CONFIG_HUGE_IOMAP */
+
#ifdef CONFIG_X86_64
#include <asm/page_64_types.h>
#else
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index fdf617c..f97b587 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -67,8 +67,14 @@ static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages,

/*
* Remap an arbitrary physical address space into the kernel virtual
- * address space. Needed when the kernel wants to access high addresses
- * directly.
+ * address space. It transparently creates kernel huge I/O mapping when
+ * the physical address is aligned by a huge page size (1GB or 2MB) and
+ * the requested size is at least the huge page size.
+ *
+ * NOTE: The huge I/O mapping may not be used when a target range is
+ * covered by multiple MTRRs with different memory types. The caller
+ * must make a separate request for each MTRR range, or the huge I/O
+ * mapping can be disabled with the kernel boot option "nohugeiomap".
*
* NOTE! We need to allow non-page-aligned mappings too: we will obviously
* have to convert them into an offset in a page-aligned mapping, but the
@@ -326,6 +332,22 @@ void iounmap(volatile void __iomem *addr)
}
EXPORT_SYMBOL(iounmap);

+#ifdef CONFIG_HUGE_IOMAP
+int arch_ioremap_pud_supported(void)
+{
+#ifdef CONFIG_X86_64
+ return cpu_has_gbpages;
+#else
+ return 0;
+#endif
+}
+
+int arch_ioremap_pmd_supported(void)
+{
+ return cpu_has_pse;
+}
+#endif /* CONFIG_HUGE_IOMAP */
+
/*
* Convert a physical pointer to a virtual kernel pointer for /dev/mem
* access

2015-02-09 22:46:08

by Toshi Kani

Subject: [PATCH v2 7/7] mm: Add config HUGE_IOMAP to enable huge I/O mappings

Add config HUGE_IOMAP to enable huge I/O mappings. This feature
defaults to Y when HAVE_ARCH_HUGE_VMAP is set by the
architecture.

Note that users can also disable this feature at boot time with
the new kernel option "nohugeiomap" when necessary.

Signed-off-by: Toshi Kani <[email protected]>
---
mm/Kconfig | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 1d1ae6b..eb738ae 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -444,6 +444,17 @@ choice
benefit.
endchoice

+config HUGE_IOMAP
+ bool "Kernel huge I/O mapping support"
+ depends on HAVE_ARCH_HUGE_VMAP
+ default y
+ help
+ Kernel huge I/O mapping allows the kernel to transparently
+ create I/O mappings with huge pages for memory-mapped I/O
+ devices whenever possible. This feature can improve
+ performance of certain devices with large memory size, such
+ as NVM, and reduce the time to create their mappings.
+
#
# UP and nommu archs use km based percpu allocator
#

2015-02-10 18:59:09

by Dave Hansen

Subject: Re: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86

On 02/09/2015 02:45 PM, Toshi Kani wrote:
> Implement huge KVA mapping interfaces on x86. Select
> HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
> Without X86_PAE set, the X86_32 kernel has the 2-level page
> tables and cannot provide the huge KVA mappings.

Not that it's a big deal, but what's the limitation with the 2-level
page tables on 32-bit? We have a 4MB large page size available there
and we already use it for the kernel linear mapping.

2015-02-10 20:42:50

by Toshi Kani

Subject: Re: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86

On Tue, 2015-02-10 at 10:59 -0800, Dave Hansen wrote:
> On 02/09/2015 02:45 PM, Toshi Kani wrote:
> > Implement huge KVA mapping interfaces on x86. Select
> > HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
> > Without X86_PAE set, the X86_32 kernel has the 2-level page
> > tables and cannot provide the huge KVA mappings.
>
> Not that it's a big deal, but what's the limitation with the 2-level
> page tables on 32-bit? We have a 4MB large page size available there
> and we already use it for the kernel linear mapping.

ioremap() calls arch-neutral ioremap_page_range() to set up I/O mappings
with PTEs. This patch-set enables ioremap_page_range() to set up PUD &
PMD mappings. With 2-level page table, I do not think this PUD/PMD
mapping code works unless we add some special code.

Thanks,
-Toshi

2015-02-10 20:51:09

by Dave Hansen

Subject: Re: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86

On 02/10/2015 12:42 PM, Toshi Kani wrote:
> On Tue, 2015-02-10 at 10:59 -0800, Dave Hansen wrote:
>> On 02/09/2015 02:45 PM, Toshi Kani wrote:
>>> Implement huge KVA mapping interfaces on x86. Select
>>> HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
>>> Without X86_PAE set, the X86_32 kernel has the 2-level page
>>> tables and cannot provide the huge KVA mappings.
>>
>> Not that it's a big deal, but what's the limitation with the 2-level
>> page tables on 32-bit? We have a 4MB large page size available there
>> and we already use it for the kernel linear mapping.
>
> ioremap() calls arch-neutral ioremap_page_range() to set up I/O mappings
> with PTEs. This patch-set enables ioremap_page_range() to set up PUD &
> PMD mappings. With 2-level page table, I do not think this PUD/PMD
> mapping code works unless we add some special code.

What actually breaks, though?

Can't you just disable the pud code via ioremap_pud_enabled()?

2015-02-10 22:13:35

by Toshi Kani

Subject: Re: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86

On Tue, 2015-02-10 at 12:51 -0800, Dave Hansen wrote:
> On 02/10/2015 12:42 PM, Toshi Kani wrote:
> > On Tue, 2015-02-10 at 10:59 -0800, Dave Hansen wrote:
> >> On 02/09/2015 02:45 PM, Toshi Kani wrote:
> >>> Implement huge KVA mapping interfaces on x86. Select
> >>> HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
> >>> Without X86_PAE set, the X86_32 kernel has the 2-level page
> >>> tables and cannot provide the huge KVA mappings.
> >>
> >> Not that it's a big deal, but what's the limitation with the 2-level
> >> page tables on 32-bit? We have a 4MB large page size available there
> >> and we already use it for the kernel linear mapping.
> >
> > ioremap() calls arch-neutral ioremap_page_range() to set up I/O mappings
> > with PTEs. This patch-set enables ioremap_page_range() to set up PUD &
> > PMD mappings. With 2-level page table, I do not think this PUD/PMD
> > mapping code works unless we add some special code.
>
> What actually breaks, though?
>
> Can't you just disable the pud code via ioremap_pud_enabled()?

That's what v1 did, and I found in testing that the PMD mapping code did
not work when PAE was unset. I think we need special handling similar
to one_md_table_init(), which returns pgd as pmd in case of non-PAE.
ioremap_page_range() does not have such handling and I thought it would
be worth adding it.

Thanks,
-Toshi

2015-02-10 22:21:07

by Toshi Kani

Subject: Re: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86

On Tue, 2015-02-10 at 15:13 -0700, Toshi Kani wrote:
> On Tue, 2015-02-10 at 12:51 -0800, Dave Hansen wrote:
> > On 02/10/2015 12:42 PM, Toshi Kani wrote:
> > > On Tue, 2015-02-10 at 10:59 -0800, Dave Hansen wrote:
> > >> On 02/09/2015 02:45 PM, Toshi Kani wrote:
> > >>> Implement huge KVA mapping interfaces on x86. Select
> > >>> HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
> > >>> Without X86_PAE set, the X86_32 kernel has the 2-level page
> > >>> tables and cannot provide the huge KVA mappings.
> > >>
> > >> Not that it's a big deal, but what's the limitation with the 2-level
> > >> page tables on 32-bit? We have a 4MB large page size available there
> > >> and we already use it for the kernel linear mapping.
> > >
> > > ioremap() calls arch-neutral ioremap_page_range() to set up I/O mappings
> > > with PTEs. This patch-set enables ioremap_page_range() to set up PUD &
> > > PMD mappings. With 2-level page table, I do not think this PUD/PMD
> > > mapping code works unless we add some special code.
> >
> > What actually breaks, though?
> >
> > Can't you just disable the pud code via ioremap_pud_enabled()?
>
> That's what v1 did, and I found in testing that the PMD mapping code did
> not work when PAE was unset. I think we need special handling similar
> to one_md_table_init(), which returns pgd as pmd in case of non-PAE.
> ioremap_page_range() does not have such handling and I thought it would
> be worth adding it.

Oops, a typo. The last sentence should be "I thought it would not be
worth adding it."

Thanks,
-Toshi

2015-02-10 23:10:42

by Toshi Kani

Subject: Re: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86

On Tue, 2015-02-10 at 15:13 -0700, Toshi Kani wrote:
> On Tue, 2015-02-10 at 12:51 -0800, Dave Hansen wrote:
> > On 02/10/2015 12:42 PM, Toshi Kani wrote:
> > > On Tue, 2015-02-10 at 10:59 -0800, Dave Hansen wrote:
> > >> On 02/09/2015 02:45 PM, Toshi Kani wrote:
> > >>> Implement huge KVA mapping interfaces on x86. Select
> > >>> HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
> > >>> Without X86_PAE set, the X86_32 kernel has the 2-level page
> > >>> tables and cannot provide the huge KVA mappings.
> > >>
> > >> Not that it's a big deal, but what's the limitation with the 2-level
> > >> page tables on 32-bit? We have a 4MB large page size available there
> > >> and we already use it for the kernel linear mapping.
> > >
> > > ioremap() calls arch-neutral ioremap_page_range() to set up I/O mappings
> > > with PTEs. This patch-set enables ioremap_page_range() to set up PUD &
> > > PMD mappings. With 2-level page table, I do not think this PUD/PMD
> > > mapping code works unless we add some special code.
> >
> > What actually breaks, though?
> >
> > Can't you just disable the pud code via ioremap_pud_enabled()?
>
> That's what v1 did, and I found in testing that the PMD mapping code did
> not work when PAE was unset. I think we need special handling similar
> to one_md_table_init(), which returns pgd as pmd in case of non-PAE.
> ioremap_page_range() does not have such handling and I thought it would
> not be worth adding it.

Actually pud_alloc() and pmd_alloc() should carry pgd in this case... I
will look into the problem to see why it did not work when PAE was
unset.

Thanks,
-Toshi

2015-02-18 20:44:22

by Ingo Molnar

Subject: Re: [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86


* Toshi Kani <[email protected]> wrote:

> This patch implements huge I/O mapping capability interfaces on x86.

> +#ifdef CONFIG_HUGE_IOMAP
> +#ifdef CONFIG_X86_64
> +#define IOREMAP_MAX_ORDER (PUD_SHIFT)
> +#else
> +#define IOREMAP_MAX_ORDER (PMD_SHIFT)
> +#endif
> +#endif /* CONFIG_HUGE_IOMAP */

> +#ifdef CONFIG_HUGE_IOMAP

Hm, so why is there a Kconfig option for this? It just
complicates things.

For example the kernel already defaults to mapping itself
with as large mappings as possible, without a Kconfig entry
for it. There's no reason to make this configurable - and
quite a bit of complexity in the patches comes from this
configurability.

Thanks,

Ingo

2015-02-18 21:14:07

by Toshi Kani

Subject: Re: [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86

On Wed, 2015-02-18 at 21:44 +0100, Ingo Molnar wrote:
> * Toshi Kani <[email protected]> wrote:
>
> > This patch implements huge I/O mapping capability interfaces on x86.
>
> > +#ifdef CONFIG_HUGE_IOMAP
> > +#ifdef CONFIG_X86_64
> > +#define IOREMAP_MAX_ORDER (PUD_SHIFT)
> > +#else
> > +#define IOREMAP_MAX_ORDER (PMD_SHIFT)
> > +#endif
> > +#endif /* CONFIG_HUGE_IOMAP */
>
> > +#ifdef CONFIG_HUGE_IOMAP
>
> Hm, so why is there a Kconfig option for this? It just
> complicates things.
>
> For example the kernel already defaults to mapping itself
> with as large mappings as possible, without a Kconfig entry
> for it. There's no reason to make this configurable - and
> quite a bit of complexity in the patches comes from this
> configurability.

This Kconfig option was added to disable this feature in case there is
an issue. That said, since the patchset also added a new nohugeiomap
boot option for the same purpose, I agree that this Kconfig option can
be removed. So, I will remove it in the next version.

An example of such case is with multiple MTRRs described in patch 0/7.
However, I believe it is very unlikely to have such platform/use-case,
and it can also be avoided by a driver creating a separate mapping for
each MTRR range.

Thanks,
-Toshi

2015-02-18 21:16:01

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86


* Toshi Kani <[email protected]> wrote:

> On Wed, 2015-02-18 at 21:44 +0100, Ingo Molnar wrote:
> > * Toshi Kani <[email protected]> wrote:
> >
> > > This patch implements huge I/O mapping capability interfaces on x86.
> >
> > > +#ifdef CONFIG_HUGE_IOMAP
> > > +#ifdef CONFIG_X86_64
> > > +#define IOREMAP_MAX_ORDER (PUD_SHIFT)
> > > +#else
> > > +#define IOREMAP_MAX_ORDER (PMD_SHIFT)
> > > +#endif
> > > +#endif /* CONFIG_HUGE_IOMAP */
> >
> > > +#ifdef CONFIG_HUGE_IOMAP
> >
> > Hm, so why is there a Kconfig option for this? It just
> > complicates things.
> >
> > For example the kernel already defaults to mapping itself
> > with as large mappings as possible, without a Kconfig entry
> > for it. There's no reason to make this configurable - and
> > quite a bit of complexity in the patches comes from this
> > configurability.
>
> This Kconfig option was added to disable this feature in
> case there is an issue. [...]

If bugs are found then they should be fixed.

> [...] That said, since the patchset also added a new
> nohugeiomap boot option for the same purpose, I agree
> that this Kconfig option can be removed. So, I will
> remove it in the next version.
>
> An example of such case is with multiple MTRRs described
> in patch 0/7.

So the multi-MTRR case should probably be detected and
handled safely?

Thanks,

Ingo

2015-02-18 21:33:54

by Toshi Kani

Subject: Re: [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86

On Wed, 2015-02-18 at 22:15 +0100, Ingo Molnar wrote:
> * Toshi Kani <[email protected]> wrote:
>
> > On Wed, 2015-02-18 at 21:44 +0100, Ingo Molnar wrote:
> > > * Toshi Kani <[email protected]> wrote:
> > >
> > > > This patch implements huge I/O mapping capability interfaces on x86.
> > >
> > > > +#ifdef CONFIG_HUGE_IOMAP
> > > > +#ifdef CONFIG_X86_64
> > > > +#define IOREMAP_MAX_ORDER (PUD_SHIFT)
> > > > +#else
> > > > +#define IOREMAP_MAX_ORDER (PMD_SHIFT)
> > > > +#endif
> > > > +#endif /* CONFIG_HUGE_IOMAP */
> > >
> > > > +#ifdef CONFIG_HUGE_IOMAP
> > >
> > > Hm, so why is there a Kconfig option for this? It just
> > > complicates things.
> > >
> > > For example the kernel already defaults to mapping itself
> > > with as large mappings as possible, without a Kconfig entry
> > > for it. There's no reason to make this configurable - and
> > > quite a bit of complexity in the patches comes from this
> > > configurability.
> >
> > This Kconfig option was added to disable this feature in
> > case there is an issue. [...]
>
> If bugs are found then they should be fixed.

Right.

> > [...] That said, since the patchset also added a new
> > nohugeiomap boot option for the same purpose, I agree
> > that this Kconfig option can be removed. So, I will
> > remove it in the next version.
> >
> > An example of such case is with multiple MTRRs described
> > in patch 0/7.
>
> So the multi-MTRR case should probably be detected and
> handled safely?

I considered two options to safely handle this case, i.e. option A) and
B) described in the link below.
https://lkml.org/lkml/2015/2/5/638

I thought about how much complication we should put into the code for an
imaginable platform with a combination of new NVM (or large I/O range)
and legacy MTRRs with multi-types & contiguous ranges. My thinking is
that we should go with option C) for simplicity, and implement A) or B)
later if we find it necessary.

Thanks,
-Toshi

2015-02-18 21:57:28

by Ingo Molnar

Subject: Re: [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86


* Toshi Kani <[email protected]> wrote:

> On Wed, 2015-02-18 at 22:15 +0100, Ingo Molnar wrote:
> > * Toshi Kani <[email protected]> wrote:
> >
> > > On Wed, 2015-02-18 at 21:44 +0100, Ingo Molnar wrote:
> > > > * Toshi Kani <[email protected]> wrote:
> > > >
> > > > > This patch implements huge I/O mapping capability interfaces on x86.
> > > >
> > > > > +#ifdef CONFIG_HUGE_IOMAP
> > > > > +#ifdef CONFIG_X86_64
> > > > > +#define IOREMAP_MAX_ORDER (PUD_SHIFT)
> > > > > +#else
> > > > > +#define IOREMAP_MAX_ORDER (PMD_SHIFT)
> > > > > +#endif
> > > > > +#endif /* CONFIG_HUGE_IOMAP */
> > > >
> > > > > +#ifdef CONFIG_HUGE_IOMAP
> > > >
> > > > Hm, so why is there a Kconfig option for this? It just
> > > > complicates things.
> > > >
> > > > For example the kernel already defaults to mapping itself
> > > > with as large mappings as possible, without a Kconfig entry
> > > > for it. There's no reason to make this configurable - and
> > > > quite a bit of complexity in the patches comes from this
> > > > configurability.
> > >
> > > This Kconfig option was added to disable this feature in
> > > case there is an issue. [...]
> >
> > If bugs are found then they should be fixed.
>
> Right.
>
> > > [...] That said, since the patchset also added a new
> > > nohugeiomap boot option for the same purpose, I agree
> > > that this Kconfig option can be removed. So, I will
> > > remove it in the next version.
> > >
> > > An example of such case is with multiple MTRRs described
> > > in patch 0/7.
> >
> > So the multi-MTRR case should probably be detected and
> > handled safely?
>
> I considered two options to safely handle this case, i.e.
> option A) and B) described in the link below.
>
> https://lkml.org/lkml/2015/2/5/638
>
> I thought about how much complication we should put into
> the code for an imaginable platform with a combination of
> new NVM (or large I/O range) and legacy MTRRs with
> multi-types & contiguous ranges. My thinking is that we
> should go with option C) for simplicity, and implement A)
> or B) later if we find it necessary.

Well, why not option D):

D) detect unaligned requests and reject them

?

Thanks,

Ingo
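Option D) as stated here can be illustrated with a minimal standalone sketch (the helper name, the PMD_SIZE constant, and the reject-on-misalignment policy below are illustrative assumptions, not code from the patchset):

```c
#include <stdbool.h>
#include <stdint.h>

#define PMD_SIZE (2UL << 20)	/* 2MB huge-page size on x86 */

/*
 * Sketch of option D): a huge-page mapping request is accepted
 * only when the physical range is huge-page aligned and a whole
 * number of huge pages long; anything else is rejected (or, in a
 * softer variant, falls back to 4KB mappings).
 */
bool huge_request_ok(uint64_t phys_addr, uint64_t size)
{
	if (phys_addr & (PMD_SIZE - 1))
		return false;	/* start is not 2MB aligned */
	if (size & (PMD_SIZE - 1))
		return false;	/* length is not a multiple of 2MB */
	return true;
}
```

A 2MB-aligned, 2MB-sized request passes; shifting the start by 4KB, or shrinking the length below a huge page, makes the check fail.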

2015-02-18 22:14:42

by Toshi Kani

Subject: Re: [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86

On Wed, 2015-02-18 at 22:57 +0100, Ingo Molnar wrote:
> * Toshi Kani <[email protected]> wrote:
>
> > On Wed, 2015-02-18 at 22:15 +0100, Ingo Molnar wrote:
> > > * Toshi Kani <[email protected]> wrote:
> > >
> > > > On Wed, 2015-02-18 at 21:44 +0100, Ingo Molnar wrote:
:
> >
> > > > [...] That said, since the patchset also added a new
> > > > nohugeiomap boot option for the same purpose, I agree
> > > > that this Kconfig option can be removed. So, I will
> > > > remove it in the next version.
> > > >
> > > > An example of such case is with multiple MTRRs described
> > > > in patch 0/7.
> > >
> > > So the multi-MTRR case should probably be detected and
> > > handled safely?
> >
> > I considered two options to safely handle this case, i.e.
> > option A) and B) described in the link below.
> >
> > https://lkml.org/lkml/2015/2/5/638
> >
> > I thought about how much complication we should put into
> > the code for an imaginable platform with a combination of
> > new NVM (or large I/O range) and legacy MTRRs with
> > multi-types & contiguous ranges. My thinking is that we
> > should go with option C) for simplicity, and implement A)
> > or B) later if we find it necessary.
>
> Well, why not option D):
>
> D) detect unaligned requests and reject them
>

That sounds like a good idea! I will work on it.

Thanks,
-Toshi

2015-02-23 20:22:27

by Andrew Morton

Subject: Re: [PATCH v2 0/7] Kernel huge I/O mapping support

On Mon, 9 Feb 2015 15:45:28 -0700 Toshi Kani <[email protected]> wrote:

> ioremap() and its related interfaces are used to create I/O
> mappings to memory-mapped I/O devices. The mapping sizes of
> the traditional I/O devices are relatively small. Non-volatile
> memory (NVM), however, has many GB and is going to have TB soon.
> It is not very efficient to create large I/O mappings with 4KB.

The changelogging is very good - thanks for taking the time to do this.

> This patchset extends the ioremap() interfaces to transparently
> create I/O mappings with huge pages whenever possible.

I'm wondering if this is prudent. Existing code which was tested with
4k mappings will magically start to use huge tlb mappings. I don't
know what could go wrong, but I'd prefer not to find out! Wouldn't it
be safer to make this an explicit opt-in?

What operations can presently be performed against an ioremapped area?
Can kernel code perform change_page_attr() against individual pages?
Can kernel code run iounmap() against just part of that region (I
forget). There does seem to be potential for breakage if we start
using hugetlb mappings for such things?

> ioremap()
> continues to use 4KB mappings when a huge page does not fit into
> a requested range. There is no change necessary to the drivers
> using ioremap(). A requested physical address must be aligned by
> a huge page size (1GB or 2MB on x86) for using huge page mapping,
> though. The kernel huge I/O mapping will improve performance of
> NVM and other devices with large memory, and reduce the time to
> create their mappings as well.
>
> On x86, the huge I/O mapping may not be used when a target range is
> covered by multiple MTRRs with different memory types. The caller
> must make a separate request for each MTRR range, or the huge I/O
> mapping can be disabled with the kernel boot option "nohugeiomap".
> The detail of this issue is described in the email below, and this
> patch takes option C) in favor of simplicity since MTRRs are legacy
> feature.
> https://lkml.org/lkml/2015/2/5/638

How is this mtrr clash handled?

- The iomap call will fail if there are any MTRRs covering the region?

- The iomap call will fail if there are more than one MTRRs covering
the region?

- If the ioremap will succeed if a single MTRR covers the region,
must that MTRR cover the *entire* region?

- What happens if userspace tried fiddling the MTRRs after the region
has been established?

<reads the code>

Oh. We don't do any checking at all. We're just telling userspace
programmers "don't do that". hrm. What are your thoughts on adding
the overlap checks to the kernel?

This adds more potential for breaking existing code, doesn't it? If
there's code which is using 4k ioremap on regions which are covered by
mtrrs, the transparent switch to hugeptes will cause that code to enter
the "undefined behaviour" space?

> The patchset introduces the following configs:
> HUGE_IOMAP - When selected (default Y), enable huge I/O mappings.
> Require HAVE_ARCH_HUGE_VMAP set.
> HAVE_ARCH_HUGE_VMAP - Indicate arch supports huge KVA mappings.
> Require X86_PAE set on X86_32.
>
> Patch 1-4 changes common files to support huge I/O mappings. There
> is no change in the functionalities until HUGE_IOMAP is set in patch 7.
>
> Patch 5,6 implement HAVE_ARCH_HUGE_VMAP and HUGE_IOMAP funcs on x86,
> and set HAVE_ARCH_HUGE_VMAP on x86.
>
> Patch 7 adds HUGE_IOMAP to Kconfig, which is set to Y by default on
> x86.

What do other architectures need to do to utilize this?

2015-02-23 23:55:11

by Toshi Kani

Subject: Re: [PATCH v2 0/7] Kernel huge I/O mapping support

On Mon, 2015-02-23 at 12:22 -0800, Andrew Morton wrote:
> On Mon, 9 Feb 2015 15:45:28 -0700 Toshi Kani <[email protected]> wrote:
>
> > ioremap() and its related interfaces are used to create I/O
> > mappings to memory-mapped I/O devices. The mapping sizes of
> > the traditional I/O devices are relatively small. Non-volatile
> > memory (NVM), however, has many GB and is going to have TB soon.
> > It is not very efficient to create large I/O mappings with 4KB.
>
> The changelogging is very good - thanks for taking the time to do this.
>
> > This patchset extends the ioremap() interfaces to transparently
> > create I/O mappings with huge pages whenever possible.
>
> I'm wondering if this is prudent. Existing code which was tested with
> 4k mappings will magically start to use huge tlb mappings. I don't
> know what could go wrong, but I'd prefer not to find out! Wouldn't it
> be safer to make this an explicit opt-in?

There were related discussions on this. This v2 patchset actually has
CONFIG_HUGE_IOMAP, which allows the user to select this feature. As
suggested in the thread below, I am going to remove CONFIG_HUGE_IOMAP,
so that the behavior is simpler and similar to how we create huge
mappings for the kernel itself. If bugs are found, they will be fixed.
https://lkml.org/lkml/2015/2/18/677

> What operations can presently be performed against an ioremapped area?
> Can kernel code perform change_page_attr() against individual pages?
> Can kernel code run iounmap() against just part of that region (I
> forget). There does seem to be potential for breakage if we start
> using hugetlb mappings for such things?

Yes, kernel code can apply the CPA interfaces, such as set_memory_x() and
set_memory_ro(), to an ioremapped area. CPA breaks a huge page into
smaller pages. I have included these in my test cases and confirmed
that they work. (Note that memory type change interfaces, such as
set_memory_uc() and set_memory_wc(), are not supported for an ioremapped
area regardless of its page size.)

iounmap() only takes a single argument, the virtual base address. It looks
up the corresponding vm area object from the virtual address, and always
removes the entire mapping.
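The CPA split behavior described above can be modeled in a toy standalone form (this is not kernel code; the function name and the reasoning are only illustrative of the 2MB-to-4KB split on x86):

```c
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define PMD_SIZE  (2UL << 20)	/* 2MB huge page */

/*
 * Toy model of what CPA does when an attribute change targets a
 * huge mapping: if the change covers exactly one whole huge page,
 * the single 2MB entry can keep its size; otherwise the huge entry
 * must be split into PMD_SIZE/PAGE_SIZE (512) 4KB entries so that
 * the affected subrange can carry different attributes.
 */
unsigned long entries_after_split(uint64_t change_addr, uint64_t change_size)
{
	if (change_addr % PMD_SIZE == 0 && change_size == PMD_SIZE)
		return 1;	/* whole huge page: no split needed */
	return PMD_SIZE / PAGE_SIZE;	/* partial change forces a split */
}
```

Changing one 4KB page inside a 2MB mapping thus turns one page-table entry into 512, which is why the split is transparent to the caller but not free.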

> > ioremap()
> > continues to use 4KB mappings when a huge page does not fit into
> > a requested range. There is no change necessary to the drivers
> > using ioremap(). A requested physical address must be aligned by
> > a huge page size (1GB or 2MB on x86) for using huge page mapping,
> > though. The kernel huge I/O mapping will improve performance of
> > NVM and other devices with large memory, and reduce the time to
> > create their mappings as well.
> >
> > On x86, the huge I/O mapping may not be used when a target range is
> > covered by multiple MTRRs with different memory types. The caller
> > must make a separate request for each MTRR range, or the huge I/O
> > mapping can be disabled with the kernel boot option "nohugeiomap".
> > The detail of this issue is described in the email below, and this
> > patch takes option C) in favor of simplicity since MTRRs are legacy
> > feature.
> > https://lkml.org/lkml/2015/2/5/638
>
> How is this mtrr clash handled?
>
> - The iomap call will fail if there are any MTRRs covering the region?
>
> - The iomap call will fail if there are more than one MTRRs covering
> the region?
>
> - If the ioremap will succeed if a single MTRR covers the region,
> must that MTRR cover the *entire* region?
>
> - What happens if userspace tried fiddling the MTRRs after the region
> has been established?
>
> <reads the code>

This issue was also discussed in the same thread:
https://lkml.org/lkml/2015/2/18/677

I am going to implement option D -- the ioremap call will fail if the
region is covered by more than one MTRR with different memory types.
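The planned check can be sketched as a standalone model (the flat range table and every name below are illustrative assumptions; the real kernel would consult its MTRR state through its own lookup helpers):

```c
#include <stdbool.h>
#include <stdint.h>

struct mtrr_range {
	uint64_t start, end;	/* physical range, end exclusive */
	int type;		/* memory type, e.g. WB, UC, WC */
};

/*
 * Report whether [addr, addr+size) sees a single MTRR memory type.
 * If two overlapping MTRRs disagree, a huge-page mapping of the
 * whole range would be unsafe and the request should be rejected.
 */
bool range_has_single_mtrr_type(const struct mtrr_range *tbl, int n,
				uint64_t addr, uint64_t size)
{
	int seen_type = -1;

	for (int i = 0; i < n; i++) {
		/* skip MTRRs that do not overlap the request */
		if (tbl[i].end <= addr || tbl[i].start >= addr + size)
			continue;
		if (seen_type == -1)
			seen_type = tbl[i].type;
		else if (tbl[i].type != seen_type)
			return false;	/* conflicting types: reject */
	}
	return true;
}
```

A request that spans a WB range and an adjacent UC range would be detected and refused, while a request confined to one type passes.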

> Oh. We don't do any checking at all. We're just telling userspace
> programmers "don't do that". hrm. What are your thoughts on adding
> the overlap checks to the kernel?
>
> This adds more potential for breaking existing code, doesn't it? If
> there's code which is using 4k ioremap on regions which are covered by
> mtrrs, the transparent switch to hugeptes will cause that code to enter
> the "undefined behaviour" space?

Yes, I agree with your concern, and I am going to add the check. I do
not think any such platform exists today that would be affected by this
change, though.

> > The patchset introduces the following configs:
> > HUGE_IOMAP - When selected (default Y), enable huge I/O mappings.
> > Require HAVE_ARCH_HUGE_VMAP set.
> > HAVE_ARCH_HUGE_VMAP - Indicate arch supports huge KVA mappings.
> > Require X86_PAE set on X86_32.
> >
> > Patch 1-4 changes common files to support huge I/O mappings. There
> > is no change in the functionalities until HUGE_IOMAP is set in patch 7.
> >
> > Patch 5,6 implement HAVE_ARCH_HUGE_VMAP and HUGE_IOMAP funcs on x86,
> > and set HAVE_ARCH_HUGE_VMAP on x86.
> >
> > Patch 7 adds HUGE_IOMAP to Kconfig, which is set to Y by default on
> > x86.
>
> What do other architectures need to do to utilize this?

Other architectures can implement their own versions of patches 5/7 and
6/7 to utilize this feature.

Thanks,
-Toshi

2015-02-24 08:09:33

by Ingo Molnar

Subject: Re: [PATCH v2 0/7] Kernel huge I/O mapping support


* Andrew Morton <[email protected]> wrote:

> <reads the code>
>
> Oh. We don't do any checking at all. We're just telling
> userspace programmers "don't do that". hrm. What are
> your thoughts on adding the overlap checks to the kernel?

I have requested such sanity checking in previous review rounds as
well; it has to be made fool-proof for this optimization to be
usable.

Another alternative would be to make this not a transparent
optimization, but a separate API: ioremap_hugepage() or so.

The number of devices and drivers dealing with GBs of remapped pages
is still relatively low, so they could make explicit use of the API
and opt in to it.

What I was arguing against was to make it a CONFIG_ option:
that achieves very little in practice, such APIs should be
uniformly available.

Thanks,

Ingo