2008-01-03 17:27:10

by Andi Kleen

Subject: [PATCH] [0/8] GB pages (PDP1GB) support for the kernel direct mapping


This patchkit implements GB pages support for AMD Fam10h CPUs. For now it only
implements them for the kernel direct mapping; support for hugetlbfs is upcoming.

This allows the kernel direct mapping to be mapped with 1GB TLB entries instead
of 2MB TLB entries, which should mean fewer TLB misses for the kernel.

GB pages are only implemented for 64bit (because the CPU only supports them in
long mode) and only for data pages (because Fam10h has no GB ITLBs and AMD
recommends against running code in them).
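
(For a rough sense of scale, here is a standalone user space sketch, not part
of the patchkit, of how many large page mappings are needed to cover a given
amount of direct mapped memory with 2MB versus 1GB pages; the 64GB figure is
just an assumed example.)

#include <stdio.h>

int main(void)
{
        unsigned long long mem = 64ULL << 30;           /* assume 64GB of direct-mapped RAM */
        unsigned long long pmd_mappings = mem >> 21;    /* 2MB pages (PMD_SHIFT == 21) */
        unsigned long long pud_mappings = mem >> 30;    /* 1GB pages (PUD_SHIFT == 30) */

        printf("2MB mappings needed: %llu\n", pmd_mappings);    /* 32768 */
        printf("1GB mappings needed: %llu\n", pud_mappings);    /* 64 */
        return 0;
}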

There is an option to turn them off (direct_gbpages=off), although I hope that
won't be needed.

Also includes one generic bug fix for clear_kernel_mapping().

-Andi


2008-01-03 17:27:29

by Andi Kleen

Subject: [PATCH] [1/8] GBPAGES: Handle kernel near memory hole in clear_kernel_mapping


This was a long-standing, obscure problem in the relocatable kernel. The
AMD GART driver needs to unmap part of the GART aperture in the kernel direct
mapping to prevent cache corruption. With the relocatable kernel it is in theory
possible that the separate kernel text mapping straddles that area too.

Normally this should not happen because the GART aperture tends to sit at
>= 2GB and the kernel is normally not loaded that high, but it is possible
in theory.

Teach clear_kernel_mapping() about this case.

This will become more important once the kernel mapping uses 1GB pages.
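
(Illustration only, not part of the patch: the check below compares the two
ranges at 2MB page granularity, so they count as overlapping if they share
any 2MB slot. A standalone sketch with made-up addresses:)

#include <stdio.h>

#define PMD_SHIFT 21
#define overlaps(as, ae, bs, be) ((ae) >= (bs) && (as) <= (be))

int main(void)
{
        /* All addresses here are hypothetical, chosen only to trigger the check. */
        unsigned long long kernel_start = 0x200000ULL;                   /* kernel load address */
        unsigned long long kernel_end   = kernel_start + (40ULL << 20);  /* ~40MB text mapping */
        unsigned long long hole_start   = 0x2000000ULL;                  /* unmapped aperture */
        unsigned long long hole_end     = hole_start + (64ULL << 20);

        if (overlaps(kernel_start >> PMD_SHIFT, kernel_end >> PMD_SHIFT,
                     hole_start >> PMD_SHIFT, hole_end >> PMD_SHIFT))
                printf("kernel text mapping straddles the unmapped hole\n");
        return 0;
}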

Cc: [email protected]
Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86/mm/init_64.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -411,7 +411,8 @@ void __init paging_init(void)
from the CPU leading to inconsistent cache lines. address and size
must be aligned to 2MB boundaries.
Does nothing when the mapping doesn't exist. */
-void __init clear_kernel_mapping(unsigned long address, unsigned long size)
+static void __init
+__clear_kernel_mapping(unsigned long address, unsigned long size)
{
unsigned long end = address + size;

@@ -441,6 +442,23 @@ void __init clear_kernel_mapping(unsigne
__flush_tlb_all();
}

+#define overlaps(as,ae,bs,be) ((ae) >= (bs) && (as) <= (be))
+
+void __init clear_kernel_mapping(unsigned long address, unsigned long size)
+{
+ int sh = PMD_SHIFT;
+ unsigned long kernel = __pa(__START_KERNEL_map);
+
+ if (overlaps(kernel>>sh, (kernel + KERNEL_TEXT_SIZE)>>sh,
+ __pa(address)>>sh, __pa(address + size)>>sh)) {
+ printk(KERN_INFO
+ "Kernel at %lx overlaps memory hole at %lx-%lx\n",
+ kernel, __pa(address), __pa(address+size));
+ __clear_kernel_mapping(__START_KERNEL_map+__pa(address), size);
+ }
+ __clear_kernel_mapping(address, size);
+}
+
/*
* Memory hotplug specific functions
*/

2008-01-03 17:27:43

by Andi Kleen

Subject: [PATCH] [2/8] GBPAGES: Add feature macros for the gbpages cpuid bit


Signed-off-by: Andi Kleen <[email protected]>

---
include/asm-x86/cpufeature.h | 2 ++
1 file changed, 2 insertions(+)

Index: linux/include/asm-x86/cpufeature.h
===================================================================
--- linux.orig/include/asm-x86/cpufeature.h
+++ linux/include/asm-x86/cpufeature.h
@@ -49,6 +49,7 @@
#define X86_FEATURE_MP (1*32+19) /* MP Capable. */
#define X86_FEATURE_NX (1*32+20) /* Execute Disable */
#define X86_FEATURE_MMXEXT (1*32+22) /* AMD MMX extensions */
+#define X86_FEATURE_GBPAGES (1*32+26) /* GB pages */
#define X86_FEATURE_RDTSCP (1*32+27) /* RDTSCP */
#define X86_FEATURE_LM (1*32+29) /* Long Mode (x86-64) */
#define X86_FEATURE_3DNOWEXT (1*32+30) /* AMD 3DNow! extensions */
@@ -168,6 +169,7 @@
#define cpu_has_clflush boot_cpu_has(X86_FEATURE_CLFLSH)
#define cpu_has_bts boot_cpu_has(X86_FEATURE_BTS)
#define cpu_has_ss boot_cpu_has(X86_FEATURE_SELFSNOOP)
+#define cpu_has_gbpages boot_cpu_has(X86_FEATURE_GBPAGES)

#if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
# define cpu_has_invlpg 1

2008-01-03 17:27:58

by Andi Kleen

Subject: [PATCH] [3/8] GBPAGES: Split LARGE_PAGE_SIZE/MASK into PUD_PAGE_SIZE/PMD_PAGE_SIZE


Split the existing LARGE_PAGE_SIZE/MASK macro into two new macros
PUD_PAGE_SIZE/MASK and PMD_PAGE_SIZE/MASK.

Fix up all callers to use the new names.
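
(For reference, a standalone sketch, not part of the patch, of what the new
macros evaluate to; the example address is arbitrary.)

#include <stdio.h>

#define PMD_SHIFT       21
#define PUD_SHIFT       30
#define PMD_PAGE_SIZE   (1ULL << PMD_SHIFT)
#define PMD_PAGE_MASK   (~(PMD_PAGE_SIZE - 1))
#define PUD_PAGE_SIZE   (1ULL << PUD_SHIFT)
#define PUD_PAGE_MASK   (~(PUD_PAGE_SIZE - 1))

int main(void)
{
        unsigned long long addr = 0x123456789ULL;       /* arbitrary example address */

        printf("PMD_PAGE_SIZE = %llu MB\n", PMD_PAGE_SIZE >> 20);       /* 2 */
        printf("PUD_PAGE_SIZE = %llu MB\n", PUD_PAGE_SIZE >> 20);       /* 1024 */
        printf("2MB-aligned:  %#llx\n", addr & PMD_PAGE_MASK);          /* 0x123400000 */
        printf("1GB-aligned:  %#llx\n", addr & PUD_PAGE_MASK);          /* 0x100000000 */
        return 0;
}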

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86/boot/compressed/head_64.S | 4 ++--
arch/x86/kernel/head_64.S | 4 ++--
arch/x86/kernel/pci-gart_64.c | 2 +-
arch/x86/mm/init_64.c | 6 +++---
arch/x86/mm/pageattr_64.c | 4 ++--
include/asm-x86/page_64.h | 7 +++++--
6 files changed, 15 insertions(+), 12 deletions(-)

Index: linux/include/asm-x86/page_64.h
===================================================================
--- linux.orig/include/asm-x86/page_64.h
+++ linux/include/asm-x86/page_64.h
@@ -29,8 +29,11 @@
#define MCE_STACK 5
#define N_EXCEPTION_STACKS 5 /* hw limit: 7 */

-#define LARGE_PAGE_MASK (~(LARGE_PAGE_SIZE-1))
-#define LARGE_PAGE_SIZE (_AC(1,UL) << PMD_SHIFT)
+#define PMD_PAGE_SIZE (_AC(1,UL) << PMD_SHIFT)
+#define PMD_PAGE_MASK (~(PMD_PAGE_SIZE-1))
+
+#define PUD_PAGE_SIZE (_AC(1,UL) << PUD_SHIFT)
+#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1))

#define HPAGE_SHIFT PMD_SHIFT
#define HPAGE_SIZE (_AC(1,UL) << HPAGE_SHIFT)
Index: linux/arch/x86/boot/compressed/head_64.S
===================================================================
--- linux.orig/arch/x86/boot/compressed/head_64.S
+++ linux/arch/x86/boot/compressed/head_64.S
@@ -80,8 +80,8 @@ startup_32:

#ifdef CONFIG_RELOCATABLE
movl %ebp, %ebx
- addl $(LARGE_PAGE_SIZE -1), %ebx
- andl $LARGE_PAGE_MASK, %ebx
+ addl $(PMD_PAGE_SIZE -1), %ebx
+ andl $PMD_PAGE_MASK, %ebx
#else
movl $CONFIG_PHYSICAL_START, %ebx
#endif
Index: linux/arch/x86/kernel/pci-gart_64.c
===================================================================
--- linux.orig/arch/x86/kernel/pci-gart_64.c
+++ linux/arch/x86/kernel/pci-gart_64.c
@@ -501,7 +501,7 @@ static __init unsigned long check_iommu_
}

a = aper + iommu_size;
- iommu_size -= round_up(a, LARGE_PAGE_SIZE) - a;
+ iommu_size -= round_up(a, PMD_PAGE_SIZE) - a;

if (iommu_size < 64*1024*1024) {
printk(KERN_WARNING
Index: linux/arch/x86/kernel/head_64.S
===================================================================
--- linux.orig/arch/x86/kernel/head_64.S
+++ linux/arch/x86/kernel/head_64.S
@@ -63,7 +63,7 @@ startup_64:

/* Is the address not 2M aligned? */
movq %rbp, %rax
- andl $~LARGE_PAGE_MASK, %eax
+ andl $~PMD_PAGE_MASK, %eax
testl %eax, %eax
jnz bad_address

@@ -88,7 +88,7 @@ startup_64:

/* Add an Identity mapping if I am above 1G */
leaq _text(%rip), %rdi
- andq $LARGE_PAGE_MASK, %rdi
+ andq $PMD_PAGE_MASK, %rdi

movq %rdi, %rax
shrq $PUD_SHIFT, %rax
Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -416,10 +416,10 @@ __clear_kernel_mapping(unsigned long add
{
unsigned long end = address + size;

- BUG_ON(address & ~LARGE_PAGE_MASK);
- BUG_ON(size & ~LARGE_PAGE_MASK);
+ BUG_ON(address & ~PMD_PAGE_MASK);
+ BUG_ON(size & ~PMD_PAGE_MASK);

- for (; address < end; address += LARGE_PAGE_SIZE) {
+ for (; address < end; address += PMD_PAGE_SIZE) {
pgd_t *pgd = pgd_offset_k(address);
pud_t *pud;
pmd_t *pmd;
Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -70,7 +70,7 @@ static struct page *split_large_page(uns
page_private(base) = 0;

address = __pa(address);
- addr = address & LARGE_PAGE_MASK;
+ addr = address & PMD_PAGE_MASK;
pbase = (pte_t *)page_address(base);
for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
pbase[i] = pfn_pte(addr >> PAGE_SHIFT,
@@ -150,7 +150,7 @@ static void revert_page(unsigned long ad
BUG_ON(pud_none(*pud));
pmd = pmd_offset(pud, address);
BUG_ON(pmd_val(*pmd) & _PAGE_PSE);
- pfn = (__pa(address) & LARGE_PAGE_MASK) >> PAGE_SHIFT;
+ pfn = (__pa(address) & PMD_PAGE_MASK) >> PAGE_SHIFT;
large_pte = pfn_pte(pfn, ref_prot);
large_pte = pte_mkhuge(large_pte);
set_pte((pte_t *)pmd, large_pte);

2008-01-03 17:28:23

by Andi Kleen

Subject: [PATCH] [4/8] GBPAGES: Add pgtable accessor functions for GB pages


Signed-off-by: Andi Kleen <[email protected]>

---
include/asm-x86/pgtable_64.h | 3 +++
1 file changed, 3 insertions(+)

Index: linux/include/asm-x86/pgtable_64.h
===================================================================
--- linux.orig/include/asm-x86/pgtable_64.h
+++ linux/include/asm-x86/pgtable_64.h
@@ -320,6 +320,9 @@ static inline int pmd_large(pmd_t pte) {
return (pmd_val(pte) & __LARGE_PTE) == __LARGE_PTE;
}

+static inline int pud_large(pud_t pte) {
+ return (pud_val(pte) & __LARGE_PTE) == __LARGE_PTE;
+}

/*
* Conversion functions: convert a page and protection to a page entry,

2008-01-03 17:28:38

by Andi Kleen

Subject: [PATCH] [5/8] GBPAGES: Support gbpages in pagetable dump


Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86/mm/fault_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86/mm/fault_64.c
===================================================================
--- linux.orig/arch/x86/mm/fault_64.c
+++ linux/arch/x86/mm/fault_64.c
@@ -288,7 +288,7 @@ void dump_pagetable(unsigned long addres
pud = pud_offset(pgd, address);
if (bad_address(pud)) goto bad;
printk("PUD %lx ", pud_val(*pud));
- if (!pud_present(*pud)) goto ret;
+ if (!pud_present(*pud) || pud_large(*pud)) goto ret;

pmd = pmd_offset(pud, address);
if (bad_address(pmd)) goto bad;

2008-01-03 17:28:57

by Andi Kleen

Subject: [PATCH] [6/8] GBPAGES: Add an option to disable direct mapping gbpages and a global variable


Signed-off-by: Andi Kleen <[email protected]>

---
Documentation/x86_64/boot-options.txt | 3 +++
arch/x86/mm/init_64.c | 12 ++++++++++++
include/asm-x86/pgtable_64.h | 2 ++
3 files changed, 17 insertions(+)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -57,6 +57,18 @@ static unsigned long dma_reserve __initd

DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);

+int direct_gbpages;
+
+static int __init parse_direct_gbpages(char *arg)
+{
+ if (!strcmp(arg, "off")) {
+ direct_gbpages = -1;
+ return 0;
+ }
+ return -1;
+}
+early_param("direct_gbpages", parse_direct_gbpages);
+
/*
* NOTE: pagetable_init alloc all the fixmap pagetables contiguous on the
* physical space so we can cache the place of the first one and move
Index: linux/include/asm-x86/pgtable_64.h
===================================================================
--- linux.orig/include/asm-x86/pgtable_64.h
+++ linux/include/asm-x86/pgtable_64.h
@@ -408,6 +408,8 @@ static inline pte_t pte_modify(pte_t pte
__changed; \
})

+extern int direct_gbpages;
+
/* Encode and de-code a swap entry */
#define __swp_type(x) (((x).val >> 1) & 0x3f)
#define __swp_offset(x) ((x).val >> 8)
Index: linux/Documentation/x86_64/boot-options.txt
===================================================================
--- linux.orig/Documentation/x86_64/boot-options.txt
+++ linux/Documentation/x86_64/boot-options.txt
@@ -307,3 +307,6 @@ Debugging
stuck (default)

Miscellaneous
+
+ direct_gbpages=off
+ Do not use GB pages for kernel direct mapping.

2008-01-03 17:29:23

by Andi Kleen

Subject: [PATCH] [7/8] GBPAGES: Implement GBpages support in change_page_attr()


Teach c_p_a() to split and unsplit GB pages.
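
(The splitting is hierarchical: a 1GB mapping is replaced by 512 2MB entries,
and the single 2MB entry covering the target address is split further into
512 4K PTEs. A standalone sketch of the index arithmetic, with a hypothetical
address; the real work is done by split_gb()/split_pmd() in the patch below:)

#include <stdio.h>

#define PAGE_SHIFT      12
#define PMD_SHIFT       21
#define PUD_SHIFT       30
#define PTRS_PER_PMD    512
#define PTRS_PER_PTE    512

int main(void)
{
        unsigned long long paddr = 0x40123000ULL;       /* hypothetical target address */

        unsigned long long pmd_idx = (paddr >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
        unsigned long long pte_idx = (paddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);

        printf("containing 1GB page starts at %#llx\n",
               paddr & ~((1ULL << PUD_SHIFT) - 1));
        printf("2MB slot %llu and 4K slot %llu carry the changed protection;\n"
               "all other slots keep the reference protection\n",
               pmd_idx, pte_idx);
        return 0;
}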

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86/mm/pageattr_64.c | 149 ++++++++++++++++++++++++++++++++++++----------
1 file changed, 118 insertions(+), 31 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -14,6 +14,8 @@
#include <asm/io.h>
#include <asm/kdebug.h>

+#define Cprintk(x...)
+
enum flush_mode { FLUSH_NONE, FLUSH_CACHE, FLUSH_TLB };

struct flush {
@@ -40,6 +42,9 @@ pte_t *lookup_address(unsigned long addr
pud = pud_offset(pgd, address);
if (!pud_present(*pud))
return NULL;
+ *level = 2;
+ if (pud_large(*pud))
+ return (pte_t *)pud;
pmd = pmd_offset(pud, address);
if (!pmd_present(*pmd))
return NULL;
@@ -53,30 +58,88 @@ pte_t *lookup_address(unsigned long addr
return pte;
}

-static struct page *split_large_page(unsigned long address, pgprot_t prot,
- pgprot_t ref_prot)
-{
- int i;
+static pte_t *alloc_split_page(struct page **base)
+{
+ struct page *p = alloc_page(GFP_KERNEL);
+ if (!p)
+ return NULL;
+ SetPagePrivate(p);
+ page_private(p) = 0;
+ *base = p;
+ return page_address(p);
+}
+
+static struct page *free_split_page(struct page *base)
+{
+ BUG_ON(!PagePrivate(base));
+ BUG_ON(page_private(base) != 0);
+ ClearPagePrivate(base);
+ __free_page(base);
+ return NULL;
+}
+
+static struct page *
+split_pmd(unsigned long paddr, pgprot_t prot, pgprot_t ref_prot)
+{
+ int i;
unsigned long addr;
- struct page *base = alloc_pages(GFP_KERNEL, 0);
- pte_t *pbase;
- if (!base)
+ struct page *base;
+ pte_t *pbase = alloc_split_page(&base);
+ if (!pbase)
return NULL;
- /*
- * page_private is used to track the number of entries in
- * the page table page have non standard attributes.
- */
- SetPagePrivate(base);
- page_private(base) = 0;

- address = __pa(address);
- addr = address & PMD_PAGE_MASK;
- pbase = (pte_t *)page_address(base);
- for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
- pbase[i] = pfn_pte(addr >> PAGE_SHIFT,
- addr == address ? prot : ref_prot);
+ Cprintk("cpa split l3 paddr %lx\n", paddr);
+ addr = paddr & PMD_PAGE_MASK;
+ for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE)
+ pbase[i] = pfn_pte(addr >> PAGE_SHIFT,
+ addr == paddr ? prot : ref_prot);
+
+ return base;
+}
+
+static struct page *
+split_gb(unsigned long paddr, pgprot_t prot, pgprot_t ref_prot)
+{
+ unsigned long addr;
+ int i;
+ struct page *base;
+ pte_t *pbase = alloc_split_page(&base);
+
+ if (!pbase)
+ return NULL;
+ Cprintk("cpa split gb paddr %lx\n", paddr);
+ addr = paddr & PUD_PAGE_MASK;
+ for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_PAGE_SIZE) {
+ if (paddr >= addr && paddr < addr + PMD_PAGE_SIZE) {
+ struct page *l3;
+ l3 = split_pmd(paddr, prot, ref_prot);
+ if (!l3)
+ return free_split_page(base);
+ page_private(l3)++;
+ pbase[i] = mk_pte(l3, ref_prot);
+ } else {
+ pbase[i] = pfn_pte(addr>>PAGE_SHIFT, ref_prot);
+ pbase[i] = pte_mkhuge(pbase[i]);
+ }
}
return base;
+}
+
+static struct page *split_large_page(unsigned long address, pgprot_t prot,
+ pgprot_t ref_prot, int level)
+{
+ unsigned long paddr = __pa(address);
+ Cprintk("cpa splitting %lx level %d\n", address, level);
+ if (level == 2)
+ return split_gb(paddr, prot, ref_prot);
+ else if (level == 3)
+ return split_pmd(paddr, prot, ref_prot);
+ else {
+ printk("address %lx\n", address);
+ dump_pagetable(address);
+ BUG();
+ }
+ return NULL;
}

struct flush_arg {
@@ -132,17 +195,42 @@ static inline void save_page(struct page
list_add(&fpage->lru, &deferred_pages);
}

+static void reset_large_pte(pte_t *pte, unsigned long addr, pgprot_t prot)
+{
+ unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
+ set_pte(pte, pte_mkhuge(pfn_pte(pfn, prot)));
+}
+
+static void
+revert_gb(unsigned long address, pud_t *pud, pmd_t *pmd, pgprot_t ref_prot)
+{
+ struct page *p = virt_to_page(pmd);
+
+ /* Reserved pages means it has been already set up at boot. Don't touch those. */
+ if (PageReserved(p))
+ return;
+
+ Cprintk("cpa revert gb %lx count %ld\n", address, page_private(p));
+ --page_private(p);
+ BUG_ON(page_private(p) < 0);
+ if (page_private(p) == 0) {
+ save_page(p);
+ reset_large_pte((pte_t *)pud, address & PUD_PAGE_MASK, ref_prot);
+ }
+}
+
/*
* No more special protections in this 2MB area - revert to a
- * large page again.
+ * large or GB page again.
*/
+
static void revert_page(unsigned long address, pgprot_t ref_prot)
{
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
- pte_t large_pte;
- unsigned long pfn;
+
+ Cprintk("cpa revert %lx\n", address);

pgd = pgd_offset_k(address);
BUG_ON(pgd_none(*pgd));
@@ -150,10 +238,9 @@ static void revert_page(unsigned long ad
BUG_ON(pud_none(*pud));
pmd = pmd_offset(pud, address);
BUG_ON(pmd_val(*pmd) & _PAGE_PSE);
- pfn = (__pa(address) & PMD_PAGE_MASK) >> PAGE_SHIFT;
- large_pte = pfn_pte(pfn, ref_prot);
- large_pte = pte_mkhuge(large_pte);
- set_pte((pte_t *)pmd, large_pte);
+ reset_large_pte((pte_t *)pmd, address & PMD_PAGE_MASK, ref_prot);
+
+ revert_gb(address, pud, pmd, ref_prot);
}

/*
@@ -189,6 +276,7 @@ static void set_tlb_flush(unsigned long
static unsigned short pat_bit[5] = {
[4] = _PAGE_PAT,
[3] = _PAGE_PAT_LARGE,
+ [2] = _PAGE_PAT_LARGE,
};

static int cache_attr_changed(pte_t pte, pgprot_t prot, int level)
@@ -224,15 +312,14 @@ __change_page_attr(unsigned long address
page_private(kpte_page)++;
set_pte(kpte, pfn_pte(pfn, prot));
} else {
- /*
- * split_large_page will take the reference for this
- * change_page_attr on the split page.
- */
struct page *split;
ref_prot2 = pte_pgprot(pte_clrhuge(*kpte));
- split = split_large_page(address, prot, ref_prot2);
+ split = split_large_page(address, prot, ref_prot2,
+ level);
if (!split)
return -ENOMEM;
+ if (level == 3 && !PageReserved(kpte_page))
+ page_private(kpte_page)++;
pgprot_val(ref_prot2) &= ~_PAGE_NX;
set_pte(kpte, mk_pte(split, ref_prot2));
kpte_page = split;

2008-01-03 17:29:37

by Andi Kleen

Subject: [PATCH] [8/8] GBPAGES: Do kernel direct mapping at boot using GB pages


This should decrease TLB pressure because the kernel will take fewer TLB
misses for its own data accesses.

Only done for 64bit because i386 does not support GB pages.

This only applies to the data portion of the direct mapping; the
kernel text mapping stays with 2MB pages because the AMD Fam10h
microarchitecture does not support GB ITLBs and AMD recommends
against using GB mappings for code.

Can be disabled with direct_gbpages=off.
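
(How the pieces from patches 6 and 8 fit together, as a standalone sketch:
direct_gbpages is forced to -1 by direct_gbpages=off and is otherwise resolved
to 1 or 0 at the first init_memory_mapping(), depending on the CPUID bit.
cmdline_off and cpu_has_gbpages below are stand-ins for the real checks.)

#include <stdio.h>

static int direct_gbpages;      /* 0 = undecided, -1 = forced off, 1 = use GB pages */

static void resolve(int cmdline_off, int cpu_has_gbpages)
{
        if (cmdline_off)                        /* early_param("direct_gbpages", ...) */
                direct_gbpages = -1;
        if (direct_gbpages >= 0 && cpu_has_gbpages)
                direct_gbpages = 1;
        else
                direct_gbpages = 0;
}

int main(void)
{
        resolve(0, 1);
        printf("no option, CPU has GB pages -> %d\n", direct_gbpages);  /* 1 */

        direct_gbpages = 0;
        resolve(1, 1);
        printf("direct_gbpages=off          -> %d\n", direct_gbpages);  /* 0 */
        return 0;
}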

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86/mm/init_64.c | 63 ++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 54 insertions(+), 9 deletions(-)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -264,13 +264,20 @@ __meminit void early_iounmap(void *addr,
__flush_tlb();
}

+static unsigned long direct_entry(unsigned long paddr)
+{
+ unsigned long entry;
+ entry = __PAGE_KERNEL_LARGE|_PAGE_GLOBAL|paddr;
+ entry &= __supported_pte_mask;
+ return entry;
+}
+
static void __meminit
phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end)
{
int i = pmd_index(address);

for (; i < PTRS_PER_PMD; i++, address += PMD_SIZE) {
- unsigned long entry;
pmd_t *pmd = pmd_page + pmd_index(address);

if (address >= end) {
@@ -283,9 +290,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned
if (pmd_val(*pmd))
continue;

- entry = __PAGE_KERNEL_LARGE|_PAGE_GLOBAL|address;
- entry &= __supported_pte_mask;
- set_pmd(pmd, __pmd(entry));
+ set_pmd(pmd, __pmd(direct_entry(address)));
}
}

@@ -318,7 +323,13 @@ static void __meminit phys_pud_init(pud_
}

if (pud_val(*pud)) {
- phys_pmd_update(pud, addr, end);
+ if (!pud_large(*pud))
+ phys_pmd_update(pud, addr, end);
+ continue;
+ }
+
+ if (direct_gbpages > 0) {
+ set_pud(pud, __pud(direct_entry(addr)));
continue;
}

@@ -337,9 +348,11 @@ static void __init find_early_table_spac
unsigned long puds, pmds, tables, start;

puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
- pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
- tables = round_up(puds * sizeof(pud_t), PAGE_SIZE) +
- round_up(pmds * sizeof(pmd_t), PAGE_SIZE);
+ tables = round_up(puds * sizeof(pud_t), PAGE_SIZE);
+ if (!direct_gbpages) {
+ pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
+ tables += round_up(pmds * sizeof(pmd_t), PAGE_SIZE);
+ }

/* RED-PEN putting page tables only on node 0 could
cause a hotspot and fill up ZONE_DMA. The page tables
@@ -372,8 +385,15 @@ void __init_refok init_memory_mapping(un
* mapped. Unfortunately this is done currently before the nodes are
* discovered.
*/
- if (!after_bootmem)
+ if (!after_bootmem) {
+ if (direct_gbpages >= 0 && cpu_has_gbpages) {
+ printk(KERN_INFO "Using GB pages for direct mapping\n");
+ direct_gbpages = 1;
+ } else
+ direct_gbpages = 0;
+
find_early_table_space(end);
+ }

start = (unsigned long)__va(start);
end = (unsigned long)__va(end);
@@ -419,6 +439,27 @@ void __init paging_init(void)
}
#endif

+static void split_gb_page(pud_t *pud, unsigned long paddr)
+{
+ int i;
+ pmd_t *pmd;
+ struct page *p = alloc_page(GFP_KERNEL);
+ if (!p)
+ return;
+
+ Dprintk("split_gb_page %lx\n", paddr);
+
+ SetPagePrivate(p);
+ /* Set reference to 1 so that c_p_a() does not undo it */
+ page_private(p) = 1;
+
+ paddr &= PUD_PAGE_MASK;
+ pmd = page_address(p);
+ for (i = 0; i < PTRS_PER_PTE; i++, paddr += PMD_PAGE_SIZE)
+ pmd[i] = __pmd(direct_entry(paddr));
+ pud_populate(NULL, pud, pmd);
+}
+
/* Unmap a kernel mapping if it exists. This is useful to avoid prefetches
from the CPU leading to inconsistent cache lines. address and size
must be aligned to 2MB boundaries.
@@ -430,6 +471,8 @@ __clear_kernel_mapping(unsigned long add

BUG_ON(address & ~PMD_PAGE_MASK);
BUG_ON(size & ~PMD_PAGE_MASK);
+
+ Dprintk("clear_kernel_mapping %lx-%lx\n", address, address+size);

for (; address < end; address += PMD_PAGE_SIZE) {
pgd_t *pgd = pgd_offset_k(address);
@@ -438,6 +481,8 @@ __clear_kernel_mapping(unsigned long add
if (pgd_none(*pgd))
continue;
pud = pud_offset(pgd, address);
+ if (pud_large(*pud))
+ split_gb_page(pud, __pa(address));
if (pud_none(*pud))
continue;
pmd = pmd_offset(pud, address);

2008-01-03 18:29:46

by Vivek Goyal

Subject: Re: [PATCH] [1/8] GBPAGES: Handle kernel near memory hole in clear_kernel_mapping

On Thu, Jan 03, 2008 at 06:26:57PM +0100, Andi Kleen wrote:
>
> This was a long standing obscure problem in the relocatable kernel. The
> AMD GART driver needs to unmap part of the GART in the kernel direct mapping to
> prevent cache corruption. With the relocatable kernel it is in theory possible
> that the separate kernel text mapping straddles that area too.
>
> Normally it should not happen because GART tends to be >= 2GB, and the kernel
> is normally not loaded that high, but it is possible in theory.
>
> Teach clear_kernel_mapping() about this case.
>
> This will become more important once the kernel mapping uses 1GB pages.
>
> Cc: [email protected]
> Cc: [email protected]
>
> Signed-off-by: Andi Kleen <[email protected]>
>
> ---
> arch/x86/mm/init_64.c | 20 +++++++++++++++++++-
> 1 file changed, 19 insertions(+), 1 deletion(-)
>
> Index: linux/arch/x86/mm/init_64.c
> ===================================================================
> --- linux.orig/arch/x86/mm/init_64.c
> +++ linux/arch/x86/mm/init_64.c
> @@ -411,7 +411,8 @@ void __init paging_init(void)
> from the CPU leading to inconsistent cache lines. address and size
> must be aligned to 2MB boundaries.
> Does nothing when the mapping doesn't exist. */
> -void __init clear_kernel_mapping(unsigned long address, unsigned long size)
> +static void __init
> +__clear_kernel_mapping(unsigned long address, unsigned long size)
> {
> unsigned long end = address + size;
>
> @@ -441,6 +442,23 @@ void __init clear_kernel_mapping(unsigne
> __flush_tlb_all();
> }
>
> +#define overlaps(as,ae,bs,be) ((ae) >= (bs) && (as) <= (be))
> +
> +void __init clear_kernel_mapping(unsigned long address, unsigned long size)
> +{
> + int sh = PMD_SHIFT;
> + unsigned long kernel = __pa(__START_KERNEL_map);
> +
> + if (overlaps(kernel>>sh, (kernel + KERNEL_TEXT_SIZE)>>sh,
> + __pa(address)>>sh, __pa(address + size)>>sh)) {
> + printk(KERN_INFO
> + "Kernel at %lx overlaps memory hole at %lx-%lx\n",
> + kernel, __pa(address), __pa(address+size));
> + __clear_kernel_mapping(__START_KERNEL_map+__pa(address), size);

Hi Andi,

Got a question. How will the kernel continue to run if we unmap the kernel
text/data region mappings?

Thanks
Vivek

2008-01-03 18:43:57

by Andi Kleen

Subject: Re: [PATCH] [1/8] GBPAGES: Handle kernel near memory hole in clear_kernel_mapping


> Got a question. How will the kernel continue to run if we unmap the kernel
> text/data region mappings?

Normally it shouldn't be in the same 2MB area as the aperture (which
is the only thing that is unmapped). The problem is mostly
the rest of the 40MB kernel mapping.

-Andi

2008-01-03 19:03:50

by Nish Aravamudan

Subject: Re: [PATCH] [6/8] GBPAGES: Add an option to disable direct mapping gbpages and a global variable

On 1/3/08, Andi Kleen <[email protected]> wrote:
>
> Signed-off-by: Andi Kleen <[email protected]>

<snip>

> Index: linux/Documentation/x86_64/boot-options.txt
> ===================================================================
> --- linux.orig/Documentation/x86_64/boot-options.txt
> +++ linux/Documentation/x86_64/boot-options.txt
> @@ -307,3 +307,6 @@ Debugging
> stuck (default)
>
> Miscellaneous
> +
> + direct_gbpages=off
> + Do not use GB pages for kernel direct mapping.

Sorry if this is a FAQ, but why do we have this file in addition to
kernel-parameters.txt? I see that kernel-parameters.txt refers to this
file, so I guess it's ok, but shouldn't we try to consolidate?

Thanks,
Nish