2008-02-12 12:53:43

by Andi Kleen

Subject: [PATCH] [1/2] CPA: Fix set_memory_x for ioremap v2


EFI currently calls set_memory_x() on addresses not in the direct mapping.

This is problematic for several reasons:

- The cpa code internally calls __pa() on the address, which does not work for
remapped addresses and gives a random result. In the EFI case, a fixmap
address is passed in, which when run through __pa() yields a valid-looking but
wrong physical address.
- cpa will try to change all potential aliases (like the kernel mapping
on x86-64), but that is not needed for NX because the caller only needs
its specific virtual address to be executable. There is no requirement in the x86
architecture for NX bits to be coherent between mapping aliases. Also, given the
previous problem of __pa() returning a wrong address, it would likely try to
change some random other page if you're unlucky and the bogus result happened
to fall into the kernel text range.

There would be several possible ways to fix this:
- Simply don't set the NX bit in the original ioremap, drop
set_memory_x(), and add an ioremap_exec(). That would be my preferred solution,
but it has unfortunately been dismissed before.
- Drop all __pa() calls and always use the physical address derived
from the looked-up PTE. This would need some significant restructuring
and would only fix the first problem above, not the second.
- Special-case NX clearing so that aliases are not changed. I chose this one
because it happens to fix both problems, so it is both a fix
and an optimization.

This implies that it is still not safe to call set_memory_*() (other than
set_memory_x()) on any ioremapped/vmalloced/module addresses.

I don't have easy access to an EFI system, so this is untested.

Cc: [email protected]

v2: Skip static_protections() for the addronly case. This is needed
because static_protections() also calls __pa() and likewise gets a random
result. It is also not needed here, because all the bits it handles
can already be derived from the original PTE.
Improve description slightly.

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86/mm/pageattr.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

Index: linux/arch/x86/mm/pageattr.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr.c
+++ linux/arch/x86/mm/pageattr.c
@@ -28,6 +28,7 @@ struct cpa_data {
pgprot_t mask_clr;
int numpages;
int flushtlb;
+ int addronly;
};

static inline int
@@ -304,7 +305,8 @@ try_preserve_large_page(pte_t *kpte, uns

pgprot_val(new_prot) &= ~pgprot_val(cpa->mask_clr);
pgprot_val(new_prot) |= pgprot_val(cpa->mask_set);
- new_prot = static_protections(new_prot, address);
+ if (!cpa->addronly)
+ new_prot = static_protections(new_prot, address);

/*
* We need to check the full range, whether
@@ -610,7 +612,7 @@ static int change_page_attr_addr(struct
* fixup the low mapping first. __va() returns the virtual
* address in the linear mapping:
*/
- if (within(address, HIGH_MAP_START, HIGH_MAP_END))
+ if (within(address, HIGH_MAP_START, HIGH_MAP_END) && !cpa->addronly)
address = (unsigned long) __va(phys_addr);
#endif

@@ -623,7 +625,7 @@ static int change_page_attr_addr(struct
* If the physical address is inside the kernel map, we need
* to touch the high mapped kernel as well:
*/
- if (within(phys_addr, 0, KERNEL_TEXT_SIZE)) {
+ if (!cpa->addronly && within(phys_addr, 0, KERNEL_TEXT_SIZE)) {
/*
* Calc the high mapping address. See __phys_addr()
* for the non obvious details.
@@ -703,6 +705,8 @@ static int change_page_attr_set_clr(unsi
cpa.mask_set = mask_set;
cpa.mask_clr = mask_clr;
cpa.flushtlb = 0;
+ cpa.addronly = !pgprot_val(mask_set) &&
+ pgprot_val(mask_clr) == _PAGE_NX;

ret = __change_page_attr_set_clr(&cpa);


2008-02-12 12:54:00

by Andi Kleen

Subject: [PATCH] [2/2] Improve not NX check in i386 direct mapping setup


[This is a replacement for the direct mapping protections patchkit from
last week. It is simpler because it implements its own specialized, duplicated
logic instead of adapting the existing pageattr logic to be usable for this case.]

The i386 direct mapping support needs to avoid setting NX on pages
that are used by the kernel text. This patch improves the checks
for that case.

- Check the correct range of kernel text, not including the data-only
sections at the end.
- Check the beginning too, because there can be quite a few pages
before the kernel text when the kernel is loaded high
(previously all pages starting at PAGE_OFFSET were treated as text).
- Add a special check for the PCI-BIOS range in the first MB,
which needs to be executable.
- Drop the inline from that function because it is larger now.

The result is fewer pages in the direct mapping without NX
in some cases (particularly with a relocatable kernel loaded high),
giving a very minor increase in security against buffer overflows
in the kernel.

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86/kernel/vmlinux_32.lds.S | 1 +
arch/x86/mm/init_32.c | 14 +++++++++-----
2 files changed, 10 insertions(+), 5 deletions(-)

Index: linux/arch/x86/mm/init_32.c
===================================================================
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -143,9 +143,14 @@ page_table_range_init(unsigned long star
}
}

-static inline int is_kernel_text(unsigned long addr)
+extern char __end_text[];
+
+static int __init is_kernel_text(unsigned long start, unsigned long end)
{
- if (addr >= PAGE_OFFSET && addr <= (unsigned long)__init_end)
+ if (end >= (unsigned long)_text && start < (unsigned long)__end_text)
+ return 1;
+ /* Allow execution of 32bit BIOS */
+ if (end >= BIOS_BEGIN && start < BIOS_END)
return 1;
return 0;
}
@@ -188,8 +193,7 @@ static void __init kernel_physical_mappi
addr2 = (pfn + PTRS_PER_PTE-1) * PAGE_SIZE +
PAGE_OFFSET + PAGE_SIZE-1;

- if (is_kernel_text(addr) ||
- is_kernel_text(addr2))
+ if (is_kernel_text(addr, addr2))
prot = PAGE_KERNEL_LARGE_EXEC;

set_pmd(pmd, pfn_pmd(pfn, prot));
@@ -205,7 +209,7 @@ static void __init kernel_physical_mappi
pte++, pfn++, pte_ofs++, addr += PAGE_SIZE) {
pgprot_t prot = PAGE_KERNEL;

- if (is_kernel_text(addr))
+ if (is_kernel_text(addr, addr + PAGE_SIZE - 1))
prot = PAGE_KERNEL_EXEC;

set_pte(pte, pfn_pte(pfn, prot));
Index: linux/arch/x86/kernel/vmlinux_32.lds.S
===================================================================
--- linux.orig/arch/x86/kernel/vmlinux_32.lds.S
+++ linux/arch/x86/kernel/vmlinux_32.lds.S
@@ -165,6 +165,7 @@ SECTIONS
*(.parainstructions)
__parainstructions_end = .;
}
+ __end_text = .;
/* .exit.text is discard at runtime, not link time, to deal with references
from .altinstructions and .eh_frame */
.exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) {

2008-02-12 23:03:23

by Thomas Gleixner

Subject: Re: [PATCH] [1/2] CPA: Fix set_memory_x for ioremap v2

On Tue, 12 Feb 2008, Andi Kleen wrote:
> There would be several possible ways to fix this:
> - Simply don't set the NX bit in the original ioremap and drop
> set_memory_x and add a ioremap_exec(). That would be my preferred solution,
> but unfortunately has been dismissed before
> - Drop all __pas and always use the physical address derived
> from the looked up PTE. This would need some significant restructuring
> and would only fix the first problem above, not the second.
> - Special case NX clear to change any aliases. I chose this one
> because it happens to fix both problems, so is both a fix
> and a optimization.
>
> This implies that it's still not safe calling set_memory_(not x) on
> any ioremaped/vmalloced/module addresses.

There is another option:

- Fix it proper.

The so-called "significant restructuring" took a mere 2 hours,
which is probably less than the time consumed in this thread.

http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-x86.git;a=shortlog;h=mm

Thanks,

tglx

2008-02-13 11:26:34

by Andi Kleen

Subject: Re: [PATCH] [1/2] CPA: Fix set_memory_x for ioremap v2



> The so-called "significant restructuring" took a mere 2 hours,
> which is probably less than the time consumed in this thread.

Hmm, it doesn't do what I meant, and I don't think you solved
the problem. You still check against the vaddrs,
which won't work for ioremaps or the fixmap (and thus does not
fix the EFI case).

What I meant by restructuring is calling lookup_address() early,
getting the physical address from the PTE, and then checking that against
the alias ranges in physical terms. That would actually work for fixmaps
and ioremaps and all other mappings too.

I'm sure it can all be done, but for me to submit such
a change would likely require weeks of threads like this, so
I'm not trying.

-Andi