Is it possible to constrain the FIXADDR range on x86/x86-64
(FIXADDR_START-FIXADDR_TOP) such that the entire range is read-only by
user-level? If so, we could simplify the permission test like this:
===== mm/memory.c 1.124 vs edited =====
--- 1.124/mm/memory.c Wed May 14 00:53:07 2003
+++ edited/mm/memory.c Thu Jun 12 18:08:42 2003
@@ -714,8 +714,7 @@
if (!pmd)
return i ? : -EFAULT;
pte = pte_offset_kernel(pmd, pg);
- if (!pte || !pte_present(*pte) || !pte_user(*pte) ||
- !(write ? pte_write(*pte) : pte_read(*pte)))
+ if (!pte || !pte_present(*pte) || write)
return i ? : -EFAULT;
if (pages) {
pages[i] = pte_page(*pte);
Advantages:
- simpler code
- gets rid of pte_user() (which was introduced just for this purpose)
- lets gdb work better on arches which use execute-only page for
privilege promotion
I can live with the existing code, but for ia64, it would be useful to
have this patch in place, because otherwise, the gate page used for
lightweight system calls and the signal trampoline is not readable via
ptrace() (that page must be mapped as EXECUTE-only, because otherwise
the SYSENTER-equivalent of ia64, called "epc", cannot be used).
(Note, there is no security issue here, because the EXECUTE-only page
only contains code or ELF-related constant data.)
I considered changing the PTE-checking, but it gets really tricky,
because on many platforms, the privilege-level and the
access-permission bits are not fully orthogonal (for example, the
EXECUTE-only page is actually a kernel-owned page, but it's still
executable at the user level). In the end I decided that the whole
purpose of the FIXADDR range stuff is to _allow_ user-level access to
that range, so if the range is chosen properly, it should be OK to
allow reads without further pte_read() access checking. (If writes
needs to be supported, we would have to add back the pte_user() &&
pte_write() checks).
--david
> Is it possible to constrain the FIXADDR range on x86/x86-64
> (FIXADDR_START-FIXADDR_TOP) such that the entire range is read-only by
> user-level?
The fixmap area is used for other kernel-only mappings for things that I
doubt should be exposed to users, not just user-accessible pages. At the
moment, the vsyscall page is the only user-accessible page in the fixmap
area. I wrote the get_user_pages change to be as generic as possible, so
it would do the right thing if other uses of the fixmap area were added.
Your patch makes the various other kernel-internal fixmap pages readable by
users, which is not right.
The pte_user predicate was added just for this purpose. It seems
reasonable to me to replace its use with a new pair of predicates,
pte_user_read and pte_user_write, whose meaning is clearly specified for
precisely this purpose. That is, those predicates check whether a user
process should be allowed to read/write the page via something like ptrace.
That's the obvious idea to me. But I have no special opinions about this
stuff myself. The current code is as it is because that's what Linus wanted.
Thanks,
Roland
>>>>> On Thu, 12 Jun 2003 19:07:40 -0700, Roland McGrath <[email protected]> said:
Roland> The pte_user predicate was added just for this purpose. It
Roland> seems reasonable to me to replace its use with a new pair of
Roland> predicates, pte_user_read and pte_user_write, whose meaning
Roland> is clearly specified for precisely this purpose. That is,
Roland> those predicates check whether a user process should be
Roland> allowed to read/write the page via something like ptrace.
Roland> That's the obvious idea to me. But I have no special
Roland> opinions about this stuff myself. The current code is as it
Roland> is because that's what Linus wanted.
I considered a pte_user_read()/pte_user_write()-like approach, but
rejected it. First of all, it doesn't really help with execute-only
pages. Of course, we could add a pte_user_exec() and treat those
pages as readable, but that's not a good solution: just because we
want to make the gate page readable via ptrace() doesn't mean that we
want _all_ execute-only pages to be readable (it wouldn't make a
difference today, but I'm worried about someone adding other
execute-only pages further down the road, not being aware that
ptrace() would cause a potential security problem).
For ia64, I think we really want to say: if it's accessing the gate
page, allow reads. There is just no way we can infer that from
looking at the PTE itself.
Is there really a point in allowing other FIXMAP pages to be read via
ptrace() on x86?
--david
On Thu, 12 Jun 2003, David Mosberger wrote:
>
> Is it possible to constrain the FIXADDR range on x86/x86-64
> (FIXADDR_START-FIXADDR_TOP) such that the entire range is read-only by
> user-level? If so, we could simplify the permission test like this:
Well, you could replace the uses of FIXADDR_START/FIXADDR_TOP with
something like FIXADDR_USER_START/FIXADDR_USER_TOP, and then force those
to cover only the _one_ user-accessible page.
Something like
#define FIXADDR_USER_START (fix_to_virt(FIX_VSYSCALL))
#define FIXADDR_USER_END (FIXADDR_USER_START + PAGE_SIZE)
should work. In that case you can drop the page table testing, since we
"know" it is safe.
But I'm too lazy to test, so please send a tested patch,
Linus
> I considered a pte_user_read()/pte_user_write()-like approach, but
> rejected it. First of all, it doesn't really help with execute-only
> pages.
The definition I gave was "should be readable by ptrace", and so it works
if that's how you categorize all execute-only pages. But...
> [...], but I'm worried about someone adding other
> execute-only pages further down the road, not being aware that
> ptrace() would cause a potential security problem).
Given that concern, I'll agree with your assessment.
> For ia64, I think we really want to say: if it's accessing the gate
> page, allow reads. There is just no way we can infer that from
> looking at the PTE itself.
>
> Is there really a point in allowing other FIXMAP pages to be read via
> ptrace() on x86?
Currently, none are (because pte_user is only true of the vsyscall page).
For each individual arch, it seems reasonable enough to me to just have
special cases rather than testing the page tables. Rather than the code
now in #ifdef FIXADDR_START it could just call an arch_get_user_pages
function to check for magic user addresses without vmas.
Linus's suggested change is obviously the minimal change from what we have
now. But the arch_get_user_pages idea might be the more conservative
implementation compared to the status quo before my get_user_pages patch.
Thanks,
Roland
>>>>> On Thu, 12 Jun 2003 23:34:35 -0700, Roland McGrath <[email protected]> said:
Roland> Linus's suggested change is obviously the minimal change
Roland> from what we have now. But the arch_get_user_pages idea
Roland> might be the more conservative implementation compared to
Roland> the status quo before my get_user_pages patch.
Could you test Linus' proposal? It would definitely do the trick for
ia64.
--david
> Could you test Linus' proposal? It would definitely do the trick for ia64.
This patch vs 2.5.70 works fine for me on x86. include/asm-x86_64/fixmap.h
needs the macros added for its vsyscall page too.
Enjoy,
Roland
--- linux-2.5.70/include/asm-i386/fixmap.h.~1~ Mon May 26 18:00:21 2003
+++ linux-2.5.70/include/asm-i386/fixmap.h Thu Jun 12 23:46:11 2003
@@ -107,6 +107,14 @@ extern void __set_fixmap (enum fixed_add
#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT))
#define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT)
+/*
+ * This is the range that is readable by user mode, and things
+ * acting like user mode such as get_user_pages.
+ */
+#define FIXADDR_USER_START (__fix_to_virt(FIX_VSYSCALL))
+#define FIXADDR_USER_END (FIXADDR_USER_START + PAGE_SIZE)
+
+
extern void __this_fixmap_does_not_exist(void);
/*
--- linux-2.5.70/mm/memory.c.~1~ Mon May 26 18:00:39 2003
+++ linux-2.5.70/mm/memory.c Thu Jun 12 23:41:04 2003
@@ -689,15 +689,16 @@ int get_user_pages(struct task_struct *t
vma = find_extend_vma(mm, start);
-#ifdef FIXADDR_START
- if (!vma && start >= FIXADDR_START && start < FIXADDR_TOP) {
+#ifdef FIXADDR_USER_START
+ if (!vma &&
+ start >= FIXADDR_USER_START && start < FIXADDR_USER_END) {
static struct vm_area_struct fixmap_vma = {
/* Catch users - if there are any valid
ones, we can make this be "&init_mm" or
something. */
.vm_mm = NULL,
- .vm_start = FIXADDR_START,
- .vm_end = FIXADDR_TOP,
+ .vm_start = FIXADDR_USER_START,
+ .vm_end = FIXADDR_USER_END,
.vm_page_prot = PAGE_READONLY,
.vm_flags = VM_READ | VM_EXEC,
};
@@ -705,6 +706,8 @@ int get_user_pages(struct task_struct *t
pgd_t *pgd;
pmd_t *pmd;
pte_t *pte;
+ if (write) /* user fixmap pages are read-only */
+ return i ? : -EFAULT;
pgd = pgd_offset_k(pg);
if (!pgd)
return i ? : -EFAULT;
@@ -712,8 +715,7 @@ int get_user_pages(struct task_struct *t
if (!pmd)
return i ? : -EFAULT;
pte = pte_offset_kernel(pmd, pg);
- if (!pte || !pte_present(*pte) || !pte_user(*pte) ||
- !(write ? pte_write(*pte) : pte_read(*pte)))
+ if (!pte || !pte_present(*pte))
return i ? : -EFAULT;
if (pages) {
pages[i] = pte_page(*pte);
Roland McGrath <[email protected]> wrote:
>
> static struct vm_area_struct fixmap_vma = {
> /* Catch users - if there are any valid
> ones, we can make this be "&init_mm" or
> something. */
> .vm_mm = NULL,
> - .vm_start = FIXADDR_START,
> - .vm_end = FIXADDR_TOP,
> + .vm_start = FIXADDR_USER_START,
> + .vm_end = FIXADDR_USER_END,
> .vm_page_prot = PAGE_READONLY,
> .vm_flags = VM_READ | VM_EXEC,
> };
Note that the current version of this code does not compile for User Mode
Linux. Its FIXADDR_TOP is not a constant. It would be nice to fix that
this time around.
It appears that this patch will break x86_64, parisc and um.
>>>>> On Thu, 12 Jun 2003 23:56:16 -0700, Roland McGrath <[email protected]> said:
>> Could you test Linus' proposal? It would definitely do the trick
>> for ia64.
Roland> This patch vs 2.5.70 works fine for me on x86.
Roland> include/asm-x86_64/fixmap.h needs the macros added for its
Roland> vsyscall page too.
I updated the ia64 tree accordingly.
Thanks!
--david
Hi Linus, all.
>> Is it possible to constrain the FIXADDR range on x86/x86-64
>> (FIXADDR_START-FIXADDR_TOP) such that the entire range is
>> read-only by user-level? If so, we could simplify the
>> permission test like this:
> Well, you could replace the uses of FIXADDR_START/FIXADDR_TOP
> with something like FIXADDR_USER_START/FIXADDR_USER_TOP, and
> then force those to cover only the _one_ user-accessible page.
>
> Something like
>
> #define FIXADDR_USER_START (fix_to_virt(FIX_VSYSCALL))
> #define FIXADDR_USER_END (FIXADDR_USER_START + PAGE_SIZE)
>
> should work. In that case you can drop the page table testing,
> since we "know" it is safe.
Should FIXADDR_USER_END point to the last byte of the relevant page,
or to the first byte of the following page as per Linus's suggestion?
The above looks like an off-by-one bug to me?
Best wishes from Riley.
---
* Nothing as pretty as a smile, nothing as ugly as a frown.
---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.488 / Virus Database: 287 - Release Date: 5-Jun-2003
> > static struct vm_area_struct fixmap_vma = {
> > /* Catch users - if there are any valid
> > ones, we can make this be "&init_mm" or
> > something. */
> > .vm_mm = NULL,
> > - .vm_start = FIXADDR_START,
> > - .vm_end = FIXADDR_TOP,
> > + .vm_start = FIXADDR_USER_START,
> > + .vm_end = FIXADDR_USER_END,
> > .vm_page_prot = PAGE_READONLY,
> > .vm_flags = VM_READ | VM_EXEC,
> > };
>
> Note that the current version of this code does not compile for User Mode
> Linux. Its FIXADDR_TOP is not a constant. It would be nice to fix that
> this time around.
>
> It appears that this patch will break x86_64, parisc and um.
Its a problem on ppc64 too. I want to put the signal trampolines into
a fixmap area above the stack, ie different places on 32bit and 64bit
executables.
Anton