2005-09-14 21:23:28

by Andrea Arcangeli

[permalink] [raw]
Subject: ptrace can't be transparent on readonly MAP_SHARED

Hello,

This patch went in recently:

http://kernel.org/hg/linux-2.6/?cmd=changeset;node=04131690c4cf45c50aedc8bec2366198e52eaf4e

I think it's wrong to allow exceptions to the rule in handle_mm_fault
(i.e. to allow ptes to be marked readonly after a write fault). The
current band-aid in get_user_pages that checks the VM_FAULT_WRITE is
just a side effect of breaking that rule.

The real bug is the fact maybe_mkwrite exists in the first place, and
the other hacks in get_user_pages (I mean the lookup_write now replaced
by the VM_FAULT_WRITE hack that at least now doesn't lockup anymore).

As soon as you generate a write fault on a readonly pte, you broke
coherency with the disk data, so the app _can_ break no matter what a
kernel developer wants to do. In turn this means the userland developer
doing the debugging must know what is going on regardless of all this
unnnecessary complexity. No matter what you change in the kernel it'll
still break.

Of course we can't write to the pagecache either, so we should just cow
and leave the pte writeable, the coherency would be lost regardless,
what _pratical_ gain do we have from marking it read-write when it's not
pagecache anymore?

So I suggest to backout the patch:

http://kernel.org/hg/linux-2.6/?cmd=changeset;node=04131690c4cf45c50aedc8bec2366198e52eaf4e;style=raw

and replace with this below code that makes the code simpler and faster
as well (removes a branch from the fast path of the page fault handler
that never does anything except for the extremely unlikely and unfixable
ptrace case).

I'm unsure if the below patch applies cleanly after reversing the above
changeset, but I can cleanup if there's some agreement that this is the
way to go.

About generating anonymous pages on top of map_shared that should be
fine with the vm, the way anon-vma works, it already happens for
map_private and it's not conceputally different for anon-vma to deal
with overlap with map-shared or map-private. So I don't think we need to
forbid ptrace (i.e. gdb) to write to a readonly map shared or stuff like
that.

Comments welcome. (especially if you see any bug in my simpler approach
please let me know because that's how I fixed the DoS in some kernel ;)
thanks!

----------------------------------

From: Andrea Arcangeli <[email protected]>
Subject: better fix, no need of VM_FAULT_RACE

readonly mappings will break regardless of what we do, by the time the
cow happened we lost coherency and the app can break randomly. So I
don't think it worth the complexity and speecial case of break_cow
marking the pte readonly. Making it read-write looks cleaner and simpler
and it's the natural effect that we _wrote_ and cowed the page.

This is the 2.4 way too.

Signed-off-by: Andrea Arcangeli <[email protected]>

memory.c | 41 ++++++++---------------------------------
1 files changed, 8 insertions(+), 33 deletions(-)

Index: sp2/mm/memory.c
--- sp2/mm/memory.c.orig 2005-09-12 23:52:42.000000000 +0200
+++ sp2/mm/memory.c 2005-09-13 00:02:08.000000000 +0200
@@ -814,8 +814,7 @@ int get_user_pages(struct task_struct *t
spin_lock(&mm->page_table_lock);
do {
struct page *map;
- int lookup_write = write;
- while (!(map = follow_page(mm, start, lookup_write))) {
+ while (!(map = follow_page(mm, start, write))) {
/*
* Shortcut for anonymous pages. We don't want
* to force the creation of pages tables for
@@ -823,7 +822,7 @@ int get_user_pages(struct task_struct *t
* nobody touched so far. This is important
* for doing a core dump for these mappings.
*/
- if (!lookup_write &&
+ if (!write &&
untouched_anonymous_page(mm,vma,start)) {
map = ZERO_PAGE(start);
break;
@@ -843,14 +842,6 @@ int get_user_pages(struct task_struct *t
default:
BUG();
}
- /*
- * Now that we have performed a write fault
- * and surely no longer have a shared page we
- * shouldn't write, we shouldn't ignore an
- * unwritable page in the page table if
- * we are forcing write access.
- */
- lookup_write = write && !force;
spin_lock(&mm->page_table_lock);
}
if (pages) {
@@ -1042,19 +1033,6 @@ int remap_page_range(struct vm_area_stru
EXPORT_SYMBOL(remap_page_range);

/*
- * Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when
- * servicing faults for write access. In the normal case, do always want
- * pte_mkwrite. But get_user_pages can cause write faults for mappings
- * that do not have writing enabled, when used by access_process_vm.
- */
-static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
-{
- if (likely(vma->vm_flags & VM_WRITE))
- pte = pte_mkwrite(pte);
- return pte;
-}
-
-/*
* We hold the mm semaphore for reading and vma->vm_mm->page_table_lock
*/
static inline void break_cow(struct vm_area_struct * vma, struct page * new_page, unsigned long address,
@@ -1063,8 +1041,7 @@ static inline void break_cow(struct vm_a
pte_t entry;

flush_cache_page(vma, address);
- entry = maybe_mkwrite(pte_mkdirty(mk_pte(new_page, vma->vm_page_prot)),
- vma);
+ entry = pte_mkwrite(pte_mkdirty(mk_pte(new_page, vma->vm_page_prot)));
ptep_establish(vma, address, page_table, entry);
update_mmu_cache(vma, address, entry);
lazy_mmu_prot_update(entry);
@@ -1116,8 +1093,7 @@ static int do_wp_page(struct mm_struct *
unlock_page(old_page);
if (reuse) {
flush_cache_page(vma, address);
- entry = maybe_mkwrite(pte_mkyoung(pte_mkdirty(pte)),
- vma);
+ entry = pte_mkwrite(pte_mkyoung(pte_mkdirty(pte)));
ptep_establish(vma, address, page_table, entry);
update_mmu_cache(vma, address, entry);
lazy_mmu_prot_update(entry);
@@ -1461,7 +1437,7 @@ static int do_swap_page(struct mm_struct

pte = mk_pte(page, vma->vm_page_prot);
if (write_access && can_share_swap_page(page))
- pte = maybe_mkwrite(pte_mkdirty(pte), vma);
+ pte = pte_mkwrite(pte_mkdirty(pte));
unlock_page(page);

flush_icache_page(vma, page);
@@ -1520,9 +1496,8 @@ do_anonymous_page(struct mm_struct *mm,
mm->rss++;
acct_update_integrals();
update_mem_hiwater();
- entry = maybe_mkwrite(pte_mkdirty(mk_pte(page,
- vma->vm_page_prot)),
- vma);
+ entry = pte_mkwrite(pte_mkdirty(mk_pte(page,
+ vma->vm_page_prot)));
lru_cache_add_active(page);
mark_page_accessed(page);
anon = 1;
@@ -1658,7 +1633,7 @@ retry:
flush_icache_page(vma, new_page);
entry = mk_pte(new_page, vma->vm_page_prot);
if (write_access)
- entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+ entry = pte_mkwrite(pte_mkdirty(entry));
set_pte(page_table, entry);
if (likely(pageable))
page_add_rmap(new_page, vma, address, anon);


2005-09-15 04:05:21

by Nick Piggin

[permalink] [raw]
Subject: Re: ptrace can't be transparent on readonly MAP_SHARED




>>>Andrea Arcangeli <[email protected]> 09/15/05 7:24 am >>>
>
>About generating anonymous pages on top of map_shared that should be
>fine with the vm, the way anon-vma works, it already happens for
>map_private and it's not conceputally different for anon-vma to deal
>with overlap with map-shared or map-private. So I don't think we need
to
>forbid ptrace (i.e. gdb) to write to a readonly map shared or stuff
like
>that.
>
>Comments welcome. (especially if you see any bug in my simpler approach

>please let me know because that's how I fixed the DoS in some kernel ;)

>thanks!
>

I like the look of the patch.

I would like to go one step further and simply disallow writing to
MAP_SHARED memory full stop. It eliminates so many corner cases and
weird behaviour (ie. after writing to a readonly MAP_SHARED, the process
will no longer see updates to the file).

Actually, maybe that's too much. I imagine on a shared memory
application
there would be use in changing writeable MAP_SHARED memory in a
debugger.
How about we disallow writing to readonly MAP_SHARED?

However going back a step - I still think Andrea's patch is nicer than
what we have now.

2005-09-15 13:19:10

by Hugh Dickins

[permalink] [raw]
Subject: Re: ptrace can't be transparent on readonly MAP_SHARED

On Wed, 14 Sep 2005, Andrea Arcangeli wrote:
>
> This patch went in recently:
>
> http://kernel.org/hg/linux-2.6/?cmd=changeset;node=04131690c4cf45c50aedc8bec2366198e52eaf4e
>
> I think it's wrong to allow exceptions to the rule in handle_mm_fault
> (i.e. to allow ptes to be marked readonly after a write fault). The
> current band-aid in get_user_pages that checks the VM_FAULT_WRITE is
> just a side effect of breaking that rule.
>
> The real bug is the fact maybe_mkwrite exists in the first place, and
> the other hacks in get_user_pages (I mean the lookup_write now replaced
> by the VM_FAULT_WRITE hack that at least now doesn't lockup anymore).
>
> As soon as you generate a write fault on a readonly pte, you broke
> coherency with the disk data, so the app _can_ break no matter what a
> kernel developer wants to do. In turn this means the userland developer
> doing the debugging must know what is going on regardless of all this
> unnnecessary complexity. No matter what you change in the kernel it'll
> still break.
>
> Of course we can't write to the pagecache either, so we should just cow
> and leave the pte writeable, the coherency would be lost regardless,
> what _pratical_ gain do we have from marking it read-write when it's not
> pagecache anymore?
>
> [ suggests reverting 2.6.13's VM_FAULT_WRITE and 2.6.4's maybe_mkwrite ]

Just like you and Nick, I'd be happy to revert maybe_mkwrite, and our
recent patch to rescue it. But I don't know the pressures which led
to its introduction in the first place. CC'ed Roland who introduced
it in 2.6.4, and Linus who supported it in the recent discussions.

[PATCH] prevent ptrace from altering page permissions

From: Roland McGrath <[email protected]>

Under some circumstances, ptrace PEEK/POKE_TEXT can cause page permissions
to be permanently changed. Thsi causes changes in application behaviour
when run under gdb.

Fix that by only marking the pte as writeable if the vma is marked for
writing. A write fault thus unshares the page but doesn't necessarily make
it writeable.

Personally, I've little interest in whether the pte is left writable
or not (but prefer your simplicity of pte_mkwrite to maybe_mkwrite).
I'm more concerned about the anomaly of a page being written in an
vma marked unwritable. It's been like that for a long time, I'm not
getting around to thinking it through for now. But certainly it
subverts the Committed_AS calculations, and needs an audit for other
assumptions. I've wondered whether we actually need ptrace to mprotect
(so splitting the vma) to handle it correctly.

> About generating anonymous pages on top of map_shared that should be
> fine with the vm, the way anon-vma works, it already happens for
> map_private and it's not conceputally different for anon-vma to deal
> with overlap with map-shared or map-private. So I don't think we need to
> forbid ptrace (i.e. gdb) to write to a readonly map shared or stuff like
> that.

It's certainly a big surprise to find an anonymous page in a shared vma,
bigger than just writing to a page in an unwritable vma. But you're
quite right that it doesn't conflict with anon_vma design: just with
other assumptions elsewhere. For example, where Nick's recent patch to
copy_page_range tests anon_vma is okay; whereas my VM_LOCKED|VM_MAYSHARE
test in page_referenced_file is strictly wrong (the file page may now be
unlocked because it's been replaced by an anonymous copy); there are
probably other places more seriously wrong.

I think get_user_pages' test
flags = write ? (VM_WRITE | VM_MAYWRITE) : (VM_READ | VM_MAYREAD);
was probably written without any thought for MAP_SHARED. Because it allows
ptrace to modify a write-protected MAP_SHARED vma if and only if the file
was opened for writing; but since do_wp_page COWs in this case (I thought
that a mistake, but you and Linus insist we not write to the file), what
relevance has it whether the mmap'ed file was opened for writing or not?

Anyway, let's hear Roland defend maybe_mkwrite.

Hugh

2005-09-15 15:13:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: ptrace can't be transparent on readonly MAP_SHARED



On Thu, 15 Sep 2005, Hugh Dickins wrote:
>
> Just like you and Nick, I'd be happy to revert maybe_mkwrite, and our
> recent patch to rescue it. But I don't know the pressures which led
> to its introduction in the first place. CC'ed Roland who introduced
> it in 2.6.4, and Linus who supported it in the recent discussions.

It is clear that ptrace() _has_ to change semantics when it writes to a
shared page. That's fundamental. There's no way to avoid the COW, and yes,
that COW will have strange behaviour because it's semantically illegal (ie
the changed page will not be shared with anything else, including
necessarily subsequent fork()'ed children depending on if/how we change
that at some point).

So the question is whether we do the _minimal_ semantic changes (just COW)
or whether we do even more semantic breakage.

Roland's patch tries to keep the semantic breakage to a minimum. The COW
is forced upon us and there's no way we can avoid it, but the permission
change can be temporary and local to the ptrace, which is what we do now.

I personally think that it's a pretty ugly thing, but all the real
ugliness is in the COW part - which we can't avoid. Rolands change is less
so. In fact, in many ways I think Roland's change is quite nice: you can
have a PROT_READONLY/PROT_NONE area that is visible from the debugger, but
continues to cause SIGSEGV's if the user process itself tries to access
it. To me, that's good.

There would have to be some real advantage to _not_ doing what we're doing
now. And I don't see an advantage.

The real complexity is not "maybe_mkwrite()", which is trivial. The real
complexity is the fact that we COW, but that's not something we can avoid,
and that's not something that has anything to do with maybe_mkwrite().

Linus

2005-09-15 15:46:58

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: ptrace can't be transparent on readonly MAP_SHARED

On Thu, Sep 15, 2005 at 08:12:59AM -0700, Linus Torvalds wrote:
> have a PROT_READONLY/PROT_NONE area that is visible from the debugger, but
> continues to cause SIGSEGV's if the user process itself tries to access
> it. To me, that's good.

Continue to cause sigsegv yes, but on the wrong page, when it will read
the page it can contain different data compared to what is on
disk/pagecache.

> There would have to be some real advantage to _not_ doing what we're doing
> now. And I don't see an advantage.

The advantage is a faster fast path and less special cases to keep in
mind.

> The real complexity is not "maybe_mkwrite()", which is trivial. The real

It is trivial yes, but for it to work without deadlocks, it requires
non-trivial changes to the page fault handler and get_user_pages.

I guess this is mostly a matter of taste, but my taste is about keeping
it simple and fast (though the difference is certainly not measurable).

Thanks.

2005-09-15 16:13:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: ptrace can't be transparent on readonly MAP_SHARED



On Thu, 15 Sep 2005, Andrea Arcangeli wrote:

> On Thu, Sep 15, 2005 at 08:12:59AM -0700, Linus Torvalds wrote:
> > have a PROT_READONLY/PROT_NONE area that is visible from the debugger, but
> > continues to cause SIGSEGV's if the user process itself tries to access
> > it. To me, that's good.
>
> Continue to cause sigsegv yes, but on the wrong page, when it will read
> the page it can contain different data compared to what is on
> disk/pagecache.

So? You're not making any sense.

I repeat: we CANNOT AVOID the fact that we will do COW.

That COW is required. No way we can avoid it. It has _nothing_ to do with
maybe_mkwrite().

So I don't know why you continually refuse to just admit that fact. Why do
you mix up the COW semantics with the maybe_mkwrite() semantics.

If you can't argue against maybe_mkwrite() without involving the COW
argument, then stop arguing. They are two totally different thigns.

Linus

2005-09-15 16:23:40

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: ptrace can't be transparent on readonly MAP_SHARED

On Thu, Sep 15, 2005 at 09:13:02AM -0700, Linus Torvalds wrote:
>
>
> On Thu, 15 Sep 2005, Andrea Arcangeli wrote:
>
> > On Thu, Sep 15, 2005 at 08:12:59AM -0700, Linus Torvalds wrote:
> > > have a PROT_READONLY/PROT_NONE area that is visible from the debugger, but
> > > continues to cause SIGSEGV's if the user process itself tries to access
> > > it. To me, that's good.
> >
> > Continue to cause sigsegv yes, but on the wrong page, when it will read
> > the page it can contain different data compared to what is on
> > disk/pagecache.
>
> So? You're not making any sense.

I'll try again: what is the point of still getting page faults on writes
when the first read will contain the wrong data?

And what is the point of writing to a prot_none with ptrace? That really
makes no sense.

I'm not saying the cow should go away, I'm saying that marking the pte
readonly after writing to it makes no sense.

I've said maybe_mkwrite makes no sense, I didn't say the cow makes no
sense.

> I repeat: we CANNOT AVOID the fact that we will do COW.
>
> That COW is required. No way we can avoid it. It has _nothing_ to do with
> maybe_mkwrite().
>
> So I don't know why you continually refuse to just admit that fact. Why do
> you mix up the COW semantics with the maybe_mkwrite() semantics.
>
> If you can't argue against maybe_mkwrite() without involving the COW
> argument, then stop arguing. They are two totally different thigns.

I wasn't arguing about that, infact Hugh noticed that I suggested that
avoiding the cow is not an option (he's the one that disagreed with you
on this if something).

2005-09-15 16:34:49

by Linus Torvalds

[permalink] [raw]
Subject: Re: ptrace can't be transparent on readonly MAP_SHARED



On Thu, 15 Sep 2005, Andrea Arcangeli wrote:
>
> I'll try again: what is the point of still getting page faults on writes
> when the first read will contain the wrong data?

What's the point of having the page AT ALL if the data is wrong?

You are _still_ arguing that the "data" and the "page fault" are somehow
connected. They aren't.

If you think the data is wrong, then you are arguing against the COW. Yes,
the COW will make the data "wrong", but you can't escape that. That's what
a "write" by ptrace does.

Btw, that's true even if we didn't do the COW - the COW just makes it even
more so. But even without the COW, the ptrace has written data that the
process didn't expect, and the process didn't write.

Here's a big clue. A ptrace PTRACE_POKE-induced write WRITES DATA.

Afterwards, the data is different from what if would have been if the
ptrace hadn't written. It's "wrong". Tough titties. It's what ptrace does.
Live with it. If you don't want wrong data, don't use ptrace to write
wrong data.

However, you seem to confuse "write data" with "write data and make the
page writable".

And as long as you continue to mix the two, there's no point in talking
about it. They are different.

To recap: PTRACE_POKE _will_ write "wrong data" to the process. Part of it
directly (the actual data written), and part of it indirectly (the fact
that it has to break the COW connection in order to do the write). THAT IS
INESCAPABLE, AND IT IS A DIRECT RESULT OF PTRACE_POKE.

And it has _nothing_ to do with whether we fault afterwards or not.

Linus

2005-09-15 16:51:11

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: ptrace can't be transparent on readonly MAP_SHARED

On Thu, Sep 15, 2005 at 09:34:12AM -0700, Linus Torvalds wrote:
> If you think the data is wrong, then you are arguing against the COW. Yes,
> the COW will make the data "wrong", but you can't escape that. That's what
> a "write" by ptrace does.

My point is that exactly because this is the wrong page with the wrong
data with the wrong ptrace usage, these kind of things will happen in a
very controlled environment, so if you can deal with the wrong data and
the wrong page with wrong ptrace usage, you can sure deal with the pte
being writeable too. Especially the ptrace over PROT_NONE that you
mentioned is a total nosense, so that is really wrong ptrace usage, it's
not worth it to even try to make transparent such a wrong usage of ptrace.

I don't see why the kernel should bother to fix an unfixable case.

The only thing we might want to argue about is the ptrace over anonymous
memory (not the subject here but still affected by my changes), that
is the only one that can work right and it does currently, but even that
is quite a corner case that doesn't seem worth it IMHO since ptrace on
readonly vma is a corner case since the first place.

2005-09-15 17:52:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: ptrace can't be transparent on readonly MAP_SHARED



On Thu, 15 Sep 2005, Andrea Arcangeli wrote:
>
> On Thu, Sep 15, 2005 at 09:34:12AM -0700, Linus Torvalds wrote:
> > If you think the data is wrong, then you are arguing against the COW. Yes,
> > the COW will make the data "wrong", but you can't escape that. That's what
> > a "write" by ptrace does.
>
> My point is that exactly because this is the wrong page with the wrong
> data with the wrong ptrace usage, these kind of things will happen in a
> very controlled environment

I disagree.

_The_ most common use of PTRACE_POKE is debugging. And gdb doesn't "know"
what the process is doing.

> I don't see why the kernel should bother to fix an unfixable case.

I don't see why you argue against trying to do our best.

Let me tell you why you're wrong, with a real-life example.

I debugged "sparse" with making all de-allocations turn the page
protections to PROT_NONE.

That's absolutely _wonderful_. It means that when the process does a bad
access, I get a SIGSEGV (and the memory hasn't been allocated to something
else, as would happen if I had just munmapped it). And when debugging
this, I could _see_ what the old contents were - because PTRACE_PEEK
could punch through the protections - so I had lots of extra information.

For example, I could look at the free'd data to see where it came from,
because the free'd data actually contained pointers to other stuff, giving
me a very good view of _what_ had gotten free'd too early.

Now, in this case, a SIGSEGV was always fatal, but the fact is, some
applications actually catch it, and do their own VM management. It's rare,
but it definitely exists. So should the fact that I looked at the data
make that the protections changed? HELL NO! My debugger may have looked at
it, but that doesn't invalidate the notion that the program still expected
to get a SIGSEGV on it.

And the PTRACE_POKE is _exactly_ the same thing. There's _zero_
difference. The fact that PTRACE_POKE _changes_ the data instead of just
reading it doesn't change anything at all - the fact that data got changed
in NO WAY invalidates the fact that processes might still depend on
getting a SIGSEGV.

If you want to change permissions, then add a PTRACE_MPROTECT thing or
something, and teach gdb to use that. But if you don't want to change
permissions, then you shouldn't change permissions.

Now, if you have a technical reason why "maybe_mkwrite()" needs to go
away, then that's a different thing. BUT IT HAS NOTHING TO DO WITH THE
FACT THAT WE LOOKED AT OR CHANGED THE DATA!

Linus

2005-09-15 18:09:19

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: ptrace can't be transparent on readonly MAP_SHARED

On Thu, Sep 15, 2005 at 10:52:14AM -0700, Linus Torvalds wrote:
> And the PTRACE_POKE is _exactly_ the same thing. There's _zero_
> difference. The fact that PTRACE_POKE _changes_ the data instead of just
> reading it doesn't change anything at all - the fact that data got changed
> in NO WAY invalidates the fact that processes might still depend on
> getting a SIGSEGV.

And this process may as well depend to see the on-disk changes that
other threads are doing on the shared memory, and that will break
regardless of what Linus changes in the kernel.

You also didn't make up any useful example where _writing_ (not reading
like in your example) was involved. Your example is totally offtopic,
since it only involved reading as far as I can tell.

I can't imagine where writing to a PROT_NONE is actually useful.

> Now, if you have a technical reason why "maybe_mkwrite()" needs to go
> away, then that's a different thing. BUT IT HAS NOTHING TO DO WITH THE
> FACT THAT WE LOOKED AT OR CHANGED THE DATA!

It just looks unnecessary cruft, but I'll stick with kernel crashing
bugs for my own safety.

2005-09-15 21:09:36

by Daniel Jacobowitz

[permalink] [raw]
Subject: Re: ptrace can't be transparent on readonly MAP_SHARED

On Thu, Sep 15, 2005 at 08:09:28PM +0200, Andrea Arcangeli wrote:
> On Thu, Sep 15, 2005 at 10:52:14AM -0700, Linus Torvalds wrote:
> > And the PTRACE_POKE is _exactly_ the same thing. There's _zero_
> > difference. The fact that PTRACE_POKE _changes_ the data instead of just
> > reading it doesn't change anything at all - the fact that data got changed
> > in NO WAY invalidates the fact that processes might still depend on
> > getting a SIGSEGV.
>
> And this process may as well depend to see the on-disk changes that
> other threads are doing on the shared memory, and that will break
> regardless of what Linus changes in the kernel.
>
> You also didn't make up any useful example where _writing_ (not reading
> like in your example) was involved. Your example is totally offtopic,
> since it only involved reading as far as I can tell.
>
> I can't imagine where writing to a PROT_NONE is actually useful.

Well, you won't like this example any better, then, but this was a
frequently reported GDB bug for a while:

const int x;
int main()
{
*x = 1;
return 0;
}

x goes in rodata -> text segment -> on the same page as main. If you
run to main in GDB, the page becomes writable. The store doesn't
crash. If you run it out of GDB, it crashes.

Sure, the trivial example's uninteresting. But you can construct a
larger example with, say, *foo() = x replaced by *foo = x. That's not
legal in C for a function foo, of course. But you could probably
manage it in some other language, or in asm. So you debug right around
where you're getting a crash in your application, and it doesn't crash.

Ptrace needs to be as unintrusive as possible. Having the page COW
unexpectedly is a lot less bad than having it COW _and_ remain
writable.

--
Daniel Jacobowitz
CodeSourcery, LLC

2005-09-15 21:57:54

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: ptrace can't be transparent on readonly MAP_SHARED

On Thu, Sep 15, 2005 at 05:09:31PM -0400, Daniel Jacobowitz wrote:
> Well, you won't like this example any better, then, but this was a
> frequently reported GDB bug for a while:
>
> const int x;
> int main()
> {
> *x = 1;
> return 0;
> }
>
> x goes in rodata -> text segment -> on the same page as main. If you
> run to main in GDB, the page becomes writable. The store doesn't
> crash. If you run it out of GDB, it crashes.
>
> Sure, the trivial example's uninteresting. But you can construct a
> larger example with, say, *foo() = x replaced by *foo = x. That's not
> legal in C for a function foo, of course. But you could probably
> manage it in some other language, or in asm. So you debug right around
> where you're getting a crash in your application, and it doesn't crash.
>
> Ptrace needs to be as unintrusive as possible. Having the page COW
> unexpectedly is a lot less bad than having it COW _and_ remain
> writable.

Well this is the first real life example that explains why having
the pte marked readonly might actually make a difference in practice
(debugging a segfault when overwriting .rodata makes sense), so I like
it a lot better and infact I now for the first time can see why it can
help.

.text is not going to change on-disk, so the coherency loss for the
pagecache is not significant but losing the readonly protection would
prevent the segfault to trigger and so it makes debugging more fancy.

It certainly wasn't related to any PROT_NONE.

Thanks.