2007-10-12 07:19:38

by Suleiman Souhlal

[permalink] [raw]
Subject: Re: [PATCH] Don't needlessly dirty mlocked pages when initially faulting them in.


On Jul 26, 2007, at 11:33 PM, Peter Zijlstra wrote:

> On Thu, 2007-07-26 at 17:23 -0700, Andrew Morton wrote:
>> On Thu, 26 Jul 2007 16:52:44 -0700 Suleiman Souhlal
>> <[email protected]> wrote:
>>
>>> make_pages_present() is dirtying mlocked pages if the VMA is
>>> writable, even
>>> though it shouldn't, by telling get_user_pages() to simulate a
>>> write fault.
>>>
>>> A simple way to test this is to mlock a multi-GB file, and then
>>> sync.
>>> The sync will take a long time.
>>
>> ugh, how bad of us.
>>
>>> As far as I can see, it should be safe to just not simulate a
>>> write fault.
>>
>> We pass in "write=1" to force a COW. This is because we want to
>> do all
>> that memory allocation at mlock()-time, not later on, when the app
>> writes
>> to the page.
>
> <snip patch>
>
>> So something sterner will need to be done. I guess the
>> write_access arg to
>> handle_mm_fault() would need to become a three-value thing.
>
> That would be most painfull. Can't we simply set write=0 for shared
> mappings? Those won't have COW to break and are the onces that do
> requires writeback. Anonymous and private mappings do COW but will
> never
> writeback and are thus save to touch with write=1.
>
> How about something like this:
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> mm/memory.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/mm/memory.c
> ===================================================================
> --- linux-2.6.orig/mm/memory.c
> +++ linux-2.6/mm/memory.c
> @@ -2716,7 +2716,12 @@ int make_pages_present(unsigned long add
> vma = find_vma(current->mm, addr);
> if (!vma)
> return -1;
> - write = (vma->vm_flags & VM_WRITE) != 0;
> + /*
> + * We want to touch writable mappings with a write fault in otder
> + * to break COW, except for shared mappings because these don't COW
> + * and we would not want to dirty them for nothing.
> + */
> + write = (vma->vm_flags & VM_WRITE|VM_SHARED) == VM_WRITE;
> BUG_ON(addr >= end);
> BUG_ON(end > vma->vm_end);
> len = DIV_ROUND_UP(end, PAGE_SIZE) - addr/PAGE_SIZE;

Oops, I completely forgot about this. Sorry about that.

It works for me (as long as you put parens around VM_WRITE|VM_SHARED).

Please consider committing it. :-)

Signed-off-by: Suleiman Souhlal <[email protected]>

-- Suleiman


2007-10-12 09:03:39

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH] mm: avoid dirtying shared mappings on mlock

Subject: mm: avoid dirtying shared mappings on mlock

Suleiman noticed that shared mappings get dirtied when mlocked.
Avoid this by teaching make_pages_present about this case.

Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Suleiman Souhlal <[email protected]>
---
mm/memory.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -2719,7 +2719,12 @@ int make_pages_present(unsigned long add
vma = find_vma(current->mm, addr);
if (!vma)
return -1;
- write = (vma->vm_flags & VM_WRITE) != 0;
+ /*
+ * We want to touch writable mappings with a write fault in order
+ * to break COW, except for shared mappings because these don't COW
+ * and we would not want to dirty them for nothing.
+ */
+ write = (vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE;
BUG_ON(addr >= end);
BUG_ON(end > vma->vm_end);
len = DIV_ROUND_UP(end, PAGE_SIZE) - addr/PAGE_SIZE;


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2007-10-12 09:29:00

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] mm: avoid dirtying shared mappings on mlock

On Friday 12 October 2007 19:03, Peter Zijlstra wrote:
> Subject: mm: avoid dirtying shared mappings on mlock
>
> Suleiman noticed that shared mappings get dirtied when mlocked.
> Avoid this by teaching make_pages_present about this case.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> Acked-by: Suleiman Souhlal <[email protected]>

Umm, I don't see the other piece of this thread, so I don't
know what the actual problem was.

But I would really rather not do this. If you do this, then you
now can get random SIGBUSes when you write into the memory if it
can't allocate blocks or ... (some other filesystem specific
condition).

I agree it feels suboptimal, but we've _always_ done this, right?
Is it suddenly a problem? Unless a really nice general way is
made to solve this, optimising it like this makes it harder to
ensure good semantics for all corner cases I think (and then once
the optimisation is in place, it's a lot harder to remove it).

I'll go away and have a better look for the old thread.

> ---
> mm/memory.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/mm/memory.c
> ===================================================================
> --- linux-2.6.orig/mm/memory.c
> +++ linux-2.6/mm/memory.c
> @@ -2719,7 +2719,12 @@ int make_pages_present(unsigned long add
> vma = find_vma(current->mm, addr);
> if (!vma)
> return -1;
> - write = (vma->vm_flags & VM_WRITE) != 0;
> + /*
> + * We want to touch writable mappings with a write fault in order
> + * to break COW, except for shared mappings because these don't COW
> + * and we would not want to dirty them for nothing.
> + */
> + write = (vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE;
> BUG_ON(addr >= end);
> BUG_ON(end > vma->vm_end);
> len = DIV_ROUND_UP(end, PAGE_SIZE) - addr/PAGE_SIZE;

2007-10-12 09:39:48

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] mm: avoid dirtying shared mappings on mlock

On Friday 12 October 2007 02:57, Nick Piggin wrote:
> On Friday 12 October 2007 19:03, Peter Zijlstra wrote:
> > Subject: mm: avoid dirtying shared mappings on mlock
> >
> > Suleiman noticed that shared mappings get dirtied when mlocked.
> > Avoid this by teaching make_pages_present about this case.
> >
> > Signed-off-by: Peter Zijlstra <[email protected]>
> > Acked-by: Suleiman Souhlal <[email protected]>
>
> Umm, I don't see the other piece of this thread, so I don't
> know what the actual problem was.

Found it, but no more clues. Presumably it's some horrible
google workload... they're pretty happy to carry these kinds
of patches internally, right? ;)


> But I would really rather not do this. If you do this, then you
> now can get random SIGBUSes when you write into the memory if it
> can't allocate blocks or ... (some other filesystem specific
> condition).
>
> I agree it feels suboptimal, but we've _always_ done this, right?
> Is it suddenly a problem? Unless a really nice general way is
> made to solve this, optimising it like this makes it harder to
> ensure good semantics for all corner cases I think (and then once
> the optimisation is in place, it's a lot harder to remove it).

Yeah, I really would rather not do this. If we actually go
through the whole fault path in mlock, then it doesn't
really matter what future baggage we attach to fault handlers...
(OK, we still technically have some problems with invalidations,
but mostly they're avoidable unless doing something silly).

How about just doing another PROT_READ mmap, and mlocking that?
(I was going to suggest another syscall, but that's probably
overkill).

2007-10-12 10:37:32

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] mm: avoid dirtying shared mappings on mlock

On Fri, 2007-10-12 at 02:57 +1000, Nick Piggin wrote:
> On Friday 12 October 2007 19:03, Peter Zijlstra wrote:
> > Subject: mm: avoid dirtying shared mappings on mlock
> >
> > Suleiman noticed that shared mappings get dirtied when mlocked.
> > Avoid this by teaching make_pages_present about this case.
> >
> > Signed-off-by: Peter Zijlstra <[email protected]>
> > Acked-by: Suleiman Souhlal <[email protected]>
>
> Umm, I don't see the other piece of this thread, so I don't
> know what the actual problem was.
>
> But I would really rather not do this. If you do this, then you
> now can get random SIGBUSes when you write into the memory if it
> can't allocate blocks or ... (some other filesystem specific
> condition).

I'm not getting this, make_pages_present() only has to ensure all the
pages are read from disk and in memory. How is this different from a
read-scan?

The pages will still be read-only due to dirty tracking, so the first
write will still do page_mkwrite().



Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2007-10-12 10:46:04

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] mm: avoid dirtying shared mappings on mlock

On Friday 12 October 2007 20:37, Peter Zijlstra wrote:
> On Fri, 2007-10-12 at 02:57 +1000, Nick Piggin wrote:
> > On Friday 12 October 2007 19:03, Peter Zijlstra wrote:
> > > Subject: mm: avoid dirtying shared mappings on mlock
> > >
> > > Suleiman noticed that shared mappings get dirtied when mlocked.
> > > Avoid this by teaching make_pages_present about this case.
> > >
> > > Signed-off-by: Peter Zijlstra <[email protected]>
> > > Acked-by: Suleiman Souhlal <[email protected]>
> >
> > Umm, I don't see the other piece of this thread, so I don't
> > know what the actual problem was.
> >
> > But I would really rather not do this. If you do this, then you
> > now can get random SIGBUSes when you write into the memory if it
> > can't allocate blocks or ... (some other filesystem specific
> > condition).
>
> I'm not getting this, make_pages_present() only has to ensure all the
> pages are read from disk and in memory. How is this different from a
> read-scan?

I guess because we've mlocked a region that has PROT_WRITE access...
but actually, I suppose mlock doesn't technically require that we
can write to the memory, only that the page isn't swapped out.

Still, it is nice to be able to have a reasonable guarantee of
writability.


> The pages will still be read-only due to dirty tracking, so the first
> write will still do page_mkwrite().

Which can SIGBUS, no?

2007-10-12 10:50:38

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] mm: avoid dirtying shared mappings on mlock

On Fri, 2007-10-12 at 04:14 +1000, Nick Piggin wrote:
> On Friday 12 October 2007 20:37, Peter Zijlstra wrote:
> > On Fri, 2007-10-12 at 02:57 +1000, Nick Piggin wrote:
> > > On Friday 12 October 2007 19:03, Peter Zijlstra wrote:
> > > > Subject: mm: avoid dirtying shared mappings on mlock
> > > >
> > > > Suleiman noticed that shared mappings get dirtied when mlocked.
> > > > Avoid this by teaching make_pages_present about this case.
> > > >
> > > > Signed-off-by: Peter Zijlstra <[email protected]>
> > > > Acked-by: Suleiman Souhlal <[email protected]>
> > >
> > > Umm, I don't see the other piece of this thread, so I don't
> > > know what the actual problem was.
> > >
> > > But I would really rather not do this. If you do this, then you
> > > now can get random SIGBUSes when you write into the memory if it
> > > can't allocate blocks or ... (some other filesystem specific
> > > condition).
> >
> > I'm not getting this, make_pages_present() only has to ensure all the
> > pages are read from disk and in memory. How is this different from a
> > read-scan?
>
> I guess because we've mlocked a region that has PROT_WRITE access...
> but actually, I suppose mlock doesn't technically require that we
> can write to the memory, only that the page isn't swapped out.
>
> Still, it is nice to be able to have a reasonable guarantee of
> writability.
>
>
> > The pages will still be read-only due to dirty tracking, so the first
> > write will still do page_mkwrite().
>
> Which can SIGBUS, no?

Sure, but that is no different than any other mmap'ed write. I'm not
seeing how an mlocked region is special here.

I agree it would be nice if mmap'ed writes would have better error
reporting than SIGBUS, but such is life.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2007-10-12 12:23:30

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] mm: avoid dirtying shared mappings on mlock

On Friday 12 October 2007 20:50, Peter Zijlstra wrote:
> On Fri, 2007-10-12 at 04:14 +1000, Nick Piggin wrote:
> > On Friday 12 October 2007 20:37, Peter Zijlstra wrote:

> > > The pages will still be read-only due to dirty tracking, so the first
> > > write will still do page_mkwrite().
> >
> > Which can SIGBUS, no?
>
> Sure, but that is no different than any other mmap'ed write. I'm not
> seeing how an mlocked region is special here.

Well it is a change in behaviour (admittedly, so was the change
to SIGBUS mmaped writes in the first place). It's a matter of
semantics I guess. Is the current behaviour actually a _problem_
for anyone? If not, then do we need to change it?

I'm not saying it does matter, just that it might matter ;)

2007-10-12 14:55:31

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] mm: avoid dirtying shared mappings on mlock

On Fri, 12 Oct 2007 12:50:22 +0200
> > > The pages will still be read-only due to dirty tracking, so the
> > > first write will still do page_mkwrite().
> >
> > Which can SIGBUS, no?
>
> Sure, but that is no different than any other mmap'ed write. I'm not
> seeing how an mlocked region is special here.
>
> I agree it would be nice if mmap'ed writes would have better error
> reporting than SIGBUS, but such is life.

well... there's another consideration
people use mlock() in cases where they don't want to go to the
filesystem for paging and stuff as well (think the various iscsi
daemons and other things that get in trouble).. those kind of uses
really use mlock to avoid
1) IO to the filesystem
2) Needing memory allocations for pagefault like things
at least for the more "hidden" cases...

prefaulting everything ready pretty much gives them that... letting
things fault on demand... nicely breaks that.

2007-10-12 14:58:38

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] mm: avoid dirtying shared mappings on mlock

On Fri, 2007-10-12 at 07:53 -0700, Arjan van de Ven wrote:
> On Fri, 12 Oct 2007 12:50:22 +0200
> > > > The pages will still be read-only due to dirty tracking, so the
> > > > first write will still do page_mkwrite().
> > >
> > > Which can SIGBUS, no?
> >
> > Sure, but that is no different than any other mmap'ed write. I'm not
> > seeing how an mlocked region is special here.
> >
> > I agree it would be nice if mmap'ed writes would have better error
> > reporting than SIGBUS, but such is life.
>
> well... there's another consideration
> people use mlock() in cases where they don't want to go to the
> filesystem for paging and stuff as well (think the various iscsi
> daemons and other things that get in trouble).. those kind of uses
> really use mlock to avoid
> 1) IO to the filesystem
> 2) Needing memory allocations for pagefault like things
> at least for the more "hidden" cases...
>
> prefaulting everything ready pretty much gives them that... letting
> things fault on demand... nicely breaks that.

Non of that is changed. So I'm a little puzzled as to which side you
argue.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2007-10-12 17:45:43

by Suleiman Souhlal

[permalink] [raw]
Subject: Re: [PATCH] mm: avoid dirtying shared mappings on mlock


On Oct 12, 2007, at 7:58 AM, Peter Zijlstra wrote:

> On Fri, 2007-10-12 at 07:53 -0700, Arjan van de Ven wrote:
>> On Fri, 12 Oct 2007 12:50:22 +0200
>>>>> The pages will still be read-only due to dirty tracking, so the
>>>>> first write will still do page_mkwrite().
>>>>
>>>> Which can SIGBUS, no?
>>>
>>> Sure, but that is no different than any other mmap'ed write. I'm not
>>> seeing how an mlocked region is special here.
>>>
>>> I agree it would be nice if mmap'ed writes would have better error
>>> reporting than SIGBUS, but such is life.
>>
>> well... there's another consideration
>> people use mlock() in cases where they don't want to go to the
>> filesystem for paging and stuff as well (think the various iscsi
>> daemons and other things that get in trouble).. those kind of uses
>> really use mlock to avoid
>> 1) IO to the filesystem
>> 2) Needing memory allocations for pagefault like things
>> at least for the more "hidden" cases...
>>
>> prefaulting everything ready pretty much gives them that... letting
>> things fault on demand... nicely breaks that.
>
> Non of that is changed. So I'm a little puzzled as to which side you
> argue.

I think this might change the behavior in case you mlock sparse files.
I guess currently the holes disappear when you mlock them, but with
the patch the blocks wouldn't get allocated until they get written to.

-- Suleiman

2007-10-12 17:56:24

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] mm: avoid dirtying shared mappings on mlock

On Fri, 12 Oct 2007 10:45:17 -0700
Suleiman Souhlal <[email protected]> wrote:

>
> On Oct 12, 2007, at 7:58 AM, Peter Zijlstra wrote:
>
> > On Fri, 2007-10-12 at 07:53 -0700, Arjan van de Ven wrote:
> >> On Fri, 12 Oct 2007 12:50:22 +0200
> >>>>> The pages will still be read-only due to dirty tracking, so the
> >>>>> first write will still do page_mkwrite().
> >>>>
> >>>> Which can SIGBUS, no?
> >>>
> >>> Sure, but that is no different than any other mmap'ed write. I'm
> >>> not seeing how an mlocked region is special here.
> >>>
> >>> I agree it would be nice if mmap'ed writes would have better error
> >>> reporting than SIGBUS, but such is life.
> >>
> >> well... there's another consideration
> >> people use mlock() in cases where they don't want to go to the
> >> filesystem for paging and stuff as well (think the various iscsi
> >> daemons and other things that get in trouble).. those kind of uses
> >> really use mlock to avoid
> >> 1) IO to the filesystem
> >> 2) Needing memory allocations for pagefault like things
> >> at least for the more "hidden" cases...
> >>
> >> prefaulting everything ready pretty much gives them that... letting
> >> things fault on demand... nicely breaks that.
> >
> > Non of that is changed. So I'm a little puzzled as to which side you
> > argue.
>
> I think this might change the behavior in case you mlock sparse files.
> I guess currently the holes disappear when you mlock them, but with
> the patch the blocks wouldn't get allocated until they get written to.

eh yeah I forgot to mention this was for the sparse case....

2007-10-12 18:03:42

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] mm: avoid dirtying shared mappings on mlock


On Fri, 2007-10-12 at 10:45 -0700, Suleiman Souhlal wrote:
> On Oct 12, 2007, at 7:58 AM, Peter Zijlstra wrote:
>
> > On Fri, 2007-10-12 at 07:53 -0700, Arjan van de Ven wrote:
> >> On Fri, 12 Oct 2007 12:50:22 +0200
> >>>>> The pages will still be read-only due to dirty tracking, so the
> >>>>> first write will still do page_mkwrite().
> >>>>
> >>>> Which can SIGBUS, no?
> >>>
> >>> Sure, but that is no different than any other mmap'ed write. I'm not
> >>> seeing how an mlocked region is special here.
> >>>
> >>> I agree it would be nice if mmap'ed writes would have better error
> >>> reporting than SIGBUS, but such is life.
> >>
> >> well... there's another consideration
> >> people use mlock() in cases where they don't want to go to the
> >> filesystem for paging and stuff as well (think the various iscsi
> >> daemons and other things that get in trouble).. those kind of uses
> >> really use mlock to avoid
> >> 1) IO to the filesystem
> >> 2) Needing memory allocations for pagefault like things
> >> at least for the more "hidden" cases...
> >>
> >> prefaulting everything ready pretty much gives them that... letting
> >> things fault on demand... nicely breaks that.
> >
> > Non of that is changed. So I'm a little puzzled as to which side you
> > argue.
>
> I think this might change the behavior in case you mlock sparse files.
> I guess currently the holes disappear when you mlock them, but with
> the patch the blocks wouldn't get allocated until they get written to.

Sure, but by point 1 - avoiding IO - that doesn't matter. Once you write
to a shared mapping you'll generate IO and you'll hit kernel allocations
and other delays no matter what you do.