2009-04-24 07:33:18

by Miklos Szeredi

Subject: Re: Why doesn't zap_pte_range() call page_mkwrite()

On Fri, 24 Apr 2009, Miklos Szeredi wrote:
> Hmm, I guess this is a bit nasty: the VM promises filesystems that
> ->page_mkwrite() will be called when the page is dirtied through a
> mapping, _almost_ all of the time. Except when munmap happens to race
> with clear_page_dirty_for_io().
>
> I don't have any ideas how this could be fixed, CC-ing linux-mm...

On second thought, we could possibly just ignore the dirty bit in that
case. Trying to write to a mapping _during_ munmap() will have pretty
undefined results; I don't think any sane application out there should
rely on the results of this.

But who knows, the world is a weird place...

Miklos


2009-04-24 12:59:49

by Chris Mason

Subject: Re: Why doesn't zap_pte_range() call page_mkwrite()

On Fri, 2009-04-24 at 09:33 +0200, Miklos Szeredi wrote:
> On Fri, 24 Apr 2009, Miklos Szeredi wrote:
> > Hmm, I guess this is a bit nasty: the VM promises filesystems that
> > ->page_mkwrite() will be called when the page is dirtied through a
> > mapping, _almost_ all of the time. Except when munmap happens to race
> > with clear_page_dirty_for_io().
> >
> > I don't have any ideas how this could be fixed, CC-ing linux-mm...
>
> On second thought, we could possibly just ignore the dirty bit in that
> case. Trying to write to a mapping _during_ munmap() will have pretty
> undefined results, I don't think any sane application out there should
> rely on the results of this.
>
> But who knows, the world is a weird place...

It does happen in practice: btrfs has fallback code that triggers
page_mkwrite when it finds a dirty page that wasn't dirtied with help
from the FS.

I'd love to get rid of the fallback ;)

-chris



2009-04-24 13:31:54

by Trond Myklebust

Subject: Re: Why doesn't zap_pte_range() call page_mkwrite()

On Fri, 2009-04-24 at 08:59 -0400, Chris Mason wrote:
> On Fri, 2009-04-24 at 09:33 +0200, Miklos Szeredi wrote:
> > On Fri, 24 Apr 2009, Miklos Szeredi wrote:
> > > Hmm, I guess this is a bit nasty: the VM promises filesystems that
> > > ->page_mkwrite() will be called when the page is dirtied through a
> > > mapping, _almost_ all of the time. Except when munmap happens to race
> > > with clear_page_dirty_for_io().
> > >
> > > I don't have any ideas how this could be fixed, CC-ing linux-mm...
> >
> > On second thought, we could possibly just ignore the dirty bit in that
> > case. Trying to write to a mapping _during_ munmap() will have pretty
> > undefined results, I don't think any sane application out there should
> > rely on the results of this.
> >
> > But who knows, the world is a weird place...
>
> It does happen in practice, btrfs has fallback code that triggers the
> page_mkwrite when it finds a dirty page that wasn't dirtied with help
> from the FS.
>
> I'd love to get rid of the fallback ;)

So is there any reason why we shouldn't put calls to page_mkwrite in
zap_pte_range?

The only alternative I can think of would be to unmap the page when the
filesystem starts to write it out in order to force another page fault
if the user application writes more data into that page.

Cheers
Trond

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2009-04-24 14:06:58

by Trond Myklebust

Subject: Re: Why doesn't zap_pte_range() call page_mkwrite()

On Fri, 2009-04-24 at 09:31 -0400, Trond Myklebust wrote:
> The only alternative I can think of would be to unmap the page when the
> filesystem starts to write it out in order to force another page fault
> if the user application writes more data into that page.

Actually, this might be fairly trivial to implement in NFS. We'd tag the
nfs_page request as having been created by page_mkwrite(), then unmap
any such tagged page in the ->writepage() callback (assuming that
calling unmap_mapping_range() from ->writepage() is allowed?).

AFAICS that should get rid of those residual dirty ptes in sys_munmap().
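
To make the idea concrete, here is a toy user-space model of the tag-then-unmap scheme. It is only a sketch: struct toy_page and the toy_* functions are hypothetical names invented for illustration, not kernel or NFS APIs, and the model ignores locking as well as the open question above of whether unmap_mapping_range() may be called from ->writepage().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one page-cache page; not a kernel structure. */
struct toy_page {
	bool pte_writable;   /* a writable user mapping of the page exists */
	bool tagged_mkwrite; /* the request was created via the mkwrite path */
	bool dirty;          /* page holds data not yet written out */
	int mkwrite_calls;   /* how often the "filesystem" was notified */
};

/* Stand-in for ->page_mkwrite(): notify the filesystem, tag the request. */
static void toy_page_mkwrite(struct toy_page *p)
{
	assert(p != NULL);
	p->mkwrite_calls++;
	p->tagged_mkwrite = true;
}

/* A user store through the mapping. Without a writable pte it faults first. */
static void toy_user_write(struct toy_page *p)
{
	assert(p != NULL);
	if (!p->pte_writable) {
		toy_page_mkwrite(p);
		p->pte_writable = true;
	}
	p->dirty = true;
}

/* Stand-in for ->writepage(): unmap tagged pages before writing them out. */
static void toy_writepage(struct toy_page *p)
{
	assert(p != NULL);
	if (p->tagged_mkwrite) {
		p->pte_writable = false; /* models unmap_mapping_range() */
		p->tagged_mkwrite = false;
	}
	p->dirty = false;
}
```

In this model, every store that follows a toy_writepage() has to fault again, so the filesystem is notified before the page can be redirtied.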

Cheers
Trond


2009-04-24 16:18:19

by Jamie Lokier

Subject: Re: Why doesn't zap_pte_range() call page_mkwrite()

Miklos Szeredi wrote:
> On Fri, 24 Apr 2009, Miklos Szeredi wrote:
> > Hmm, I guess this is a bit nasty: the VM promises filesystems that
> > ->page_mkwrite() will be called when the page is dirtied through a
> > mapping, _almost_ all of the time. Except when munmap happens to race
> > with clear_page_dirty_for_io().
> >
> > I don't have any ideas how this could be fixed, CC-ing linux-mm...
>
> On second thought, we could possibly just ignore the dirty bit in that
> case. Trying to write to a mapping _during_ munmap() will have pretty
> undefined results, I don't think any sane application out there should
> rely on the results of this.
>
> But who knows, the world is a weird place...

I think it's a sane but unusual thing to do.

App has a thread writing to random places in a mapped file, and
another calling munmap() or mprotect() to trap writes to some parts of
the file in order to track what parts the first thread is dirtying.
Second thread's SIGSEGV handler reinstates those mappings. First
thread doesn't know about any of this, it just writes and the only
side effect is timing. Or should be.
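
A minimal sketch of that pattern, using the mprotect() variant. The track_* names are made up for this example, and it assumes Linux, an anonymous shared mapping standing in for the mapped file, and a single tracked region of at most TRACK_MAX_PAGES pages. SIGSEGV is delivered to the faulting thread, so the writer needs no knowledge of the tracker.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define TRACK_MAX_PAGES 64

static char *track_base;        /* start of the tracked mapping */
static size_t track_len;        /* length of the mapping in bytes */
static size_t track_page_size;
static volatile sig_atomic_t track_dirty[TRACK_MAX_PAGES];

/* SIGSEGV handler: record the written page, then reinstate write access. */
static void track_on_segv(int sig, siginfo_t *info, void *ctx)
{
	char *addr = (char *)info->si_addr;
	size_t idx;

	(void)ctx;
	if (addr < track_base || addr >= track_base + track_len) {
		signal(sig, SIG_DFL); /* not ours: let the retried access crash */
		return;
	}
	idx = (size_t)(addr - track_base) / track_page_size;
	track_dirty[idx] = 1;
	/* mprotect() is not on the POSIX async-signal-safe list; calling it
	 * here is a common Linux idiom that this sketch relies on. */
	mprotect(track_base + idx * track_page_size, track_page_size,
		 PROT_READ | PROT_WRITE);
}

/* Forget all recorded writes and write-protect the whole region again. */
static int track_rearm(void)
{
	size_t i;

	for (i = 0; i < TRACK_MAX_PAGES; i++)
		track_dirty[i] = 0;
	return mprotect(track_base, track_len, PROT_READ);
}

/* Map npages pages, install the handler and start trapping writes. */
static int track_start(size_t npages)
{
	struct sigaction sa;

	assert(npages > 0 && npages <= TRACK_MAX_PAGES);
	track_page_size = (size_t)sysconf(_SC_PAGESIZE);
	track_len = npages * track_page_size;
	track_base = mmap(NULL, track_len, PROT_READ | PROT_WRITE,
			  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (track_base == MAP_FAILED)
		return -1;
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = track_on_segv;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, NULL) != 0)
		return -1;
	return track_rearm();
}

/* The "first thread": a plain store that knows nothing about the tracker. */
static void track_write(size_t off, char value)
{
	*(volatile char *)(track_base + off) = value;
}

static int track_is_dirty(size_t page)
{
	return track_dirty[page] != 0;
}
```

After track_start(4), a call such as track_write(2 * track_page_size + 5, 'x') faults once, the handler marks page 2 and restores write access, and the store is then retried and succeeds. The writer sees nothing but the delay.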

Think garbage collection, change tracking, tracing, and debugging.

-- Jamie
