2007-02-27 23:16:35

by Miklos Szeredi

[permalink] [raw]
Subject: [patch 01/22] update ctime and mtime for mmaped write

From: Miklos Szeredi <[email protected]>

Changes:
o moved check from __fput() to remove_vma(), which is more logical
o changed set_page_dirty() to set_page_dirty_mapping in hugetlb.c
o cleaned up #ifdef CONFIG_BLOCK mess


This patch makes writing to shared memory mappings update st_ctime and
st_mtime as defined by SUSv3:

The st_ctime and st_mtime fields of a file that is mapped with
MAP_SHARED and PROT_WRITE shall be marked for update at some point
in the interval between a write reference to the mapped region and
the next call to msync() with MS_ASYNC or MS_SYNC for that portion
of the file by any process. If there is no such call and if the
underlying file is modified as a result of a write reference, then
these fields shall be marked for update at some time after the
write reference.

A new address_space flag is introduced: AS_CMTIME. This is set each
time a page is dirtied through a userspace memory mapping. This
includes write accesses via get_user_pages().

Note, the flag is set unconditionally, even if the page is already
dirty. This is important, because the page might have been dirtied
earlier by a non-mmap write.

This flag is checked in msync() and munmap()/mremap(), and if set, the
file times are updated and the flag is cleared

The flag is also cleared, if the time update is triggered by a normal
write. This is not mandated by the standard, but seems to be a sane
thing to do.

Fixes Novell Bugzilla #206431.

Inspired by Peter Staubach's patch and the resulting comments.

Signed-off-by: Miklos Szeredi <[email protected]>
---

Index: linux/include/linux/pagemap.h
===================================================================
--- linux.orig/include/linux/pagemap.h 2007-02-27 14:41:06.000000000 +0100
+++ linux/include/linux/pagemap.h 2007-02-27 14:41:07.000000000 +0100
@@ -18,6 +18,7 @@
*/
#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */
#define AS_ENOSPC (__GFP_BITS_SHIFT + 1) /* ENOSPC on async write */
+#define AS_CMTIME (__GFP_BITS_SHIFT + 2) /* ctime/mtime update needed */

static inline void mapping_set_error(struct address_space * mapping, int error)
{
Index: linux/mm/msync.c
===================================================================
--- linux.orig/mm/msync.c 2007-02-27 14:40:56.000000000 +0100
+++ linux/mm/msync.c 2007-02-27 14:41:07.000000000 +0100
@@ -77,6 +77,7 @@ asmlinkage long sys_msync(unsigned long
}
file = vma->vm_file;
start = vma->vm_end;
+ mapping_update_time(file);
if ((flags & MS_SYNC) && file &&
(vma->vm_flags & VM_SHARED)) {
get_file(file);
Index: linux/fs/inode.c
===================================================================
--- linux.orig/fs/inode.c 2007-02-27 14:40:56.000000000 +0100
+++ linux/fs/inode.c 2007-02-27 14:41:07.000000000 +0100
@@ -1218,6 +1218,7 @@ void file_update_time(struct file *file)
struct timespec now;
int sync_it = 0;

+ clear_bit(AS_CMTIME, &file->f_mapping->flags);
if (IS_NOCMTIME(inode))
return;
if (IS_RDONLY(inode))
@@ -1240,6 +1241,12 @@ void file_update_time(struct file *file)

EXPORT_SYMBOL(file_update_time);

+void mapping_update_time(struct file *file)
+{
+ if (test_bit(AS_CMTIME, &file->f_mapping->flags))
+ file_update_time(file);
+}
+
int inode_needs_sync(struct inode *inode)
{
if (IS_SYNC(inode))
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h 2007-02-27 14:40:56.000000000 +0100
+++ linux/include/linux/fs.h 2007-02-27 14:41:07.000000000 +0100
@@ -1893,6 +1893,7 @@ extern int inode_change_ok(struct inode
extern int __must_check inode_setattr(struct inode *, struct iattr *);

extern void file_update_time(struct file *file);
+extern void mapping_update_time(struct file *file);

static inline ino_t parent_ino(struct dentry *dentry)
{
Index: linux/include/linux/mm.h
===================================================================
--- linux.orig/include/linux/mm.h 2007-02-27 14:40:56.000000000 +0100
+++ linux/include/linux/mm.h 2007-02-27 14:41:07.000000000 +0100
@@ -791,6 +791,7 @@ int redirty_page_for_writepage(struct wr
struct page *page);
int FASTCALL(set_page_dirty(struct page *page));
int set_page_dirty_lock(struct page *page);
+int set_page_dirty_mapping(struct page *page);
int clear_page_dirty_for_io(struct page *page);

extern unsigned long do_mremap(unsigned long addr,
Index: linux/mm/memory.c
===================================================================
--- linux.orig/mm/memory.c 2007-02-27 14:40:56.000000000 +0100
+++ linux/mm/memory.c 2007-02-27 14:41:07.000000000 +0100
@@ -676,7 +676,7 @@ static unsigned long zap_pte_range(struc
anon_rss--;
else {
if (pte_dirty(ptent))
- set_page_dirty(page);
+ set_page_dirty_mapping(page);
if (pte_young(ptent))
SetPageReferenced(page);
file_rss--;
@@ -954,7 +954,7 @@ struct page *follow_page(struct vm_area_
if (flags & FOLL_TOUCH) {
if ((flags & FOLL_WRITE) &&
!pte_dirty(pte) && !PageDirty(page))
- set_page_dirty(page);
+ set_page_dirty_mapping(page);
mark_page_accessed(page);
}
unlock:
@@ -1514,6 +1514,15 @@ static inline void cow_user_page(struct
copy_user_highpage(dst, src, va, vma);
}

+static void set_page_dirty_mapping_balance(struct page *page)
+{
+ if (set_page_dirty_mapping(page)) {
+ struct address_space *mapping = page_mapping(page);
+ if (mapping)
+ balance_dirty_pages_ratelimited(mapping);
+ }
+}
+
/*
* This routine handles present pages, when users try to write
* to a shared page. It is done by copying the page to a new address
@@ -1664,7 +1673,7 @@ gotten:
unlock:
pte_unmap_unlock(page_table, ptl);
if (dirty_page) {
- set_page_dirty_balance(dirty_page);
+ set_page_dirty_mapping_balance(dirty_page);
put_page(dirty_page);
}
return ret;
@@ -2316,7 +2325,7 @@ retry:
unlock:
pte_unmap_unlock(page_table, ptl);
if (dirty_page) {
- set_page_dirty_balance(dirty_page);
+ set_page_dirty_mapping_balance(dirty_page);
put_page(dirty_page);
}
return ret;
Index: linux/mm/page-writeback.c
===================================================================
--- linux.orig/mm/page-writeback.c 2007-02-27 14:41:06.000000000 +0100
+++ linux/mm/page-writeback.c 2007-02-27 14:41:07.000000000 +0100
@@ -244,16 +244,6 @@ static void balance_dirty_pages(struct a
pdflush_operation(background_writeout, 0);
}

-void set_page_dirty_balance(struct page *page)
-{
- if (set_page_dirty(page)) {
- struct address_space *mapping = page_mapping(page);
-
- if (mapping)
- balance_dirty_pages_ratelimited(mapping);
- }
-}
-
/**
* balance_dirty_pages_ratelimited_nr - balance dirty memory state
* @mapping: address_space which was dirtied
@@ -798,6 +788,10 @@ int redirty_page_for_writepage(struct wr
}
EXPORT_SYMBOL(redirty_page_for_writepage);

+#ifndef CONFIG_BLOCK
+#define __set_page_dirty_buffers NULL
+#endif
+
/*
* If the mapping doesn't provide a set_page_dirty a_op, then
* just fall through and assume that it wants buffer_heads.
@@ -808,10 +802,8 @@ int fastcall set_page_dirty(struct page

if (likely(mapping)) {
int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
-#ifdef CONFIG_BLOCK
if (!spd)
spd = __set_page_dirty_buffers;
-#endif
return (*spd)(page);
}
if (!PageDirty(page)) {
@@ -823,6 +815,28 @@ int fastcall set_page_dirty(struct page
EXPORT_SYMBOL(set_page_dirty);

/*
+ * Special set_page_dirty() variant for dirtiness coming from a memory
+ * mapping. In this case the ctime/mtime update flag needs to be set.
+ */
+int set_page_dirty_mapping(struct page *page)
+{
+ struct address_space *mapping = page_mapping(page);
+
+ if (likely(mapping)) {
+ int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
+ if (!spd)
+ spd = __set_page_dirty_buffers;
+ set_bit(AS_CMTIME, &mapping->flags);
+ return (*spd)(page);
+ }
+ if (!PageDirty(page)) {
+ if (!TestSetPageDirty(page))
+ return 1;
+ }
+ return 0;
+}
+
+/*
* set_page_dirty() is racy if the caller has no reference against
* page->mapping->host, and if the page is unlocked. This is because another
* CPU could truncate the page off the mapping and then free the mapping.
Index: linux/mm/rmap.c
===================================================================
--- linux.orig/mm/rmap.c 2007-02-27 14:40:56.000000000 +0100
+++ linux/mm/rmap.c 2007-02-27 14:41:07.000000000 +0100
@@ -598,7 +598,7 @@ void page_remove_rmap(struct page *page,
* faster for those pages still in swapcache.
*/
if (page_test_and_clear_dirty(page))
- set_page_dirty(page);
+ set_page_dirty_mapping(page);
__dec_zone_page_state(page,
PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED);
}
@@ -643,7 +643,7 @@ static int try_to_unmap_one(struct page

/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
- set_page_dirty(page);
+ set_page_dirty_mapping(page);

/* Update high watermark before we lower rss */
update_hiwater_rss(mm);
@@ -777,7 +777,7 @@ static void try_to_unmap_cluster(unsigne

/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
- set_page_dirty(page);
+ set_page_dirty_mapping(page);

page_remove_rmap(page, vma);
page_cache_release(page);
Index: linux/include/linux/writeback.h
===================================================================
--- linux.orig/include/linux/writeback.h 2007-02-27 14:40:56.000000000 +0100
+++ linux/include/linux/writeback.h 2007-02-27 14:41:07.000000000 +0100
@@ -117,7 +117,6 @@ int sync_page_range(struct inode *inode,
loff_t pos, loff_t count);
int sync_page_range_nolock(struct inode *inode, struct address_space *mapping,
loff_t pos, loff_t count);
-void set_page_dirty_balance(struct page *page);
void writeback_set_ratelimit(void);

/* pdflush.c */
Index: linux/mm/mmap.c
===================================================================
--- linux.orig/mm/mmap.c 2007-02-27 14:40:56.000000000 +0100
+++ linux/mm/mmap.c 2007-02-27 14:41:07.000000000 +0100
@@ -226,8 +226,10 @@ static struct vm_area_struct *remove_vma
might_sleep();
if (vma->vm_ops && vma->vm_ops->close)
vma->vm_ops->close(vma);
- if (vma->vm_file)
+ if (vma->vm_file) {
+ mapping_update_time(vma->vm_file);
fput(vma->vm_file);
+ }
mpol_free(vma_policy(vma));
kmem_cache_free(vm_area_cachep, vma);
return next;
Index: linux/mm/hugetlb.c
===================================================================
--- linux.orig/mm/hugetlb.c 2007-02-27 14:40:56.000000000 +0100
+++ linux/mm/hugetlb.c 2007-02-27 14:41:07.000000000 +0100
@@ -390,7 +390,7 @@ void __unmap_hugepage_range(struct vm_ar

page = pte_page(pte);
if (pte_dirty(pte))
- set_page_dirty(page);
+ set_page_dirty_mapping(page);
list_add(&page->lru, &page_list);
}
spin_unlock(&mm->page_table_lock);

--


2007-02-28 14:16:35

by Peter Staubach

[permalink] [raw]
Subject: Re: [patch 01/22] update ctime and mtime for mmaped write

Miklos Szeredi wrote:
> From: Miklos Szeredi <[email protected]>
>
> Changes:
> o moved check from __fput() to remove_vma(), which is more logical
> o changed set_page_dirty() to set_page_dirty_mapping in hugetlb.c
> o cleaned up #ifdef CONFIG_BLOCK mess
>
>
> This patch makes writing to shared memory mappings update st_ctime and
> st_mtime as defined by SUSv3:
>
> The st_ctime and st_mtime fields of a file that is mapped with
> MAP_SHARED and PROT_WRITE shall be marked for update at some point
> in the interval between a write reference to the mapped region and
> the next call to msync() with MS_ASYNC or MS_SYNC for that portion
> of the file by any process. If there is no such call and if the
> underlying file is modified as a result of a write reference, then
> these fields shall be marked for update at some time after the
> write reference.
>
> A new address_space flag is introduced: AS_CMTIME. This is set each
> time a page is dirtied through a userspace memory mapping. This
> includes write accesses via get_user_pages().
>
> Note, the flag is set unconditionally, even if the page is already
> dirty. This is important, because the page might have been dirtied
> earlier by a non-mmap write.
>
> This flag is checked in msync() and munmap()/mremap(), and if set, the
> file times are updated and the flag is cleared
>
> The flag is also cleared, if the time update is triggered by a normal
> write. This is not mandated by the standard, but seems to be a sane
> thing to do.
>
> Fixes Novell Bugzilla #206431.
>
> Inspired by Peter Staubach's patch and the resulting comments.
>
> Signed-off-by: Miklos Szeredi <[email protected]>

These change still have the undesirable property that although the
modified pages may be flushed to stable storage, the metadata on
the file will not be updated until the application takes positive
action. This is permissible given the current wording in the
specifications, but it would be much more desirable if sync(2),
fsync(P), or the inode being written out due to normal system
activity would also cause the metadata to be updated.

Perhaps the setting of the flag could be checked in some places
like __sync_single_inode() and do_fsync()?

Thanx...

ps

2007-02-28 17:07:14

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [patch 01/22] update ctime and mtime for mmaped write

> These change still have the undesirable property that although the
> modified pages may be flushed to stable storage, the metadata on
> the file will not be updated until the application takes positive
> action. This is permissible given the current wording in the
> specifications, but it would be much more desirable if sync(2),
> fsync(P), or the inode being written out due to normal system
> activity would also cause the metadata to be updated.
>
> Perhaps the setting of the flag could be checked in some places
> like __sync_single_inode() and do_fsync()?

I don't see the point in updating the timestamp from these functions.

The file isn't _modified_ by sync() or fsync(). Just as it's not
modified by stat().

sync() and fsync() do cache->disk, while the file itself stays the
same.

OTOH msync(MS_ASYNC) does memory->file, which is a conceptually file
modifying operation. OK, msync(MS_ASYNC) is actually a no-op on
2.6.18+, but that's purely an implementation detail and no application
should be relying on it.

Before 2.6.18 sync() or fsync() acually didn't flush data written
through a shared mapping to disk, only msync(MS_SYNC), because the
dirty state was only available in the page tables, not in the page or
the inode.

Miklos

2007-02-28 17:22:56

by Peter Staubach

[permalink] [raw]
Subject: Re: [patch 01/22] update ctime and mtime for mmaped write

Miklos Szeredi wrote:
>> These change still have the undesirable property that although the
>> modified pages may be flushed to stable storage, the metadata on
>> the file will not be updated until the application takes positive
>> action. This is permissible given the current wording in the
>> specifications, but it would be much more desirable if sync(2),
>> fsync(P), or the inode being written out due to normal system
>> activity would also cause the metadata to be updated.
>>
>> Perhaps the setting of the flag could be checked in some places
>> like __sync_single_inode() and do_fsync()?
>>
>
> I don't see the point in updating the timestamp from these functions.
>
> The file isn't _modified_ by sync() or fsync(). Just as it's not
> modified by stat().
>
> sync() and fsync() do cache->disk, while the file itself stays the
> same.
>
> OTOH msync(MS_ASYNC) does memory->file, which is a conceptually file
> modifying operation. OK, msync(MS_ASYNC) is actually a no-op on
> 2.6.18+, but that's purely an implementation detail and no application
> should be relying on it.
>
> Before 2.6.18 sync() or fsync() acually didn't flush data written
> through a shared mapping to disk, only msync(MS_SYNC), because the
> dirty state was only available in the page tables, not in the page or
> the inode.

While these entry points do not actually modify the file itself,
as was pointed out, they are handy points at which the kernel gains
control and could actually notice that the contents of the file are
no longer the same as they were, ie. modified.

From the operating system viewpoint, this is where the semantics of
modification to file contents via mmap differs from the semantics of
modification to file contents via write(2).

It is desirable for the file times to be updated as quickly as
possible after the actual modification has occurred. However, this
can only happen when the kernel has a chance to gain control and
either look or notice that a page has been modified.

---

A better design for all of this would be to update the file times
and mark the inode as needing to be written out when a page fault
is taken for a page which either does not exist or needs to be made
writable and that page is part of an appropriate style mapping.

ps

2007-02-28 17:52:28

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [patch 01/22] update ctime and mtime for mmaped write

> While these entry points do not actually modify the file itself,
> as was pointed out, they are handy points at which the kernel gains
> control and could actually notice that the contents of the file are
> no longer the same as they were, ie. modified.
>
> From the operating system viewpoint, this is where the semantics of
> modification to file contents via mmap differs from the semantics of
> modification to file contents via write(2).
>
> It is desirable for the file times to be updated as quickly as
> possible after the actual modification has occurred.

I disagree.

You don't worry about the timestamp being updated _during_ a large
write() call, even though the file is constantly being modified.

You think of write() as something instantaneous, while you think of
writing to a shared mapping, then doing msync() as something taking a
long time. In actual fact both of these are basically equivalent
operations, the differences being, that you can easily modify
non-contiguous parts of a file with mmap, while you can't do that with
write. The disadvantage from mmap comes from the cost of setting up
the page tables and handling the faults.

Think of it this way:

shared mmap write + msync(MS_ASYNC) == write()
msync(MS_ASYNC) + fsync() == msync(MS_SYNC)

> A better design for all of this would be to update the file times
> and mark the inode as needing to be written out when a page fault
> is taken for a page which either does not exist or needs to be made
> writable and that page is part of an appropriate style mapping.

I think this would just be a waste of CPU.

Miklos

2007-02-28 20:01:28

by Peter Staubach

[permalink] [raw]
Subject: Re: [patch 01/22] update ctime and mtime for mmaped write

Miklos Szeredi wrote:
>> While these entry points do not actually modify the file itself,
>> as was pointed out, they are handy points at which the kernel gains
>> control and could actually notice that the contents of the file are
>> no longer the same as they were, ie. modified.
>>
>> From the operating system viewpoint, this is where the semantics of
>> modification to file contents via mmap differs from the semantics of
>> modification to file contents via write(2).
>>
>> It is desirable for the file times to be updated as quickly as
>> possible after the actual modification has occurred.
>>
>
> I disagree.
>
> You don't worry about the timestamp being updated _during_ a large
> write() call, even though the file is constantly being modified.
>
>

No, but you do worry about the timestamps being updated after
every write() call, no matter how large or small.

> You think of write() as something instantaneous, while you think of
> writing to a shared mapping, then doing msync() as something taking a
> long time. In actual fact both of these are basically equivalent
> operations, the differences being, that you can easily modify
> non-contiguous parts of a file with mmap, while you can't do that with
> write. The disadvantage from mmap comes from the cost of setting up
> the page tables and handling the faults.
>
> Think of it this way:
>
> shared mmap write + msync(MS_ASYNC) == write()
> msync(MS_ASYNC) + fsync() == msync(MS_SYNC)
>
>

I don't believe that this is a valid characterization because the
changes to the contents of the file, made through the mmap'd region,
are immediately visible to any and all other applications accessing
the file. Since the contents of the file are changing, then so
should the timestamps to reflect this.

>> A better design for all of this would be to update the file times
>> and mark the inode as needing to be written out when a page fault
>> is taken for a page which either does not exist or needs to be made
>> writable and that page is part of an appropriate style mapping.
>>
>
> I think this would just be a waste of CPU.

I think that we are going to have to agree to disagree because
I don't agree either with your characterizations of the desirable
semantics associated with shared mmap or that maintaining the
correctness in the system is a waste of CPU.

I view mmap as a way for an application to treat the contents of
a file as another segment in its address space. This allows it to
manipulate the contents of a file without incurring the overhead
of the read and write system calls and the double buffering that
naturally occurs with those system calls. I think that:

char *p = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
*p = 1;
*(p + 4096) = 2;

should have the same effect as:

char c = 1;
pwrite(fd, &c, 1, 0);
c = 2;
pwrite(fd, &c, 1, 4096);

Clearly, the two can't be equivalent since the operating system
can only become involved at certain times in order to update the
timestamps. That's why there are specifications about the
timestamps for things like msync. They should be as close as
possible though.

However, since I seem to be the only one presenting a different
viewpoint, then I will agree to disagree and commit. I will see
if I can sell your semantics to my customer and find out if that
will satisfy them.

Thanx...

ps

2007-02-28 20:36:04

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [patch 01/22] update ctime and mtime for mmaped write

> >> While these entry points do not actually modify the file itself,
> >> as was pointed out, they are handy points at which the kernel gains
> >> control and could actually notice that the contents of the file are
> >> no longer the same as they were, ie. modified.
> >>
> >> From the operating system viewpoint, this is where the semantics of
> >> modification to file contents via mmap differs from the semantics of
> >> modification to file contents via write(2).
> >>
> >> It is desirable for the file times to be updated as quickly as
> >> possible after the actual modification has occurred.
> >>
> >
> > I disagree.
> >
> > You don't worry about the timestamp being updated _during_ a large
> > write() call, even though the file is constantly being modified.
> >
> >
>
> No, but you do worry about the timestamps being updated after
> every write() call, no matter how large or small.

Right. All I'm saying is that just writing to a shared mapping
without calling msync() is similar to a write() which hasn't yet
finished. In both cases, you have a modified file, without a modified
timestamp.

> > You think of write() as something instantaneous, while you think of
> > writing to a shared mapping, then doing msync() as something taking a
> > long time. In actual fact both of these are basically equivalent
> > operations, the differences being, that you can easily modify
> > non-contiguous parts of a file with mmap, while you can't do that with
> > write. The disadvantage from mmap comes from the cost of setting up
> > the page tables and handling the faults.
> >
> > Think of it this way:
> >
> > shared mmap write + msync(MS_ASYNC) == write()
> > msync(MS_ASYNC) + fsync() == msync(MS_SYNC)
> >
> >
>
> I don't believe that this is a valid characterization because the
> changes to the contents of the file, made through the mmap'd region,
> are immediately visible to any and all other applications accessing
> the file. Since the contents of the file are changing, then so
> should the timestamps to reflect this.

Same case with a large write(). Nothing prevents you from reading a
file, while a huge write is taking place to it, and yet, the
modification time isn't updated.

> I think that we are going to have to agree to disagree because
> I don't agree either with your characterizations of the desirable
> semantics associated with shared mmap or that maintaining the
> correctness in the system is a waste of CPU.

I didn't quite say _that_ in so many words :). I said that updating
the timestamp on a per-page first dirtying base, or per-inode first
dirtying base is a waste of effort. Why?

What happens if the application overwrites what it had written some
time later? Nothing. The page is already read-write, the pte dirty,
so even though the file was clearly modified, there's absolutely no
way in which this can be used to force an update to the timestamp.

Is there anything special about the _first_ modification? I don't
think so. From an external application's point of view it doesn't
matter one whit, whether a modification was through write() or after a
page-fault, or on an already present read-write page.

So what exactly _are_ the semantics we are trying to achieve?

> I view mmap as a way for an application to treat the contents of a
> file as another segment in its address space. This allows it to
> manipulate the contents of a file without incurring the overhead of
> the read and write system calls and the double buffering that
> naturally occurs with those system calls. I think that:
>
> char *p = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> *p = 1;
> *(p + 4096) = 2;
>
> should have the same effect as:
>
> char c = 1;
> pwrite(fd, &c, 1, 0);
> c = 2;
> pwrite(fd, &c, 1, 4096);

Not necessarily. This is the equivalent _portable_ call sequence:

char *p = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
*p = 1;
*(p + 4096) = 2;
msync(p, 4097, MS_ASYNC);

Yes, on linux the prior would work too, but there's really no point in
allowing applications to be lax and not do it properly. But we've
been over this.

Miklos

2007-02-28 20:59:30

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [patch 01/22] update ctime and mtime for mmaped write

> What happens if the application overwrites what it had written some
> time later? Nothing. The page is already read-write, the pte dirty,
> so even though the file was clearly modified, there's absolutely no
> way in which this can be used to force an update to the timestamp.

Which, I realize now, actually means, that the patch is wrong. Msync
will have to write protect the page table entries, so that later
dirtyings may have an effect on the timestamp.

Thanks,
Miklos

2007-02-28 21:09:38

by Peter Staubach

[permalink] [raw]
Subject: Re: [patch 01/22] update ctime and mtime for mmaped write

Miklos Szeredi wrote:
>> What happens if the application overwrites what it had written some
>> time later? Nothing. The page is already read-write, the pte dirty,
>> so even though the file was clearly modified, there's absolutely no
>> way in which this can be used to force an update to the timestamp.
>>
>
> Which, I realize now, actually means, that the patch is wrong. Msync
> will have to write protect the page table entries, so that later
> dirtyings may have an effect on the timestamp.

I thought that PeterZ's changes were to write-protect the page after
cleaning it so that future modifications could be detected and tracked
accordingly? Does the right thing not happen already?

Thanx...

ps