2007-02-21 17:53:26

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH] update ctime and mtime for mmaped write

From: Miklos Szeredi <[email protected]>

This patch makes writing to shared memory mappings update st_ctime and
st_mtime as defined by SUSv3:

The st_ctime and st_mtime fields of a file that is mapped with
MAP_SHARED and PROT_WRITE shall be marked for update at some point
in the interval between a write reference to the mapped region and
the next call to msync() with MS_ASYNC or MS_SYNC for that portion
of the file by any process. If there is no such call and if the
underlying file is modified as a result of a write reference, then
these fields shall be marked for update at some time after the
write reference.

A new address_space flag is introduced: AS_CMTIME. This is set each
time a page is dirtied through a userspace memory mapping. This
includes write accesses via get_user_pages().

Note, the flag is set unconditionally, even if the page is already
dirty. This is important, because the page might have been dirtied
earlier by a non-mmap write.

This flag is checked in msync() and __fput(), and if set, the file
times are updated and the flag is cleared

The flag is also cleared, if the time update is triggered by a normal
write. This is not mandated by the standard, but seems to be a sane
thing to do.

Fixes Novell Bugzilla #206431.

Inspired by Peter Staubach's patch and the resulting comments.

Signed-off-by: Miklos Szeredi <[email protected]>
---

Index: linux/include/linux/pagemap.h
===================================================================
--- linux.orig/include/linux/pagemap.h 2007-02-21 14:15:06.000000000 +0100
+++ linux/include/linux/pagemap.h 2007-02-21 14:16:04.000000000 +0100
@@ -18,6 +18,7 @@
*/
#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */
#define AS_ENOSPC (__GFP_BITS_SHIFT + 1) /* ENOSPC on async write */
+#define AS_CMTIME (__GFP_BITS_SHIFT + 2) /* ctime/mtime update needed */

static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
{
Index: linux/mm/msync.c
===================================================================
--- linux.orig/mm/msync.c 2007-02-21 14:14:43.000000000 +0100
+++ linux/mm/msync.c 2007-02-21 14:16:04.000000000 +0100
@@ -77,6 +77,7 @@ asmlinkage long sys_msync(unsigned long
}
file = vma->vm_file;
start = vma->vm_end;
+ mapping_update_time(file);
if ((flags & MS_SYNC) && file &&
(vma->vm_flags & VM_SHARED)) {
get_file(file);
Index: linux/fs/file_table.c
===================================================================
--- linux.orig/fs/file_table.c 2007-02-21 14:14:43.000000000 +0100
+++ linux/fs/file_table.c 2007-02-21 14:16:04.000000000 +0100
@@ -165,6 +165,7 @@ void fastcall __fput(struct file *file)
*/
eventpoll_release(file);
locks_remove_flock(file);
+ mapping_update_time(file);

if (file->f_op && file->f_op->release)
file->f_op->release(inode, file);
Index: linux/fs/inode.c
===================================================================
--- linux.orig/fs/inode.c 2007-02-21 14:14:43.000000000 +0100
+++ linux/fs/inode.c 2007-02-21 14:16:04.000000000 +0100
@@ -1219,6 +1219,7 @@ void file_update_time(struct file *file)
struct timespec now;
int sync_it = 0;

+ clear_bit(AS_CMTIME, &file->f_mapping->flags);
if (IS_NOCMTIME(inode))
return;
if (IS_RDONLY(inode))
@@ -1241,6 +1242,12 @@ void file_update_time(struct file *file)

EXPORT_SYMBOL(file_update_time);

+void mapping_update_time(struct file *file)
+{
+ if (test_bit(AS_CMTIME, &file->f_mapping->flags))
+ file_update_time(file);
+}
+
int inode_needs_sync(struct inode *inode)
{
if (IS_SYNC(inode))
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h 2007-02-21 14:15:06.000000000 +0100
+++ linux/include/linux/fs.h 2007-02-21 14:16:04.000000000 +0100
@@ -1887,6 +1887,7 @@ extern int inode_change_ok(struct inode
extern int __must_check inode_setattr(struct inode *, struct iattr *);

extern void file_update_time(struct file *file);
+extern void mapping_update_time(struct file *file);

static inline ino_t parent_ino(struct dentry *dentry)
{
Index: linux/include/linux/mm.h
===================================================================
--- linux.orig/include/linux/mm.h 2007-02-21 14:14:43.000000000 +0100
+++ linux/include/linux/mm.h 2007-02-21 14:16:04.000000000 +0100
@@ -790,6 +790,7 @@ int redirty_page_for_writepage(struct wr
struct page *page);
int FASTCALL(set_page_dirty(struct page *page));
int set_page_dirty_lock(struct page *page);
+int set_page_dirty_mapping(struct page *page);
int clear_page_dirty_for_io(struct page *page);

extern unsigned long do_mremap(unsigned long addr,
Index: linux/mm/memory.c
===================================================================
--- linux.orig/mm/memory.c 2007-02-21 14:15:06.000000000 +0100
+++ linux/mm/memory.c 2007-02-21 14:16:04.000000000 +0100
@@ -676,7 +676,7 @@ static unsigned long zap_pte_range(struc
anon_rss--;
else {
if (pte_dirty(ptent))
- set_page_dirty(page);
+ set_page_dirty_mapping(page);
if (pte_young(ptent))
mark_page_accessed(page);
file_rss--;
@@ -954,7 +954,7 @@ struct page *follow_page(struct vm_area_
if (flags & FOLL_TOUCH) {
if ((flags & FOLL_WRITE) &&
!pte_dirty(pte) && !PageDirty(page))
- set_page_dirty(page);
+ set_page_dirty_mapping(page);
mark_page_accessed(page);
}
unlock:
@@ -1469,6 +1469,15 @@ static inline void cow_user_page(struct
copy_user_highpage(dst, src, va, vma);
}

+static void set_page_dirty_mapping_balance(struct page *page)
+{
+ if (set_page_dirty_mapping(page)) {
+ struct address_space *mapping = page_mapping(page);
+ if (mapping)
+ balance_dirty_pages_ratelimited(mapping);
+ }
+}
+
/*
* This routine handles present pages, when users try to write
* to a shared page. It is done by copying the page to a new address
@@ -1620,7 +1629,7 @@ gotten:
unlock:
pte_unmap_unlock(page_table, ptl);
if (dirty_page) {
- set_page_dirty_balance(dirty_page);
+ set_page_dirty_mapping_balance(dirty_page);
put_page(dirty_page);
}
return ret;
@@ -2274,7 +2283,7 @@ retry:
unlock:
pte_unmap_unlock(page_table, ptl);
if (dirty_page) {
- set_page_dirty_balance(dirty_page);
+ set_page_dirty_mapping_balance(dirty_page);
put_page(dirty_page);
}
return ret;
Index: linux/mm/page-writeback.c
===================================================================
--- linux.orig/mm/page-writeback.c 2007-02-21 14:15:54.000000000 +0100
+++ linux/mm/page-writeback.c 2007-02-21 14:16:04.000000000 +0100
@@ -244,16 +244,6 @@ static void balance_dirty_pages(struct a
pdflush_operation(background_writeout, 0);
}

-void set_page_dirty_balance(struct page *page)
-{
- if (set_page_dirty(page)) {
- struct address_space *mapping = page_mapping(page);
-
- if (mapping)
- balance_dirty_pages_ratelimited(mapping);
- }
-}
-
/**
* balance_dirty_pages_ratelimited_nr - balance dirty memory state
* @mapping: address_space which was dirtied
@@ -819,6 +809,30 @@ int fastcall set_page_dirty(struct page
EXPORT_SYMBOL(set_page_dirty);

/*
+ * Special set_page_dirty() variant for dirtiness coming from a memory
+ * mapping. In this case the ctime/mtime update flag needs to be set.
+ */
+int set_page_dirty_mapping(struct page *page)
+{
+ struct address_space *mapping = page_mapping(page);
+
+ if (likely(mapping)) {
+ int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
+#ifdef CONFIG_BLOCK
+ if (!spd)
+ spd = __set_page_dirty_buffers;
+#endif
+ set_bit(AS_CMTIME, &mapping->flags);
+ return (*spd)(page);
+ }
+ if (!PageDirty(page)) {
+ if (!TestSetPageDirty(page))
+ return 1;
+ }
+ return 0;
+}
+
+/*
* set_page_dirty() is racy if the caller has no reference against
* page->mapping->host, and if the page is unlocked. This is because another
* CPU could truncate the page off the mapping and then free the mapping.
Index: linux/mm/rmap.c
===================================================================
--- linux.orig/mm/rmap.c 2007-02-21 14:14:43.000000000 +0100
+++ linux/mm/rmap.c 2007-02-21 14:16:04.000000000 +0100
@@ -598,7 +598,7 @@ void page_remove_rmap(struct page *page,
* faster for those pages still in swapcache.
*/
if (page_test_and_clear_dirty(page))
- set_page_dirty(page);
+ set_page_dirty_mapping(page);
__dec_zone_page_state(page,
PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED);
}
@@ -643,7 +643,7 @@ static int try_to_unmap_one(struct page

/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
- set_page_dirty(page);
+ set_page_dirty_mapping(page);

/* Update high watermark before we lower rss */
update_hiwater_rss(mm);
@@ -777,7 +777,7 @@ static void try_to_unmap_cluster(unsigne

/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
- set_page_dirty(page);
+ set_page_dirty_mapping(page);

page_remove_rmap(page, vma);
page_cache_release(page);
Index: linux/include/linux/writeback.h
===================================================================
--- linux.orig/include/linux/writeback.h 2007-02-21 14:15:05.000000000 +0100
+++ linux/include/linux/writeback.h 2007-02-21 14:16:04.000000000 +0100
@@ -117,7 +117,6 @@ int sync_page_range(struct inode *inode,
loff_t pos, loff_t count);
int sync_page_range_nolock(struct inode *inode, struct address_space *mapping,
loff_t pos, loff_t count);
-void set_page_dirty_balance(struct page *page);
void writeback_set_ratelimit(void);

/* pdflush.c */


2007-02-21 18:07:21

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

Miklos Szeredi wrote:
> From: Miklos Szeredi <[email protected]>
>
> This patch makes writing to shared memory mappings update st_ctime and
> st_mtime as defined by SUSv3:
>
> The st_ctime and st_mtime fields of a file that is mapped with
> MAP_SHARED and PROT_WRITE shall be marked for update at some point
> in the interval between a write reference to the mapped region and
> the next call to msync() with MS_ASYNC or MS_SYNC for that portion
> of the file by any process. If there is no such call and if the
> underlying file is modified as a result of a write reference, then
> these fields shall be marked for update at some time after the
> write reference.
>
> A new address_space flag is introduced: AS_CMTIME. This is set each
> time a page is dirtied through a userspace memory mapping. This
> includes write accesses via get_user_pages().
>
> Note, the flag is set unconditionally, even if the page is already
> dirty. This is important, because the page might have been dirtied
> earlier by a non-mmap write.
>
> This flag is checked in msync() and __fput(), and if set, the file
> times are updated and the flag is cleared
>
> The flag is also cleared, if the time update is triggered by a normal
> write. This is not mandated by the standard, but seems to be a sane
> thing to do.
>
> Fixes Novell Bugzilla #206431.
>
> Inspired by Peter Staubach's patch and the resulting comments.
>
>

An updated version of the original patch was submitted to LKML
yesterday... :-)

Some comments below --

> Signed-off-by: Miklos Szeredi <[email protected]>
> ---
>
> Index: linux/include/linux/pagemap.h
> ===================================================================
> --- linux.orig/include/linux/pagemap.h 2007-02-21 14:15:06.000000000 +0100
> +++ linux/include/linux/pagemap.h 2007-02-21 14:16:04.000000000 +0100
> @@ -18,6 +18,7 @@
> */
> #define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */
> #define AS_ENOSPC (__GFP_BITS_SHIFT + 1) /* ENOSPC on async write */
> +#define AS_CMTIME (__GFP_BITS_SHIFT + 2) /* ctime/mtime update needed */
>
> static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
> {
> Index: linux/mm/msync.c
> ===================================================================
> --- linux.orig/mm/msync.c 2007-02-21 14:14:43.000000000 +0100
> +++ linux/mm/msync.c 2007-02-21 14:16:04.000000000 +0100
> @@ -77,6 +77,7 @@ asmlinkage long sys_msync(unsigned long
> }
> file = vma->vm_file;
> start = vma->vm_end;
> + mapping_update_time(file);
> if ((flags & MS_SYNC) && file &&
> (vma->vm_flags & VM_SHARED)) {
> get_file(file);
>

It seems to me that this might lead to file times being updated for
non-MAP_SHARED mappings.

> Index: linux/fs/file_table.c
> ===================================================================
> --- linux.orig/fs/file_table.c 2007-02-21 14:14:43.000000000 +0100
> +++ linux/fs/file_table.c 2007-02-21 14:16:04.000000000 +0100
> @@ -165,6 +165,7 @@ void fastcall __fput(struct file *file)
> */
> eventpoll_release(file);
> locks_remove_flock(file);
> + mapping_update_time(file);
>
> if (file->f_op && file->f_op->release)
> file->f_op->release(inode, file);
> Index: linux/fs/inode.c
> ===================================================================
> --- linux.orig/fs/inode.c 2007-02-21 14:14:43.000000000 +0100
> +++ linux/fs/inode.c 2007-02-21 14:16:04.000000000 +0100
> @@ -1219,6 +1219,7 @@ void file_update_time(struct file *file)
> struct timespec now;
> int sync_it = 0;
>
> + clear_bit(AS_CMTIME, &file->f_mapping->flags);
> if (IS_NOCMTIME(inode))
> return;
> if (IS_RDONLY(inode))
> @@ -1241,6 +1242,12 @@ void file_update_time(struct file *file)
>
> EXPORT_SYMBOL(file_update_time);
>
> +void mapping_update_time(struct file *file)
> +{
> + if (test_bit(AS_CMTIME, &file->f_mapping->flags))
> + file_update_time(file);
> +}
> +
> int inode_needs_sync(struct inode *inode)
> {
> if (IS_SYNC(inode))
> Index: linux/include/linux/fs.h
> ===================================================================
> --- linux.orig/include/linux/fs.h 2007-02-21 14:15:06.000000000 +0100
> +++ linux/include/linux/fs.h 2007-02-21 14:16:04.000000000 +0100
> @@ -1887,6 +1887,7 @@ extern int inode_change_ok(struct inode
> extern int __must_check inode_setattr(struct inode *, struct iattr *);
>
> extern void file_update_time(struct file *file);
> +extern void mapping_update_time(struct file *file);
>
> static inline ino_t parent_ino(struct dentry *dentry)
> {
> Index: linux/include/linux/mm.h
> ===================================================================
> --- linux.orig/include/linux/mm.h 2007-02-21 14:14:43.000000000 +0100
> +++ linux/include/linux/mm.h 2007-02-21 14:16:04.000000000 +0100
> @@ -790,6 +790,7 @@ int redirty_page_for_writepage(struct wr
> struct page *page);
> int FASTCALL(set_page_dirty(struct page *page));
> int set_page_dirty_lock(struct page *page);
> +int set_page_dirty_mapping(struct page *page);
>

This aspect of the design seems intrusive to me. I didn't see a strong
reason to introduce new versions of many of the routines just to handle
these semantics. What motivated this part of your design? Why the new
_mapping versions of routines?

Thanx...

ps

> int clear_page_dirty_for_io(struct page *page);
>
> extern unsigned long do_mremap(unsigned long addr,
> Index: linux/mm/memory.c
> ===================================================================
> --- linux.orig/mm/memory.c 2007-02-21 14:15:06.000000000 +0100
> +++ linux/mm/memory.c 2007-02-21 14:16:04.000000000 +0100
> @@ -676,7 +676,7 @@ static unsigned long zap_pte_range(struc
> anon_rss--;
> else {
> if (pte_dirty(ptent))
> - set_page_dirty(page);
> + set_page_dirty_mapping(page);
> if (pte_young(ptent))
> mark_page_accessed(page);
> file_rss--;
> @@ -954,7 +954,7 @@ struct page *follow_page(struct vm_area_
> if (flags & FOLL_TOUCH) {
> if ((flags & FOLL_WRITE) &&
> !pte_dirty(pte) && !PageDirty(page))
> - set_page_dirty(page);
> + set_page_dirty_mapping(page);
> mark_page_accessed(page);
> }
> unlock:
> @@ -1469,6 +1469,15 @@ static inline void cow_user_page(struct
> copy_user_highpage(dst, src, va, vma);
> }
>
> +static void set_page_dirty_mapping_balance(struct page *page)
> +{
> + if (set_page_dirty_mapping(page)) {
> + struct address_space *mapping = page_mapping(page);
> + if (mapping)
> + balance_dirty_pages_ratelimited(mapping);
> + }
> +}
> +
> /*
> * This routine handles present pages, when users try to write
> * to a shared page. It is done by copying the page to a new address
> @@ -1620,7 +1629,7 @@ gotten:
> unlock:
> pte_unmap_unlock(page_table, ptl);
> if (dirty_page) {
> - set_page_dirty_balance(dirty_page);
> + set_page_dirty_mapping_balance(dirty_page);
> put_page(dirty_page);
> }
> return ret;
> @@ -2274,7 +2283,7 @@ retry:
> unlock:
> pte_unmap_unlock(page_table, ptl);
> if (dirty_page) {
> - set_page_dirty_balance(dirty_page);
> + set_page_dirty_mapping_balance(dirty_page);
> put_page(dirty_page);
> }
> return ret;
> Index: linux/mm/page-writeback.c
> ===================================================================
> --- linux.orig/mm/page-writeback.c 2007-02-21 14:15:54.000000000 +0100
> +++ linux/mm/page-writeback.c 2007-02-21 14:16:04.000000000 +0100
> @@ -244,16 +244,6 @@ static void balance_dirty_pages(struct a
> pdflush_operation(background_writeout, 0);
> }
>
> -void set_page_dirty_balance(struct page *page)
> -{
> - if (set_page_dirty(page)) {
> - struct address_space *mapping = page_mapping(page);
> -
> - if (mapping)
> - balance_dirty_pages_ratelimited(mapping);
> - }
> -}
> -
> /**
> * balance_dirty_pages_ratelimited_nr - balance dirty memory state
> * @mapping: address_space which was dirtied
> @@ -819,6 +809,30 @@ int fastcall set_page_dirty(struct page
> EXPORT_SYMBOL(set_page_dirty);
>
> /*
> + * Special set_page_dirty() variant for dirtiness coming from a memory
> + * mapping. In this case the ctime/mtime update flag needs to be set.
> + */
> +int set_page_dirty_mapping(struct page *page)
> +{
> + struct address_space *mapping = page_mapping(page);
> +
> + if (likely(mapping)) {
> + int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
> +#ifdef CONFIG_BLOCK
> + if (!spd)
> + spd = __set_page_dirty_buffers;
> +#endif
> + set_bit(AS_CMTIME, &mapping->flags);
> + return (*spd)(page);
> + }
> + if (!PageDirty(page)) {
> + if (!TestSetPageDirty(page))
> + return 1;
> + }
> + return 0;
> +}
> +
> +/*
> * set_page_dirty() is racy if the caller has no reference against
> * page->mapping->host, and if the page is unlocked. This is because another
> * CPU could truncate the page off the mapping and then free the mapping.
> Index: linux/mm/rmap.c
> ===================================================================
> --- linux.orig/mm/rmap.c 2007-02-21 14:14:43.000000000 +0100
> +++ linux/mm/rmap.c 2007-02-21 14:16:04.000000000 +0100
> @@ -598,7 +598,7 @@ void page_remove_rmap(struct page *page,
> * faster for those pages still in swapcache.
> */
> if (page_test_and_clear_dirty(page))
> - set_page_dirty(page);
> + set_page_dirty_mapping(page);
> __dec_zone_page_state(page,
> PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED);
> }
> @@ -643,7 +643,7 @@ static int try_to_unmap_one(struct page
>
> /* Move the dirty bit to the physical page now the pte is gone. */
> if (pte_dirty(pteval))
> - set_page_dirty(page);
> + set_page_dirty_mapping(page);
>
> /* Update high watermark before we lower rss */
> update_hiwater_rss(mm);
> @@ -777,7 +777,7 @@ static void try_to_unmap_cluster(unsigne
>
> /* Move the dirty bit to the physical page now the pte is gone. */
> if (pte_dirty(pteval))
> - set_page_dirty(page);
> + set_page_dirty_mapping(page);
>
> page_remove_rmap(page, vma);
> page_cache_release(page);
> Index: linux/include/linux/writeback.h
> ===================================================================
> --- linux.orig/include/linux/writeback.h 2007-02-21 14:15:05.000000000 +0100
> +++ linux/include/linux/writeback.h 2007-02-21 14:16:04.000000000 +0100
> @@ -117,7 +117,6 @@ int sync_page_range(struct inode *inode,
> loff_t pos, loff_t count);
> int sync_page_range_nolock(struct inode *inode, struct address_space *mapping,
> loff_t pos, loff_t count);
> -void set_page_dirty_balance(struct page *page);
> void writeback_set_ratelimit(void);
>
> /* pdflush.c */
>

2007-02-21 18:12:56

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

On Wed, 2007-02-21 at 18:51 +0100, Miklos Szeredi wrote:
> From: Miklos Szeredi <[email protected]>
>
> This patch makes writing to shared memory mappings update st_ctime and
> st_mtime as defined by SUSv3:
>
> The st_ctime and st_mtime fields of a file that is mapped with
> MAP_SHARED and PROT_WRITE shall be marked for update at some point
> in the interval between a write reference to the mapped region and
> the next call to msync() with MS_ASYNC or MS_SYNC for that portion
> of the file by any process. If there is no such call and if the
> underlying file is modified as a result of a write reference, then
> these fields shall be marked for update at some time after the
> write reference.
>
> A new address_space flag is introduced: AS_CMTIME. This is set each
> time a page is dirtied through a userspace memory mapping. This
> includes write accesses via get_user_pages().
>
> Note, the flag is set unconditionally, even if the page is already
> dirty. This is important, because the page might have been dirtied
> earlier by a non-mmap write.
>
> This flag is checked in msync() and __fput(), and if set, the file
> times are updated and the flag is cleared

Why not also check inside vfs_getattr?

Cheers
Trond

2007-02-21 18:24:31

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

> > Inspired by Peter Staubach's patch and the resulting comments.
> >
> >
>
> An updated version of the original patch was submitted to LKML
> yesterday... :-)

Strange coincidence :)

> > file = vma->vm_file;
> > start = vma->vm_end;
> > + mapping_update_time(file);
> > if ((flags & MS_SYNC) && file &&
> > (vma->vm_flags & VM_SHARED)) {
> > get_file(file);
> >
>
> It seems to me that this might lead to file times being updated for
> non-MAP_SHARED mappings.

In theory no, because the COW-ed pages become anonymous and are not
part of the original mapping any more.

> > +int set_page_dirty_mapping(struct page *page);
> >
>
> This aspect of the design seems intrusive to me. I didn't see a strong
> reason to introduce new versions of many of the routines just to handle
> these semantics. What motivated this part of your design? Why the new
> _mapping versions of routines?

Because there's no way to know inside the set_page_dirty() functions
if the dirtying comes from a memory mapping or from a modification
through a normal write(). And they have different semantics, for
write() the modification times are updated immediately.

Thanks,
Miklos

2007-02-21 18:29:40

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

> > This flag is checked in msync() and __fput(), and if set, the file
> > times are updated and the flag is cleared
>
> Why not also check inside vfs_getattr?

This is the minimum, that the standard asks for.

Note, your porposal would touch the times in vfs_getattr(), which
means, that the modification times would depend on the time of the
last stat() call, which is not really right, though it would still be
conforming.

It is much saner, if the modification time is always the time of the
last write() or msync().

Miklos

2007-02-21 18:36:54

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

On Wed, 2007-02-21 at 19:28 +0100, Miklos Szeredi wrote:
> > > This flag is checked in msync() and __fput(), and if set, the file
> > > times are updated and the flag is cleared
> >
> > Why not also check inside vfs_getattr?
>
> This is the minimum, that the standard asks for.
>
> Note, your porposal would touch the times in vfs_getattr(), which
> means, that the modification times would depend on the time of the
> last stat() call, which is not really right, though it would still be
> conforming.
>
> It is much saner, if the modification time is always the time of the
> last write() or msync().

I disagree. The above doesn't allow a program like 'make' to discover
whether or not the file has changed by simply calling stat(). Instead,
you're forcing a call to msync()+stat().

Cheers
Trond

2007-02-21 18:51:10

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

Trond Myklebust wrote:
> On Wed, 2007-02-21 at 19:28 +0100, Miklos Szeredi wrote:
>
>>>> This flag is checked in msync() and __fput(), and if set, the file
>>>> times are updated and the flag is cleared
>>>>
>>> Why not also check inside vfs_getattr?
>>>
>> This is the minimum, that the standard asks for.
>>
>> Note, your porposal would touch the times in vfs_getattr(), which
>> means, that the modification times would depend on the time of the
>> last stat() call, which is not really right, though it would still be
>> conforming.
>>
>> It is much saner, if the modification time is always the time of the
>> last write() or msync().
>>
>
> I disagree. The above doesn't allow a program like 'make' to discover
> whether or not the file has changed by simply calling stat(). Instead,
> you're forcing a call to msync()+stat().

Right, but that's what the specification specifies. The file times
for mmap'd files are not maintained as tightly as they are for files
which are being modified via write(2).

Rightly or wrongly.

ps

2007-02-21 18:51:38

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

> > > > This flag is checked in msync() and __fput(), and if set, the file
> > > > times are updated and the flag is cleared
> > >
> > > Why not also check inside vfs_getattr?
> >
> > This is the minimum, that the standard asks for.
> >
> > Note, your porposal would touch the times in vfs_getattr(), which
> > means, that the modification times would depend on the time of the
> > last stat() call, which is not really right, though it would still be
> > conforming.
> >
> > It is much saner, if the modification time is always the time of the
> > last write() or msync().
>
> I disagree. The above doesn't allow a program like 'make' to discover
> whether or not the file has changed by simply calling stat(). Instead,
> you're forcing a call to msync()+stat().

Yes, but that's the only portable way _anyway_.

And it probably doesn't matter, programs using mmap to write to a file
_will_ call msync, or at least close the file, when they're done.

Thanks,
Miklos

2007-02-21 18:55:38

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

Miklos Szeredi wrote:
>>> Inspired by Peter Staubach's patch and the resulting comments.
>>>
>>>
>>>
>> An updated version of the original patch was submitted to LKML
>> yesterday... :-)
>>
>
> Strange coincidence :)
>
>
>>> file = vma->vm_file;
>>> start = vma->vm_end;
>>> + mapping_update_time(file);
>>> if ((flags & MS_SYNC) && file &&
>>> (vma->vm_flags & VM_SHARED)) {
>>> get_file(file);
>>>
>>>
>> It seems to me that this might lead to file times being updated for
>> non-MAP_SHARED mappings.
>>
>
> In theory no, because the COW-ed pages become anonymous and are not
> part of the original mapping any more.
>
>

I must profess to having a incomplete understanding of all of this
support, but then why would it be necessary to test VM_SHARED at
this point in msync()?

I ran into problems early on with file times being updated incorrectly
so I am a little sensitive this aspect.

>>> +int set_page_dirty_mapping(struct page *page);
>>>
>>>
>> This aspect of the design seems intrusive to me. I didn't see a strong
>> reason to introduce new versions of many of the routines just to handle
>> these semantics. What motivated this part of your design? Why the new
>> _mapping versions of routines?
>>
>
> Because there's no way to know inside the set_page_dirty() functions
> if the dirtying comes from a memory mapping or from a modification
> through a normal write(). And they have different semantics, for
> write() the modification times are updated immediately.

Perhaps I didn't understand what page_mapped() does, but it does seem to
have the right semantics as far as I could see.

Thanx...

ps

2007-02-21 19:08:27

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

> >>> Inspired by Peter Staubach's patch and the resulting comments.
> >>>
> >>>
> >>>
> >> An updated version of the original patch was submitted to LKML
> >> yesterday... :-)
> >>
> >
> > Strange coincidence :)
> >
> >
> >>> file = vma->vm_file;
> >>> start = vma->vm_end;
> >>> + mapping_update_time(file);
> >>> if ((flags & MS_SYNC) && file &&
> >>> (vma->vm_flags & VM_SHARED)) {
> >>> get_file(file);
> >>>
> >>>
> >> It seems to me that this might lead to file times being updated for
> >> non-MAP_SHARED mappings.
> >>
> >
> > In theory no, because the COW-ed pages become anonymous and are not
> > part of the original mapping any more.
> >
> >
>
> I must profess to having a incomplete understanding of all of this
> support, but then why would it be necessary to test VM_SHARED at
> this point in msync()?

That's basically just an optimization. If it wasn't there, then data
from a another (shared) mapping could be written back, which is not
wrong, but not required either.

> I ran into problems early on with file times being updated incorrectly
> so I am a little sensitive this aspect.
>
> >>> +int set_page_dirty_mapping(struct page *page);
> >>>
> >>>
> >> This aspect of the design seems intrusive to me. I didn't see a strong
> >> reason to introduce new versions of many of the routines just to handle
> >> these semantics. What motivated this part of your design? Why the new
> >> _mapping versions of routines?
> >>
> >
> > Because there's no way to know inside the set_page_dirty() functions
> > if the dirtying comes from a memory mapping or from a modification
> > through a normal write(). And they have different semantics, for
> > write() the modification times are updated immediately.
>
> Perhaps I didn't understand what page_mapped() does, but it does seem to
> have the right semantics as far as I could see.

The problems will start, when you have a file that is both mapped and
modified with write(). Then the dirying from the write() will set the
flag, and that will have undesirable consequences.

Miklos

2007-02-22 04:26:32

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

On Wed, 21 Feb 2007 18:51:52 +0100 Miklos Szeredi <[email protected]> wrote:

> This patch makes writing to shared memory mappings update st_ctime and
> st_mtime as defined by SUSv3:
>
> The st_ctime and st_mtime fields of a file that is mapped with
> MAP_SHARED and PROT_WRITE shall be marked for update at some point
> in the interval between a write reference to the mapped region and
> the next call to msync() with MS_ASYNC or MS_SYNC for that portion
> of the file by any process. If there is no such call and if the
> underlying file is modified as a result of a write reference, then
> these fields shall be marked for update at some time after the
> write reference.
>
> A new address_space flag is introduced: AS_CMTIME. This is set each
> time a page is dirtied through a userspace memory mapping. This
> includes write accesses via get_user_pages().
>
> Note, the flag is set unconditionally, even if the page is already
> dirty. This is important, because the page might have been dirtied
> earlier by a non-mmap write.
>
> This flag is checked in msync() and __fput(), and if set, the file
> times are updated and the flag is cleared
>
> The flag is also cleared, if the time update is triggered by a normal
> write. This is not mandated by the standard, but seems to be a sane
> thing to do.

Why is the flag checked in __fput()?

2007-02-22 07:49:50

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

> On Wed, 21 Feb 2007 18:51:52 +0100 Miklos Szeredi <[email protected]> wrote:
>
> > This patch makes writing to shared memory mappings update st_ctime and
> > st_mtime as defined by SUSv3:
> >
> > The st_ctime and st_mtime fields of a file that is mapped with
> > MAP_SHARED and PROT_WRITE shall be marked for update at some point
> > in the interval between a write reference to the mapped region and
> > the next call to msync() with MS_ASYNC or MS_SYNC for that portion
> > of the file by any process. If there is no such call and if the
> > underlying file is modified as a result of a write reference, then
> > these fields shall be marked for update at some time after the
> > write reference.
> >
> > A new address_space flag is introduced: AS_CMTIME. This is set each
> > time a page is dirtied through a userspace memory mapping. This
> > includes write accesses via get_user_pages().
> >
> > Note, the flag is set unconditionally, even if the page is already
> > dirty. This is important, because the page might have been dirtied
> > earlier by a non-mmap write.
> >
> > This flag is checked in msync() and __fput(), and if set, the file
> > times are updated and the flag is cleared
> >
> > The flag is also cleared, if the time update is triggered by a normal
> > write. This is not mandated by the standard, but seems to be a sane
> > thing to do.
>
> Why is the flag checked in __fput()?

It's because of this bit in the standard:

If there is no such call and if the underlying file is modified
as a result of a write reference, then these fields shall be
marked for update at some time after the write reference.

It could be done in munmap/mremap, but it seemed more difficult to
track down all the places where the vma is removed. But yes, that may
be a nicer solution.

Miklos

2007-02-22 17:39:55

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

Miklos Szeredi wrote:
>> On Wed, 21 Feb 2007 18:51:52 +0100 Miklos Szeredi <[email protected]> wrote:
>>
>>
>>> This patch makes writing to shared memory mappings update st_ctime and
>>> st_mtime as defined by SUSv3:
>>>
>>> The st_ctime and st_mtime fields of a file that is mapped with
>>> MAP_SHARED and PROT_WRITE shall be marked for update at some point
>>> in the interval between a write reference to the mapped region and
>>> the next call to msync() with MS_ASYNC or MS_SYNC for that portion
>>> of the file by any process. If there is no such call and if the
>>> underlying file is modified as a result of a write reference, then
>>> these fields shall be marked for update at some time after the
>>> write reference.
>>>
>>> A new address_space flag is introduced: AS_CMTIME. This is set each
>>> time a page is dirtied through a userspace memory mapping. This
>>> includes write accesses via get_user_pages().
>>>
>>> Note, the flag is set unconditionally, even if the page is already
>>> dirty. This is important, because the page might have been dirtied
>>> earlier by a non-mmap write.
>>>
>>> This flag is checked in msync() and __fput(), and if set, the file
>>> times are updated and the flag is cleared
>>>
>>> The flag is also cleared, if the time update is triggered by a normal
>>> write. This is not mandated by the standard, but seems to be a sane
>>> thing to do.
>>>
>> Why is the flag checked in __fput()?
>>
>
> It's because of this bit in the standard:
>
> If there is no such call and if the underlying file is modified
> as a result of a write reference, then these fields shall be
> marked for update at some time after the write reference.
>
> It could be done in munmap/mremap, but it seemed more difficult to
> track down all the places where the vma is removed. But yes, that may
> be a nicer solution.

It seems to me that, with this support, a file, which is mmap'd,
modified, but never msync'd or munmap'd, will never get its mtime
updated. Or did I miss that?

I also don't see how an mmap'd block device will get its mtime
updated either.

Thanx...

ps

2007-02-22 17:53:33

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

Miklos Szeredi wrote:
>>>>> Inspired by Peter Staubach's patch and the resulting comments.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> An updated version of the original patch was submitted to LKML
>>>> yesterday... :-)
>>>>
>>>>
>>> Strange coincidence :)
>>>
>>>
>>>
>>>>> file = vma->vm_file;
>>>>> start = vma->vm_end;
>>>>> + mapping_update_time(file);
>>>>> if ((flags & MS_SYNC) && file &&
>>>>> (vma->vm_flags & VM_SHARED)) {
>>>>> get_file(file);
>>>>>
>>>>>
>>>>>
>>>> It seems to me that this might lead to file times being updated for
>>>> non-MAP_SHARED mappings.
>>>>
>>>>
>>> In theory no, because the COW-ed pages become anonymous and are not
>>> part of the original mapping any more.
>>>
>>>
>>>
>> I must profess to having a incomplete understanding of all of this
>> support, but then why would it be necessary to test VM_SHARED at
>> this point in msync()?
>>
>
> That's basically just an optimization. If it wasn't there, then data
> from a another (shared) mapping could be written back, which is not
> wrong, but not required either.
>
>
>> I ran into problems early on with file times being updated incorrectly
>> so I am a little sensitive this aspect.
>>
>>
>>>>> +int set_page_dirty_mapping(struct page *page);
>>>>>
>>>>>
>>>>>
>>>> This aspect of the design seems intrusive to me. I didn't see a strong
>>>> reason to introduce new versions of many of the routines just to handle
>>>> these semantics. What motivated this part of your design? Why the new
>>>> _mapping versions of routines?
>>>>
>>>>
>>> Because there's no way to know inside the set_page_dirty() functions
>>> if the dirtying comes from a memory mapping or from a modification
>>> through a normal write(). And they have different semantics, for
>>> write() the modification times are updated immediately.
>>>
>> Perhaps I didn't understand what page_mapped() does, but it does seem to
>> have the right semantics as far as I could see.
>>
>
> The problems will start, when you have a file that is both mapped and
> modified with write(). Then the dirying from the write() will set the
> flag, and that will have undesirable consequences.

I don't think that I quite follow the logic. The dirtying from write()
will set the flag, but then the mtime will get updated and the flag will
be cleared by the hook in file_update_time(). Right?

Thanx...

ps

2007-02-22 18:09:32

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

> >> Why is the flag checked in __fput()?
> >>
> >
> > It's because of this bit in the standard:
> >
> > If there is no such call and if the underlying file is modified
> > as a result of a write reference, then these fields shall be
> > marked for update at some time after the write reference.
> >
> > It could be done in munmap/mremap, but it seemed more difficult to
> > track down all the places where the vma is removed. But yes, that may
> > be a nicer solution.
>
> It seems to me that, with this support, a file, which is mmap'd,
> modified, but never msync'd or munmap'd, will never get its mtime
> updated. Or did I miss that?
>
> I also don't see how an mmap'd block device will get its mtime
> updated either.

__fput() will be called when there are no more references to 'file',
then it will update the time if the flag is set. This applies to
regular files as well as devices.

But I've moved the check from __fput to remove_vma() in the next
revision of the patch, which would give slightly nicer semantics, and
be equally conforming.

Thanks,
Miklos

2007-02-22 18:17:20

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

> >>>>> +int set_page_dirty_mapping(struct page *page);
> >>>>>
> >>>>>
> >>>>>
> >>>> This aspect of the design seems intrusive to me. I didn't see a strong
> >>>> reason to introduce new versions of many of the routines just to handle
> >>>> these semantics. What motivated this part of your design? Why the new
> >>>> _mapping versions of routines?
> >>>>
> >>>>
> >>> Because there's no way to know inside the set_page_dirty() functions
> >>> if the dirtying comes from a memory mapping or from a modification
> >>> through a normal write(). And they have different semantics, for
> >>> write() the modification times are updated immediately.
> >>>
> >> Perhaps I didn't understand what page_mapped() does, but it does seem to
> >> have the right semantics as far as I could see.
> >>
> >
> > The problems will start, when you have a file that is both mapped and
> > modified with write(). Then the dirying from the write() will set the
> > flag, and that will have undesirable consequences.
>
> I don't think that I quite follow the logic. The dirtying from write()
> will set the flag, but then the mtime will get updated and the flag will
> be cleared by the hook in file_update_time(). Right?

Take this example:

fd = open()
addr = mmap(.., fd)
write(fd, ...)
close(fd)
sleep(100)
msync(addr,...)
munmap(addr)

The file times will be updated in write(), but with your patch, the
bit in the mapping will also be set.

Then in msync() the file times will be updated again, which is wrong,
since the memory was _not_ modified through the mapping.

Thanks,
Miklos

2007-02-22 20:11:45

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

Miklos Szeredi wrote:
>>>>>>> +int set_page_dirty_mapping(struct page *page);
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> This aspect of the design seems intrusive to me. I didn't see a strong
>>>>>> reason to introduce new versions of many of the routines just to handle
>>>>>> these semantics. What motivated this part of your design? Why the new
>>>>>> _mapping versions of routines?
>>>>>>
>>>>>>
>>>>>>
>>>>> Because there's no way to know inside the set_page_dirty() functions
>>>>> if the dirtying comes from a memory mapping or from a modification
>>>>> through a normal write(). And they have different semantics, for
>>>>> write() the modification times are updated immediately.
>>>>>
>>>>>
>>>> Perhaps I didn't understand what page_mapped() does, but it does seem to
>>>> have the right semantics as far as I could see.
>>>>
>>>>
>>> The problems will start, when you have a file that is both mapped and
>>> modified with write(). Then the dirying from the write() will set the
>>> flag, and that will have undesirable consequences.
>>>
>> I don't think that I quite follow the logic. The dirtying from write()
>> will set the flag, but then the mtime will get updated and the flag will
>> be cleared by the hook in file_update_time(). Right?
>>
>
> Take this example:
>
> fd = open()
> addr = mmap(.., fd)
> write(fd, ...)
> close(fd)
> sleep(100)
> msync(addr,...)
> munmap(addr)
>
> The file times will be updated in write(), but with your patch, the
> bit in the mapping will also be set.
>
> Then in msync() the file times will be updated again, which is wrong,
> since the memory was _not_ modified through the mapping.

This is correct. I have updated my proposed patch to include the clearing
of AS_MCTIME in the routine which updates the mtime field. I haven't
reposted it yet until I complete testing of the new resulting system. I
anticipate doing this later today.

Thanx..

ps

2007-02-22 20:15:14

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

Miklos Szeredi wrote:
>>>> Why is the flag checked in __fput()?
>>>>
>>>>
>>> It's because of this bit in the standard:
>>>
>>> If there is no such call and if the underlying file is modified
>>> as a result of a write reference, then these fields shall be
>>> marked for update at some time after the write reference.
>>>
>>> It could be done in munmap/mremap, but it seemed more difficult to
>>> track down all the places where the vma is removed. But yes, that may
>>> be a nicer solution.
>>>
>> It seems to me that, with this support, a file, which is mmap'd,
>> modified, but never msync'd or munmap'd, will never get its mtime
>> updated. Or did I miss that?
>>
>> I also don't see how an mmap'd block device will get its mtime
>> updated either.
>>
>
> __fput() will be called when there are no more references to 'file',
> then it will update the time if the flag is set. This applies to
> regular files as well as devices.
>
>

I suspect that you will find that, for a block device, the wrong inode
gets updated. That's where the bd_inode_update_time() portion of my
proposed patch came from.

> But I've moved the check from __fput to remove_vma() in the next
> revision of the patch, which would give slightly nicer semantics, and
> be equally conforming.

This still does not address the situation where a file is 'permanently'
mmap'd, does it?

Thanx...

ps

2007-02-22 20:44:12

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

> > Take this example:
> >
> > fd = open()
> > addr = mmap(.., fd)
> > write(fd, ...)
> > close(fd)
> > sleep(100)
> > msync(addr,...)
> > munmap(addr)
> >
> > The file times will be updated in write(), but with your patch, the
> > bit in the mapping will also be set.
> >
> > Then in msync() the file times will be updated again, which is wrong,
> > since the memory was _not_ modified through the mapping.
>
> This is correct. I have updated my proposed patch to include the clearing
> of AS_MCTIME in the routine which updates the mtime field.

That doesn't really help. Look at __generic_file_aio_write_nolock().
file_update_time() is called before the data is written, so after the
last write, there will be nothing to clear the flag.

And even if fixed this case by moving the file_update_time() call to
the end of the function, there's no guarantee, that some filesystem
won't do something exotic and call set_page_dirty() indenpendently of
write(). Good luck auditing all the set_page_dirty() calls ;)

Thanks,
Miklos

2007-02-22 20:48:56

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

> > __fput() will be called when there are no more references to 'file',
> > then it will update the time if the flag is set. This applies to
> > regular files as well as devices.
> >
> >
>
> I suspect that you will find that, for a block device, the wrong inode
> gets updated. That's where the bd_inode_update_time() portion of my
> proposed patch came from.

How horrible :( I haven't noticed that part of the patch. But I don't
think that's needed. Updating the times through the file pointer
should be OK. You have this problem because you use the inode which
comes from the blockdev pseudo-filesystem.

>
> > But I've moved the check from __fput to remove_vma() in the next
> > revision of the patch, which would give slightly nicer semantics, and
> > be equally conforming.
>
> This still does not address the situation where a file is 'permanently'
> mmap'd, does it?

So? If application doesn't do msync, then the file times won't be
updated. That's allowed by the standard, and so portable applications
will have to call msync.

Thanks,
Miklos

2007-02-22 20:50:30

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

Miklos Szeredi wrote:
>>> Take this example:
>>>
>>> fd = open()
>>> addr = mmap(.., fd)
>>> write(fd, ...)
>>> close(fd)
>>> sleep(100)
>>> msync(addr,...)
>>> munmap(addr)
>>>
>>> The file times will be updated in write(), but with your patch, the
>>> bit in the mapping will also be set.
>>>
>>> Then in msync() the file times will be updated again, which is wrong,
>>> since the memory was _not_ modified through the mapping.
>>>
>> This is correct. I have updated my proposed patch to include the clearing
>> of AS_MCTIME in the routine which updates the mtime field.
>>
>
> That doesn't really help. Look at __generic_file_aio_write_nolock().
> file_update_time() is called before the data is written, so after the
> last write, there will be nothing to clear the flag.
>
> And even if fixed this case by moving the file_update_time() call to
> the end of the function, there's no guarantee, that some filesystem
> won't do something exotic and call set_page_dirty() indenpendently of
> write(). Good luck auditing all the set_page_dirty() calls ;)

Interesting.

No, I wouldn't want to get into worrying about set_page_dirty() calls.
Life is short and people have too much imagination... :-)

Thanx...

ps

2007-02-22 20:55:55

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

Miklos Szeredi wrote:
>>> __fput() will be called when there are no more references to 'file',
>>> then it will update the time if the flag is set. This applies to
>>> regular files as well as devices.
>>>
>>>
>>>
>> I suspect that you will find that, for a block device, the wrong inode
>> gets updated. That's where the bd_inode_update_time() portion of my
>> proposed patch came from.
>>
>
> How horrible :( I haven't noticed that part of the patch. But I don't
> think that's needed. Updating the times through the file pointer
> should be OK. You have this problem because you use the inode which
> comes from the blockdev pseudo-filesystem.
>
>

It was nasty, I certainly agree. :-)

>>> But I've moved the check from __fput to remove_vma() in the next
>>> revision of the patch, which would give slightly nicer semantics, and
>>> be equally conforming.
>>>
>> This still does not address the situation where a file is 'permanently'
>> mmap'd, does it?
>>
>
> So? If application doesn't do msync, then the file times won't be
> updated. That's allowed by the standard, and so portable applications
> will have to call msync.

Well, there allowable by the specification, and then there is expected and
reasonable. It seems reasonable to me that the file times should be
updated _sometime_, even if the application does not take proactive action
to cause them to be updated. Otherwise, it would be easy to end up with a
file, whose contents are updated and reside on stable storage, but whose
mtime never changes. Part of the motivation behind starting this work was
to address the situation where an application modifies files using mmap
but then backup software would never see the need to backup those files.

It seems to me that if something like sync() causes the file contents to
be written to stable storage, then the file metadata should follow in a
not too distant fashion.

Thanx...

ps

2007-02-22 21:04:36

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

On Thu, 2007-02-22 at 21:48 +0100, Miklos Szeredi wrote:
> > This still does not address the situation where a file is 'permanently'
> > mmap'd, does it?
>
> So? If application doesn't do msync, then the file times won't be
> updated. That's allowed by the standard, and so portable applications
> will have to call msync.

It is allowed, but it is clearly not useful behaviour. Nowhere is it set
in stone that we should be implementing just the minimum allowed.

Trond

2007-02-22 21:30:22

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

> > > This still does not address the situation where a file is 'permanently'
> > > mmap'd, does it?
> >
> > So? If application doesn't do msync, then the file times won't be
> > updated. That's allowed by the standard, and so portable applications
> > will have to call msync.
>
> It is allowed, but it is clearly not useful behaviour. Nowhere is it set
> in stone that we should be implementing just the minimum allowed.

You're right. In theory, at least. But in practice I don't think
this matters. Show me an application that writes to a shared mapping
then doesn't call either msync or munmap and doesn't even exit.

If there were lot of these apps, then this bug would have been fixed
lots of years earlier. In fact there are _very_ few apps writing to
shared mappings at all.

Applications should be encouraged to call msync(MS_ASYNC) because:

- it's very fast (basically a no-op) on recent linux kernels

- it's the only portable way to guarantee, that the data you written
will _ever_ hit the disk.

There's really no downside to using msync(MS_ASYNC) in your
application, so making an effort to support applications that don't do
this is stupid, IMO.

Thanks,
Miklos

2007-02-22 21:52:38

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

Miklos Szeredi wrote:
>>>> This still does not address the situation where a file is 'permanently'
>>>> mmap'd, does it?
>>>>
>>> So? If application doesn't do msync, then the file times won't be
>>> updated. That's allowed by the standard, and so portable applications
>>> will have to call msync.
>>>
>> It is allowed, but it is clearly not useful behaviour. Nowhere is it set
>> in stone that we should be implementing just the minimum allowed.
>>
>
> You're right. In theory, at least. But in practice I don't think
> this matters. Show me an application that writes to a shared mapping
> then doesn't call either msync or munmap and doesn't even exit.
>
> If there were lot of these apps, then this bug would have been fixed
> lots of years earlier. In fact there are _very_ few apps writing to
> shared mappings at all.
>
>

Perhaps true, although I know of at least one customer of Red Hat who
does have an application (or more) than uses mmap'd files and is
suffering from this lack of appropriate semantics. They are not
getting files backed up which need to be.

> Applications should be encouraged to call msync(MS_ASYNC) because:
>
> - it's very fast (basically a no-op) on recent linux kernels
>
> - it's the only portable way to guarantee, that the data you written
> will _ever_ hit the disk.
>
> There's really no downside to using msync(MS_ASYNC) in your
> application, so making an effort to support applications that don't do
> this is stupid, IMO.

It may be a no-op on recent Linux kernels, but I don't think that it
is a no-op on other systems.

I do not wish to defend applications which don't use msync or some
such, but it seems that the appropriate semantics to support are
more than just depending upon the application to "do the right thing".

Thanx...

ps

2007-02-22 22:09:13

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] update ctime and mtime for mmaped write

> Miklos Szeredi wrote:
> >>>> This still does not address the situation where a file is 'permanently'
> >>>> mmap'd, does it?
> >>>>
> >>> So? If application doesn't do msync, then the file times won't be
> >>> updated. That's allowed by the standard, and so portable applications
> >>> will have to call msync.
> >>>
> >> It is allowed, but it is clearly not useful behaviour. Nowhere is it set
> >> in stone that we should be implementing just the minimum allowed.
> >>
> >
> > You're right. In theory, at least. But in practice I don't think
> > this matters. Show me an application that writes to a shared mapping
> > then doesn't call either msync or munmap and doesn't even exit.
> >
> > If there were lot of these apps, then this bug would have been fixed
> > lots of years earlier. In fact there are _very_ few apps writing to
> > shared mappings at all.
> >
> >
>
> Perhaps true, although I know of at least one customer of Red Hat who
> does have an application (or more) than uses mmap'd files and is
> suffering from this lack of appropriate semantics. They are not
> getting files backed up which need to be.

Yeah, but I bet, that just updating the times on msync and munmap will
cure the problem. Until you prove me wrong, this is a purely
theoretical argument.

> > Applications should be encouraged to call msync(MS_ASYNC) because:
> >
> > - it's very fast (basically a no-op) on recent linux kernels
> >
> > - it's the only portable way to guarantee, that the data you written
> > will _ever_ hit the disk.
> >
> > There's really no downside to using msync(MS_ASYNC) in your
> > application, so making an effort to support applications that don't do
> > this is stupid, IMO.
>
> It may be a no-op on recent Linux kernels, but I don't think that it
> is a no-op on other systems.

Exactly. Which is the reason applications _have_ to call msync on
these OSs if they want data to be synchronized with the file.

There are two cases:

1) app don't cares about it's data being synchronized with the file

2) app does care

In the first case there's no problem if the times aren't updated,
since the app didn't care.

In the second case the a portable application has to call msync, and
on linux it doesn't even lose any performance from this, so it should
be happy.

You are trying to provide for an app that requires 1) on non-linux and
2) on linux. Does that make any sense?

Miklos