2012-02-16 13:46:49

by Jan Kara

Subject: [PATCH 00/11] Push file_update_time() into .page_mkwrite

Hello,

to provide reliable support for filesystem freezing, filesystems need to have
complete control over when metadata is changed. In particular,
file_update_time() calls from page fault code make it impossible for
filesystems to prevent inodes from being dirtied while the filesystem is
frozen.

To fix the issue, this patch set changes the page fault code to call
file_update_time() only when no ->page_mkwrite() callback is provided. If the
callback is provided, it is the responsibility of the filesystem to update
i_mtime / i_ctime if needed. We also push the file_update_time() call into
all existing ->page_mkwrite() implementations where the time update does not
obviously happen by other means. If you know your filesystem does not need to
update modification times in its ->page_mkwrite() handler, please speak up and
I'll drop the patch for your filesystem.
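
The rule described above can be sketched as a small userspace model. This is not kernel code: every toy_* name is a hypothetical stand-in invented for illustration, and the real fault path in mm/memory.c does considerably more work (locking, dirty accounting) than is shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct file; it only counts timestamp updates. */
struct toy_file {
	int time_updates;
};

/* Toy stand-in for vm_operations_struct; the callback is optional. */
struct toy_vm_ops {
	int (*page_mkwrite)(struct toy_file *file);
};

/* Stand-in for file_update_time(): records that the times were touched. */
static void toy_file_update_time(struct toy_file *file)
{
	file->time_updates++;
}

/* A filesystem callback that takes over the timestamp update itself. */
static int toy_fs_page_mkwrite(struct toy_file *file)
{
	toy_file_update_time(file);
	return 0;
}

/* A callback that chooses not to touch the timestamps at all. */
static int toy_quiet_page_mkwrite(struct toy_file *file)
{
	(void)file;
	return 0;
}

/* Write-fault path under the proposed rule. */
static int toy_write_fault(const struct toy_vm_ops *ops, struct toy_file *file)
{
	/* With a callback present, the filesystem alone decides. */
	if (ops && ops->page_mkwrite)
		return ops->page_mkwrite(file);

	/* Without one, the generic code keeps updating the times. */
	if (file)
		toy_file_update_time(file);
	return 0;
}
```

In this model a filesystem whose callback skips the update (for example while frozen) never has its timestamps touched by the generic path, which is the control the cover letter asks for.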

As a side note, an alternative would be to remove the call to file_update_time()
from the page fault code altogether and require all filesystems needing it to do
that in their ->page_mkwrite() implementation. That is certainly possible,
although maybe slightly inefficient, and it would require auditing 100+
vm_operations_structs *shake*.

If I get acks on these patches, Andrew, would you be willing to take these
patches?

Honza

CC: Peter Zijlstra <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Paul Mackerras <[email protected]>
CC: Arnaldo Carvalho de Melo <[email protected]>
CC: Jaya Kumar <[email protected]>
CC: Sage Weil <[email protected]>
CC: [email protected]
CC: Steve French <[email protected]>
CC: [email protected]
CC: Eric Van Hensbergen <[email protected]>
CC: Ron Minnich <[email protected]>
CC: Latchesar Ionkov <[email protected]>
CC: [email protected]
CC: Miklos Szeredi <[email protected]>
CC: [email protected]
CC: Steven Whitehouse <[email protected]>
CC: [email protected]
CC: Greg Kroah-Hartman <[email protected]>
CC: Trond Myklebust <[email protected]>
CC: [email protected]


2012-02-16 13:46:57

by Jan Kara

Subject: [PATCH 10/11] nfs: Push file_update_time() into nfs_vm_page_mkwrite()

CC: Trond Myklebust <[email protected]>
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
---
fs/nfs/file.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index c43a452..2407922 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -525,6 +525,9 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
/* make sure the cache has finished storing the page */
nfs_fscache_wait_on_page_write(NFS_I(dentry->d_inode), page);

+ /* Update file times before taking page lock */
+ file_update_time(filp);
+
lock_page(page);
mapping = page->mapping;
if (mapping != dentry->d_inode->i_mapping)
--
1.7.1

2012-02-16 13:46:55

by Jan Kara

Subject: [PATCH 04/11] ceph: Push file_update_time() into ceph_page_mkwrite()

CC: Sage Weil <[email protected]>
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
---
fs/ceph/addr.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 173b1d2..12b139f 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1181,6 +1181,9 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
loff_t size, len;
int ret;

+ /* Update time before taking page lock */
+ file_update_time(vma->vm_file);
+
size = i_size_read(inode);
if (off + PAGE_CACHE_SIZE <= size)
len = PAGE_CACHE_SIZE;
--
1.7.1

2012-02-16 13:47:18

by Jan Kara

Subject: [PATCH 11/11] mm: Update file times from fault path only if .page_mkwrite is not set

Filesystems wanting to properly support freezing need to have control over
when file_update_time() is called. After pushing file_update_time()
into all relevant .page_mkwrite implementations, we can just stop calling
file_update_time() when the filesystem implements .page_mkwrite.

Signed-off-by: Jan Kara <[email protected]>
---
mm/memory.c | 9 ++++-----
1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index fa2f04e..17b72d7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2621,6 +2621,9 @@ reuse:
if (!page_mkwrite) {
wait_on_page_locked(dirty_page);
set_page_dirty_balance(dirty_page, page_mkwrite);
+ /* file_update_time outside page_lock */
+ if (vma->vm_file)
+ file_update_time(vma->vm_file);
}
put_page(dirty_page);
if (page_mkwrite) {
@@ -2638,10 +2641,6 @@ reuse:
}
}

- /* file_update_time outside page_lock */
- if (vma->vm_file)
- file_update_time(vma->vm_file);
-
return ret;
}

@@ -3324,7 +3323,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
}

/* file_update_time outside page_lock */
- if (vma->vm_file)
+ if (vma->vm_file && !page_mkwrite)
file_update_time(vma->vm_file);
} else {
unlock_page(vmf.page);
--
1.7.1

2012-02-16 13:47:49

by Jan Kara

Subject: [PATCH 09/11] sysfs: Push file_update_time() into bin_page_mkwrite()

CC: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
---
fs/sysfs/bin.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index a475983..6ceb16f 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -225,6 +225,8 @@ static int bin_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
if (!sysfs_get_active(attr_sd))
return VM_FAULT_SIGBUS;

+ file_update_time(file);
+
ret = 0;
if (bb->vm_ops->page_mkwrite)
ret = bb->vm_ops->page_mkwrite(vma, vmf);
--
1.7.1

2012-02-16 13:48:07

by Jan Kara

Subject: [PATCH 08/11] gfs2: Push file_update_time() into gfs2_page_mkwrite()

CC: Steven Whitehouse <[email protected]>
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
---
fs/gfs2/file.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index c5fb359..1f03531 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -375,6 +375,9 @@ static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
*/
vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);

+ /* Update file times before taking page lock */
+ file_update_time(vma->vm_file);
+
gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
ret = gfs2_glock_nq(&gh);
if (ret)
--
1.7.1

2012-02-16 13:48:38

by Jan Kara

Subject: [PATCH 07/11] fuse: Push file_update_time() into fuse_page_mkwrite()

CC: Miklos Szeredi <[email protected]>
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
---
fs/fuse/file.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 4a199fd..eade72e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1323,6 +1323,7 @@ static int fuse_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
*/
struct inode *inode = vma->vm_file->f_mapping->host;

+ file_update_time(vma->vm_file);
fuse_wait_on_page_writeback(inode, page->index);
return 0;
}
--
1.7.1

2012-02-16 13:46:53

by Jan Kara

Subject: [PATCH 05/11] cifs: Push file_update_time() into cifs_page_mkwrite()

CC: Steve French <[email protected]>
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
---
fs/cifs/file.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

BTW: How does cifs_page_mkwrite() protect against races with truncate?
See e.g. checks in __block_page_mkwrite()...

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 4dd9283..8e3b23b 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2425,6 +2425,9 @@ cifs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
{
struct page *page = vmf->page;

+ /* Update file times before taking page lock */
+ file_update_time(vma->vm_file);
+
lock_page(page);
return VM_FAULT_LOCKED;
}
--
1.7.1

2012-02-16 13:49:16

by Jan Kara

Subject: [PATCH 06/11] 9p: Push file_update_time() into v9fs_vm_page_mkwrite()

CC: Eric Van Hensbergen <[email protected]>
CC: Ron Minnich <[email protected]>
CC: Latchesar Ionkov <[email protected]>
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
---
fs/9p/vfs_file.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

BTW: I see you don't check whether the page you locked isn't beyond i_size.
Cannot your code race with truncate? See checks in __block_page_mkwrite()...

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index fc06fd2..dd6f7ee 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -610,6 +610,9 @@ v9fs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
p9_debug(P9_DEBUG_VFS, "page %p fid %lx\n",
page, (unsigned long)filp->private_data);

+ /* Update file times before taking page lock */
+ file_update_time(filp);
+
v9inode = V9FS_I(inode);
/* make sure the cache has finished storing the page */
v9fs_fscache_wait_on_page_write(inode, page);
--
1.7.1

2012-02-16 13:49:42

by Jan Kara

Subject: [PATCH 02/11] fb_defio: Push file_update_time() into fb_deferred_io_mkwrite()

CC: Jaya Kumar <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
---
drivers/video/fb_defio.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/video/fb_defio.c b/drivers/video/fb_defio.c
index c27e153..7a09c06 100644
--- a/drivers/video/fb_defio.c
+++ b/drivers/video/fb_defio.c
@@ -104,6 +104,8 @@ static int fb_deferred_io_mkwrite(struct vm_area_struct *vma,
deferred framebuffer IO. then if userspace touches a page
again, we repeat the same scheme */

+ file_update_time(vma->vm_file);
+
/* protect against the workqueue changing the page list */
mutex_lock(&fbdefio->lock);

--
1.7.1

2012-02-16 13:49:41

by Jan Kara

Subject: [PATCH 03/11] fs: Push file_update_time() into __block_page_mkwrite()

Signed-off-by: Jan Kara <[email protected]>
---
fs/buffer.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 1a30db7..5294a33 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2300,6 +2300,12 @@ int __block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
loff_t size;
int ret;

+ /*
+ * Update file times before taking page lock. We may end up failing the
+ * fault so this update may be superfluous but who really cares...
+ */
+ file_update_time(vma->vm_file);
+
lock_page(page);
size = i_size_read(inode);
if ((page->mapping != inode->i_mapping) ||
--
1.7.1

2012-02-16 13:50:10

by Jan Kara

Subject: [PATCH 01/11] perf: Push file_update_time() into perf_mmap_fault()

CC: Peter Zijlstra <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Paul Mackerras <[email protected]>
CC: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
---
kernel/events/core.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index ba36013..61a67f3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3316,8 +3316,10 @@ static int perf_mmap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
int ret = VM_FAULT_SIGBUS;

if (vmf->flags & FAULT_FLAG_MKWRITE) {
- if (vmf->pgoff == 0)
+ if (vmf->pgoff == 0) {
ret = 0;
+ file_update_time(vma->vm_file);
+ }
return ret;
}

--
1.7.1

2012-02-16 13:50:43

by Myklebust, Trond

Subject: Re: [PATCH 10/11] nfs: Push file_update_time() into nfs_vm_page_mkwrite()

On Thu, 2012-02-16 at 14:46 +0100, Jan Kara wrote:
> CC: Trond Myklebust <[email protected]>
> CC: [email protected]
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/nfs/file.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index c43a452..2407922 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -525,6 +525,9 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> /* make sure the cache has finished storing the page */
> nfs_fscache_wait_on_page_write(NFS_I(dentry->d_inode), page);
>
> + /* Update file times before taking page lock */
> + file_update_time(filp);
> +
> lock_page(page);
> mapping = page->mapping;
> if (mapping != dentry->d_inode->i_mapping)

Hi Jan,

file_update_time() is a no-op in NFS, since we set S_NOATIME|S_NOCMTIME
on all inodes.

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com
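
Why the call is a no-op for NFS can be illustrated with a toy model of the early-exit check. The flag values, structure layout, and toy_* names below are assumptions made up for this sketch; only the behaviour (skip the update when the no-cmtime flag is set on the inode) reflects what Trond describes.

```c
#include <assert.h>

/* Hypothetical flag values; the real S_NOATIME / S_NOCMTIME differ. */
#define TOY_S_NOATIME	0x1u
#define TOY_S_NOCMTIME	0x2u

/* Toy inode carrying only what this sketch needs. */
struct toy_inode {
	unsigned int i_flags;
	long i_mtime;
	long i_ctime;
};

/*
 * Model of the timestamp update: returns 1 if the inode was changed
 * (and would therefore be dirtied), 0 if nothing happened.
 */
static int toy_file_update_time(struct toy_inode *inode, long now)
{
	/* The client never maintains c/mtime locally: bail out early. */
	if (inode->i_flags & TOY_S_NOCMTIME)
		return 0;

	/* Times already current: nothing to write back. */
	if (inode->i_mtime == now && inode->i_ctime == now)
		return 0;

	inode->i_mtime = now;
	inode->i_ctime = now;
	return 1;
}
```

An inode with the no-cmtime flag set goes through the call unchanged, so adding the call to such a filesystem's handler buys nothing.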


2012-02-16 13:51:38

by Peter Zijlstra

Subject: Re: [PATCH 01/11] perf: Push file_update_time() into perf_mmap_fault()

On Thu, 2012-02-16 at 14:46 +0100, Jan Kara wrote:
> CC: Peter Zijlstra <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: Paul Mackerras <[email protected]>
> CC: Arnaldo Carvalho de Melo <[email protected]>
> Signed-off-by: Jan Kara <[email protected]>
> ---
> kernel/events/core.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index ba36013..61a67f3 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3316,8 +3316,10 @@ static int perf_mmap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> int ret = VM_FAULT_SIGBUS;
>
> if (vmf->flags & FAULT_FLAG_MKWRITE) {
> - if (vmf->pgoff == 0)
> + if (vmf->pgoff == 0) {
> ret = 0;
> + file_update_time(vma->vm_file);
> + }
> return ret;
> }

It's an anon filedesc, there's no actual file and while I guess one could
probably call fstat() on it, people really shouldn't care.

So feel free to introduce this patch to the bitbucket.

2012-02-16 16:24:49

by Greg Kroah-Hartman

Subject: Re: [PATCH 09/11] sysfs: Push file_update_time() into bin_page_mkwrite()

On Thu, Feb 16, 2012 at 02:46:17PM +0100, Jan Kara wrote:
> CC: Greg Kroah-Hartman <[email protected]>

Acked-by: Greg Kroah-Hartman <[email protected]>

2012-02-16 16:47:05

by Steven Whitehouse

Subject: Re: [PATCH 08/11] gfs2: Push file_update_time() into gfs2_page_mkwrite()

Hi,

On Thu, 2012-02-16 at 14:46 +0100, Jan Kara wrote:
> CC: Steven Whitehouse <[email protected]>
> CC: [email protected]
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/gfs2/file.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
That looks ok to me...

Acked-by: Steven Whitehouse <[email protected]>

Steve.

> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index c5fb359..1f03531 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -375,6 +375,9 @@ static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> */
> vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
>
> + /* Update file times before taking page lock */
> + file_update_time(vma->vm_file);
> +
> gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
> ret = gfs2_glock_nq(&gh);
> if (ret)

2012-02-16 19:04:49

by Alex Elder

Subject: Re: [PATCH 09/11] sysfs: Push file_update_time() into bin_page_mkwrite()

On Thu, 2012-02-16 at 14:46 +0100, Jan Kara wrote:
> CC: Greg Kroah-Hartman <[email protected]>
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/sysfs/bin.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
> index a475983..6ceb16f 100644
> --- a/fs/sysfs/bin.c
> +++ b/fs/sysfs/bin.c
> @@ -225,6 +225,8 @@ static int bin_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> if (!sysfs_get_active(attr_sd))
> return VM_FAULT_SIGBUS;
>
> + file_update_time(file);
> +
> ret = 0;
> if (bb->vm_ops->page_mkwrite)
> ret = bb->vm_ops->page_mkwrite(vma, vmf);

If the filesystem's page_mkwrite() function is responsible
for updating the time, can't the call to file_update_time()
here be conditional?

I.e:
	ret = 0;
	if (bb->vm_ops->page_mkwrite)
		ret = bb->vm_ops->page_mkwrite(vma, vmf);
	else
		file_update_time(file);

-Alex
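
The difference between the posted patch and this suggestion can be seen in a toy model of a stacked handler. All toy_* names are hypothetical; the sketch assumes the lower-level callback already updates the times itself, as the series requires of every ->page_mkwrite() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct file; it only counts timestamp updates. */
struct toy_file {
	int time_updates;
};

struct toy_vm_ops {
	int (*page_mkwrite)(struct toy_file *file);
};

/* Toy stand-in for the sysfs bin buffer, which wraps lower vm_ops. */
struct toy_bin_buffer {
	const struct toy_vm_ops *vm_ops;
};

static void toy_file_update_time(struct toy_file *file)
{
	file->time_updates++;
}

/* Lower handler that follows the new rule and updates times itself. */
static int toy_lower_page_mkwrite(struct toy_file *file)
{
	toy_file_update_time(file);
	return 0;
}

/* Variant from the posted patch: always update, then delegate. */
static int toy_bin_mkwrite_always(struct toy_bin_buffer *bb,
				  struct toy_file *file)
{
	int ret = 0;

	toy_file_update_time(file);
	if (bb->vm_ops->page_mkwrite)
		ret = bb->vm_ops->page_mkwrite(file);
	return ret;
}

/* Suggested variant: update only when nobody below will do it. */
static int toy_bin_mkwrite_conditional(struct toy_bin_buffer *bb,
				       struct toy_file *file)
{
	int ret = 0;

	if (bb->vm_ops->page_mkwrite)
		ret = bb->vm_ops->page_mkwrite(file);
	else
		toy_file_update_time(file);
	return ret;
}
```

With a lower callback present, the unconditional variant touches the times twice per fault in this model, while the conditional one touches them once; without a lower callback the two behave the same.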



2012-02-16 19:04:43

by Alex Elder

Subject: Re: [PATCH 04/11] ceph: Push file_update_time() into ceph_page_mkwrite()

On Thu, 2012-02-16 at 14:46 +0100, Jan Kara wrote:
> CC: Sage Weil <[email protected]>
> CC: [email protected]
> Signed-off-by: Jan Kara <[email protected]>


This will update the timestamp even if a write
fault fails, which is different from before.

Hard to avoid though.

Looks good to me.

Signed-off-by: Alex Elder <[email protected]>

> fs/ceph/addr.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 173b1d2..12b139f 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1181,6 +1181,9 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> loff_t size, len;
> int ret;
>
> + /* Update time before taking page lock */
> + file_update_time(vma->vm_file);
> +
> size = i_size_read(inode);
> if (off + PAGE_CACHE_SIZE <= size)
> len = PAGE_CACHE_SIZE;


2012-02-16 19:13:59

by Sage Weil

Subject: Re: [PATCH 04/11] ceph: Push file_update_time() into ceph_page_mkwrite()

On Thu, 16 Feb 2012, Alex Elder wrote:
> On Thu, 2012-02-16 at 14:46 +0100, Jan Kara wrote:
> > CC: Sage Weil <[email protected]>
> > CC: [email protected]
> > Signed-off-by: Jan Kara <[email protected]>
>
>
> This will update the timestamp even if a write
> fault fails, which is different from before.
>
> Hard to avoid though.
>
> Looks good to me.

Yeah. Let's put something in the tracker to take a look later (I think we
can do better), but this is okay for now.

Signed-off-by: Sage Weil <[email protected]>

>
> Signed-off-by: Alex Elder <[email protected]>
>
> > fs/ceph/addr.c | 3 +++
> > 1 files changed, 3 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index 173b1d2..12b139f 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -1181,6 +1181,9 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> > loff_t size, len;
> > int ret;
> >
> > + /* Update time before taking page lock */
> > + file_update_time(vma->vm_file);
> > +
> > size = i_size_read(inode);
> > if (off + PAGE_CACHE_SIZE <= size)
> > len = PAGE_CACHE_SIZE;
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

2012-02-20 11:00:11

by Jan Kara

Subject: Re: [PATCH 09/11] sysfs: Push file_update_time() into bin_page_mkwrite()

On Thu 16-02-12 13:04:44, Alex Elder wrote:
> On Thu, 2012-02-16 at 14:46 +0100, Jan Kara wrote:
> > CC: Greg Kroah-Hartman <[email protected]>
> > Signed-off-by: Jan Kara <[email protected]>
> > ---
> > fs/sysfs/bin.c | 2 ++
> > 1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
> > index a475983..6ceb16f 100644
> > --- a/fs/sysfs/bin.c
> > +++ b/fs/sysfs/bin.c
> > @@ -225,6 +225,8 @@ static int bin_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> > if (!sysfs_get_active(attr_sd))
> > return VM_FAULT_SIGBUS;
> >
> > + file_update_time(file);
> > +
> > ret = 0;
> > if (bb->vm_ops->page_mkwrite)
> > ret = bb->vm_ops->page_mkwrite(vma, vmf);
>
> If the filesystem's page_mkwrite() function is responsible
> for updating the time, can't the call to file_update_time()
> here be conditional?
>
> I.e:
> ret = 0;
> if (bb->vm_ops->page_mkwrite)
> ret = bb->vm_ops->page_mkwrite(vma, vmf);
> else
> file_update_time(file);
Hmm, I didn't previously look at where we get bb->vm_ops from. It seems they
are inherited from vma->vm_ops, so what you suggest should be safe without
any further changes. So I can do that if someone who understands the sysfs
code likes it more.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-02-20 11:06:32

by Jan Kara

Subject: Re: [PATCH 04/11] ceph: Push file_update_time() into ceph_page_mkwrite()

On Thu 16-02-12 13:04:37, Alex Elder wrote:
> On Thu, 2012-02-16 at 14:46 +0100, Jan Kara wrote:
> > CC: Sage Weil <[email protected]>
> > CC: [email protected]
> > Signed-off-by: Jan Kara <[email protected]>
>
>
> This will update the timestamp even if a write
> fault fails, which is different from before.
>
> Hard to avoid though.
Yes. A relatively easy solution for this (at least for some filesystems)
is to fold the time update into other operations the filesystem is doing.
Usually the filesystem can do some preparations, then take the page lock, and
then update the time stamps together with the other things it wants to do. That
will usually be even faster than the current scheme. But I decided to leave that
to fs maintainers because page_mkwrite() code tends to be tricky wrt locking
etc.
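
The trade-off discussed here can be sketched with a toy model comparing the two orderings. The toy_* names, the page-index check standing in for the truncate race test, and the return codes are all assumptions for illustration; a real filesystem would also need its own preparation step (such as starting a transaction) before taking the page lock.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_FAULT_OK = 0, TOY_FAULT_NOPAGE = 1 };

/* Toy page: just a lock bit and its index within the file. */
struct toy_page {
	bool locked;
	long index;
};

/* Toy inode: file size in pages plus a timestamp-update counter. */
struct toy_inode {
	long size_pages;
	int time_updates;
};

/* Ordering used by the series: update first, then lock and validate. */
static int toy_mkwrite_early(struct toy_inode *inode, struct toy_page *page)
{
	inode->time_updates++;		/* happens even if we fail below */
	page->locked = true;
	if (page->index >= inode->size_pages) {
		page->locked = false;	/* page was truncated away */
		return TOY_FAULT_NOPAGE;
	}
	return TOY_FAULT_OK;		/* page is returned locked */
}

/* Alternative ordering: validate under the lock, then update. */
static int toy_mkwrite_late(struct toy_inode *inode, struct toy_page *page)
{
	page->locked = true;
	if (page->index >= inode->size_pages) {
		page->locked = false;
		return TOY_FAULT_NOPAGE;	/* times left untouched */
	}
	inode->time_updates++;		/* only on a successful fault */
	return TOY_FAULT_OK;
}
```

In this model only the second ordering leaves the timestamps alone when the fault fails, at the cost of doing the update while the page lock is held, which is why the choice is left to each filesystem.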


> Looks good to me.
>
> Signed-off-by: Alex Elder <[email protected]>
Thanks.

Honza

> > fs/ceph/addr.c | 3 +++
> > 1 files changed, 3 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index 173b1d2..12b139f 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -1181,6 +1181,9 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> > loff_t size, len;
> > int ret;
> >
> > + /* Update time before taking page lock */
> > + file_update_time(vma->vm_file);
> > +
> > size = i_size_read(inode);
> > if (off + PAGE_CACHE_SIZE <= size)
> > len = PAGE_CACHE_SIZE;
>
>
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-02-20 22:32:59

by Jan Kara

Subject: Re: [PATCH 10/11] nfs: Push file_update_time() into nfs_vm_page_mkwrite()

On Thu 16-02-12 13:50:19, Myklebust, Trond wrote:
> On Thu, 2012-02-16 at 14:46 +0100, Jan Kara wrote:
> > CC: Trond Myklebust <[email protected]>
> > CC: [email protected]
> > Signed-off-by: Jan Kara <[email protected]>
> > ---
> > fs/nfs/file.c | 3 +++
> > 1 files changed, 3 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> > index c43a452..2407922 100644
> > --- a/fs/nfs/file.c
> > +++ b/fs/nfs/file.c
> > @@ -525,6 +525,9 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> > /* make sure the cache has finished storing the page */
> > nfs_fscache_wait_on_page_write(NFS_I(dentry->d_inode), page);
> >
> > + /* Update file times before taking page lock */
> > + file_update_time(filp);
> > +
> > lock_page(page);
> > mapping = page->mapping;
> > if (mapping != dentry->d_inode->i_mapping)
>
> Hi Jan,
>
> file_update_time() is a no-op in NFS, since we set S_NOATIME|S_NOCMTIME
> on all inodes.
Thanks. I've discarded the patch.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-02-20 22:33:17

by Jan Kara

Subject: Re: [PATCH 01/11] perf: Push file_update_time() into perf_mmap_fault()

On Thu 16-02-12 14:50:47, Peter Zijlstra wrote:
> On Thu, 2012-02-16 at 14:46 +0100, Jan Kara wrote:
> > CC: Peter Zijlstra <[email protected]>
> > CC: Ingo Molnar <[email protected]>
> > CC: Paul Mackerras <[email protected]>
> > CC: Arnaldo Carvalho de Melo <[email protected]>
> > Signed-off-by: Jan Kara <[email protected]>
> > ---
> > kernel/events/core.c | 4 +++-
> > 1 files changed, 3 insertions(+), 1 deletions(-)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index ba36013..61a67f3 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -3316,8 +3316,10 @@ static int perf_mmap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> > int ret = VM_FAULT_SIGBUS;
> >
> > if (vmf->flags & FAULT_FLAG_MKWRITE) {
> > - if (vmf->pgoff == 0)
> > + if (vmf->pgoff == 0) {
> > ret = 0;
> > + file_update_time(vma->vm_file);
> > + }
> > return ret;
> > }
>
> It's an anon filedesc, there's no actual file and while I guess one could
> probably call fstat() on it, people really shouldn't care.
>
> So feel free to introduce this patch to the bitbucket.
Thanks for letting me know. Patch removed.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-02-20 22:42:50

by Jan Kara

Subject: Re: [PATCH 04/11] ceph: Push file_update_time() into ceph_page_mkwrite()

On Thu 16-02-12 11:13:53, Sage Weil wrote:
> On Thu, 16 Feb 2012, Alex Elder wrote:
> > On Thu, 2012-02-16 at 14:46 +0100, Jan Kara wrote:
> > > CC: Sage Weil <[email protected]>
> > > CC: [email protected]
> > > Signed-off-by: Jan Kara <[email protected]>
> >
> >
> > This will update the timestamp even if a write
> > fault fails, which is different from before.
> >
> > Hard to avoid though.
> >
> > Looks good to me.
>
> Yeah. Let's put something in the tracker to take a look later (I think we
> can do better), but this is okay for now.
>
> Signed-off-by: Sage Weil <[email protected]>
Thanks! Just an administrative note - the tag above should rather be
Acked-by or Reviewed-by. You'd use Signed-off-by only if you took the patch
and merged it via your tree... So can I add Acked-by?

Honza

> > Signed-off-by: Alex Elder <[email protected]>
> >
> > > fs/ceph/addr.c | 3 +++
> > > 1 files changed, 3 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > > index 173b1d2..12b139f 100644
> > > --- a/fs/ceph/addr.c
> > > +++ b/fs/ceph/addr.c
> > > @@ -1181,6 +1181,9 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> > > loff_t size, len;
> > > int ret;
> > >
> > > + /* Update time before taking page lock */
> > > + file_update_time(vma->vm_file);
> > > +
> > > size = i_size_read(inode);
> > > if (off + PAGE_CACHE_SIZE <= size)
> > > len = PAGE_CACHE_SIZE;
> >
> >
> >
> >
> >
--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-02-20 22:57:03

by Sage Weil

Subject: Re: [PATCH 04/11] ceph: Push file_update_time() into ceph_page_mkwrite()

On Mon, 20 Feb 2012, Jan Kara wrote:
> On Thu 16-02-12 11:13:53, Sage Weil wrote:
> > On Thu, 16 Feb 2012, Alex Elder wrote:
> > > On Thu, 2012-02-16 at 14:46 +0100, Jan Kara wrote:
> > > > CC: Sage Weil <[email protected]>
> > > > CC: [email protected]
> > > > Signed-off-by: Jan Kara <[email protected]>
> > >
> > >
> > > This will update the timestamp even if a write
> > > fault fails, which is different from before.
> > >
> > > Hard to avoid though.
> > >
> > > Looks good to me.
> >
> > Yeah. Let's put something in the tracker to take a look later (I think we
> > can do better), but this is okay for now.
> >
> > Signed-off-by: Sage Weil <[email protected]>
> Thanks! Just an administrative note - the tag above should rather be
> Acked-by or Reviewed-by. You'd use Signed-off-by only if you took the patch
> and merged it via your tree... So can I add Acked-by?

Oh, right. Acked-by!

sage

>
> Honza
>
> > > Signed-off-by: Alex Elder <[email protected]>
> > >
> > > > fs/ceph/addr.c | 3 +++
> > > > 1 files changed, 3 insertions(+), 0 deletions(-)
> > > >
> > > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > > > index 173b1d2..12b139f 100644
> > > > --- a/fs/ceph/addr.c
> > > > +++ b/fs/ceph/addr.c
> > > > @@ -1181,6 +1181,9 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> > > > loff_t size, len;
> > > > int ret;
> > > >
> > > > + /* Update time before taking page lock */
> > > > + file_update_time(vma->vm_file);
> > > > +
> > > > size = i_size_read(inode);
> > > > if (off + PAGE_CACHE_SIZE <= size)
> > > > len = PAGE_CACHE_SIZE;
> > >
> > >
> > >
> > >
> > >
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
>
>

2012-02-29 17:43:36

by Eric W. Biederman

Subject: Re: [PATCH 09/11] sysfs: Push file_update_time() into bin_page_mkwrite()

Jan Kara <[email protected]> writes:

> On Thu 16-02-12 13:04:44, Alex Elder wrote:
>> On Thu, 2012-02-16 at 14:46 +0100, Jan Kara wrote:
>> > CC: Greg Kroah-Hartman <[email protected]>
>> > Signed-off-by: Jan Kara <[email protected]>
>> > ---
>> > fs/sysfs/bin.c | 2 ++
>> > 1 files changed, 2 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
>> > index a475983..6ceb16f 100644
>> > --- a/fs/sysfs/bin.c
>> > +++ b/fs/sysfs/bin.c
>> > @@ -225,6 +225,8 @@ static int bin_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>> > if (!sysfs_get_active(attr_sd))
>> > return VM_FAULT_SIGBUS;
>> >
>> > + file_update_time(file);
>> > +
>> > ret = 0;
>> > if (bb->vm_ops->page_mkwrite)
>> > ret = bb->vm_ops->page_mkwrite(vma, vmf);
>>
>> If the filesystem's page_mkwrite() function is responsible
>> for updating the time, can't the call to file_update_time()
>> here be conditional?
>>
>> I.e:
>> ret = 0;
>> if (bb->vm_ops->page_mkwrite)
>> ret = bb->vm_ops->page_mkwrite(vma, vmf);
>> else
>> file_update_time(file);
> Hmm, I didn't previously look at where we get bb->vm_ops from. It seems they
> are inherited from vma->vm_ops, so what you suggest should be safe without
> any further changes. So I can do that if someone who understands the sysfs
> code likes it more.

I do. Essentially sysfs is being a stackable filesystem here, because
sysfs needs the ability to remove a file mapping.

In practice we could probably get away without a single
file_update_time(file) here because these are mmio mappings, normally
for PCI resources, but we might as well use good form since we can.

Eric