2004-11-08 09:58:46

by Suparna Bhattacharya

[permalink] [raw]
Subject: [PATCH] Fix O_SYNC speedup for generic_file_write_nolock


The O_SYNC speedup patches missed the generic_file_xxx_nolock cases,
which means that pages weren't actually getting sync'ed in those
cases. This patch fixes that.

Signed-off-by: Suparna Bhattacharya <[email protected]>


include/linux/writeback.h | 2 +
mm/filemap.c | 67 +++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 66 insertions(+), 3 deletions(-)

diff -urp -X dontdiff2 linux-2.6.10-rc1/include/linux/writeback.h linux-2.6.10-rc1-aio/include/linux/writeback.h
--- linux-2.6.10-rc1/include/linux/writeback.h 2004-11-03 12:04:10.000000000 +0530
+++ linux-2.6.10-rc1-aio/include/linux/writeback.h 2004-11-04 10:10:31.000000000 +0530
@@ -106,6 +106,8 @@ int pdflush_operation(void (*fn)(unsigne
int do_writepages(struct address_space *mapping, struct writeback_control *wbc);
int sync_page_range(struct inode *inode, struct address_space *mapping,
loff_t pos, size_t count);
+int sync_page_range_nolock(struct inode *inode, struct address_space
+ *mapping, loff_t pos, size_t count);

/* pdflush.c */
extern int nr_pdflush_threads; /* Global so it can be exported to sysctl
diff -urp -X dontdiff2 linux-2.6.10-rc1/mm/filemap.c linux-2.6.10-rc1-aio/mm/filemap.c
--- linux-2.6.10-rc1/mm/filemap.c 2004-11-03 12:04:24.000000000 +0530
+++ linux-2.6.10-rc1-aio/mm/filemap.c 2004-11-04 10:10:31.000000000 +0530
@@ -283,6 +283,30 @@ int sync_page_range(struct inode *inode,
}
EXPORT_SYMBOL(sync_page_range);

+/*
+ * Note: Holding i_sem across sync_page_range_nolock is not a good idea
+ * as it forces O_SYNC writers to different parts of the same file
+ * to be serialised right until io completion.
+ */
+int sync_page_range_nolock(struct inode *inode, struct address_space *mapping,
+ loff_t pos, size_t count)
+{
+ pgoff_t start = pos >> PAGE_CACHE_SHIFT;
+ pgoff_t end = (pos + count - 1) >> PAGE_CACHE_SHIFT;
+ int ret;
+
+ if (mapping->backing_dev_info->memory_backed || !count)
+ return 0;
+ ret = filemap_fdatawrite_range(mapping, pos, pos + count - 1);
+ if (ret == 0) {
+ ret = generic_osync_inode(inode, mapping, OSYNC_METADATA);
+ }
+ if (ret == 0)
+ ret = wait_on_page_writeback_range(mapping, start, end);
+ return ret;
+}
+EXPORT_SYMBOL(sync_page_range_nolock);
+
/**
* filemap_fdatawait - walk the list of under-writeback pages of the given
* address space and wait for all of them.
@@ -1998,7 +2022,7 @@ generic_file_buffered_write(struct kiocb
EXPORT_SYMBOL(generic_file_buffered_write);

ssize_t
-generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov,
+__generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov,
unsigned long nr_segs, loff_t *ppos)
{
struct file *file = iocb->ki_filp;
@@ -2075,6 +2099,43 @@ out:
EXPORT_SYMBOL(generic_file_aio_write_nolock);

ssize_t
+generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos)
+{
+ struct file *file = iocb->ki_filp;
+ struct address_space *mapping = file->f_mapping;
+ struct inode *inode = mapping->host;
+ ssize_t ret;
+ loff_t pos = *ppos;
+
+ ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, ppos);
+
+ if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
+ int err;
+
+ err = sync_page_range_nolock(inode, mapping, pos, ret);
+ if (err < 0)
+ ret = err;
+ }
+ return ret;
+}
+
+
+ssize_t
+__generic_file_write_nolock(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos)
+{
+ struct kiocb kiocb;
+ ssize_t ret;
+
+ init_sync_kiocb(&kiocb, file);
+ ret = __generic_file_aio_write_nolock(&kiocb, iov, nr_segs, ppos);
+ if (-EIOCBQUEUED == ret)
+ ret = wait_on_sync_kiocb(&kiocb);
+ return ret;
+}
+
+ssize_t
generic_file_write_nolock(struct file *file, const struct iovec *iov,
unsigned long nr_segs, loff_t *ppos)
{
@@ -2128,7 +2189,7 @@ ssize_t generic_file_write(struct file *
.iov_len = count };

down(&inode->i_sem);
- ret = generic_file_write_nolock(file, &local_iov, 1, ppos);
+ ret = __generic_file_write_nolock(file, &local_iov, 1, ppos);
up(&inode->i_sem);

if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
@@ -2165,7 +2226,7 @@ ssize_t generic_file_writev(struct file
ssize_t ret;

down(&inode->i_sem);
- ret = generic_file_write_nolock(file, iov, nr_segs, ppos);
+ ret = __generic_file_write_nolock(file, iov, nr_segs, ppos);
up(&inode->i_sem);

if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India


2004-11-08 10:04:55

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] Fix O_SYNC speedup for generic_file_write_nolock

> +EXPORT_SYMBOL(sync_page_range_nolock);


why adding this export? nothing appears to be using it (AIO isn't a module after all)

2004-11-08 11:44:24

by Suparna Bhattacharya

[permalink] [raw]
Subject: Re: [PATCH] Fix O_SYNC speedup for generic_file_write_nolock

On Mon, Nov 08, 2004 at 11:04:38AM +0100, Arjan van de Ven wrote:
> > +EXPORT_SYMBOL(sync_page_range_nolock);
>
>
> why adding this export? nothing appears to be using it (AIO isn't a module after all)
>

This doesn't have anything to do with AIO. Filesystems which implement
the equivalent of generic_file_write_nolock may use sync_page_range_nolock
for O_SYNC.

Does that help clarify ?

Regards
Suparna

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

2004-11-08 12:06:04

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] Fix O_SYNC speedup for generic_file_write_nolock

On Mon, 2004-11-08 at 17:23 +0530, Suparna Bhattacharya wrote:
> On Mon, Nov 08, 2004 at 11:04:38AM +0100, Arjan van de Ven wrote:
> > > +EXPORT_SYMBOL(sync_page_range_nolock);
> >
> >
> > why adding this export? nothing appears to be using it (AIO isn't a module after all)
> >
>
> This doesn't have anything to do with AIO. Filesystems which implement
> the equivalent of generic_file_write_nolock may use sync_page_range_nolock
> for O_SYNC.
>
> Does that help clarify ?

not really; none do so far so how about not adding the export until
someone does use it ?


2004-11-08 12:23:40

by Suparna Bhattacharya

[permalink] [raw]
Subject: Re: [PATCH] Fix O_SYNC speedup for generic_file_write_nolock

On Mon, Nov 08, 2004 at 01:05:44PM +0100, Arjan van de Ven wrote:
> On Mon, 2004-11-08 at 17:23 +0530, Suparna Bhattacharya wrote:
> > On Mon, Nov 08, 2004 at 11:04:38AM +0100, Arjan van de Ven wrote:
> > > > +EXPORT_SYMBOL(sync_page_range_nolock);
> > >
> > >
> > > why adding this export? nothing appears to be using it (AIO isn't a module after all)
> > >
> >
> > This doesn't have anything to do with AIO. Filesystems which implement
> > the equivalent of generic_file_write_nolock may use sync_page_range_nolock
> > for O_SYNC.
> >
> > Does that help clarify ?
>
> not really; none do so far so how about not adding the export until
> someone does use it ?

It is just like sync_page_range() from akpm's original O_SYNC speedup
patch. Andrew, do we keep or remove the export ?

Regards
Suparna

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

2004-11-08 16:57:42

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] Fix O_SYNC speedup for generic_file_write_nolock

On Mon, 2004-11-08 at 07:20 -0800, Joel Becker wrote:
> OCFS2 uses generic_file_write_nolock(), and as such we might
> want to look into this problem and the sync_page_range_nolock() fix.

are you going to submit ocfs2 soon for inclusion ?

2004-11-08 18:33:37

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH] Fix O_SYNC speedup for generic_file_write_nolock

On Mon, Nov 08, 2004 at 04:31:17PM +0100, Arjan van de Ven wrote:
> On Mon, 2004-11-08 at 07:20 -0800, Joel Becker wrote:
> > OCFS2 uses generic_file_write_nolock(), and as such we might
> > want to look into this problem and the sync_page_range_nolock() fix.
>
> are you going to submit ocfs2 soon for inclusion ?

FSVO "soon", yes.

Joel


--

Life's Little Instruction Book #157

"Take time to smell the roses."

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: [email protected]
Phone: (650) 506-8127

2004-11-08 19:29:27

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH] Fix O_SYNC speedup for generic_file_write_nolock

On Mon, Nov 08, 2004 at 01:05:44PM +0100, Arjan van de Ven wrote:
> On Mon, 2004-11-08 at 17:23 +0530, Suparna Bhattacharya wrote:
> > On Mon, Nov 08, 2004 at 11:04:38AM +0100, Arjan van de Ven wrote:
> > > > +EXPORT_SYMBOL(sync_page_range_nolock);
> > >
> > >
> > > why adding this export? nothing appears to be using it (AIO isn't a module after all)
> > >
> >
> > This doesn't have anything to do with AIO. Filesystems which implement
> > the equivalent of generic_file_write_nolock may use sync_page_range_nolock
> > for O_SYNC.
> >
> > Does that help clarify ?
>
> not really; none do so far so how about not adding the export until
> someone does use it ?

OCFS2 uses generic_file_write_nolock(), and as such we might
want to look into this problem and the sync_page_range_nolock() fix.

Joel


--

"I always thought the hardest questions were those I could not answer.
Now I know they are the ones I can never ask."
- Charlie Watkins

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: [email protected]
Phone: (650) 506-8127