2012-02-08 11:11:51

by Fengguang Wu

[permalink] [raw]
Subject: [PATCH] block: remove plugging at buffered write time

Buffered write(2) is not directly tied to IO, so it's not suitable to
handle plug in generic_file_aio_write().

Also remove the redundant plug calls around ->direct_IO(), since we now
do unplugging in the lower layer do_blockdev_direct_IO().

CC: Jens Axboe <[email protected]>
CC: Li Shaohua <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
---
fs/direct-io.c | 4 ++++
mm/filemap.c | 7 -------
2 files changed, 4 insertions(+), 7 deletions(-)

--- linux-next.orig/mm/filemap.c 2012-02-08 18:51:50.000000000 +0800
+++ linux-next/mm/filemap.c 2012-02-08 18:52:19.000000000 +0800
@@ -1421,12 +1421,8 @@ generic_file_aio_read(struct kiocb *iocb
retval = filemap_write_and_wait_range(mapping, pos,
pos + iov_length(iov, nr_segs) - 1);
if (!retval) {
- struct blk_plug plug;
-
- blk_start_plug(&plug);
retval = mapping->a_ops->direct_IO(READ, iocb,
iov, pos, nr_segs);
- blk_finish_plug(&plug);
}
if (retval > 0) {
*ppos = pos + retval;
@@ -2610,13 +2606,11 @@ ssize_t generic_file_aio_write(struct ki
{
struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;
- struct blk_plug plug;
ssize_t ret;

BUG_ON(iocb->ki_pos != pos);

mutex_lock(&inode->i_mutex);
- blk_start_plug(&plug);
ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
mutex_unlock(&inode->i_mutex);

@@ -2627,7 +2621,6 @@ ssize_t generic_file_aio_write(struct ki
if (err < 0 && ret > 0)
ret = err;
}
- blk_finish_plug(&plug);
return ret;
}
EXPORT_SYMBOL(generic_file_aio_write);
--- linux-next.orig/fs/direct-io.c 2012-02-08 18:51:37.000000000 +0800
+++ linux-next/fs/direct-io.c 2012-02-08 18:52:19.000000000 +0800
@@ -1106,6 +1106,7 @@ do_blockdev_direct_IO(int rw, struct kio
unsigned long user_addr;
size_t bytes;
struct buffer_head map_bh = { 0, };
+ struct blk_plug plug;

if (rw & WRITE)
rw = WRITE_ODIRECT;
@@ -1221,6 +1222,8 @@ do_blockdev_direct_IO(int rw, struct kio
PAGE_SIZE - user_addr / PAGE_SIZE);
}

+ blk_start_plug(&plug);
+
for (seg = 0; seg < nr_segs; seg++) {
user_addr = (unsigned long)iov[seg].iov_base;
sdio.size += bytes = iov[seg].iov_len;
@@ -1314,6 +1317,7 @@ do_blockdev_direct_IO(int rw, struct kio
} else
BUG_ON(retval != -EIOCBQUEUED);

+ blk_finish_plug(&plug);
out:
return retval;
}


2012-02-08 23:27:46

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] block: remove plugging at buffered write time

On Wed, Feb 08, 2012 at 07:01:44PM +0800, Wu Fengguang wrote:
> Buffered write(2) is not directly tied to IO, so it's not suitable to
> handle plug in generic_file_aio_write().

But generic_sync_write() does issue IO for O_SYNC writes, so unless
there is plugging at a lower layer in the writeback code then it
appears to me that plugging is still necessary (at least inside the
sync branch)....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2012-02-09 01:17:06

by Shaohua Li

[permalink] [raw]
Subject: Re: [PATCH] block: remove plugging at buffered write time

On Wed, 2012-02-08 at 19:01 +0800, Wu Fengguang wrote:
> Buffered write(2) is not directly tied to IO, so it's not suitable to
> handle plug in generic_file_aio_write().
>
> Also remove the redundant plug calls around ->direct_IO(), since we now
> do unplugging in the lower layer do_blockdev_direct_IO().
>
> CC: Jens Axboe <[email protected]>
> CC: Li Shaohua <[email protected]>
> Signed-off-by: Wu Fengguang <[email protected]>
> ---
> fs/direct-io.c | 4 ++++
> mm/filemap.c | 7 -------
> 2 files changed, 4 insertions(+), 7 deletions(-)
>
> --- linux-next.orig/mm/filemap.c 2012-02-08 18:51:50.000000000 +0800
> +++ linux-next/mm/filemap.c 2012-02-08 18:52:19.000000000 +0800
> @@ -1421,12 +1421,8 @@ generic_file_aio_read(struct kiocb *iocb
> retval = filemap_write_and_wait_range(mapping, pos,
> pos + iov_length(iov, nr_segs) - 1);
> if (!retval) {
> - struct blk_plug plug;
> -
> - blk_start_plug(&plug);
> retval = mapping->a_ops->direct_IO(READ, iocb,
> iov, pos, nr_segs);
> - blk_finish_plug(&plug);
> }
> if (retval > 0) {
> *ppos = pos + retval;
> @@ -2610,13 +2606,11 @@ ssize_t generic_file_aio_write(struct ki
> {
> struct file *file = iocb->ki_filp;
> struct inode *inode = file->f_mapping->host;
> - struct blk_plug plug;
> ssize_t ret;
>
> BUG_ON(iocb->ki_pos != pos);
>
> mutex_lock(&inode->i_mutex);
> - blk_start_plug(&plug);
> ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
> mutex_unlock(&inode->i_mutex);
>
> @@ -2627,7 +2621,6 @@ ssize_t generic_file_aio_write(struct ki
> if (err < 0 && ret > 0)
> ret = err;
> }
> - blk_finish_plug(&plug);
> return ret;
> }
> EXPORT_SYMBOL(generic_file_aio_write);
> --- linux-next.orig/fs/direct-io.c 2012-02-08 18:51:37.000000000 +0800
> +++ linux-next/fs/direct-io.c 2012-02-08 18:52:19.000000000 +0800
> @@ -1106,6 +1106,7 @@ do_blockdev_direct_IO(int rw, struct kio
> unsigned long user_addr;
> size_t bytes;
> struct buffer_head map_bh = { 0, };
> + struct blk_plug plug;
>
> if (rw & WRITE)
> rw = WRITE_ODIRECT;
> @@ -1221,6 +1222,8 @@ do_blockdev_direct_IO(int rw, struct kio
> PAGE_SIZE - user_addr / PAGE_SIZE);
> }
>
> + blk_start_plug(&plug);
> +
> for (seg = 0; seg < nr_segs; seg++) {
> user_addr = (unsigned long)iov[seg].iov_base;
> sdio.size += bytes = iov[seg].iov_len;
> @@ -1314,6 +1317,7 @@ do_blockdev_direct_IO(int rw, struct kio
> } else
> BUG_ON(retval != -EIOCBQUEUED);
>
> + blk_finish_plug(&plug);
This one can be moved up a little bit (before dio_cleanup)

Thanks,
Shaohua

> out:
> return retval;
> }

2012-02-09 08:12:33

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH] block: remove plugging at buffered write time

On Thu, Feb 09, 2012 at 10:27:19AM +1100, Dave Chinner wrote:
> On Wed, Feb 08, 2012 at 07:01:44PM +0800, Wu Fengguang wrote:
> > Buffered write(2) is not directly tied to IO, so it's not suitable to
> > handle plug in generic_file_aio_write().
>
> But generic_sync_write() does issue IO for O_SYNC writes, so unless
> there is plugging at a lower layer in the writeback code then it
> appears to me that plugging is still necessary (at least inside the
> sync branch)....

Good catch! It looks that generic_write_sync() eventually calls into
vfs_fsync_range() which further calls ->fsync(). We may add plugging
around it:

--- linux-next.orig/fs/sync.c 2012-02-09 15:59:52.000000000 +0800
+++ linux-next/fs/sync.c 2012-02-09 16:01:02.000000000 +0800
@@ -164,9 +164,17 @@ SYSCALL_DEFINE1(syncfs, int, fd)
*/
int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
{
+ struct blk_plug plug;
+ int ret;
+
if (!file->f_op || !file->f_op->fsync)
return -EINVAL;
- return file->f_op->fsync(file, start, end, datasync);
+
+ blk_start_plug(&plug);
+ ret = file->f_op->fsync(file, start, end, datasync);
+ blk_finish_plug(&plug);
+
+ return ret;
}
EXPORT_SYMBOL(vfs_fsync_range);

2012-02-09 08:17:32

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH] block: remove plugging at buffered write time

> > @@ -1221,6 +1222,8 @@ do_blockdev_direct_IO(int rw, struct kio
> > PAGE_SIZE - user_addr / PAGE_SIZE);
> > }
> >
> > + blk_start_plug(&plug);
> > +
> > for (seg = 0; seg < nr_segs; seg++) {
> > user_addr = (unsigned long)iov[seg].iov_base;
> > sdio.size += bytes = iov[seg].iov_len;
> > @@ -1314,6 +1317,7 @@ do_blockdev_direct_IO(int rw, struct kio
> > } else
> > BUG_ON(retval != -EIOCBQUEUED);
> >
> > + blk_finish_plug(&plug);
> This one can be moved up a little bit (before dio_cleanup)

Done, thanks! Here is the updated patch with O_SYNC write fix.

Thanks,
Fengguang
---
Subject: block: remove plugging at buffered write time
Date: Tue Jan 31 18:25:48 CST 2012

Buffered write(2) is not directly tied to IO, so it's not suitable to
handle plug in generic_file_aio_write().

Also moves unplugging to lower layers:
- for direct I/O, from around ->direct_IO() to do_blockdev_direct_IO()
- for O_SYNC writes, to around ->fsync()

CC: Jens Axboe <[email protected]>
CC: Li Shaohua <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
---
fs/direct-io.c | 5 +++++
fs/sync.c | 10 +++++++++-
mm/filemap.c | 7 -------
3 files changed, 14 insertions(+), 8 deletions(-)

--- linux-next.orig/mm/filemap.c 2012-02-08 19:33:29.000000000 +0800
+++ linux-next/mm/filemap.c 2012-02-09 15:59:47.000000000 +0800
@@ -1421,12 +1421,8 @@ generic_file_aio_read(struct kiocb *iocb
retval = filemap_write_and_wait_range(mapping, pos,
pos + iov_length(iov, nr_segs) - 1);
if (!retval) {
- struct blk_plug plug;
-
- blk_start_plug(&plug);
retval = mapping->a_ops->direct_IO(READ, iocb,
iov, pos, nr_segs);
- blk_finish_plug(&plug);
}
if (retval > 0) {
*ppos = pos + retval;
@@ -2610,13 +2606,11 @@ ssize_t generic_file_aio_write(struct ki
{
struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;
- struct blk_plug plug;
ssize_t ret;

BUG_ON(iocb->ki_pos != pos);

mutex_lock(&inode->i_mutex);
- blk_start_plug(&plug);
ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
mutex_unlock(&inode->i_mutex);

@@ -2627,7 +2621,6 @@ ssize_t generic_file_aio_write(struct ki
if (err < 0 && ret > 0)
ret = err;
}
- blk_finish_plug(&plug);
return ret;
}
EXPORT_SYMBOL(generic_file_aio_write);
--- linux-next.orig/fs/direct-io.c 2012-02-08 19:33:29.000000000 +0800
+++ linux-next/fs/direct-io.c 2012-02-09 16:03:04.000000000 +0800
@@ -1106,6 +1106,7 @@ do_blockdev_direct_IO(int rw, struct kio
unsigned long user_addr;
size_t bytes;
struct buffer_head map_bh = { 0, };
+ struct blk_plug plug;

if (rw & WRITE)
rw = WRITE_ODIRECT;
@@ -1221,6 +1222,8 @@ do_blockdev_direct_IO(int rw, struct kio
PAGE_SIZE - user_addr / PAGE_SIZE);
}

+ blk_start_plug(&plug);
+
for (seg = 0; seg < nr_segs; seg++) {
user_addr = (unsigned long)iov[seg].iov_base;
sdio.size += bytes = iov[seg].iov_len;
@@ -1279,6 +1282,8 @@ do_blockdev_direct_IO(int rw, struct kio
if (sdio.bio)
dio_bio_submit(dio, &sdio);

+ blk_finish_plug(&plug);
+
/*
* It is possible that, we return short IO due to end of file.
* In that case, we need to release all the pages we got hold on.
--- linux-next.orig/fs/sync.c 2012-02-09 15:59:52.000000000 +0800
+++ linux-next/fs/sync.c 2012-02-09 16:01:02.000000000 +0800
@@ -164,9 +164,17 @@ SYSCALL_DEFINE1(syncfs, int, fd)
*/
int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
{
+ struct blk_plug plug;
+ int ret;
+
if (!file->f_op || !file->f_op->fsync)
return -EINVAL;
- return file->f_op->fsync(file, start, end, datasync);
+
+ blk_start_plug(&plug);
+ ret = file->f_op->fsync(file, start, end, datasync);
+ blk_finish_plug(&plug);
+
+ return ret;
}
EXPORT_SYMBOL(vfs_fsync_range);

2012-02-09 09:35:09

by Damien Wyart

[permalink] [raw]
Subject: Re: [PATCH] block: remove plugging at buffered write time

Hi,

* Wu Fengguang <[email protected]> [2012-02-09 16:07]:
> Done, thanks! Here is the updated patch with O_SYNC write fix.

Got these errors with the updated patch on top of 3.3-rc3:

fs/sync.c: In function 'vfs_fsync_range':
fs/sync.c:167:18: error: storage size of 'plug' isn't known
fs/sync.c:173:2: error: implicit declaration of function 'blk_start_plug' [-Werror=implicit-function-declaration]
fs/sync.c:175:2: error: implicit declaration of function 'blk_finish_plug' [-Werror=implicit-function-declaration]
fs/sync.c:167:18: warning: unused variable 'plug' [-Wunused-variable]

--
Damien

2012-02-09 09:50:08

by Damien Wyart

[permalink] [raw]
Subject: Re: [PATCH] block: remove plugging at buffered write time

> > Done, thanks! Here is the updated patch with O_SYNC write fix.

> Got these errors with the updated patch on top of 3.3-rc3:

> fs/sync.c: In function 'vfs_fsync_range':
> fs/sync.c:167:18: error: storage size of 'plug' isn't known
> fs/sync.c:173:2: error: implicit declaration of function 'blk_start_plug' [-Werror=implicit-function-declaration]
> fs/sync.c:175:2: error: implicit declaration of function 'blk_finish_plug' [-Werror=implicit-function-declaration]
> fs/sync.c:167:18: warning: unused variable 'plug' [-Wunused-variable]

Adding a #include <linux/blkdev.h> at the top of fs/sync.c solved the
problem. Of course, this needs approval of a kernel developper before
being added to the patch and resubmitted because I know adding #includes
can have unwanted consequences sometimes or be seen as bloat...

--
Damien Wyart

2012-02-09 09:59:46

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH] block: remove plugging at buffered write time

On Thu, Feb 09, 2012 at 10:40:19AM +0100, Damien Wyart wrote:
> > > Done, thanks! Here is the updated patch with O_SYNC write fix.
>
> > Got these errors with the updated patch on top of 3.3-rc3:
>
> > fs/sync.c: In function 'vfs_fsync_range':
> > fs/sync.c:167:18: error: storage size of 'plug' isn't known
> > fs/sync.c:173:2: error: implicit declaration of function 'blk_start_plug' [-Werror=implicit-function-declaration]
> > fs/sync.c:175:2: error: implicit declaration of function 'blk_finish_plug' [-Werror=implicit-function-declaration]
> > fs/sync.c:167:18: warning: unused variable 'plug' [-Wunused-variable]
>
> Adding a #include <linux/blkdev.h> at the top of fs/sync.c solved the
> problem. Of course, this needs approval of a kernel developper before
> being added to the patch and resubmitted because I know adding #includes
> can have unwanted consequences sometimes or be seen as bloat...

Thank you! This includes the necessary blkdev.h and it compiles here too:

Subject: block: remove plugging at buffered write time
Date: Tue Jan 31 18:25:48 CST 2012

Buffered write(2) is not directly tied to IO, so it's not suitable to
handle plug in generic_file_aio_write().

Also moves unplugging to lower layers:
- for direct I/O, from around ->direct_IO() to do_blockdev_direct_IO()
- for O_SYNC writes, to around ->fsync()

CC: Jens Axboe <[email protected]>
CC: Li Shaohua <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
---
fs/direct-io.c | 5 +++++
fs/sync.c | 11 ++++++++++-
mm/filemap.c | 7 -------
3 files changed, 15 insertions(+), 8 deletions(-)

--- linux-next.orig/mm/filemap.c 2012-02-08 19:33:29.000000000 +0800
+++ linux-next/mm/filemap.c 2012-02-09 15:59:47.000000000 +0800
@@ -1421,12 +1421,8 @@ generic_file_aio_read(struct kiocb *iocb
retval = filemap_write_and_wait_range(mapping, pos,
pos + iov_length(iov, nr_segs) - 1);
if (!retval) {
- struct blk_plug plug;
-
- blk_start_plug(&plug);
retval = mapping->a_ops->direct_IO(READ, iocb,
iov, pos, nr_segs);
- blk_finish_plug(&plug);
}
if (retval > 0) {
*ppos = pos + retval;
@@ -2610,13 +2606,11 @@ ssize_t generic_file_aio_write(struct ki
{
struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;
- struct blk_plug plug;
ssize_t ret;

BUG_ON(iocb->ki_pos != pos);

mutex_lock(&inode->i_mutex);
- blk_start_plug(&plug);
ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
mutex_unlock(&inode->i_mutex);

@@ -2627,7 +2621,6 @@ ssize_t generic_file_aio_write(struct ki
if (err < 0 && ret > 0)
ret = err;
}
- blk_finish_plug(&plug);
return ret;
}
EXPORT_SYMBOL(generic_file_aio_write);
--- linux-next.orig/fs/direct-io.c 2012-02-08 19:33:29.000000000 +0800
+++ linux-next/fs/direct-io.c 2012-02-09 16:03:04.000000000 +0800
@@ -1106,6 +1106,7 @@ do_blockdev_direct_IO(int rw, struct kio
unsigned long user_addr;
size_t bytes;
struct buffer_head map_bh = { 0, };
+ struct blk_plug plug;

if (rw & WRITE)
rw = WRITE_ODIRECT;
@@ -1221,6 +1222,8 @@ do_blockdev_direct_IO(int rw, struct kio
PAGE_SIZE - user_addr / PAGE_SIZE);
}

+ blk_start_plug(&plug);
+
for (seg = 0; seg < nr_segs; seg++) {
user_addr = (unsigned long)iov[seg].iov_base;
sdio.size += bytes = iov[seg].iov_len;
@@ -1279,6 +1282,8 @@ do_blockdev_direct_IO(int rw, struct kio
if (sdio.bio)
dio_bio_submit(dio, &sdio);

+ blk_finish_plug(&plug);
+
/*
* It is possible that, we return short IO due to end of file.
* In that case, we need to release all the pages we got hold on.
--- linux-next.orig/fs/sync.c 2012-02-09 15:59:52.000000000 +0800
+++ linux-next/fs/sync.c 2012-02-09 17:39:02.000000000 +0800
@@ -15,6 +15,7 @@
#include <linux/pagemap.h>
#include <linux/quotaops.h>
#include <linux/backing-dev.h>
+#include <linux/blkdev.h>
#include "internal.h"

#define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
@@ -164,9 +165,17 @@ SYSCALL_DEFINE1(syncfs, int, fd)
*/
int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
{
+ struct blk_plug plug;
+ int ret;
+
if (!file->f_op || !file->f_op->fsync)
return -EINVAL;
- return file->f_op->fsync(file, start, end, datasync);
+
+ blk_start_plug(&plug);
+ ret = file->f_op->fsync(file, start, end, datasync);
+ blk_finish_plug(&plug);
+
+ return ret;
}
EXPORT_SYMBOL(vfs_fsync_range);

2012-02-09 18:06:43

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] block: remove plugging at buffered write time

On Thu, Feb 09, 2012 at 04:02:24PM +0800, Wu Fengguang wrote:
> On Thu, Feb 09, 2012 at 10:27:19AM +1100, Dave Chinner wrote:
> > On Wed, Feb 08, 2012 at 07:01:44PM +0800, Wu Fengguang wrote:
> > > Buffered write(2) is not directly tied to IO, so it's not suitable to
> > > handle plug in generic_file_aio_write().
> >
> > But generic_sync_write() does issue IO for O_SYNC writes, so unless
> > there is plugging at a lower layer in the writeback code then it
> > appears to me that plugging is still necessary (at least inside the
> > sync branch)....
>
> Good catch! It looks that generic_write_sync() eventually calls into
> vfs_fsync_range() which further calls ->fsync(). We may add plugging
> around it:


NAK, please keep the plugging down in the fs, or the libraries used but
not common VFS code.

2012-02-09 18:30:52

by Chris Mason

[permalink] [raw]
Subject: Re: [PATCH] block: remove plugging at buffered write time

On Thu, Feb 09, 2012 at 01:06:35PM -0500, Christoph Hellwig wrote:
> On Thu, Feb 09, 2012 at 04:02:24PM +0800, Wu Fengguang wrote:
> > On Thu, Feb 09, 2012 at 10:27:19AM +1100, Dave Chinner wrote:
> > > On Wed, Feb 08, 2012 at 07:01:44PM +0800, Wu Fengguang wrote:
> > > > Buffered write(2) is not directly tied to IO, so it's not suitable to
> > > > handle plug in generic_file_aio_write().
> > >
> > > But generic_sync_write() does issue IO for O_SYNC writes, so unless
> > > there is plugging at a lower layer in the writeback code then it
> > > appears to me that plugging is still necessary (at least inside the
> > > sync branch)....
> >
> > Good catch! It looks that generic_write_sync() eventually calls into
> > vfs_fsync_range() which further calls ->fsync(). We may add plugging
> > around it:
>
>
> NAK, please keep the plugging down in the fs, or the libraries used but
> not common VFS code.

Please, what Christoph said. At least for btrfs plugging here is wrong.

-chris

2012-02-10 02:02:25

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH] block: remove plugging at buffered write time

On Thu, Feb 09, 2012 at 01:30:27PM -0500, Chris Mason wrote:
> On Thu, Feb 09, 2012 at 01:06:35PM -0500, Christoph Hellwig wrote:
> > On Thu, Feb 09, 2012 at 04:02:24PM +0800, Wu Fengguang wrote:
> > > On Thu, Feb 09, 2012 at 10:27:19AM +1100, Dave Chinner wrote:
> > > > On Wed, Feb 08, 2012 at 07:01:44PM +0800, Wu Fengguang wrote:
> > > > > Buffered write(2) is not directly tied to IO, so it's not suitable to
> > > > > handle plug in generic_file_aio_write().
> > > >
> > > > But generic_sync_write() does issue IO for O_SYNC writes, so unless
> > > > there is plugging at a lower layer in the writeback code then it
> > > > appears to me that plugging is still necessary (at least inside the
> > > > sync branch)....
> > >
> > > Good catch! It looks that generic_write_sync() eventually calls into
> > > vfs_fsync_range() which further calls ->fsync(). We may add plugging
> > > around it:
> >
> >
> > NAK, please keep the plugging down in the fs, or the libraries used but
> > not common VFS code.
>
> Please, what Christoph said. At least for btrfs plugging here is wrong.

OK, I get the point: the fs knows best when to unplug. Since any
higher level plug nesting will turn such low level efforts into no-op,
it's highly undesirable to do it in the high level.

Thanks,
Fengguang

2012-02-10 02:57:24

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH] block: remove plugging at buffered write time

On Fri, Feb 10, 2012 at 09:52:18AM +0800, Wu Fengguang wrote:
> On Thu, Feb 09, 2012 at 01:30:27PM -0500, Chris Mason wrote:
> > On Thu, Feb 09, 2012 at 01:06:35PM -0500, Christoph Hellwig wrote:
> > > On Thu, Feb 09, 2012 at 04:02:24PM +0800, Wu Fengguang wrote:
> > > > On Thu, Feb 09, 2012 at 10:27:19AM +1100, Dave Chinner wrote:
> > > > > On Wed, Feb 08, 2012 at 07:01:44PM +0800, Wu Fengguang wrote:
> > > > > > Buffered write(2) is not directly tied to IO, so it's not suitable to
> > > > > > handle plug in generic_file_aio_write().
> > > > >
> > > > > But generic_sync_write() does issue IO for O_SYNC writes, so unless
> > > > > there is plugging at a lower layer in the writeback code then it
> > > > > appears to me that plugging is still necessary (at least inside the
> > > > > sync branch)....
> > > >
> > > > Good catch! It looks that generic_write_sync() eventually calls into
> > > > vfs_fsync_range() which further calls ->fsync(). We may add plugging
> > > > around it:
> > >
> > >
> > > NAK, please keep the plugging down in the fs, or the libraries used but
> > > not common VFS code.
> >
> > Please, what Christoph said. At least for btrfs plugging here is wrong.
>
> OK, I get the point: the fs knows best when to unplug. Since any
> higher level plug nesting will turn such low level efforts into no-op,
> it's highly undesirable to do it in the high level.

It's actually wrong to do plugging around vfs_fsync_range().

Because these call paths

write() with O_SYNC
generic_write_sync()
vfs_fsync_range()
->fsync()
generic_file_fsync()

fsync()
do_fsync()
vfs_fsync()
vfs_fsync_range()

pass arbitrary @size arguments, which may be much larger than the
preferable I/O size, or may cross extent/device boundaries.

generic_file_fsync() starts with a filemap_write_and_wait_range()
call, which already has proper plugging somewhere underneath. Then
followed by metadata writes, which has plugging inside
fsync_buffers_list(). At last, sync_inode_metadata() calls into
->write_inode() which may or may not care plugging.

The other fs specific ->fsync() do similar steps, varying in the
metadata and fs specific housekeeping part.

I'll just drop this code. Shall the fs specific metadata I/O be
plugged accordingly? I'm afraid this is beyond my knowledge base...

Thanks,
Fengguang

2012-02-10 09:41:42

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] block: remove plugging at buffered write time

On Fri 10-02-12 10:47:16, Wu Fengguang wrote:
> On Fri, Feb 10, 2012 at 09:52:18AM +0800, Wu Fengguang wrote:
> > On Thu, Feb 09, 2012 at 01:30:27PM -0500, Chris Mason wrote:
> > > On Thu, Feb 09, 2012 at 01:06:35PM -0500, Christoph Hellwig wrote:
> > > > On Thu, Feb 09, 2012 at 04:02:24PM +0800, Wu Fengguang wrote:
> > > > > On Thu, Feb 09, 2012 at 10:27:19AM +1100, Dave Chinner wrote:
> > > > > > On Wed, Feb 08, 2012 at 07:01:44PM +0800, Wu Fengguang wrote:
> > > > > > > Buffered write(2) is not directly tied to IO, so it's not suitable to
> > > > > > > handle plug in generic_file_aio_write().
> > > > > >
> > > > > > But generic_sync_write() does issue IO for O_SYNC writes, so unless
> > > > > > there is plugging at a lower layer in the writeback code then it
> > > > > > appears to me that plugging is still necessary (at least inside the
> > > > > > sync branch)....
> > > > >
> > > > > Good catch! It looks that generic_write_sync() eventually calls into
> > > > > vfs_fsync_range() which further calls ->fsync(). We may add plugging
> > > > > around it:
> > > >
> > > >
> > > > NAK, please keep the plugging down in the fs, or the libraries used but
> > > > not common VFS code.
> > >
> > > Please, what Christoph said. At least for btrfs plugging here is wrong.
> >
> > OK, I get the point: the fs knows best when to unplug. Since any
> > higher level plug nesting will turn such low level efforts into no-op,
> > it's highly undesirable to do it in the high level.
>
> It's actually wrong to do plugging around vfs_fsync_range().
>
> Because these call paths
>
> write() with O_SYNC
> generic_write_sync()
> vfs_fsync_range()
> ->fsync()
> generic_file_fsync()
>
> fsync()
> do_fsync()
> vfs_fsync()
> vfs_fsync_range()
>
> pass arbitrary @size arguments, which may be much larger than the
> preferable I/O size, or may cross extent/device boundaries.
>
> generic_file_fsync() starts with a filemap_write_and_wait_range()
> call, which already has proper plugging somewhere underneath. Then
> followed by metadata writes, which has plugging inside
> fsync_buffers_list(). At last, sync_inode_metadata() calls into
> ->write_inode() which may or may not care plugging.
>
> The other fs specific ->fsync() do similar steps, varying in the
> metadata and fs specific housekeeping part.
>
> I'll just drop this code. Shall the fs specific metadata I/O be
> plugged accordingly? I'm afraid this is beyond my knowledge base...
The filesystems I know (ext?, ocfs2, reiserfs, udf) either don't do any
metadata io from ->fsync (it happens from a journalling thread) or the io
is random so plugging is not desirable anyway AFAIU (well,
mpage_writepages() is clever enough to submit metadata which is interleaved
with data in one sequential stream together with the data so metadata that
remain are mostly random).

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR