LinuxLists.cc - [PATCH 0/8 v4] No wait AIO

2017-04-03 18:52:59

Subject: [PATCH 0/8 v4] No wait AIO

Formerly known as non-blocking AIO.

This series adds a nonblocking feature to asynchronous I/O writes.
io_submit() can be delayed for a number of reasons:
- Block allocation for files
- Data writebacks for direct I/O
- Sleeping because of waiting to acquire i_rwsem
- Congested block device

The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
any of these conditions is met. This way, userspace can push most
of its write()s to the kernel, which completes them as far as it can
without blocking; if a write returns -EAGAIN, userspace can defer it
to another thread.
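
A minimal userspace sketch of that fallback pattern, under stated assumptions: submit_nowait() is a hypothetical stand-in for io_submit() with the NOWAIT flag set, and it fakes -EAGAIN for odd offsets so the control flow can be exercised without a real device. struct defer_queue is likewise illustrative, not part of any real API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_DEFERRED 16

/* Hypothetical queue of writes handed to a helper thread later. */
struct defer_queue {
	long offsets[MAX_DEFERRED];
	size_t len;
};

/*
 * Hypothetical stand-in for io_submit() with the NOWAIT flag set.
 * It pretends that writes at odd offsets would block (for example,
 * because they need block allocation) and reports -EAGAIN for them.
 */
static int submit_nowait(long offset)
{
	return (offset & 1) ? -EAGAIN : 0;
}

/*
 * Try every write without blocking and defer the ones refused with
 * -EAGAIN. Returns the number of writes submitted inline, or the first
 * error other than -EAGAIN.
 */
static int submit_or_defer(const long *offsets, size_t n,
			   struct defer_queue *q)
{
	int submitted = 0;

	for (size_t i = 0; i < n; i++) {
		int ret = submit_nowait(offsets[i]);

		if (ret == 0) {
			submitted++;
		} else if (ret == -EAGAIN) {
			assert(q->len < MAX_DEFERRED);
			q->offsets[q->len++] = offsets[i];
		} else {
			return ret;
		}
	}
	return submitted;
}
```

A real application would hand the deferred offsets to a worker thread that issues ordinary, blocking writes.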

In order to enable this, IOCB_RW_FLAG_NOWAIT is introduced in
uapi/linux/aio_abi.h. If it is set in aio_rw_flags, it translates to
IOCB_NOWAIT for struct kiocb, BIO_NOWAIT for bio and IOMAP_NOWAIT for
iomap. aio_rw_flags is a new field replacing aio_reserved1. We could
not use aio_flags because the kernel does not currently check it for
invalid values.
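
A sketch of the validate-and-translate step, assuming illustrative bit values (the real definitions live in patches of this series not reproduced here) and a hypothetical helper validate_rw_flags(). Because aio_rw_flags is a new field, unknown bits can be rejected with -EINVAL from the start, which is not possible for aio_flags since the kernel has never validated it.

```c
#include <assert.h>
#include <errno.h>

/* Assumed bit values, for illustration only. */
#define IOCB_RW_FLAG_NOWAIT (1u << 1)            /* uapi aio_rw_flags bit */
#define IOCB_RW_FLAG_MASK   IOCB_RW_FLAG_NOWAIT  /* all known uapi bits */
#define IOCB_NOWAIT         (1u << 7)            /* kiocb->ki_flags bit */

/*
 * Hypothetical helper: reject unknown userspace bits, then translate
 * the known ones into kernel-internal kiocb flags.
 */
static int validate_rw_flags(unsigned int aio_rw_flags, unsigned int *ki_flags)
{
	if (aio_rw_flags & ~IOCB_RW_FLAG_MASK)
		return -EINVAL;		/* unknown bit: refuse up front */
	if (aio_rw_flags & IOCB_RW_FLAG_NOWAIT)
		*ki_flags |= IOCB_NOWAIT;
	return 0;
}
```

Further down the stack the same bit would be translated again, into IOMAP_NOWAIT or BIO_NOWAIT, as described above.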

This feature is provided for asynchronous direct I/O only. I have
tested it against xfs, ext4, and btrfs, and I intend to add more filesystems.
The same applies to QUEUE_FLAG_NOWAIT, which is currently set only for sd and
virtio devices. This is primarily to keep out md/dm devices, which may wait in
places such as recovery/sync/suspend. In the future, I intend to add support
for these devices as well. Applications will have to check for support
by sending an async direct write: any error other than -EAGAIN
means the feature is not supported.
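
The probing rule above reduces to a small classification. nowait_supported() is a hypothetical helper that interprets the result of one probe write; note that under this rule an unrelated failure such as -EIO would also be read as "unsupported".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: interpret the result of a probe async direct
 * write issued with the NOWAIT flag. Success (bytes written) or
 * -EAGAIN means the flag was understood; any other error is taken to
 * mean NOWAIT is not supported for this file or device.
 */
static bool nowait_supported(int probe_ret)
{
	return probe_ret >= 0 || probe_ret == -EAGAIN;
}
```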

Changes since v1:
+ changed name from _NONBLOCKING to *_NOWAIT
+ filemap_range_has_page() call moved closer to (just before) the call to filemap_write_and_wait_range().
+ BIO_NOWAIT limited to get_request()
+ XFS fixes
- included reflink
- use of xfs_ilock_nowait() instead of a XFS_IOLOCK_NONBLOCKING flag
- Translate the flag through IOMAP_NOWAIT (iomap) to check for
block allocation for the file.
+ ext4 coding style

Changes since v2:
+ Using aio_reserved1 as aio_rw_flags instead of aio_flags
+ blk-mq support
+ XFS updated to match current kernel and reflink changes

Changes since v3:
+ Added FS_NOWAIT, which is set if the filesystem supports the NOWAIT feature.
+ Checks in generic_make_request() to make sure BIO_NOWAIT comes in
for async direct writes only.
+ Added QUEUE_FLAG_NOWAIT, which is set if the device supports BIO_NOWAIT.
It is deliberately left unset for devices such as dm/md for now, which keeps NOWAIT I/O away from them.

--
Goldwyn

2017-04-03 18:53:02

by Goldwyn Rodrigues

Subject: [PATCH 3/8] nowait aio: return if direct write will trigger writeback

From: Goldwyn Rodrigues <[email protected]>

Find out if the write will trigger a wait due to writeback. If so,
return -EAGAIN.

This introduces a new function, filemap_range_has_page(), which
returns true if the file's mapping has a page within the given
range.

Return -EINVAL for buffered AIO with IOCB_NOWAIT set: there are multiple
causes of delay, such as page locks, dirty throttling logic and page
loading from disk, which cannot be taken care of.
---
include/linux/fs.h | 2 ++
mm/filemap.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 802cfe2..4721136 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2515,6 +2515,8 @@ extern int filemap_fdatawait(struct address_space *);
extern void filemap_fdatawait_keep_errors(struct address_space *);
extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
loff_t lend);
+extern int filemap_range_has_page(struct address_space *, loff_t lstart,
+ loff_t lend);
extern int filemap_write_and_wait(struct address_space *mapping);
extern int filemap_write_and_wait_range(struct address_space *mapping,
loff_t lstart, loff_t lend);
diff --git a/mm/filemap.c b/mm/filemap.c
index e08f3b9..c020e23 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -376,6 +376,39 @@ int filemap_flush(struct address_space *mapping)
}
EXPORT_SYMBOL(filemap_flush);

+/**
+ * filemap_range_has_page - check if a page exists in range.
+ * @mapping: address space structure to wait for
+ * @start_byte: offset in bytes where the range starts
+ * @end_byte: offset in bytes where the range ends (inclusive)
+ *
+ * Find at least one page in the range supplied, usually used to check if
+ * direct writing in this range will trigger a writeback.
+ */
+int filemap_range_has_page(struct address_space *mapping,
+ loff_t start_byte, loff_t end_byte)
+{
+ pgoff_t index = start_byte >> PAGE_SHIFT;
+ pgoff_t end = end_byte >> PAGE_SHIFT;
+ struct pagevec pvec;
+ int ret;
+
+ if (end_byte < start_byte)
+ return 0;
+
+ if (mapping->nrpages == 0)
+ return 0;
+
+ pagevec_init(&pvec, 0);
+ ret = pagevec_lookup(&pvec, mapping, index, 1);
+ if (!ret)
+ return 0;
+ ret = (pvec.pages[0]->index <= end);
+ pagevec_release(&pvec);
+ return ret;
+}
+EXPORT_SYMBOL(filemap_range_has_page);
+
static int __filemap_fdatawait_range(struct address_space *mapping,
loff_t start_byte, loff_t end_byte)
{
@@ -2640,6 +2673,9 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from)

pos = iocb->ki_pos;

+ if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
+ return -EINVAL;
+
if (limit != RLIM_INFINITY) {
if (iocb->ki_pos >= limit) {
send_sig(SIGXFSZ, current, 0);
@@ -2709,9 +2745,17 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
write_len = iov_iter_count(from);
end = (pos + write_len - 1) >> PAGE_SHIFT;

- written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1);
- if (written)
- goto out;
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ /* If there are pages to writeback, return */
+ if (filemap_range_has_page(inode->i_mapping, pos,
+ pos + iov_iter_count(from)))
+ return -EAGAIN;
+ } else {
+ written = filemap_write_and_wait_range(mapping, pos,
+ pos + write_len - 1);
+ if (written)
+ goto out;
+ }

/*
* After a write we want buffered reads to be sure to go to disk to get
--
2.10.2
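
For illustration, the range check done by filemap_range_has_page() can be modeled in userspace with the page cache reduced to a sorted array of cached page indices. range_has_page() and the 4 KiB PAGE_SHIFT are assumptions of this sketch. Like pagevec_lookup(..., index, 1), it finds the first cached page at or after the start index and tests whether that page lies within the inclusive end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12	/* assumes 4 KiB pages */

/*
 * Userspace model of filemap_range_has_page(). "cached" holds the
 * indices of cached pages in ascending order; offsets are assumed
 * non-negative. Returns true if any cached page overlaps the byte
 * range [start_byte, end_byte] (end inclusive).
 */
static bool range_has_page(const unsigned long *cached, size_t nr,
			   long long start_byte, long long end_byte)
{
	unsigned long index = start_byte >> PAGE_SHIFT;
	unsigned long end = end_byte >> PAGE_SHIFT;

	if (end_byte < start_byte || nr == 0)
		return false;

	/* Find the first cached page at or after index. */
	for (size_t i = 0; i < nr; i++) {
		if (i > 0)
			assert(cached[i - 1] < cached[i]);
		if (cached[i] >= index)
			return cached[i] <= end;
	}
	return false;
}
```

Because end_byte is inclusive, the IOCB_NOWAIT branch of generic_file_direct_write(), which passes pos + iov_iter_count(from), checks one byte past the write. In this model, a 4096-byte write at offset 8192 with page 3 cached reports a page when called that way, but not with the exact inclusive end pos + len - 1, so the patched check is at worst slightly conservative.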

2017-04-03 18:53:03

by Goldwyn Rodrigues

Subject: [PATCH 4/8] nowait-aio: Introduce IOMAP_NOWAIT

From: Goldwyn Rodrigues <[email protected]>

IOCB_NOWAIT translates to IOMAP_NOWAIT for iomaps.
This is used by XFS in the XFS patch.
---
fs/iomap.c | 2 ++
include/linux/iomap.h | 1 +
2 files changed, 3 insertions(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index 141c3cd..d1c8175 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -885,6 +885,8 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
} else {
dio->flags |= IOMAP_DIO_WRITE;
flags |= IOMAP_WRITE;
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ flags |= IOMAP_NOWAIT;
}

if (mapping->nrpages) {
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 7291810..53f6af8 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -51,6 +51,7 @@ struct iomap {
#define IOMAP_REPORT (1 << 2) /* report extent status, e.g. FIEMAP */
#define IOMAP_FAULT (1 << 3) /* mapping for page fault */
#define IOMAP_DIRECT (1 << 4) /* direct I/O */
+#define IOMAP_NOWAIT (1 << 5) /* Don't wait for writeback */

struct iomap_ops {
/*
--
2.10.2

2017-04-03 18:53:04

by Goldwyn Rodrigues

Subject: [PATCH 5/8] nowait aio: return on congested block device

From: Goldwyn Rodrigues <[email protected]>

A new flag BIO_NOWAIT is introduced to identify bios
originating from an iocb with IOCB_NOWAIT set. This flag indicates
that the bio should fail immediately if a request cannot be
allocated, instead of retrying.

To facilitate this, QUEUE_FLAG_NOWAIT is set for devices
which support it. Currently it is set for virtio and sd only;
support for more devices will be added soon.

Signed-off-by: Goldwyn Rodrigues <[email protected]>
---
block/blk-core.c | 24 ++++++++++++++++++++++--
block/blk-mq-sched.c | 3 +++
block/blk-mq.c | 4 ++++
drivers/block/virtio_blk.c | 3 +++
drivers/scsi/sd.c | 3 +++
fs/direct-io.c | 11 +++++++++--
include/linux/bio.h | 6 ++++++
include/linux/blk_types.h | 1 +
include/linux/blkdev.h | 2 ++
9 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d772c22..95a9b18 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1232,6 +1232,11 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
if (!IS_ERR(rq))
return rq;

+ if (bio && bio_flagged(bio, BIO_NOWAIT)) {
+ blk_put_rl(rl);
+ return ERR_PTR(-EAGAIN);
+ }
+
if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
blk_put_rl(rl);
return rq;
@@ -1870,6 +1875,18 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}

+ if (bio_flagged(bio, BIO_NOWAIT)) {
+ if (!blk_queue_nowait(q)) {
+ err = -EOPNOTSUPP;
+ goto end_io;
+ }
+ if (!(bio->bi_opf & (REQ_SYNC | REQ_IDLE))) {
+ err = -EINVAL;
+ goto end_io;
+ }
+ }
+
+
part = bio->bi_bdev->bd_part;
if (should_fail_request(part, bio->bi_iter.bi_size) ||
should_fail_request(&part_to_disk(part)->part0,
@@ -2021,7 +2038,7 @@ blk_qc_t generic_make_request(struct bio *bio)
do {
struct request_queue *q = bdev_get_queue(bio->bi_bdev);

- if (likely(blk_queue_enter(q, false) == 0)) {
+ if (likely(blk_queue_enter(q, bio_flagged(bio, BIO_NOWAIT)) == 0)) {
struct bio_list lower, same;

/* Create a fresh bio_list for all subordinate requests */
@@ -2046,7 +2063,10 @@ blk_qc_t generic_make_request(struct bio *bio)
bio_list_merge(&bio_list_on_stack[0], &same);
bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
} else {
- bio_io_error(bio);
+ if (unlikely(!blk_queue_dying(q) && bio_flagged(bio, BIO_NOWAIT)))
+ bio_wouldblock_error(bio);
+ else
+ bio_io_error(bio);
}
bio = bio_list_pop(&bio_list_on_stack[0]);
} while (bio);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 09af8ff..40e78b5 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -119,6 +119,9 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
if (likely(!data->hctx))
data->hctx = blk_mq_map_queue(q, data->ctx->cpu);

+ if (likely(bio) && bio_flagged(bio, BIO_NOWAIT))
+ data->flags |= BLK_MQ_REQ_NOWAIT;
+
if (e) {
data->flags |= BLK_MQ_REQ_INTERNAL;

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6b6e7bc..2d90b12 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1511,6 +1511,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
if (unlikely(!rq)) {
__wbt_done(q->rq_wb, wb_acct);
+ if (bio && bio_flagged(bio, BIO_NOWAIT))
+ bio_wouldblock_error(bio);
return BLK_QC_T_NONE;
}

@@ -1635,6 +1637,8 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
if (unlikely(!rq)) {
__wbt_done(q->rq_wb, wb_acct);
+ if (bio && bio_flagged(bio, BIO_NOWAIT))
+ bio_wouldblock_error(bio);
return BLK_QC_T_NONE;
}

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 1d4c9f8..7481124 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -731,6 +731,9 @@ static int virtblk_probe(struct virtio_device *vdev)
/* No real sector limit. */
blk_queue_max_hw_sectors(q, -1U);

+ /* Request queue supports BIO_NOWAIT */
+ queue_flag_set_unlocked(QUEUE_FLAG_NOWAIT, q);
+
/* Host can optionally specify maximum segment size and number of
* segments. */
err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fcfeddc..9df85ee 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3177,6 +3177,9 @@ static int sd_probe(struct device *dev)
SD_MOD_TIMEOUT);
}

+ /* Support BIO_NOWAIT */
+ queue_flag_set_unlocked(QUEUE_FLAG_NOWAIT, sdp->request_queue);
+
device_initialize(&sdkp->dev);
sdkp->dev.parent = dev;
sdkp->dev.class = &sd_disk_class;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index a04ebea..f6835d3 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -386,6 +386,9 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
else
bio->bi_end_io = dio_bio_end_io;

+ if (dio->iocb->ki_flags & IOCB_NOWAIT)
+ bio_set_flag(bio, BIO_NOWAIT);
+
sdio->bio = bio;
sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
}
@@ -480,8 +483,12 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
unsigned i;
int err;

- if (bio->bi_error)
- dio->io_error = -EIO;
+ if (bio->bi_error) {
+ if (bio_flagged(bio, BIO_NOWAIT))
+ dio->io_error = -EAGAIN;
+ else
+ dio->io_error = -EIO;
+ }

if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
err = bio->bi_error;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8e52119..1a92707 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -425,6 +425,12 @@ static inline void bio_io_error(struct bio *bio)
bio_endio(bio);
}

+static inline void bio_wouldblock_error(struct bio *bio)
+{
+ bio->bi_error = -EAGAIN;
+ bio_endio(bio);
+}
+
struct request_queue;
extern int bio_phys_segments(struct request_queue *, struct bio *);

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d703acb..514c08e 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -102,6 +102,7 @@ struct bio {
#define BIO_REFFED 8 /* bio has elevated ->bi_cnt */
#define BIO_THROTTLED 9 /* This bio has already been subjected to
* throttling rules. Don't do it again. */
+#define BIO_NOWAIT 10 /* don't block over blk device congestion */

/*
* Flags starting here get preserved by bio_reset() - this includes
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5a7da60..ae38ab6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -611,6 +611,7 @@ struct request_queue {
#define QUEUE_FLAG_DAX 26 /* device supports DAX */
#define QUEUE_FLAG_STATS 27 /* track rq completion times */
#define QUEUE_FLAG_RESTART 28 /* queue needs restart at completion */
+#define QUEUE_FLAG_NOWAIT 29 /* queue supports BIO_NOWAIT */

#define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_STACKABLE) | \
@@ -701,6 +702,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
#define blk_queue_secure_erase(q) \
(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
#define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_nowait(q) test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)

#define blk_noretry_request(rq) \
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
--
2.10.2
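
A toy model of the two block-layer decisions this patch adds: generic_make_request_checks() rejecting BIO_NOWAIT on queues without QUEUE_FLAG_NOWAIT (-EOPNOTSUPP) or on bios lacking REQ_SYNC/REQ_IDLE (-EINVAL), and get_request() failing with -EAGAIN instead of sleeping when no request is free. All struct and function names here are hypothetical, and real request allocation is far more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel structures; all names are hypothetical. */
struct toy_queue {
	bool nowait_capable;	/* models QUEUE_FLAG_NOWAIT */
	int free_requests;	/* requests left before a caller would sleep */
};

struct toy_bio {
	bool nowait;		/* models BIO_NOWAIT */
	bool sync_or_idle;	/* models bi_opf & (REQ_SYNC | REQ_IDLE) */
};

/* Models the BIO_NOWAIT checks added to generic_make_request_checks(). */
static int toy_make_request_checks(const struct toy_queue *q,
				   const struct toy_bio *bio)
{
	if (!bio->nowait)
		return 0;
	if (!q->nowait_capable)
		return -EOPNOTSUPP;
	if (!bio->sync_or_idle)
		return -EINVAL;
	return 0;
}

/*
 * Models get_request(): 0 if a request was obtained, -EAGAIN for a
 * NOWAIT bio when none is free, 1 if an ordinary caller would sleep.
 */
static int toy_get_request(struct toy_queue *q, const struct toy_bio *bio)
{
	if (q->free_requests > 0) {
		q->free_requests--;
		return 0;
	}
	return bio->nowait ? -EAGAIN : 1;
}
```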

2017-04-03 18:54:03

by Goldwyn Rodrigues

Subject: [PATCH 6/8] nowait aio: ext4

From: Goldwyn Rodrigues <[email protected]>

Return -EAGAIN if any of the following checks fail for direct I/O:
+ i_rwsem can be locked without waiting
+ The write does not extend beyond end of file (which would trigger allocation)
+ Blocks are allocated at the write location
---
fs/ext4/file.c | 48 +++++++++++++++++++++++++++++++-----------------
fs/ext4/super.c | 6 +++---
2 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 8210c1f..e223b9f 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -127,27 +127,22 @@ ext4_unaligned_aio(struct inode *inode, struct iov_iter *from, loff_t pos)
return 0;
}

-/* Is IO overwriting allocated and initialized blocks? */
-static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len)
+/* Are IO blocks allocated */
+static bool ext4_blocks_mapped(struct inode *inode, loff_t pos, loff_t len,
+ struct ext4_map_blocks *map)
{
- struct ext4_map_blocks map;
unsigned int blkbits = inode->i_blkbits;
int err, blklen;

if (pos + len > i_size_read(inode))
return false;

- map.m_lblk = pos >> blkbits;
- map.m_len = EXT4_MAX_BLOCKS(len, pos, blkbits);
- blklen = map.m_len;
+ map->m_lblk = pos >> blkbits;
+ map->m_len = EXT4_MAX_BLOCKS(len, pos, blkbits);
+ blklen = map->m_len;

- err = ext4_map_blocks(NULL, inode, &map, 0);
- /*
- * 'err==len' means that all of the blocks have been preallocated,
- * regardless of whether they have been initialized or not. To exclude
- * unwritten extents, we need to check m_flags.
- */
- return err == blklen && (map.m_flags & EXT4_MAP_MAPPED);
+ err = ext4_map_blocks(NULL, inode, map, 0);
+ return err == blklen;
}

static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
@@ -204,6 +199,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct inode *inode = file_inode(iocb->ki_filp);
int o_direct = iocb->ki_flags & IOCB_DIRECT;
+ int nowait = iocb->ki_flags & IOCB_NOWAIT;
int unaligned_aio = 0;
int overwrite = 0;
ssize_t ret;
@@ -216,7 +212,13 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
return ext4_dax_write_iter(iocb, from);
#endif

- inode_lock(inode);
+ if (o_direct && nowait) {
+ if (!inode_trylock(inode))
+ return -EAGAIN;
+ } else {
+ inode_lock(inode);
+ }
+
ret = ext4_write_checks(iocb, from);
if (ret <= 0)
goto out;
@@ -235,9 +237,21 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)

iocb->private = &overwrite;
/* Check whether we do a DIO overwrite or not */
- if (o_direct && ext4_should_dioread_nolock(inode) && !unaligned_aio &&
- ext4_overwrite_io(inode, iocb->ki_pos, iov_iter_count(from)))
- overwrite = 1;
+ if (o_direct && !unaligned_aio) {
+ struct ext4_map_blocks map;
+ if (ext4_blocks_mapped(inode, iocb->ki_pos,
+ iov_iter_count(from), &map)) {
+ /* To exclude unwritten extents, we need to check
+ * m_flags.
+ */
+ if (ext4_should_dioread_nolock(inode) &&
+ (map.m_flags & EXT4_MAP_MAPPED))
+ overwrite = 1;
+ } else if (iocb->ki_flags & IOCB_NOWAIT) {
+ ret = -EAGAIN;
+ goto out;
+ }
+ }

ret = __generic_file_write_iter(iocb, from);
inode_unlock(inode);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a9448db..e1d424a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -117,7 +117,7 @@ static struct file_system_type ext2_fs_type = {
.name = "ext2",
.mount = ext4_mount,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT,
};
MODULE_ALIAS_FS("ext2");
MODULE_ALIAS("ext2");
@@ -132,7 +132,7 @@ static struct file_system_type ext3_fs_type = {
.name = "ext3",
.mount = ext4_mount,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT,
};
MODULE_ALIAS_FS("ext3");
MODULE_ALIAS("ext3");
@@ -5623,7 +5623,7 @@ static struct file_system_type ext4_fs_type = {
.name = "ext4",
.mount = ext4_mount,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT,
};
MODULE_ALIAS_FS("ext4");

--
2.10.2
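
The decision logic of the patched ext4_file_write_iter() for an aligned O_DIRECT write can be summarized as a pure function. struct toy_ext4_write and toy_ext4_dio_decision() are hypothetical; the boolean fields stand in for inode_trylock(), the i_size check and the ext4_map_blocks() results, and the blocking inode_lock() path is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of the state the patched ext4 code inspects. */
struct toy_ext4_write {
	bool nowait;		/* IOCB_NOWAIT */
	bool lock_available;	/* inode_trylock() would succeed */
	bool beyond_eof;	/* pos + len > i_size */
	bool all_blocks_mapped;	/* ext4_map_blocks() covered the range */
	bool extent_written;	/* EXT4_MAP_MAPPED, i.e. not unwritten */
	bool dioread_nolock;	/* ext4_should_dioread_nolock() */
};

/*
 * Models the aligned O_DIRECT path of ext4_file_write_iter() in this
 * patch: -EAGAIN where the patch bails, otherwise 1 for an "overwrite"
 * DIO and 0 for a regular DIO.
 */
static int toy_ext4_dio_decision(const struct toy_ext4_write *w)
{
	bool mapped;

	if (w->nowait && !w->lock_available)
		return -EAGAIN;

	/* ext4_blocks_mapped(): writes past EOF never count as mapped. */
	mapped = !w->beyond_eof && w->all_blocks_mapped;
	if (!mapped)
		return w->nowait ? -EAGAIN : 0;

	return (w->dioread_nolock && w->extent_written) ? 1 : 0;
}
```

Note the third case in the usage: a NOWAIT write into allocated but unwritten extents is allowed to proceed, which is the behavior Jan Kara questions in his review of this patch.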

2017-04-03 18:53:06

by Goldwyn Rodrigues

Subject: [PATCH 7/8] nowait aio: xfs

From: Goldwyn Rodrigues <[email protected]>

If IOCB_NOWAIT is set, bail out if i_rwsem cannot be locked
immediately.

If IOMAP_NOWAIT is set, return -EAGAIN in xfs_file_iomap_begin
if the write needs allocation, whether due to file extension, writing
to a hole, or CoW. Also return -EAGAIN if an unaligned write would
have to wait for other DIOs to finish.
---
fs/xfs/xfs_file.c | 15 +++++++++++----
fs/xfs/xfs_iomap.c | 13 +++++++++++++
fs/xfs/xfs_super.c | 2 +-
3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 35703a8..08a5eef 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -541,8 +541,11 @@ xfs_file_dio_aio_write(
iolock = XFS_IOLOCK_SHARED;
}

- xfs_ilock(ip, iolock);
-
+ if (!xfs_ilock_nowait(ip, iolock)) {
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ return -EAGAIN;
+ xfs_ilock(ip, iolock);
+ }
ret = xfs_file_aio_write_checks(iocb, from, &iolock);
if (ret)
goto out;
@@ -553,9 +556,13 @@ xfs_file_dio_aio_write(
* otherwise demote the lock if we had to take the exclusive lock
* for other reasons in xfs_file_aio_write_checks.
*/
- if (unaligned_io)
+ if (unaligned_io) {
+ /* If we are going to wait for other DIO to finish, bail */
+ if ((iocb->ki_flags & IOCB_NOWAIT) &&
+ atomic_read(&inode->i_dio_count))
+ return -EAGAIN;
inode_dio_wait(inode);
- else if (iolock == XFS_IOLOCK_EXCL) {
+ } else if (iolock == XFS_IOLOCK_EXCL) {
xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
iolock = XFS_IOLOCK_SHARED;
}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 288ee5b..6843725 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1015,6 +1015,11 @@ xfs_file_iomap_begin(

if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
if (flags & IOMAP_DIRECT) {
+ /* A reflinked inode will result in CoW alloc */
+ if (flags & IOMAP_NOWAIT) {
+ error = -EAGAIN;
+ goto out_unlock;
+ }
/* may drop and re-acquire the ilock */
error = xfs_reflink_allocate_cow(ip, &imap, &shared,
&lockmode);
@@ -1032,6 +1037,14 @@ xfs_file_iomap_begin(

if ((flags & IOMAP_WRITE) && imap_needs_alloc(inode, &imap, nimaps)) {
/*
+ * If nowait is set bail since we are going to make
+ * allocations.
+ */
+ if (flags & IOMAP_NOWAIT) {
+ error = -EAGAIN;
+ goto out_unlock;
+ }
+ /*
* We cap the maximum length we map here to MAX_WRITEBACK_PAGES
* pages to keep the chunks of work done where somewhat symmetric
* with the work writeback does. This is a completely arbitrary
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 685c042..070a30e 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1749,7 +1749,7 @@ static struct file_system_type xfs_fs_type = {
.name = "xfs",
.mount = xfs_fs_mount,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT,
};
MODULE_ALIAS_FS("xfs");

--
2.10.2
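
A sketch of the two NOWAIT bail-outs added to xfs_file_iomap_begin(). IOMAP_DIRECT and IOMAP_NOWAIT use the values from the iomap.h hunk in patch 4; the IOMAP_WRITE and IOMAP_ZERO values, and the struct toy_xfs_inode fields standing in for xfs_is_reflink_inode() and imap_needs_alloc(), are assumptions of this model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define IOMAP_WRITE	(1u << 0)	/* assumed value */
#define IOMAP_ZERO	(1u << 1)	/* assumed value */
#define IOMAP_DIRECT	(1u << 4)	/* from the patch 4 hunk */
#define IOMAP_NOWAIT	(1u << 5)	/* from the patch 4 hunk */

/* Hypothetical summary of the inode state xfs_file_iomap_begin() checks. */
struct toy_xfs_inode {
	bool reflinked;		/* xfs_is_reflink_inode() */
	bool needs_alloc;	/* imap_needs_alloc(): hole or EOF extension */
};

/* Models the NOWAIT bail-outs; 0 means the mapping would proceed. */
static int toy_xfs_iomap_begin(const struct toy_xfs_inode *ip,
			       unsigned int flags)
{
	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && ip->reflinked &&
	    (flags & IOMAP_DIRECT) && (flags & IOMAP_NOWAIT))
		return -EAGAIN;		/* CoW allocation could block */
	if ((flags & IOMAP_WRITE) && ip->needs_alloc &&
	    (flags & IOMAP_NOWAIT))
		return -EAGAIN;		/* block allocation could block */
	return 0;
}
```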

2017-04-03 18:53:43

by Goldwyn Rodrigues

Subject: [PATCH 2/8] nowait aio: Return if cannot get hold of i_rwsem

From: Goldwyn Rodrigues <[email protected]>

Failing to lock i_rwsem means another thread is performing I/O on
the inode. So, when IOCB_NOWAIT is set, bail out with -EAGAIN instead
of sleeping.

Reviewed-by: Christoph Hellwig <[email protected]>
---
mm/filemap.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 1694623..e08f3b9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2982,7 +2982,12 @@ ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
ssize_t ret;

- inode_lock(inode);
+ if (!inode_trylock(inode)) {
+ /* Don't sleep on inode rwsem */
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ return -EAGAIN;
+ inode_lock(inode);
+ }
ret = generic_write_checks(iocb, from);
if (ret > 0)
ret = __generic_file_write_iter(iocb, from);
--
2.10.2
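
The trylock-then-bail pattern can be sketched in userspace with C11 atomics. struct toy_inode and its helpers are hypothetical stand-ins for i_rwsem taken in exclusive mode; where the kernel would sleep in inode_lock(), this model spins.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode's i_rwsem, exclusive mode only. */
struct toy_inode {
	atomic_bool locked;
};

static bool toy_inode_trylock(struct toy_inode *inode)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&inode->locked, &expected, true);
}

static void toy_inode_lock(struct toy_inode *inode)
{
	while (!toy_inode_trylock(inode))
		;	/* spin; the kernel would sleep here instead */
}

static void toy_inode_unlock(struct toy_inode *inode)
{
	atomic_store(&inode->locked, false);
}

/* Mirrors the locking prologue of generic_file_write_iter() in this patch. */
static int toy_write_lock(struct toy_inode *inode, bool nowait)
{
	if (!toy_inode_trylock(inode)) {
		if (nowait)
			return -EAGAIN;	/* don't sleep on the lock */
		toy_inode_lock(inode);
	}
	return 0;
}
```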

2017-04-04 06:49:34

by Christoph Hellwig

Subject: Re: [PATCH 5/8] nowait aio: return on congested block device

Please make this a REQ_* flag so that it can be passed in the bio,
the request and as an argument to the get_request functions instead
of testing for a bio.

2017-04-04 06:52:11

by Christoph Hellwig

Subject: Re: [PATCH 7/8] nowait aio: xfs

> + if (unaligned_io) {
> + /* If we are going to wait for other DIO to finish, bail */
> + if ((iocb->ki_flags & IOCB_NOWAIT) &&
> + atomic_read(&inode->i_dio_count))
> + return -EAGAIN;
> inode_dio_wait(inode);

This checks i_dio_count twice in the nowait case, I think it should be:

if (iocb->ki_flags & IOCB_NOWAIT) {
if (atomic_read(&inode->i_dio_count))
return -EAGAIN;
} else {
inode_dio_wait(inode);
}

> if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
> if (flags & IOMAP_DIRECT) {
> + /* A reflinked inode will result in CoW alloc */
> + if (flags & IOMAP_NOWAIT) {
> + error = -EAGAIN;
> + goto out_unlock;
> + }

This is a bit pessimistic - just because the inode has any shared
extents we could still write into unshared ones. For now I think this
pessimistic check is fine, but the comment should be corrected.
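
Christoph's restructuring, modeled with a hypothetical struct toy_inode whose dio_count stands in for i_dio_count. toy_inode_dio_wait() only records that a wait was requested rather than sleeping. With this shape the NOWAIT path never enters the wait helper, so i_dio_count is read only once on that path.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode fields involved. */
struct toy_inode {
	atomic_int dio_count;	/* models inode->i_dio_count */
	int waits;		/* how often we would have slept */
};

/* Stand-in for inode_dio_wait(): records the wait instead of sleeping. */
static void toy_inode_dio_wait(struct toy_inode *inode)
{
	inode->waits++;
}

/* Unaligned-DIO preparation, restructured as suggested in the review. */
static int toy_unaligned_dio_prep(struct toy_inode *inode, bool nowait)
{
	if (nowait) {
		if (atomic_load(&inode->dio_count))
			return -EAGAIN;
	} else {
		toy_inode_dio_wait(inode);
	}
	return 0;
}
```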

2017-04-04 07:58:53

by Jan Kara

Subject: Re: [PATCH 6/8] nowait aio: ext4

On Mon 03-04-17 13:53:05, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <[email protected]>
>
> Return EAGAIN if any of the following checks fail for direct I/O:
> + i_rwsem is lockable
> + Writing beyond end of file (will trigger allocation)
> + Blocks are not allocated at the write location

Patches seem to be missing your Signed-off-by tag...

> @@ -235,9 +237,21 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>
> iocb->private = &overwrite;
> /* Check whether we do a DIO overwrite or not */
> - if (o_direct && ext4_should_dioread_nolock(inode) && !unaligned_aio &&
> - ext4_overwrite_io(inode, iocb->ki_pos, iov_iter_count(from)))
> - overwrite = 1;
> + if (o_direct && !unaligned_aio) {
> + struct ext4_map_blocks map;
> + if (ext4_blocks_mapped(inode, iocb->ki_pos,
> + iov_iter_count(from), &map)) {
> + /* To exclude unwritten extents, we need to check
> + * m_flags.
> + */
> + if (ext4_should_dioread_nolock(inode) &&
> + (map.m_flags & EXT4_MAP_MAPPED))
> + overwrite = 1;
> + } else if (iocb->ki_flags & IOCB_NOWAIT) {
> + ret = -EAGAIN;
> + goto out;
> + }
> + }

Actually, overwriting unwritten extents is relatively complex in ext4 as
well. In particular we need to start a transaction and split out the
written part of the extent. So I don't think we can easily support this
without blocking. As a result I'd keep the condition for IOCB_NOWAIT the
same as for overwrite IO.

> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -117,7 +117,7 @@ static struct file_system_type ext2_fs_type = {
> .name = "ext2",
> .mount = ext4_mount,
> .kill_sb = kill_block_super,
> - .fs_flags = FS_REQUIRES_DEV,
> + .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT,

FS_NOWAIT looks a bit too generic given these are filesystem feature flags.
Can we call it FS_NOWAIT_IO?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
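
One way to read Jan's suggestion, sketched with hypothetical names: a NOWAIT write must pass the same allocated-and-initialized test as an overwrite (the role ext4_overwrite_io() plays in the original code), because converting unwritten extents needs a transaction. Whether dioread_nolock should also gate NOWAIT is left open here; in this sketch it only controls the overwrite fast path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of the state ext4 inspects for a DIO write. */
struct toy_ext4_write {
	bool nowait;		/* IOCB_NOWAIT */
	bool all_blocks_mapped;	/* in-size range fully allocated */
	bool extent_written;	/* EXT4_MAP_MAPPED, not unwritten */
	bool dioread_nolock;	/* ext4_should_dioread_nolock() */
};

/* ext4_overwrite_io()-style test: blocks allocated *and* initialized. */
static bool toy_overwrite_io(const struct toy_ext4_write *w)
{
	return w->all_blocks_mapped && w->extent_written;
}

/* Returns -EAGAIN to bail, 1 for an overwrite DIO, 0 for a regular DIO. */
static int toy_ext4_dio_decision(const struct toy_ext4_write *w)
{
	if (w->nowait && !toy_overwrite_io(w))
		return -EAGAIN;	/* allocation or unwritten conversion needed */
	return (w->dioread_nolock && toy_overwrite_io(w)) ? 1 : 0;
}
```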

2017-04-04 08:41:22

by Christoph Hellwig

Subject: Re: [PATCH 6/8] nowait aio: ext4

On Tue, Apr 04, 2017 at 09:58:53AM +0200, Jan Kara wrote:
> FS_NOWAIT looks a bit too generic given these are filesystem feature flags.
> Can we call it FS_NOWAIT_IO?

It's way too generic, as it's a feature of the particular file_operations
instance. But once we switch to using RWF_* we can just use the existing
per-op feature checks for those, and the per-fs flag should just go away.

2017-04-04 18:41:09

by Goldwyn Rodrigues

Subject: Re: [PATCH 6/8] nowait aio: ext4

On 04/04/2017 03:41 AM, Christoph Hellwig wrote:
> On Tue, Apr 04, 2017 at 09:58:53AM +0200, Jan Kara wrote:
>> FS_NOWAIT looks a bit too generic given these are filesystem feature flags.
>> Can we call it FS_NOWAIT_IO?
>
> It's way too generic, as it's a feature of the particular file_operations
> instance. But once we switch to using RWF_* we can just use the existing
> per-op feature checks for those, and the per-fs flag should just go away.
>

I am working on incorporating RWF_* flags. However, I am not sure how
RWF_* flags would get rid of FS_NOWAIT/FS_NOWAIT_IO. Since most of the
"blocking" information lies with the filesystem, FS_NOWAIT is a per-filesystem
flag to block out (with -EOPNOTSUPP) the filesystems which do not support it.

--
Goldwyn

2017-04-06 22:54:15

by Darrick J. Wong

Subject: Re: [PATCH 7/8] nowait aio: xfs

On Mon, Apr 03, 2017 at 11:52:11PM -0700, Christoph Hellwig wrote:
> > + if (unaligned_io) {
> > + /* If we are going to wait for other DIO to finish, bail */
> > + if ((iocb->ki_flags & IOCB_NOWAIT) &&
> > + atomic_read(&inode->i_dio_count))
> > + return -EAGAIN;
> > inode_dio_wait(inode);
>
> This checks i_dio_count twice in the nowait case, I think it should be:
>
> if (iocb->ki_flags & IOCB_NOWAIT) {
> if (atomic_read(&inode->i_dio_count))
> return -EAGAIN;
> } else {
> inode_dio_wait(inode);
> }
>
> > if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
> > if (flags & IOMAP_DIRECT) {
> > + /* A reflinked inode will result in CoW alloc */
> > + if (flags & IOMAP_NOWAIT) {
> > + error = -EAGAIN;
> > + goto out_unlock;
> > + }
>
> This is a bit pessimistic - just because the inode has any shared
> extents we could still write into unshared ones. For now I think this
> pessimistic check is fine, but the comment should be corrected.

Consider what happens in both _reflink_{allocate,reserve}_cow. If there
is already an existing reservation in the CoW fork then we'll have to
CoW and therefore can't satisfy the NOWAIT flag. If there isn't already
anything in the CoW fork, then we have to see if there are shared blocks
by calling _reflink_trim_around_shared. That performs a refcountbt
lookup, which involves locking the AGF, so we also can't satisfy NOWAIT.

IOWs, I think this hunk has to move outside the IOMAP_DIRECT check to
cover both write-to-reflinked-file cases.

--D

2017-04-07 11:34:28

by Goldwyn Rodrigues

Subject: Re: [PATCH 7/8] nowait aio: xfs

On 04/06/2017 05:54 PM, Darrick J. Wong wrote:
> On Mon, Apr 03, 2017 at 11:52:11PM -0700, Christoph Hellwig wrote:
>>> + if (unaligned_io) {
>>> + /* If we are going to wait for other DIO to finish, bail */
>>> + if ((iocb->ki_flags & IOCB_NOWAIT) &&
>>> + atomic_read(&inode->i_dio_count))
>>> + return -EAGAIN;
>>> inode_dio_wait(inode);
>>
>> This checks i_dio_count twice in the nowait case, I think it should be:
>>
>> if (iocb->ki_flags & IOCB_NOWAIT) {
>> if (atomic_read(&inode->i_dio_count))
>> return -EAGAIN;
>> } else {
>> inode_dio_wait(inode);
>> }
>>
>>> if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
>>> if (flags & IOMAP_DIRECT) {
>>> + /* A reflinked inode will result in CoW alloc */
>>> + if (flags & IOMAP_NOWAIT) {
>>> + error = -EAGAIN;
>>> + goto out_unlock;
>>> + }
>>
>> This is a bit pessimistic - just because the inode has any shared
>> extents we could still write into unshared ones. For now I think this
>> pessimistic check is fine, but the comment should be corrected.
>
> Consider what happens in both _reflink_{allocate,reserve}_cow. If there
> is already an existing reservation in the CoW fork then we'll have to
> CoW and therefore can't satisfy the NOWAIT flag. If there isn't already
> anything in the CoW fork, then we have to see if there are shared blocks
> by calling _reflink_trim_around_shared. That performs a refcountbt
> lookup, which involves locking the AGF, so we also can't satisfy NOWAIT.
>
> IOWs, I think this hunk has to move outside the IOMAP_DIRECT check to
> cover both write-to-reflinked-file cases.
>

IOMAP_NOWAIT is set only with IOMAP_DIRECT since the nowait feature is
for direct-IO only. This is checked early on, when we are checking for
user-passed flags, and if not, -EINVAL is returned.

--
Goldwyn

2017-04-07 15:08:51

by Darrick J. Wong

Subject: Re: [PATCH 7/8] nowait aio: xfs

On Fri, Apr 07, 2017 at 06:34:28AM -0500, Goldwyn Rodrigues wrote:
>
>
> On 04/06/2017 05:54 PM, Darrick J. Wong wrote:
> > On Mon, Apr 03, 2017 at 11:52:11PM -0700, Christoph Hellwig wrote:
> >>> + if (unaligned_io) {
> >>> + /* If we are going to wait for other DIO to finish, bail */
> >>> + if ((iocb->ki_flags & IOCB_NOWAIT) &&
> >>> + atomic_read(&inode->i_dio_count))
> >>> + return -EAGAIN;
> >>> inode_dio_wait(inode);
> >>
> >> This checks i_dio_count twice in the nowait case, I think it should be:
> >>
> >> if (iocb->ki_flags & IOCB_NOWAIT) {
> >> if (atomic_read(&inode->i_dio_count))
> >> return -EAGAIN;
> >> } else {
> >> inode_dio_wait(inode);
> >> }
> >>
> >>> if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
> >>> if (flags & IOMAP_DIRECT) {
> >>> + /* A reflinked inode will result in CoW alloc */
> >>> + if (flags & IOMAP_NOWAIT) {
> >>> + error = -EAGAIN;
> >>> + goto out_unlock;
> >>> + }
> >>
> >> This is a bit pessimistic - just because the inode has any shared
> >> extents we could still write into unshared ones. For now I think this
> >> pessimistic check is fine, but the comment should be corrected.
> >
> > Consider what happens in both _reflink_{allocate,reserve}_cow. If there
> > is already an existing reservation in the CoW fork then we'll have to
> > CoW and therefore can't satisfy the NOWAIT flag. If there isn't already
> > anything in the CoW fork, then we have to see if there are shared blocks
> > by calling _reflink_trim_around_shared. That performs a refcountbt
> > lookup, which involves locking the AGF, so we also can't satisfy NOWAIT.
> >
> > IOWs, I think this hunk has to move outside the IOMAP_DIRECT check to
> > cover both write-to-reflinked-file cases.
> >
>
> IOMAP_NOWAIT is set only with IOMAP_DIRECT since the nowait feature is
> for direct-IO only. This is checked early on, when we are checking for

Ah, ok. Disregard what I said about moving it then.

--D

> user-passed flags, and if not, -EINVAL is returned.
>
>
> --
> Goldwyn

2017-04-10 07:45:39

by Christoph Hellwig

Subject: Re: [PATCH 6/8] nowait aio: ext4

On Tue, Apr 04, 2017 at 01:41:09PM -0500, Goldwyn Rodrigues wrote:
> I am working on incorporating RWF_* flags. However, I am not sure how
> RWF_* flags would get rid of FS_NOWAIT/FS_NOWAIT_IO. Since most of the
> "blocking" information lies with the filesystem, FS_NOWAIT is a per-filesystem
> flag to block out (with -EOPNOTSUPP) the filesystems which do not support it.

You need to check the flag in the actual read/write methods as the
support for features on Linux is not a per-file_system_type thing.

2017-04-10 12:37:50

by Jan Kara

Subject: Re: [PATCH 6/8] nowait aio: ext4

On Mon 10-04-17 00:45:39, Christoph Hellwig wrote:
> On Tue, Apr 04, 2017 at 01:41:09PM -0500, Goldwyn Rodrigues wrote:
> > I am working on incorporating RWF_* flags. However, I am not sure how
> > RWF_* flags would get rid of FS_NOWAIT/FS_NOWAIT_IO. Since most of the
> > "blocking" information lies with the filesystem, FS_NOWAIT is a per-filesystem
> > flag to block out (with -EOPNOTSUPP) the filesystems which do not support it.
>
> You need to check the flag in the actual read/write methods as the
> support for features on Linux is not a per-file_system_type thing.

I don't understand here. Do you want that all filesystems support NOWAIT
direct IO? IMO that's not realistic and also not necessary. In reality
different filesystems support different sets of operations, and we have
precedent for that in various fallocate operations, rename exchange, and
O_TMPFILE support...

Honza
--
Jan Kara <jack-IBi9RG/[email protected]>
SUSE Labs, CR

2017-04-10 14:39:43

by Christoph Hellwig

Subject: Re: [PATCH 6/8] nowait aio: ext4

On Mon, Apr 10, 2017 at 02:37:50PM +0200, Jan Kara wrote:
> I don't understand here. Do you want that all filesystems support NOWAIT
> direct IO?

No. Per-file_system_type is way too coarse-grained. All feature flags
need to be per-file_operations, at least for cases like ext4 with or
without extents (or journal), XFS v4 vs v5, different NFS versions, etc.

For RWF_* flags, each file operation simply declares whether a feature is
supported by rejecting the flags it does not know. FIEMAP does the same, as
do a few other interfaces.

2017-04-10 15:13:59

by Jan Kara

Subject: Re: [PATCH 6/8] nowait aio: ext4

On Mon 10-04-17 07:39:43, Christoph Hellwig wrote:
> On Mon, Apr 10, 2017 at 02:37:50PM +0200, Jan Kara wrote:
> > I don't understand here. Do you want that all filesystems support NOWAIT
> > direct IO?
>
> No. Per-file_system_type is way too coarse-grained. All feature flags
> need to be per-file_operations, at least for cases like ext4 with or
> without extents (or journal), XFS v4 vs v5, different NFS versions, etc.

Ah, I see your point now. Thanks for your patience. I think we could make this
work by making generic_file_write/read_iter() refuse NOWAIT IO with
EOPNOTSUPP and then only modify those few filesystems that implement their
own iter helpers and will not initially support NOWAIT IO. Sounds easy
enough.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
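
A sketch of where the thread converges, with hypothetical types: support is decided by whichever write_iter method a given file_operations instance installs (so the same filesystem can differ per file, e.g. ext4 with or without extents), and a generic helper refuses NOWAIT with -EOPNOTSUPP as Jan proposes. The toy_* names are not real kernel symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical models of a kiocb and a file_operations instance. */
struct toy_kiocb {
	bool nowait;	/* IOCB_NOWAIT */
};

struct toy_file_operations {
	int (*write_iter)(struct toy_kiocb *iocb);
};

/* Generic helper: refuses NOWAIT, as proposed for generic_file_write_iter(). */
static int toy_generic_file_write_iter(struct toy_kiocb *iocb)
{
	if (iocb->nowait)
		return -EOPNOTSUPP;
	return 0;	/* would perform the write */
}

/* A filesystem-specific helper that has been taught NOWAIT. */
static int toy_nowait_fs_write_iter(struct toy_kiocb *iocb)
{
	(void)iocb;
	return 0;	/* could still return -EAGAIN internally; not modeled */
}

/* Two instances, e.g. files with and without extents on one filesystem. */
static const struct toy_file_operations toy_fops_nowait = {
	.write_iter = toy_nowait_fs_write_iter,
};

static const struct toy_file_operations toy_fops_legacy = {
	.write_iter = toy_generic_file_write_iter,
};
```

With this layout, only filesystems that implement their own write helper and choose not to support NOWAIT need any change, which matches the "sounds easy enough" conclusion above.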