2019-12-19 13:53:57

by Ritesh Harjani

[permalink] [raw]
Subject: Re: [PATCH] ext4: Optimize ext4 DIO overwrites



On 12/18/19 11:14 PM, Jan Kara wrote:
> Currently we start transaction for mapping every extent for writing
> using direct IO. This is unnecessary when we know we are overwriting
> already allocated blocks and the overhead of starting a transaction can
> be significant especially for multithreaded workloads doing small writes.
> Use iomap operations that avoid starting a transaction for direct IO
> overwrites.
>
> This improves throughput of 4k random writes - fio jobfile:
> [global]
> rw=randrw
> norandommap=1
> invalidate=0
> bs=4k
> numjobs=16
> time_based=1
> ramp_time=30
> runtime=120
> group_reporting=1
> ioengine=psync
> direct=1
> size=16G
> filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31
> file_service_type=random
> nrfiles=32
>
> from 3018MB/s to 4059MB/s in my test VM running test against simulated
> pmem device (note that before iomap conversion, this workload was able
> to achieve 3708MB/s because old direct IO path avoided transaction start
> for overwrites as well). For dax, the win is even larger improving
> throughput from 3042MB/s to 4311MB/s.

However for dax via ext4_dax_write_iter() path, we still need a way to
detect if it's overwrite and that path can be optimized too right?
I see, that this path could use both `shared inode locking` and
`no journal transaction` optimizations in case of overwrites. Correct?


>
> Reported-by: Dan Williams <[email protected]>
> Signed-off-by: Jan Kara <[email protected]>

This was one of the next AI I too wanted to do. I guess since everyone
loves performance improvements. :)

No problem with current patch. Looks good. Gave it a run too on my
system.

Reviewed-by: Ritesh Harjani <[email protected]>


However depending on which patch lands first one may need a
re-basing. Will conflict with this-
https://marc.info/?l=linux-ext4&m=157613016931238&w=2



> ---
> fs/ext4/ext4.h | 1 +
> fs/ext4/file.c | 4 +++-
> fs/ext4/inode.c | 21 +++++++++++++++++++++
> 3 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index f8578caba40d..e31fc5749a19 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -3390,6 +3390,7 @@ static inline void ext4_clear_io_unwritten_flag(ext4_io_end_t *io_end)
> }
>
> extern const struct iomap_ops ext4_iomap_ops;
> +extern const struct iomap_ops ext4_iomap_overwrite_ops;
> extern const struct iomap_ops ext4_iomap_report_ops;
>
> static inline int ext4_buffer_uptodate(struct buffer_head *bh)
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 6a7293a5cda2..f8e4af72d64d 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -370,6 +370,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> loff_t offset;
> handle_t *handle;
> struct inode *inode = file_inode(iocb->ki_filp);
> + const struct iomap_ops *iomap_ops = &ext4_iomap_ops;
> bool extend = false, overwrite = false, unaligned_aio = false;
>
> if (iocb->ki_flags & IOCB_NOWAIT) {
> @@ -415,6 +416,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> if (!unaligned_aio && ext4_overwrite_io(inode, offset, count) &&
> ext4_should_dioread_nolock(inode)) {
> overwrite = true;
> + iomap_ops = &ext4_iomap_overwrite_ops;
> downgrade_write(&inode->i_rwsem);
> }
>
> @@ -435,7 +437,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> ext4_journal_stop(handle);
> }
>
> - ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, &ext4_dio_write_ops,
> + ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
> is_sync_kiocb(iocb) || unaligned_aio || extend);
>
> if (extend)
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 28f28de0c1b6..e1eb4493aacc 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3448,6 +3448,22 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> return 0;
> }
>
> +static int ext4_iomap_overwrite_begin(struct inode *inode, loff_t offset,
> + loff_t length, unsigned flags, struct iomap *iomap,
> + struct iomap *srcmap)
> +{
> + int ret;
> +
> + /*
> + * Even for writes we don't need to allocate blocks, so just pretend
> + * we are reading to save overhead of starting a transaction.
> + */
> + flags &= ~IOMAP_WRITE;
> + ret = ext4_iomap_begin(inode, offset, length, flags, iomap, srcmap);
> + WARN_ON_ONCE(iomap->type != IOMAP_MAPPED);
> + return ret;
> +}
> +
> static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length,
> ssize_t written, unsigned flags, struct iomap *iomap)
> {
> @@ -3469,6 +3485,11 @@ const struct iomap_ops ext4_iomap_ops = {
> .iomap_end = ext4_iomap_end,
> };
>
> +const struct iomap_ops ext4_iomap_overwrite_ops = {
> + .iomap_begin = ext4_iomap_overwrite_begin,
> + .iomap_end = ext4_iomap_end,
> +};
> +
> static bool ext4_iomap_is_delalloc(struct inode *inode,
> struct ext4_map_blocks *map)
> {
>


2019-12-19 19:29:55

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: Optimize ext4 DIO overwrites

On Thu 19-12-19 19:23:28, Ritesh Harjani wrote:
> On 12/18/19 11:14 PM, Jan Kara wrote:
> > Currently we start transaction for mapping every extent for writing
> > using direct IO. This is unnecessary when we know we are overwriting
> > already allocated blocks and the overhead of starting a transaction can
> > be significant especially for multithreaded workloads doing small writes.
> > Use iomap operations that avoid starting a transaction for direct IO
> > overwrites.
> >
> > This improves throughput of 4k random writes - fio jobfile:
> > [global]
> > rw=randrw
> > norandommap=1
> > invalidate=0
> > bs=4k
> > numjobs=16
> > time_based=1
> > ramp_time=30
> > runtime=120
> > group_reporting=1
> > ioengine=psync
> > direct=1
> > size=16G
> > filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31
> > file_service_type=random
> > nrfiles=32
> >
> > from 3018MB/s to 4059MB/s in my test VM running test against simulated
> > pmem device (note that before iomap conversion, this workload was able
> > to achieve 3708MB/s because old direct IO path avoided transaction start
> > for overwrites as well). For dax, the win is even larger improving
> > throughput from 3042MB/s to 4311MB/s.
>
> However for dax via ext4_dax_write_iter() path, we still need a way to
> detect if it's overwrite and that path can be optimized too right?
> I see, that this path could use both `shared inode locking` and
> `no journal transaction` optimizations in case of overwrites. Correct?

I don't think we can really afford the shared locking in
ext4_dax_write_iter() as POSIX requires overlapping writes to be
serialized. But we could still optimize-away the transaction starts.

> > Reported-by: Dan Williams <[email protected]>
> > Signed-off-by: Jan Kara <[email protected]>
>
> This was one of the next AI I too wanted to do. I guess since everyone
> loves performance improvements. :)
>
> No problem with current patch. Looks good. Gave it a run too on my
> system.
>
> Reviewed-by: Ritesh Harjani <[email protected]>

Thanks!

> However depending on which patch lands first one may need a
> re-basing. Will conflict with this-
> https://marc.info/?l=linux-ext4&m=157613016931238&w=2

Yes, but the conflict is minor and trivial to resolve.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2019-12-26 17:18:06

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: Optimize ext4 DIO overwrites

On Thu, Dec 19, 2019 at 08:28:23PM +0100, Jan Kara wrote:
> > However depending on which patch lands first one may need a
> > re-basing. Will conflict with this-
> > https://marc.info/?l=linux-ext4&m=157613016931238&w=2
>
> Yes, but the conflict is minor and trivial to resolve.
>

Is this the correct resolution?

--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -447,6 +447,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file_inode(iocb->ki_filp);
loff_t offset = iocb->ki_pos;
size_t count = iov_iter_count(from);
+ const struct iomap_ops *iomap_ops = &ext4_iomap_ops;
bool extend = false, unaligned_io = false;
bool ilock_shared = true;

@@ -526,7 +527,9 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
ext4_journal_stop(handle);
}

- ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, &ext4_dio_write_ops,
+ if (ilock_shared)
+ iomap_ops = &ext4_iomap_overwrite_ops;
+ ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
is_sync_kiocb(iocb) || unaligned_io || extend);

if (extend)

- Ted

2019-12-27 05:33:00

by Ritesh Harjani

[permalink] [raw]
Subject: Re: [PATCH] ext4: Optimize ext4 DIO overwrites



On 12/26/19 10:47 PM, Theodore Y. Ts'o wrote:
> On Thu, Dec 19, 2019 at 08:28:23PM +0100, Jan Kara wrote:
>>> However depending on which patch lands first one may need a
>>> re-basing. Will conflict with this-
>>> https://marc.info/?l=linux-ext4&m=157613016931238&w=2
>>
>> Yes, but the conflict is minor and trivial to resolve.
>>
>
> Is this the correct resolution?
>
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -447,6 +447,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> struct inode *inode = file_inode(iocb->ki_filp);
> loff_t offset = iocb->ki_pos;
> size_t count = iov_iter_count(from);
> + const struct iomap_ops *iomap_ops = &ext4_iomap_ops;
> bool extend = false, unaligned_io = false;
> bool ilock_shared = true;
>
> @@ -526,7 +527,9 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> ext4_journal_stop(handle);
> }
>
> - ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, &ext4_dio_write_ops,
> + if (ilock_shared)
> + iomap_ops = &ext4_iomap_overwrite_ops;
> + ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
> is_sync_kiocb(iocb) || unaligned_io || extend);
>
> if (extend)
>
>

Yes, this looks correct to me.

Thanks
-ritesh

2020-01-06 09:33:29

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: Optimize ext4 DIO overwrites

On Thu 26-12-19 12:17:31, Theodore Y. Ts'o wrote:
> On Thu, Dec 19, 2019 at 08:28:23PM +0100, Jan Kara wrote:
> > > However depending on which patch lands first one may need a
> > > re-basing. Will conflict with this-
> > > https://marc.info/?l=linux-ext4&m=157613016931238&w=2
> >
> > Yes, but the conflict is minor and trivial to resolve.
> >
>
> Is this the correct resolution?

Looks good to me as well.

Honza

>
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -447,6 +447,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> struct inode *inode = file_inode(iocb->ki_filp);
> loff_t offset = iocb->ki_pos;
> size_t count = iov_iter_count(from);
> + const struct iomap_ops *iomap_ops = &ext4_iomap_ops;
> bool extend = false, unaligned_io = false;
> bool ilock_shared = true;
>
> @@ -526,7 +527,9 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> ext4_journal_stop(handle);
> }
>
> - ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, &ext4_dio_write_ops,
> + if (ilock_shared)
> + iomap_ops = &ext4_iomap_overwrite_ops;
> + ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
> is_sync_kiocb(iocb) || unaligned_io || extend);
>
> if (extend)
>
> - Ted
>
--
Jan Kara <[email protected]>
SUSE Labs, CR