2022-05-28 19:25:41

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [PATCH] f2fs: add sysfs entry to avoid FUA

On 05/27, Eric Biggers wrote:
> [+Cc linux-block for FUA, and linux-xfs for iomap]
>
> On Fri, May 27, 2022 at 01:59:55PM -0700, Jaegeuk Kim wrote:
> > Some UFS storage gives slower performance on FUA than write+cache_flush.
> > Let's give a way to manage it.
> >
> > Signed-off-by: Jaegeuk Kim <[email protected]>
>
> Should the driver even be saying that it has FUA support in this case? If the
> driver didn't claim FUA support, that would also solve this problem.

I think there's still some benefit to use FUA such as small chunk writes
for checkpoint.

>
> > ---
> > Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++++++
> > fs/f2fs/data.c | 2 ++
> > fs/f2fs/f2fs.h | 1 +
> > fs/f2fs/sysfs.c | 2 ++
> > 4 files changed, 12 insertions(+)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
> > index 9b583dd0298b..cd96b09d7182 100644
> > --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> > +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> > @@ -434,6 +434,7 @@ Date: April 2020
> > Contact: "Daeho Jeong" <[email protected]>
> > Description: Give a way to change iostat_period time. 3secs by default.
> > The new iostat trace gives stats gap given the period.
> > +
> > What: /sys/fs/f2fs/<disk>/max_io_bytes
> > Date: December 2020
> > Contact: "Jaegeuk Kim" <[email protected]>
> > @@ -442,6 +443,12 @@ Description: This gives a control to limit the bio size in f2fs.
> > whereas, if it has a certain bytes value, f2fs won't submit a
> > bio larger than that size.
> >
> > +What: /sys/fs/f2fs/<disk>/no_fua_dio
> > +Date: May 2022
> > +Contact: "Jaegeuk Kim" <[email protected]>
> > +Description: This gives a signal to iomap, which should not use FUA for
> > + direct IOs. Default: 0.
>
> iomap is an implementation detail, so it shouldn't be mentioned in UAPI
> documentation. UAPI documentation should describe user-visible behavior only.

Ok.

>
> > +
> > What: /sys/fs/f2fs/<disk>/stat/sb_status
> > Date: December 2020
> > Contact: "Chao Yu" <[email protected]>
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index f5f2b7233982..23486486eab2 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -4153,6 +4153,8 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> > if ((inode->i_state & I_DIRTY_DATASYNC) ||
> > offset + length > i_size_read(inode))
> > iomap->flags |= IOMAP_F_DIRTY;
> > + if (F2FS_I_SB(inode)->no_fua_dio)
> > + iomap->flags |= IOMAP_F_DIRTY;
>
> This is overloading the IOMAP_F_DIRTY flag to mean something other than dirty.
> Perhaps this flag needs to be renamed, or a new flag should be added?

I'm not sure it's acceptable to add another flag for f2fs only.

>
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index e10838879538..c2400ea0080b 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1671,6 +1671,7 @@ struct f2fs_sb_info {
> > int dir_level; /* directory level */
> > int readdir_ra; /* readahead inode in readdir */
> > u64 max_io_bytes; /* max io bytes to merge IOs */
> > + int no_fua_dio; /* avoid FUA in DIO */
>
> Make this a bool?

Done.

>
> > diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> > index 4c50aedd5144..24d628ca92cc 100644
> > --- a/fs/f2fs/sysfs.c
> > +++ b/fs/f2fs/sysfs.c
> > @@ -771,6 +771,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_period_ms, iostat_period_ms);
> > #endif
> > F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, readdir_ra, readdir_ra);
> > F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_io_bytes, max_io_bytes);
> > +F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, no_fua_dio, no_fua_dio);
> > F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_pin_file_thresh, gc_pin_file_threshold);
> > F2FS_RW_ATTR(F2FS_SBI, f2fs_super_block, extension_list, extension_list);
> > #ifdef CONFIG_F2FS_FAULT_INJECTION
> > @@ -890,6 +891,7 @@ static struct attribute *f2fs_attrs[] = {
> > #endif
> > ATTR_LIST(readdir_ra),
> > ATTR_LIST(max_io_bytes),
> > + ATTR_LIST(no_fua_dio),
>
> Where is it validated that only valid values (0 or 1) can be written to this
> file?

Added.

>
> - Eric


2022-05-28 19:36:55

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH] f2fs: add sysfs entry to avoid FUA

On Fri, May 27, 2022 at 06:06:08PM -0700, Jaegeuk Kim wrote:
> On 05/27, Eric Biggers wrote:
> > [+Cc linux-block for FUA, and linux-xfs for iomap]
> >
> > On Fri, May 27, 2022 at 01:59:55PM -0700, Jaegeuk Kim wrote:
> > > Some UFS storage gives slower performance on FUA than write+cache_flush.
> > > Let's give a way to manage it.
> > >
> > > Signed-off-by: Jaegeuk Kim <[email protected]>
> >
> > Should the driver even be saying that it has FUA support in this case? If the
> > driver didn't claim FUA support, that would also solve this problem.
>
> I think there's still some benefit to use FUA such as small chunk writes
> for checkpoint.
>
> >
> > > ---
> > > Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++++++
> > > fs/f2fs/data.c | 2 ++
> > > fs/f2fs/f2fs.h | 1 +
> > > fs/f2fs/sysfs.c | 2 ++
> > > 4 files changed, 12 insertions(+)
> > >
> > > diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
> > > index 9b583dd0298b..cd96b09d7182 100644
> > > --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> > > +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> > > @@ -434,6 +434,7 @@ Date: April 2020
> > > Contact: "Daeho Jeong" <[email protected]>
> > > Description: Give a way to change iostat_period time. 3secs by default.
> > > The new iostat trace gives stats gap given the period.
> > > +
> > > What: /sys/fs/f2fs/<disk>/max_io_bytes
> > > Date: December 2020
> > > Contact: "Jaegeuk Kim" <[email protected]>
> > > @@ -442,6 +443,12 @@ Description: This gives a control to limit the bio size in f2fs.
> > > whereas, if it has a certain bytes value, f2fs won't submit a
> > > bio larger than that size.
> > >
> > > +What: /sys/fs/f2fs/<disk>/no_fua_dio
> > > +Date: May 2022
> > > +Contact: "Jaegeuk Kim" <[email protected]>
> > > +Description: This gives a signal to iomap, which should not use FUA for
> > > + direct IOs. Default: 0.
> >
> > iomap is an implementation detail, so it shouldn't be mentioned in UAPI
> > documentation. UAPI documentation should describe user-visible behavior only.
>
> Ok.
>
> >
> > > +
> > > What: /sys/fs/f2fs/<disk>/stat/sb_status
> > > Date: December 2020
> > > Contact: "Chao Yu" <[email protected]>
> > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > index f5f2b7233982..23486486eab2 100644
> > > --- a/fs/f2fs/data.c
> > > +++ b/fs/f2fs/data.c
> > > @@ -4153,6 +4153,8 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> > > if ((inode->i_state & I_DIRTY_DATASYNC) ||
> > > offset + length > i_size_read(inode))
> > > iomap->flags |= IOMAP_F_DIRTY;
> > > + if (F2FS_I_SB(inode)->no_fua_dio)
> > > + iomap->flags |= IOMAP_F_DIRTY;
> >
> > This is overloading the IOMAP_F_DIRTY flag to mean something other than dirty.
> > Perhaps this flag needs to be renamed, or a new flag should be added?
>
> I'm not sure it's acceptable to add another flag for f2fs only.

I think Al and willy have been throwing around patches to tell
iomap_dio_rw or someone that the caller will handle cache flushes and
that it shouldn't initiate them on its own; would that help here?

--D

> >
> > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > > index e10838879538..c2400ea0080b 100644
> > > --- a/fs/f2fs/f2fs.h
> > > +++ b/fs/f2fs/f2fs.h
> > > @@ -1671,6 +1671,7 @@ struct f2fs_sb_info {
> > > int dir_level; /* directory level */
> > > int readdir_ra; /* readahead inode in readdir */
> > > u64 max_io_bytes; /* max io bytes to merge IOs */
> > > + int no_fua_dio; /* avoid FUA in DIO */
> >
> > Make this a bool?
>
> Done.
>
> >
> > > diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> > > index 4c50aedd5144..24d628ca92cc 100644
> > > --- a/fs/f2fs/sysfs.c
> > > +++ b/fs/f2fs/sysfs.c
> > > @@ -771,6 +771,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_period_ms, iostat_period_ms);
> > > #endif
> > > F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, readdir_ra, readdir_ra);
> > > F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_io_bytes, max_io_bytes);
> > > +F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, no_fua_dio, no_fua_dio);
> > > F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_pin_file_thresh, gc_pin_file_threshold);
> > > F2FS_RW_ATTR(F2FS_SBI, f2fs_super_block, extension_list, extension_list);
> > > #ifdef CONFIG_F2FS_FAULT_INJECTION
> > > @@ -890,6 +891,7 @@ static struct attribute *f2fs_attrs[] = {
> > > #endif
> > > ATTR_LIST(readdir_ra),
> > > ATTR_LIST(max_io_bytes),
> > > + ATTR_LIST(no_fua_dio),
> >
> > Where is it validated that only valid values (0 or 1) can be written to this
> > file?
>
> Added.
>
> >
> > - Eric

2022-05-28 20:15:25

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] f2fs: add sysfs entry to avoid FUA

On Fri, May 27, 2022 at 06:06:08PM -0700, Jaegeuk Kim wrote:
> I think there's still some benefit to use FUA such as small chunk writes
> for checkpoint.

Did you measure if there is? Because some SSDs basically implemented
FUA as an implied flush after the write, in which case it would not
really help there either (but also not hurt).

But as the previous two maintainers already said - this needs quirking
at the driver layer, not in the submitter.

2022-06-01 19:57:50

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [PATCH] f2fs: add sysfs entry to avoid FUA

On 05/27, Christoph Hellwig wrote:
> On Fri, May 27, 2022 at 06:06:08PM -0700, Jaegeuk Kim wrote:
> > I think there's still some benefit to use FUA such as small chunk writes
> > for checkpoint.
>
> Did you measure if there is? Because some SSDs basically implemented
> FUA as an implied flush after the write, in which case it would not
> really help there either (but also not hurt).
>
> But as the previous two maintainers already said - this needs quirking
> at the driver layer, not in the submitter.

Thanks, I indeed measured this using UFS, and it turned out cache_flush
is better than FUA all the time like this. Hence, I posted a quirk [1].

Write(us/KB) 4 64 256 1024 2048
FUA 873.792 754.604 995.624 1011.67 1067.99
CACHE_FLUSH 824.703 712.98 800.307 1019.5 1037.37

[1] https://lore.kernel.org/linux-scsi/[email protected]/T/#u