2021-10-26 16:23:05

by Jingbo Xu

Subject: [Question] ext4/xfs: Default behavior changed after per-file DAX

Hi,

I have recently been working on per-file DAX support for virtiofs [1]. Vivek
Goyal and I would like to understand [2] why the default behavior changed
with the introduction of per-file DAX on ext4 and xfs [3][4].

That is, before per-file DAX was introduced, DAX was disabled for all
files unless the user specified '-o dax'. Now that per-file DAX is
supported, when neither '-o dax' nor '-o dax=always|inode|never' is
specified, the filesystem behaves as if mounted with '-o dax=inode' if the
underlying blkdev is DAX capable, i.e. DAX depends on the persistent inode
flag. So, from the user's perspective, the default behavior has changed.

We are not sure whether this is intentional. We would appreciate it if
anyone could offer some hints.


[1] https://lore.kernel.org/all/[email protected]/T/
[2]
https://lore.kernel.org/all/[email protected]/T/#mf067498887ca2023c64c8b8f6aec879557eb28f8
[3] 9cb20f94afcd2964944f9468e38da736ee855b19 ("fs/ext4: Make DAX mount
option a tri-state")
[4] 02beb2686ff964884756c581d513e103542dcc6a ("fs/xfs: Make DAX mount
option a tri-state")


--
Thanks,
Jeffle


2021-10-26 19:24:15

by Darrick J. Wong

Subject: Re: [Question] ext4/xfs: Default behavior changed after per-file DAX

On Tue, Oct 26, 2021 at 10:12:17PM +0800, JeffleXu wrote:
> Hi,
>
> Recently I'm working on supporting per-file DAX for virtiofs [1]. Vivek
> Goyal and I are interested [2] why the default behavior has changed
> since introduction of per-file DAX on ext4 and xfs [3][4].
>
> That is, before the introduction of per-file DAX, when user doesn't
> specify '-o dax', DAX is disabled for all files. After supporting
> per-file DAX, when neither '-o dax' nor '-o dax=always|inode|never' is
> specified, it actually works in a '-o dax=inode' way if the underlying
> blkdev is DAX capable, i.e. depending on the persistent inode flag. That
> is, the default behavior has changed from user's perspective.
>
> We are not sure if this is intentional or not. Appreciate if anyone
> could offer some hint.

Yes, that was an intentional change to all three filesystems to make the
steps we expose to sysadmins/users consistent and documented officially:

https://lore.kernel.org/linux-fsdevel/[email protected]/

(This was the first step; ext* were converted as separate series around
the same time.)

--D

>
>
> [1] https://lore.kernel.org/all/[email protected]/T/
> [2]
> https://lore.kernel.org/all/[email protected]/T/#mf067498887ca2023c64c8b8f6aec879557eb28f8
> [3] 9cb20f94afcd2964944f9468e38da736ee855b19 ("fs/ext4: Make DAX mount
> option a tri-state")
> [4] 02beb2686ff964884756c581d513e103542dcc6a ("fs/xfs: Make DAX mount
> option a tri-state")
>
>
> --
> Thanks,
> Jeffle

2021-10-27 07:35:00

by Vivek Goyal

Subject: Re: [Question] ext4/xfs: Default behavior changed after per-file DAX

On Tue, Oct 26, 2021 at 08:48:34AM -0700, Darrick J. Wong wrote:
> On Tue, Oct 26, 2021 at 10:12:17PM +0800, JeffleXu wrote:
> > Hi,
> >
> > Recently I'm working on supporting per-file DAX for virtiofs [1]. Vivek
> > Goyal and I are interested [2] why the default behavior has changed
> > since introduction of per-file DAX on ext4 and xfs [3][4].
> >
> > That is, before the introduction of per-file DAX, when user doesn't
> > specify '-o dax', DAX is disabled for all files. After supporting
> > per-file DAX, when neither '-o dax' nor '-o dax=always|inode|never' is
> > specified, it actually works in a '-o dax=inode' way if the underlying
> > blkdev is DAX capable, i.e. depending on the persistent inode flag. That
> > is, the default behavior has changed from user's perspective.
> >
> > We are not sure if this is intentional or not. Appreciate if anyone
> > could offer some hint.
>
> Yes, that was an intentional change to all three filesystems to make the
> steps we expose to sysadmins/users consistent and documented officially:
>
> https://lore.kernel.org/linux-fsdevel/[email protected]/

Ok, so basically the new dax option semantics are different from the old
"-o dax".

- dax=inode is the default. This is a change of behavior from the old
"-o dax", where the default was *no dax* at all.

- I tried xfs, and the mount does not fail even if the user mounted with
"-o dax=inode" and the underlying block device does not support dax.
That's a little strange. Some users might expect a failure if a certain
mount option can't be enabled.

So in general, what's the expected behavior with filesystem mount
options? If the user passes a mount option and it can't be enabled,
should the filesystem return an error and force the user to try again
without that mount option, or silently fall back to something else?

I think in the past I have come across overlayfs users who demanded
that the mount fail if an overlayfs option they passed in
can't be honored. They want to know about it so that they can either
fix the configuration or change the mount option.

- With xfs, I mounted /dev/pmem0 with "-o dax=inode" and checked
/proc/mounts and I don't see "dax=inode" there. Is that intentional?

I am just trying to wrap my head around the new semantics as we are
trying to implement those for virtiofs.

So the following are the side effects of the behavior change.

A. If somebody wrote scripts that scan mount flags to decide whether
dax is enabled or not, these will not work anymore. The scripts will have
to be changed to stat() every file in the filesystem and look for the
STATX_ATTR_DAX flag to determine dax status.

I would have thought not to make dax=inode the default and to let the user
opt in to it with the "dax=inode" mount option. But I guess people liked
dax=inode as the default better.

Anyway, I guess if we want to keep the behavior of virtiofs in line with
ext4/xfs, we might have to make dax=inode the default (at least in the
client). The server default might be different, because querying the state
of FS_XFLAG_DAX is an extra ioctl() call on each LOOKUP and GETATTR call,
and those who don't want to use DAX might not want to pay this cost.
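
For reference, the extra ioctl() mentioned above can be sketched from
userspace as follows. This is only a rough illustration, not the virtiofs
server code: the helper names are made up, the 28-byte size is that of
struct fsxattr in the Linux UAPI headers, and the request number assumes
the generic ioctl encoding (as on x86 and arm64; some other architectures
encode the direction bits differently).

```python
import fcntl
import os
import struct

# struct fsxattr: five 32-bit fields followed by 8 bytes of padding.
FSXATTR_SIZE = 28
FS_XFLAG_DAX = 0x00008000  # persistent per-inode DAX flag


def _ior(type_char, nr, size):
    """Build a read ioctl number using the generic Linux encoding."""
    return (2 << 30) | (size << 16) | (ord(type_char) << 8) | nr


FS_IOC_FSGETXATTR = _ior("X", 31, FSXATTR_SIZE)


def xflags_from_fsxattr(buf):
    """Extract fsx_xflags, the first 32-bit field of struct fsxattr."""
    return struct.unpack_from("=I", buf, 0)[0]


def inode_has_dax_flag(path):
    """Return True if FS_XFLAG_DAX is set on the inode behind path.

    Raises OSError if the filesystem does not implement the ioctl.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(FSXATTR_SIZE)
        fcntl.ioctl(fd, FS_IOC_FSGETXATTR, buf)  # fills buf in place
    finally:
        os.close(fd)
    return bool(xflags_from_fsxattr(buf) & FS_XFLAG_DAX)
```

Note that this reads the persistent flag only; whether the file is actually
in DAX mode is a separate question, answered by STATX_ATTR_DAX.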

Thanks
Vivek

>
> (This was the first step; ext* were converted as separate series around
> the same time.)
>
> --D
>
> >
> >
> > [1] https://lore.kernel.org/all/[email protected]/T/
> > [2]
> > https://lore.kernel.org/all/[email protected]/T/#mf067498887ca2023c64c8b8f6aec879557eb28f8
> > [3] 9cb20f94afcd2964944f9468e38da736ee855b19 ("fs/ext4: Make DAX mount
> > option a tri-state")
> > [4] 02beb2686ff964884756c581d513e103542dcc6a ("fs/xfs: Make DAX mount
> > option a tri-state")
> >
> >
> > --
> > Thanks,
> > Jeffle
>

2021-10-27 08:09:11

by Ira Weiny

Subject: Re: [Question] ext4/xfs: Default behavior changed after per-file DAX

On Tue, Oct 26, 2021 at 03:25:51PM -0400, Vivek Goyal wrote:
> On Tue, Oct 26, 2021 at 08:48:34AM -0700, Darrick J. Wong wrote:
> > On Tue, Oct 26, 2021 at 10:12:17PM +0800, JeffleXu wrote:
> > > Hi,
> > >
> > > Recently I'm working on supporting per-file DAX for virtiofs [1]. Vivek
> > > Goyal and I are interested [2] why the default behavior has changed
> > > since introduction of per-file DAX on ext4 and xfs [3][4].
> > >
> > > That is, before the introduction of per-file DAX, when user doesn't
> > > specify '-o dax', DAX is disabled for all files. After supporting
> > > per-file DAX, when neither '-o dax' nor '-o dax=always|inode|never' is
> > > specified, it actually works in a '-o dax=inode' way if the underlying
> > > blkdev is DAX capable, i.e. depending on the persistent inode flag. That
> > > is, the default behavior has changed from user's perspective.
> > >
> > > We are not sure if this is intentional or not. Appreciate if anyone
> > > could offer some hint.
> >
> > Yes, that was an intentional change to all three filesystems to make the
> > steps we expose to sysadmins/users consistent and documented officially:
> >
> > https://lore.kernel.org/linux-fsdevel/[email protected]/
>
> Ok, so basically new dax options semantics are different from old "-o dax".
>
> - dax=inode is default. This is change of behavior from old "-o dax" where
> default was *no dax* at all.

Again I think this is debatable. The file system will still work and all files
will, by default, _not_ use DAX. Specifying '-o dax' or the future-proof
'-o dax=always' will override this just like it did before.

>
> - I tried xfs and mount does not fail even if user mounted with
> "-o dax=inode" and underlying block device does not support dax.
> That's little strange. Some users might expect a failure if certain
> mount option can't be enabled.

But the mount option _is_ enabled. It is just that the files can't be used in
DAX mode. The files can still have their inode flag set. This was
specifically discussed to support backing up file systems on devices which did
not support DAX. This allows you to restore that file system with all the
proper inode flags in place.

If a DAX file is on a non-DAX device the file will not be in DAX mode when
opened. A statx() call can determine this.
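
A rough sketch of such a statx() check, for illustration only. It calls
the libc statx() wrapper through ctypes, so it assumes a libc that exports
statx (glibc 2.28 or later) and a kernel that reports STATX_ATTR_DAX. The
helper names are made up; the offset used is that of stx_attributes in
struct statx from the UAPI headers.

```python
import ctypes
import os
import struct

AT_FDCWD = -100                # resolve relative paths against the cwd
STATX_BASIC_STATS = 0x000007FF
STATX_ATTR_DAX = 0x00200000    # file is currently in DAX state
STATX_BUF_SIZE = 256           # sizeof(struct statx)
STX_ATTRIBUTES_OFFSET = 8      # after stx_mask and stx_blksize


def statx_buf_has_dax(buf):
    """Decode stx_attributes from a raw struct statx buffer."""
    attributes = struct.unpack_from("=Q", buf, STX_ATTRIBUTES_OFFSET)[0]
    return bool(attributes & STATX_ATTR_DAX)


def file_in_dax_mode(path):
    """Return True if the kernel reports the file as currently using DAX."""
    libc = ctypes.CDLL(None, use_errno=True)
    buf = ctypes.create_string_buffer(STATX_BUF_SIZE)
    rc = libc.statx(AT_FDCWD, os.fsencode(path), 0, STATX_BASIC_STATS, buf)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return statx_buf_has_dax(buf.raw)
```

An application that depends on DAX could call file_in_dax_mode() on the
files it opens and refuse to run when the result is False.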

>
> So in general, what's the expected behavior with filesystem mount
> options. If user passes a mount option and it can't be enabled,
> should filesystem return error and force user to try again without
> the certain mount option or silently fallback to something else.

But it is enabled.

>
> I think in the past I have come across overlayfs users which demanded
> that mount fails if certain overlayfs option they have passed in
> can't be honored. They want to know about it so that they can either
> fix the configuration or change mount option.

I understand how this is a bit convoluted. However, for 99% of the users out
there who are using DAX on DAX devices this is not going to change anything for
them. (Especially since they are all probably using '-o dax').

>
> - With xfs, I mounted /dev/pmem0 with "-o dax=inode" and checked
> /proc/mounts and I don't see "dax=inode" there. Is that intentional?

Yes absolutely. I originally implemented it to show dax=inode and was told
that default mount options were not to be shown. After thinking about it I
agreed. It is intractable to print out all the mount options which are
defaulted. The user can read what the defaults are and know what the file
system is using for options which are not overridden.

>
> I am just trying to wrap my head around the new semantics as we are
> trying to implement those for virtiofs.
>
> So following is the side affects of behavior change.
>
> A. If somebody wrote scripts and scanned for mount flags to decide whehter
> dax is enabled or not, these will not work anymore. scripts will have
> to be changed to stat() every file in filesystem and look for
> STATX_ATTR_DAX flag to determine dax status.

Why would you need to stat() 'every' file? Why, and to whom, is it important
that every file in the file system is in dax mode? I was getting the feeling
that it was important to the client to know this on the server but your last
email in the other thread has confused me on that point.[1]


>
> I would have thought to not make dax=inode default and let user opt-in
> for that using "dax=inode" mount option. But I guess people liked
> dax=inode default better.

Yes, because it gives the _end_ user (not the sys-admin) the control on their
individual files.

>
> Anway, I guess if we want to keep the behavior of virtiofs in-line with
> ext4/xfs, we might have to make dax=inode default (atleast in client).

Yes, I think we should make dax=inode the default.

> Server default might be different because querying the state of
> FS_XFLAG_DAX is extra ioctl() call on each LOOKUP and GETATTR call and
> those who don't want to use DAX, might not want to pay this cost.

I've not responded on the other thread because I feel like I've reached the
depth of my virtiofs knowledge. From your email:

"In general, there is no connection between DAX in guest and device on
host enabling DAX."[1]

But then you say:

"... if server does not support DAX and client asks for DAX, mount will
fail. (As it should fail)."[1]

So I decided I need to review the virtiofs code a bit to better understand this
relationship. Because I'm confused.

As to the subject of having a file based policy; any such policy is not the
kernel's job. If users want to have different policies based on file size they
are free to do that with dax=inode. I don't see how that works with the other
2 mount options.

Furthermore, performance of different files may be device specific and moving a
file from one device to another may result in the user wanting to change the
mode, which dax=inode allows.

All of this supports dax=inode as a better default.

I get the feeling that you're most concerned with the admin user being able to
see if the entire file system is in DAX mode. Is that true? And I can't argue
that indeed that is different. But I'm failing to see the use case for that
being a requirement.

Is the biggest issue the lack of visibility to see if the device supports DAX?

Ira

[1] https://lore.kernel.org/linux-fsdevel/[email protected]/

2021-10-27 09:15:06

by Dave Chinner

Subject: Re: [Question] ext4/xfs: Default behavior changed after per-file DAX

On Tue, Oct 26, 2021 at 03:25:51PM -0400, Vivek Goyal wrote:
> On Tue, Oct 26, 2021 at 08:48:34AM -0700, Darrick J. Wong wrote:
> > On Tue, Oct 26, 2021 at 10:12:17PM +0800, JeffleXu wrote:
> > > Hi,
> > >
> > > Recently I'm working on supporting per-file DAX for virtiofs [1]. Vivek
> > > Goyal and I are interested [2] why the default behavior has changed
> > > since introduction of per-file DAX on ext4 and xfs [3][4].
> > >
> > > That is, before the introduction of per-file DAX, when user doesn't
> > > specify '-o dax', DAX is disabled for all files. After supporting
> > > per-file DAX, when neither '-o dax' nor '-o dax=always|inode|never' is
> > > specified, it actually works in a '-o dax=inode' way if the underlying
> > > blkdev is DAX capable, i.e. depending on the persistent inode flag. That
> > > is, the default behavior has changed from user's perspective.
> > >
> > > We are not sure if this is intentional or not. Appreciate if anyone
> > > could offer some hint.
> >
> > Yes, that was an intentional change to all three filesystems to make the
> > steps we expose to sysadmins/users consistent and documented officially:
> >
> > https://lore.kernel.org/linux-fsdevel/[email protected]/
>
> Ok, so basically new dax options semantics are different from old "-o dax".

Well, yes. "-o dax" is exactly equivalent of "-o dax=always", but it
is deprecated and should be ignored for the purposes of a new FSDAX
implementation. It will go away eventually.

> - dax=inode is default. This is change of behavior from old "-o dax" where
> default was *no dax* at all.

No, it's not actually a change of behaviour. The default behaviour
of a filesystem that supports DAX is identical when you have no
mount option specified: if you've taken no action to enable DAX,
then DAX will be disabled and not used.

We originally implemented DAX with per-inode flags and no mount
option in XFS - the mount option came along with the ext4 DAX
implementation for testing because it didn't have on-disk inode
flags for DAX.

IOWs, the dax=inode default reflects how we originally intended
FSDAX to be managed and how it originally behaved on XFS when no
mount option was specified. Then we came across bugs in dynamically
changing the per-inode DAX state, so we temporarily disabled the
on-disk flags on XFS (because EXPERIMENTAL). Then some people
started incorrectly associating "no dax option" with "admin wants dax
disabled". Then we fixed the bugs with changing on-disk inode flags,
ext4 added an on-disk flag and we re-enabled them. The result was a
tristate config situation - never, always and per-inode...

> - I tried xfs and mount does not fail even if user mounted with
> "-o dax=inode" and underlying block device does not support dax.
> That's little strange. Some users might expect a failure if certain
> mount option can't be enabled.

It's perfectly reasonable. If the hardware doesn't support DAX, then
we just always behave as if dax=never is set. IOWs, we simply ignore
the inode DAX hint, because we can never enable it. There's just no
reason for being obnoxious and rejecting mounts just because the
block device doesn't support DAX.

Then we have to consider filesystems with multiple block devices
that have different DAX capabilities. dax=inode has to transparently
become dax=never for the block devices that don't support DAX, but
still operate as dax=inode for the other DAX capable block devices.

We also have to consider that block devices can change configuration
dynamically - think about tiered storage (e.g. a dm-cache device)
that has pmem for hot access and SSDs for backing store. That could
mean we have DAX capable access for hot data, but not for cold data
fetched/stored from SSD. We might manually migrate data between
tiers and so the dax capability of the data sets can change
dynamically.

The actual presence of DAX should be completely transparent and
irrelevant to the application unless the application is specifically
dependent on DAX being enabled (e.g. using application level CPU
cache flushes for data integrity purposes). If so, it is up to the
application to check STATX_ATTR_DAX on its open files and refuse to
operate.

> So in general, what's the expected behavior with filesystem mount
> options. If user passes a mount option and it can't be enabled,
> should filesystem return error and force user to try again without
> the certain mount option or silently fallback to something else.
>
> I think in the past I have come across overlayfs users which demanded
> that mount fails if certain overlayfs option they have passed in
> can't be honored. They want to know about it so that they can either
> fix the configuration or change mount option.

It's up to the filesystem how it handles mount options. XFS just
turns dax=always into dax=never with a warning and continues
onwards. There's no point in forcing the user to mount with a
different mount option - the end result is going to be exactly the
same (i.e. dax=never because hardware doesn't support it).
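
The fallback rules described so far can be summarised in a small model.
This is only a toy illustration of the semantics discussed in this thread,
not kernel code; the function and parameter names are made up, and it
ignores other constraints such as reflink.

```python
VALID_MODES = ("always", "inode", "never")


def effective_dax(mount_mode, inode_flag, device_capable):
    """Model whether a file ends up in DAX mode.

    mount_mode     -- "always", "inode" or "never" (the dax= mount option)
    inode_flag     -- True if the persistent per-inode DAX flag is set
    device_capable -- True if the underlying block device supports DAX
    """
    if mount_mode not in VALID_MODES:
        raise ValueError(f"unknown dax mode: {mount_mode!r}")
    if not device_capable:
        # Without DAX-capable hardware everything behaves like dax=never;
        # the inode flag is kept on disk but ignored.
        return False
    if mount_mode == "never":
        return False
    if mount_mode == "always":
        return True
    # dax=inode: follow the per-inode flag.
    return bool(inode_flag)
```

With no mount option given, the mode is "inode", so a file only uses DAX
once somebody has set its inode flag on a DAX-capable device.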

> - With xfs, I mounted /dev/pmem0 with "-o dax=inode" and checked
> /proc/mounts and I don't see "dax=inode" there. Is that intentional?

Yes, XFS policy is to elide default options from the mount table.
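
A script that wants the configured mode therefore has to treat a missing
dax option as dax=inode. A minimal sketch, assuming a kernel with the
tri-state option and a filesystem that omits the default; the function
names and the sample mount line in the usage note are invented for
illustration.

```python
def dax_mode_from_options(options):
    """Return "always", "inode" or "never" for a mount option string."""
    for opt in options.split(","):
        if opt == "dax":
            # Deprecated spelling, equivalent to dax=always.
            return "always"
        if opt.startswith("dax="):
            return opt[len("dax="):]
    # No dax option listed: the default, dax=inode, is in effect.
    return "inode"


def dax_modes_from_mounts(mounts_text, fstypes=("xfs", "ext4")):
    """Map mount point to dax mode for lines in /proc/mounts format."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in fstypes:
            modes[mountpoint] = dax_mode_from_options(options)
    return modes
```

For example, a line such as "/dev/pmem0 /mnt/pmem xfs rw,relatime 0 0"
maps /mnt/pmem to "inode". This reports only the mount-level policy, not
whether any particular file is in DAX mode.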

> So following is the side affects of behavior change.
>
> A. If somebody wrote scripts and scanned for mount flags to decide whehter
> dax is enabled or not, these will not work anymore.

Correct - this was never supported in the first place and we went
through this years ago with intel doing dodgy things like this in their
userspace library that we never intended to support.

> scripts will have
> to be changed to stat() every file in filesystem and look for
> STATX_ATTR_DAX flag to determine dax status.

If you are scanning the filesystem for "DAX capability" then you are
doing it wrong. DAX capability is a property of the underlying block
device:

$ cat /sys/block/pmem0/queue/dax
1
$

And that tells you if the filesystem is on DAX capable hardware
and hence can be used if the admin and/or application turns it on.
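
The same check can be scripted. A minimal sketch, assuming the sysfs
layout shown above; the function name and the sysfs_root parameter (there
only so the helper can be pointed at a test directory) are made up.

```python
from pathlib import Path


def blockdev_supports_dax(dev_name, sysfs_root="/sys/block"):
    """Return True if <sysfs_root>/<dev_name>/queue/dax reads as "1".

    A missing attribute is treated as "not DAX capable".
    """
    attr = Path(sysfs_root) / dev_name / "queue" / "dax"
    try:
        return attr.read_text().strip() == "1"
    except FileNotFoundError:
        return False
```

Calling blockdev_supports_dax("pmem0") corresponds to the cat command
above.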

> I would have thought to not make dax=inode default and let user opt-in
> for that using "dax=inode" mount option. But I guess people liked
> dax=inode default better.
>
> Anway, I guess if we want to keep the behavior of virtiofs in-line with
> ext4/xfs, we might have to make dax=inode default (atleast in client).

Yes, I've been asking you to make dax=inode the default for some
time now.

> Server default might be different because querying the state of
> FS_XFLAG_DAX is extra ioctl() call on each LOOKUP and GETATTR call and
> those who don't want to use DAX, might not want to pay this cost.

The admin cost of managing per-inode/per-directory DAX capability is
negligible. It's a "set-once" operation done at application
installation/configuration/data set initialisation time, and never
changed again. Performance/overhead arguments for per-inode flag
admin hold no water at all.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2021-10-27 14:32:06

by Jingbo Xu

Subject: Re: [Question] ext4/xfs: Default behavior changed after per-file DAX

Sorry for the noise. Ira Weiny had replied to my previous mail [1], but
unfortunately the reply was moved to my junk folder and I didn't notice it.

[1]
https://lore.kernel.org/all/[email protected]/T/#mb022959fe3b6e9b82e2b066dd8daa301cd2b2c53

On 10/26/21 10:12 PM, JeffleXu wrote:
> Hi,
>
> Recently I'm working on supporting per-file DAX for virtiofs [1]. Vivek
> Goyal and I are interested [2] why the default behavior has changed
> since introduction of per-file DAX on ext4 and xfs [3][4].
>
> That is, before the introduction of per-file DAX, when user doesn't
> specify '-o dax', DAX is disabled for all files. After supporting
> per-file DAX, when neither '-o dax' nor '-o dax=always|inode|never' is
> specified, it actually works in a '-o dax=inode' way if the underlying
> blkdev is DAX capable, i.e. depending on the persistent inode flag. That
> is, the default behavior has changed from user's perspective.
>
> We are not sure if this is intentional or not. Appreciate if anyone
> could offer some hint.
>
>
> [1] https://lore.kernel.org/all/[email protected]/T/
> [2]
> https://lore.kernel.org/all/[email protected]/T/#mf067498887ca2023c64c8b8f6aec879557eb28f8
> [3] 9cb20f94afcd2964944f9468e38da736ee855b19 ("fs/ext4: Make DAX mount
> option a tri-state")
> [4] 02beb2686ff964884756c581d513e103542dcc6a ("fs/xfs: Make DAX mount
> option a tri-state")
>
>

--
Thanks,
Jeffle

2021-10-27 21:24:30

by Vivek Goyal

Subject: Re: [Question] ext4/xfs: Default behavior changed after per-file DAX

On Wed, Oct 27, 2021 at 09:33:17AM +1100, Dave Chinner wrote:
> On Tue, Oct 26, 2021 at 03:25:51PM -0400, Vivek Goyal wrote:
> > On Tue, Oct 26, 2021 at 08:48:34AM -0700, Darrick J. Wong wrote:
> > > On Tue, Oct 26, 2021 at 10:12:17PM +0800, JeffleXu wrote:
> > > > Hi,
> > > >
> > > > Recently I'm working on supporting per-file DAX for virtiofs [1]. Vivek
> > > > Goyal and I are interested [2] why the default behavior has changed
> > > > since introduction of per-file DAX on ext4 and xfs [3][4].
> > > >
> > > > That is, before the introduction of per-file DAX, when user doesn't
> > > > specify '-o dax', DAX is disabled for all files. After supporting
> > > > per-file DAX, when neither '-o dax' nor '-o dax=always|inode|never' is
> > > > specified, it actually works in a '-o dax=inode' way if the underlying
> > > > blkdev is DAX capable, i.e. depending on the persistent inode flag. That
> > > > is, the default behavior has changed from user's perspective.
> > > >
> > > > We are not sure if this is intentional or not. Appreciate if anyone
> > > > could offer some hint.
> > >
> > > Yes, that was an intentional change to all three filesystems to make the
> > > steps we expose to sysadmins/users consistent and documented officially:
> > >
> > > https://lore.kernel.org/linux-fsdevel/[email protected]/
> >
> > Ok, so basically new dax options semantics are different from old "-o dax".
>
> Well, yes. "-o dax" is exactly equivalent of "-o dax=always", but it
> is deprecated and should be ignored for the purposes of a new FSDAX
> implementation. It will go away eventually.
>
> > - dax=inode is default. This is change of behavior from old "-o dax" where
> > default was *no dax* at all.
>
> No, it's not actually a change of behaviour. The default behaviour
> of a filesystem that supports DAX is identical when you have no
> mount option specified: if you've taken no action to enable DAX,
> then DAX will be disabled and not used.
>
> We originally implemented DAX with per-inode flags an no mount
> option in XFS - the mount option came along with the ext4 DAX
> implementation for testing because it didn't have on-disk inode
> flags for DAX.
>
> IOWs, the dax=inode default reflects how we originally intended
> FSDAX to be managed and how it originally behaved on XFS when no
> mount option was specified. Then we came across bugs in dynamically
> changing the per-inode DAX state, we temporarily disabled the
> on-disk flags on XFS (because EXPERIMENTAL). Then some people
> started incorrectly associated "no dax option" with "admin wants dax
> disabled". Then we fixed the bugs with changing on-disk inode flags,
> ext4 added an on-disk flag and we re-enabled them. The result was a
> tristate config situation - never, always and per-inode...
>
> > - I tried xfs and mount does not fail even if user mounted with
> > "-o dax=inode" and underlying block device does not support dax.
> > That's little strange. Some users might expect a failure if certain
> > mount option can't be enabled.
>

Hi Dave,

Thanks for all the explanation and background. It helps me a lot in
wrapping my head around the rationale for the current design.

> It's perfectly reasonable. If the hardware doesn't support DAX, then
> we just always behave as if dax=never is set.

I tried mounting a non-DAX block device with dax=always and it failed,
saying DAX can't be used with reflink.

[ 100.371978] XFS (vdb): DAX unsupported by block device. Turning off DAX.
[ 100.374185] XFS (vdb): DAX and reflink cannot be used together!

So it looks like the first check tried to fall back to dax=never, as the
device does not support DAX. But the later reflink check thought dax was
enabled and did not fall back to dax=never.

> IOWs, we simply ignore
> the inode DAX hint, because we can never enable it. There's just no
> reason for being obnoxious and rejecting mounts just because the
> block device doesn't support DAX.
>
> Then we have to consider filesystems with multiple block devices
> that have different DAX capabilities. dax=inode has to transparently
> become dax=never for the block devices that don't support DAX, but
> still operate as dax=inode for the other DAX capable block devices.
>
> We also have to consider that block devices can change configuration
> dynamically - think about teired storage (e.g a dm-cache device)
> that has pmem for hot access and SSDs for backing store. That could
> mean we have DAX capable access for hot data, but not for cold data
> fetched/stored from SSD. We might manually migrate data between
> teirs and so the dax capability of the data sets can change
> dynamically.
>
> The actual presence of DAX should be completely transparent and
> irrelevant to the application unless the application is specifically
> dependent on DAX being enabled (i.e. using application level CPU
> cache flushes for data integrity purposes). If so, it is up to the
> application to check STATX_ATTR_DAX on it's open files and refuse to
> operate.
>
> > So in general, what's the expected behavior with filesystem mount
> > options. If user passes a mount option and it can't be enabled,
> > should filesystem return error and force user to try again without
> > the certain mount option or silently fallback to something else.
> >
> > I think in the past I have come across overlayfs users which demanded
> > that mount fails if certain overlayfs option they have passed in
> > can't be honored. They want to know about it so that they can either
> > fix the configuration or change mount option.
>
> It's up to the filesystem how it handles mount options. XFS just
> turns dax=always into dax=never with a warning and continues
> onwards. There's no point in forcing the user to mount with a
> different mount option - the end result is going to be exactly the
> same (i.e. dax=never because hardware doesn't support it).
>
> > - With xfs, I mounted /dev/pmem0 with "-o dax=inode" and checked
> > /proc/mounts and I don't see "dax=inode" there. Is that intentional?
>
> Yes, XFS policy is to elide default options from the mount table.

For my own education, why do we hide default options? These defaults
can vary from system to system based on kernel version or
kernel CONFIG options. So if I log into a system and try to figure
out what defaults xfs (or any other filesystem) is working with, I
probably will have no idea. Looking at /proc/mounts still might help
me a bit with debugging what the filesystem might be doing.

IOW, I thought that from a debugging point of view it can be very helpful
to show even the default options. But there must be reasons to hide defaults
that I am not aware of.

>
> > So following is the side affects of behavior change.
> >
> > A. If somebody wrote scripts and scanned for mount flags to decide whehter
> > dax is enabled or not, these will not work anymore.
>
> Correct - this was never supported in the first place and we went
> through this years ago with intel doing dodgy things like this in their
> userspace library that we never intended to support.
>
> > scripts will have
> > to be changed to stat() every file in filesystem and look for
> > STATX_ATTR_DAX flag to determine dax status.
>
> If you are scanning the filesystem for "DAX capability" then you are
> doing it wrong. DAX capability is a property of the underlying block
> device:
>
> $ cat /sys/block/pmem0/queue/dax
> 1
> $
>
> And that tells you if the filesystem is on DAX capable hardware
> and hence can be used if the admin and/or application turns it on.
>
> > I would have thought to not make dax=inode default and let user opt-in
> > for that using "dax=inode" mount option. But I guess people liked
> > dax=inode default better.
> >
> > Anway, I guess if we want to keep the behavior of virtiofs in-line with
> > ext4/xfs, we might have to make dax=inode default (atleast in client).
>
> Yes, I've been asking you to make dax=inode the default for some
> time now.
>
> > Server default might be different because querying the state of
> > FS_XFLAG_DAX is extra ioctl() call on each LOOKUP and GETATTR call and
> > those who don't want to use DAX, might not want to pay this cost.
>
> The admin cost of managing per-inode/per-directory DAX capability is
> negliable. It's a "set-once" operation done at application
> installation/configuration/data set initialisation time, and never
> changed again.

Agreed.

> Performance/overhead arguments for per-inode flag
> admin hold no water at all.

Not sure I understand this. If we make dax=inode the default in the server,
then the server will always have to call ioctl() to figure out whether
FS_XFLAG_DAX is set on LOOKUP and GETATTR calls. So that's one extra system
call all the time. That's not an admin overhead but a runtime overhead
of the virtiofs file server. And if the user never intends to use DAX, then
there is no need to have this overhead (by default).

Thanks
Vivek

2021-10-27 21:26:46

by Vivek Goyal

Subject: Re: [Question] ext4/xfs: Default behavior changed after per-file DAX

On Tue, Oct 26, 2021 at 01:57:31PM -0700, Ira Weiny wrote:
> On Tue, Oct 26, 2021 at 03:25:51PM -0400, Vivek Goyal wrote:
> > On Tue, Oct 26, 2021 at 08:48:34AM -0700, Darrick J. Wong wrote:
> > > On Tue, Oct 26, 2021 at 10:12:17PM +0800, JeffleXu wrote:
> > > > Hi,
> > > >
> > > > Recently I'm working on supporting per-file DAX for virtiofs [1]. Vivek
> > > > Goyal and I are interested [2] why the default behavior has changed
> > > > since introduction of per-file DAX on ext4 and xfs [3][4].
> > > >
> > > > That is, before the introduction of per-file DAX, when user doesn't
> > > > specify '-o dax', DAX is disabled for all files. After supporting
> > > > per-file DAX, when neither '-o dax' nor '-o dax=always|inode|never' is
> > > > specified, it actually works in a '-o dax=inode' way if the underlying
> > > > blkdev is DAX capable, i.e. depending on the persistent inode flag. That
> > > > is, the default behavior has changed from user's perspective.
> > > >
> > > > We are not sure if this is intentional or not. Appreciate if anyone
> > > > could offer some hint.
> > >
> > > Yes, that was an intentional change to all three filesystems to make the
> > > steps we expose to sysadmins/users consistent and documented officially:
> > >
> > > https://lore.kernel.org/linux-fsdevel/[email protected]/
> >
> > Ok, so basically new dax options semantics are different from old "-o dax".
> >
> > - dax=inode is default. This is change of behavior from old "-o dax" where
> > default was *no dax* at all.
>
> Again I think this is debatable. The file system will still work and all files
> will, by default, _not_ use DAX. Specifying '-o dax' or future proof '-o
> dax=always' will override this just like it did before.
>
> >
> > - I tried xfs and mount does not fail even if user mounted with
> > "-o dax=inode" and underlying block device does not support dax.
> > That's little strange. Some users might expect a failure if certain
> > mount option can't be enabled.
>
> But the mount option _is_ enabled. It is just that the files can't be used in
> DAX mode. The files can still have their inode flag set. This was
> specifically discussed to support backing up file systems on devices which did
> not support DAX. This allows you to restore that file system with all the
> proper inode flags in place.
>
> If a DAX file is on a non-DAX device the file will not be in DAX mode when
> opened. A statx() call can determine this.
>
> >
> > So in general, what's the expected behavior with filesystem mount
> > options. If user passes a mount option and it can't be enabled,
> > should filesystem return error and force user to try again without
> > the certain mount option or silently fallback to something else.
>
> But it is enabled.
>
> >
> > I think in the past I have come across overlayfs users which demanded
> > that mount fails if certain overlayfs option they have passed in
> > can't be honored. They want to know about it so that they can either
> > fix the configuration or change mount option.
>
> I understand how this is a bit convoluted. However, for 99% of the users out
> there who are using DAX on DAX devices this is not going to change anything for
> them. (Especially since they are all probably using '-o dax').
>
> >
> > - With xfs, I mounted /dev/pmem0 with "-o dax=inode" and checked
> > /proc/mounts and I don't see "dax=inode" there. Is that intentional?
>
> Yes absolutely. I originally implemented it to show dax=inode and was told
> that default mount options were not to be shown. After thinking about it I
> agreed. It is intractable to print out all the mount options which are
> defaulted. The user can read what the defaults are and know what the file
> system is using for options which are not overridden.
>
> >
> > I am just trying to wrap my head around the new semantics as we are
> > trying to implement those for virtiofs.
> >
> > So the following are the side effects of the behavior change.
> >
> > A. If somebody wrote scripts and scanned for mount flags to decide whether
> > dax is enabled or not, these will not work anymore. scripts will have
> > to be changed to stat() every file in filesystem and look for
> > STATX_ATTR_DAX flag to determine dax status.
>
> Why would you need to stat() 'every' file? Why, and to who, is it important
> that every file in the file system is in dax mode?

Only for debugging purposes, say when I want to know which mounts/files are
using DAX.

> I was getting the feeling
> that it was important to the client to know this on the server but your last
> email in the other thread has confused me on that point.[1]
>
>
> >
> > I would have thought to not make dax=inode default and let user opt-in
> > for that using "dax=inode" mount option. But I guess people liked
> > dax=inode default better.
>
> Yes, because it gives the _end_ user (not the sys-admin) the control on their
> individual files.

Well, I can argue that _end_ user should get that control only if sysadmin
allows that. If sysadmin did not enable reflink feature, end user does not
get to use it.

Anyway, I think the idea is that you did not want the sys-admin to have to
mount the xfs instance with dax=inode, and that is why this feature is
enabled by default.

>
> >
> > Anyway, I guess if we want to keep the behavior of virtiofs in line with
> > ext4/xfs, we might have to make dax=inode the default (at least in the client).
>
> Yes, I think we should make dax=inode the default.
>
> > Server default might be different because querying the state of
> > FS_XFLAG_DAX is extra ioctl() call on each LOOKUP and GETATTR call and
> > those who don't want to use DAX, might not want to pay this cost.
>
> I've not responded on the other thread because I feel like I've reached the
> depth of my virtiofs knowledge. From your email:
>
> "In general, there is no connection between DAX in guest and device on
> host enabling DAX."[1]
>
> But then you say:
>
> "... if server does not support DAX and client asks for DAX, mount will
> fail. (As it should fail)."[1]

When I say server I mean "virtiofs daemon + qemu" combination. So qemu has
to specify that virtiofs device has a cache using option
"cache-size=<X>G". If this is missing, DAX can't be enabled in guest. This
is somewhat similar to whether underlying block device supports DAX or
not. In this case it signifies whether virtiofs device supports DAX or
not.

And this has nothing to do with the host device's capability of being able to
do DAX.

>
> So I decided I need to review the virtiofs code a bit to better understand this
> relationship. Because I'm confused.
>
> As to the subject of having a file based policy; any such policy is not the
> kernels job. If users want to have different policies based on file size they
> are free to do that with dax=inode. I don't see how that works with the other
> 2 mount options.

Right. File based policy will be in user space (virtiofsd). It will not
be part of the kernel. In fact, the guest kernel will not even know what
policy is being used by the server. The server will tell the guest kernel
whether to enable DAX on an inode or not.

>
> Furthermore, performance of different files may be device specific and moving a
> file from one device to another may result in the user wanting to change the
> mode, which dax=inode allows.
>
> All of this supports dax=inode as a better default.
>
> I get the feeling that you're most concerned with the admin user being able to
> see if the entire file system is in DAX mode. Is that true? And I can't argue
> that indeed that is different.

Right. I am looking at new dax options from the lens of old dax options
where I could easily look at mount options and tell whether filesystem
is using dax or not. And dax was disabled by default and user had to
opt-in to enable dax.

> But I'm failing to see the use case for that
> being a requirement.

Agreed that this is not necessarily a requirement. It is just a little
different from the old dax options, and I was only worried about users
being confused.

>
> Is the biggest issue the lack of visibility to see if the device supports DAX?

Not necessarily. I think for me the two biggest issues are:

- Should dax be enabled by default in the server as well? If we do that,
the server will have to make an extra ioctl() call on every LOOKUP and GETATTR
fuse request. Local filesystems can probably query the FS_XFLAG_DAX
state easily, but doing an extra syscall all the time will probably have some
cost (no idea how much).

- So far, if virtiofs is mounted without any of the dax options, just
by looking at the mount options I could tell that DAX is not enabled on any
of the files. But that will not be true anymore. With dax=inode
as the default, it is possible that a server upgrade enables dax on some
or all of the files.

I guess I will have to stick to the same reasoning given by ext4/xfs. That is,
to determine whether DAX is enabled on a file or not, you need to
query the STATX_ATTR_DAX flag. That's the only way to conclude whether DAX is
being used on a file or not. Don't look at filesystem mount options
and reach a conclusion (except in the case of dax=never).
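As a rough illustration of that per-file check, here is a Python sketch that calls statx() through ctypes. It assumes a C library that exports statx() (glibc 2.28 or later); the constants and the struct statx layout are taken from <linux/stat.h>, and the helper names are made up for this example.

```python
import ctypes
import os
import sys

# Constants from <linux/stat.h> and <fcntl.h>.
AT_FDCWD = -100
STATX_BASIC_STATS = 0x000007FF
STATX_ATTR_DAX = 0x00200000      # file is currently in DAX state
STATX_BUF_SIZE = 256             # sizeof(struct statx)
STX_ATTRIBUTES_OFFSET = 8        # __u64 stx_attributes follows two __u32 fields


def attributes_show_dax(stx_attributes):
    """Return True if STATX_ATTR_DAX is set in stx_attributes."""
    return bool(stx_attributes & STATX_ATTR_DAX)


def file_uses_dax(path):
    """Ask the kernel via statx() whether this file is in DAX mode right now."""
    libc = ctypes.CDLL(None, use_errno=True)   # assumes libc exports statx()
    buf = ctypes.create_string_buffer(STATX_BUF_SIZE)
    rc = libc.statx(AT_FDCWD, os.fsencode(path), 0, STATX_BASIC_STATS, buf)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    raw = buf.raw[STX_ATTRIBUTES_OFFSET:STX_ATTRIBUTES_OFFSET + 8]
    return attributes_show_dax(int.from_bytes(raw, sys.byteorder))
```

A debugging script would call file_uses_dax() on the files it cares about instead of parsing /proc/mounts.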

Thanks
Vivek

>
> Ira
>
> [1] https://lore.kernel.org/linux-fsdevel/[email protected]/
>

2021-10-28 05:53:31

by Jingbo Xu

Subject: Re: [Question] ext4/xfs: Default behavior changed after per-file DAX



On 10/27/21 10:36 PM, Vivek Goyal wrote:
> [snip]
>
>>
>> Is the biggest issue the lack of visibility to see if the device supports DAX?
>
> Not necessarily. I think for me two biggest issues are.
>
> - Should dax be enabled by default in server as well. If we do that,
> server will have to make extra ioctl() call on every LOOKUP and GETATTR
> fuse request. Local filesystems probably can easily query FS_XFLAG_DAX
> state but doing extra syscall all the time will probably be some cost
> (No idea how much).

I tested the time cost from virtiofsd's perspective (time cost of
passthrough_ll.c:lo_do_lookup()):
- before per inode DAX feature: 2~4 us
- after per inode DAX feature: 7~8 us

This is as expected, since the per inode DAX feature introduces one
extra ioctl() system call.

Also the time cost from client's perspective (time cost of
fs/fuse/dir.c:fuse_lookup_name())
- before per inode DAX feature: 25~30 us
- after per inode DAX feature: 30~35 us

That is, ~15%~20% performance loss.
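The percentages can be re-derived from the numbers above with a small Python sketch. The function name is made up for this illustration, and pairing the low ends and the high ends of the measured ranges is an assumption about how to compare them.

```python
def relative_increase_percent(before_us, after_us):
    """Percentage increase of after_us over before_us."""
    return (after_us - before_us) / before_us * 100.0


# Measurements quoted above, in microseconds: (low end, high end).
server_before, server_after = (2.0, 4.0), (7.0, 8.0)
client_before, client_after = (25.0, 30.0), (30.0, 35.0)

for label, before, after in (
    ("virtiofsd lo_do_lookup()", server_before, server_after),
    ("client fuse_lookup_name()", client_before, client_after),
):
    low_end = relative_increase_percent(before[0], after[0])
    high_end = relative_increase_percent(before[1], after[1])
    print(f"{label}: {low_end:.1f}% (low end), {high_end:.1f}% (high end)")
```

For the client numbers this gives about 20% at the low end and about 16.7% at the high end, in line with the estimate above. The relative cost inside virtiofsd itself is much larger because the baseline there is only a few microseconds.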

Currently we do an ioctl() to query the persistent inode flags every time a
FUSE_LOOKUP request is received. Maybe we could cache the result of the
ioctl() on the virtiofsd side, but I have no idea how to intercept
runtime modifications to these persistent inode flags from other
processes on the host, e.g. a sysadmin on the host, to maintain cache consistency.

So if the default behavior of the client side is 'dax=inode', and virtiofsd
disables per inode DAX by default (i.e. '-o dax=server|attr' is not
specified for virtiofsd) for the sake of performance, then the guest won't
see DAX enabled and thus won't be surprised. This keeps the
behavior change to a minimum.


>
> - So far if virtiofs is mounted without any of the dax options, just
> by looking at mount option, I could tell, DAX is not enabled on any
> of the files. But that will not be true anymore. Because dax=inode
> is the default, it is possible that server upgrade enabled dax on some
> or all the files.
>
> I guess I will have to stick to same reason given by ext4/xfs. That is
> to determine whether DAX is enabled on a file or not, you need to
> query STATX_ATTR_DAX flag. That's the only way to conclude if DAX is
> being used on a file or not. Don't look at filesystem mount options
> and reach a conclusion (except the case of dax=never).


--
Thanks,
Jeffle

2021-10-28 13:02:16

by Vivek Goyal

Subject: Re: [Question] ext4/xfs: Default behavior changed after per-file DAX

On Thu, Oct 28, 2021 at 01:52:27PM +0800, JeffleXu wrote:
>
>
> On 10/27/21 10:36 PM, Vivek Goyal wrote:
> > [snip]
> >
> >>
> >> Is the biggest issue the lack of visibility to see if the device supports DAX?
> >
> > Not necessarily. I think for me two biggest issues are.
> >
> > - Should dax be enabled by default in server as well. If we do that,
> > server will have to make extra ioctl() call on every LOOKUP and GETATTR
> > fuse request. Local filesystems probably can easily query FS_XFLAG_DAX
> > state but doing extra syscall all the time will probably be some cost
> > (No idea how much).
>
> I tested the time cost from virtiofsd's perspective (time cost of
> passthrough_ll.c:lo_do_lookup()):
> - before per inode DAX feature: 2~4 us
> - after per inode DAX feature: 7~8 us
>
> It is within expectation, as the introduction of per inode DAX feature,
> one extra ioctl() system call is introduced.
>
> Also the time cost from client's perspective (time cost of
> fs/fuse/dir.c:fuse_lookup_name())
> - before per inode DAX feature: 25~30 us
> - after per inode DAX feature: 30~35 us
>
> That is, ~15%~20% performance loss.

Hi Jeffle,

Thanks for measuring the performance impact of enabling dax=inode by
default in server.

>
> Currently we do ioctl() to query the persistent inode flags every time
> FUSE_LOOKUP request is received, maybe we could cache the result of
> ioctl() on virtiofsd side, but I have no idea how to intercept the
> runtime modification to these persistent inode flags from other
> processes on host, e.g. sysadmin on host, to maintain the cache consistency.
>
> So if the default behavior of client side is 'dax=inode', and virtiofsd
> disables per inode DAX by default (i.e. '-o dax=server|attr' is not
> specified for virtiofsd) for the sake of performance, then guest won't
> see DAX enabled and thus won't be surprised. This can reduce the
> behavior change to the minimum.

Agreed. Let's not enable any dax by default in the server and let the admin/user
enable dax explicitly in the server. From the fuse client's perspective, we can
assume dax=inode by default. That way kernel side behavior will be
similar to ext4/xfs (as long as the server has been started with a per
inode dax policy).

Vivek
>
>
> >
> > - So far if virtiofs is mounted without any of the dax options, just
> > by looking at mount option, I could tell, DAX is not enabled on any
> > of the files. But that will not be true anymore. Because dax=inode
> > is the default, it is possible that server upgrade enabled dax on some
> > or all the files.
> >
> > I guess I will have to stick to same reason given by ext4/xfs. That is
> > to determine whether DAX is enabled on a file or not, you need to
> > query STATX_ATTR_DAX flag. That's the only way to conclude if DAX is
> > being used on a file or not. Don't look at filesystem mount options
> > and reach a conclusion (except the case of dax=never).
>
>
> --
> Thanks,
> Jeffle
>

2021-10-28 16:32:49

by Eric Sandeen

Subject: Re: [Question] ext4/xfs: Default behavior changed after per-file DAX

On 10/27/21 8:14 AM, Vivek Goyal wrote:
> On Wed, Oct 27, 2021 at 09:33:17AM +1100, Dave Chinner wrote:

...

> Hi Dave,
>
> Thanks for all the explanation and background. It helps me a lot in
> wrapping my head around the rationale for current design.
>
>> It's perfectly reasonable. If the hardware doesn't support DAX, then
>> we just always behave as if dax=never is set.
>
> I tried mounting non-DAX block device with dax=always and it failed
> saying DAX can't be used with reflink.
>
> [ 100.371978] XFS (vdb): DAX unsupported by block device. Turning off DAX.
> [ 100.374185] XFS (vdb): DAX and reflink cannot be used together!
>
> So looks like first check tried to fallback to dax=never as device does
> not support DAX. But later reflink check thought dax is enabled and
> did not fallback to dax=never.

We need to think hard about this stuff and audit it to be sure.

But, I think that reflink check should probably just be removed, now that
DAX files and reflinked files can co-exist on a filesystem - it's just
that they can't both be active on the /same file/.

I think that even "dax=always" is still just "advisory" - it means,
try to enable dax on every file. It may still fail in the same ways as
dax=inode (default) + flag set may fail.

But ... we should go through the whole mount option / feature set /
device capability logic to be sure this is all consistent. Thanks for
pointing it out!

-Eric


2021-10-28 16:43:17

by Darrick J. Wong

Subject: Re: [Question] ext4/xfs: Default behavior changed after per-file DAX

On Thu, Oct 28, 2021 at 11:29:08AM -0500, Eric Sandeen wrote:
> On 10/27/21 8:14 AM, Vivek Goyal wrote:
> > On Wed, Oct 27, 2021 at 09:33:17AM +1100, Dave Chinner wrote:
>
> ...
>
> > Hi Dave,
> >
> > Thanks for all the explanation and background. It helps me a lot in
> > wrapping my head around the rationale for current design.
> >
> > > It's perfectly reasonable. If the hardware doesn't support DAX, then
> > > we just always behave as if dax=never is set.
> >
> > I tried mounting non-DAX block device with dax=always and it failed
> > saying DAX can't be used with reflink.
> >
> > [ 100.371978] XFS (vdb): DAX unsupported by block device. Turning off DAX.
> > [ 100.374185] XFS (vdb): DAX and reflink cannot be used together!
> >
> > So looks like first check tried to fallback to dax=never as device does
> > not support DAX. But later reflink check thought dax is enabled and
> > did not fallback to dax=never.
>
> We need to think hard about this stuff and audit it to be sure.
>
> But, I think that reflink check should probably just be removed, now that
> DAX files and reflinked files can co-exist on a filesystem - it's just
> that they can't both be active on the /same file/.
>
> I think that even "dax=always" is still just "advisory" - it means,
> try to enable dax on every file. It may still fail in the same ways as
> dax=inode (default) + flag set may fail.
>
> But ... we should go through the whole mount option / feature set /
> device capability logic to be sure this is all consistent. Thanks for
> pointing it out!

I was rather hoping that we'd solve this problem by helping Shiyang get
his two patchsets landed, and then we can eliminate the dax+reflink
check entirely.

[1] (dax poison notifications via rmap V7)
https://lore.kernel.org/linux-xfs/[email protected]/
[2] (reflink + dax V10)
https://lore.kernel.org/linux-xfs/[email protected]/

(The second patchset is AFAICT ready to go, but we still need to iron
out the difficulties pointed out in the last review of patchset #1)

--D

> -Eric
>

2021-10-28 18:24:40

by Ira Weiny

Subject: Re: [Question] ext4/xfs: Default behavior changed after per-file DAX

On Thu, Oct 28, 2021 at 01:52:27PM +0800, JeffleXu wrote:
>
>
> On 10/27/21 10:36 PM, Vivek Goyal wrote:
> > [snip]
> >
> >>
> >> Is the biggest issue the lack of visibility to see if the device supports DAX?
> >
> > Not necessarily. I think for me two biggest issues are.
> >
> > - Should dax be enabled by default in server as well. If we do that,
> > server will have to make extra ioctl() call on every LOOKUP and GETATTR
> > fuse request. Local filesystems probably can easily query FS_XFLAG_DAX
> > state but doing extra syscall all the time will probably be some cost
> > (No idea how much).
>
> I tested the time cost from virtiofsd's perspective (time cost of
> passthrough_ll.c:lo_do_lookup()):
> - before per inode DAX feature: 2~4 us
> - after per inode DAX feature: 7~8 us
>
> It is within expectation, as the introduction of per inode DAX feature,
> one extra ioctl() system call is introduced.
>
> Also the time cost from client's perspective (time cost of
> fs/fuse/dir.c:fuse_lookup_name())
> - before per inode DAX feature: 25~30 us
> - after per inode DAX feature: 30~35 us
>
> That is, ~15%~20% performance loss.
>
> Currently we do ioctl() to query the persistent inode flags every time
> FUSE_LOOKUP request is received, maybe we could cache the result of
> ioctl() on virtiofsd side, but I have no idea how to intercept the
> runtime modification to these persistent inode flags from other
> processes on host, e.g. sysadmin on host, to maintain the cache consistency.
>

Do you really expect the dax flag to change on individual files a lot? This in
itself is an expensive operation as the FS has to flush the inode.

>
> So if the default behavior of client side is 'dax=inode', and virtiofsd
> disables per inode DAX by default (i.e. '-o dax=server|attr' is not

I'm not following what dax=server or dax=attr is?

> specified for virtiofsd) for the sake of performance, then guest won't
> see DAX enabled and thus won't be surprised. This can reduce the
> behavior change to the minimum.
>

What processes, other than virtiofsd have 'control' of these files?

I know that a sysadmin could come in and change the dax flag but I think that
is like saying a sys-admin can come in and change your .bashrc and your
environment is suddenly different. We have to trust the admins not to do stuff
like that. So I don't think admins are going to be changing the dax flag on
files out from under 'users'; in this case virtiofsd. Right?

That means that virtiofsd could cache the status and avoid the performance
issues above, correct?

Ira

>
> >
> > - So far if virtiofs is mounted without any of the dax options, just
> > by looking at mount option, I could tell, DAX is not enabled on any
> > of the files. But that will not be true anymore. Because dax=inode
> > is the default, it is possible that server upgrade enabled dax on some
> > or all the files.
> >
> > I guess I will have to stick to same reason given by ext4/xfs. That is
> > to determine whether DAX is enabled on a file or not, you need to
> > query STATX_ATTR_DAX flag. That's the only way to conclude if DAX is
> > being used on a file or not. Don't look at filesystem mount options
> > and reach a conclusion (except the case of dax=never).
>
>
> --
> Thanks,
> Jeffle

2021-10-28 19:20:26

by Vivek Goyal

Subject: Re: [Question] ext4/xfs: Default behavior changed after per-file DAX

On Thu, Oct 28, 2021 at 11:24:08AM -0700, Ira Weiny wrote:
> On Thu, Oct 28, 2021 at 01:52:27PM +0800, JeffleXu wrote:
> >
> >
> > On 10/27/21 10:36 PM, Vivek Goyal wrote:
> > > [snip]
> > >
> > >>
> > >> Is the biggest issue the lack of visibility to see if the device supports DAX?
> > >
> > > Not necessarily. I think for me two biggest issues are.
> > >
> > > - Should dax be enabled by default in server as well. If we do that,
> > > server will have to make extra ioctl() call on every LOOKUP and GETATTR
> > > fuse request. Local filesystems probably can easily query FS_XFLAG_DAX
> > > state but doing extra syscall all the time will probably be some cost
> > > (No idea how much).
> >
> > I tested the time cost from virtiofsd's perspective (time cost of
> > passthrough_ll.c:lo_do_lookup()):
> > - before per inode DAX feature: 2~4 us
> > - after per inode DAX feature: 7~8 us
> >
> > It is within expectation, as the introduction of per inode DAX feature,
> > one extra ioctl() system call is introduced.
> >
> > Also the time cost from client's perspective (time cost of
> > fs/fuse/dir.c:fuse_lookup_name())
> > - before per inode DAX feature: 25~30 us
> > - after per inode DAX feature: 30~35 us
> >
> > That is, ~15%~20% performance loss.
> >
> > Currently we do ioctl() to query the persistent inode flags every time
> > FUSE_LOOKUP request is received, maybe we could cache the result of
> > ioctl() on virtiofsd side, but I have no idea how to intercept the
> > runtime modification to these persistent inode flags from other
> > processes on host, e.g. sysadmin on host, to maintain the cache consistency.
> >
>
> Do you really expect the dax flag to change on individual files a lot? This in
> itself is an expensive operation as the FS has to flush the inode.

No, we do not expect it to change often. But in a shared filesystem it
could be changed by somebody else, so we can't cache it in virtiofsd.
Even if we cached it, we would need a mechanism to invalidate the cache if
some other client changed it.

>
> >
> > So if the default behavior of client side is 'dax=inode', and virtiofsd
> > disables per inode DAX by default (i.e. '-o dax=server|attr' is not
>
> I'm not following what dax=server or dax=attr is?

These are just the virtiofs daemon option names we are considering to
allow the daemon to switch between different kinds of policies. These names
are not final. As of now, dax=attr means: look for the FS_XFLAG_DAX
flag on the inode and enable DAX on the inode accordingly. dax=server means
that the server can choose some other policy to enable/disable DAX on an inode
(and can ignore FS_XFLAG_DAX).
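A minimal Python sketch of how such a policy switch might behave is shown below. Everything in it is hypothetical: the option names are the provisional ones from this thread, and the size threshold used for the dax=server case is only an example of "some other policy", not something virtiofsd is known to implement.

```python
from enum import Enum

FS_XFLAG_DAX = 0x00008000  # persistent per-inode DAX flag from <linux/fs.h>


class DaxPolicy(Enum):
    NONE = "none"      # default: the server never asks the guest to use DAX
    ATTR = "attr"      # follow the FS_XFLAG_DAX flag on the host inode
    SERVER = "server"  # server-defined policy, may ignore FS_XFLAG_DAX


def server_wants_dax(policy, xflags, size, size_threshold=32 * 1024):
    """Decide whether the server tells the guest to enable DAX on an inode."""
    if policy is DaxPolicy.ATTR:
        return bool(xflags & FS_XFLAG_DAX)
    if policy is DaxPolicy.SERVER:
        # Hypothetical example policy: only use DAX for larger files.
        return size >= size_threshold
    return False
```

With DaxPolicy.NONE the server never needs to query FS_XFLAG_DAX at all, which is what avoids the extra ioctl() for users who do not want DAX.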

>
> > specified for virtiofsd) for the sake of performance, then guest won't
> > see DAX enabled and thus won't be surprised. This can reduce the
> > behavior change to the minimum.
> >
>
> What processes, other than virtiofsd have 'control' of these files?

Guest process or user can change these flags. virtiofsd is not going
to modify this flag. It will just query this flag and respond to client
to enable DAX if this flag/attr is set on inode.

>
> I know that a sysadmin could come in and change the dax flag but I think that
> is like saying a sys-admin can come in and change your .bashrc and your
> environment is suddenly different. We have to trust the admins not to do stuff
> like that. So I don't think admins are going to be changing the dax flag on
> files out from under 'users'; in this case virtiofsd. Right?

Right. Generally I don't expect that on host anybody will change it. But I
will not rule it out because host is the one preparing initial filesystem
for the guest and if admin/tools on host want to set FS_XFLAG_DAX on
some of the inodes to begin with, so be it. Guest will boot with that
initial filesystem state.

>
> That means that virtiofsd could cache the status and avoid the performance
> issues above correct?

This directory could be shared also. That means multiple guests are
sharing same directory (each guest has one corresponding virtiofsd
instance running). That means if one guest changes the property of
one of the files, other guests/virtiofsd will have no idea that property
has changed.
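The staleness problem can be shown with a toy Python model. This is not virtiofsd code: the CachingServer class and the shared dict standing in for host inode flags are invented for the illustration.

```python
FS_XFLAG_DAX = 0x00008000  # persistent per-inode DAX flag from <linux/fs.h>


class CachingServer:
    """Toy stand-in for one virtiofsd instance that caches the DAX flag."""

    def __init__(self, host_xflags):
        self.host_xflags = host_xflags   # shared dict: file name -> fsx_xflags
        self.cache = {}                  # private per-instance cache

    def lookup_dax(self, name):
        # Only the first lookup reads the host state (the "ioctl()").
        if name not in self.cache:
            self.cache[name] = bool(self.host_xflags[name] & FS_XFLAG_DAX)
        return self.cache[name]

    def guest_sets_dax(self, name):
        # A guest behind this instance sets the flag on the shared file.
        self.host_xflags[name] |= FS_XFLAG_DAX
        self.cache[name] = True


shared_dir = {"data.bin": 0}             # one host directory, two guests
server_a = CachingServer(shared_dir)
server_b = CachingServer(shared_dir)

before = server_b.lookup_dax("data.bin")   # caches "no DAX"
server_a.guest_sets_dax("data.bin")        # change made through the other guest
stale = server_b.lookup_dax("data.bin")    # still served from server_b's cache
actual = bool(shared_dir["data.bin"] & FS_XFLAG_DAX)
print(before, stale, actual)               # prints: False False True
```

server_b keeps answering from its own cache after the flag has changed on the host, so a cache without some cross-instance invalidation mechanism would report the wrong DAX state.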

Vivek

>
> Ira
>
> >
> > >
> > > - So far if virtiofs is mounted without any of the dax options, just
> > > by looking at mount option, I could tell, DAX is not enabled on any
> > > of the files. But that will not be true anymore. Because dax=inode
> > > is the default, it is possible that server upgrade enabled dax on some
> > > or all the files.
> > >
> > > I guess I will have to stick to same reason given by ext4/xfs. That is
> > > to determine whether DAX is enabled on a file or not, you need to
> > > query STATX_ATTR_DAX flag. That's the only way to conclude if DAX is
> > > being used on a file or not. Don't look at filesystem mount options
> > > and reach a conclusion (except the case of dax=never).
> >
> >
> > --
> > Thanks,
> > Jeffle
>