These two fixes are to the now-disabled per-inode DAX support in XFS. I
started working on these in the v4.13 timeframe before the per-inode
DAX feature was turned off.
I realize that they may not be critical right now since the per-inode DAX
feature is turned off, but I think that if we ever do re-enable that
feature we will want these two issues to have been fixed.
I also do think that they are both vaulable for stable kernels.
These used to be part of my ext4 per-inode enabling:
https://lkml.org/lkml/2017/9/5/748
But I've since split that into a series of XFS bug fixes (these two) and a
series of ext4 bug fixes.
Ross Zwisler (2):
xfs: always use DAX if mount option is used
xfs: validate bdev support for DAX inode flag
fs/xfs/xfs_ioctl.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
--
2.9.5
Before support for the per-inode DAX flag was disabled the XFS the code had
an issue where the user couldn't reliably tell whether or not DAX was being
used to service page faults and I/O when the DAX mount option was used. In
this case each inode within the mounted filesystem started with S_DAX set
due to the mount option, but it could be cleared if someone touched the
individual inode flag.
For example (v4.13 and before):
# mount | grep dax
/dev/pmem0 on /mnt type xfs
(rw,relatime,seclabel,attr2,dax,inode64,sunit=4096,swidth=4096,noquota)
# touch /mnt/a /mnt/b # both files currently use DAX
# xfs_io -c "lsattr" /mnt/* # neither has the DAX inode option set
----------e----- /mnt/a
----------e----- /mnt/b
# xfs_io -c "chattr -x" /mnt/a # this clears S_DAX for /mnt/a
# xfs_io -c "lsattr" /mnt/*
----------e----- /mnt/a
----------e----- /mnt/b
We end up with both /mnt/a and /mnt/b looking identical from the point of
view of the mount option and from lsattr, but one is using DAX and the
other is not.
Fix this by always doing DAX I/O when either the mount option is set or
when the DAX inode flag is set. This means that DAX will always be used
for all inodes on a filesystem mounted with -o dax, making the usage
reliable and detectable.
This does not fix the race issues that caused the XFS DAX inode option to
be disabled, so that option will still be disabled. If/when we re-enable
it, though, I think we will want this issue to have been fixed. I also do
think that we want to fix this in stable kernels.
Signed-off-by: Ross Zwisler <[email protected]>
CC: [email protected]
---
fs/xfs/xfs_ioctl.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 5049e8a..26faeb9 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1013,7 +1013,7 @@ xfs_diflags_to_linux(
else
inode->i_flags &= ~S_NOATIME;
#if 0 /* disabled until the flag switching races are sorted out */
- if (xflags & FS_XFLAG_DAX)
+ if ((xflags & FS_XFLAG_DAX) || (ip->i_mount->m_flags & XFS_MOUNT_DAX))
inode->i_flags |= S_DAX;
else
inode->i_flags &= ~S_DAX;
@@ -1104,7 +1104,14 @@ xfs_ioctl_setattr_dax_invalidate(
return -EINVAL;
}
- /* If the DAX state is not changing, we have nothing to do here. */
+ /*
+ * If the DAX state is not changing, we have nothing to do here. If
+ * the DAX mount option was used we will update the DAX inode flag as
+ * the user requested but we will continue to use DAX for I/O and page
+ * faults regardless of how the inode flag is set.
+ */
+ if (ip->i_mount->m_flags & XFS_MOUNT_DAX)
+ return 0;
if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
return 0;
if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
--
2.9.5
Currently only the blocksize is checked, but we should really be calling
bdev_dax_supported() which also tests to make sure we can get a
struct dax_device and that the dax_direct_access() path is working.
This is the same check that we do for the "-o dax" mount option in
xfs_fs_fill_super().
This does not fix the race issues that caused the XFS DAX inode option to
be disabled, so that option will still be disabled. If/when we re-enable
it, though, I think we will want this issue to have been fixed. I also do
think that we want to fix this in stable kernels.
Signed-off-by: Ross Zwisler <[email protected]>
CC: [email protected]
---
fs/xfs/xfs_ioctl.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 26faeb9..0433aef 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1088,6 +1088,7 @@ xfs_ioctl_setattr_dax_invalidate(
int *join_flags)
{
struct inode *inode = VFS_I(ip);
+ struct super_block *sb = inode->i_sb;
int error;
*join_flags = 0;
@@ -1100,7 +1101,7 @@ xfs_ioctl_setattr_dax_invalidate(
if (fa->fsx_xflags & FS_XFLAG_DAX) {
if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
return -EINVAL;
- if (ip->i_mount->m_sb.sb_blocksize != PAGE_SIZE)
+ if (bdev_dax_supported(sb, sb->s_blocksize) < 0)
return -EINVAL;
}
--
2.9.5
On Thu, Sep 07, 2017 at 03:08:31PM -0600, Ross Zwisler wrote:
> Before support for the per-inode DAX flag was disabled the XFS the code had
> an issue where the user couldn't reliably tell whether or not DAX was being
> used to service page faults and I/O when the DAX mount option was used. In
> this case each inode within the mounted filesystem started with S_DAX set
> due to the mount option, but it could be cleared if someone touched the
> individual inode flag.
Looks good, but can you please add a testcase to xfstests for this?
Reviewed-by: Christoph Hellwig <[email protected]>
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
On Fri, Sep 08, 2017 at 12:20:28AM -0700, Christoph Hellwig wrote:
> On Thu, Sep 07, 2017 at 03:08:31PM -0600, Ross Zwisler wrote:
> > Before support for the per-inode DAX flag was disabled the XFS the code had
> > an issue where the user couldn't reliably tell whether or not DAX was being
> > used to service page faults and I/O when the DAX mount option was used. In
> > this case each inode within the mounted filesystem started with S_DAX set
> > due to the mount option, but it could be cleared if someone touched the
> > individual inode flag.
>
> Looks good, but can you please add a testcase to xfstests for this?
Hmm...aside from looking at tracepoints, I'm not sure how to detect whether or
not S_DAX is actually being used for an inode. I don't see any other xfstests
that try and look at the trace buffer, but I guess that could work.
Do you know of a better way to detect this test failure/success?
> Reviewed-by: Christoph Hellwig <[email protected]>
Thanks for the review!