2020-04-28 00:22:28

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11

From: Ira Weiny <[email protected]>

Changes from V10:
Update the documentaiton quite a bit
Rename mark_inode_dontcache() to d_mark_dontcache()
move d_mark_dontcache() to fs/dcache.c
Move show options to xfs_info_set array
Collect reviews.

At LSF/MM'19 [1] [2] we discussed applications that overestimate memory
consumption due to their inability to detect whether the kernel will
instantiate page cache for a file, and cases where a global dax enable via a
mount option is too coarse.

The following patch series enables the use of DAX on individual files and/or
directories on xfs, and lays some groundwork to do so in ext4. It further
enhances the dax mount option to be a tri-state of 'always', 'never', or
'iflag' (default). Furthermore, it maintians '-o dax' to be equivalent to '-o
dax=always'.

The insight at LSF/MM was to separate the per-mount or per-file "physical"
(FS_XFLAG_DAX) capability switch from an "effective" (S_DAX) attribute for the
file.

At LSF/MM we discussed the difficulties of switching the DAX state of a file
with active mappings / page cache. It was thought the races could be avoided
by limiting DAX state flips to 0-length files.

However, this turns out to not be true.[3][5] This is because address space
operations (a_ops) may be in use at any time the inode is referenced.

For this reason direct manipulation of the FS_XFLAG_DAX is allowed but the
operation of the file (S_DAX) is not immediately changed.

Details of when and how DAX state can be changed on a file is included in a
documentation patch.

It should be noted that FS_XFLAG_DAX inheritance is not shown in this patch set
as it was maintained from previous work on XFS. FS_XFLAG_DAX and it's
inheritance will need to be added to other file systems for user control.


[1] https://lwn.net/Articles/787973/
[2] https://lwn.net/Articles/787233/
[3] https://lkml.org/lkml/2019/10/20/96
[4] https://patchwork.kernel.org/patch/11310511/
[5] https://lore.kernel.org/lkml/[email protected]/


To: [email protected]
Cc: Al Viro <[email protected]>
Cc: "Darrick J. Wong" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "Theodore Y. Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Changes from V9:
Slight reorder of series to put Documentation sooner
modify i_state under i_lock
Change name of xfs_ioctl_setattr_dax_invalidate() ->
xfs_ioctl_setattr_prepare_dax()
Do not report default dax=inode mount mode
Move XFS_IEOFBLOCKS to '9'
Fixup some doc typos
Fix xfs style indentation
Fix commit mispelling

Changes from V8:
Rebase to 5.7-rc2
Change ALWAYS/NEVER bits to be 26/27
Remove XFS_IDONTCACHE -> lift to I_DONTCACHE
use mark_inode_dontcache() in XFS
create xfs_dax_mode enum
use xfs signature styles
Change xfs_inode_enabe_dax() -> xfs_inode_should_enable()
Based on feedback to ext4 series
Fix locking of DCACHE_DONTCACHE
Change flag_inode_dontcache() -> mark_inode_dontcache()
Change xfs_ioctl_setattr_dax_invalidate() -> xfs_ioctl_dax_check_set_cache()
Documentation cleanups
Clean up all commit messages

Changes from V7:
Add DCACHE_DONTCACHE
If mount override don't worry about inode caching
Change mount flags to NEVER/ALWAYS
Clean up xfs_inode_enable_dax()
Clarify comments
Update documentation

Changes from V6:
Incorporate feedback on patches
Add ability to change FS_XFLAG_DAX on files at any time.
Add a don't cache flag to the VFS layer
Preserve internal XFS IDONTCACHE behavior for bulkstat
operations

Changes from V5:
* make dax mount option a tri-state
* Reject changes to FS_XFLAG_DAX for regular files
- Allow only on directories
* Update documentation

Changes from V4:
* Open code the aops lock rather than add it to the xfs_ilock()
subsystem (Darrick's comments were obsoleted by this change)
* Fix lkp build suggestions and bugs

Changes from V3:
* Remove global locking... :-D
* put back per inode locking and remove pre-mature optimizations
* Fix issues with Directories having IS_DAX() set
* Fix kernel crash issues reported by Jeff
* Add some clean up patches
* Consolidate diflags to iflags functions
* Update/add documentation
* Reorder/rename patches quite a bit

Changes from V2:

* Move i_dax_sem to be a global percpu_rw_sem rather than per inode
Internal discussions with Dan determined this would be easier,
just as performant, and slightly less overhead that having it
in the SB as suggested by Jan
* Fix locking order in comments and throughout code
* Change "mode" to "state" throughout commits
* Add CONFIG_FS_DAX wrapper to disable inode_[un]lock_state() when not
configured
* Add static branch for which is activated by a device which supports
DAX in XFS
* Change "lock/unlock" to up/down read/write as appropriate
Previous names were over simplified
* Update comments/documentation

* Remove the xfs specific lock to the vfs (global) layer.
* Fix i_dax_sem locking order and comments

* Move 'i_mapped' count from struct inode to struct address_space and
rename it to mmap_count
* Add inode_has_mappings() call

* Fix build issues
* Clean up syntax spacing and minor issues
* Update man page text for STATX_ATTR_DAX
* Add reviewed-by's
* Rebase to 5.6

Rename patch:
from: fs/xfs: Add lock/unlock state to xfs
to: fs/xfs: Add write DAX lock to xfs layer
Add patch:
fs/xfs: Clarify lockdep dependency for xfs_isilocked()
Drop patch:
fs/xfs: Fix truncate up

Ira Weiny (11):
fs/xfs: Remove unnecessary initialization of i_rwsem
fs: Remove unneeded IS_DAX() check in io_is_direct()
fs/stat: Define DAX statx attribute
Documentation/dax: Update Usage section
fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS
fs/xfs: Make DAX mount option a tri-state
fs/xfs: Create function xfs_inode_should_enable_dax()
fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
fs: Lift XFS_IDONTCACHE to the VFS layer
fs: Introduce DCACHE_DONTCACHE
fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

Documentation/filesystems/dax.txt | 139 ++++++++++++++++++++++++++++-
drivers/block/loop.c | 6 +-
fs/dcache.c | 19 ++++
fs/stat.c | 3 +
fs/xfs/xfs_icache.c | 8 +-
fs/xfs/xfs_inode.h | 4 +-
fs/xfs/xfs_ioctl.c | 141 +++++-------------------------
fs/xfs/xfs_iops.c | 72 ++++++++++-----
fs/xfs/xfs_mount.h | 4 +-
fs/xfs/xfs_super.c | 50 +++++++++--
include/linux/dcache.h | 2 +
include/linux/fs.h | 14 +--
include/uapi/linux/stat.h | 1 +
13 files changed, 291 insertions(+), 172 deletions(-)

--
2.25.1


2020-04-28 00:22:34

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V11 10/11] fs: Introduce DCACHE_DONTCACHE

From: Ira Weiny <[email protected]>

DCACHE_DONTCACHE indicates a dentry should not be cached on final
dput().

Also add a helper function to mark DCACHE_DONTCACHE on all dentries
pointing to a specific inode when that inode is being set I_DONTCACHE.

This facilitates dropping dentry references to inodes sooner which
require eviction to swap S_DAX mode.

Cc: Al Viro <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V10:
rename to d_mark_dontcache()
Move function to fs/dcache.c

Changes from V9:
modify i_state under i_lock
Update comment
"Purge from memory on final dput()"

Changes from V8:
Update commit message
Use mark_inode_dontcache in XFS
Fix locking... can't use rcu here.
Change name to mark_inode_dontcache
---
fs/dcache.c | 19 +++++++++++++++++++
fs/xfs/xfs_icache.c | 2 +-
include/linux/dcache.h | 2 ++
include/linux/fs.h | 1 +
4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b280e07e162b..0d07fb335b78 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry)
if (dentry->d_op->d_delete(dentry))
return false;
}
+
+ if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
+ return false;
+
/* retain; LRU fodder */
dentry->d_lockref.count--;
if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
@@ -656,6 +660,21 @@ static inline bool retain_dentry(struct dentry *dentry)
return true;
}

+void d_mark_dontcache(struct inode *inode)
+{
+ struct dentry *de;
+
+ spin_lock(&inode->i_lock);
+ hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
+ spin_lock(&de->d_lock);
+ de->d_flags |= DCACHE_DONTCACHE;
+ spin_unlock(&de->d_lock);
+ }
+ inode->i_state |= I_DONTCACHE;
+ spin_unlock(&inode->i_lock);
+}
+EXPORT_SYMBOL(d_mark_dontcache);
+
/*
* Finish off a dentry we've decided to kill.
* dentry->d_lock must be held, returns with it unlocked.
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index de76f7f60695..888646d74d7d 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -559,7 +559,7 @@ xfs_iget_cache_miss(
*/
iflags = XFS_INEW;
if (flags & XFS_IGET_DONTCACHE)
- VFS_I(ip)->i_state |= I_DONTCACHE;
+ d_mark_dontcache(VFS_I(ip));
ip->i_udquot = NULL;
ip->i_gdquot = NULL;
ip->i_pdquot = NULL;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c1488cc84fd9..a81f0c3cf352 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -177,6 +177,8 @@ struct dentry_operations {

#define DCACHE_REFERENCED 0x00000040 /* Recently used, don't discard. */

+#define DCACHE_DONTCACHE 0x00000080 /* Purge from memory on final dput() */
+
#define DCACHE_CANT_MOUNT 0x00000100
#define DCACHE_GENOCIDE 0x00000200
#define DCACHE_SHRINK_LIST 0x00000400
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 44bd45af760f..7c3e8c0306e0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3055,6 +3055,7 @@ static inline int generic_drop_inode(struct inode *inode)
return !inode->i_nlink || inode_unhashed(inode) ||
(inode->i_state & I_DONTCACHE);
}
+extern void d_mark_dontcache(struct inode *inode);

extern struct inode *ilookup5_nowait(struct super_block *sb,
unsigned long hashval, int (*test)(struct inode *, void *),
--
2.25.1

2020-04-28 00:22:47

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

From: Ira Weiny <[email protected]>

Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed
setting of S_DAX, data invalidation no longer needs to happen when
FS_XFLAG_DAX is changed.

Change xfs_ioctl_setattr_dax_invalidate() to be
xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new
functionality.

Furthermore, we no longer need the locking so we remove the join_flags
logic.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V10:
adjust for renamed d_mark_dontcache() function

Changes from V9:
Change name of function to xfs_ioctl_setattr_prepare_dax()

Changes from V8:
Change name of function to xfs_ioctl_dax_check_set_cache()
Update commit message
Fix bit manipulations

Changes from V7:
Use new flag_inode_dontcache()
Skip don't cache if mount over ride is active.

Changes from v6:
Fix completely broken implementation and update commit message.
Use the new VFS layer I_DONTCACHE to facilitate inode eviction
and S_DAX changing on drop_caches

Changes from v5:
New patch
---
fs/xfs/xfs_ioctl.c | 108 +++++++++------------------------------------
1 file changed, 20 insertions(+), 88 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 104495ac187c..ff474f2c9acf 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1245,64 +1245,26 @@ xfs_ioctl_setattr_xflags(
return 0;
}

-/*
- * If we are changing DAX flags, we have to ensure the file is clean and any
- * cached objects in the address space are invalidated and removed. This
- * requires us to lock out other IO and page faults similar to a truncate
- * operation. The locks need to be held until the transaction has been committed
- * so that the cache invalidation is atomic with respect to the DAX flag
- * manipulation.
- */
-static int
-xfs_ioctl_setattr_dax_invalidate(
+static void
+xfs_ioctl_setattr_prepare_dax(
struct xfs_inode *ip,
- struct fsxattr *fa,
- int *join_flags)
+ struct fsxattr *fa)
{
- struct inode *inode = VFS_I(ip);
- struct super_block *sb = inode->i_sb;
- int error;
-
- *join_flags = 0;
-
- /*
- * It is only valid to set the DAX flag on regular files and
- * directories on filesystems where the block size is equal to the page
- * size. On directories it serves as an inherited hint so we don't
- * have to check the device for dax support or flush pagecache.
- */
- if (fa->fsx_xflags & FS_XFLAG_DAX) {
- struct xfs_buftarg *target = xfs_inode_buftarg(ip);
-
- if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
- return -EINVAL;
- }
-
- /* If the DAX state is not changing, we have nothing to do here. */
- if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
- return 0;
- if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
- return 0;
+ struct xfs_mount *mp = ip->i_mount;
+ struct inode *inode = VFS_I(ip);

if (S_ISDIR(inode->i_mode))
- return 0;
-
- /* lock, flush and invalidate mapping in preparation for flag change */
- xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
- error = filemap_write_and_wait(inode->i_mapping);
- if (error)
- goto out_unlock;
- error = invalidate_inode_pages2(inode->i_mapping);
- if (error)
- goto out_unlock;
-
- *join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
- return 0;
+ return;

-out_unlock:
- xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
- return error;
+ if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
+ (mp->m_flags & XFS_MOUNT_DAX_NEVER))
+ return;

+ if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
+ !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
+ (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
+ (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
+ d_mark_dontcache(inode);
}

/*
@@ -1310,17 +1272,10 @@ xfs_ioctl_setattr_dax_invalidate(
* have permission to do so. On success, return a clean transaction and the
* inode locked exclusively ready for further operation specific checks. On
* failure, return an error without modifying or locking the inode.
- *
- * The inode might already be IO locked on call. If this is the case, it is
- * indicated in @join_flags and we take full responsibility for ensuring they
- * are unlocked from now on. Hence if we have an error here, we still have to
- * unlock them. Otherwise, once they are joined to the transaction, they will
- * be unlocked on commit/cancel.
*/
static struct xfs_trans *
xfs_ioctl_setattr_get_trans(
- struct xfs_inode *ip,
- int join_flags)
+ struct xfs_inode *ip)
{
struct xfs_mount *mp = ip->i_mount;
struct xfs_trans *tp;
@@ -1337,8 +1292,7 @@ xfs_ioctl_setattr_get_trans(
goto out_unlock;

xfs_ilock(ip, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
- join_flags = 0;
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);

/*
* CAP_FOWNER overrides the following restrictions:
@@ -1359,8 +1313,6 @@ xfs_ioctl_setattr_get_trans(
out_cancel:
xfs_trans_cancel(tp);
out_unlock:
- if (join_flags)
- xfs_iunlock(ip, join_flags);
return ERR_PTR(error);
}

@@ -1486,7 +1438,6 @@ xfs_ioctl_setattr(
struct xfs_dquot *pdqp = NULL;
struct xfs_dquot *olddquot = NULL;
int code;
- int join_flags = 0;

trace_xfs_ioctl_setattr(ip);

@@ -1510,18 +1461,9 @@ xfs_ioctl_setattr(
return code;
}

- /*
- * Changing DAX config may require inode locking for mapping
- * invalidation. These need to be held all the way to transaction commit
- * or cancel time, so need to be passed through to
- * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
- * appropriately.
- */
- code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
- if (code)
- goto error_free_dquots;
+ xfs_ioctl_setattr_prepare_dax(ip, fa);

- tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+ tp = xfs_ioctl_setattr_get_trans(ip);
if (IS_ERR(tp)) {
code = PTR_ERR(tp);
goto error_free_dquots;
@@ -1651,7 +1593,6 @@ xfs_ioc_setxflags(
struct fsxattr fa;
struct fsxattr old_fa;
unsigned int flags;
- int join_flags = 0;
int error;

if (copy_from_user(&flags, arg, sizeof(flags)))
@@ -1668,18 +1609,9 @@ xfs_ioc_setxflags(
if (error)
return error;

- /*
- * Changing DAX config may require inode locking for mapping
- * invalidation. These need to be held all the way to transaction commit
- * or cancel time, so need to be passed through to
- * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
- * appropriately.
- */
- error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
- if (error)
- goto out_drop_write;
+ xfs_ioctl_setattr_prepare_dax(ip, &fa);

- tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+ tp = xfs_ioctl_setattr_get_trans(ip);
if (IS_ERR(tp)) {
error = PTR_ERR(tp);
goto out_drop_write;
--
2.25.1

2020-04-28 00:23:01

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V11 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer

From: Ira Weiny <[email protected]>

DAX effective mode (S_DAX) changes requires inode eviction.

XFS has an advisory flag (XFS_IDONTCACHE) to prevent caching of the
inode if no other additional references are taken. We lift this flag to
the VFS layer and change the behavior slightly by allowing the flag to
remain even if multiple references are taken.

This will expedite the eviction of inodes to change S_DAX.

Cc: Al Viro <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V9:
Fix misspelling in commit subject
move XFS_IEOFBLOCKS to '9'

Changes from V8:
Remove XFS_IDONTCACHE
---
fs/xfs/xfs_icache.c | 4 ++--
fs/xfs/xfs_inode.h | 3 +--
fs/xfs/xfs_super.c | 2 +-
include/linux/fs.h | 6 +++++-
4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 17a0b86fe701..de76f7f60695 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -477,7 +477,7 @@ xfs_iget_cache_hit(
xfs_ilock(ip, lock_flags);

if (!(flags & XFS_IGET_INCORE))
- xfs_iflags_clear(ip, XFS_ISTALE | XFS_IDONTCACHE);
+ xfs_iflags_clear(ip, XFS_ISTALE);
XFS_STATS_INC(mp, xs_ig_found);

return 0;
@@ -559,7 +559,7 @@ xfs_iget_cache_miss(
*/
iflags = XFS_INEW;
if (flags & XFS_IGET_DONTCACHE)
- iflags |= XFS_IDONTCACHE;
+ VFS_I(ip)->i_state |= I_DONTCACHE;
ip->i_udquot = NULL;
ip->i_gdquot = NULL;
ip->i_pdquot = NULL;
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 83073c883fbf..d8ce3eaa246e 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -218,8 +218,7 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip)
#define XFS_IFLOCK (1 << __XFS_IFLOCK_BIT)
#define __XFS_IPINNED_BIT 8 /* wakeup key for zero pin count */
#define XFS_IPINNED (1 << __XFS_IPINNED_BIT)
-#define XFS_IDONTCACHE (1 << 9) /* don't cache the inode long term */
-#define XFS_IEOFBLOCKS (1 << 10)/* has the preallocblocks tag set */
+#define XFS_IEOFBLOCKS (1 << 9) /* has the preallocblocks tag set */
/*
* If this unlinked inode is in the middle of recovery, don't let drop_inode
* truncate and free the inode. This can happen if we iget the inode during
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index e80bd2c4c279..6f91c13fb6ea 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -737,7 +737,7 @@ xfs_fs_drop_inode(
return 0;
}

- return generic_drop_inode(inode) || (ip->i_flags & XFS_IDONTCACHE);
+ return generic_drop_inode(inode);
}

static void
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a87cc5845a02..44bd45af760f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2156,6 +2156,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
*
* I_CREATING New object's inode in the middle of setting up.
*
+ * I_DONTCACHE Evict inode as soon as it is not used anymore.
+ *
* Q: What is the difference between I_WILL_FREE and I_FREEING?
*/
#define I_DIRTY_SYNC (1 << 0)
@@ -2178,6 +2180,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
#define I_WB_SWITCH (1 << 13)
#define I_OVL_INUSE (1 << 14)
#define I_CREATING (1 << 15)
+#define I_DONTCACHE (1 << 16)

#define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
#define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
@@ -3049,7 +3052,8 @@ extern int inode_needs_sync(struct inode *inode);
extern int generic_delete_inode(struct inode *inode);
static inline int generic_drop_inode(struct inode *inode)
{
- return !inode->i_nlink || inode_unhashed(inode);
+ return !inode->i_nlink || inode_unhashed(inode) ||
+ (inode->i_state & I_DONTCACHE);
}

extern struct inode *ilookup5_nowait(struct super_block *sb,
--
2.25.1

2020-04-28 00:23:17

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V11 06/11] fs/xfs: Make DAX mount option a tri-state

From: Ira Weiny <[email protected]>

As agreed upon[1]. We make the dax mount option a tri-state. '-o dax'
continues to operate the same. We add 'always', 'never', and 'inode'
(default).

[1] https://lore.kernel.org/lkml/[email protected]/

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V10:
Move show options to xfs_info_set array

Changes from V9:
Fix indentation in xfs_mount_set_dax_mode()
Do not report dax=inode

Changes from v8:
Move NEVER bit to 27
Use xfs signature style
use xfs_dax_mode enum

Changes from v7:
Change to XFS_MOUNT_DAX_NEVER

Changes from v6:
Use 2 flag bits rather than a field.
change iflag to inode

Changes from v5:
New Patch
---
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_super.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index f6123fb0113c..37bfb50db809 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -238,6 +238,7 @@ typedef struct xfs_mount {
allocator */
#define XFS_MOUNT_NOATTR2 (1ULL << 25) /* disable use of attr2 format */
#define XFS_MOUNT_DAX_ALWAYS (1ULL << 26)
+#define XFS_MOUNT_DAX_NEVER (1ULL << 27)

/*
* Max and min values for mount-option defined I/O
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index ce169d1c7474..e80bd2c4c279 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -47,6 +47,39 @@ static struct kset *xfs_kset; /* top-level xfs sysfs dir */
static struct xfs_kobj xfs_dbg_kobj; /* global debug sysfs attrs */
#endif

+enum xfs_dax_mode {
+ XFS_DAX_INODE = 0,
+ XFS_DAX_ALWAYS = 1,
+ XFS_DAX_NEVER = 2,
+};
+
+static void
+xfs_mount_set_dax_mode(
+ struct xfs_mount *mp,
+ enum xfs_dax_mode mode)
+{
+ switch (mode) {
+ case XFS_DAX_INODE:
+ mp->m_flags &= ~(XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER);
+ break;
+ case XFS_DAX_ALWAYS:
+ mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
+ mp->m_flags &= ~XFS_MOUNT_DAX_NEVER;
+ break;
+ case XFS_DAX_NEVER:
+ mp->m_flags |= XFS_MOUNT_DAX_NEVER;
+ mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
+ break;
+ }
+}
+
+static const struct constant_table dax_param_enums[] = {
+ {"inode", XFS_DAX_INODE },
+ {"always", XFS_DAX_ALWAYS },
+ {"never", XFS_DAX_NEVER },
+ {}
+};
+
/*
* Table driven mount option parser.
*/
@@ -59,7 +92,7 @@ enum {
Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
- Opt_discard, Opt_nodiscard, Opt_dax,
+ Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
};

static const struct fs_parameter_spec xfs_fs_parameters[] = {
@@ -103,6 +136,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
fsparam_flag("discard", Opt_discard),
fsparam_flag("nodiscard", Opt_nodiscard),
fsparam_flag("dax", Opt_dax),
+ fsparam_enum("dax", Opt_dax_enum, dax_param_enums),
{}
};

@@ -129,7 +163,8 @@ xfs_fs_show_options(
{ XFS_MOUNT_GRPID, ",grpid" },
{ XFS_MOUNT_DISCARD, ",discard" },
{ XFS_MOUNT_LARGEIO, ",largeio" },
- { XFS_MOUNT_DAX_ALWAYS, ",dax" },
+ { XFS_MOUNT_DAX_ALWAYS, ",dax=always" },
+ { XFS_MOUNT_DAX_NEVER, ",dax=never" },
{ 0, NULL }
};
struct xfs_mount *mp = XFS_M(root->d_sb);
@@ -1261,7 +1296,10 @@ xfs_fc_parse_param(
return 0;
#ifdef CONFIG_FS_DAX
case Opt_dax:
- mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
+ xfs_mount_set_dax_mode(mp, XFS_DAX_ALWAYS);
+ return 0;
+ case Opt_dax_enum:
+ xfs_mount_set_dax_mode(mp, result.uint_32);
return 0;
#endif
default:
@@ -1468,7 +1506,7 @@ xfs_fc_fill_super(
if (!rtdev_is_dax && !datadev_is_dax) {
xfs_alert(mp,
"DAX unsupported by block device. Turning off DAX.");
- mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
+ xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
}
if (xfs_sb_version_hasreflink(&mp->m_sb)) {
xfs_alert(mp,
--
2.25.1

2020-04-28 00:23:21

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V11 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()

From: Ira Weiny <[email protected]>

The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are
nearly identical. The only difference is that *_to_linux() is called after
inode setup and disallows changing the DAX flag.

Combining them can be done with a flag which indicates if this is the initial
setup to allow the DAX flag to be properly set only at init time.

So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
directly.

While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
use xfs_ip2xflags() to ensure future diflags are included correctly.

Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V7:
Clarify with a comment the reason for leaving S_DAX out of the
mask

Changes from V6:
Move unrelated hunk to previous patch.
Change logic for better code generation.

Changes from V5:
The functions are no longer identical so we can only combine
them rather than deleting one completely. This is reflected in
the new init parameter.
---
fs/xfs/xfs_inode.h | 1 +
fs/xfs/xfs_ioctl.c | 33 +--------------------------------
fs/xfs/xfs_iops.c | 46 +++++++++++++++++++++++++++-------------------
3 files changed, 29 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index c6a63f6764a6..83073c883fbf 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -467,6 +467,7 @@ int xfs_break_layouts(struct inode *inode, uint *iolock,
/* from xfs_iops.c */
extern void xfs_setup_inode(struct xfs_inode *ip);
extern void xfs_setup_iops(struct xfs_inode *ip);
+extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init);

/*
* When setting up a newly allocated inode, we need to call
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 309958186d33..104495ac187c 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1201,37 +1201,6 @@ xfs_flags2diflags2(
return di_flags2;
}

-STATIC void
-xfs_diflags_to_linux(
- struct xfs_inode *ip)
-{
- struct inode *inode = VFS_I(ip);
- unsigned int xflags = xfs_ip2xflags(ip);
-
- if (xflags & FS_XFLAG_IMMUTABLE)
- inode->i_flags |= S_IMMUTABLE;
- else
- inode->i_flags &= ~S_IMMUTABLE;
- if (xflags & FS_XFLAG_APPEND)
- inode->i_flags |= S_APPEND;
- else
- inode->i_flags &= ~S_APPEND;
- if (xflags & FS_XFLAG_SYNC)
- inode->i_flags |= S_SYNC;
- else
- inode->i_flags &= ~S_SYNC;
- if (xflags & FS_XFLAG_NOATIME)
- inode->i_flags |= S_NOATIME;
- else
- inode->i_flags &= ~S_NOATIME;
-#if 0 /* disabled until the flag switching races are sorted out */
- if (xflags & FS_XFLAG_DAX)
- inode->i_flags |= S_DAX;
- else
- inode->i_flags &= ~S_DAX;
-#endif
-}
-
static int
xfs_ioctl_setattr_xflags(
struct xfs_trans *tp,
@@ -1269,7 +1238,7 @@ xfs_ioctl_setattr_xflags(
ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags);
ip->i_d.di_flags2 = di_flags2;

- xfs_diflags_to_linux(ip);
+ xfs_diflags_to_iflags(ip, false);
xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
XFS_STATS_INC(mp, xs_ig_attrchg);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1814f10e43d3..b70b735fe4a4 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1276,26 +1276,34 @@ xfs_inode_should_enable_dax(
return false;
}

-STATIC void
+void
xfs_diflags_to_iflags(
- struct inode *inode,
- struct xfs_inode *ip)
+ struct xfs_inode *ip,
+ bool init)
{
- uint16_t flags = ip->i_d.di_flags;
-
- inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
- S_NOATIME | S_DAX);
-
- if (flags & XFS_DIFLAG_IMMUTABLE)
- inode->i_flags |= S_IMMUTABLE;
- if (flags & XFS_DIFLAG_APPEND)
- inode->i_flags |= S_APPEND;
- if (flags & XFS_DIFLAG_SYNC)
- inode->i_flags |= S_SYNC;
- if (flags & XFS_DIFLAG_NOATIME)
- inode->i_flags |= S_NOATIME;
- if (xfs_inode_should_enable_dax(ip))
- inode->i_flags |= S_DAX;
+ struct inode *inode = VFS_I(ip);
+ unsigned int xflags = xfs_ip2xflags(ip);
+ unsigned int flags = 0;
+
+ ASSERT(!(IS_DAX(inode) && init));
+
+ if (xflags & FS_XFLAG_IMMUTABLE)
+ flags |= S_IMMUTABLE;
+ if (xflags & FS_XFLAG_APPEND)
+ flags |= S_APPEND;
+ if (xflags & FS_XFLAG_SYNC)
+ flags |= S_SYNC;
+ if (xflags & FS_XFLAG_NOATIME)
+ flags |= S_NOATIME;
+ if (init && xfs_inode_should_enable_dax(ip))
+ flags |= S_DAX;
+
+ /*
+ * S_DAX can only be set during inode initialization and is never set by
+ * the VFS, so we cannot mask off S_DAX in i_flags.
+ */
+ inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME);
+ inode->i_flags |= flags;
}

/*
@@ -1321,7 +1329,7 @@ xfs_setup_inode(
inode_fake_hash(inode);

i_size_write(inode, ip->i_d.di_size);
- xfs_diflags_to_iflags(inode, ip);
+ xfs_diflags_to_iflags(ip, true);

if (S_ISDIR(inode->i_mode)) {
/*
--
2.25.1

2020-04-28 00:23:29

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V11 04/11] Documentation/dax: Update Usage section

From: Ira Weiny <[email protected]>

Update the Usage section to reflect the new individual dax selection
functionality.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V10:
Clarifications from Dave
Add '-c' to xfs_io examples

Changes from V9:
Fix missing ')'
Fix trialing '"'

Changes from V8:
Updates from Darrick

Changes from V7:
Cleanups/clarifications from Darrick and Dan

Changes from V6:
Update to allow setting FS_XFLAG_DAX any time.
Update with list of behaviors from Darrick
https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/

Changes from V5:
Update to reflect the agreed upon semantics
https://lore.kernel.org/lkml/[email protected]/
---
Documentation/filesystems/dax.txt | 139 +++++++++++++++++++++++++++++-
1 file changed, 136 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 679729442fd2..409e4e83e46a 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -17,11 +17,144 @@ For file mappings, the storage device is mapped directly into userspace.
Usage
-----

-If you have a block device which supports DAX, you can make a filesystem
+If you have a block device which supports DAX, you can make a file system
on it as usual. The DAX code currently only supports files with a block
size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem. When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the file system.
+
+Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them
+is different.
+
+Enabling DAX on ext4 and ext2
+-----------------------------
+
+When mounting the filesystem, use the "-o dax" option on the command line or
+add 'dax' to the options in /etc/fstab. This works to enable DAX on all files
+within the filesystem. It is equivalent to the '-o dax=always' behavior below.
+
+
+Enabling DAX on xfs
+-------------------
+
+Summary
+-------
+
+ 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
+ the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details
+ about this access mode.
+
+ 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
+ files and directories. This advisory flag can be set or cleared at any
+ time, but doing so does not immediately affect the S_DAX state.
+
+ 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
+ be inherited by all regular files and sub directories that are subsequently
+ created in this directory. Files and subdirectories that exist at the time
+ this flag is set or cleared on the parent directory are not modified by
+ this modification of the parent directory.
+
+ 4. There exists dax mount options which can override FS_XFLAG_DAX in the
+ setting of the S_DAX flag. Given underlying storage which supports DAX the
+ following hold.
+
+ "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.
+
+ "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."
+
+ "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
+
+ "-o dax" is a legacy option which is an alias for "dax=always".
+ This may be removed in the future so "-o dax=always" is
+ the preferred method for specifying this behavior.
+
+ NOTE: Setting and inheritance affect FS_XFLAG_DAX at all times even when
+ the file system is mounted with a dax option. However, in-core inode state
+ (S_DAX) will be overridden until the file system is remounted with
+ dax=inode and the inode is evicted from kernel memory.
+
+ 5. The DAX policy can be changed via:
+
+ a) Set the parent directory FS_XFLAG_DAX as needed before files are created
+
+ b) Set the appropriate dax="foo" mount option
+
+ c) Change the FS_XFLAG_DAX on existing regular files and directories. This
+ has runtime constraints and limitations that are described in 6) below.
+
+ 6. When changing the DAX policy via toggling the persistent FS_XFLAG_DAX flag,
+ the change in behaviour for existing regular files may not occur
+ immediately. If the change must take effect immediately, the administrator
+ needs to:
+
+ a) stop the application so there are no active references to the data set
+ the policy change will affect
+
+ b) evict the data set from kernel caches so it will be re-instantiated when
+ the application is restarted. This can be acheived by:
+
+ i. drop-caches
+ ii. a filesystem unmount and mount cycle
+ iii. a system reboot
+
+
+Details
+-------
+
+There are 2 per-file dax flags. One is a persistent inode setting (FS_XFLAG_DAX)
+and the other is a volatile flag indicating the active state of the feature
+(S_DAX).
+
+FS_XFLAG_DAX is preserved within the file system. This persistent config
+setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
+(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
+
+New files and directories automatically inherit FS_XFLAG_DAX from
+their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at
+directory creation time can be used to set a default behavior for an entire
+sub-tree.
+
+To clarify inheritance here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+ dax: a,e
+ no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io -c 'chattr +x' a
+mkdir -p a/b/c/d
+
+ dax: a,b,c,d
+ no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' c
+mkdir a/b/c/d
+
+ dax: c,d
+ no dax: a,b
+
+
+The current enabled state (S_DAX) is set when a file inode is instantiated in
+memory by the kernel. It is set based on the underlying media support, the
+value of FS_XFLAG_DAX and the file systems dax mount option setting.
+
+statx can be used to query S_DAX. NOTE that a directory will never have S_DAX
+set and therefore statx will never indicate that S_DAX is set on directories.
+
+Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if
+the underlying media does not support dax and/or the file system is overridden
+with a mount option.
+


Implementation Tips for Block Driver Writers
--
2.25.1

2020-04-28 00:23:59

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V11 05/11] fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS

From: Ira Weiny <[email protected]>

In prep for the new tri-state mount option which then introduces
XFS_MOUNT_DAX_NEVER.

Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from v8
Move bit to 26
---
fs/xfs/xfs_iops.c | 2 +-
fs/xfs/xfs_mount.h | 3 +--
fs/xfs/xfs_super.c | 8 ++++----
3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index f7a99b3bbcf7..462f89af479a 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1248,7 +1248,7 @@ xfs_inode_supports_dax(
return false;

/* DAX mount option or DAX iflag must be set. */
- if (!(mp->m_flags & XFS_MOUNT_DAX) &&
+ if (!(mp->m_flags & XFS_MOUNT_DAX_ALWAYS) &&
!(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
return false;

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index b2e4598fdf7d..f6123fb0113c 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -237,8 +237,7 @@ typedef struct xfs_mount {
#define XFS_MOUNT_FILESTREAMS (1ULL << 24) /* enable the filestreams
allocator */
#define XFS_MOUNT_NOATTR2 (1ULL << 25) /* disable use of attr2 format */
-
-#define XFS_MOUNT_DAX (1ULL << 62) /* TEST ONLY! */
+#define XFS_MOUNT_DAX_ALWAYS (1ULL << 26)

/*
* Max and min values for mount-option defined I/O
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 424bb9a2d532..ce169d1c7474 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -129,7 +129,7 @@ xfs_fs_show_options(
{ XFS_MOUNT_GRPID, ",grpid" },
{ XFS_MOUNT_DISCARD, ",discard" },
{ XFS_MOUNT_LARGEIO, ",largeio" },
- { XFS_MOUNT_DAX, ",dax" },
+ { XFS_MOUNT_DAX_ALWAYS, ",dax" },
{ 0, NULL }
};
struct xfs_mount *mp = XFS_M(root->d_sb);
@@ -1261,7 +1261,7 @@ xfs_fc_parse_param(
return 0;
#ifdef CONFIG_FS_DAX
case Opt_dax:
- mp->m_flags |= XFS_MOUNT_DAX;
+ mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
return 0;
#endif
default:
@@ -1454,7 +1454,7 @@ xfs_fc_fill_super(
if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
sb->s_flags |= SB_I_VERSION;

- if (mp->m_flags & XFS_MOUNT_DAX) {
+ if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS) {
bool rtdev_is_dax = false, datadev_is_dax;

xfs_warn(mp,
@@ -1468,7 +1468,7 @@ xfs_fc_fill_super(
if (!rtdev_is_dax && !datadev_is_dax) {
xfs_alert(mp,
"DAX unsupported by block device. Turning off DAX.");
- mp->m_flags &= ~XFS_MOUNT_DAX;
+ mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
}
if (xfs_sb_version_hasreflink(&mp->m_sb)) {
xfs_alert(mp,
--
2.25.1

2020-04-28 00:24:19

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V11 02/11] fs: Remove unneeded IS_DAX() check in io_is_direct()

From: Ira Weiny <[email protected]>

Remove the check because DAX now has it's own read/write methods and
file systems which support DAX check IS_DAX() prior to IOCB_DIRECT on
their own. Therefore, it does not matter if the file state is DAX when
the iocb flags are created.

Also remove io_is_direct() as it is just a simple flag check.

Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from v8:
Rebase to latest Linus tree

Changes from v6:
remove io_is_direct() as well.
Remove Reviews since this is quite a bit different.

Changes from v3:
Reword commit message.
Reordered to be a 'pre-cleanup' patch
---
drivers/block/loop.c | 6 +++---
include/linux/fs.h | 7 +------
2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index da693e6a834e..14372df0f354 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -634,8 +634,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)

static inline void loop_update_dio(struct loop_device *lo)
{
- __loop_update_dio(lo, io_is_direct(lo->lo_backing_file) |
- lo->use_dio);
+ __loop_update_dio(lo, (lo->lo_backing_file->f_flags & O_DIRECT) |
+ lo->use_dio);
}

static void loop_reread_partitions(struct loop_device *lo,
@@ -1028,7 +1028,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
blk_queue_write_cache(lo->lo_queue, true, false);

- if (io_is_direct(lo->lo_backing_file) && inode->i_sb->s_bdev) {
+ if ((lo->lo_backing_file->f_flags & O_DIRECT) && inode->i_sb->s_bdev) {
/* In case of direct I/O, match underlying block size */
unsigned short bsize = bdev_logical_block_size(
inode->i_sb->s_bdev);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4f6f59b4f22a..a87cc5845a02 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3394,11 +3394,6 @@ extern void setattr_copy(struct inode *inode, const struct iattr *attr);

extern int file_update_time(struct file *file);

-static inline bool io_is_direct(struct file *filp)
-{
- return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
-}
-
static inline bool vma_is_dax(const struct vm_area_struct *vma)
{
return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
@@ -3423,7 +3418,7 @@ static inline int iocb_flags(struct file *file)
int res = 0;
if (file->f_flags & O_APPEND)
res |= IOCB_APPEND;
- if (io_is_direct(file))
+ if (file->f_flags & O_DIRECT)
res |= IOCB_DIRECT;
if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
res |= IOCB_DSYNC;
--
2.25.1

2020-04-28 00:24:27

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V11 01/11] fs/xfs: Remove unnecessary initialization of i_rwsem

From: Ira Weiny <[email protected]>

An earlier call of xfs_reinit_inode() from xfs_iget_cache_hit() already
handles initialization of i_rwsem.

Doing so again is unneeded.

Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V4:
Update commit message to make it clear the xfs_iget_cache_hit()
is actually doing the initialization via xfs_reinit_inode()

New for V4:

NOTE: This was found while ensuring the new i_aops_sem was properly
handled. It seems like this is a layering violation so I think it is
worth cleaning up so as to not confuse others.
---
fs/xfs/xfs_icache.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 8bf1d15be3f6..17a0b86fe701 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -423,6 +423,7 @@ xfs_iget_cache_hit(
spin_unlock(&ip->i_flags_lock);
rcu_read_unlock();

+ ASSERT(!rwsem_is_locked(&inode->i_rwsem));
error = xfs_reinit_inode(mp, inode);
if (error) {
bool wake;
@@ -456,9 +457,6 @@ xfs_iget_cache_hit(
ip->i_sick = 0;
ip->i_checked = 0;

- ASSERT(!rwsem_is_locked(&inode->i_rwsem));
- init_rwsem(&inode->i_rwsem);
-
spin_unlock(&ip->i_flags_lock);
spin_unlock(&pag->pag_ici_lock);
} else {
--
2.25.1

2020-04-28 13:05:39

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH V11 10/11] fs: Introduce DCACHE_DONTCACHE

On Mon 27-04-20 17:21:41, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> DCACHE_DONTCACHE indicates a dentry should not be cached on final
> dput().
>
> Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> pointing to a specific inode when that inode is being set I_DONTCACHE.
>
> This facilitates dropping dentry references to inodes sooner which
> require eviction to swap S_DAX mode.
>
> Cc: Al Viro <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

>
> ---
> Changes from V10:
> rename to d_mark_dontcache()
> Move function to fs/dcache.c
>
> Changes from V9:
> modify i_state under i_lock
> Update comment
> "Purge from memory on final dput()"
>
> Changes from V8:
> Update commit message
> Use mark_inode_dontcache in XFS
> Fix locking... can't use rcu here.
> Change name to mark_inode_dontcache
> ---
> fs/dcache.c | 19 +++++++++++++++++++
> fs/xfs/xfs_icache.c | 2 +-
> include/linux/dcache.h | 2 ++
> include/linux/fs.h | 1 +
> 4 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index b280e07e162b..0d07fb335b78 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry)
> if (dentry->d_op->d_delete(dentry))
> return false;
> }
> +
> + if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
> + return false;
> +
> /* retain; LRU fodder */
> dentry->d_lockref.count--;
> if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
> @@ -656,6 +660,21 @@ static inline bool retain_dentry(struct dentry *dentry)
> return true;
> }
>
> +void d_mark_dontcache(struct inode *inode)
> +{
> + struct dentry *de;
> +
> + spin_lock(&inode->i_lock);
> + hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
> + spin_lock(&de->d_lock);
> + de->d_flags |= DCACHE_DONTCACHE;
> + spin_unlock(&de->d_lock);
> + }
> + inode->i_state |= I_DONTCACHE;
> + spin_unlock(&inode->i_lock);
> +}
> +EXPORT_SYMBOL(d_mark_dontcache);
> +
> /*
> * Finish off a dentry we've decided to kill.
> * dentry->d_lock must be held, returns with it unlocked.
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index de76f7f60695..888646d74d7d 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -559,7 +559,7 @@ xfs_iget_cache_miss(
> */
> iflags = XFS_INEW;
> if (flags & XFS_IGET_DONTCACHE)
> - VFS_I(ip)->i_state |= I_DONTCACHE;
> + d_mark_dontcache(VFS_I(ip));
> ip->i_udquot = NULL;
> ip->i_gdquot = NULL;
> ip->i_pdquot = NULL;
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index c1488cc84fd9..a81f0c3cf352 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -177,6 +177,8 @@ struct dentry_operations {
>
> #define DCACHE_REFERENCED 0x00000040 /* Recently used, don't discard. */
>
> +#define DCACHE_DONTCACHE 0x00000080 /* Purge from memory on final dput() */
> +
> #define DCACHE_CANT_MOUNT 0x00000100
> #define DCACHE_GENOCIDE 0x00000200
> #define DCACHE_SHRINK_LIST 0x00000400
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 44bd45af760f..7c3e8c0306e0 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3055,6 +3055,7 @@ static inline int generic_drop_inode(struct inode *inode)
> return !inode->i_nlink || inode_unhashed(inode) ||
> (inode->i_state & I_DONTCACHE);
> }
> +extern void d_mark_dontcache(struct inode *inode);
>
> extern struct inode *ilookup5_nowait(struct super_block *sb,
> unsigned long hashval, int (*test)(struct inode *, void *),
> --
> 2.25.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-04-28 20:07:58

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH V11 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer

On Mon, Apr 27, 2020 at 05:21:40PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> DAX effective mode (S_DAX) changes requires inode eviction.
>
> XFS has an advisory flag (XFS_IDONTCACHE) to prevent caching of the
> inode if no other additional references are taken. We lift this flag to
> the VFS layer and change the behavior slightly by allowing the flag to
> remain even if multiple references are taken.
>
> This will expedite the eviction of inodes to change S_DAX.
>
> Cc: Al Viro <[email protected]>
> Reviewed-by: Dave Chinner <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>

Looks ok to me,
Reviewed-by: Darrick J. Wong <[email protected]>

--D

>
> ---
> Changes from V9:
> Fix misspelling in commit subject
> move XFS_IEOFBLOCKS to '9'
>
> Changes from V8:
> Remove XFS_IDONTCACHE
> ---
> fs/xfs/xfs_icache.c | 4 ++--
> fs/xfs/xfs_inode.h | 3 +--
> fs/xfs/xfs_super.c | 2 +-
> include/linux/fs.h | 6 +++++-
> 4 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 17a0b86fe701..de76f7f60695 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -477,7 +477,7 @@ xfs_iget_cache_hit(
> xfs_ilock(ip, lock_flags);
>
> if (!(flags & XFS_IGET_INCORE))
> - xfs_iflags_clear(ip, XFS_ISTALE | XFS_IDONTCACHE);
> + xfs_iflags_clear(ip, XFS_ISTALE);
> XFS_STATS_INC(mp, xs_ig_found);
>
> return 0;
> @@ -559,7 +559,7 @@ xfs_iget_cache_miss(
> */
> iflags = XFS_INEW;
> if (flags & XFS_IGET_DONTCACHE)
> - iflags |= XFS_IDONTCACHE;
> + VFS_I(ip)->i_state |= I_DONTCACHE;
> ip->i_udquot = NULL;
> ip->i_gdquot = NULL;
> ip->i_pdquot = NULL;
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 83073c883fbf..d8ce3eaa246e 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -218,8 +218,7 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip)
> #define XFS_IFLOCK (1 << __XFS_IFLOCK_BIT)
> #define __XFS_IPINNED_BIT 8 /* wakeup key for zero pin count */
> #define XFS_IPINNED (1 << __XFS_IPINNED_BIT)
> -#define XFS_IDONTCACHE (1 << 9) /* don't cache the inode long term */
> -#define XFS_IEOFBLOCKS (1 << 10)/* has the preallocblocks tag set */
> +#define XFS_IEOFBLOCKS (1 << 9) /* has the preallocblocks tag set */
> /*
> * If this unlinked inode is in the middle of recovery, don't let drop_inode
> * truncate and free the inode. This can happen if we iget the inode during
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index e80bd2c4c279..6f91c13fb6ea 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -737,7 +737,7 @@ xfs_fs_drop_inode(
> return 0;
> }
>
> - return generic_drop_inode(inode) || (ip->i_flags & XFS_IDONTCACHE);
> + return generic_drop_inode(inode);
> }
>
> static void
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a87cc5845a02..44bd45af760f 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2156,6 +2156,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> *
> * I_CREATING New object's inode in the middle of setting up.
> *
> + * I_DONTCACHE Evict inode as soon as it is not used anymore.
> + *
> * Q: What is the difference between I_WILL_FREE and I_FREEING?
> */
> #define I_DIRTY_SYNC (1 << 0)
> @@ -2178,6 +2180,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> #define I_WB_SWITCH (1 << 13)
> #define I_OVL_INUSE (1 << 14)
> #define I_CREATING (1 << 15)
> +#define I_DONTCACHE (1 << 16)
>
> #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
> #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
> @@ -3049,7 +3052,8 @@ extern int inode_needs_sync(struct inode *inode);
> extern int generic_delete_inode(struct inode *inode);
> static inline int generic_drop_inode(struct inode *inode)
> {
> - return !inode->i_nlink || inode_unhashed(inode);
> + return !inode->i_nlink || inode_unhashed(inode) ||
> + (inode->i_state & I_DONTCACHE);
> }
>
> extern struct inode *ilookup5_nowait(struct super_block *sb,
> --
> 2.25.1
>

2020-04-28 20:09:58

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH V11 06/11] fs/xfs: Make DAX mount option a tri-state

On Mon, Apr 27, 2020 at 05:21:37PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> As agreed upon[1]. We make the dax mount option a tri-state. '-o dax'
> continues to operate the same. We add 'always', 'never', and 'inode'
> (default).
>
> [1] https://lore.kernel.org/lkml/[email protected]/
>
> Signed-off-by: Ira Weiny <[email protected]>

Looks good to me,
Reviewed-by: Darrick J. Wong <[email protected]>

--D

>
> ---
> Changes from V10:
> Move show options to xfs_info_set array
>
> Changes from V9:
> Fix indentation in xfs_mount_set_dax_mode()
> Do not report dax=inode
>
> Changes from v8:
> Move NEVER bit to 27
> Use xfs signature style
> use xfs_dax_mode enum
>
> Changes from v7:
> Change to XFS_MOUNT_DAX_NEVER
>
> Changes from v6:
> Use 2 flag bits rather than a field.
> change iflag to inode
>
> Changes from v5:
> New Patch
> ---
> fs/xfs/xfs_mount.h | 1 +
> fs/xfs/xfs_super.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 43 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index f6123fb0113c..37bfb50db809 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -238,6 +238,7 @@ typedef struct xfs_mount {
> allocator */
> #define XFS_MOUNT_NOATTR2 (1ULL << 25) /* disable use of attr2 format */
> #define XFS_MOUNT_DAX_ALWAYS (1ULL << 26)
> +#define XFS_MOUNT_DAX_NEVER (1ULL << 27)
>
> /*
> * Max and min values for mount-option defined I/O
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index ce169d1c7474..e80bd2c4c279 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -47,6 +47,39 @@ static struct kset *xfs_kset; /* top-level xfs sysfs dir */
> static struct xfs_kobj xfs_dbg_kobj; /* global debug sysfs attrs */
> #endif
>
> +enum xfs_dax_mode {
> + XFS_DAX_INODE = 0,
> + XFS_DAX_ALWAYS = 1,
> + XFS_DAX_NEVER = 2,
> +};
> +
> +static void
> +xfs_mount_set_dax_mode(
> + struct xfs_mount *mp,
> + enum xfs_dax_mode mode)
> +{
> + switch (mode) {
> + case XFS_DAX_INODE:
> + mp->m_flags &= ~(XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER);
> + break;
> + case XFS_DAX_ALWAYS:
> + mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
> + mp->m_flags &= ~XFS_MOUNT_DAX_NEVER;
> + break;
> + case XFS_DAX_NEVER:
> + mp->m_flags |= XFS_MOUNT_DAX_NEVER;
> + mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
> + break;
> + }
> +}
> +
> +static const struct constant_table dax_param_enums[] = {
> + {"inode", XFS_DAX_INODE },
> + {"always", XFS_DAX_ALWAYS },
> + {"never", XFS_DAX_NEVER },
> + {}
> +};
> +
> /*
> * Table driven mount option parser.
> */
> @@ -59,7 +92,7 @@ enum {
> Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
> Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
> Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
> - Opt_discard, Opt_nodiscard, Opt_dax,
> + Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
> };
>
> static const struct fs_parameter_spec xfs_fs_parameters[] = {
> @@ -103,6 +136,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
> fsparam_flag("discard", Opt_discard),
> fsparam_flag("nodiscard", Opt_nodiscard),
> fsparam_flag("dax", Opt_dax),
> + fsparam_enum("dax", Opt_dax_enum, dax_param_enums),
> {}
> };
>
> @@ -129,7 +163,8 @@ xfs_fs_show_options(
> { XFS_MOUNT_GRPID, ",grpid" },
> { XFS_MOUNT_DISCARD, ",discard" },
> { XFS_MOUNT_LARGEIO, ",largeio" },
> - { XFS_MOUNT_DAX_ALWAYS, ",dax" },
> + { XFS_MOUNT_DAX_ALWAYS, ",dax=always" },
> + { XFS_MOUNT_DAX_NEVER, ",dax=never" },
> { 0, NULL }
> };
> struct xfs_mount *mp = XFS_M(root->d_sb);
> @@ -1261,7 +1296,10 @@ xfs_fc_parse_param(
> return 0;
> #ifdef CONFIG_FS_DAX
> case Opt_dax:
> - mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
> + xfs_mount_set_dax_mode(mp, XFS_DAX_ALWAYS);
> + return 0;
> + case Opt_dax_enum:
> + xfs_mount_set_dax_mode(mp, result.uint_32);
> return 0;
> #endif
> default:
> @@ -1468,7 +1506,7 @@ xfs_fc_fill_super(
> if (!rtdev_is_dax && !datadev_is_dax) {
> xfs_alert(mp,
> "DAX unsupported by block device. Turning off DAX.");
> - mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
> + xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
> }
> if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> xfs_alert(mp,
> --
> 2.25.1
>

2020-04-28 20:11:18

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH V11 10/11] fs: Introduce DCACHE_DONTCACHE

On Mon, Apr 27, 2020 at 05:21:41PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> DCACHE_DONTCACHE indicates a dentry should not be cached on final
> dput().
>
> Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> pointing to a specific inode when that inode is being set I_DONTCACHE.
>
> This facilitates dropping dentry references to inodes sooner which
> require eviction to swap S_DAX mode.
>
> Cc: Al Viro <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>

Seems reasonable to me,
Reviewed-by: Darrick J. Wong <[email protected]>

--D

>
> ---
> Changes from V10:
> rename to d_mark_dontcache()
> Move function to fs/dcache.c
>
> Changes from V9:
> modify i_state under i_lock
> Update comment
> "Purge from memory on final dput()"
>
> Changes from V8:
> Update commit message
> Use mark_inode_dontcache in XFS
> Fix locking... can't use rcu here.
> Change name to mark_inode_dontcache
> ---
> fs/dcache.c | 19 +++++++++++++++++++
> fs/xfs/xfs_icache.c | 2 +-
> include/linux/dcache.h | 2 ++
> include/linux/fs.h | 1 +
> 4 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index b280e07e162b..0d07fb335b78 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry)
> if (dentry->d_op->d_delete(dentry))
> return false;
> }
> +
> + if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
> + return false;
> +
> /* retain; LRU fodder */
> dentry->d_lockref.count--;
> if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
> @@ -656,6 +660,21 @@ static inline bool retain_dentry(struct dentry *dentry)
> return true;
> }
>
> +void d_mark_dontcache(struct inode *inode)
> +{
> + struct dentry *de;
> +
> + spin_lock(&inode->i_lock);
> + hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
> + spin_lock(&de->d_lock);
> + de->d_flags |= DCACHE_DONTCACHE;
> + spin_unlock(&de->d_lock);
> + }
> + inode->i_state |= I_DONTCACHE;
> + spin_unlock(&inode->i_lock);
> +}
> +EXPORT_SYMBOL(d_mark_dontcache);
> +
> /*
> * Finish off a dentry we've decided to kill.
> * dentry->d_lock must be held, returns with it unlocked.
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index de76f7f60695..888646d74d7d 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -559,7 +559,7 @@ xfs_iget_cache_miss(
> */
> iflags = XFS_INEW;
> if (flags & XFS_IGET_DONTCACHE)
> - VFS_I(ip)->i_state |= I_DONTCACHE;
> + d_mark_dontcache(VFS_I(ip));
> ip->i_udquot = NULL;
> ip->i_gdquot = NULL;
> ip->i_pdquot = NULL;
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index c1488cc84fd9..a81f0c3cf352 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -177,6 +177,8 @@ struct dentry_operations {
>
> #define DCACHE_REFERENCED 0x00000040 /* Recently used, don't discard. */
>
> +#define DCACHE_DONTCACHE 0x00000080 /* Purge from memory on final dput() */
> +
> #define DCACHE_CANT_MOUNT 0x00000100
> #define DCACHE_GENOCIDE 0x00000200
> #define DCACHE_SHRINK_LIST 0x00000400
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 44bd45af760f..7c3e8c0306e0 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3055,6 +3055,7 @@ static inline int generic_drop_inode(struct inode *inode)
> return !inode->i_nlink || inode_unhashed(inode) ||
> (inode->i_state & I_DONTCACHE);
> }
> +extern void d_mark_dontcache(struct inode *inode);
>
> extern struct inode *ilookup5_nowait(struct super_block *sb,
> unsigned long hashval, int (*test)(struct inode *, void *),
> --
> 2.25.1
>

2020-04-28 20:14:53

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

On Mon, Apr 27, 2020 at 05:21:42PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed
> setting of S_DAX, data invalidation no longer needs to happen when
> FS_XFLAG_DAX is changed.
>
> Change xfs_ioctl_setattr_dax_invalidate() to be
> xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new
> functionality.
>
> Furthermore, we no longer need the locking so we remove the join_flags
> logic.
>
> Signed-off-by: Ira Weiny <[email protected]>

Looks ok,
Reviewed-by: Darrick J. Wong <[email protected]>

--D

>
> ---
> Changes from V10:
> adjust for renamed d_mark_dontcache() function
>
> Changes from V9:
> Change name of function to xfs_ioctl_setattr_prepare_dax()
>
> Changes from V8:
> Change name of function to xfs_ioctl_dax_check_set_cache()
> Update commit message
> Fix bit manipulations
>
> Changes from V7:
> Use new flag_inode_dontcache()
> Skip don't cache if mount over ride is active.
>
> Changes from v6:
> Fix completely broken implementation and update commit message.
> Use the new VFS layer I_DONTCACHE to facilitate inode eviction
> and S_DAX changing on drop_caches
>
> Changes from v5:
> New patch
> ---
> fs/xfs/xfs_ioctl.c | 108 +++++++++------------------------------------
> 1 file changed, 20 insertions(+), 88 deletions(-)
>
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 104495ac187c..ff474f2c9acf 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1245,64 +1245,26 @@ xfs_ioctl_setattr_xflags(
> return 0;
> }
>
> -/*
> - * If we are changing DAX flags, we have to ensure the file is clean and any
> - * cached objects in the address space are invalidated and removed. This
> - * requires us to lock out other IO and page faults similar to a truncate
> - * operation. The locks need to be held until the transaction has been committed
> - * so that the cache invalidation is atomic with respect to the DAX flag
> - * manipulation.
> - */
> -static int
> -xfs_ioctl_setattr_dax_invalidate(
> +static void
> +xfs_ioctl_setattr_prepare_dax(
> struct xfs_inode *ip,
> - struct fsxattr *fa,
> - int *join_flags)
> + struct fsxattr *fa)
> {
> - struct inode *inode = VFS_I(ip);
> - struct super_block *sb = inode->i_sb;
> - int error;
> -
> - *join_flags = 0;
> -
> - /*
> - * It is only valid to set the DAX flag on regular files and
> - * directories on filesystems where the block size is equal to the page
> - * size. On directories it serves as an inherited hint so we don't
> - * have to check the device for dax support or flush pagecache.
> - */
> - if (fa->fsx_xflags & FS_XFLAG_DAX) {
> - struct xfs_buftarg *target = xfs_inode_buftarg(ip);
> -
> - if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
> - return -EINVAL;
> - }
> -
> - /* If the DAX state is not changing, we have nothing to do here. */
> - if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
> - return 0;
> - if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
> - return 0;
> + struct xfs_mount *mp = ip->i_mount;
> + struct inode *inode = VFS_I(ip);
>
> if (S_ISDIR(inode->i_mode))
> - return 0;
> -
> - /* lock, flush and invalidate mapping in preparation for flag change */
> - xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> - error = filemap_write_and_wait(inode->i_mapping);
> - if (error)
> - goto out_unlock;
> - error = invalidate_inode_pages2(inode->i_mapping);
> - if (error)
> - goto out_unlock;
> -
> - *join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
> - return 0;
> + return;
>
> -out_unlock:
> - xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> - return error;
> + if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
> + (mp->m_flags & XFS_MOUNT_DAX_NEVER))
> + return;
>
> + if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
> + !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
> + (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
> + (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
> + d_mark_dontcache(inode);
> }
>
> /*
> @@ -1310,17 +1272,10 @@ xfs_ioctl_setattr_dax_invalidate(
> * have permission to do so. On success, return a clean transaction and the
> * inode locked exclusively ready for further operation specific checks. On
> * failure, return an error without modifying or locking the inode.
> - *
> - * The inode might already be IO locked on call. If this is the case, it is
> - * indicated in @join_flags and we take full responsibility for ensuring they
> - * are unlocked from now on. Hence if we have an error here, we still have to
> - * unlock them. Otherwise, once they are joined to the transaction, they will
> - * be unlocked on commit/cancel.
> */
> static struct xfs_trans *
> xfs_ioctl_setattr_get_trans(
> - struct xfs_inode *ip,
> - int join_flags)
> + struct xfs_inode *ip)
> {
> struct xfs_mount *mp = ip->i_mount;
> struct xfs_trans *tp;
> @@ -1337,8 +1292,7 @@ xfs_ioctl_setattr_get_trans(
> goto out_unlock;
>
> xfs_ilock(ip, XFS_ILOCK_EXCL);
> - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
> - join_flags = 0;
> + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
>
> /*
> * CAP_FOWNER overrides the following restrictions:
> @@ -1359,8 +1313,6 @@ xfs_ioctl_setattr_get_trans(
> out_cancel:
> xfs_trans_cancel(tp);
> out_unlock:
> - if (join_flags)
> - xfs_iunlock(ip, join_flags);
> return ERR_PTR(error);
> }
>
> @@ -1486,7 +1438,6 @@ xfs_ioctl_setattr(
> struct xfs_dquot *pdqp = NULL;
> struct xfs_dquot *olddquot = NULL;
> int code;
> - int join_flags = 0;
>
> trace_xfs_ioctl_setattr(ip);
>
> @@ -1510,18 +1461,9 @@ xfs_ioctl_setattr(
> return code;
> }
>
> - /*
> - * Changing DAX config may require inode locking for mapping
> - * invalidation. These need to be held all the way to transaction commit
> - * or cancel time, so need to be passed through to
> - * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> - * appropriately.
> - */
> - code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
> - if (code)
> - goto error_free_dquots;
> + xfs_ioctl_setattr_prepare_dax(ip, fa);
>
> - tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> + tp = xfs_ioctl_setattr_get_trans(ip);
> if (IS_ERR(tp)) {
> code = PTR_ERR(tp);
> goto error_free_dquots;
> @@ -1651,7 +1593,6 @@ xfs_ioc_setxflags(
> struct fsxattr fa;
> struct fsxattr old_fa;
> unsigned int flags;
> - int join_flags = 0;
> int error;
>
> if (copy_from_user(&flags, arg, sizeof(flags)))
> @@ -1668,18 +1609,9 @@ xfs_ioc_setxflags(
> if (error)
> return error;
>
> - /*
> - * Changing DAX config may require inode locking for mapping
> - * invalidation. These need to be held all the way to transaction commit
> - * or cancel time, so need to be passed through to
> - * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> - * appropriately.
> - */
> - error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
> - if (error)
> - goto out_drop_write;
> + xfs_ioctl_setattr_prepare_dax(ip, &fa);
>
> - tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> + tp = xfs_ioctl_setattr_get_trans(ip);
> if (IS_ERR(tp)) {
> error = PTR_ERR(tp);
> goto out_drop_write;
> --
> 2.25.1
>

2020-04-28 20:30:09

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH V11 04/11] Documentation/dax: Update Usage section

On Mon, Apr 27, 2020 at 05:21:35PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> Update the Usage section to reflect the new individual dax selection
> functionality.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes from V10:
> Clarifications from Dave
> Add '-c' to xfs_io examples
>
> Changes from V9:
> Fix missing ')'
> Fix trialing '"'
>
> Changes from V8:
> Updates from Darrick
>
> Changes from V7:
> Cleanups/clarifications from Darrick and Dan
>
> Changes from V6:
> Update to allow setting FS_XFLAG_DAX any time.
> Update with list of behaviors from Darrick
> https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
>
> Changes from V5:
> Update to reflect the agreed upon semantics
> https://lore.kernel.org/lkml/[email protected]/
> ---
> Documentation/filesystems/dax.txt | 139 +++++++++++++++++++++++++++++-
> 1 file changed, 136 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> index 679729442fd2..409e4e83e46a 100644
> --- a/Documentation/filesystems/dax.txt
> +++ b/Documentation/filesystems/dax.txt
> @@ -17,11 +17,144 @@ For file mappings, the storage device is mapped directly into userspace.
> Usage
> -----
>
> -If you have a block device which supports DAX, you can make a filesystem
> +If you have a block device which supports DAX, you can make a file system
> on it as usual. The DAX code currently only supports files with a block
> size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> -size when creating the filesystem. When mounting it, use the "-o dax"
> -option on the command line or add 'dax' to the options in /etc/fstab.
> +size when creating the file system.
> +
> +Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them
> +is different.
> +
> +Enabling DAX on ext4 and ext2
> +-----------------------------
> +
> +When mounting the filesystem, use the "-o dax" option on the command line or
> +add 'dax' to the options in /etc/fstab. This works to enable DAX on all files
> +within the filesystem. It is equivalent to the '-o dax=always' behavior below.
> +
> +
> +Enabling DAX on xfs
> +-------------------
> +
> +Summary
> +-------
> +
> + 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
> + the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details
> + about this access mode.
> +
> + 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
> + files and directories. This advisory flag can be set or cleared at any
> + time, but doing so does not immediately affect the S_DAX state.
> +
> + 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
> + be inherited by all regular files and sub directories that are subsequently

Well, I'm at the level of minor edits: "...and subdirectories that..."

> + created in this directory. Files and subdirectories that exist at the time
> + this flag is set or cleared on the parent directory are not modified by
> + this modification of the parent directory.
> +
> + 4. There exists dax mount options which can override FS_XFLAG_DAX in the
> + setting of the S_DAX flag. Given underlying storage which supports DAX the
> + following hold.

"hold:"

> +
> + "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.
> +
> + "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."
> +
> + "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
> +
> + "-o dax" is a legacy option which is an alias for "dax=always".
> + This may be removed in the future so "-o dax=always" is
> + the preferred method for specifying this behavior.
> +
> + NOTE: Setting and inheritance affect FS_XFLAG_DAX at all times even when
> + the file system is mounted with a dax option.

We can also clear the flag at any time no matter the mount option state.
Perhaps:

"NOTE: Modifications to and inheritance behavior of FS_XFLAG_DAX remain
the same even when the filesystem is mounted with a dax option."

> + However, in-core inode state
> + (S_DAX) will be overridden until the file system is remounted with
> + dax=inode and the inode is evicted from kernel memory.
> +
> + 5. The DAX policy can be changed via:

"The S_DAX policy". I don't want people to get confused.

> +
> + a) Set the parent directory FS_XFLAG_DAX as needed before files are created
> +
> + b) Set the appropriate dax="foo" mount option
> +
> + c) Change the FS_XFLAG_DAX on existing regular files and directories. This
> + has runtime constraints and limitations that are described in 6) below.

"Setting", and "Changing" at the front of these three bullet points?

Were you to put these together as full sentences, you'd want them to
read "The DAX policy can be changed via setting the parent directory
FS_XFLAG_DAX..."

> +
> + 6. When changing the DAX policy via toggling the persistent FS_XFLAG_DAX flag,

"When changing the S_DAX policy..."

> + the change in behaviour for existing regular files may not occur
> + immediately. If the change must take effect immediately, the administrator
> + needs to:
> +
> + a) stop the application so there are no active references to the data set
> + the policy change will affect
> +
> + b) evict the data set from kernel caches so it will be re-instantiated when
> + the application is restarted. This can be acheived by:

"achieved"

> +
> + i. drop-caches
> + ii. a filesystem unmount and mount cycle
> + iii. a system reboot
> +
> +
> +Details
> +-------
> +
> +There are 2 per-file dax flags. One is a persistent inode setting (FS_XFLAG_DAX)
> +and the other is a volatile flag indicating the active state of the feature
> +(S_DAX).
> +
> +FS_XFLAG_DAX is preserved within the file system. This persistent config
> +setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
> +(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
> +
> +New files and directories automatically inherit FS_XFLAG_DAX from
> +their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at
> +directory creation time can be used to set a default behavior for an entire
> +sub-tree.
> +
> +To clarify inheritance here are 3 examples:

"...inheritance, here are..."

> +
> +Example A:
> +
> +mkdir -p a/b/c
> +xfs_io -c 'chattr +x' a
> +mkdir a/b/c/d
> +mkdir a/e
> +
> + dax: a,e
> + no dax: b,c,d
> +
> +Example B:
> +
> +mkdir a
> +xfs_io -c 'chattr +x' a
> +mkdir -p a/b/c/d
> +
> + dax: a,b,c,d
> + no dax:
> +
> +Example C:
> +
> +mkdir -p a/b/c
> +xfs_io -c 'chattr +x' c
> +mkdir a/b/c/d
> +
> + dax: c,d
> + no dax: a,b
> +
> +
> +The current enabled state (S_DAX) is set when a file inode is instantiated in
> +memory by the kernel. It is set based on the underlying media support, the
> +value of FS_XFLAG_DAX and the file systems dax mount option setting.

"...and the file system's dax mount option string."

> +
> +statx can be used to query S_DAX. NOTE that a directory will never have S_DAX

"Note that only regular files will ever have S_DAX set..."?

--D

> +set and therefore statx will never indicate that S_DAX is set on directories.
> +
> +Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if
> +the underlying media does not support dax and/or the file system is overridden
> +with a mount option.
> +
>
>
> Implementation Tips for Block Driver Writers
> --
> 2.25.1
>

2020-04-28 20:53:47

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V11 04/11] Documentation/dax: Update Usage section

On Tue, Apr 28, 2020 at 01:27:38PM -0700, Darrick J. Wong wrote:
> On Mon, Apr 27, 2020 at 05:21:35PM -0700, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >

[snip]

> > +
> > + 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
> > + be inherited by all regular files and sub directories that are subsequently
>
> Well, I'm at the level of minor edits: "...and subdirectories that..."

Done.

>
> > + created in this directory. Files and subdirectories that exist at the time
> > + this flag is set or cleared on the parent directory are not modified by
> > + this modification of the parent directory.
> > +
> > + 4. There exists dax mount options which can override FS_XFLAG_DAX in the
> > + setting of the S_DAX flag. Given underlying storage which supports DAX the
> > + following hold.
>
> "hold:"

Dene.

>
> > +
> > + "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.
> > +
> > + "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."
> > +
> > + "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
> > +
> > + "-o dax" is a legacy option which is an alias for "dax=always".
> > + This may be removed in the future so "-o dax=always" is
> > + the preferred method for specifying this behavior.
> > +
> > + NOTE: Setting and inheritance affect FS_XFLAG_DAX at all times even when
> > + the file system is mounted with a dax option.
>
> We can also clear the flag at any time no matter the mount option state.
> Perhaps:
>
> "NOTE: Modifications to and inheritance behavior of FS_XFLAG_DAX remain
> the same even when the filesystem is mounted with a dax option."

Done.

>
> > + However, in-core inode state
> > + (S_DAX) will be overridden until the file system is remounted with
> > + dax=inode and the inode is evicted from kernel memory.
> > +
> > + 5. The DAX policy can be changed via:
>
> "The S_DAX policy". I don't want people to get confused.

Done.

>
> > +
> > + a) Set the parent directory FS_XFLAG_DAX as needed before files are created
> > +
> > + b) Set the appropriate dax="foo" mount option
> > +
> > + c) Change the FS_XFLAG_DAX on existing regular files and directories. This
> > + has runtime constraints and limitations that are described in 6) below.
>
> "Setting", and "Changing" at the front of these three bullet points?
>
> Were you to put these together as full sentences, you'd want them to
> read "The DAX policy can be changed via setting the parent directory
> FS_XFLAG_DAX..."
>

Done.

> > +
> > + 6. When changing the DAX policy via toggling the persistent FS_XFLAG_DAX flag,
>
> "When changing the S_DAX policy..."

Done.

>
> > + the change in behaviour for existing regular files may not occur
> > + immediately. If the change must take effect immediately, the administrator
> > + needs to:
> > +
> > + a) stop the application so there are no active references to the data set
> > + the policy change will affect
> > +
> > + b) evict the data set from kernel caches so it will be re-instantiated when
> > + the application is restarted. This can be acheived by:
>
> "achieved"

Done.

>
> > +
> > + i. drop-caches
> > + ii. a filesystem unmount and mount cycle
> > + iii. a system reboot
> > +
> > +
> > +Details
> > +-------
> > +
> > +There are 2 per-file dax flags. One is a persistent inode setting (FS_XFLAG_DAX)
> > +and the other is a volatile flag indicating the active state of the feature
> > +(S_DAX).
> > +
> > +FS_XFLAG_DAX is preserved within the file system. This persistent config
> > +setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
> > +(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
> > +
> > +New files and directories automatically inherit FS_XFLAG_DAX from
> > +their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at
> > +directory creation time can be used to set a default behavior for an entire
> > +sub-tree.
> > +
> > +To clarify inheritance here are 3 examples:
>
> "...inheritance, here are..."

Done.

>
> > +
> > +Example A:
> > +
> > +mkdir -p a/b/c
> > +xfs_io -c 'chattr +x' a
> > +mkdir a/b/c/d
> > +mkdir a/e
> > +
> > + dax: a,e
> > + no dax: b,c,d
> > +
> > +Example B:
> > +
> > +mkdir a
> > +xfs_io -c 'chattr +x' a
> > +mkdir -p a/b/c/d
> > +
> > + dax: a,b,c,d
> > + no dax:
> > +
> > +Example C:
> > +
> > +mkdir -p a/b/c
> > +xfs_io -c 'chattr +x' c
> > +mkdir a/b/c/d
> > +
> > + dax: c,d
> > + no dax: a,b
> > +
> > +
> > +The current enabled state (S_DAX) is set when a file inode is instantiated in
> > +memory by the kernel. It is set based on the underlying media support, the
> > +value of FS_XFLAG_DAX and the file systems dax mount option setting.
>
> "...and the file system's dax mount option string."

No. I don't think string is the right word here. It is the setting not the
string which is controlling the behavior. How about:

"...and the file system's dax mount option."

>
> > +
> > +statx can be used to query S_DAX. NOTE that a directory will never have S_DAX
>
> "Note that only regular files will ever have S_DAX set..."?

Done.

Ira

>
> --D
>
> > +set and therefore statx will never indicate that S_DAX is set on directories.
> > +
> > +Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if
> > +the underlying media does not support dax and/or the file system is overridden
> > +with a mount option.
> > +
> >
> >
> > Implementation Tips for Block Driver Writers
> > --
> > 2.25.1
> >

2020-04-28 21:13:32

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH V11 04/11] Documentation/dax: Update Usage section

On Tue, Apr 28, 2020 at 01:53:10PM -0700, Ira Weiny wrote:
> On Tue, Apr 28, 2020 at 01:27:38PM -0700, Darrick J. Wong wrote:
> > On Mon, Apr 27, 2020 at 05:21:35PM -0700, [email protected] wrote:
> > > From: Ira Weiny <[email protected]>
> > >
>
> [snip]
>
> > > +
> > > + 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
> > > + be inherited by all regular files and sub directories that are subsequently
> >
> > Well, I'm at the level of minor edits: "...and subdirectories that..."
>
> Done.
>
> >
> > > + created in this directory. Files and subdirectories that exist at the time
> > > + this flag is set or cleared on the parent directory are not modified by
> > > + this modification of the parent directory.
> > > +
> > > + 4. There exists dax mount options which can override FS_XFLAG_DAX in the
> > > + setting of the S_DAX flag. Given underlying storage which supports DAX the
> > > + following hold.
> >
> > "hold:"
>
> Dene.
>
> >
> > > +
> > > + "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.
> > > +
> > > + "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."
> > > +
> > > + "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
> > > +
> > > + "-o dax" is a legacy option which is an alias for "dax=always".
> > > + This may be removed in the future so "-o dax=always" is
> > > + the preferred method for specifying this behavior.
> > > +
> > > + NOTE: Setting and inheritance affect FS_XFLAG_DAX at all times even when
> > > + the file system is mounted with a dax option.
> >
> > We can also clear the flag at any time no matter the mount option state.
> > Perhaps:
> >
> > "NOTE: Modifications to and inheritance behavior of FS_XFLAG_DAX remain
> > the same even when the filesystem is mounted with a dax option."
>
> Done.
>
> >
> > > + However, in-core inode state
> > > + (S_DAX) will be overridden until the file system is remounted with
> > > + dax=inode and the inode is evicted from kernel memory.
> > > +
> > > + 5. The DAX policy can be changed via:
> >
> > "The S_DAX policy". I don't want people to get confused.
>
> Done.
>
> >
> > > +
> > > + a) Set the parent directory FS_XFLAG_DAX as needed before files are created
> > > +
> > > + b) Set the appropriate dax="foo" mount option
> > > +
> > > + c) Change the FS_XFLAG_DAX on existing regular files and directories. This
> > > + has runtime constraints and limitations that are described in 6) below.
> >
> > "Setting", and "Changing" at the front of these three bullet points?
> >
> > Were you to put these together as full sentences, you'd want them to
> > read "The DAX policy can be changed via setting the parent directory
> > FS_XFLAG_DAX..."
> >
>
> Done.
>
> > > +
> > > + 6. When changing the DAX policy via toggling the persistent FS_XFLAG_DAX flag,
> >
> > "When changing the S_DAX policy..."
>
> Done.
>
> >
> > > + the change in behaviour for existing regular files may not occur
> > > + immediately. If the change must take effect immediately, the administrator
> > > + needs to:
> > > +
> > > + a) stop the application so there are no active references to the data set
> > > + the policy change will affect
> > > +
> > > + b) evict the data set from kernel caches so it will be re-instantiated when
> > > + the application is restarted. This can be acheived by:
> >
> > "achieved"
>
> Done.
>
> >
> > > +
> > > + i. drop-caches
> > > + ii. a filesystem unmount and mount cycle
> > > + iii. a system reboot
> > > +
> > > +
> > > +Details
> > > +-------
> > > +
> > > +There are 2 per-file dax flags. One is a persistent inode setting (FS_XFLAG_DAX)
> > > +and the other is a volatile flag indicating the active state of the feature
> > > +(S_DAX).
> > > +
> > > +FS_XFLAG_DAX is preserved within the file system. This persistent config
> > > +setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
> > > +(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
> > > +
> > > +New files and directories automatically inherit FS_XFLAG_DAX from
> > > +their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at
> > > +directory creation time can be used to set a default behavior for an entire
> > > +sub-tree.
> > > +
> > > +To clarify inheritance here are 3 examples:
> >
> > "...inheritance, here are..."
>
> Done.
>
> >
> > > +
> > > +Example A:
> > > +
> > > +mkdir -p a/b/c
> > > +xfs_io -c 'chattr +x' a
> > > +mkdir a/b/c/d
> > > +mkdir a/e
> > > +
> > > + dax: a,e
> > > + no dax: b,c,d
> > > +
> > > +Example B:
> > > +
> > > +mkdir a
> > > +xfs_io -c 'chattr +x' a
> > > +mkdir -p a/b/c/d
> > > +
> > > + dax: a,b,c,d
> > > + no dax:
> > > +
> > > +Example C:
> > > +
> > > +mkdir -p a/b/c
> > > +xfs_io -c 'chattr +x' c
> > > +mkdir a/b/c/d
> > > +
> > > + dax: c,d
> > > + no dax: a,b
> > > +
> > > +
> > > +The current enabled state (S_DAX) is set when a file inode is instantiated in
> > > +memory by the kernel. It is set based on the underlying media support, the
> > > +value of FS_XFLAG_DAX and the file systems dax mount option setting.
> >
> > "...and the file system's dax mount option string."
>
> No. I don't think string is the right word here. It is the setting not the
> string which is controlling the behavior. How about:
>
> "...and the file system's dax mount option."

Oops. I miscopied that; all I really wanted was the apostrophe-s after
"file system".

"...and the file system's dax mount option setting."

would have been fine.

--D

>
> >
> > > +
> > > +statx can be used to query S_DAX. NOTE that a directory will never have S_DAX
> >
> > "Note that only regular files will ever have S_DAX set..."?
>
> Done.
>
> Ira
>
> >
> > --D
> >
> > > +set and therefore statx will never indicate that S_DAX is set on directories.
> > > +
> > > +Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if
> > > +the underlying media does not support dax and/or the file system is overridden
> > > +with a mount option.
> > > +
> > >
> > >
> > > Implementation Tips for Block Driver Writers
> > > --
> > > 2.25.1
> > >

2020-04-28 22:22:24

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V11.1] Documentation/dax: Update Usage section

From: Ira Weiny <[email protected]>

Update the Usage section to reflect the new individual dax selection
functionality.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V11:
Minor changes from Darrick

Changes from V10:
Clarifications from Dave
Add '-c' to xfs_io examples

Changes from V9:
Fix missing ')'
Fix trialing '"'

Changes from V8:
Updates from Darrick

Changes from V7:
Cleanups/clarifications from Darrick and Dan

Changes from V6:
Update to allow setting FS_XFLAG_DAX any time.
Update with list of behaviors from Darrick
https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/

Changes from V5:
Update to reflect the agreed upon semantics
https://lore.kernel.org/lkml/[email protected]/
---
Documentation/filesystems/dax.txt | 142 +++++++++++++++++++++++++++++-
1 file changed, 139 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 679729442fd2..dc1c1aa36cc2 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -17,11 +17,147 @@ For file mappings, the storage device is mapped directly into userspace.
Usage
-----

-If you have a block device which supports DAX, you can make a filesystem
+If you have a block device which supports DAX, you can make a file system
on it as usual. The DAX code currently only supports files with a block
size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem. When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the file system.
+
+Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them
+is different.
+
+Enabling DAX on ext4 and ext2
+-----------------------------
+
+When mounting the filesystem, use the "-o dax" option on the command line or
+add 'dax' to the options in /etc/fstab. This works to enable DAX on all files
+within the filesystem. It is equivalent to the '-o dax=always' behavior below.
+
+
+Enabling DAX on xfs
+-------------------
+
+Summary
+-------
+
+ 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
+ the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details
+ about this access mode.
+
+ 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
+ files and directories. This advisory flag can be set or cleared at any
+ time, but doing so does not immediately affect the S_DAX state.
+
+ 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
+ be inherited by all regular files and subdirectories that are subsequently
+ created in this directory. Files and subdirectories that exist at the time
+ this flag is set or cleared on the parent directory are not modified by
+ this modification of the parent directory.
+
+ 4. There exists dax mount options which can override FS_XFLAG_DAX in the
+ setting of the S_DAX flag. Given underlying storage which supports DAX the
+ following hold:
+
+ "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.
+
+ "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."
+
+ "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
+
+ "-o dax" is a legacy option which is an alias for "dax=always".
+ This may be removed in the future so "-o dax=always" is
+ the preferred method for specifying this behavior.
+
+ NOTE: Modifications to and the inheritance behavior of FS_XFLAG_DAX remain
+ the same even when the file system is mounted with a dax option. However,
+ in-core inode state (S_DAX) will be overridden until the file system is
+ remounted with dax=inode and the inode is evicted from kernel memory.
+
+ 5. The S_DAX policy can be changed via:
+
+ a) Setting the parent directory FS_XFLAG_DAX as needed before files are
+ created
+
+ b) Setting the appropriate dax="foo" mount option
+
+ c) Changing the FS_XFLAG_DAX on existing regular files and directories.
+ This has runtime constraints and limitations that are described in 6)
+ below.
+
+ 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX flag,
+ the change in behaviour for existing regular files may not occur
+ immediately. If the change must take effect immediately, the administrator
+ needs to:
+
+ a) stop the application so there are no active references to the data set
+ the policy change will affect
+
+ b) evict the data set from kernel caches so it will be re-instantiated when
+ the application is restarted. This can be achieved by:
+
+ i. drop-caches
+ ii. a filesystem unmount and mount cycle
+ iii. a system reboot
+
+
+Details
+-------
+
+There are 2 per-file dax flags. One is a persistent inode setting (FS_XFLAG_DAX)
+and the other is a volatile flag indicating the active state of the feature
+(S_DAX).
+
+FS_XFLAG_DAX is preserved within the file system. This persistent config
+setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
+(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
+
+New files and directories automatically inherit FS_XFLAG_DAX from
+their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at
+directory creation time can be used to set a default behavior for an entire
+sub-tree.
+
+To clarify inheritance, here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+ dax: a,e
+ no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io -c 'chattr +x' a
+mkdir -p a/b/c/d
+
+ dax: a,b,c,d
+ no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' c
+mkdir a/b/c/d
+
+ dax: c,d
+ no dax: a,b
+
+
+The current enabled state (S_DAX) is set when a file inode is instantiated in
+memory by the kernel. It is set based on the underlying media support, the
+value of FS_XFLAG_DAX and the file system's dax mount option.
+
+statx can be used to query S_DAX. NOTE that only regular files will ever have
+S_DAX set and therefore statx will never indicate that S_DAX is set on
+directories.
+
+Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if
+the underlying media does not support dax and/or the file system is overridden
+with a mount option.
+


Implementation Tips for Block Driver Writers
--
2.25.1

2020-04-29 02:22:07

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH V11.1] Documentation/dax: Update Usage section

On 4/28/20 3:21 PM, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> Update the Usage section to reflect the new individual dax selection
> functionality.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes from V11:
> Minor changes from Darrick
>
> Changes from V10:
> Clarifications from Dave
> Add '-c' to xfs_io examples
>
> Changes from V9:
> Fix missing ')'
> Fix trialing '"'

trailing

>
> Changes from V8:
> Updates from Darrick
>
> Changes from V7:
> Cleanups/clarifications from Darrick and Dan
>
> Changes from V6:
> Update to allow setting FS_XFLAG_DAX any time.
> Update with list of behaviors from Darrick
> https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
>
> Changes from V5:
> Update to reflect the agreed upon semantics
> https://lore.kernel.org/lkml/[email protected]/
> ---
> Documentation/filesystems/dax.txt | 142 +++++++++++++++++++++++++++++-
> 1 file changed, 139 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> index 679729442fd2..dc1c1aa36cc2 100644
> --- a/Documentation/filesystems/dax.txt
> +++ b/Documentation/filesystems/dax.txt
> @@ -17,11 +17,147 @@ For file mappings, the storage device is mapped directly into userspace.
> Usage
> -----
>
> -If you have a block device which supports DAX, you can make a filesystem
> +If you have a block device which supports DAX, you can make a file system
> on it as usual. The DAX code currently only supports files with a block
> size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> -size when creating the filesystem. When mounting it, use the "-o dax"
> -option on the command line or add 'dax' to the options in /etc/fstab.
> +size when creating the file system.
> +
> +Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them

Why "file system" in the first paragraph when "filesystem" is used here and below?

> +is different.
> +
> +Enabling DAX on ext4 and ext2
> +-----------------------------
> +
> +When mounting the filesystem, use the "-o dax" option on the command line or
> +add 'dax' to the options in /etc/fstab. This works to enable DAX on all files
> +within the filesystem. It is equivalent to the '-o dax=always' behavior below.
> +
> +
> +Enabling DAX on xfs
> +-------------------
> +
> +Summary
> +-------
> +
> + 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
> + the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details
> + about this access mode.
> +
> + 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
> + files and directories. This advisory flag can be set or cleared at any
> + time, but doing so does not immediately affect the S_DAX state.
> +
> + 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
> + be inherited by all regular files and subdirectories that are subsequently
> + created in this directory. Files and subdirectories that exist at the time
> + this flag is set or cleared on the parent directory are not modified by
> + this modification of the parent directory.
> +
> + 4. There exists dax mount options which can override FS_XFLAG_DAX in the

exist

> + setting of the S_DAX flag. Given underlying storage which supports DAX the
> + following hold:
> +
> + "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.
> +
> + "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."
> +
> + "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
> +
> + "-o dax" is a legacy option which is an alias for "dax=always".
> + This may be removed in the future so "-o dax=always" is
> + the preferred method for specifying this behavior.
> +
> + NOTE: Modifications to and the inheritance behavior of FS_XFLAG_DAX remain
> + the same even when the file system is mounted with a dax option. However,
> + in-core inode state (S_DAX) will be overridden until the file system is

"file system" (2 times above)

> + remounted with dax=inode and the inode is evicted from kernel memory.
> +
> + 5. The S_DAX policy can be changed via:
> +
> + a) Setting the parent directory FS_XFLAG_DAX as needed before files are
> + created
> +
> + b) Setting the appropriate dax="foo" mount option
> +
> + c) Changing the FS_XFLAG_DAX on existing regular files and directories.

FS_XFLAGS_DAX flag on

> + This has runtime constraints and limitations that are described in 6)
> + below.
> +
> + 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX flag,
> + the change in behaviour for existing regular files may not occur
> + immediately. If the change must take effect immediately, the administrator
> + needs to:
> +
> + a) stop the application so there are no active references to the data set
> + the policy change will affect
> +
> + b) evict the data set from kernel caches so it will be re-instantiated when
> + the application is restarted. This can be achieved by:
> +
> + i. drop-caches
> + ii. a filesystem unmount and mount cycle

filesystem

> + iii. a system reboot
> +
> +
> +Details
> +-------
> +
> +There are 2 per-file dax flags. One is a persistent inode setting (FS_XFLAG_DAX)
> +and the other is a volatile flag indicating the active state of the feature
> +(S_DAX).
> +
> +FS_XFLAG_DAX is preserved within the file system. This persistent config

file system

> +setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
> +(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
> +
> +New files and directories automatically inherit FS_XFLAG_DAX from
> +their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at
> +directory creation time can be used to set a default behavior for an entire
> +sub-tree.
> +
> +To clarify inheritance, here are 3 examples:
> +
> +Example A:
> +
> +mkdir -p a/b/c
> +xfs_io -c 'chattr +x' a
> +mkdir a/b/c/d
> +mkdir a/e
> +
> + dax: a,e
> + no dax: b,c,d
> +
> +Example B:
> +
> +mkdir a
> +xfs_io -c 'chattr +x' a
> +mkdir -p a/b/c/d
> +
> + dax: a,b,c,d
> + no dax:
> +
> +Example C:
> +
> +mkdir -p a/b/c
> +xfs_io -c 'chattr +x' c
> +mkdir a/b/c/d
> +
> + dax: c,d
> + no dax: a,b
> +
> +
> +The current enabled state (S_DAX) is set when a file inode is instantiated in
> +memory by the kernel. It is set based on the underlying media support, the
> +value of FS_XFLAG_DAX and the file system's dax mount option.
> +
> +statx can be used to query S_DAX. NOTE that only regular files will ever have
> +S_DAX set and therefore statx will never indicate that S_DAX is set on
> +directories.
> +
> +Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if

the FS_XFLAG_DAX flag

> +the underlying media does not support dax and/or the file system is overridden

file system

Just be consistent, please.

> +with a mount option.
> +
>
>
> Implementation Tips for Block Driver Writers
>

thanks.
--
~Randy

2020-04-29 04:30:53

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V11.1] Documentation/dax: Update Usage section

On Tue, Apr 28, 2020 at 07:21:18PM -0700, Randy Dunlap wrote:
> On 4/28/20 3:21 PM, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > Update the Usage section to reflect the new individual dax selection
> > functionality.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> > ---
> > Changes from V11:
> > Minor changes from Darrick
> >
> > Changes from V10:
> > Clarifications from Dave
> > Add '-c' to xfs_io examples
> >
> > Changes from V9:
> > Fix missing ')'
> > Fix trialing '"'
>
> trailing

Thanks

>
> >
> > Changes from V8:
> > Updates from Darrick
> >
> > Changes from V7:
> > Cleanups/clarifications from Darrick and Dan
> >
> > Changes from V6:
> > Update to allow setting FS_XFLAG_DAX any time.
> > Update with list of behaviors from Darrick
> > https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> >
> > Changes from V5:
> > Update to reflect the agreed upon semantics
> > https://lore.kernel.org/lkml/[email protected]/
> > ---
> > Documentation/filesystems/dax.txt | 142 +++++++++++++++++++++++++++++-
> > 1 file changed, 139 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > index 679729442fd2..dc1c1aa36cc2 100644
> > --- a/Documentation/filesystems/dax.txt
> > +++ b/Documentation/filesystems/dax.txt
> > @@ -17,11 +17,147 @@ For file mappings, the storage device is mapped directly into userspace.
> > Usage
> > -----
> >
> > -If you have a block device which supports DAX, you can make a filesystem
> > +If you have a block device which supports DAX, you can make a file system
> > on it as usual. The DAX code currently only supports files with a block
> > size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > -size when creating the filesystem. When mounting it, use the "-o dax"
> > -option on the command line or add 'dax' to the options in /etc/fstab.
> > +size when creating the file system.
> > +
> > +Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them
>
> Why "file system" in the first paragraph when "filesystem" is used here and below?

AFAIK they are interchangeable. https://en.wikipedia.org/wiki/File_system

But consistency is a reasonable request.

The rest of the doc (save 1 place) uses filesystem so I will adopt that here,
and clean up that 1 place.

>
> > +is different.
> > +
> > +Enabling DAX on ext4 and ext2
> > +-----------------------------
> > +
> > +When mounting the filesystem, use the "-o dax" option on the command line or
> > +add 'dax' to the options in /etc/fstab. This works to enable DAX on all files
> > +within the filesystem. It is equivalent to the '-o dax=always' behavior below.
> > +
> > +
> > +Enabling DAX on xfs
> > +-------------------
> > +
> > +Summary
> > +-------
> > +
> > + 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
> > + the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details
> > + about this access mode.
> > +
> > + 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
> > + files and directories. This advisory flag can be set or cleared at any
> > + time, but doing so does not immediately affect the S_DAX state.
> > +
> > + 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
> > + be inherited by all regular files and subdirectories that are subsequently
> > + created in this directory. Files and subdirectories that exist at the time
> > + this flag is set or cleared on the parent directory are not modified by
> > + this modification of the parent directory.
> > +
> > + 4. There exists dax mount options which can override FS_XFLAG_DAX in the
>
> exist

Ah yea,
Done.

>
> > + setting of the S_DAX flag. Given underlying storage which supports DAX the
> > + following hold:
> > +
> > + "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.
> > +
> > + "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."
> > +
> > + "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
> > +
> > + "-o dax" is a legacy option which is an alias for "dax=always".
> > + This may be removed in the future so "-o dax=always" is
> > + the preferred method for specifying this behavior.
> > +
> > + NOTE: Modifications to and the inheritance behavior of FS_XFLAG_DAX remain
> > + the same even when the file system is mounted with a dax option. However,
> > + in-core inode state (S_DAX) will be overridden until the file system is
>
> "file system" (2 times above)

Done.

>
> > + remounted with dax=inode and the inode is evicted from kernel memory.
> > +
> > + 5. The S_DAX policy can be changed via:
> > +
> > + a) Setting the parent directory FS_XFLAG_DAX as needed before files are
> > + created
> > +
> > + b) Setting the appropriate dax="foo" mount option
> > +
> > + c) Changing the FS_XFLAG_DAX on existing regular files and directories.
>
> FS_XFLAGS_DAX flag on

Done.

>
> > + This has runtime constraints and limitations that are described in 6)
> > + below.
> > +
> > + 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX flag,
> > + the change in behaviour for existing regular files may not occur
> > + immediately. If the change must take effect immediately, the administrator
> > + needs to:
> > +
> > + a) stop the application so there are no active references to the data set
> > + the policy change will affect
> > +
> > + b) evict the data set from kernel caches so it will be re-instantiated when
> > + the application is restarted. This can be achieved by:
> > +
> > + i. drop-caches
> > + ii. a filesystem unmount and mount cycle
>
> filesystem

Done.

>
> > + iii. a system reboot
> > +
> > +
> > +Details
> > +-------
> > +
> > +There are 2 per-file dax flags. One is a persistent inode setting (FS_XFLAG_DAX)
> > +and the other is a volatile flag indicating the active state of the feature
> > +(S_DAX).
> > +
> > +FS_XFLAG_DAX is preserved within the file system. This persistent config
>
> file system

Done.

>
> > +setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
> > +(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
> > +
> > +New files and directories automatically inherit FS_XFLAG_DAX from
> > +their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at
> > +directory creation time can be used to set a default behavior for an entire
> > +sub-tree.
> > +
> > +To clarify inheritance, here are 3 examples:
> > +
> > +Example A:
> > +
> > +mkdir -p a/b/c
> > +xfs_io -c 'chattr +x' a
> > +mkdir a/b/c/d
> > +mkdir a/e
> > +
> > + dax: a,e
> > + no dax: b,c,d
> > +
> > +Example B:
> > +
> > +mkdir a
> > +xfs_io -c 'chattr +x' a
> > +mkdir -p a/b/c/d
> > +
> > + dax: a,b,c,d
> > + no dax:
> > +
> > +Example C:
> > +
> > +mkdir -p a/b/c
> > +xfs_io -c 'chattr +x' c
> > +mkdir a/b/c/d
> > +
> > + dax: c,d
> > + no dax: a,b
> > +
> > +
> > +The current enabled state (S_DAX) is set when a file inode is instantiated in
> > +memory by the kernel. It is set based on the underlying media support, the
> > +value of FS_XFLAG_DAX and the file system's dax mount option.
> > +
> > +statx can be used to query S_DAX. NOTE that only regular files will ever have
> > +S_DAX set and therefore statx will never indicate that S_DAX is set on
> > +directories.
> > +
> > +Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if
>
> the FS_XFLAG_DAX flag

Done.

>
> > +the underlying media does not support dax and/or the file system is overridden
>
> file system
>
> Just be consistent, please.

Fair enough,
Done.

>
> > +with a mount option.
> > +
> >
> >
> > Implementation Tips for Block Driver Writers
> >
>
> thanks.

NP, Thank you!
Ira

> --
> ~Randy
>

2020-04-29 04:33:59

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V11.2] Documentation/dax: Update Usage section

From: Ira Weiny <[email protected]>

Update the Usage section to reflect the new individual dax selection
functionality.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V11.1:
Make filesystem/file system consistently filesystem
grammatical fixes

Changes from V11:
Minor changes from Darrick

Changes from V10:
Clarifications from Dave
Add '-c' to xfs_io examples

Changes from V9:
Fix missing ')'
Fix trailing '"' (fixed this!)

Changes from V8:
Updates from Darrick

Changes from V7:
Cleanups/clarifications from Darrick and Dan

Changes from V6:
Update to allow setting FS_XFLAG_DAX any time.
Update with list of behaviors from Darrick
https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/

Changes from V5:
Update to reflect the agreed upon semantics
https://lore.kernel.org/lkml/[email protected]/
---
Documentation/filesystems/dax.txt | 142 +++++++++++++++++++++++++++++-
1 file changed, 139 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 679729442fd2..735fb4b54117 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -20,8 +20,144 @@ Usage
If you have a block device which supports DAX, you can make a filesystem
on it as usual. The DAX code currently only supports files with a block
size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem. When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the filesystem.
+
+Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them
+is different.
+
+Enabling DAX on ext4 and ext2
+-----------------------------
+
+When mounting the filesystem, use the "-o dax" option on the command line or
+add 'dax' to the options in /etc/fstab. This works to enable DAX on all files
+within the filesystem. It is equivalent to the '-o dax=always' behavior below.
+
+
+Enabling DAX on xfs
+-------------------
+
+Summary
+-------
+
+ 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
+ the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details
+ about this access mode.
+
+ 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
+ files and directories. This advisory flag can be set or cleared at any
+ time, but doing so does not immediately affect the S_DAX state.
+
+ 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
+ be inherited by all regular files and subdirectories that are subsequently
+ created in this directory. Files and subdirectories that exist at the time
+ this flag is set or cleared on the parent directory are not modified by
+ this modification of the parent directory.
+
+ 4. There exist dax mount options which can override FS_XFLAG_DAX in the
+ setting of the S_DAX flag. Given underlying storage which supports DAX the
+ following hold:
+
+ "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.
+
+ "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."
+
+ "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
+
+ "-o dax" is a legacy option which is an alias for "dax=always".
+ This may be removed in the future so "-o dax=always" is
+ the preferred method for specifying this behavior.
+
+ NOTE: Modifications to and the inheritance behavior of FS_XFLAG_DAX remain
+ the same even when the filesystem is mounted with a dax option. However,
+ in-core inode state (S_DAX) will be overridden until the filesystem is
+ remounted with dax=inode and the inode is evicted from kernel memory.
+
+ 5. The S_DAX policy can be changed via:
+
+ a) Setting the parent directory FS_XFLAG_DAX as needed before files are
+ created
+
+ b) Setting the appropriate dax="foo" mount option
+
+ c) Changing the FS_XFLAG_DAX flag on existing regular files and
+ directories. This has runtime constraints and limitations that are
+ described in 6) below.
+
+ 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX flag,
+ the change in behaviour for existing regular files may not occur
+ immediately. If the change must take effect immediately, the administrator
+ needs to:
+
+ a) stop the application so there are no active references to the data set
+ the policy change will affect
+
+ b) evict the data set from kernel caches so it will be re-instantiated when
+ the application is restarted. This can be achieved by:
+
+ i. drop-caches
+ ii. a filesystem unmount and mount cycle
+ iii. a system reboot
+
+
+Details
+-------
+
+There are 2 per-file dax flags. One is a persistent inode setting (FS_XFLAG_DAX)
+and the other is a volatile flag indicating the active state of the feature
+(S_DAX).
+
+FS_XFLAG_DAX is preserved within the filesystem. This persistent config
+setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
+(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
+
+New files and directories automatically inherit FS_XFLAG_DAX from
+their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at
+directory creation time can be used to set a default behavior for an entire
+sub-tree.
+
+To clarify inheritance, here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+ dax: a,e
+ no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io -c 'chattr +x' a
+mkdir -p a/b/c/d
+
+ dax: a,b,c,d
+ no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' c
+mkdir a/b/c/d
+
+ dax: c,d
+ no dax: a,b
+
+
+The current enabled state (S_DAX) is set when a file inode is instantiated in
+memory by the kernel. It is set based on the underlying media support, the
+value of FS_XFLAG_DAX and the filesystem's dax mount option.
+
+statx can be used to query S_DAX. NOTE that only regular files will ever have
+S_DAX set and therefore statx will never indicate that S_DAX is set on
+directories.
+
+Setting the FS_XFLAG_DAX flag (specifically or through inheritance) occurs even
+if the underlying media does not support dax and/or the filesystem is
+overridden with a mount option.
+


Implementation Tips for Block Driver Writers
@@ -94,7 +230,7 @@ sysadmins have an option to restore the lost data from a prior backup/inbuilt
redundancy in the following ways:

1. Delete the affected file, and restore from a backup (sysadmin route):
- This will free the file system blocks that were being used by the file,
+ This will free the filesystem blocks that were being used by the file,
and the next time they're allocated, they will be zeroed first, which
happens through the driver, and will clear bad sectors.

--
2.25.1

2020-04-29 05:49:07

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH V11.2] Documentation/dax: Update Usage section

On 4/28/20 9:33 PM, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> Update the Usage section to reflect the new individual dax selection
> functionality.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---

Acked-by: Randy Dunlap <[email protected]>

Thanks.

> ---
> Documentation/filesystems/dax.txt | 142 +++++++++++++++++++++++++++++-
> 1 file changed, 139 insertions(+), 3 deletions(-)


--
~Randy

2020-06-02 17:24:49

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

On Tue, Apr 28, 2020 at 01:11:38PM -0700, Darrick J. Wong wrote:
> On Mon, Apr 27, 2020 at 05:21:42PM -0700, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed
> > setting of S_DAX, data invalidation no longer needs to happen when
> > FS_XFLAG_DAX is changed.
> >
> > Change xfs_ioctl_setattr_dax_invalidate() to be
> > xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new
> > functionality.
> >
> > Furthermore, we no longer need the locking so we remove the join_flags
> > logic.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
>
> Looks ok,
> Reviewed-by: Darrick J. Wong <[email protected]>
>
> --D
>
> >
> > ---
> > Changes from V10:
> > adjust for renamed d_mark_dontcache() function
> >
> > Changes from V9:
> > Change name of function to xfs_ioctl_setattr_prepare_dax()
> >
> > Changes from V8:
> > Change name of function to xfs_ioctl_dax_check_set_cache()
> > Update commit message
> > Fix bit manipulations
> >
> > Changes from V7:
> > Use new flag_inode_dontcache()
> > Skip don't cache if mount over ride is active.
> >
> > Changes from v6:
> > Fix completely broken implementation and update commit message.
> > Use the new VFS layer I_DONTCACHE to facilitate inode eviction
> > and S_DAX changing on drop_caches
> >
> > Changes from v5:
> > New patch
> > ---
> > fs/xfs/xfs_ioctl.c | 108 +++++++++------------------------------------
> > 1 file changed, 20 insertions(+), 88 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 104495ac187c..ff474f2c9acf 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1245,64 +1245,26 @@ xfs_ioctl_setattr_xflags(
> > return 0;
> > }
> >
> > -/*
> > - * If we are changing DAX flags, we have to ensure the file is clean and any
> > - * cached objects in the address space are invalidated and removed. This
> > - * requires us to lock out other IO and page faults similar to a truncate
> > - * operation. The locks need to be held until the transaction has been committed
> > - * so that the cache invalidation is atomic with respect to the DAX flag
> > - * manipulation.
> > - */
> > -static int
> > -xfs_ioctl_setattr_dax_invalidate(
> > +static void
> > +xfs_ioctl_setattr_prepare_dax(
> > struct xfs_inode *ip,
> > - struct fsxattr *fa,
> > - int *join_flags)
> > + struct fsxattr *fa)
> > {
> > - struct inode *inode = VFS_I(ip);
> > - struct super_block *sb = inode->i_sb;
> > - int error;
> > -
> > - *join_flags = 0;
> > -
> > - /*
> > - * It is only valid to set the DAX flag on regular files and
> > - * directories on filesystems where the block size is equal to the page
> > - * size. On directories it serves as an inherited hint so we don't
> > - * have to check the device for dax support or flush pagecache.
> > - */
> > - if (fa->fsx_xflags & FS_XFLAG_DAX) {
> > - struct xfs_buftarg *target = xfs_inode_buftarg(ip);
> > -
> > - if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
> > - return -EINVAL;
> > - }
> > -
> > - /* If the DAX state is not changing, we have nothing to do here. */
> > - if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
> > - return 0;
> > - if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
> > - return 0;
> > + struct xfs_mount *mp = ip->i_mount;
> > + struct inode *inode = VFS_I(ip);
> >
> > if (S_ISDIR(inode->i_mode))
> > - return 0;
> > -
> > - /* lock, flush and invalidate mapping in preparation for flag change */
> > - xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > - error = filemap_write_and_wait(inode->i_mapping);
> > - if (error)
> > - goto out_unlock;
> > - error = invalidate_inode_pages2(inode->i_mapping);
> > - if (error)
> > - goto out_unlock;
> > -
> > - *join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
> > - return 0;
> > + return;
> >
> > -out_unlock:
> > - xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > - return error;
> > + if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
> > + (mp->m_flags & XFS_MOUNT_DAX_NEVER))
> > + return;
> >
> > + if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
> > + !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
> > + (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
> > + (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
> > + d_mark_dontcache(inode);

Now that I think about this further, are we /really/ sure that we want
to let unprivileged userspace cause inode evictions?

--D

> > }
> >
> > /*
> > @@ -1310,17 +1272,10 @@ xfs_ioctl_setattr_dax_invalidate(
> > * have permission to do so. On success, return a clean transaction and the
> > * inode locked exclusively ready for further operation specific checks. On
> > * failure, return an error without modifying or locking the inode.
> > - *
> > - * The inode might already be IO locked on call. If this is the case, it is
> > - * indicated in @join_flags and we take full responsibility for ensuring they
> > - * are unlocked from now on. Hence if we have an error here, we still have to
> > - * unlock them. Otherwise, once they are joined to the transaction, they will
> > - * be unlocked on commit/cancel.
> > */
> > static struct xfs_trans *
> > xfs_ioctl_setattr_get_trans(
> > - struct xfs_inode *ip,
> > - int join_flags)
> > + struct xfs_inode *ip)
> > {
> > struct xfs_mount *mp = ip->i_mount;
> > struct xfs_trans *tp;
> > @@ -1337,8 +1292,7 @@ xfs_ioctl_setattr_get_trans(
> > goto out_unlock;
> >
> > xfs_ilock(ip, XFS_ILOCK_EXCL);
> > - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
> > - join_flags = 0;
> > + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> >
> > /*
> > * CAP_FOWNER overrides the following restrictions:
> > @@ -1359,8 +1313,6 @@ xfs_ioctl_setattr_get_trans(
> > out_cancel:
> > xfs_trans_cancel(tp);
> > out_unlock:
> > - if (join_flags)
> > - xfs_iunlock(ip, join_flags);
> > return ERR_PTR(error);
> > }
> >
> > @@ -1486,7 +1438,6 @@ xfs_ioctl_setattr(
> > struct xfs_dquot *pdqp = NULL;
> > struct xfs_dquot *olddquot = NULL;
> > int code;
> > - int join_flags = 0;
> >
> > trace_xfs_ioctl_setattr(ip);
> >
> > @@ -1510,18 +1461,9 @@ xfs_ioctl_setattr(
> > return code;
> > }
> >
> > - /*
> > - * Changing DAX config may require inode locking for mapping
> > - * invalidation. These need to be held all the way to transaction commit
> > - * or cancel time, so need to be passed through to
> > - * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> > - * appropriately.
> > - */
> > - code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
> > - if (code)
> > - goto error_free_dquots;
> > + xfs_ioctl_setattr_prepare_dax(ip, fa);
> >
> > - tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> > + tp = xfs_ioctl_setattr_get_trans(ip);
> > if (IS_ERR(tp)) {
> > code = PTR_ERR(tp);
> > goto error_free_dquots;
> > @@ -1651,7 +1593,6 @@ xfs_ioc_setxflags(
> > struct fsxattr fa;
> > struct fsxattr old_fa;
> > unsigned int flags;
> > - int join_flags = 0;
> > int error;
> >
> > if (copy_from_user(&flags, arg, sizeof(flags)))
> > @@ -1668,18 +1609,9 @@ xfs_ioc_setxflags(
> > if (error)
> > return error;
> >
> > - /*
> > - * Changing DAX config may require inode locking for mapping
> > - * invalidation. These need to be held all the way to transaction commit
> > - * or cancel time, so need to be passed through to
> > - * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> > - * appropriately.
> > - */
> > - error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
> > - if (error)
> > - goto out_drop_write;
> > + xfs_ioctl_setattr_prepare_dax(ip, &fa);
> >
> > - tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> > + tp = xfs_ioctl_setattr_get_trans(ip);
> > if (IS_ERR(tp)) {
> > error = PTR_ERR(tp);
> > goto out_drop_write;
> > --
> > 2.25.1
> >

2020-06-02 23:17:11

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

On Tue, Jun 02, 2020 at 10:23:53AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 28, 2020 at 01:11:38PM -0700, Darrick J. Wong wrote:
> > On Mon, Apr 27, 2020 at 05:21:42PM -0700, [email protected] wrote:
> > > From: Ira Weiny <[email protected]>
> > >

...

> > > -out_unlock:
> > > - xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > > - return error;
> > > + if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
> > > + (mp->m_flags & XFS_MOUNT_DAX_NEVER))
> > > + return;
> > >
> > > + if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
> > > + !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
> > > + (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
> > > + (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
> > > + d_mark_dontcache(inode);
>
> Now that I think about this further, are we /really/ sure that we want
> to let unprivileged userspace cause inode evictions?

This code only applies to files they have access to. And it does not directly
cause an eviction. It only hints that those inodes (for which they have access
to) will not be cached thus causing them to be reloaded sooner than they might
otherwise be.

So I think we are fine here.

Ira

>
> --D
>

2020-06-03 17:06:23

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

On Wed, Jun 03, 2020 at 12:10:24PM +0200, Jan Kara wrote:
> On Tue 02-06-20 10:23:53, Darrick J. Wong wrote:
> > On Tue, Apr 28, 2020 at 01:11:38PM -0700, Darrick J. Wong wrote:
> > > > -out_unlock:
> > > > - xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > > > - return error;
> > > > + if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
> > > > + (mp->m_flags & XFS_MOUNT_DAX_NEVER))
> > > > + return;
> > > >
> > > > + if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
> > > > + !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
> > > > + (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
> > > > + (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
> > > > + d_mark_dontcache(inode);
> >
> > Now that I think about this further, are we /really/ sure that we want
> > to let unprivileged userspace cause inode evictions?
>
> You have to have an equivalent of write access to the file to be able to
> trigger d_mark_dontcache(). So you can e.g. delete it. Or you could
> fadvise / madvise regarding its page cache. I don't see the ability to push
> inode out of cache as stronger than the abilities you already have...

<nod> Ok. I just had one last bout of paranoia, but I think it'll be
fine. :)

--D

> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR