2020-04-21 19:18:52

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V9 00/11] XFS - Enable per-file/per-directory DAX operations V9

From: Ira Weiny <[email protected]>

Changes from V8:
Rebase to 5.7-rc2
Change ALWAYS/NEVER bits to be 26/27
Remove XFS_IDONTCACHE -> lift to I_DONTCACHE
use mark_inode_dontcache() in XFS
create xfs_dax_mode enum
use xfs signature styles
Change xfs_inode_enabe_dax() -> xfs_inode_should_enable()
Based on feedback to ext4 series
Fix locking of DCACHE_DONTCACHE
Change flag_inode_dontcache() -> mark_inode_dontcache()
Change xfs_ioctl_setattr_dax_invalidate() -> xfs_ioctl_dax_check_set_cache()
Documentation cleanups
Clean up all commit messages


At LSF/MM'19 [1] [2] we discussed applications that overestimate memory
consumption due to their inability to detect whether the kernel will
instantiate page cache for a file, and cases where a global dax enable via a
mount option is too coarse.

The following patch series enables the use of DAX on individual files and/or
directories on xfs, and lays some groundwork to do so in ext4. It further
enhances the dax mount option to be a tri-state of 'always', 'never', or
'iflag' (default). Furthermore, it maintians '-o dax' to be equivalent to '-o
dax=always'.

The insight at LSF/MM was to separate the per-mount or per-file "physical"
(FS_XFLAG_DAX) capability switch from an "effective" (S_DAX) attribute for the
file.

At LSF/MM we discussed the difficulties of switching the DAX state of a file
with active mappings / page cache. It was thought the races could be avoided
by limiting DAX state flips to 0-length files.

However, this turns out to not be true.[3][5] This is because address space
operations (a_ops) may be in use at any time the inode is referenced.

For this reason direct manipulation of the FS_XFLAG_DAX is allowed but the
operation of the file (S_DAX) is not immediately changed.

Details of when and how DAX state can be changed on a file is included in a
documentation patch.

It should be noted that FS_XFLAG_DAX inheritance is not shown in this patch set
as it was maintained from previous work on XFS. FS_XFLAG_DAX and it's
inheritance will need to be added to other file systems for user control.


[1] https://lwn.net/Articles/787973/
[2] https://lwn.net/Articles/787233/
[3] https://lkml.org/lkml/2019/10/20/96
[4] https://patchwork.kernel.org/patch/11310511/
[5] https://lore.kernel.org/lkml/[email protected]/


To: [email protected]
Cc: Al Viro <[email protected]>
Cc: "Darrick J. Wong" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "Theodore Y. Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Changes from V7:
Add DCACHE_DONTCACHE
If mount override don't worry about inode caching
Change mount flags to NEVER/ALWAYS
Clean up xfs_inode_enable_dax()
Clarify comments
Update documentation

Changes from V6:
Incorporate feedback on patches
Add ability to change FS_XFLAG_DAX on files at any time.
Add a don't cache flag to the VFS layer
Preserve internal XFS IDONTCACHE behavior for bulkstat
operations

Changes from V5:
* make dax mount option a tri-state
* Reject changes to FS_XFLAG_DAX for regular files
- Allow only on directories
* Update documentation

Changes from V4:
* Open code the aops lock rather than add it to the xfs_ilock()
subsystem (Darrick's comments were obsoleted by this change)
* Fix lkp build suggestions and bugs

Changes from V3:
* Remove global locking... :-D
* put back per inode locking and remove pre-mature optimizations
* Fix issues with Directories having IS_DAX() set
* Fix kernel crash issues reported by Jeff
* Add some clean up patches
* Consolidate diflags to iflags functions
* Update/add documentation
* Reorder/rename patches quite a bit

Changes from V2:

* Move i_dax_sem to be a global percpu_rw_sem rather than per inode
Internal discussions with Dan determined this would be easier,
just as performant, and slightly less overhead that having it
in the SB as suggested by Jan
* Fix locking order in comments and throughout code
* Change "mode" to "state" throughout commits
* Add CONFIG_FS_DAX wrapper to disable inode_[un]lock_state() when not
configured
* Add static branch for which is activated by a device which supports
DAX in XFS
* Change "lock/unlock" to up/down read/write as appropriate
Previous names were over simplified
* Update comments/documentation

* Remove the xfs specific lock to the vfs (global) layer.
* Fix i_dax_sem locking order and comments

* Move 'i_mapped' count from struct inode to struct address_space and
rename it to mmap_count
* Add inode_has_mappings() call

* Fix build issues
* Clean up syntax spacing and minor issues
* Update man page text for STATX_ATTR_DAX
* Add reviewed-by's
* Rebase to 5.6

Rename patch:
from: fs/xfs: Add lock/unlock state to xfs
to: fs/xfs: Add write DAX lock to xfs layer
Add patch:
fs/xfs: Clarify lockdep dependency for xfs_isilocked()
Drop patch:
fs/xfs: Fix truncate up


Ira Weiny (11):
fs/xfs: Remove unnecessary initialization of i_rwsem
fs: Remove unneeded IS_DAX() check in io_is_direct()
fs/stat: Define DAX statx attribute
fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS
fs/xfs: Make DAX mount option a tri-state
fs/xfs: Create function xfs_inode_should_enable_dax()
fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
fs: Lift XFS_IDONTCACNE to the VFS layer
fs: Introduce DCACHE_DONTCACHE
fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()
Documentation/dax: Update Usage section

Documentation/filesystems/dax.txt | 164 +++++++++++++++++++++++++++++-
drivers/block/loop.c | 6 +-
fs/dcache.c | 4 +
fs/inode.c | 15 +++
fs/stat.c | 3 +
fs/xfs/xfs_icache.c | 8 +-
fs/xfs/xfs_inode.h | 3 +-
fs/xfs/xfs_ioctl.c | 141 ++++---------------------
fs/xfs/xfs_iops.c | 72 ++++++++-----
fs/xfs/xfs_mount.h | 4 +-
fs/xfs/xfs_super.c | 55 ++++++++--
include/linux/dcache.h | 2 +
include/linux/fs.h | 14 +--
include/uapi/linux/stat.h | 1 +
14 files changed, 321 insertions(+), 171 deletions(-)

--
2.25.1


2020-04-21 19:19:32

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V9 11/11] Documentation/dax: Update Usage section

From: Ira Weiny <[email protected]>

Update the Usage section to reflect the new individual dax selection
functionality.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V8:
Updates from Darrick

Changes from V7:
Cleanups/clarifications from Darrick and Dan

Changes from V6:
Update to allow setting FS_XFLAG_DAX any time.
Update with list of behaviors from Darrick
https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/

Changes from V5:
Update to reflect the agreed upon semantics
https://lore.kernel.org/lkml/[email protected]/
---
Documentation/filesystems/dax.txt | 164 +++++++++++++++++++++++++++++-
1 file changed, 161 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 679729442fd2..8f4ab08be715 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -17,11 +17,169 @@ For file mappings, the storage device is mapped directly into userspace.
Usage
-----

-If you have a block device which supports DAX, you can make a filesystem
+If you have a block device which supports DAX, you can make a file system
on it as usual. The DAX code currently only supports files with a block
size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem. When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the file system.
+
+Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them
+is different.
+
+Enabling DAX on ext4 and ext2
+-----------------------------
+
+When mounting the filesystem, use the "-o dax" option on the command line or
+add 'dax' to the options in /etc/fstab. This works to enable DAX on all files
+within the filesystem. It is equivalent to the '-o dax=always' behavior below.
+
+
+Enabling DAX on xfs
+-------------------
+
+Summary
+-------
+
+ 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
+ the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details
+ about this access mode.
+
+ 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
+ inherited from the parent directory FS_XFLAG_DAX inode flag at file
+ creation time. This advisory flag can be set or cleared at any
+ time, but doing so does not immediately affect the S_DAX state.
+
+ Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
+ and the fs is on pmem then it will enable S_DAX at inode load time;
+ if FS_XFLAG_DAX is not set, it will not enable S_DAX.
+
+ 3. There exists a dax= mount option.
+
+ "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."
+
+ "-o dax=always" means "always set S_DAX (at least on pmem),
+ and ignore FS_XFLAG_DAX."
+
+ "-o dax" is an alias for "dax=always".
+
+ "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.
+
+ 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
+ be set or cleared at any time. The flag state is inherited by any files or
+ subdirectories when they are created within that directory.
+
+ 5. Programs that require a specific file access mode (DAX or not DAX)
+ can do one of the following:
+
+ (a) Create files in directories that the FS_XFLAG_DAX flag set as
+ needed; or
+
+ (b) Have the administrator set an override via mount option; or
+
+ (c) Set or clear the file's FS_XFLAG_DAX flag as needed. Programs
+ must then cause the kernel to evict the inode from memory. This
+ can be done by:
+
+ i> Closing the file and re-opening the file and using statx to
+ see if the fs has changed the S_DAX flag; and
+
+ ii> If the file still does not have the desired S_DAX access
+ mode, either unmount and remount the filesystem, or close
+ the file and use drop_caches.
+
+ 6. It is expected that users who want to squeeze every last bit of performance
+ out of the particular rough and tumble bits of their storage will also be
+ exposed to the difficulties of what happens when the operating system can't
+ totally virtualize those hardware capabilities. DAX is such a feature.
+
+
+Details
+-------
+
+There are 2 per-file dax flags. One is a physical inode setting (FS_XFLAG_DAX)
+and the other a currently enabled state (S_DAX).
+
+FS_XFLAG_DAX is maintained, on disk, on individual inodes. It is preserved
+within the file system. This 'physical' config setting can be set using an
+ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'". Files and
+directories automatically inherit FS_XFLAG_DAX from their parent directory
+_when_ _created_. Therefore, setting FS_XFLAG_DAX at directory creation time
+can be used to set a default behavior for an entire sub-tree. (Doing so on the
+root directory acts to set a default for the entire file system.)
+
+To clarify inheritance here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+ dax: a,e
+ no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io 'chattr +x' a
+mkdir -p a/b/c/d
+
+ dax: a,b,c,d
+ no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io 'chattr +x' c
+mkdir a/b/c/d
+
+ dax: c,d
+ no dax: a,b
+
+
+The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
+the underlying media support, the value of FS_XFLAG_DAX, and the file systems
+dax mount option setting. See below.
+
+statx can be used to query S_DAX. NOTE that a directory will never have S_DAX
+set and therefore statx will never indicate that S_DAX is set on directories.
+
+NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
+even if the underlying media does not support dax and/or the file system is
+overridden with a mount option.
+
+
+Overriding FS_XFLAG_DAX (dax= mount option)
+-------------------------------------------
+
+There exists a dax mount option. Using the mount option does not change the
+physical configured state of individual files but overrides the S_DAX operating
+state when inodes are loaded.
+
+Given underlying media support, the dax mount option is a tri-state option
+(never, always, inode) with the following meanings:
+
+ "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
+ "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
+ "-o dax" by itself means "dax=always" to remain compatible with older
+ kernels
+ "-o dax=inode" means "follow FS_XFLAG_DAX"
+
+The default state is 'inode'. Given underlying media support, the following
+algorithm is used to determine the effective mode of the file S_DAX on a
+capable device.
+
+ S_DAX = FS_XFLAG_DAX;
+
+ if (dax_mount == "always")
+ S_DAX = true;
+ else if (dax_mount == "off"
+ S_DAX = false;
+
+To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
+while the file system is mounted with a dax override. However, in-core inode
+state (S_DAX) will continue to be overridden until the filesystem is remounted
+with dax=inode and the inode is evicted."


Implementation Tips for Block Driver Writers
--
2.25.1

2020-04-21 19:19:34

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V9 10/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

From: Ira Weiny <[email protected]>

Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed
setting of S_DAX, data invalidation no longer needs to happen when
FS_XFLAG_DAX is changed.

Change xfs_ioctl_setattr_dax_invalidate() to be
xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new
functionality.

Furthermore, we no longer need the locking so we remove the join_flags
logic.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V8:
Change name of function to xfs_ioctl_dax_check_set_cache()
Update commit message
Fix bit manipulations

Changes from V7:
Use new flag_inode_dontcache()
Skip don't cache if mount over ride is active.

Changes from v6:
Fix completely broken implementation and update commit message.
Use the new VFS layer I_DONTCACHE to facilitate inode eviction
and S_DAX changing on drop_caches

Changes from v5:
New patch
---
fs/xfs/xfs_ioctl.c | 108 +++++++++------------------------------------
1 file changed, 20 insertions(+), 88 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 104495ac187c..b87b571a6748 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1245,64 +1245,26 @@ xfs_ioctl_setattr_xflags(
return 0;
}

-/*
- * If we are changing DAX flags, we have to ensure the file is clean and any
- * cached objects in the address space are invalidated and removed. This
- * requires us to lock out other IO and page faults similar to a truncate
- * operation. The locks need to be held until the transaction has been committed
- * so that the cache invalidation is atomic with respect to the DAX flag
- * manipulation.
- */
-static int
-xfs_ioctl_setattr_dax_invalidate(
+static void
+xfs_ioctl_dax_check_set_cache(
struct xfs_inode *ip,
- struct fsxattr *fa,
- int *join_flags)
+ struct fsxattr *fa)
{
- struct inode *inode = VFS_I(ip);
- struct super_block *sb = inode->i_sb;
- int error;
-
- *join_flags = 0;
-
- /*
- * It is only valid to set the DAX flag on regular files and
- * directories on filesystems where the block size is equal to the page
- * size. On directories it serves as an inherited hint so we don't
- * have to check the device for dax support or flush pagecache.
- */
- if (fa->fsx_xflags & FS_XFLAG_DAX) {
- struct xfs_buftarg *target = xfs_inode_buftarg(ip);
-
- if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
- return -EINVAL;
- }
-
- /* If the DAX state is not changing, we have nothing to do here. */
- if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
- return 0;
- if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
- return 0;
+ struct xfs_mount *mp = ip->i_mount;
+ struct inode *inode = VFS_I(ip);

if (S_ISDIR(inode->i_mode))
- return 0;
-
- /* lock, flush and invalidate mapping in preparation for flag change */
- xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
- error = filemap_write_and_wait(inode->i_mapping);
- if (error)
- goto out_unlock;
- error = invalidate_inode_pages2(inode->i_mapping);
- if (error)
- goto out_unlock;
-
- *join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
- return 0;
+ return;

-out_unlock:
- xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
- return error;
+ if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
+ (mp->m_flags & XFS_MOUNT_DAX_NEVER))
+ return;

+ if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
+ !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
+ (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
+ (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
+ mark_inode_dontcache(inode);
}

/*
@@ -1310,17 +1272,10 @@ xfs_ioctl_setattr_dax_invalidate(
* have permission to do so. On success, return a clean transaction and the
* inode locked exclusively ready for further operation specific checks. On
* failure, return an error without modifying or locking the inode.
- *
- * The inode might already be IO locked on call. If this is the case, it is
- * indicated in @join_flags and we take full responsibility for ensuring they
- * are unlocked from now on. Hence if we have an error here, we still have to
- * unlock them. Otherwise, once they are joined to the transaction, they will
- * be unlocked on commit/cancel.
*/
static struct xfs_trans *
xfs_ioctl_setattr_get_trans(
- struct xfs_inode *ip,
- int join_flags)
+ struct xfs_inode *ip)
{
struct xfs_mount *mp = ip->i_mount;
struct xfs_trans *tp;
@@ -1337,8 +1292,7 @@ xfs_ioctl_setattr_get_trans(
goto out_unlock;

xfs_ilock(ip, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
- join_flags = 0;
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);

/*
* CAP_FOWNER overrides the following restrictions:
@@ -1359,8 +1313,6 @@ xfs_ioctl_setattr_get_trans(
out_cancel:
xfs_trans_cancel(tp);
out_unlock:
- if (join_flags)
- xfs_iunlock(ip, join_flags);
return ERR_PTR(error);
}

@@ -1486,7 +1438,6 @@ xfs_ioctl_setattr(
struct xfs_dquot *pdqp = NULL;
struct xfs_dquot *olddquot = NULL;
int code;
- int join_flags = 0;

trace_xfs_ioctl_setattr(ip);

@@ -1510,18 +1461,9 @@ xfs_ioctl_setattr(
return code;
}

- /*
- * Changing DAX config may require inode locking for mapping
- * invalidation. These need to be held all the way to transaction commit
- * or cancel time, so need to be passed through to
- * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
- * appropriately.
- */
- code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
- if (code)
- goto error_free_dquots;
+ xfs_ioctl_dax_check_set_cache(ip, fa);

- tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+ tp = xfs_ioctl_setattr_get_trans(ip);
if (IS_ERR(tp)) {
code = PTR_ERR(tp);
goto error_free_dquots;
@@ -1651,7 +1593,6 @@ xfs_ioc_setxflags(
struct fsxattr fa;
struct fsxattr old_fa;
unsigned int flags;
- int join_flags = 0;
int error;

if (copy_from_user(&flags, arg, sizeof(flags)))
@@ -1668,18 +1609,9 @@ xfs_ioc_setxflags(
if (error)
return error;

- /*
- * Changing DAX config may require inode locking for mapping
- * invalidation. These need to be held all the way to transaction commit
- * or cancel time, so need to be passed through to
- * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
- * appropriately.
- */
- error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
- if (error)
- goto out_drop_write;
+ xfs_ioctl_dax_check_set_cache(ip, &fa);

- tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+ tp = xfs_ioctl_setattr_get_trans(ip);
if (IS_ERR(tp)) {
error = PTR_ERR(tp);
goto out_drop_write;
--
2.25.1

2020-04-21 19:19:39

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V9 09/11] fs: Introduce DCACHE_DONTCACHE

From: Ira Weiny <[email protected]>

DCACHE_DONTCACHE indicates a dentry should not be cached on final
dput().

Also add a helper function to mark DCACHE_DONTCACHE on all dentries
pointing to a specific inode when that inode is being set I_DONTCACHE.

This facilitates dropping dentry references to inodes sooner which
require eviction to swap S_DAX mode.

Cc: Al Viro <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V8:
Update commit message
Use mark_inode_dontcache in XFS
Fix locking... can't use rcu here.
Change name to mark_inode_dontcache
---
fs/dcache.c | 4 ++++
fs/inode.c | 15 +++++++++++++++
fs/xfs/xfs_icache.c | 2 +-
include/linux/dcache.h | 2 ++
include/linux/fs.h | 1 +
5 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b280e07e162b..0030fabab2c4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry)
if (dentry->d_op->d_delete(dentry))
return false;
}
+
+ if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
+ return false;
+
/* retain; LRU fodder */
dentry->d_lockref.count--;
if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
diff --git a/fs/inode.c b/fs/inode.c
index 93d9252a00ab..da7f3c4926cd 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1526,6 +1526,21 @@ int generic_delete_inode(struct inode *inode)
}
EXPORT_SYMBOL(generic_delete_inode);

+void mark_inode_dontcache(struct inode *inode)
+{
+ struct dentry *de;
+
+ spin_lock(&inode->i_lock);
+ hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
+ spin_lock(&de->d_lock);
+ de->d_flags |= DCACHE_DONTCACHE;
+ spin_unlock(&de->d_lock);
+ }
+ spin_unlock(&inode->i_lock);
+ inode->i_state |= I_DONTCACHE;
+}
+EXPORT_SYMBOL(mark_inode_dontcache);
+
/*
* Called when we're dropping the last reference
* to an inode.
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index de76f7f60695..3c8f44477804 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -559,7 +559,7 @@ xfs_iget_cache_miss(
*/
iflags = XFS_INEW;
if (flags & XFS_IGET_DONTCACHE)
- VFS_I(ip)->i_state |= I_DONTCACHE;
+ mark_inode_dontcache(VFS_I(ip));
ip->i_udquot = NULL;
ip->i_gdquot = NULL;
ip->i_pdquot = NULL;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c1488cc84fd9..56b1482d9223 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -177,6 +177,8 @@ struct dentry_operations {

#define DCACHE_REFERENCED 0x00000040 /* Recently used, don't discard. */

+#define DCACHE_DONTCACHE 0x00000080 /* don't cache on final dput() */
+
#define DCACHE_CANT_MOUNT 0x00000100
#define DCACHE_GENOCIDE 0x00000200
#define DCACHE_SHRINK_LIST 0x00000400
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 44bd45af760f..064168ec2e0b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3055,6 +3055,7 @@ static inline int generic_drop_inode(struct inode *inode)
return !inode->i_nlink || inode_unhashed(inode) ||
(inode->i_state & I_DONTCACHE);
}
+extern void mark_inode_dontcache(struct inode *inode);

extern struct inode *ilookup5_nowait(struct super_block *sb,
unsigned long hashval, int (*test)(struct inode *, void *),
--
2.25.1

2020-04-21 19:20:14

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V9 07/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()

From: Ira Weiny <[email protected]>

The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are
nearly identical. The only difference is that *_to_linux() is called after
inode setup and disallows changing the DAX flag.

Combining them can be done with a flag which indicates if this is the initial
setup to allow the DAX flag to be properly set only at init time.

So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
directly.

While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
use xfs_ip2xflags() to ensure future diflags are included correctly.

Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V7:
Clarify with a comment the reason for leaving S_DAX out of the
mask

Changes from V6:
Move unrelated hunk to previous patch.
Change logic for better code generation.

Changes from V5:
The functions are no longer identical so we can only combine
them rather than deleting one completely. This is reflected in
the new init parameter.
---
fs/xfs/xfs_inode.h | 1 +
fs/xfs/xfs_ioctl.c | 33 +--------------------------------
fs/xfs/xfs_iops.c | 46 +++++++++++++++++++++++++++-------------------
3 files changed, 29 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index c6a63f6764a6..83073c883fbf 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -467,6 +467,7 @@ int xfs_break_layouts(struct inode *inode, uint *iolock,
/* from xfs_iops.c */
extern void xfs_setup_inode(struct xfs_inode *ip);
extern void xfs_setup_iops(struct xfs_inode *ip);
+extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init);

/*
* When setting up a newly allocated inode, we need to call
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 309958186d33..104495ac187c 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1201,37 +1201,6 @@ xfs_flags2diflags2(
return di_flags2;
}

-STATIC void
-xfs_diflags_to_linux(
- struct xfs_inode *ip)
-{
- struct inode *inode = VFS_I(ip);
- unsigned int xflags = xfs_ip2xflags(ip);
-
- if (xflags & FS_XFLAG_IMMUTABLE)
- inode->i_flags |= S_IMMUTABLE;
- else
- inode->i_flags &= ~S_IMMUTABLE;
- if (xflags & FS_XFLAG_APPEND)
- inode->i_flags |= S_APPEND;
- else
- inode->i_flags &= ~S_APPEND;
- if (xflags & FS_XFLAG_SYNC)
- inode->i_flags |= S_SYNC;
- else
- inode->i_flags &= ~S_SYNC;
- if (xflags & FS_XFLAG_NOATIME)
- inode->i_flags |= S_NOATIME;
- else
- inode->i_flags &= ~S_NOATIME;
-#if 0 /* disabled until the flag switching races are sorted out */
- if (xflags & FS_XFLAG_DAX)
- inode->i_flags |= S_DAX;
- else
- inode->i_flags &= ~S_DAX;
-#endif
-}
-
static int
xfs_ioctl_setattr_xflags(
struct xfs_trans *tp,
@@ -1269,7 +1238,7 @@ xfs_ioctl_setattr_xflags(
ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags);
ip->i_d.di_flags2 = di_flags2;

- xfs_diflags_to_linux(ip);
+ xfs_diflags_to_iflags(ip, false);
xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
XFS_STATS_INC(mp, xs_ig_attrchg);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1814f10e43d3..b70b735fe4a4 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1276,26 +1276,34 @@ xfs_inode_should_enable_dax(
return false;
}

-STATIC void
+void
xfs_diflags_to_iflags(
- struct inode *inode,
- struct xfs_inode *ip)
+ struct xfs_inode *ip,
+ bool init)
{
- uint16_t flags = ip->i_d.di_flags;
-
- inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
- S_NOATIME | S_DAX);
-
- if (flags & XFS_DIFLAG_IMMUTABLE)
- inode->i_flags |= S_IMMUTABLE;
- if (flags & XFS_DIFLAG_APPEND)
- inode->i_flags |= S_APPEND;
- if (flags & XFS_DIFLAG_SYNC)
- inode->i_flags |= S_SYNC;
- if (flags & XFS_DIFLAG_NOATIME)
- inode->i_flags |= S_NOATIME;
- if (xfs_inode_should_enable_dax(ip))
- inode->i_flags |= S_DAX;
+ struct inode *inode = VFS_I(ip);
+ unsigned int xflags = xfs_ip2xflags(ip);
+ unsigned int flags = 0;
+
+ ASSERT(!(IS_DAX(inode) && init));
+
+ if (xflags & FS_XFLAG_IMMUTABLE)
+ flags |= S_IMMUTABLE;
+ if (xflags & FS_XFLAG_APPEND)
+ flags |= S_APPEND;
+ if (xflags & FS_XFLAG_SYNC)
+ flags |= S_SYNC;
+ if (xflags & FS_XFLAG_NOATIME)
+ flags |= S_NOATIME;
+ if (init && xfs_inode_should_enable_dax(ip))
+ flags |= S_DAX;
+
+ /*
+ * S_DAX can only be set during inode initialization and is never set by
+ * the VFS, so we cannot mask off S_DAX in i_flags.
+ */
+ inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME);
+ inode->i_flags |= flags;
}

/*
@@ -1321,7 +1329,7 @@ xfs_setup_inode(
inode_fake_hash(inode);

i_size_write(inode, ip->i_d.di_size);
- xfs_diflags_to_iflags(inode, ip);
+ xfs_diflags_to_iflags(ip, true);

if (S_ISDIR(inode->i_mode)) {
/*
--
2.25.1

2020-04-21 19:20:20

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V9 05/11] fs/xfs: Make DAX mount option a tri-state

From: Ira Weiny <[email protected]>

As agreed upon[1]. We make the dax mount option a tri-state. '-o dax'
continues to operate the same. We add 'always', 'never', and 'inode'
(default).

[1] https://lore.kernel.org/lkml/[email protected]/

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from v8:
Move NEVER bit to 27
Use xfs signature style
use xfs_dax_mode enum

Changes from v7:
Change to XFS_MOUNT_DAX_NEVER

Changes from v6:
Use 2 flag bits rather than a field.
change iflag to inode

Changes from v5:
New Patch
---
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_super.c | 51 ++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index f6123fb0113c..37bfb50db809 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -238,6 +238,7 @@ typedef struct xfs_mount {
allocator */
#define XFS_MOUNT_NOATTR2 (1ULL << 25) /* disable use of attr2 format */
#define XFS_MOUNT_DAX_ALWAYS (1ULL << 26)
+#define XFS_MOUNT_DAX_NEVER (1ULL << 27)

/*
* Max and min values for mount-option defined I/O
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index ce169d1c7474..0d0f74786799 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -47,6 +47,39 @@ static struct kset *xfs_kset; /* top-level xfs sysfs dir */
static struct xfs_kobj xfs_dbg_kobj; /* global debug sysfs attrs */
#endif

+enum xfs_dax_mode {
+ XFS_DAX_INODE = 0,
+ XFS_DAX_ALWAYS = 1,
+ XFS_DAX_NEVER = 2,
+};
+
+static void
+xfs_mount_set_dax_mode(
+ struct xfs_mount *mp,
+ enum xfs_dax_mode mode)
+{
+ switch (mode) {
+ case XFS_DAX_INODE:
+ mp->m_flags &= ~(XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER);
+ break;
+ case XFS_DAX_ALWAYS:
+ mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
+ mp->m_flags &= ~XFS_MOUNT_DAX_NEVER;
+ break;
+ case XFS_DAX_NEVER:
+ mp->m_flags |= XFS_MOUNT_DAX_NEVER;
+ mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
+ break;
+ }
+}
+
+static const struct constant_table dax_param_enums[] = {
+ {"inode", XFS_DAX_INODE },
+ {"always", XFS_DAX_ALWAYS },
+ {"never", XFS_DAX_NEVER },
+ {}
+};
+
/*
* Table driven mount option parser.
*/
@@ -59,7 +92,7 @@ enum {
Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
- Opt_discard, Opt_nodiscard, Opt_dax,
+ Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
};

static const struct fs_parameter_spec xfs_fs_parameters[] = {
@@ -103,6 +136,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
fsparam_flag("discard", Opt_discard),
fsparam_flag("nodiscard", Opt_nodiscard),
fsparam_flag("dax", Opt_dax),
+ fsparam_enum("dax", Opt_dax_enum, dax_param_enums),
{}
};

@@ -129,7 +163,6 @@ xfs_fs_show_options(
{ XFS_MOUNT_GRPID, ",grpid" },
{ XFS_MOUNT_DISCARD, ",discard" },
{ XFS_MOUNT_LARGEIO, ",largeio" },
- { XFS_MOUNT_DAX_ALWAYS, ",dax" },
{ 0, NULL }
};
struct xfs_mount *mp = XFS_M(root->d_sb);
@@ -185,6 +218,13 @@ xfs_fs_show_options(
if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
seq_puts(m, ",noquota");

+ if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS)
+ seq_puts(m, ",dax=always");
+ else if (mp->m_flags & XFS_MOUNT_DAX_NEVER)
+ seq_puts(m, ",dax=never");
+ else
+ seq_puts(m, ",dax=inode");
+
return 0;
}

@@ -1261,7 +1301,10 @@ xfs_fc_parse_param(
return 0;
#ifdef CONFIG_FS_DAX
case Opt_dax:
- mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
+ xfs_mount_set_dax_mode(mp, XFS_DAX_ALWAYS);
+ return 0;
+ case Opt_dax_enum:
+ xfs_mount_set_dax_mode(mp, result.uint_32);
return 0;
#endif
default:
@@ -1468,7 +1511,7 @@ xfs_fc_fill_super(
if (!rtdev_is_dax && !datadev_is_dax) {
xfs_alert(mp,
"DAX unsupported by block device. Turning off DAX.");
- mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
+ xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
}
if (xfs_sb_version_hasreflink(&mp->m_sb)) {
xfs_alert(mp,
--
2.25.1

2020-04-21 20:18:09

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH V9 05/11] fs/xfs: Make DAX mount option a tri-state

On Tue, Apr 21, 2020 at 12:17:47PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> As agreed upon[1]. We make the dax mount option a tri-state. '-o dax'
> continues to operate the same. We add 'always', 'never', and 'inode'
> (default).
>
> [1] https://lore.kernel.org/lkml/[email protected]/
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes from v8:
> Move NEVER bit to 27
> Use xfs signature style
> use xfs_dax_mode enum
>
> Changes from v7:
> Change to XFS_MOUNT_DAX_NEVER
>
> Changes from v6:
> Use 2 flag bits rather than a field.
> change iflag to inode
>
> Changes from v5:
> New Patch
> ---
> fs/xfs/xfs_mount.h | 1 +
> fs/xfs/xfs_super.c | 51 ++++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 48 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index f6123fb0113c..37bfb50db809 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -238,6 +238,7 @@ typedef struct xfs_mount {
> allocator */
> #define XFS_MOUNT_NOATTR2 (1ULL << 25) /* disable use of attr2 format */
> #define XFS_MOUNT_DAX_ALWAYS (1ULL << 26)
> +#define XFS_MOUNT_DAX_NEVER (1ULL << 27)
>
> /*
> * Max and min values for mount-option defined I/O
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index ce169d1c7474..0d0f74786799 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -47,6 +47,39 @@ static struct kset *xfs_kset; /* top-level xfs sysfs dir */
> static struct xfs_kobj xfs_dbg_kobj; /* global debug sysfs attrs */
> #endif
>
> +enum xfs_dax_mode {
> + XFS_DAX_INODE = 0,
> + XFS_DAX_ALWAYS = 1,
> + XFS_DAX_NEVER = 2,
> +};
> +
> +static void
> +xfs_mount_set_dax_mode(
> + struct xfs_mount *mp,
> + enum xfs_dax_mode mode)

Indent between the type name and the variable name, please.

static void
xfs_mount_set_dax_mode(
struct xfs_mount *mp,
enum xfs_dax_mode mode)

> +{
> + switch (mode) {
> + case XFS_DAX_INODE:
> + mp->m_flags &= ~(XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER);
> + break;
> + case XFS_DAX_ALWAYS:
> + mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
> + mp->m_flags &= ~XFS_MOUNT_DAX_NEVER;
> + break;
> + case XFS_DAX_NEVER:
> + mp->m_flags |= XFS_MOUNT_DAX_NEVER;
> + mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
> + break;

You can skip one level of indentation, e.g.

switch (mode) {
case XFS_DAX_INODE:
mp->m_flags &= ~(XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER);
break;

> + }
> +}
> +
> +static const struct constant_table dax_param_enums[] = {
> + {"inode", XFS_DAX_INODE },
> + {"always", XFS_DAX_ALWAYS },
> + {"never", XFS_DAX_NEVER },
> + {}
> +};
> +
> /*
> * Table driven mount option parser.
> */
> @@ -59,7 +92,7 @@ enum {
> Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
> Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
> Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
> - Opt_discard, Opt_nodiscard, Opt_dax,
> + Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
> };
>
> static const struct fs_parameter_spec xfs_fs_parameters[] = {
> @@ -103,6 +136,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
> fsparam_flag("discard", Opt_discard),
> fsparam_flag("nodiscard", Opt_nodiscard),
> fsparam_flag("dax", Opt_dax),
> + fsparam_enum("dax", Opt_dax_enum, dax_param_enums),
> {}
> };
>
> @@ -129,7 +163,6 @@ xfs_fs_show_options(
> { XFS_MOUNT_GRPID, ",grpid" },
> { XFS_MOUNT_DISCARD, ",discard" },
> { XFS_MOUNT_LARGEIO, ",largeio" },
> - { XFS_MOUNT_DAX_ALWAYS, ",dax" },
> { 0, NULL }
> };
> struct xfs_mount *mp = XFS_M(root->d_sb);
> @@ -185,6 +218,13 @@ xfs_fs_show_options(
> if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
> seq_puts(m, ",noquota");
>
> + if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS)
> + seq_puts(m, ",dax=always");
> + else if (mp->m_flags & XFS_MOUNT_DAX_NEVER)
> + seq_puts(m, ",dax=never");
> + else
> + seq_puts(m, ",dax=inode");

No need to report dax=inode, since that's the default.

--D

> +
> return 0;
> }
>
> @@ -1261,7 +1301,10 @@ xfs_fc_parse_param(
> return 0;
> #ifdef CONFIG_FS_DAX
> case Opt_dax:
> - mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
> + xfs_mount_set_dax_mode(mp, XFS_DAX_ALWAYS);
> + return 0;
> + case Opt_dax_enum:
> + xfs_mount_set_dax_mode(mp, result.uint_32);
> return 0;
> #endif
> default:
> @@ -1468,7 +1511,7 @@ xfs_fc_fill_super(
> if (!rtdev_is_dax && !datadev_is_dax) {
> xfs_alert(mp,
> "DAX unsupported by block device. Turning off DAX.");
> - mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
> + xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
> }
> if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> xfs_alert(mp,
> --
> 2.25.1
>

2020-04-21 20:26:07

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH V9 09/11] fs: Introduce DCACHE_DONTCACHE

On Tue, Apr 21, 2020 at 12:17:51PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> DCACHE_DONTCACHE indicates a dentry should not be cached on final
> dput().
>
> Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> pointing to a specific inode when that inode is being set I_DONTCACHE.
>
> This facilitates dropping dentry references to inodes sooner which
> require eviction to swap S_DAX mode.
>
> Cc: Al Viro <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes from V8:
> Update commit message
> Use mark_inode_dontcache in XFS
> Fix locking... can't use rcu here.
> Change name to mark_inode_dontcache
> ---
> fs/dcache.c | 4 ++++
> fs/inode.c | 15 +++++++++++++++
> fs/xfs/xfs_icache.c | 2 +-
> include/linux/dcache.h | 2 ++
> include/linux/fs.h | 1 +
> 5 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index b280e07e162b..0030fabab2c4 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry)
> if (dentry->d_op->d_delete(dentry))
> return false;
> }
> +
> + if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
> + return false;
> +
> /* retain; LRU fodder */
> dentry->d_lockref.count--;
> if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
> diff --git a/fs/inode.c b/fs/inode.c
> index 93d9252a00ab..da7f3c4926cd 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1526,6 +1526,21 @@ int generic_delete_inode(struct inode *inode)
> }
> EXPORT_SYMBOL(generic_delete_inode);
>
> +void mark_inode_dontcache(struct inode *inode)
> +{
> + struct dentry *de;
> +
> + spin_lock(&inode->i_lock);
> + hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
> + spin_lock(&de->d_lock);
> + de->d_flags |= DCACHE_DONTCACHE;
> + spin_unlock(&de->d_lock);
> + }
> + spin_unlock(&inode->i_lock);
> + inode->i_state |= I_DONTCACHE;
> +}
> +EXPORT_SYMBOL(mark_inode_dontcache);
> +
> /*
> * Called when we're dropping the last reference
> * to an inode.
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index de76f7f60695..3c8f44477804 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -559,7 +559,7 @@ xfs_iget_cache_miss(
> */
> iflags = XFS_INEW;
> if (flags & XFS_IGET_DONTCACHE)
> - VFS_I(ip)->i_state |= I_DONTCACHE;
> + mark_inode_dontcache(VFS_I(ip));
> ip->i_udquot = NULL;
> ip->i_gdquot = NULL;
> ip->i_pdquot = NULL;
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index c1488cc84fd9..56b1482d9223 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -177,6 +177,8 @@ struct dentry_operations {
>
> #define DCACHE_REFERENCED 0x00000040 /* Recently used, don't discard. */
>
> +#define DCACHE_DONTCACHE 0x00000080 /* don't cache on final dput() */

"Purge from memory on final dput()"?

--D

> +
> #define DCACHE_CANT_MOUNT 0x00000100
> #define DCACHE_GENOCIDE 0x00000200
> #define DCACHE_SHRINK_LIST 0x00000400
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 44bd45af760f..064168ec2e0b 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3055,6 +3055,7 @@ static inline int generic_drop_inode(struct inode *inode)
> return !inode->i_nlink || inode_unhashed(inode) ||
> (inode->i_state & I_DONTCACHE);
> }
> +extern void mark_inode_dontcache(struct inode *inode);
>
> extern struct inode *ilookup5_nowait(struct super_block *sb,
> unsigned long hashval, int (*test)(struct inode *, void *),
> --
> 2.25.1
>

2020-04-21 20:33:46

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH V9 10/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

On Tue, Apr 21, 2020 at 12:17:52PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed
> setting of S_DAX, data invalidation no longer needs to happen when
> FS_XFLAG_DAX is changed.
>
> Change xfs_ioctl_setattr_dax_invalidate() to be
> xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new
> functionality.
>
> Furthermore, we no longer need the locking so we remove the join_flags
> logic.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes from V8:
> Change name of function to xfs_ioctl_dax_check_set_cache()
> Update commit message
> Fix bit manipulations
>
> Changes from V7:
> Use new flag_inode_dontcache()
> Skip don't cache if mount over ride is active.
>
> Changes from v6:
> Fix completely broken implementation and update commit message.
> Use the new VFS layer I_DONTCACHE to facilitate inode eviction
> and S_DAX changing on drop_caches
>
> Changes from v5:
> New patch
> ---
> fs/xfs/xfs_ioctl.c | 108 +++++++++------------------------------------
> 1 file changed, 20 insertions(+), 88 deletions(-)
>
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 104495ac187c..b87b571a6748 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1245,64 +1245,26 @@ xfs_ioctl_setattr_xflags(
> return 0;
> }
>
> -/*
> - * If we are changing DAX flags, we have to ensure the file is clean and any
> - * cached objects in the address space are invalidated and removed. This
> - * requires us to lock out other IO and page faults similar to a truncate
> - * operation. The locks need to be held until the transaction has been committed
> - * so that the cache invalidation is atomic with respect to the DAX flag
> - * manipulation.
> - */
> -static int
> -xfs_ioctl_setattr_dax_invalidate(
> +static void
> +xfs_ioctl_dax_check_set_cache(

That's a ... strange name. Set cache on what?

Oh, this is the function that sets I_DONTCACHE if an FS_XFLAG_DAX change
could have an immediate effect on S_DAX (assuming no other users). What
do you think of the following?

/*
* If a change to FS_XFLAG_DAX will result in an change to S_DAX
* the next time the incore inode is initialized, set the VFS
* I_DONTCACHE flag to try to hurry that along.
*/
static void
xfs_ioctl_try_change_vfs_dax(...)

--D

> struct xfs_inode *ip,
> - struct fsxattr *fa,
> - int *join_flags)
> + struct fsxattr *fa)
> {
> - struct inode *inode = VFS_I(ip);
> - struct super_block *sb = inode->i_sb;
> - int error;
> -
> - *join_flags = 0;
> -
> - /*
> - * It is only valid to set the DAX flag on regular files and
> - * directories on filesystems where the block size is equal to the page
> - * size. On directories it serves as an inherited hint so we don't
> - * have to check the device for dax support or flush pagecache.
> - */
> - if (fa->fsx_xflags & FS_XFLAG_DAX) {
> - struct xfs_buftarg *target = xfs_inode_buftarg(ip);
> -
> - if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
> - return -EINVAL;
> - }
> -
> - /* If the DAX state is not changing, we have nothing to do here. */
> - if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
> - return 0;
> - if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
> - return 0;
> + struct xfs_mount *mp = ip->i_mount;
> + struct inode *inode = VFS_I(ip);
>
> if (S_ISDIR(inode->i_mode))
> - return 0;
> -
> - /* lock, flush and invalidate mapping in preparation for flag change */
> - xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> - error = filemap_write_and_wait(inode->i_mapping);
> - if (error)
> - goto out_unlock;
> - error = invalidate_inode_pages2(inode->i_mapping);
> - if (error)
> - goto out_unlock;
> -
> - *join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
> - return 0;
> + return;
>
> -out_unlock:
> - xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> - return error;
> + if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
> + (mp->m_flags & XFS_MOUNT_DAX_NEVER))
> + return;
>
> + if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
> + !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
> + (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
> + (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
> + mark_inode_dontcache(inode);
> }
>
> /*
> @@ -1310,17 +1272,10 @@ xfs_ioctl_setattr_dax_invalidate(
> * have permission to do so. On success, return a clean transaction and the
> * inode locked exclusively ready for further operation specific checks. On
> * failure, return an error without modifying or locking the inode.
> - *
> - * The inode might already be IO locked on call. If this is the case, it is
> - * indicated in @join_flags and we take full responsibility for ensuring they
> - * are unlocked from now on. Hence if we have an error here, we still have to
> - * unlock them. Otherwise, once they are joined to the transaction, they will
> - * be unlocked on commit/cancel.
> */
> static struct xfs_trans *
> xfs_ioctl_setattr_get_trans(
> - struct xfs_inode *ip,
> - int join_flags)
> + struct xfs_inode *ip)
> {
> struct xfs_mount *mp = ip->i_mount;
> struct xfs_trans *tp;
> @@ -1337,8 +1292,7 @@ xfs_ioctl_setattr_get_trans(
> goto out_unlock;
>
> xfs_ilock(ip, XFS_ILOCK_EXCL);
> - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
> - join_flags = 0;
> + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
>
> /*
> * CAP_FOWNER overrides the following restrictions:
> @@ -1359,8 +1313,6 @@ xfs_ioctl_setattr_get_trans(
> out_cancel:
> xfs_trans_cancel(tp);
> out_unlock:
> - if (join_flags)
> - xfs_iunlock(ip, join_flags);
> return ERR_PTR(error);
> }
>
> @@ -1486,7 +1438,6 @@ xfs_ioctl_setattr(
> struct xfs_dquot *pdqp = NULL;
> struct xfs_dquot *olddquot = NULL;
> int code;
> - int join_flags = 0;
>
> trace_xfs_ioctl_setattr(ip);
>
> @@ -1510,18 +1461,9 @@ xfs_ioctl_setattr(
> return code;
> }
>
> - /*
> - * Changing DAX config may require inode locking for mapping
> - * invalidation. These need to be held all the way to transaction commit
> - * or cancel time, so need to be passed through to
> - * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> - * appropriately.
> - */
> - code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
> - if (code)
> - goto error_free_dquots;
> + xfs_ioctl_dax_check_set_cache(ip, fa);
>
> - tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> + tp = xfs_ioctl_setattr_get_trans(ip);
> if (IS_ERR(tp)) {
> code = PTR_ERR(tp);
> goto error_free_dquots;
> @@ -1651,7 +1593,6 @@ xfs_ioc_setxflags(
> struct fsxattr fa;
> struct fsxattr old_fa;
> unsigned int flags;
> - int join_flags = 0;
> int error;
>
> if (copy_from_user(&flags, arg, sizeof(flags)))
> @@ -1668,18 +1609,9 @@ xfs_ioc_setxflags(
> if (error)
> return error;
>
> - /*
> - * Changing DAX config may require inode locking for mapping
> - * invalidation. These need to be held all the way to transaction commit
> - * or cancel time, so need to be passed through to
> - * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> - * appropriately.
> - */
> - error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
> - if (error)
> - goto out_drop_write;
> + xfs_ioctl_dax_check_set_cache(ip, &fa);
>
> - tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> + tp = xfs_ioctl_setattr_get_trans(ip);
> if (IS_ERR(tp)) {
> error = PTR_ERR(tp);
> goto out_drop_write;
> --
> 2.25.1
>

2020-04-21 20:35:10

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH V9 11/11] Documentation/dax: Update Usage section

On Tue, Apr 21, 2020 at 12:17:53PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> Update the Usage section to reflect the new individual dax selection
> functionality.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes from V8:
> Updates from Darrick
>
> Changes from V7:
> Cleanups/clarifications from Darrick and Dan
>
> Changes from V6:
> Update to allow setting FS_XFLAG_DAX any time.
> Update with list of behaviors from Darrick
> https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
>
> Changes from V5:
> Update to reflect the agreed upon semantics
> https://lore.kernel.org/lkml/[email protected]/
> ---
> Documentation/filesystems/dax.txt | 164 +++++++++++++++++++++++++++++-
> 1 file changed, 161 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> index 679729442fd2..8f4ab08be715 100644
> --- a/Documentation/filesystems/dax.txt
> +++ b/Documentation/filesystems/dax.txt
> @@ -17,11 +17,169 @@ For file mappings, the storage device is mapped directly into userspace.
> Usage
> -----
>
> -If you have a block device which supports DAX, you can make a filesystem
> +If you have a block device which supports DAX, you can make a file system
> on it as usual. The DAX code currently only supports files with a block
> size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> -size when creating the filesystem. When mounting it, use the "-o dax"
> -option on the command line or add 'dax' to the options in /etc/fstab.
> +size when creating the file system.
> +
> +Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them
> +is different.
> +
> +Enabling DAX on ext4 and ext2
> +-----------------------------
> +
> +When mounting the filesystem, use the "-o dax" option on the command line or
> +add 'dax' to the options in /etc/fstab. This works to enable DAX on all files
> +within the filesystem. It is equivalent to the '-o dax=always' behavior below.
> +
> +
> +Enabling DAX on xfs
> +-------------------
> +
> +Summary
> +-------
> +
> + 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
> + the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details
> + about this access mode.
> +
> + 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
> + inherited from the parent directory FS_XFLAG_DAX inode flag at file
> + creation time. This advisory flag can be set or cleared at any
> + time, but doing so does not immediately affect the S_DAX state.
> +
> + Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
> + and the fs is on pmem then it will enable S_DAX at inode load time;
> + if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> +
> + 3. There exists a dax= mount option.
> +
> + "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."
> +
> + "-o dax=always" means "always set S_DAX (at least on pmem),
> + and ignore FS_XFLAG_DAX."
> +
> + "-o dax" is an alias for "dax=always".
> +
> + "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.
> +
> + 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
> + be set or cleared at any time. The flag state is inherited by any files or
> + subdirectories when they are created within that directory.
> +
> + 5. Programs that require a specific file access mode (DAX or not DAX)
> + can do one of the following:
> +
> + (a) Create files in directories that the FS_XFLAG_DAX flag set as
> + needed; or
> +
> + (b) Have the administrator set an override via mount option; or
> +
> + (c) Set or clear the file's FS_XFLAG_DAX flag as needed. Programs
> + must then cause the kernel to evict the inode from memory. This
> + can be done by:
> +
> + i> Closing the file and re-opening the file and using statx to
> + see if the fs has changed the S_DAX flag; and
> +
> + ii> If the file still does not have the desired S_DAX access
> + mode, either unmount and remount the filesystem, or close
> + the file and use drop_caches.
> +
> + 6. It is expected that users who want to squeeze every last bit of performance
> + out of the particular rough and tumble bits of their storage will also be
> + exposed to the difficulties of what happens when the operating system can't
> + totally virtualize those hardware capabilities. DAX is such a feature.
> +
> +
> +Details
> +-------
> +
> +There are 2 per-file dax flags. One is a physical inode setting (FS_XFLAG_DAX)
> +and the other a currently enabled state (S_DAX).
> +
> +FS_XFLAG_DAX is maintained, on disk, on individual inodes. It is preserved
> +within the file system. This 'physical' config setting can be set using an
> +ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'". Files and
> +directories automatically inherit FS_XFLAG_DAX from their parent directory
> +_when_ _created_. Therefore, setting FS_XFLAG_DAX at directory creation time
> +can be used to set a default behavior for an entire sub-tree. (Doing so on the
> +root directory acts to set a default for the entire file system.)
> +
> +To clarify inheritance here are 3 examples:
> +
> +Example A:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' a
> +mkdir a/b/c/d
> +mkdir a/e
> +
> + dax: a,e
> + no dax: b,c,d
> +
> +Example B:
> +
> +mkdir a
> +xfs_io 'chattr +x' a
> +mkdir -p a/b/c/d
> +
> + dax: a,b,c,d
> + no dax:
> +
> +Example C:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' c
> +mkdir a/b/c/d
> +
> + dax: c,d
> + no dax: a,b
> +
> +
> +The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
> +the underlying media support, the value of FS_XFLAG_DAX, and the file systems
> +dax mount option setting. See below.
> +
> +statx can be used to query S_DAX. NOTE that a directory will never have S_DAX
> +set and therefore statx will never indicate that S_DAX is set on directories.
> +
> +NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
> +even if the underlying media does not support dax and/or the file system is
> +overridden with a mount option.
> +
> +
> +Overriding FS_XFLAG_DAX (dax= mount option)
> +-------------------------------------------
> +
> +There exists a dax mount option. Using the mount option does not change the
> +physical configured state of individual files but overrides the S_DAX operating
> +state when inodes are loaded.
> +
> +Given underlying media support, the dax mount option is a tri-state option
> +(never, always, inode) with the following meanings:
> +
> + "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> + "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
> + "-o dax" by itself means "dax=always" to remain compatible with older
> + kernels
> + "-o dax=inode" means "follow FS_XFLAG_DAX"
> +
> +The default state is 'inode'. Given underlying media support, the following
> +algorithm is used to determine the effective mode of the file S_DAX on a
> +capable device.
> +
> + S_DAX = FS_XFLAG_DAX;
> +
> + if (dax_mount == "always")
> + S_DAX = true;
> + else if (dax_mount == "off"

Missing trailing parenthesis here.

> + S_DAX = false;
> +
> +To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
> +while the file system is mounted with a dax override. However, in-core inode
> +state (S_DAX) will continue to be overridden until the filesystem is remounted
> +with dax=inode and the inode is evicted."

Trailing double-quote here.

Otherwise, this looks good to me.

--D

>
>
> Implementation Tips for Block Driver Writers
> --
> 2.25.1
>

2020-04-21 21:06:47

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V9 05/11] fs/xfs: Make DAX mount option a tri-state

On Tue, Apr 21, 2020 at 01:16:41PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 21, 2020 at 12:17:47PM -0700, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > As agreed upon[1]. We make the dax mount option a tri-state. '-o dax'
> > continues to operate the same. We add 'always', 'never', and 'inode'
> > (default).
> >
> > [1] https://lore.kernel.org/lkml/[email protected]/
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> > ---
> > Changes from v8:
> > Move NEVER bit to 27
> > Use xfs signature style
> > use xfs_dax_mode enum
> >
> > Changes from v7:
> > Change to XFS_MOUNT_DAX_NEVER
> >
> > Changes from v6:
> > Use 2 flag bits rather than a field.
> > change iflag to inode
> >
> > Changes from v5:
> > New Patch
> > ---
> > fs/xfs/xfs_mount.h | 1 +
> > fs/xfs/xfs_super.c | 51 ++++++++++++++++++++++++++++++++++++++++++----
> > 2 files changed, 48 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index f6123fb0113c..37bfb50db809 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -238,6 +238,7 @@ typedef struct xfs_mount {
> > allocator */
> > #define XFS_MOUNT_NOATTR2 (1ULL << 25) /* disable use of attr2 format */
> > #define XFS_MOUNT_DAX_ALWAYS (1ULL << 26)
> > +#define XFS_MOUNT_DAX_NEVER (1ULL << 27)
> >
> > /*
> > * Max and min values for mount-option defined I/O
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index ce169d1c7474..0d0f74786799 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -47,6 +47,39 @@ static struct kset *xfs_kset; /* top-level xfs sysfs dir */
> > static struct xfs_kobj xfs_dbg_kobj; /* global debug sysfs attrs */
> > #endif
> >
> > +enum xfs_dax_mode {
> > + XFS_DAX_INODE = 0,
> > + XFS_DAX_ALWAYS = 1,
> > + XFS_DAX_NEVER = 2,
> > +};
> > +
> > +static void
> > +xfs_mount_set_dax_mode(
> > + struct xfs_mount *mp,
> > + enum xfs_dax_mode mode)
>
> Indent between the type name and the variable name, please.
>
> static void
> xfs_mount_set_dax_mode(
> struct xfs_mount *mp,
> enum xfs_dax_mode mode)

Done.

>
> > +{
> > + switch (mode) {
> > + case XFS_DAX_INODE:
> > + mp->m_flags &= ~(XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER);
> > + break;
> > + case XFS_DAX_ALWAYS:
> > + mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
> > + mp->m_flags &= ~XFS_MOUNT_DAX_NEVER;
> > + break;
> > + case XFS_DAX_NEVER:
> > + mp->m_flags |= XFS_MOUNT_DAX_NEVER;
> > + mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
> > + break;
>
> You can skip one level of indentation, e.g.
>
> switch (mode) {
> case XFS_DAX_INODE:
> mp->m_flags &= ~(XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER);
> break;

done.

>
> > + }
> > +}
> > +
> > +static const struct constant_table dax_param_enums[] = {
> > + {"inode", XFS_DAX_INODE },
> > + {"always", XFS_DAX_ALWAYS },
> > + {"never", XFS_DAX_NEVER },
> > + {}
> > +};
> > +
> > /*
> > * Table driven mount option parser.
> > */
> > @@ -59,7 +92,7 @@ enum {
> > Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
> > Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
> > Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
> > - Opt_discard, Opt_nodiscard, Opt_dax,
> > + Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
> > };
> >
> > static const struct fs_parameter_spec xfs_fs_parameters[] = {
> > @@ -103,6 +136,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
> > fsparam_flag("discard", Opt_discard),
> > fsparam_flag("nodiscard", Opt_nodiscard),
> > fsparam_flag("dax", Opt_dax),
> > + fsparam_enum("dax", Opt_dax_enum, dax_param_enums),
> > {}
> > };
> >
> > @@ -129,7 +163,6 @@ xfs_fs_show_options(
> > { XFS_MOUNT_GRPID, ",grpid" },
> > { XFS_MOUNT_DISCARD, ",discard" },
> > { XFS_MOUNT_LARGEIO, ",largeio" },
> > - { XFS_MOUNT_DAX_ALWAYS, ",dax" },
> > { 0, NULL }
> > };
> > struct xfs_mount *mp = XFS_M(root->d_sb);
> > @@ -185,6 +218,13 @@ xfs_fs_show_options(
> > if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
> > seq_puts(m, ",noquota");
> >
> > + if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS)
> > + seq_puts(m, ",dax=always");
> > + else if (mp->m_flags & XFS_MOUNT_DAX_NEVER)
> > + seq_puts(m, ",dax=never");
> > + else
> > + seq_puts(m, ",dax=inode");
>
> No need to report dax=inode, since that's the default.

done.

thanks,
Ira

>
> --D
>
> > +
> > return 0;
> > }
> >
> > @@ -1261,7 +1301,10 @@ xfs_fc_parse_param(
> > return 0;
> > #ifdef CONFIG_FS_DAX
> > case Opt_dax:
> > - mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
> > + xfs_mount_set_dax_mode(mp, XFS_DAX_ALWAYS);
> > + return 0;
> > + case Opt_dax_enum:
> > + xfs_mount_set_dax_mode(mp, result.uint_32);
> > return 0;
> > #endif
> > default:
> > @@ -1468,7 +1511,7 @@ xfs_fc_fill_super(
> > if (!rtdev_is_dax && !datadev_is_dax) {
> > xfs_alert(mp,
> > "DAX unsupported by block device. Turning off DAX.");
> > - mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
> > + xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
> > }
> > if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > xfs_alert(mp,
> > --
> > 2.25.1
> >

2020-04-21 21:17:20

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V9 09/11] fs: Introduce DCACHE_DONTCACHE

On Tue, Apr 21, 2020 at 01:25:19PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 21, 2020 at 12:17:51PM -0700, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > DCACHE_DONTCACHE indicates a dentry should not be cached on final
> > dput().
> >
> > Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> > pointing to a specific inode when that inode is being set I_DONTCACHE.
> >
> > This facilitates dropping dentry references to inodes sooner which
> > require eviction to swap S_DAX mode.
> >
> > Cc: Al Viro <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> > ---

[snip]

> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index de76f7f60695..3c8f44477804 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -559,7 +559,7 @@ xfs_iget_cache_miss(
> > */
> > iflags = XFS_INEW;
> > if (flags & XFS_IGET_DONTCACHE)
> > - VFS_I(ip)->i_state |= I_DONTCACHE;
> > + mark_inode_dontcache(VFS_I(ip));
> > ip->i_udquot = NULL;
> > ip->i_gdquot = NULL;
> > ip->i_pdquot = NULL;
> > diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> > index c1488cc84fd9..56b1482d9223 100644
> > --- a/include/linux/dcache.h
> > +++ b/include/linux/dcache.h
> > @@ -177,6 +177,8 @@ struct dentry_operations {
> >
> > #define DCACHE_REFERENCED 0x00000040 /* Recently used, don't discard. */
> >
> > +#define DCACHE_DONTCACHE 0x00000080 /* don't cache on final dput() */
>
> "Purge from memory on final dput()"?

Sounds good to me,
Ira

>
> --D
>

2020-04-21 21:45:01

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V9 10/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

On Wed, Apr 22, 2020 at 07:30:49AM +1000, Dave Chinner wrote:
> On Tue, Apr 21, 2020 at 01:31:40PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 21, 2020 at 12:17:52PM -0700, [email protected] wrote:
> > > From: Ira Weiny <[email protected]>
> > >
> > > Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed
> > > setting of S_DAX, data invalidation no longer needs to happen when
> > > FS_XFLAG_DAX is changed.
> > >
> > > Change xfs_ioctl_setattr_dax_invalidate() to be
> > > xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new
> > > functionality.
> > >
> > > Furthermore, we no longer need the locking so we remove the join_flags
> > > logic.
> > >
> > > Signed-off-by: Ira Weiny <[email protected]>
> > >
> > > ---
> > > Changes from V8:
> > > Change name of function to xfs_ioctl_dax_check_set_cache()
> > > Update commit message
> > > Fix bit manipulations
> > >
> > > Changes from V7:
> > > Use new flag_inode_dontcache()
> > > Skip don't cache if mount over ride is active.
> > >
> > > Changes from v6:
> > > Fix completely broken implementation and update commit message.
> > > Use the new VFS layer I_DONTCACHE to facilitate inode eviction
> > > and S_DAX changing on drop_caches
> > >
> > > Changes from v5:
> > > New patch
> > > ---
> > > fs/xfs/xfs_ioctl.c | 108 +++++++++------------------------------------
> > > 1 file changed, 20 insertions(+), 88 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > index 104495ac187c..b87b571a6748 100644
> > > --- a/fs/xfs/xfs_ioctl.c
> > > +++ b/fs/xfs/xfs_ioctl.c
> > > @@ -1245,64 +1245,26 @@ xfs_ioctl_setattr_xflags(
> > > return 0;
> > > }
> > >
> > > -/*
> > > - * If we are changing DAX flags, we have to ensure the file is clean and any
> > > - * cached objects in the address space are invalidated and removed. This
> > > - * requires us to lock out other IO and page faults similar to a truncate
> > > - * operation. The locks need to be held until the transaction has been committed
> > > - * so that the cache invalidation is atomic with respect to the DAX flag
> > > - * manipulation.
> > > - */
> > > -static int
> > > -xfs_ioctl_setattr_dax_invalidate(
> > > +static void
> > > +xfs_ioctl_dax_check_set_cache(
> >
> > That's a ... strange name. Set cache on what?
> >
> > Oh, this is the function that sets I_DONTCACHE if an FS_XFLAG_DAX change
> > could have an immediate effect on S_DAX (assuming no other users). What
> > do you think of the following?
> >
> > /*
> > * If a change to FS_XFLAG_DAX will result in an change to S_DAX
> > * the next time the incore inode is initialized, set the VFS
> > * I_DONTCACHE flag to try to hurry that along.
> > */
> > static void
> > xfs_ioctl_try_change_vfs_dax(...)
>
> That doesn't seem any better. This is a preparation function now, in
> that it can't fail and doesn't change the outcome of the operation
> being performed. So, IMO, calling it something like
> xfs_ioctl_setattr_prepare_dax() would be a better name for it.

But it does potentially (after a check) set I_DONTCACHE.

What about?

xfs_ioctl_dax_check_set_dontcache()?

Ira

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]

2020-04-21 21:51:28

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V9 11/11] Documentation/dax: Update Usage section

On Tue, Apr 21, 2020 at 01:34:23PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 21, 2020 at 12:17:53PM -0700, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > Update the Usage section to reflect the new individual dax selection
> > functionality.
> >

[snip]

> > +
> > +The default state is 'inode'. Given underlying media support, the following
> > +algorithm is used to determine the effective mode of the file S_DAX on a
> > +capable device.
> > +
> > + S_DAX = FS_XFLAG_DAX;
> > +
> > + if (dax_mount == "always")
> > + S_DAX = true;
> > + else if (dax_mount == "off"
>
> Missing trailing parenthesis here.

Done.

>
> > + S_DAX = false;
> > +
> > +To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
> > +while the file system is mounted with a dax override. However, in-core inode
> > +state (S_DAX) will continue to be overridden until the filesystem is remounted
> > +with dax=inode and the inode is evicted."
>
> Trailing double-quote here.

Fixed done.

>
> Otherwise, this looks good to me.

thanks,
Ira

2020-04-21 23:14:32

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH V9 10/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

On Tue, Apr 21, 2020 at 02:43:29PM -0700, Ira Weiny wrote:
> On Wed, Apr 22, 2020 at 07:30:49AM +1000, Dave Chinner wrote:
> > On Tue, Apr 21, 2020 at 01:31:40PM -0700, Darrick J. Wong wrote:
> > > On Tue, Apr 21, 2020 at 12:17:52PM -0700, [email protected] wrote:
> > > > From: Ira Weiny <[email protected]>
> > > >
> > > > Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed
> > > > setting of S_DAX, data invalidation no longer needs to happen when
> > > > FS_XFLAG_DAX is changed.
> > > >
> > > > Change xfs_ioctl_setattr_dax_invalidate() to be
> > > > xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new
> > > > functionality.
> > > >
> > > > Furthermore, we no longer need the locking so we remove the join_flags
> > > > logic.
> > > >
> > > > Signed-off-by: Ira Weiny <[email protected]>
> > > >
> > > > ---
> > > > Changes from V8:
> > > > Change name of function to xfs_ioctl_dax_check_set_cache()
> > > > Update commit message
> > > > Fix bit manipulations
> > > >
> > > > Changes from V7:
> > > > Use new flag_inode_dontcache()
> > > > Skip don't cache if mount over ride is active.
> > > >
> > > > Changes from v6:
> > > > Fix completely broken implementation and update commit message.
> > > > Use the new VFS layer I_DONTCACHE to facilitate inode eviction
> > > > and S_DAX changing on drop_caches
> > > >
> > > > Changes from v5:
> > > > New patch
> > > > ---
> > > > fs/xfs/xfs_ioctl.c | 108 +++++++++------------------------------------
> > > > 1 file changed, 20 insertions(+), 88 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > > index 104495ac187c..b87b571a6748 100644
> > > > --- a/fs/xfs/xfs_ioctl.c
> > > > +++ b/fs/xfs/xfs_ioctl.c
> > > > @@ -1245,64 +1245,26 @@ xfs_ioctl_setattr_xflags(
> > > > return 0;
> > > > }
> > > >
> > > > -/*
> > > > - * If we are changing DAX flags, we have to ensure the file is clean and any
> > > > - * cached objects in the address space are invalidated and removed. This
> > > > - * requires us to lock out other IO and page faults similar to a truncate
> > > > - * operation. The locks need to be held until the transaction has been committed
> > > > - * so that the cache invalidation is atomic with respect to the DAX flag
> > > > - * manipulation.
> > > > - */
> > > > -static int
> > > > -xfs_ioctl_setattr_dax_invalidate(
> > > > +static void
> > > > +xfs_ioctl_dax_check_set_cache(
> > >
> > > That's a ... strange name. Set cache on what?
> > >
> > > Oh, this is the function that sets I_DONTCACHE if an FS_XFLAG_DAX change
> > > could have an immediate effect on S_DAX (assuming no other users). What
> > > do you think of the following?
> > >
> > > /*
> > > * If a change to FS_XFLAG_DAX will result in an change to S_DAX
> > > * the next time the incore inode is initialized, set the VFS
> > > * I_DONTCACHE flag to try to hurry that along.
> > > */
> > > static void
> > > xfs_ioctl_try_change_vfs_dax(...)
> >
> > That doesn't seem any better. This is a preparation function now, in
> > that it can't fail and doesn't change the outcome of the operation
> > being performed. So, IMO, calling it something like
> > xfs_ioctl_setattr_prepare_dax() would be a better name for it.
>
> But it does potentially (after a check) set I_DONTCACHE.

That is an implementation detail - it doesn't change the outcome of
the function, the current behaviour of the inode, or the result of
the ioctl....

> What about?
>
> xfs_ioctl_dax_check_set_dontcache()?

Then we have to rename it again the moment we change it's
functionality. i.e. we have exactly the same problem we have now
where the function name describes the implementation, not the
operational reason the function is being called...

Cheers,

Dave.
--
Dave Chinner
[email protected]

2020-04-21 23:25:29

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V9 10/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

On Wed, Apr 22, 2020 at 09:13:43AM +1000, Dave Chinner wrote:
> On Tue, Apr 21, 2020 at 02:43:29PM -0700, Ira Weiny wrote:
> > On Wed, Apr 22, 2020 at 07:30:49AM +1000, Dave Chinner wrote:
> > > On Tue, Apr 21, 2020 at 01:31:40PM -0700, Darrick J. Wong wrote:
> > > > On Tue, Apr 21, 2020 at 12:17:52PM -0700, [email protected] wrote:
> > > > > From: Ira Weiny <[email protected]>

[snip]

> > > >
> > > > That's a ... strange name. Set cache on what?
> > > >
> > > > Oh, this is the function that sets I_DONTCACHE if an FS_XFLAG_DAX change
> > > > could have an immediate effect on S_DAX (assuming no other users). What
> > > > do you think of the following?
> > > >
> > > > /*
> > > > * If a change to FS_XFLAG_DAX will result in an change to S_DAX
> > > > * the next time the incore inode is initialized, set the VFS
> > > > * I_DONTCACHE flag to try to hurry that along.
> > > > */
> > > > static void
> > > > xfs_ioctl_try_change_vfs_dax(...)
> > >
> > > That doesn't seem any better. This is a preparation function now, in
> > > that it can't fail and doesn't change the outcome of the operation
> > > being performed. So, IMO, calling it something like
> > > xfs_ioctl_setattr_prepare_dax() would be a better name for it.
> >
> > But it does potentially (after a check) set I_DONTCACHE.
>
> That is an implementation detail - it doesn't change the outcome of
> the function, the current behaviour of the inode, or the result of
> the ioctl....

I'm confused. This function does potentially flag the inode to not be cached.
How is that not changing the behavior of the inode?

>
> > What about?
> >
> > xfs_ioctl_dax_check_set_dontcache()?
>
> Then we have to rename it again the moment we change it's
> functionality. i.e. we have exactly the same problem we have now
> where the function name describes the implementation, not the
> operational reason the function is being called...

Ok, I see what you are driving at. You want this function to potentially do
other things and it should 'prepare' the inode for 'dax stuff'. Is that about
it?

I'm ok with xfs_ioctl_setattr_prepare_dax() then.

Ira

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]

2020-04-22 03:46:59

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V9 09/11] fs: Introduce DCACHE_DONTCACHE

On Wed, Apr 22, 2020 at 03:34:07AM +0100, Al Viro wrote:
> On Tue, Apr 21, 2020 at 01:25:19PM -0700, Darrick J. Wong wrote:
>
> > > DCACHE_DONTCACHE indicates a dentry should not be cached on final
> > > dput().
> > >
> > > Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> > > pointing to a specific inode when that inode is being set I_DONTCACHE.
> > >
> > > This facilitates dropping dentry references to inodes sooner which
> > > require eviction to swap S_DAX mode.
>
> Explain, please. Questions:
>
> 1) does that ever happen to directories?

Directories never get S_DAX set. So the eviction only needs to happen on
inodes. But that can't happen without dentries also dropping their references.

> 2) how much trouble do we get if such inode is *NOT* evicted for, say, several
> days?

No trouble at all. Users understand that changing the FS_XFLAG_DAX setting
does _not_ immediately result in S_DAX changing.

It is intended that applications requiring a change of mode would flip the
FS_XFLAG_DAX close the file and wait for the eviction (or force it through a
drop cache if they have permission).

Ira

2020-04-22 09:48:44

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH V9 11/11] Documentation/dax: Update Usage section

On Tue 21-04-20 12:17:53, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> Update the Usage section to reflect the new individual dax selection
> functionality.
>
> Signed-off-by: Ira Weiny <[email protected]>

Looks good to me. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

>
> ---
> Changes from V8:
> Updates from Darrick
>
> Changes from V7:
> Cleanups/clarifications from Darrick and Dan
>
> Changes from V6:
> Update to allow setting FS_XFLAG_DAX any time.
> Update with list of behaviors from Darrick
> https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
>
> Changes from V5:
> Update to reflect the agreed upon semantics
> https://lore.kernel.org/lkml/[email protected]/
> ---
> Documentation/filesystems/dax.txt | 164 +++++++++++++++++++++++++++++-
> 1 file changed, 161 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> index 679729442fd2..8f4ab08be715 100644
> --- a/Documentation/filesystems/dax.txt
> +++ b/Documentation/filesystems/dax.txt
> @@ -17,11 +17,169 @@ For file mappings, the storage device is mapped directly into userspace.
> Usage
> -----
>
> -If you have a block device which supports DAX, you can make a filesystem
> +If you have a block device which supports DAX, you can make a file system
> on it as usual. The DAX code currently only supports files with a block
> size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> -size when creating the filesystem. When mounting it, use the "-o dax"
> -option on the command line or add 'dax' to the options in /etc/fstab.
> +size when creating the file system.
> +
> +Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them
> +is different.
> +
> +Enabling DAX on ext4 and ext2
> +-----------------------------
> +
> +When mounting the filesystem, use the "-o dax" option on the command line or
> +add 'dax' to the options in /etc/fstab. This works to enable DAX on all files
> +within the filesystem. It is equivalent to the '-o dax=always' behavior below.
> +
> +
> +Enabling DAX on xfs
> +-------------------
> +
> +Summary
> +-------
> +
> + 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
> + the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details
> + about this access mode.
> +
> + 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
> + inherited from the parent directory FS_XFLAG_DAX inode flag at file
> + creation time. This advisory flag can be set or cleared at any
> + time, but doing so does not immediately affect the S_DAX state.
> +
> + Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
> + and the fs is on pmem then it will enable S_DAX at inode load time;
> + if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> +
> + 3. There exists a dax= mount option.
> +
> + "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."
> +
> + "-o dax=always" means "always set S_DAX (at least on pmem),
> + and ignore FS_XFLAG_DAX."
> +
> + "-o dax" is an alias for "dax=always".
> +
> + "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.
> +
> + 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
> + be set or cleared at any time. The flag state is inherited by any files or
> + subdirectories when they are created within that directory.
> +
> + 5. Programs that require a specific file access mode (DAX or not DAX)
> + can do one of the following:
> +
> + (a) Create files in directories that the FS_XFLAG_DAX flag set as
> + needed; or
> +
> + (b) Have the administrator set an override via mount option; or
> +
> + (c) Set or clear the file's FS_XFLAG_DAX flag as needed. Programs
> + must then cause the kernel to evict the inode from memory. This
> + can be done by:
> +
> + i> Closing the file and re-opening the file and using statx to
> + see if the fs has changed the S_DAX flag; and
> +
> + ii> If the file still does not have the desired S_DAX access
> + mode, either unmount and remount the filesystem, or close
> + the file and use drop_caches.
> +
> + 6. It is expected that users who want to squeeze every last bit of performance
> + out of the particular rough and tumble bits of their storage will also be
> + exposed to the difficulties of what happens when the operating system can't
> + totally virtualize those hardware capabilities. DAX is such a feature.
> +
> +
> +Details
> +-------
> +
> +There are 2 per-file dax flags. One is a physical inode setting (FS_XFLAG_DAX)
> +and the other a currently enabled state (S_DAX).
> +
> +FS_XFLAG_DAX is maintained, on disk, on individual inodes. It is preserved
> +within the file system. This 'physical' config setting can be set using an
> +ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'". Files and
> +directories automatically inherit FS_XFLAG_DAX from their parent directory
> +_when_ _created_. Therefore, setting FS_XFLAG_DAX at directory creation time
> +can be used to set a default behavior for an entire sub-tree. (Doing so on the
> +root directory acts to set a default for the entire file system.)
> +
> +To clarify inheritance here are 3 examples:
> +
> +Example A:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' a
> +mkdir a/b/c/d
> +mkdir a/e
> +
> + dax: a,e
> + no dax: b,c,d
> +
> +Example B:
> +
> +mkdir a
> +xfs_io 'chattr +x' a
> +mkdir -p a/b/c/d
> +
> + dax: a,b,c,d
> + no dax:
> +
> +Example C:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' c
> +mkdir a/b/c/d
> +
> + dax: c,d
> + no dax: a,b
> +
> +
> +The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
> +the underlying media support, the value of FS_XFLAG_DAX, and the file systems
> +dax mount option setting. See below.
> +
> +statx can be used to query S_DAX. NOTE that a directory will never have S_DAX
> +set and therefore statx will never indicate that S_DAX is set on directories.
> +
> +NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
> +even if the underlying media does not support dax and/or the file system is
> +overridden with a mount option.
> +
> +
> +Overriding FS_XFLAG_DAX (dax= mount option)
> +-------------------------------------------
> +
> +There exists a dax mount option. Using the mount option does not change the
> +physical configured state of individual files but overrides the S_DAX operating
> +state when inodes are loaded.
> +
> +Given underlying media support, the dax mount option is a tri-state option
> +(never, always, inode) with the following meanings:
> +
> + "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> + "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
> + "-o dax" by itself means "dax=always" to remain compatible with older
> + kernels
> + "-o dax=inode" means "follow FS_XFLAG_DAX"
> +
> +The default state is 'inode'. Given underlying media support, the following
> +algorithm is used to determine the effective mode of the file S_DAX on a
> +capable device.
> +
> + S_DAX = FS_XFLAG_DAX;
> +
> + if (dax_mount == "always")
> + S_DAX = true;
> + else if (dax_mount == "off"
> + S_DAX = false;
> +
> +To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
> +while the file system is mounted with a dax override. However, in-core inode
> +state (S_DAX) will continue to be overridden until the filesystem is remounted
> +with dax=inode and the inode is evicted."
>
>
> Implementation Tips for Block Driver Writers
> --
> 2.25.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-04-22 15:20:58

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V9 09/11] fs: Introduce DCACHE_DONTCACHE

On Wed, Apr 22, 2020 at 10:46:47AM +0200, Jan Kara wrote:
> On Tue 21-04-20 12:17:51, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > DCACHE_DONTCACHE indicates a dentry should not be cached on final
> > dput().
> >
> > Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> > pointing to a specific inode when that inode is being set I_DONTCACHE.
> >
> > This facilitates dropping dentry references to inodes sooner which
> > require eviction to swap S_DAX mode.
> >
> > Cc: Al Viro <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> > ---
> > Changes from V8:
> > Update commit message
> > Use mark_inode_dontcache in XFS
> > Fix locking... can't use rcu here.
> > Change name to mark_inode_dontcache
> > ---
> > fs/dcache.c | 4 ++++
> > fs/inode.c | 15 +++++++++++++++
> > fs/xfs/xfs_icache.c | 2 +-
> > include/linux/dcache.h | 2 ++
> > include/linux/fs.h | 1 +
> > 5 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index b280e07e162b..0030fabab2c4 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry)
> > if (dentry->d_op->d_delete(dentry))
> > return false;
> > }
> > +
> > + if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
> > + return false;
> > +
> > /* retain; LRU fodder */
> > dentry->d_lockref.count--;
> > if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 93d9252a00ab..da7f3c4926cd 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -1526,6 +1526,21 @@ int generic_delete_inode(struct inode *inode)
> > }
> > EXPORT_SYMBOL(generic_delete_inode);
> >
> > +void mark_inode_dontcache(struct inode *inode)
> > +{
> > + struct dentry *de;
> > +
> > + spin_lock(&inode->i_lock);
> > + hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
> > + spin_lock(&de->d_lock);
> > + de->d_flags |= DCACHE_DONTCACHE;
> > + spin_unlock(&de->d_lock);
> > + }
> > + spin_unlock(&inode->i_lock);
> > + inode->i_state |= I_DONTCACHE;
>
> Modification of i_state should happen under i_lock.

Done.

>
> > +}
> > +EXPORT_SYMBOL(mark_inode_dontcache);
> > +
> > /*
> > * Called when we're dropping the last reference
> > * to an inode.
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index de76f7f60695..3c8f44477804 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -559,7 +559,7 @@ xfs_iget_cache_miss(
> > */
> > iflags = XFS_INEW;
> > if (flags & XFS_IGET_DONTCACHE)
> > - VFS_I(ip)->i_state |= I_DONTCACHE;
> > + mark_inode_dontcache(VFS_I(ip));
>
> And I know here modification of i_state didn't happen under i_lock but
> that's a special case because we are just instantiating the inode so it was
> not a real issue.

Thanks!
Ira

>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR