2020-04-22 21:22:14

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10

From: Ira Weiny <[email protected]>

Changes from V9:
Slight reorder of series to put Documentation sooner
modify i_state under i_lock
Change name of xfs_ioctl_setattr_dax_invalidate() ->
xfs_ioctl_setattr_prepare_dax()
Do not report default dax=inode mount mode
Move XFS_IEOFBLOCKS to '9'
Fixup some doc typos
Fix xfs style indentation
Fix commit mispelling

At LSF/MM'19 [1] [2] we discussed applications that overestimate memory
consumption due to their inability to detect whether the kernel will
instantiate page cache for a file, and cases where a global dax enable via a
mount option is too coarse.

The following patch series enables the use of DAX on individual files and/or
directories on xfs, and lays some groundwork to do so in ext4. It further
enhances the dax mount option to be a tri-state of 'always', 'never', or
'iflag' (default). Furthermore, it maintians '-o dax' to be equivalent to '-o
dax=always'.

The insight at LSF/MM was to separate the per-mount or per-file "physical"
(FS_XFLAG_DAX) capability switch from an "effective" (S_DAX) attribute for the
file.

At LSF/MM we discussed the difficulties of switching the DAX state of a file
with active mappings / page cache. It was thought the races could be avoided
by limiting DAX state flips to 0-length files.

However, this turns out to not be true.[3][5] This is because address space
operations (a_ops) may be in use at any time the inode is referenced.

For this reason direct manipulation of the FS_XFLAG_DAX is allowed but the
operation of the file (S_DAX) is not immediately changed.

Details of when and how DAX state can be changed on a file is included in a
documentation patch.

It should be noted that FS_XFLAG_DAX inheritance is not shown in this patch set
as it was maintained from previous work on XFS. FS_XFLAG_DAX and it's
inheritance will need to be added to other file systems for user control.


[1] https://lwn.net/Articles/787973/
[2] https://lwn.net/Articles/787233/
[3] https://lkml.org/lkml/2019/10/20/96
[4] https://patchwork.kernel.org/patch/11310511/
[5] https://lore.kernel.org/lkml/[email protected]/


To: [email protected]
Cc: Al Viro <[email protected]>
Cc: "Darrick J. Wong" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "Theodore Y. Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Changes from V8:
Rebase to 5.7-rc2
Change ALWAYS/NEVER bits to be 26/27
Remove XFS_IDONTCACHE -> lift to I_DONTCACHE
use mark_inode_dontcache() in XFS
create xfs_dax_mode enum
use xfs signature styles
Change xfs_inode_enabe_dax() -> xfs_inode_should_enable()
Based on feedback to ext4 series
Fix locking of DCACHE_DONTCACHE
Change flag_inode_dontcache() -> mark_inode_dontcache()
Change xfs_ioctl_setattr_dax_invalidate() -> xfs_ioctl_dax_check_set_cache()
Documentation cleanups
Clean up all commit messages

Changes from V7:
Add DCACHE_DONTCACHE
If mount override don't worry about inode caching
Change mount flags to NEVER/ALWAYS
Clean up xfs_inode_enable_dax()
Clarify comments
Update documentation

Changes from V6:
Incorporate feedback on patches
Add ability to change FS_XFLAG_DAX on files at any time.
Add a don't cache flag to the VFS layer
Preserve internal XFS IDONTCACHE behavior for bulkstat
operations

Changes from V5:
* make dax mount option a tri-state
* Reject changes to FS_XFLAG_DAX for regular files
- Allow only on directories
* Update documentation

Changes from V4:
* Open code the aops lock rather than add it to the xfs_ilock()
subsystem (Darrick's comments were obsoleted by this change)
* Fix lkp build suggestions and bugs

Changes from V3:
* Remove global locking... :-D
* put back per inode locking and remove pre-mature optimizations
* Fix issues with Directories having IS_DAX() set
* Fix kernel crash issues reported by Jeff
* Add some clean up patches
* Consolidate diflags to iflags functions
* Update/add documentation
* Reorder/rename patches quite a bit

Changes from V2:

* Move i_dax_sem to be a global percpu_rw_sem rather than per inode
Internal discussions with Dan determined this would be easier,
just as performant, and slightly less overhead that having it
in the SB as suggested by Jan
* Fix locking order in comments and throughout code
* Change "mode" to "state" throughout commits
* Add CONFIG_FS_DAX wrapper to disable inode_[un]lock_state() when not
configured
* Add static branch for which is activated by a device which supports
DAX in XFS
* Change "lock/unlock" to up/down read/write as appropriate
Previous names were over simplified
* Update comments/documentation

* Remove the xfs specific lock to the vfs (global) layer.
* Fix i_dax_sem locking order and comments

* Move 'i_mapped' count from struct inode to struct address_space and
rename it to mmap_count
* Add inode_has_mappings() call

* Fix build issues
* Clean up syntax spacing and minor issues
* Update man page text for STATX_ATTR_DAX
* Add reviewed-by's
* Rebase to 5.6

Rename patch:
from: fs/xfs: Add lock/unlock state to xfs
to: fs/xfs: Add write DAX lock to xfs layer
Add patch:
fs/xfs: Clarify lockdep dependency for xfs_isilocked()
Drop patch:
fs/xfs: Fix truncate up

Ira Weiny (11):
fs/xfs: Remove unnecessary initialization of i_rwsem
fs: Remove unneeded IS_DAX() check in io_is_direct()
fs/stat: Define DAX statx attribute
Documentation/dax: Update Usage section
fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS
fs/xfs: Make DAX mount option a tri-state
fs/xfs: Create function xfs_inode_should_enable_dax()
fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
fs: Lift XFS_IDONTCACHE to the VFS layer
fs: Introduce DCACHE_DONTCACHE
fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

Documentation/filesystems/dax.txt | 164 +++++++++++++++++++++++++++++-
drivers/block/loop.c | 6 +-
fs/dcache.c | 4 +
fs/inode.c | 15 +++
fs/stat.c | 3 +
fs/xfs/xfs_icache.c | 8 +-
fs/xfs/xfs_inode.h | 4 +-
fs/xfs/xfs_ioctl.c | 141 ++++---------------------
fs/xfs/xfs_iops.c | 72 ++++++++-----
fs/xfs/xfs_mount.h | 4 +-
fs/xfs/xfs_super.c | 53 ++++++++--
include/linux/dcache.h | 2 +
include/linux/fs.h | 14 +--
include/uapi/linux/stat.h | 1 +
14 files changed, 319 insertions(+), 172 deletions(-)

--
2.25.1


2020-04-22 21:22:49

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 10/11] fs: Introduce DCACHE_DONTCACHE

From: Ira Weiny <[email protected]>

DCACHE_DONTCACHE indicates a dentry should not be cached on final
dput().

Also add a helper function to mark DCACHE_DONTCACHE on all dentries
pointing to a specific inode when that inode is being set I_DONTCACHE.

This facilitates dropping dentry references to inodes sooner which
require eviction to swap S_DAX mode.

Cc: Al Viro <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V9:
modify i_state under i_lock
Update comment
"Purge from memory on final dput()"

Changes from V8:
Update commit message
Use mark_inode_dontcache in XFS
Fix locking... can't use rcu here.
Change name to mark_inode_dontcache
---
fs/dcache.c | 4 ++++
fs/inode.c | 15 +++++++++++++++
fs/xfs/xfs_icache.c | 2 +-
include/linux/dcache.h | 2 ++
include/linux/fs.h | 1 +
5 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b280e07e162b..0030fabab2c4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry)
if (dentry->d_op->d_delete(dentry))
return false;
}
+
+ if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
+ return false;
+
/* retain; LRU fodder */
dentry->d_lockref.count--;
if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
diff --git a/fs/inode.c b/fs/inode.c
index 93d9252a00ab..316355433797 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1526,6 +1526,21 @@ int generic_delete_inode(struct inode *inode)
}
EXPORT_SYMBOL(generic_delete_inode);

+void mark_inode_dontcache(struct inode *inode)
+{
+ struct dentry *de;
+
+ spin_lock(&inode->i_lock);
+ hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
+ spin_lock(&de->d_lock);
+ de->d_flags |= DCACHE_DONTCACHE;
+ spin_unlock(&de->d_lock);
+ }
+ inode->i_state |= I_DONTCACHE;
+ spin_unlock(&inode->i_lock);
+}
+EXPORT_SYMBOL(mark_inode_dontcache);
+
/*
* Called when we're dropping the last reference
* to an inode.
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index de76f7f60695..3c8f44477804 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -559,7 +559,7 @@ xfs_iget_cache_miss(
*/
iflags = XFS_INEW;
if (flags & XFS_IGET_DONTCACHE)
- VFS_I(ip)->i_state |= I_DONTCACHE;
+ mark_inode_dontcache(VFS_I(ip));
ip->i_udquot = NULL;
ip->i_gdquot = NULL;
ip->i_pdquot = NULL;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c1488cc84fd9..a81f0c3cf352 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -177,6 +177,8 @@ struct dentry_operations {

#define DCACHE_REFERENCED 0x00000040 /* Recently used, don't discard. */

+#define DCACHE_DONTCACHE 0x00000080 /* Purge from memory on final dput() */
+
#define DCACHE_CANT_MOUNT 0x00000100
#define DCACHE_GENOCIDE 0x00000200
#define DCACHE_SHRINK_LIST 0x00000400
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 44bd45af760f..064168ec2e0b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3055,6 +3055,7 @@ static inline int generic_drop_inode(struct inode *inode)
return !inode->i_nlink || inode_unhashed(inode) ||
(inode->i_state & I_DONTCACHE);
}
+extern void mark_inode_dontcache(struct inode *inode);

extern struct inode *ilookup5_nowait(struct super_block *sb,
unsigned long hashval, int (*test)(struct inode *, void *),
--
2.25.1

2020-04-22 21:23:15

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()

From: Ira Weiny <[email protected]>

The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are
nearly identical. The only difference is that *_to_linux() is called after
inode setup and disallows changing the DAX flag.

Combining them can be done with a flag which indicates if this is the initial
setup to allow the DAX flag to be properly set only at init time.

So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
directly.

While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
use xfs_ip2xflags() to ensure future diflags are included correctly.

Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V7:
Clarify with a comment the reason for leaving S_DAX out of the
mask

Changes from V6:
Move unrelated hunk to previous patch.
Change logic for better code generation.

Changes from V5:
The functions are no longer identical so we can only combine
them rather than deleting one completely. This is reflected in
the new init parameter.
---
fs/xfs/xfs_inode.h | 1 +
fs/xfs/xfs_ioctl.c | 33 +--------------------------------
fs/xfs/xfs_iops.c | 46 +++++++++++++++++++++++++++-------------------
3 files changed, 29 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index c6a63f6764a6..83073c883fbf 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -467,6 +467,7 @@ int xfs_break_layouts(struct inode *inode, uint *iolock,
/* from xfs_iops.c */
extern void xfs_setup_inode(struct xfs_inode *ip);
extern void xfs_setup_iops(struct xfs_inode *ip);
+extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init);

/*
* When setting up a newly allocated inode, we need to call
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 309958186d33..104495ac187c 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1201,37 +1201,6 @@ xfs_flags2diflags2(
return di_flags2;
}

-STATIC void
-xfs_diflags_to_linux(
- struct xfs_inode *ip)
-{
- struct inode *inode = VFS_I(ip);
- unsigned int xflags = xfs_ip2xflags(ip);
-
- if (xflags & FS_XFLAG_IMMUTABLE)
- inode->i_flags |= S_IMMUTABLE;
- else
- inode->i_flags &= ~S_IMMUTABLE;
- if (xflags & FS_XFLAG_APPEND)
- inode->i_flags |= S_APPEND;
- else
- inode->i_flags &= ~S_APPEND;
- if (xflags & FS_XFLAG_SYNC)
- inode->i_flags |= S_SYNC;
- else
- inode->i_flags &= ~S_SYNC;
- if (xflags & FS_XFLAG_NOATIME)
- inode->i_flags |= S_NOATIME;
- else
- inode->i_flags &= ~S_NOATIME;
-#if 0 /* disabled until the flag switching races are sorted out */
- if (xflags & FS_XFLAG_DAX)
- inode->i_flags |= S_DAX;
- else
- inode->i_flags &= ~S_DAX;
-#endif
-}
-
static int
xfs_ioctl_setattr_xflags(
struct xfs_trans *tp,
@@ -1269,7 +1238,7 @@ xfs_ioctl_setattr_xflags(
ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags);
ip->i_d.di_flags2 = di_flags2;

- xfs_diflags_to_linux(ip);
+ xfs_diflags_to_iflags(ip, false);
xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
XFS_STATS_INC(mp, xs_ig_attrchg);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1814f10e43d3..b70b735fe4a4 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1276,26 +1276,34 @@ xfs_inode_should_enable_dax(
return false;
}

-STATIC void
+void
xfs_diflags_to_iflags(
- struct inode *inode,
- struct xfs_inode *ip)
+ struct xfs_inode *ip,
+ bool init)
{
- uint16_t flags = ip->i_d.di_flags;
-
- inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
- S_NOATIME | S_DAX);
-
- if (flags & XFS_DIFLAG_IMMUTABLE)
- inode->i_flags |= S_IMMUTABLE;
- if (flags & XFS_DIFLAG_APPEND)
- inode->i_flags |= S_APPEND;
- if (flags & XFS_DIFLAG_SYNC)
- inode->i_flags |= S_SYNC;
- if (flags & XFS_DIFLAG_NOATIME)
- inode->i_flags |= S_NOATIME;
- if (xfs_inode_should_enable_dax(ip))
- inode->i_flags |= S_DAX;
+ struct inode *inode = VFS_I(ip);
+ unsigned int xflags = xfs_ip2xflags(ip);
+ unsigned int flags = 0;
+
+ ASSERT(!(IS_DAX(inode) && init));
+
+ if (xflags & FS_XFLAG_IMMUTABLE)
+ flags |= S_IMMUTABLE;
+ if (xflags & FS_XFLAG_APPEND)
+ flags |= S_APPEND;
+ if (xflags & FS_XFLAG_SYNC)
+ flags |= S_SYNC;
+ if (xflags & FS_XFLAG_NOATIME)
+ flags |= S_NOATIME;
+ if (init && xfs_inode_should_enable_dax(ip))
+ flags |= S_DAX;
+
+ /*
+ * S_DAX can only be set during inode initialization and is never set by
+ * the VFS, so we cannot mask off S_DAX in i_flags.
+ */
+ inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME);
+ inode->i_flags |= flags;
}

/*
@@ -1321,7 +1329,7 @@ xfs_setup_inode(
inode_fake_hash(inode);

i_size_write(inode, ip->i_d.di_size);
- xfs_diflags_to_iflags(inode, ip);
+ xfs_diflags_to_iflags(ip, true);

if (S_ISDIR(inode->i_mode)) {
/*
--
2.25.1

2020-04-22 21:23:24

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer

From: Ira Weiny <[email protected]>

DAX effective mode (S_DAX) changes requires inode eviction.

XFS has an advisory flag (XFS_IDONTCACHE) to prevent caching of the
inode if no other additional references are taken. We lift this flag to
the VFS layer and change the behavior slightly by allowing the flag to
remain even if multiple references are taken.

This will expedite the eviction of inodes to change S_DAX.

Cc: Al Viro <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V9:
Fix misspelling in commit subject
move XFS_IEOFBLOCKS to '9'

Changes from V8:
Remove XFS_IDONTCACHE
---
fs/xfs/xfs_icache.c | 4 ++--
fs/xfs/xfs_inode.h | 3 +--
fs/xfs/xfs_super.c | 2 +-
include/linux/fs.h | 6 +++++-
4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 17a0b86fe701..de76f7f60695 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -477,7 +477,7 @@ xfs_iget_cache_hit(
xfs_ilock(ip, lock_flags);

if (!(flags & XFS_IGET_INCORE))
- xfs_iflags_clear(ip, XFS_ISTALE | XFS_IDONTCACHE);
+ xfs_iflags_clear(ip, XFS_ISTALE);
XFS_STATS_INC(mp, xs_ig_found);

return 0;
@@ -559,7 +559,7 @@ xfs_iget_cache_miss(
*/
iflags = XFS_INEW;
if (flags & XFS_IGET_DONTCACHE)
- iflags |= XFS_IDONTCACHE;
+ VFS_I(ip)->i_state |= I_DONTCACHE;
ip->i_udquot = NULL;
ip->i_gdquot = NULL;
ip->i_pdquot = NULL;
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 83073c883fbf..d8ce3eaa246e 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -218,8 +218,7 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip)
#define XFS_IFLOCK (1 << __XFS_IFLOCK_BIT)
#define __XFS_IPINNED_BIT 8 /* wakeup key for zero pin count */
#define XFS_IPINNED (1 << __XFS_IPINNED_BIT)
-#define XFS_IDONTCACHE (1 << 9) /* don't cache the inode long term */
-#define XFS_IEOFBLOCKS (1 << 10)/* has the preallocblocks tag set */
+#define XFS_IEOFBLOCKS (1 << 9) /* has the preallocblocks tag set */
/*
* If this unlinked inode is in the middle of recovery, don't let drop_inode
* truncate and free the inode. This can happen if we iget the inode during
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index a4028992b789..284ab186798c 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -740,7 +740,7 @@ xfs_fs_drop_inode(
return 0;
}

- return generic_drop_inode(inode) || (ip->i_flags & XFS_IDONTCACHE);
+ return generic_drop_inode(inode);
}

static void
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a87cc5845a02..44bd45af760f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2156,6 +2156,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
*
* I_CREATING New object's inode in the middle of setting up.
*
+ * I_DONTCACHE Evict inode as soon as it is not used anymore.
+ *
* Q: What is the difference between I_WILL_FREE and I_FREEING?
*/
#define I_DIRTY_SYNC (1 << 0)
@@ -2178,6 +2180,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
#define I_WB_SWITCH (1 << 13)
#define I_OVL_INUSE (1 << 14)
#define I_CREATING (1 << 15)
+#define I_DONTCACHE (1 << 16)

#define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
#define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
@@ -3049,7 +3052,8 @@ extern int inode_needs_sync(struct inode *inode);
extern int generic_delete_inode(struct inode *inode);
static inline int generic_drop_inode(struct inode *inode)
{
- return !inode->i_nlink || inode_unhashed(inode);
+ return !inode->i_nlink || inode_unhashed(inode) ||
+ (inode->i_state & I_DONTCACHE);
}

extern struct inode *ilookup5_nowait(struct super_block *sb,
--
2.25.1

2020-04-22 21:23:25

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 06/11] fs/xfs: Make DAX mount option a tri-state

From: Ira Weiny <[email protected]>

As agreed upon[1]. We make the dax mount option a tri-state. '-o dax'
continues to operate the same. We add 'always', 'never', and 'inode'
(default).

[1] https://lore.kernel.org/lkml/[email protected]/

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V9:
Fix indentation in xfs_mount_set_dax_mode()
Do not report dax=inode

Changes from v8:
Move NEVER bit to 27
Use xfs signature style
use xfs_dax_mode enum

Changes from v7:
Change to XFS_MOUNT_DAX_NEVER

Changes from v6:
Use 2 flag bits rather than a field.
change iflag to inode

Changes from v5:
New Patch
---
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_super.c | 49 ++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index f6123fb0113c..37bfb50db809 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -238,6 +238,7 @@ typedef struct xfs_mount {
allocator */
#define XFS_MOUNT_NOATTR2 (1ULL << 25) /* disable use of attr2 format */
#define XFS_MOUNT_DAX_ALWAYS (1ULL << 26)
+#define XFS_MOUNT_DAX_NEVER (1ULL << 27)

/*
* Max and min values for mount-option defined I/O
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index ce169d1c7474..a4028992b789 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -47,6 +47,39 @@ static struct kset *xfs_kset; /* top-level xfs sysfs dir */
static struct xfs_kobj xfs_dbg_kobj; /* global debug sysfs attrs */
#endif

+enum xfs_dax_mode {
+ XFS_DAX_INODE = 0,
+ XFS_DAX_ALWAYS = 1,
+ XFS_DAX_NEVER = 2,
+};
+
+static void
+xfs_mount_set_dax_mode(
+ struct xfs_mount *mp,
+ enum xfs_dax_mode mode)
+{
+ switch (mode) {
+ case XFS_DAX_INODE:
+ mp->m_flags &= ~(XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER);
+ break;
+ case XFS_DAX_ALWAYS:
+ mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
+ mp->m_flags &= ~XFS_MOUNT_DAX_NEVER;
+ break;
+ case XFS_DAX_NEVER:
+ mp->m_flags |= XFS_MOUNT_DAX_NEVER;
+ mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
+ break;
+ }
+}
+
+static const struct constant_table dax_param_enums[] = {
+ {"inode", XFS_DAX_INODE },
+ {"always", XFS_DAX_ALWAYS },
+ {"never", XFS_DAX_NEVER },
+ {}
+};
+
/*
* Table driven mount option parser.
*/
@@ -59,7 +92,7 @@ enum {
Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
- Opt_discard, Opt_nodiscard, Opt_dax,
+ Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
};

static const struct fs_parameter_spec xfs_fs_parameters[] = {
@@ -103,6 +136,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
fsparam_flag("discard", Opt_discard),
fsparam_flag("nodiscard", Opt_nodiscard),
fsparam_flag("dax", Opt_dax),
+ fsparam_enum("dax", Opt_dax_enum, dax_param_enums),
{}
};

@@ -129,7 +163,6 @@ xfs_fs_show_options(
{ XFS_MOUNT_GRPID, ",grpid" },
{ XFS_MOUNT_DISCARD, ",discard" },
{ XFS_MOUNT_LARGEIO, ",largeio" },
- { XFS_MOUNT_DAX_ALWAYS, ",dax" },
{ 0, NULL }
};
struct xfs_mount *mp = XFS_M(root->d_sb);
@@ -185,6 +218,11 @@ xfs_fs_show_options(
if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
seq_puts(m, ",noquota");

+ if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS)
+ seq_puts(m, ",dax=always");
+ else if (mp->m_flags & XFS_MOUNT_DAX_NEVER)
+ seq_puts(m, ",dax=never");
+
return 0;
}

@@ -1261,7 +1299,10 @@ xfs_fc_parse_param(
return 0;
#ifdef CONFIG_FS_DAX
case Opt_dax:
- mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
+ xfs_mount_set_dax_mode(mp, XFS_DAX_ALWAYS);
+ return 0;
+ case Opt_dax_enum:
+ xfs_mount_set_dax_mode(mp, result.uint_32);
return 0;
#endif
default:
@@ -1468,7 +1509,7 @@ xfs_fc_fill_super(
if (!rtdev_is_dax && !datadev_is_dax) {
xfs_alert(mp,
"DAX unsupported by block device. Turning off DAX.");
- mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
+ xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
}
if (xfs_sb_version_hasreflink(&mp->m_sb)) {
xfs_alert(mp,
--
2.25.1

2020-04-22 21:23:30

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 07/11] fs/xfs: Create function xfs_inode_should_enable_dax()

From: Ira Weiny <[email protected]>

xfs_inode_supports_dax() should reflect if the inode can support DAX not
that it is enabled for DAX.

Change the use of xfs_inode_supports_dax() to reflect only if the inode
and underlying storage support dax.

Add a new function xfs_inode_should_enable_dax() which reflects if the
inode should be enabled for DAX.

Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V8
Change to 'should enable' (feedback given by Jan in ext4 series)
Darrick I've preserved your Reviewed-by for now LMK if
that is an issue.

Changes from v7:
Move S_ISREG check first
use IS_ENABLED(CONFIG_FS_DAX) rather than duplicated function

Changes from v6:
Change enable checks to be sequential logic.
Update for 2 bit tri-state option.
Make 'static' consistent.
Don't set S_DAX if !CONFIG_FS_DAX

Changes from v5:
Update to reflect the new tri-state mount option

Changes from v3:
Update functions and names to be more clear
Update commit message
Merge with
'fs/xfs: Clean up DAX support check'
don't allow IS_DAX() on a directory
use STATIC macro for static
make xfs_inode_supports_dax() static
---
fs/xfs/xfs_iops.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 462f89af479a..1814f10e43d3 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1243,13 +1243,12 @@ xfs_inode_supports_dax(
{
struct xfs_mount *mp = ip->i_mount;

- /* Only supported on non-reflinked files. */
- if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip))
+ /* Only supported on regular files. */
+ if (!S_ISREG(VFS_I(ip)->i_mode))
return false;

- /* DAX mount option or DAX iflag must be set. */
- if (!(mp->m_flags & XFS_MOUNT_DAX_ALWAYS) &&
- !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
+ /* Only supported on non-reflinked files. */
+ if (xfs_is_reflink_inode(ip))
return false;

/* Block size must match page size */
@@ -1260,6 +1259,23 @@ xfs_inode_supports_dax(
return xfs_inode_buftarg(ip)->bt_daxdev != NULL;
}

+static bool
+xfs_inode_should_enable_dax(
+ struct xfs_inode *ip)
+{
+ if (!IS_ENABLED(CONFIG_FS_DAX))
+ return false;
+ if (ip->i_mount->m_flags & XFS_MOUNT_DAX_NEVER)
+ return false;
+ if (!xfs_inode_supports_dax(ip))
+ return false;
+ if (ip->i_mount->m_flags & XFS_MOUNT_DAX_ALWAYS)
+ return true;
+ if (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
+ return true;
+ return false;
+}
+
STATIC void
xfs_diflags_to_iflags(
struct inode *inode,
@@ -1278,7 +1294,7 @@ xfs_diflags_to_iflags(
inode->i_flags |= S_SYNC;
if (flags & XFS_DIFLAG_NOATIME)
inode->i_flags |= S_NOATIME;
- if (xfs_inode_supports_dax(ip))
+ if (xfs_inode_should_enable_dax(ip))
inode->i_flags |= S_DAX;
}

--
2.25.1

2020-04-22 21:23:34

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

From: Ira Weiny <[email protected]>

Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed
setting of S_DAX, data invalidation no longer needs to happen when
FS_XFLAG_DAX is changed.

Change xfs_ioctl_setattr_dax_invalidate() to be
xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new
functionality.

Furthermore, we no longer need the locking so we remove the join_flags
logic.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V9:
Change name of function to xfs_ioctl_setattr_prepare_dax()

Changes from V8:
Change name of function to xfs_ioctl_dax_check_set_cache()
Update commit message
Fix bit manipulations

Changes from V7:
Use new flag_inode_dontcache()
Skip don't cache if mount over ride is active.

Changes from v6:
Fix completely broken implementation and update commit message.
Use the new VFS layer I_DONTCACHE to facilitate inode eviction
and S_DAX changing on drop_caches

Changes from v5:
New patch
---
fs/xfs/xfs_ioctl.c | 108 +++++++++------------------------------------
1 file changed, 20 insertions(+), 88 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 104495ac187c..aa1a8bd7d68d 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1245,64 +1245,26 @@ xfs_ioctl_setattr_xflags(
return 0;
}

-/*
- * If we are changing DAX flags, we have to ensure the file is clean and any
- * cached objects in the address space are invalidated and removed. This
- * requires us to lock out other IO and page faults similar to a truncate
- * operation. The locks need to be held until the transaction has been committed
- * so that the cache invalidation is atomic with respect to the DAX flag
- * manipulation.
- */
-static int
-xfs_ioctl_setattr_dax_invalidate(
+static void
+xfs_ioctl_setattr_prepare_dax(
struct xfs_inode *ip,
- struct fsxattr *fa,
- int *join_flags)
+ struct fsxattr *fa)
{
- struct inode *inode = VFS_I(ip);
- struct super_block *sb = inode->i_sb;
- int error;
-
- *join_flags = 0;
-
- /*
- * It is only valid to set the DAX flag on regular files and
- * directories on filesystems where the block size is equal to the page
- * size. On directories it serves as an inherited hint so we don't
- * have to check the device for dax support or flush pagecache.
- */
- if (fa->fsx_xflags & FS_XFLAG_DAX) {
- struct xfs_buftarg *target = xfs_inode_buftarg(ip);
-
- if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
- return -EINVAL;
- }
-
- /* If the DAX state is not changing, we have nothing to do here. */
- if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
- return 0;
- if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
- return 0;
+ struct xfs_mount *mp = ip->i_mount;
+ struct inode *inode = VFS_I(ip);

if (S_ISDIR(inode->i_mode))
- return 0;
-
- /* lock, flush and invalidate mapping in preparation for flag change */
- xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
- error = filemap_write_and_wait(inode->i_mapping);
- if (error)
- goto out_unlock;
- error = invalidate_inode_pages2(inode->i_mapping);
- if (error)
- goto out_unlock;
-
- *join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
- return 0;
+ return;

-out_unlock:
- xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
- return error;
+ if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
+ (mp->m_flags & XFS_MOUNT_DAX_NEVER))
+ return;

+ if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
+ !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
+ (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
+ (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
+ mark_inode_dontcache(inode);
}

/*
@@ -1310,17 +1272,10 @@ xfs_ioctl_setattr_dax_invalidate(
* have permission to do so. On success, return a clean transaction and the
* inode locked exclusively ready for further operation specific checks. On
* failure, return an error without modifying or locking the inode.
- *
- * The inode might already be IO locked on call. If this is the case, it is
- * indicated in @join_flags and we take full responsibility for ensuring they
- * are unlocked from now on. Hence if we have an error here, we still have to
- * unlock them. Otherwise, once they are joined to the transaction, they will
- * be unlocked on commit/cancel.
*/
static struct xfs_trans *
xfs_ioctl_setattr_get_trans(
- struct xfs_inode *ip,
- int join_flags)
+ struct xfs_inode *ip)
{
struct xfs_mount *mp = ip->i_mount;
struct xfs_trans *tp;
@@ -1337,8 +1292,7 @@ xfs_ioctl_setattr_get_trans(
goto out_unlock;

xfs_ilock(ip, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
- join_flags = 0;
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);

/*
* CAP_FOWNER overrides the following restrictions:
@@ -1359,8 +1313,6 @@ xfs_ioctl_setattr_get_trans(
out_cancel:
xfs_trans_cancel(tp);
out_unlock:
- if (join_flags)
- xfs_iunlock(ip, join_flags);
return ERR_PTR(error);
}

@@ -1486,7 +1438,6 @@ xfs_ioctl_setattr(
struct xfs_dquot *pdqp = NULL;
struct xfs_dquot *olddquot = NULL;
int code;
- int join_flags = 0;

trace_xfs_ioctl_setattr(ip);

@@ -1510,18 +1461,9 @@ xfs_ioctl_setattr(
return code;
}

- /*
- * Changing DAX config may require inode locking for mapping
- * invalidation. These need to be held all the way to transaction commit
- * or cancel time, so need to be passed through to
- * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
- * appropriately.
- */
- code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
- if (code)
- goto error_free_dquots;
+ xfs_ioctl_setattr_prepare_dax(ip, fa);

- tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+ tp = xfs_ioctl_setattr_get_trans(ip);
if (IS_ERR(tp)) {
code = PTR_ERR(tp);
goto error_free_dquots;
@@ -1651,7 +1593,6 @@ xfs_ioc_setxflags(
struct fsxattr fa;
struct fsxattr old_fa;
unsigned int flags;
- int join_flags = 0;
int error;

if (copy_from_user(&flags, arg, sizeof(flags)))
@@ -1668,18 +1609,9 @@ xfs_ioc_setxflags(
if (error)
return error;

- /*
- * Changing DAX config may require inode locking for mapping
- * invalidation. These need to be held all the way to transaction commit
- * or cancel time, so need to be passed through to
- * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
- * appropriately.
- */
- error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
- if (error)
- goto out_drop_write;
+ xfs_ioctl_setattr_prepare_dax(ip, &fa);

- tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+ tp = xfs_ioctl_setattr_get_trans(ip);
if (IS_ERR(tp)) {
error = PTR_ERR(tp);
goto out_drop_write;
--
2.25.1

2020-04-22 21:23:51

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 02/11] fs: Remove unneeded IS_DAX() check in io_is_direct()

From: Ira Weiny <[email protected]>

Remove the check because DAX now has it's own read/write methods and
file systems which support DAX check IS_DAX() prior to IOCB_DIRECT on
their own. Therefore, it does not matter if the file state is DAX when
the iocb flags are created.

Also remove io_is_direct() as it is just a simple flag check.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from v8:
Rebase to latest Linus tree

Changes from v6:
remove io_is_direct() as well.
Remove Reviews since this is quite a bit different.

Changes from v3:
Reword commit message.
Reordered to be a 'pre-cleanup' patch
---
drivers/block/loop.c | 6 +++---
include/linux/fs.h | 7 +------
2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index da693e6a834e..14372df0f354 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -634,8 +634,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)

static inline void loop_update_dio(struct loop_device *lo)
{
- __loop_update_dio(lo, io_is_direct(lo->lo_backing_file) |
- lo->use_dio);
+ __loop_update_dio(lo, (lo->lo_backing_file->f_flags & O_DIRECT) |
+ lo->use_dio);
}

static void loop_reread_partitions(struct loop_device *lo,
@@ -1028,7 +1028,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
blk_queue_write_cache(lo->lo_queue, true, false);

- if (io_is_direct(lo->lo_backing_file) && inode->i_sb->s_bdev) {
+ if ((lo->lo_backing_file->f_flags & O_DIRECT) && inode->i_sb->s_bdev) {
/* In case of direct I/O, match underlying block size */
unsigned short bsize = bdev_logical_block_size(
inode->i_sb->s_bdev);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4f6f59b4f22a..a87cc5845a02 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3394,11 +3394,6 @@ extern void setattr_copy(struct inode *inode, const struct iattr *attr);

extern int file_update_time(struct file *file);

-static inline bool io_is_direct(struct file *filp)
-{
- return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
-}
-
static inline bool vma_is_dax(const struct vm_area_struct *vma)
{
return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
@@ -3423,7 +3418,7 @@ static inline int iocb_flags(struct file *file)
int res = 0;
if (file->f_flags & O_APPEND)
res |= IOCB_APPEND;
- if (io_is_direct(file))
+ if (file->f_flags & O_DIRECT)
res |= IOCB_DIRECT;
if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
res |= IOCB_DSYNC;
--
2.25.1

2020-04-23 08:42:18

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH V10 10/11] fs: Introduce DCACHE_DONTCACHE

On Wed 22-04-20 14:21:01, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> DCACHE_DONTCACHE indicates a dentry should not be cached on final
> dput().
>
> Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> pointing to a specific inode when that inode is being set I_DONTCACHE.
>
> This facilitates dropping dentry references to inodes sooner which
> require eviction to swap S_DAX mode.
>
> Cc: Al Viro <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

>
> ---
> Changes from V9:
> modify i_state under i_lock
> Update comment
> "Purge from memory on final dput()"
>
> Changes from V8:
> Update commit message
> Use mark_inode_dontcache in XFS
> Fix locking... can't use rcu here.
> Change name to mark_inode_dontcache
> ---
> fs/dcache.c | 4 ++++
> fs/inode.c | 15 +++++++++++++++
> fs/xfs/xfs_icache.c | 2 +-
> include/linux/dcache.h | 2 ++
> include/linux/fs.h | 1 +
> 5 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index b280e07e162b..0030fabab2c4 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry)
> if (dentry->d_op->d_delete(dentry))
> return false;
> }
> +
> + if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
> + return false;
> +
> /* retain; LRU fodder */
> dentry->d_lockref.count--;
> if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
> diff --git a/fs/inode.c b/fs/inode.c
> index 93d9252a00ab..316355433797 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1526,6 +1526,21 @@ int generic_delete_inode(struct inode *inode)
> }
> EXPORT_SYMBOL(generic_delete_inode);
>
> +void mark_inode_dontcache(struct inode *inode)
> +{
> + struct dentry *de;
> +
> + spin_lock(&inode->i_lock);
> + hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
> + spin_lock(&de->d_lock);
> + de->d_flags |= DCACHE_DONTCACHE;
> + spin_unlock(&de->d_lock);
> + }
> + inode->i_state |= I_DONTCACHE;
> + spin_unlock(&inode->i_lock);
> +}
> +EXPORT_SYMBOL(mark_inode_dontcache);
> +
> /*
> * Called when we're dropping the last reference
> * to an inode.
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index de76f7f60695..3c8f44477804 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -559,7 +559,7 @@ xfs_iget_cache_miss(
> */
> iflags = XFS_INEW;
> if (flags & XFS_IGET_DONTCACHE)
> - VFS_I(ip)->i_state |= I_DONTCACHE;
> + mark_inode_dontcache(VFS_I(ip));
> ip->i_udquot = NULL;
> ip->i_gdquot = NULL;
> ip->i_pdquot = NULL;
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index c1488cc84fd9..a81f0c3cf352 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -177,6 +177,8 @@ struct dentry_operations {
>
> #define DCACHE_REFERENCED 0x00000040 /* Recently used, don't discard. */
>
> +#define DCACHE_DONTCACHE 0x00000080 /* Purge from memory on final dput() */
> +
> #define DCACHE_CANT_MOUNT 0x00000100
> #define DCACHE_GENOCIDE 0x00000200
> #define DCACHE_SHRINK_LIST 0x00000400
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 44bd45af760f..064168ec2e0b 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3055,6 +3055,7 @@ static inline int generic_drop_inode(struct inode *inode)
> return !inode->i_nlink || inode_unhashed(inode) ||
> (inode->i_state & I_DONTCACHE);
> }
> +extern void mark_inode_dontcache(struct inode *inode);
>
> extern struct inode *ilookup5_nowait(struct super_block *sb,
> unsigned long hashval, int (*test)(struct inode *, void *),
> --
> 2.25.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-04-23 22:36:01

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH V10 06/11] fs/xfs: Make DAX mount option a tri-state

On Wed, Apr 22, 2020 at 02:20:57PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> As agreed upon[1]. We make the dax mount option a tri-state. '-o dax'
> continues to operate the same. We add 'always', 'never', and 'inode'
> (default).
>
> [1] https://lore.kernel.org/lkml/[email protected]/
>
> Signed-off-by: Ira Weiny <[email protected]>
....
> @@ -129,7 +163,6 @@ xfs_fs_show_options(
> { XFS_MOUNT_GRPID, ",grpid" },
> { XFS_MOUNT_DISCARD, ",discard" },
> { XFS_MOUNT_LARGEIO, ",largeio" },
> - { XFS_MOUNT_DAX_ALWAYS, ",dax" },
> { 0, NULL }
> };
> struct xfs_mount *mp = XFS_M(root->d_sb);
> @@ -185,6 +218,11 @@ xfs_fs_show_options(
> if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
> seq_puts(m, ",noquota");
>
> + if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS)
> + seq_puts(m, ",dax=always");
> + else if (mp->m_flags & XFS_MOUNT_DAX_NEVER)
> + seq_puts(m, ",dax=never");

These can never be set at the same time, so please put these in the
m_flags options table as XFS_MOUNT_DAX_ALWAYS already is. i.e.

@@ -129,7 +163,8 @@ xfs_fs_show_options(
{ XFS_MOUNT_GRPID, ",grpid" },
{ XFS_MOUNT_DISCARD, ",discard" },
{ XFS_MOUNT_LARGEIO, ",largeio" },
- { XFS_MOUNT_DAX_ALWAYS, ",dax" },
+ { XFS_MOUNT_DAX_ALWAYS, ",dax=always" },
+ { XFS_MOUNT_DAX_NEVER, ",dax=never" },
{ 0, NULL }
};
struct xfs_mount *mp = XFS_M(root->d_sb);

Otherwise looks OK.

-Dave.
--
Dave Chinner
[email protected]

2020-04-23 22:36:53

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH V10 07/11] fs/xfs: Create function xfs_inode_should_enable_dax()

On Wed, Apr 22, 2020 at 02:20:58PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> xfs_inode_supports_dax() should reflect if the inode can support DAX not
> that it is enabled for DAX.
>
> Change the use of xfs_inode_supports_dax() to reflect only if the inode
> and underlying storage support dax.
>
> Add a new function xfs_inode_should_enable_dax() which reflects if the
> inode should be enabled for DAX.
>
> Reviewed-by: Darrick J. Wong <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>

Looks fine.

Reviewed-by: Dave Chinner <[email protected]>
--
Dave Chinner
[email protected]

2020-04-23 22:38:57

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH V10 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()

On Wed, Apr 22, 2020 at 02:20:59PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are
> nearly identical. The only difference is that *_to_linux() is called after
> inode setup and disallows changing the DAX flag.
>
> Combining them can be done with a flag which indicates if this is the initial
> setup to allow the DAX flag to be properly set only at init time.
>
> So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
> directly.
>
> While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
> use xfs_ip2xflags() to ensure future diflags are included correctly.
>
> Reviewed-by: Darrick J. Wong <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>

Looks good.

Reviewed-by: Dave Chinner <[email protected]>
--
Dave Chinner
[email protected]

2020-04-23 22:59:16

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH V10 10/11] fs: Introduce DCACHE_DONTCACHE

On Wed, Apr 22, 2020 at 02:21:01PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> DCACHE_DONTCACHE indicates a dentry should not be cached on final
> dput().
>
> Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> pointing to a specific inode when that inode is being set I_DONTCACHE.
>
> This facilitates dropping dentry references to inodes sooner which
> require eviction to swap S_DAX mode.
>
> Cc: Al Viro <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>

Code looks fine....

> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1526,6 +1526,21 @@ int generic_delete_inode(struct inode *inode)
> }
> EXPORT_SYMBOL(generic_delete_inode);
>
> +void mark_inode_dontcache(struct inode *inode)
> +{
> + struct dentry *de;
> +
> + spin_lock(&inode->i_lock);
> + hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
> + spin_lock(&de->d_lock);
> + de->d_flags |= DCACHE_DONTCACHE;
> + spin_unlock(&de->d_lock);
> + }
> + inode->i_state |= I_DONTCACHE;
> + spin_unlock(&inode->i_lock);
> +}
> +EXPORT_SYMBOL(mark_inode_dontcache);

Though I suspect that this should be in fs/dcache.c and not
fs/inode.c. i.e. nothing in fs/inode.c does dentry list walks, but
there are several cases in the dcache code where inode dentry walks
are done under the inode lock (e.g. d_find_alias(inode)).

So perhaps this should be d_mark_dontcache(inode), which also marks
the inode as I_DONTCACHE so that everything is evicted on last
reference...

Cheers,

Dave.
--
Dave Chinner
[email protected]

2020-04-23 23:34:29

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V10 06/11] fs/xfs: Make DAX mount option a tri-state

On Fri, Apr 24, 2020 at 08:35:31AM +1000, Dave Chinner wrote:
> On Wed, Apr 22, 2020 at 02:20:57PM -0700, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > As agreed upon[1]. We make the dax mount option a tri-state. '-o dax'
> > continues to operate the same. We add 'always', 'never', and 'inode'
> > (default).
> >
> > [1] https://lore.kernel.org/lkml/[email protected]/
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> ....
> > @@ -129,7 +163,6 @@ xfs_fs_show_options(
> > { XFS_MOUNT_GRPID, ",grpid" },
> > { XFS_MOUNT_DISCARD, ",discard" },
> > { XFS_MOUNT_LARGEIO, ",largeio" },
> > - { XFS_MOUNT_DAX_ALWAYS, ",dax" },
> > { 0, NULL }
> > };
> > struct xfs_mount *mp = XFS_M(root->d_sb);
> > @@ -185,6 +218,11 @@ xfs_fs_show_options(
> > if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
> > seq_puts(m, ",noquota");
> >
> > + if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS)
> > + seq_puts(m, ",dax=always");
> > + else if (mp->m_flags & XFS_MOUNT_DAX_NEVER)
> > + seq_puts(m, ",dax=never");
>
> These can never be set at the same time, so please put these in the
> m_flags options table as XFS_MOUNT_DAX_ALWAYS already is. i.e.
>
> @@ -129,7 +163,8 @@ xfs_fs_show_options(
> { XFS_MOUNT_GRPID, ",grpid" },
> { XFS_MOUNT_DISCARD, ",discard" },
> { XFS_MOUNT_LARGEIO, ",largeio" },
> - { XFS_MOUNT_DAX_ALWAYS, ",dax" },
> + { XFS_MOUNT_DAX_ALWAYS, ",dax=always" },
> + { XFS_MOUNT_DAX_NEVER, ",dax=never" },
> { 0, NULL }
> };
> struct xfs_mount *mp = XFS_M(root->d_sb);
>
> Otherwise looks OK.

Done.

Ira

>
> -Dave.
> --
> Dave Chinner
> [email protected]

2020-04-23 23:39:11

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V10 10/11] fs: Introduce DCACHE_DONTCACHE

On Fri, Apr 24, 2020 at 08:57:34AM +1000, Dave Chinner wrote:
> On Wed, Apr 22, 2020 at 02:21:01PM -0700, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > DCACHE_DONTCACHE indicates a dentry should not be cached on final
> > dput().
> >
> > Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> > pointing to a specific inode when that inode is being set I_DONTCACHE.
> >
> > This facilitates dropping dentry references to inodes sooner which
> > require eviction to swap S_DAX mode.
> >
> > Cc: Al Viro <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
>
> Code looks fine....
>
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -1526,6 +1526,21 @@ int generic_delete_inode(struct inode *inode)
> > }
> > EXPORT_SYMBOL(generic_delete_inode);
> >
> > +void mark_inode_dontcache(struct inode *inode)
> > +{
> > + struct dentry *de;
> > +
> > + spin_lock(&inode->i_lock);
> > + hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
> > + spin_lock(&de->d_lock);
> > + de->d_flags |= DCACHE_DONTCACHE;
> > + spin_unlock(&de->d_lock);
> > + }
> > + inode->i_state |= I_DONTCACHE;
> > + spin_unlock(&inode->i_lock);
> > +}
> > +EXPORT_SYMBOL(mark_inode_dontcache);
>
> Though I suspect that this should be in fs/dcache.c and not
> fs/inode.c. i.e. nothing in fs/inode.c does dentry list walks, but
> there are several cases in the dcache code where inode dentry walks
> are done under the inode lock (e.g. d_find_alias(inode)).
>
> So perhaps this should be d_mark_dontcache(inode), which also marks
> the inode as I_DONTCACHE so that everything is evicted on last
> reference...

That does follow an existing pattern.

Al? Any preference?

Ira

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]