2020-05-13 05:44:15

by Ira Weiny

Subject: [PATCH 0/9] Enable ext4 support for per-file/directory DAX operations

From: Ira Weiny <[email protected]>

Enable the same per-file DAX support in ext4 as was done for xfs. This series
builds on and depends on the V11 series for xfs.[1]

This passes the same xfstests test as XFS.

The only issue is that this modifies the old mount option parsing code rather
than waiting for the new parsing code to be finalized.

This series starts with 3 fixes which include making Verity and Encrypt truly
mutually exclusive from DAX. I think these first 3 patches should be picked up
for 5.8 regardless of what is decided regarding the mount parsing.

[1] https://lore.kernel.org/lkml/[email protected]/

To: [email protected]
Cc: "Darrick J. Wong" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "Theodore Y. Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Ira Weiny (9):
fs/ext4: Narrow scope of DAX check in setflags
fs/ext4: Disallow verity if inode is DAX
fs/ext4: Disallow encryption if inode is DAX
fs/ext4: Change EXT4_MOUNT_DAX to EXT4_MOUNT_DAX_ALWAYS
fs/ext4: Update ext4_should_use_dax()
fs/ext4: Only change S_DAX on inode load
fs/ext4: Make DAX mount option a tri-state
fs/ext4: Introduce DAX inode flag
Documentation/dax: Update DAX enablement for ext4

Documentation/filesystems/dax.txt | 6 +-
Documentation/filesystems/ext4/verity.rst | 7 +++
Documentation/filesystems/fscrypt.rst | 4 +-
fs/ext4/ext4.h | 20 ++++---
fs/ext4/ialloc.c | 2 +-
fs/ext4/inode.c | 27 +++++++--
fs/ext4/ioctl.c | 32 +++++++++--
fs/ext4/super.c | 67 +++++++++++++++--------
fs/ext4/verity.c | 5 +-
9 files changed, 125 insertions(+), 45 deletions(-)

--
2.25.1
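
For readers who want to see how the per-file flag might be driven from
userspace, here is a minimal sketch, assuming a Linux system whose
<linux/fs.h> defines FS_XFLAG_DAX. The helper name set_dax_xflag() is
hypothetical and not part of this series; it only uses the generic
FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR ioctls. As patch 6/9 describes, S_DAX
itself is expected to change only after the inode is evicted and reloaded.

```c
#include <sys/ioctl.h>
#include <linux/fs.h>	/* struct fsxattr, FS_IOC_FS{GET,SET}XATTR, FS_XFLAG_DAX */

/*
 * Hypothetical helper (not from the series): set or clear FS_XFLAG_DAX on
 * an open file or directory. Returns 0 on success, -1 on failure with
 * errno left as set by ioctl().
 */
int set_dax_xflag(int fd, int enable)
{
	struct fsxattr fa;

	/* Read the current extended attributes so other flags are preserved. */
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fa) < 0)
		return -1;

	if (enable)
		fa.fsx_xflags |= FS_XFLAG_DAX;
	else
		fa.fsx_xflags &= ~FS_XFLAG_DAX;

	/* Write the modified flags back. */
	return ioctl(fd, FS_IOC_FSSETXATTR, &fa) < 0 ? -1 : 0;
}
```

A caller would open the file or directory, pass the descriptor to
set_dax_xflag(), and check the return value. Whether the flag takes effect
also depends on the dax= mount option introduced in patch 7/9.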


2020-05-13 05:44:35

by Ira Weiny

Subject: [PATCH 7/9] fs/ext4: Make DAX mount option a tri-state

From: Ira Weiny <[email protected]>

We add 'always', 'never', and 'inode' (default). '-o dax' continues to
operate the same.

Specifically we introduce a 2nd DAX mount flag EXT4_MOUNT2_DAX_NEVER and set
it and EXT4_MOUNT_DAX_ALWAYS appropriately.

We also force EXT4_MOUNT2_DAX_NEVER if !CONFIG_FS_DAX.

https://lore.kernel.org/lkml/[email protected]/

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from RFC:
Combine remount check for DAX_NEVER with DAX_ALWAYS
Update ext4_should_enable_dax()
---
fs/ext4/ext4.h | 1 +
fs/ext4/inode.c | 2 ++
fs/ext4/super.c | 43 +++++++++++++++++++++++++++++++++++++------
3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 86a0994332ce..01d1de838896 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1168,6 +1168,7 @@ struct ext4_inode_info {
blocks */
#define EXT4_MOUNT2_HURD_COMPAT 0x00000004 /* Support HURD-castrated
file systems */
+#define EXT4_MOUNT2_DAX_NEVER 0x00000008 /* Do not allow Direct Access */

#define EXT4_MOUNT2_EXPLICIT_JOURNAL_CHECKSUM 0x00000008 /* User explicitly
specified journal checksum */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 23e42a223235..140b1930e2f4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4400,6 +4400,8 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)

static bool ext4_should_enable_dax(struct inode *inode)
{
+ if (test_opt2(inode->i_sb, DAX_NEVER))
+ return false;
if (!S_ISREG(inode->i_mode))
return false;
if (ext4_should_journal_data(inode))
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 5ec900fdf73c..e01a040a58a9 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1505,6 +1505,7 @@ enum {
Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version, Opt_dax,
+ Opt_dax_str,
Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
Opt_nowarn_on_error, Opt_mblk_io_submit,
Opt_lazytime, Opt_nolazytime, Opt_debug_want_extra_isize,
@@ -1570,6 +1571,7 @@ static const match_table_t tokens = {
{Opt_barrier, "barrier"},
{Opt_nobarrier, "nobarrier"},
{Opt_i_version, "i_version"},
+ {Opt_dax_str, "dax=%s"},
{Opt_dax, "dax"},
{Opt_stripe, "stripe=%u"},
{Opt_delalloc, "delalloc"},
@@ -1767,6 +1769,7 @@ static const struct mount_opts {
{Opt_min_batch_time, 0, MOPT_GTE0},
{Opt_inode_readahead_blks, 0, MOPT_GTE0},
{Opt_init_itable, 0, MOPT_GTE0},
+ {Opt_dax_str, 0, MOPT_STRING},
{Opt_dax, EXT4_MOUNT_DAX_ALWAYS, MOPT_SET},
{Opt_stripe, 0, MOPT_GTE0},
{Opt_resuid, 0, MOPT_GTE0},
@@ -2076,13 +2079,32 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
}
sbi->s_jquota_fmt = m->mount_opt;
#endif
- } else if (token == Opt_dax) {
+ } else if (token == Opt_dax || token == Opt_dax_str) {
#ifdef CONFIG_FS_DAX
- ext4_msg(sb, KERN_WARNING,
- "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
- sbi->s_mount_opt |= m->mount_opt;
+ char *tmp = match_strdup(&args[0]);
+
+ if (!tmp || !strcmp(tmp, "always")) {
+ ext4_msg(sb, KERN_WARNING,
+ "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
+ sbi->s_mount_opt |= EXT4_MOUNT_DAX_ALWAYS;
+ sbi->s_mount_opt2 &= ~EXT4_MOUNT2_DAX_NEVER;
+ } else if (!strcmp(tmp, "never")) {
+ sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_NEVER;
+ sbi->s_mount_opt &= ~EXT4_MOUNT_DAX_ALWAYS;
+ } else if (!strcmp(tmp, "inode")) {
+ sbi->s_mount_opt &= ~EXT4_MOUNT_DAX_ALWAYS;
+ sbi->s_mount_opt2 &= ~EXT4_MOUNT2_DAX_NEVER;
+ } else {
+ ext4_msg(sb, KERN_WARNING, "DAX invalid option.");
+ kfree(tmp);
+ return -1;
+ }
+
+ kfree(tmp);
#else
ext4_msg(sb, KERN_INFO, "dax option not supported");
+ sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_NEVER;
+ sbi->s_mount_opt &= ~EXT4_MOUNT_DAX_ALWAYS;
return -1;
#endif
} else if (token == Opt_data_err_abort) {
@@ -2306,6 +2328,13 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
if (DUMMY_ENCRYPTION_ENABLED(sbi))
SEQ_OPTS_PUTS("test_dummy_encryption");

+ if (test_opt2(sb, DAX_NEVER))
+ SEQ_OPTS_PUTS("dax=never");
+ else if (test_opt(sb, DAX_ALWAYS))
+ SEQ_OPTS_PUTS("dax=always");
+ else
+ SEQ_OPTS_PUTS("dax=inode");
+
ext4_show_quota_options(seq, sb);
return 0;
}
@@ -5425,10 +5454,12 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
goto restore_opts;
}

- if ((sbi->s_mount_opt ^ old_opts.s_mount_opt) & EXT4_MOUNT_DAX_ALWAYS) {
+ if ((sbi->s_mount_opt ^ old_opts.s_mount_opt) & EXT4_MOUNT_DAX_ALWAYS ||
+ (sbi->s_mount_opt2 ^ old_opts.s_mount_opt2) & EXT4_MOUNT2_DAX_NEVER) {
ext4_msg(sb, KERN_WARNING, "warning: refusing change of "
- "dax flag with busy inodes while remounting");
+ "dax mount option with busy inodes while remounting");
sbi->s_mount_opt ^= EXT4_MOUNT_DAX_ALWAYS;
+ sbi->s_mount_opt2 ^= EXT4_MOUNT2_DAX_NEVER;
}

if (sbi->s_mount_flags & EXT4_MF_FS_ABORTED)
--
2.25.1
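
The tri-state described in this patch can be modelled outside the kernel.
The following is a simplified userspace sketch, not the kernel code: the
TRI_* bit values, struct tri_sb, and both function names are hypothetical
stand-ins for EXT4_MOUNT_DAX_ALWAYS, EXT4_MOUNT2_DAX_NEVER, and the
s_mount_opt/s_mount_opt2 words. It mirrors the parsing branches and the
order of checks used when showing the option.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative bit values only; the real flags live in fs/ext4/ext4.h. */
#define TRI_DAX_ALWAYS	0x1u	/* stands in for EXT4_MOUNT_DAX_ALWAYS */
#define TRI_DAX_NEVER	0x1u	/* stands in for EXT4_MOUNT2_DAX_NEVER */

struct tri_sb {
	unsigned int mount_opt;		/* models sbi->s_mount_opt */
	unsigned int mount_opt2;	/* models sbi->s_mount_opt2 */
};

/*
 * arg == NULL models a bare "-o dax". Returns 0 on success and -1 for an
 * unknown value, in which case the flags are left untouched.
 */
int tri_parse_dax(struct tri_sb *sbi, const char *arg)
{
	if (!arg || !strcmp(arg, "always")) {
		sbi->mount_opt |= TRI_DAX_ALWAYS;
		sbi->mount_opt2 &= ~TRI_DAX_NEVER;
	} else if (!strcmp(arg, "never")) {
		sbi->mount_opt2 |= TRI_DAX_NEVER;
		sbi->mount_opt &= ~TRI_DAX_ALWAYS;
	} else if (!strcmp(arg, "inode")) {
		sbi->mount_opt &= ~TRI_DAX_ALWAYS;
		sbi->mount_opt2 &= ~TRI_DAX_NEVER;
	} else {
		return -1;
	}
	return 0;
}

/* Same precedence as the show-options hunk: never, then always, else inode. */
const char *tri_show_dax(const struct tri_sb *sbi)
{
	if (sbi->mount_opt2 & TRI_DAX_NEVER)
		return "dax=never";
	if (sbi->mount_opt & TRI_DAX_ALWAYS)
		return "dax=always";
	return "dax=inode";
}
```

With both words zeroed the model reports dax=inode, which matches the
default stated in the commit message.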

2020-05-13 05:44:39

by Ira Weiny

Subject: [PATCH 8/9] fs/ext4: Introduce DAX inode flag

From: Ira Weiny <[email protected]>

Add a flag to preserve FS_XFLAG_DAX in the ext4 inode.

Set the flag to be user visible and changeable. Set the flag to be
inherited. Allow applications to change the flag at any time.

Finally, on regular files, flag the inode to not be cached to facilitate
changing S_DAX on the next creation of the inode.

Signed-off-by: Ira Weiny <[email protected]>

---
Change from RFC:
use new d_mark_dontcache()
Allow caching if ALWAYS/NEVER is set
Rebased to latest Linus master
Change flag to unused 0x01000000
update ext4_should_enable_dax()
---
fs/ext4/ext4.h | 13 +++++++++----
fs/ext4/inode.c | 4 +++-
fs/ext4/ioctl.c | 25 ++++++++++++++++++++++++-
3 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 01d1de838896..715f8f2029b2 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -415,13 +415,16 @@ struct flex_groups {
#define EXT4_VERITY_FL 0x00100000 /* Verity protected inode */
#define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
/* 0x00400000 was formerly EXT4_EOFBLOCKS_FL */
+
+#define EXT4_DAX_FL 0x01000000 /* Inode is DAX */
+
#define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
#define EXT4_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
#define EXT4_CASEFOLD_FL 0x40000000 /* Casefolded file */
#define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */

-#define EXT4_FL_USER_VISIBLE 0x705BDFFF /* User visible flags */
-#define EXT4_FL_USER_MODIFIABLE 0x604BC0FF /* User modifiable flags */
+#define EXT4_FL_USER_VISIBLE 0x715BDFFF /* User visible flags */
+#define EXT4_FL_USER_MODIFIABLE 0x614BC0FF /* User modifiable flags */

/* Flags we can manipulate with through EXT4_IOC_FSSETXATTR */
#define EXT4_FL_XFLAG_VISIBLE (EXT4_SYNC_FL | \
@@ -429,14 +432,16 @@ struct flex_groups {
EXT4_APPEND_FL | \
EXT4_NODUMP_FL | \
EXT4_NOATIME_FL | \
- EXT4_PROJINHERIT_FL)
+ EXT4_PROJINHERIT_FL | \
+ EXT4_DAX_FL)

/* Flags that should be inherited by new inodes from their parent. */
#define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
EXT4_NOCOMPR_FL | EXT4_JOURNAL_DATA_FL |\
EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL |\
- EXT4_PROJINHERIT_FL | EXT4_CASEFOLD_FL)
+ EXT4_PROJINHERIT_FL | EXT4_CASEFOLD_FL |\
+ EXT4_DAX_FL)

/* Flags that are appropriate for regular files (all but dir-specific ones). */
#define EXT4_REG_FLMASK (~(EXT4_DIRSYNC_FL | EXT4_TOPDIR_FL | EXT4_CASEFOLD_FL |\
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 140b1930e2f4..105cf04f7940 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4400,6 +4400,8 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)

static bool ext4_should_enable_dax(struct inode *inode)
{
+ unsigned int flags = EXT4_I(inode)->i_flags;
+
if (test_opt2(inode->i_sb, DAX_NEVER))
return false;
if (!S_ISREG(inode->i_mode))
@@ -4418,7 +4420,7 @@ static bool ext4_should_enable_dax(struct inode *inode)
if (test_opt(inode->i_sb, DAX_ALWAYS))
return true;

- return false;
+ return flags & EXT4_DAX_FL;
}

void ext4_set_inode_flags(struct inode *inode, bool init)
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 145083e8cd1e..6996a5c3e101 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -528,12 +528,15 @@ static inline __u32 ext4_iflags_to_xflags(unsigned long iflags)
xflags |= FS_XFLAG_NOATIME;
if (iflags & EXT4_PROJINHERIT_FL)
xflags |= FS_XFLAG_PROJINHERIT;
+ if (iflags & EXT4_DAX_FL)
+ xflags |= FS_XFLAG_DAX;
return xflags;
}

#define EXT4_SUPPORTED_FS_XFLAGS (FS_XFLAG_SYNC | FS_XFLAG_IMMUTABLE | \
FS_XFLAG_APPEND | FS_XFLAG_NODUMP | \
- FS_XFLAG_NOATIME | FS_XFLAG_PROJINHERIT)
+ FS_XFLAG_NOATIME | FS_XFLAG_PROJINHERIT | \
+ FS_XFLAG_DAX)

/* Transfer xflags flags to internal */
static inline unsigned long ext4_xflags_to_iflags(__u32 xflags)
@@ -552,6 +555,8 @@ static inline unsigned long ext4_xflags_to_iflags(__u32 xflags)
iflags |= EXT4_NOATIME_FL;
if (xflags & FS_XFLAG_PROJINHERIT)
iflags |= EXT4_PROJINHERIT_FL;
+ if (xflags & FS_XFLAG_DAX)
+ iflags |= EXT4_DAX_FL;

return iflags;
}
@@ -802,6 +807,21 @@ static int ext4_ioctl_get_es_cache(struct file *filp, unsigned long arg)
return error;
}

+static void ext4_dax_dontcache(struct inode *inode, unsigned int flags)
+{
+ struct ext4_inode_info *ei = EXT4_I(inode);
+
+ if (S_ISDIR(inode->i_mode))
+ return;
+
+ if (test_opt2(inode->i_sb, DAX_NEVER) ||
+ test_opt(inode->i_sb, DAX_ALWAYS))
+ return;
+
+ if (((ei->i_flags ^ flags) & EXT4_DAX_FL) == EXT4_DAX_FL)
+ d_mark_dontcache(inode);
+}
+
long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
struct inode *inode = file_inode(filp);
@@ -1267,6 +1287,9 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return err;

inode_lock(inode);
+
+ ext4_dax_dontcache(inode, flags);
+
ext4_fill_fsxattr(inode, &old_fa);
err = vfs_ioc_fssetxattr_check(inode, &old_fa, &fa);
if (err)
--
2.25.1
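
How the mount mode and the per-inode flag combine can be sketched as a
small userspace model. This is an illustration under stated assumptions,
not the kernel implementation: the enum, struct, and function names are
hypothetical, and only the checks visible in the hunks above are modelled
(any checks elided between the two inode.c hunks are left out).

```c
#include <stdbool.h>

#define INO_DAX_FL 0x01000000u	/* same value the patch gives EXT4_DAX_FL */

enum ino_dax_mode { INO_MODE_INODE, INO_MODE_ALWAYS, INO_MODE_NEVER };

struct ino_model {
	bool is_regular;	/* models S_ISREG(inode->i_mode) */
	bool is_dir;		/* models S_ISDIR(inode->i_mode) */
	bool journals_data;	/* models ext4_should_journal_data() */
	unsigned int flags;	/* models EXT4_I(inode)->i_flags */
};

/* Decide whether DAX would be enabled when the inode is loaded. */
bool ino_should_enable_dax(enum ino_dax_mode mode, const struct ino_model *ino)
{
	if (mode == INO_MODE_NEVER)
		return false;
	if (!ino->is_regular)
		return false;
	if (ino->journals_data)
		return false;
	if (mode == INO_MODE_ALWAYS)
		return true;
	/* dax=inode: fall back to the per-inode flag. */
	return (ino->flags & INO_DAX_FL) != 0;
}

/*
 * Decide whether a flag update should mark the inode "don't cache": only
 * for non-directories, only in dax=inode mode, and only when the DAX bit
 * actually changes.
 */
bool ino_should_mark_dontcache(enum ino_dax_mode mode,
			       const struct ino_model *ino,
			       unsigned int new_flags)
{
	if (ino->is_dir)
		return false;
	if (mode != INO_MODE_INODE)
		return false;
	return ((ino->flags ^ new_flags) & INO_DAX_FL) != 0;
}
```

The XOR isolates the bits that differ between the old and new flag words,
so rewriting the same DAX setting does not force an eviction.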

2020-05-13 05:45:15

by Ira Weiny

Subject: [PATCH 6/9] fs/ext4: Only change S_DAX on inode load

From: Ira Weiny <[email protected]>

To prevent complications with in memory inodes we only set S_DAX on
inode load. FS_XFLAG_DAX can be changed at any time and S_DAX will
change after inode eviction and reload.

Add init bool to ext4_set_inode_flags() to indicate if the inode is
being newly initialized.

Assert that S_DAX is not set on an inode which is just being loaded.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from RFC:
Change J_ASSERT() to WARN_ON_ONCE()
Fix bug which would clear S_DAX incorrectly
---
fs/ext4/ext4.h | 2 +-
fs/ext4/ialloc.c | 2 +-
fs/ext4/inode.c | 13 ++++++++++---
fs/ext4/ioctl.c | 3 ++-
fs/ext4/super.c | 4 ++--
fs/ext4/verity.c | 2 +-
6 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1a3daf2d18ef..86a0994332ce 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2692,7 +2692,7 @@ extern int ext4_can_truncate(struct inode *inode);
extern int ext4_truncate(struct inode *);
extern int ext4_break_layouts(struct inode *);
extern int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length);
-extern void ext4_set_inode_flags(struct inode *);
+extern void ext4_set_inode_flags(struct inode *, bool init);
extern int ext4_alloc_da_blocks(struct inode *inode);
extern void ext4_set_aops(struct inode *inode);
extern int ext4_writepage_trans_blocks(struct inode *);
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 4b8c9a9bdf0c..7941c140723f 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1116,7 +1116,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
ei->i_block_group = group;
ei->i_last_alloc_group = ~0;

- ext4_set_inode_flags(inode);
+ ext4_set_inode_flags(inode, true);
if (IS_DIRSYNC(inode))
ext4_handle_sync(handle);
if (insert_inode_locked(inode) < 0) {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d3a4c2ed7a1c..23e42a223235 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4419,11 +4419,13 @@ static bool ext4_should_enable_dax(struct inode *inode)
return false;
}

-void ext4_set_inode_flags(struct inode *inode)
+void ext4_set_inode_flags(struct inode *inode, bool init)
{
unsigned int flags = EXT4_I(inode)->i_flags;
unsigned int new_fl = 0;

+ WARN_ON_ONCE(IS_DAX(inode) && init);
+
if (flags & EXT4_SYNC_FL)
new_fl |= S_SYNC;
if (flags & EXT4_APPEND_FL)
@@ -4434,8 +4436,13 @@ void ext4_set_inode_flags(struct inode *inode)
new_fl |= S_NOATIME;
if (flags & EXT4_DIRSYNC_FL)
new_fl |= S_DIRSYNC;
- if (ext4_should_enable_dax(inode))
+
+ /* Because of the way inode_set_flags() works we must preserve S_DAX
+ * here if already set. */
+ new_fl |= (inode->i_flags & S_DAX);
+ if (init && ext4_should_enable_dax(inode))
new_fl |= S_DAX;
+
if (flags & EXT4_ENCRYPT_FL)
new_fl |= S_ENCRYPTED;
if (flags & EXT4_CASEFOLD_FL)
@@ -4649,7 +4656,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
* not initialized on a new filesystem. */
}
ei->i_flags = le32_to_cpu(raw_inode->i_flags);
- ext4_set_inode_flags(inode);
+ ext4_set_inode_flags(inode, true);
inode->i_blocks = ext4_inode_blocks(raw_inode, ei);
ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl_lo);
if (ext4_has_feature_64bit(sb))
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 5813e5e73eab..145083e8cd1e 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -381,7 +381,8 @@ static int ext4_ioctl_setflags(struct inode *inode,
ext4_clear_inode_flag(inode, i);
}

- ext4_set_inode_flags(inode);
+ ext4_set_inode_flags(inode, false);
+
inode->i_ctime = current_time(inode);

err = ext4_mark_iloc_dirty(handle, inode, &iloc);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index d0434b513919..5ec900fdf73c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1344,7 +1344,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
ext4_clear_inode_state(inode,
EXT4_STATE_MAY_INLINE_DATA);
- ext4_set_inode_flags(inode);
+ ext4_set_inode_flags(inode, false);
}
return res;
}
@@ -1367,7 +1367,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
ctx, len, 0);
if (!res) {
ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
- ext4_set_inode_flags(inode);
+ ext4_set_inode_flags(inode, false);
res = ext4_mark_inode_dirty(handle, inode);
if (res)
EXT4_ERROR_INODE(inode, "Failed to mark inode dirty");
diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index f05a09fb2ae4..89a155ece323 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -244,7 +244,7 @@ static int ext4_end_enable_verity(struct file *filp, const void *desc,
if (err)
goto out_stop;
ext4_set_inode_flag(inode, EXT4_INODE_VERITY);
- ext4_set_inode_flags(inode);
+ ext4_set_inode_flags(inode, false);
err = ext4_mark_iloc_dirty(handle, inode, &iloc);
}
out_stop:
--
2.25.1
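
The effect of the new init argument on S_DAX can be illustrated with a
reduced model. This is a sketch, not the kernel function: the SD_* bit
values and sd_compute_flags() are hypothetical, only S_SYNC is modelled
alongside S_DAX, and the result of ext4_should_enable_dax() is passed in
as a plain boolean.

```c
#include <stdbool.h>

/* Illustrative bit values only; they do not match the kernel's. */
#define SD_FL_SYNC	0x1u	/* models EXT4_SYNC_FL in the ext4 flag word */
#define SD_S_SYNC	0x1u	/* models S_SYNC in inode->i_flags */
#define SD_S_DAX	0x2u	/* models S_DAX in inode->i_flags */

/*
 * Compute the new VFS flag word. cur_i_flags models the current
 * inode->i_flags, ext4_flags models EXT4_I(inode)->i_flags.
 */
unsigned int sd_compute_flags(unsigned int cur_i_flags,
			      unsigned int ext4_flags,
			      bool init, bool should_enable_dax)
{
	unsigned int new_fl = 0;

	if (ext4_flags & SD_FL_SYNC)
		new_fl |= SD_S_SYNC;

	/* Preserve S_DAX if it is already set on the in-memory inode. */
	new_fl |= cur_i_flags & SD_S_DAX;

	/* S_DAX may only be newly set while the inode is being initialized. */
	if (init && should_enable_dax)
		new_fl |= SD_S_DAX;

	return new_fl;
}
```

In this model a later call with init == false can neither set nor clear
S_DAX, which is the behaviour the commit message describes for in-memory
inodes.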

2020-05-13 05:45:59

by Ira Weiny

Subject: [PATCH 2/9] fs/ext4: Disallow verity if inode is DAX

From: Ira Weiny <[email protected]>

Verity and DAX are incompatible. Changing the DAX mode due to a verity
flag change is wrong without a corresponding address_space_operations
update.

Make the 2 options mutually exclusive by returning an error if DAX was
set first.

(Setting DAX is already disabled if Verity is set first.)

Signed-off-by: Ira Weiny <[email protected]>

---
Changes:
remove WARN_ON_ONCE
Add documentation for DAX/Verity exclusivity
---
Documentation/filesystems/ext4/verity.rst | 7 +++++++
fs/ext4/verity.c | 3 +++
2 files changed, 10 insertions(+)

diff --git a/Documentation/filesystems/ext4/verity.rst b/Documentation/filesystems/ext4/verity.rst
index 3e4c0ee0e068..51ab1aa17e59 100644
--- a/Documentation/filesystems/ext4/verity.rst
+++ b/Documentation/filesystems/ext4/verity.rst
@@ -39,3 +39,10 @@ is encrypted as well as the data itself.

Verity files cannot have blocks allocated past the end of the verity
metadata.
+
+Verity and DAX
+--------------
+
+Verity and DAX are not compatible and attempts to set both of these flags on a
+file will fail.
+
diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index dc5ec724d889..f05a09fb2ae4 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -113,6 +113,9 @@ static int ext4_begin_enable_verity(struct file *filp)
handle_t *handle;
int err;

+ if (IS_DAX(inode))
+ return -EINVAL;
+
if (ext4_verity_in_progress(inode))
return -EBUSY;

--
2.25.1

2020-05-13 05:45:59

by Ira Weiny

Subject: [PATCH 3/9] fs/ext4: Disallow encryption if inode is DAX

From: Ira Weiny <[email protected]>

Encryption and DAX are incompatible. Changing the DAX mode due to a
change in Encryption mode is wrong without a corresponding
address_space_operations update.

Make the 2 options mutually exclusive by returning an error if DAX was
set first.

Furthermore, clarify the documentation of the exclusivity and how that
will work.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes:
remove WARN_ON_ONCE
Add documentation to the encrypt doc WRT DAX
---
Documentation/filesystems/fscrypt.rst | 4 +++-
fs/ext4/super.c | 10 +---------
2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index aa072112cfff..1475b8d52fef 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -1038,7 +1038,9 @@ astute users may notice some differences in behavior:
- The ext4 filesystem does not support data journaling with encrypted
regular files. It will fall back to ordered data mode instead.

-- DAX (Direct Access) is not supported on encrypted files.
+- DAX (Direct Access) is not supported on encrypted files. Attempts to enable
+ DAX on an encrypted file will fail. Mount options will _not_ enable DAX on
+ encrypted files.

- The st_size of an encrypted symlink will not necessarily give the
length of the symlink target as required by POSIX. It will actually
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index bf5fcb477f66..9873ab27e3fa 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1320,7 +1320,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
if (inode->i_ino == EXT4_ROOT_INO)
return -EPERM;

- if (WARN_ON_ONCE(IS_DAX(inode) && i_size_read(inode)))
+ if (IS_DAX(inode))
return -EINVAL;

res = ext4_convert_inline_data(inode);
@@ -1344,10 +1344,6 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
ext4_clear_inode_state(inode,
EXT4_STATE_MAY_INLINE_DATA);
- /*
- * Update inode->i_flags - S_ENCRYPTED will be enabled,
- * S_DAX may be disabled
- */
ext4_set_inode_flags(inode);
}
return res;
@@ -1371,10 +1367,6 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
ctx, len, 0);
if (!res) {
ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
- /*
- * Update inode->i_flags - S_ENCRYPTED will be enabled,
- * S_DAX may be disabled
- */
ext4_set_inode_flags(inode);
res = ext4_mark_inode_dirty(handle, inode);
if (res)
--
2.25.1

2020-05-13 11:37:25

by Jan Kara

Subject: Re: [PATCH 6/9] fs/ext4: Only change S_DAX on inode load

On Tue 12-05-20 22:43:21, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> To prevent complications with in memory inodes we only set S_DAX on
> inode load. FS_XFLAG_DAX can be changed at any time and S_DAX will
> change after inode eviction and reload.
>
> Add init bool to ext4_set_inode_flags() to indicate if the inode is
> being newly initialized.
>
> Assert that S_DAX is not set on an inode which is just being loaded.
>
> Signed-off-by: Ira Weiny <[email protected]>

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza


>
> ---
> Changes from RFC:
> Change J_ASSERT() to WARN_ON_ONCE()
> Fix bug which would clear S_DAX incorrectly
> ---
> fs/ext4/ext4.h | 2 +-
> fs/ext4/ialloc.c | 2 +-
> fs/ext4/inode.c | 13 ++++++++++---
> fs/ext4/ioctl.c | 3 ++-
> fs/ext4/super.c | 4 ++--
> fs/ext4/verity.c | 2 +-
> 6 files changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 1a3daf2d18ef..86a0994332ce 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2692,7 +2692,7 @@ extern int ext4_can_truncate(struct inode *inode);
> extern int ext4_truncate(struct inode *);
> extern int ext4_break_layouts(struct inode *);
> extern int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length);
> -extern void ext4_set_inode_flags(struct inode *);
> +extern void ext4_set_inode_flags(struct inode *, bool init);
> extern int ext4_alloc_da_blocks(struct inode *inode);
> extern void ext4_set_aops(struct inode *inode);
> extern int ext4_writepage_trans_blocks(struct inode *);
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 4b8c9a9bdf0c..7941c140723f 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -1116,7 +1116,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
> ei->i_block_group = group;
> ei->i_last_alloc_group = ~0;
>
> - ext4_set_inode_flags(inode);
> + ext4_set_inode_flags(inode, true);
> if (IS_DIRSYNC(inode))
> ext4_handle_sync(handle);
> if (insert_inode_locked(inode) < 0) {
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index d3a4c2ed7a1c..23e42a223235 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4419,11 +4419,13 @@ static bool ext4_should_enable_dax(struct inode *inode)
> return false;
> }
>
> -void ext4_set_inode_flags(struct inode *inode)
> +void ext4_set_inode_flags(struct inode *inode, bool init)
> {
> unsigned int flags = EXT4_I(inode)->i_flags;
> unsigned int new_fl = 0;
>
> + WARN_ON_ONCE(IS_DAX(inode) && init);
> +
> if (flags & EXT4_SYNC_FL)
> new_fl |= S_SYNC;
> if (flags & EXT4_APPEND_FL)
> @@ -4434,8 +4436,13 @@ void ext4_set_inode_flags(struct inode *inode)
> new_fl |= S_NOATIME;
> if (flags & EXT4_DIRSYNC_FL)
> new_fl |= S_DIRSYNC;
> - if (ext4_should_enable_dax(inode))
> +
> + /* Because of the way inode_set_flags() works we must preserve S_DAX
> + * here if already set. */
> + new_fl |= (inode->i_flags & S_DAX);
> + if (init && ext4_should_enable_dax(inode))
> new_fl |= S_DAX;
> +
> if (flags & EXT4_ENCRYPT_FL)
> new_fl |= S_ENCRYPTED;
> if (flags & EXT4_CASEFOLD_FL)
> @@ -4649,7 +4656,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
> * not initialized on a new filesystem. */
> }
> ei->i_flags = le32_to_cpu(raw_inode->i_flags);
> - ext4_set_inode_flags(inode);
> + ext4_set_inode_flags(inode, true);
> inode->i_blocks = ext4_inode_blocks(raw_inode, ei);
> ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl_lo);
> if (ext4_has_feature_64bit(sb))
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index 5813e5e73eab..145083e8cd1e 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -381,7 +381,8 @@ static int ext4_ioctl_setflags(struct inode *inode,
> ext4_clear_inode_flag(inode, i);
> }
>
> - ext4_set_inode_flags(inode);
> + ext4_set_inode_flags(inode, false);
> +
> inode->i_ctime = current_time(inode);
>
> err = ext4_mark_iloc_dirty(handle, inode, &iloc);
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index d0434b513919..5ec900fdf73c 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1344,7 +1344,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
> ext4_clear_inode_state(inode,
> EXT4_STATE_MAY_INLINE_DATA);
> - ext4_set_inode_flags(inode);
> + ext4_set_inode_flags(inode, false);
> }
> return res;
> }
> @@ -1367,7 +1367,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> ctx, len, 0);
> if (!res) {
> ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
> - ext4_set_inode_flags(inode);
> + ext4_set_inode_flags(inode, false);
> res = ext4_mark_inode_dirty(handle, inode);
> if (res)
> EXT4_ERROR_INODE(inode, "Failed to mark inode dirty");
> diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
> index f05a09fb2ae4..89a155ece323 100644
> --- a/fs/ext4/verity.c
> +++ b/fs/ext4/verity.c
> @@ -244,7 +244,7 @@ static int ext4_end_enable_verity(struct file *filp, const void *desc,
> if (err)
> goto out_stop;
> ext4_set_inode_flag(inode, EXT4_INODE_VERITY);
> - ext4_set_inode_flags(inode);
> + ext4_set_inode_flags(inode, false);
> err = ext4_mark_iloc_dirty(handle, inode, &iloc);
> }
> out_stop:
> --
> 2.25.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-05-13 14:36:40

by Jan Kara

Subject: Re: [PATCH 7/9] fs/ext4: Make DAX mount option a tri-state

On Tue 12-05-20 22:43:22, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> operate the same.
>
> Specifically we introduce a 2nd DAX mount flag EXT4_MOUNT2_DAX_NEVER and set
> it and EXT4_MOUNT_DAX_ALWAYS appropriately.
>
> We also force EXT4_MOUNT2_DAX_NEVER if !CONFIG_FS_DAX.
>
> https://lore.kernel.org/lkml/[email protected]/
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes from RFC:
> Combine remount check for DAX_NEVER with DAX_ALWAYS
> Update ext4_should_enable_dax()
> ---
> fs/ext4/ext4.h | 1 +
> fs/ext4/inode.c | 2 ++
> fs/ext4/super.c | 43 +++++++++++++++++++++++++++++++++++++------
> 3 files changed, 40 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 86a0994332ce..01d1de838896 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1168,6 +1168,7 @@ struct ext4_inode_info {
> blocks */
> #define EXT4_MOUNT2_HURD_COMPAT 0x00000004 /* Support HURD-castrated
> file systems */
> +#define EXT4_MOUNT2_DAX_NEVER 0x00000008 /* Do not allow Direct Access */
>
> #define EXT4_MOUNT2_EXPLICIT_JOURNAL_CHECKSUM 0x00000008 /* User explicitly
> specified journal checksum */
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 23e42a223235..140b1930e2f4 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4400,6 +4400,8 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
>
> static bool ext4_should_enable_dax(struct inode *inode)
> {
> + if (test_opt2(inode->i_sb, DAX_NEVER))
> + return false;
> if (!S_ISREG(inode->i_mode))
> return false;
> if (ext4_should_journal_data(inode))
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 5ec900fdf73c..e01a040a58a9 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1505,6 +1505,7 @@ enum {
> Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
> Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
> Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version, Opt_dax,
> + Opt_dax_str,
> Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
> Opt_nowarn_on_error, Opt_mblk_io_submit,
> Opt_lazytime, Opt_nolazytime, Opt_debug_want_extra_isize,
> @@ -1570,6 +1571,7 @@ static const match_table_t tokens = {
> {Opt_barrier, "barrier"},
> {Opt_nobarrier, "nobarrier"},
> {Opt_i_version, "i_version"},
> + {Opt_dax_str, "dax=%s"},

Hum, maybe it would be easier to handle this like we do with e.g. 'data='
mount option? I.e. like:

{Opt_dax_always, "dax=always"},
{Opt_dax_never, "dax=never"},
{Opt_dax_inode, "dax=inode"},

and then handle these three tokens... Not that it would be a big difference
but that's why we usually handle mount options with small "enums" in ext4.

Honza
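
The fixed-token approach suggested above can be sketched in userspace as
follows. This is only an illustration: the Opt_dax_* names come from the
suggestion, while struct dt_token, struct dt_state, and both functions are
hypothetical, and a plain strcmp() loop stands in for the kernel's
match_token().

```c
#include <stddef.h>
#include <string.h>

enum { Opt_err, Opt_dax, Opt_dax_always, Opt_dax_never, Opt_dax_inode };

struct dt_token {
	int token;
	const char *pattern;
};

/* One table entry per accepted spelling, so no string argument is parsed. */
static const struct dt_token dt_tokens[] = {
	{Opt_dax, "dax"},
	{Opt_dax_always, "dax=always"},
	{Opt_dax_never, "dax=never"},
	{Opt_dax_inode, "dax=inode"},
};

/* Return the token for an exact match, or Opt_err if nothing matches. */
int dt_match_token(const char *opt)
{
	size_t i;

	for (i = 0; i < sizeof(dt_tokens) / sizeof(dt_tokens[0]); i++)
		if (!strcmp(opt, dt_tokens[i].pattern))
			return dt_tokens[i].token;
	return Opt_err;
}

struct dt_state {
	int always;	/* models EXT4_MOUNT_DAX_ALWAYS */
	int never;	/* models EXT4_MOUNT2_DAX_NEVER */
};

/* Apply a matched token; returns 0 on success, -1 for a non-DAX token. */
int dt_apply_token(struct dt_state *st, int token)
{
	switch (token) {
	case Opt_dax:		/* bare "dax" behaves like "dax=always" */
	case Opt_dax_always:
		st->always = 1;
		st->never = 0;
		return 0;
	case Opt_dax_never:
		st->always = 0;
		st->never = 1;
		return 0;
	case Opt_dax_inode:
		st->always = 0;
		st->never = 0;
		return 0;
	default:
		return -1;
	}
}
```

With fixed tokens an unknown value such as "dax=bogus" fails at the
matching stage, so the handler needs no string duplication or comparison.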

> {Opt_dax, "dax"},
> {Opt_stripe, "stripe=%u"},
> {Opt_delalloc, "delalloc"},
> @@ -1767,6 +1769,7 @@ static const struct mount_opts {
> {Opt_min_batch_time, 0, MOPT_GTE0},
> {Opt_inode_readahead_blks, 0, MOPT_GTE0},
> {Opt_init_itable, 0, MOPT_GTE0},
> + {Opt_dax_str, 0, MOPT_STRING},
> {Opt_dax, EXT4_MOUNT_DAX_ALWAYS, MOPT_SET},
> {Opt_stripe, 0, MOPT_GTE0},
> {Opt_resuid, 0, MOPT_GTE0},
> @@ -2076,13 +2079,32 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
> }
> sbi->s_jquota_fmt = m->mount_opt;
> #endif
> - } else if (token == Opt_dax) {
> + } else if (token == Opt_dax || token == Opt_dax_str) {
> #ifdef CONFIG_FS_DAX
> - ext4_msg(sb, KERN_WARNING,
> - "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
> - sbi->s_mount_opt |= m->mount_opt;
> + char *tmp = match_strdup(&args[0]);
> +
> + if (!tmp || !strcmp(tmp, "always")) {
> + ext4_msg(sb, KERN_WARNING,
> + "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
> + sbi->s_mount_opt |= EXT4_MOUNT_DAX_ALWAYS;
> + sbi->s_mount_opt2 &= ~EXT4_MOUNT2_DAX_NEVER;
> + } else if (!strcmp(tmp, "never")) {
> + sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_NEVER;
> + sbi->s_mount_opt &= ~EXT4_MOUNT_DAX_ALWAYS;
> + } else if (!strcmp(tmp, "inode")) {
> + sbi->s_mount_opt &= ~EXT4_MOUNT_DAX_ALWAYS;
> + sbi->s_mount_opt2 &= ~EXT4_MOUNT2_DAX_NEVER;
> + } else {
> + ext4_msg(sb, KERN_WARNING, "DAX invalid option.");
> + kfree(tmp);
> + return -1;
> + }
> +
> + kfree(tmp);
> #else
> ext4_msg(sb, KERN_INFO, "dax option not supported");
> + sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_NEVER;
> + sbi->s_mount_opt &= ~EXT4_MOUNT_DAX_ALWAYS;
> return -1;
> #endif
> } else if (token == Opt_data_err_abort) {
> @@ -2306,6 +2328,13 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
> if (DUMMY_ENCRYPTION_ENABLED(sbi))
> SEQ_OPTS_PUTS("test_dummy_encryption");
>
> + if (test_opt2(sb, DAX_NEVER))
> + SEQ_OPTS_PUTS("dax=never");
> + else if (test_opt(sb, DAX_ALWAYS))
> + SEQ_OPTS_PUTS("dax=always");
> + else
> + SEQ_OPTS_PUTS("dax=inode");
> +
> ext4_show_quota_options(seq, sb);
> return 0;
> }
> @@ -5425,10 +5454,12 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
> goto restore_opts;
> }
>
> - if ((sbi->s_mount_opt ^ old_opts.s_mount_opt) & EXT4_MOUNT_DAX_ALWAYS) {
> + if ((sbi->s_mount_opt ^ old_opts.s_mount_opt) & EXT4_MOUNT_DAX_ALWAYS ||
> + (sbi->s_mount_opt2 ^ old_opts.s_mount_opt2) & EXT4_MOUNT2_DAX_NEVER) {
> ext4_msg(sb, KERN_WARNING, "warning: refusing change of "
> - "dax flag with busy inodes while remounting");
> + "dax mount option with busy inodes while remounting");
> sbi->s_mount_opt ^= EXT4_MOUNT_DAX_ALWAYS;
> + sbi->s_mount_opt2 ^= EXT4_MOUNT2_DAX_NEVER;
> }
>
> if (sbi->s_mount_flags & EXT4_MF_FS_ABORTED)
> --
> 2.25.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-05-13 14:47:49

by Jan Kara

Subject: Re: [PATCH 8/9] fs/ext4: Introduce DAX inode flag

On Tue 12-05-20 22:43:23, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> Add a flag to preserve FS_XFLAG_DAX in the ext4 inode.
>
> Set the flag to be user visible and changeable. Set the flag to be
> inherited. Allow applications to change the flag at any time.
>
> Finally, on regular files, flag the inode to not be cached to facilitate
> changing S_DAX on the next creation of the inode.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Change from RFC:
> use new d_mark_dontcache()
> Allow caching if ALWAYS/NEVER is set
> Rebased to latest Linus master
> Change flag to unused 0x01000000
> update ext4_should_enable_dax()
> ---
> fs/ext4/ext4.h | 13 +++++++++----
> fs/ext4/inode.c | 4 +++-
> fs/ext4/ioctl.c | 25 ++++++++++++++++++++++++-
> 3 files changed, 36 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 01d1de838896..715f8f2029b2 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -415,13 +415,16 @@ struct flex_groups {
> #define EXT4_VERITY_FL 0x00100000 /* Verity protected inode */
> #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
> /* 0x00400000 was formerly EXT4_EOFBLOCKS_FL */
> +
> +#define EXT4_DAX_FL 0x01000000 /* Inode is DAX */
> +
> #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
> #define EXT4_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
> #define EXT4_CASEFOLD_FL 0x40000000 /* Casefolded file */
> #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */
>
> -#define EXT4_FL_USER_VISIBLE 0x705BDFFF /* User visible flags */
> -#define EXT4_FL_USER_MODIFIABLE 0x604BC0FF /* User modifiable flags */
> +#define EXT4_FL_USER_VISIBLE 0x715BDFFF /* User visible flags */
> +#define EXT4_FL_USER_MODIFIABLE 0x614BC0FF /* User modifiable flags */

Hum, I think this was already mentioned, but there are also definitions in
include/uapi/linux/fs.h which should be kept in sync... Also, if the DAX flag
gets modified through FS_IOC_SETFLAGS, we should call ext4_dax_dontcache() as
well, shouldn't we?
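
For reference, the mask change in the hunk above can be checked with a small user-space sketch. This is not kernel code: the constants are copied from the quoted diff, while the OLD_/NEW_ macro names and the helper mask_adds_only() are made up for illustration. It only confirms that each new mask is the old mask plus exactly the EXT4_DAX_FL bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the quoted fs/ext4/ext4.h hunk. */
#define EXT4_DAX_FL          0x01000000u

/* Hypothetical names for the pre-patch and post-patch masks. */
#define OLD_USER_VISIBLE     0x705BDFFFu
#define OLD_USER_MODIFIABLE  0x604BC0FFu
#define NEW_USER_VISIBLE     0x715BDFFFu
#define NEW_USER_MODIFIABLE  0x614BC0FFu

/*
 * True if new_mask equals old_mask with exactly one extra flag set,
 * i.e. the flag was absent before and nothing else changed.
 */
static bool mask_adds_only(uint32_t old_mask, uint32_t new_mask, uint32_t flag)
{
	return (old_mask & flag) == 0 && new_mask == (old_mask | flag);
}
```

Calling mask_adds_only(OLD_USER_VISIBLE, NEW_USER_VISIBLE, EXT4_DAX_FL) and the same for the MODIFIABLE pair should both return true. Any matching definitions in include/uapi/linux/fs.h would need the same treatment to stay in sync.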

> @@ -802,6 +807,21 @@ static int ext4_ioctl_get_es_cache(struct file *filp, unsigned long arg)
> return error;
> }
>
> +static void ext4_dax_dontcache(struct inode *inode, unsigned int flags)
> +{
> + struct ext4_inode_info *ei = EXT4_I(inode);
> +
> + if (S_ISDIR(inode->i_mode))
> + return;
> +
> + if (test_opt2(inode->i_sb, DAX_NEVER) ||
> + test_opt(inode->i_sb, DAX_ALWAYS))
> + return;
> +
> + if (((ei->i_flags ^ flags) & EXT4_DAX_FL) == EXT4_DAX_FL)
> + d_mark_dontcache(inode);
> +}
> +
> long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> {
> struct inode *inode = file_inode(filp);
> @@ -1267,6 +1287,9 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> return err;
>
> inode_lock(inode);
> +
> + ext4_dax_dontcache(inode, flags);
> +

I don't think we should set the dontcache flag when setting of the DAX flag
fails (it could even be a security issue). So I think you'll have to check
whether the DAX flag is being changed, call vfs_ioc_fssetxattr_check(), and
only if it succeeded and the DAX flag was changing call ext4_dax_dontcache().
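
A rough user-space sketch of that ordering, assuming a made-up struct fake_inode and a plain integer check_err standing in for the result of the validation step (it does not call any real kernel function): decide whether DAX is changing first, bail out on failure without side effects, and mark dontcache only after success.

```c
#include <stdbool.h>

#define EXT4_DAX_FL 0x01000000u  /* value from the quoted ext4.h hunk */

/* Hypothetical stand-in for the inode state the ioctl touches. */
struct fake_inode {
	unsigned int i_flags;  /* current inode flags */
	bool dontcache;        /* set when the inode should not stay cached */
};

/*
 * check_err models the outcome of the validation step (0 on success,
 * negative on failure); it is an assumption of this sketch.
 */
static int set_flags(struct fake_inode *inode, unsigned int new_flags,
		     int check_err)
{
	/* Compare against the old flags before they are overwritten. */
	bool dax_changing = ((inode->i_flags ^ new_flags) & EXT4_DAX_FL) != 0;

	if (check_err)
		return check_err;  /* failure: no flag update, no dontcache */

	inode->i_flags = new_flags;
	if (dax_changing)
		inode->dontcache = true;
	return 0;
}
```

With this shape a failed request leaves both i_flags and dontcache untouched, which is the property asked for above.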

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-05-13 18:19:20

by Darrick J. Wong

Subject: Re: [PATCH 7/9] fs/ext4: Make DAX mount option a tri-state

On Wed, May 13, 2020 at 04:35:26PM +0200, Jan Kara wrote:
> On Tue 12-05-20 22:43:22, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > We add 'always', 'never', and 'inode' (default). '-o dax' continue to
> > operate the same.
> >
> > Specifically we introduce a 2nd DAX mount flag EXT4_MOUNT2_DAX_NEVER and set
> > it and EXT4_MOUNT_DAX_ALWAYS appropriately.
> >
> > We also force EXT4_MOUNT2_DAX_NEVER if !CONFIG_FS_DAX.
> >
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> > ---
> > Changes from RFC:
> > Combine remount check for DAX_NEVER with DAX_ALWAYS
> > Update ext4_should_enable_dax()
> > ---
> > fs/ext4/ext4.h | 1 +
> > fs/ext4/inode.c | 2 ++
> > fs/ext4/super.c | 43 +++++++++++++++++++++++++++++++++++++------
> > 3 files changed, 40 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 86a0994332ce..01d1de838896 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -1168,6 +1168,7 @@ struct ext4_inode_info {
> > blocks */
> > #define EXT4_MOUNT2_HURD_COMPAT 0x00000004 /* Support HURD-castrated
> > file systems */
> > +#define EXT4_MOUNT2_DAX_NEVER 0x00000008 /* Do not allow Direct Access */
> >
> > #define EXT4_MOUNT2_EXPLICIT_JOURNAL_CHECKSUM 0x00000008 /* User explicitly
> > specified journal checksum */
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 23e42a223235..140b1930e2f4 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -4400,6 +4400,8 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
> >
> > static bool ext4_should_enable_dax(struct inode *inode)
> > {
> > + if (test_opt2(inode->i_sb, DAX_NEVER))
> > + return false;
> > if (!S_ISREG(inode->i_mode))
> > return false;
> > if (ext4_should_journal_data(inode))
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index 5ec900fdf73c..e01a040a58a9 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -1505,6 +1505,7 @@ enum {
> > Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
> > Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
> > Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version, Opt_dax,
> > + Opt_dax_str,
> > Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
> > Opt_nowarn_on_error, Opt_mblk_io_submit,
> > Opt_lazytime, Opt_nolazytime, Opt_debug_want_extra_isize,
> > @@ -1570,6 +1571,7 @@ static const match_table_t tokens = {
> > {Opt_barrier, "barrier"},
> > {Opt_nobarrier, "nobarrier"},
> > {Opt_i_version, "i_version"},
> > + {Opt_dax_str, "dax=%s"},
>
> Hum, maybe it would be easier to handle this like we do with e.g. 'data='
> mount option? I.e. like:
>
> {Opt_dax_always, "dax=always"},
> {Opt_dax_never, "dax=never"},
> {Opt_dax_inode, "dax=inode"},
>
> and then handle these three tokens... Not that it would be a big difference
> but that's why we usually handle mount options with small "enums" in ext4.

I was hoping that we could hoist the tristate enum bits out of XFS and
simply share them across the three DAX filesystems, but I have no idea
if that will work with a filesystem that hasn't been converted to the
new mount option parsing api. I'm betting no. :/

(FWIW see enum xfs_dax_mode and struct constant_table dax_param_enums in
fs/xfs/xfs_super.c in the for-next tree.)

Hm, otoh I don't see any recent posting of an ext4 mount parsing
conversion series, so yeah this is probably as good as can be done until
that happens.
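
To make the tri-state idea concrete, here is a minimal user-space sketch, assuming a hypothetical enum dax_mode and placeholder bit values (these are not the kernel's definitions, and this is not the XFS code). It maps the option string onto the two flag words the patch uses and reads the mode back the way the quoted _ext4_show_options() hunk does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tri-state; names are illustrative only. */
enum dax_mode { DAX_MODE_INODE, DAX_MODE_ALWAYS, DAX_MODE_NEVER, DAX_MODE_INVALID };

/* Placeholder bit values for this sketch, not the kernel's. */
#define MOUNT_DAX_ALWAYS  0x1u  /* lives in the first option word */
#define MOUNT2_DAX_NEVER  0x1u  /* lives in the second option word */

struct mount_state {
	unsigned int opt;
	unsigned int opt2;
};

/* A NULL argument models plain "-o dax", which keeps meaning "always". */
static enum dax_mode parse_dax_mode(const char *arg)
{
	if (!arg || !strcmp(arg, "always"))
		return DAX_MODE_ALWAYS;
	if (!strcmp(arg, "never"))
		return DAX_MODE_NEVER;
	if (!strcmp(arg, "inode"))
		return DAX_MODE_INODE;
	return DAX_MODE_INVALID;
}

/* Returns 0 on success, -1 (state untouched) for an invalid mode. */
static int apply_dax_mode(struct mount_state *ms, enum dax_mode mode)
{
	switch (mode) {
	case DAX_MODE_ALWAYS:
		ms->opt |= MOUNT_DAX_ALWAYS;
		ms->opt2 &= ~MOUNT2_DAX_NEVER;
		return 0;
	case DAX_MODE_NEVER:
		ms->opt2 |= MOUNT2_DAX_NEVER;
		ms->opt &= ~MOUNT_DAX_ALWAYS;
		return 0;
	case DAX_MODE_INODE:
		ms->opt &= ~MOUNT_DAX_ALWAYS;
		ms->opt2 &= ~MOUNT2_DAX_NEVER;
		return 0;
	default:
		return -1;
	}
}

/* Same precedence as the quoted show-options hunk: never, always, inode. */
static const char *show_dax_mode(const struct mount_state *ms)
{
	if (ms->opt2 & MOUNT2_DAX_NEVER)
		return "dax=never";
	if (ms->opt & MOUNT_DAX_ALWAYS)
		return "dax=always";
	return "dax=inode";
}
```

Splitting parsing from applying is only one possible design; it keeps the string handling in one place, so three fixed tokens or a shared enum table could replace parse_dax_mode() without touching the flag logic.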

--D

> Honza
>
> > {Opt_dax, "dax"},
> > {Opt_stripe, "stripe=%u"},
> > {Opt_delalloc, "delalloc"},
> > @@ -1767,6 +1769,7 @@ static const struct mount_opts {
> > {Opt_min_batch_time, 0, MOPT_GTE0},
> > {Opt_inode_readahead_blks, 0, MOPT_GTE0},
> > {Opt_init_itable, 0, MOPT_GTE0},
> > + {Opt_dax_str, 0, MOPT_STRING},
> > {Opt_dax, EXT4_MOUNT_DAX_ALWAYS, MOPT_SET},
> > {Opt_stripe, 0, MOPT_GTE0},
> > {Opt_resuid, 0, MOPT_GTE0},
> > @@ -2076,13 +2079,32 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
> > }
> > sbi->s_jquota_fmt = m->mount_opt;
> > #endif
> > - } else if (token == Opt_dax) {
> > + } else if (token == Opt_dax || token == Opt_dax_str) {
> > #ifdef CONFIG_FS_DAX
> > - ext4_msg(sb, KERN_WARNING,
> > - "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
> > - sbi->s_mount_opt |= m->mount_opt;
> > + char *tmp = match_strdup(&args[0]);
> > +
> > + if (!tmp || !strcmp(tmp, "always")) {
> > + ext4_msg(sb, KERN_WARNING,
> > + "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
> > + sbi->s_mount_opt |= EXT4_MOUNT_DAX_ALWAYS;
> > + sbi->s_mount_opt2 &= ~EXT4_MOUNT2_DAX_NEVER;
> > + } else if (!strcmp(tmp, "never")) {
> > + sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_NEVER;
> > + sbi->s_mount_opt &= ~EXT4_MOUNT_DAX_ALWAYS;
> > + } else if (!strcmp(tmp, "inode")) {
> > + sbi->s_mount_opt &= ~EXT4_MOUNT_DAX_ALWAYS;
> > + sbi->s_mount_opt2 &= ~EXT4_MOUNT2_DAX_NEVER;
> > + } else {
> > + ext4_msg(sb, KERN_WARNING, "DAX invalid option.");
> > + kfree(tmp);
> > + return -1;
> > + }
> > +
> > + kfree(tmp);
> > #else
> > ext4_msg(sb, KERN_INFO, "dax option not supported");
> > + sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_NEVER;
> > + sbi->s_mount_opt &= ~EXT4_MOUNT_DAX_ALWAYS;
> > return -1;
> > #endif
> > } else if (token == Opt_data_err_abort) {
> > @@ -2306,6 +2328,13 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
> > if (DUMMY_ENCRYPTION_ENABLED(sbi))
> > SEQ_OPTS_PUTS("test_dummy_encryption");
> >
> > + if (test_opt2(sb, DAX_NEVER))
> > + SEQ_OPTS_PUTS("dax=never");
> > + else if (test_opt(sb, DAX_ALWAYS))
> > + SEQ_OPTS_PUTS("dax=always");
> > + else
> > + SEQ_OPTS_PUTS("dax=inode");
> > +
> > ext4_show_quota_options(seq, sb);
> > return 0;
> > }
> > @@ -5425,10 +5454,12 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
> > goto restore_opts;
> > }
> >
> > - if ((sbi->s_mount_opt ^ old_opts.s_mount_opt) & EXT4_MOUNT_DAX_ALWAYS) {
> > + if ((sbi->s_mount_opt ^ old_opts.s_mount_opt) & EXT4_MOUNT_DAX_ALWAYS ||
> > + (sbi->s_mount_opt2 ^ old_opts.s_mount_opt2) & EXT4_MOUNT2_DAX_NEVER) {
> > ext4_msg(sb, KERN_WARNING, "warning: refusing change of "
> > - "dax flag with busy inodes while remounting");
> > + "dax mount option with busy inodes while remounting");
> > sbi->s_mount_opt ^= EXT4_MOUNT_DAX_ALWAYS;
> > + sbi->s_mount_opt2 ^= EXT4_MOUNT2_DAX_NEVER;
> > }
> >
> > if (sbi->s_mount_flags & EXT4_MF_FS_ABORTED)
> > --
> > 2.25.1
> >
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2020-05-13 19:54:54

by Ira Weiny

Subject: Re: [PATCH 7/9] fs/ext4: Make DAX mount option a tri-state

On Wed, May 13, 2020 at 11:17:17AM -0700, Darrick J. Wong wrote:
> On Wed, May 13, 2020 at 04:35:26PM +0200, Jan Kara wrote:
> > On Tue 12-05-20 22:43:22, [email protected] wrote:
> > > From: Ira Weiny <[email protected]>
> > >
> > > We add 'always', 'never', and 'inode' (default). '-o dax' continue to
> > > operate the same.
> > >
> > > Specifically we introduce a 2nd DAX mount flag EXT4_MOUNT2_DAX_NEVER and set
> > > it and EXT4_MOUNT_DAX_ALWAYS appropriately.
> > >
> > > We also force EXT4_MOUNT2_DAX_NEVER if !CONFIG_FS_DAX.
> > >
> > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > Signed-off-by: Ira Weiny <[email protected]>
> > >
> > > ---
> > > Changes from RFC:
> > > Combine remount check for DAX_NEVER with DAX_ALWAYS
> > > Update ext4_should_enable_dax()
> > > ---
> > > fs/ext4/ext4.h | 1 +
> > > fs/ext4/inode.c | 2 ++
> > > fs/ext4/super.c | 43 +++++++++++++++++++++++++++++++++++++------
> > > 3 files changed, 40 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > > index 86a0994332ce..01d1de838896 100644
> > > --- a/fs/ext4/ext4.h
> > > +++ b/fs/ext4/ext4.h
> > > @@ -1168,6 +1168,7 @@ struct ext4_inode_info {
> > > blocks */
> > > #define EXT4_MOUNT2_HURD_COMPAT 0x00000004 /* Support HURD-castrated
> > > file systems */
> > > +#define EXT4_MOUNT2_DAX_NEVER 0x00000008 /* Do not allow Direct Access */
> > >
> > > #define EXT4_MOUNT2_EXPLICIT_JOURNAL_CHECKSUM 0x00000008 /* User explicitly
> > > specified journal checksum */
> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > index 23e42a223235..140b1930e2f4 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -4400,6 +4400,8 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
> > >
> > > static bool ext4_should_enable_dax(struct inode *inode)
> > > {
> > > + if (test_opt2(inode->i_sb, DAX_NEVER))
> > > + return false;
> > > if (!S_ISREG(inode->i_mode))
> > > return false;
> > > if (ext4_should_journal_data(inode))
> > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > > index 5ec900fdf73c..e01a040a58a9 100644
> > > --- a/fs/ext4/super.c
> > > +++ b/fs/ext4/super.c
> > > @@ -1505,6 +1505,7 @@ enum {
> > > Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
> > > Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
> > > Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version, Opt_dax,
> > > + Opt_dax_str,
> > > Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
> > > Opt_nowarn_on_error, Opt_mblk_io_submit,
> > > Opt_lazytime, Opt_nolazytime, Opt_debug_want_extra_isize,
> > > @@ -1570,6 +1571,7 @@ static const match_table_t tokens = {
> > > {Opt_barrier, "barrier"},
> > > {Opt_nobarrier, "nobarrier"},
> > > {Opt_i_version, "i_version"},
> > > + {Opt_dax_str, "dax=%s"},
> >
> > Hum, maybe it would be easier to handle this like we do with e.g. 'data='
> > mount option? I.e. like:
> >
> > {Opt_dax_always, "dax=always"},
> > {Opt_dax_never, "dax=never"},
> > {Opt_dax_inode, "dax=inode"},
> >
> > and then handle these three tokens... Not that it would be a big difference
> > but that's why we usually handle mount options with small "enums" in ext4.

We could, but at this point it would need to be reworked for the new option
parsing code anyway...

I've been waiting to see whether another round of those patches would be
submitted, but it looks like they are taking more work.

>
> I was hoping that we could hoist the tristate enum bits out of XFS and
> simply share them across the three DAX filesystems, but I have no idea
> if that will work with a filesystem that hasn't been converted to the
> new mount option parsing api. I'm betting no. :/
>
> (FWIW see enum xfs_dax_mode and struct constant_table dax_param_enums in
> fs/xfs/xfs_super.c in the for-next tree.)
>
> Hm, otoh I don't see any recent posting of an ext4 mount parsing
> conversion series, so yeah this is probably as good as can be done until
> that happens.
>

That is my thinking.

I wanted to get this series out because as a feature it would be nice if this
went in together with XFS for 5.8. But I understand if we want to wait.

Ira

>
> --D
>
> > Honza
> >
> > > {Opt_dax, "dax"},
> > > {Opt_stripe, "stripe=%u"},
> > > {Opt_delalloc, "delalloc"},
> > > @@ -1767,6 +1769,7 @@ static const struct mount_opts {
> > > {Opt_min_batch_time, 0, MOPT_GTE0},
> > > {Opt_inode_readahead_blks, 0, MOPT_GTE0},
> > > {Opt_init_itable, 0, MOPT_GTE0},
> > > + {Opt_dax_str, 0, MOPT_STRING},
> > > {Opt_dax, EXT4_MOUNT_DAX_ALWAYS, MOPT_SET},
> > > {Opt_stripe, 0, MOPT_GTE0},
> > > {Opt_resuid, 0, MOPT_GTE0},
> > > @@ -2076,13 +2079,32 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
> > > }
> > > sbi->s_jquota_fmt = m->mount_opt;
> > > #endif
> > > - } else if (token == Opt_dax) {
> > > + } else if (token == Opt_dax || token == Opt_dax_str) {
> > > #ifdef CONFIG_FS_DAX
> > > - ext4_msg(sb, KERN_WARNING,
> > > - "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
> > > - sbi->s_mount_opt |= m->mount_opt;
> > > + char *tmp = match_strdup(&args[0]);
> > > +
> > > + if (!tmp || !strcmp(tmp, "always")) {
> > > + ext4_msg(sb, KERN_WARNING,
> > > + "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
> > > + sbi->s_mount_opt |= EXT4_MOUNT_DAX_ALWAYS;
> > > + sbi->s_mount_opt2 &= ~EXT4_MOUNT2_DAX_NEVER;
> > > + } else if (!strcmp(tmp, "never")) {
> > > + sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_NEVER;
> > > + sbi->s_mount_opt &= ~EXT4_MOUNT_DAX_ALWAYS;
> > > + } else if (!strcmp(tmp, "inode")) {
> > > + sbi->s_mount_opt &= ~EXT4_MOUNT_DAX_ALWAYS;
> > > + sbi->s_mount_opt2 &= ~EXT4_MOUNT2_DAX_NEVER;
> > > + } else {
> > > + ext4_msg(sb, KERN_WARNING, "DAX invalid option.");
> > > + kfree(tmp);
> > > + return -1;
> > > + }
> > > +
> > > + kfree(tmp);
> > > #else
> > > ext4_msg(sb, KERN_INFO, "dax option not supported");
> > > + sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_NEVER;
> > > + sbi->s_mount_opt &= ~EXT4_MOUNT_DAX_ALWAYS;
> > > return -1;
> > > #endif
> > > } else if (token == Opt_data_err_abort) {
> > > @@ -2306,6 +2328,13 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
> > > if (DUMMY_ENCRYPTION_ENABLED(sbi))
> > > SEQ_OPTS_PUTS("test_dummy_encryption");
> > >
> > > + if (test_opt2(sb, DAX_NEVER))
> > > + SEQ_OPTS_PUTS("dax=never");
> > > + else if (test_opt(sb, DAX_ALWAYS))
> > > + SEQ_OPTS_PUTS("dax=always");
> > > + else
> > > + SEQ_OPTS_PUTS("dax=inode");
> > > +
> > > ext4_show_quota_options(seq, sb);
> > > return 0;
> > > }
> > > @@ -5425,10 +5454,12 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
> > > goto restore_opts;
> > > }
> > >
> > > - if ((sbi->s_mount_opt ^ old_opts.s_mount_opt) & EXT4_MOUNT_DAX_ALWAYS) {
> > > + if ((sbi->s_mount_opt ^ old_opts.s_mount_opt) & EXT4_MOUNT_DAX_ALWAYS ||
> > > + (sbi->s_mount_opt2 ^ old_opts.s_mount_opt2) & EXT4_MOUNT2_DAX_NEVER) {
> > > ext4_msg(sb, KERN_WARNING, "warning: refusing change of "
> > > - "dax flag with busy inodes while remounting");
> > > + "dax mount option with busy inodes while remounting");
> > > sbi->s_mount_opt ^= EXT4_MOUNT_DAX_ALWAYS;
> > > + sbi->s_mount_opt2 ^= EXT4_MOUNT2_DAX_NEVER;
> > > }
> > >
> > > if (sbi->s_mount_flags & EXT4_MF_FS_ABORTED)
> > > --
> > > 2.25.1
> > >
> > --
> > Jan Kara <[email protected]>
> > SUSE Labs, CR

2020-05-13 21:42:40

by Ira Weiny

Subject: Re: [PATCH 8/9] fs/ext4: Introduce DAX inode flag

On Wed, May 13, 2020 at 04:47:06PM +0200, Jan Kara wrote:
> On Tue 12-05-20 22:43:23, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > Add a flag to preserve FS_XFLAG_DAX in the ext4 inode.
> >
> > Set the flag to be user visible and changeable. Set the flag to be
> > inherited. Allow applications to change the flag at any time.
> >
> > Finally, on regular files, flag the inode to not be cached to facilitate
> > changing S_DAX on the next creation of the inode.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> > ---
> > Change from RFC:
> > use new d_mark_dontcache()
> > Allow caching if ALWAYS/NEVER is set
> > Rebased to latest Linus master
> > Change flag to unused 0x01000000
> > update ext4_should_enable_dax()
> > ---
> > fs/ext4/ext4.h | 13 +++++++++----
> > fs/ext4/inode.c | 4 +++-
> > fs/ext4/ioctl.c | 25 ++++++++++++++++++++++++-
> > 3 files changed, 36 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 01d1de838896..715f8f2029b2 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -415,13 +415,16 @@ struct flex_groups {
> > #define EXT4_VERITY_FL 0x00100000 /* Verity protected inode */
> > #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
> > /* 0x00400000 was formerly EXT4_EOFBLOCKS_FL */
> > +
> > +#define EXT4_DAX_FL 0x01000000 /* Inode is DAX */
> > +
> > #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
> > #define EXT4_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
> > #define EXT4_CASEFOLD_FL 0x40000000 /* Casefolded file */
> > #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */
> >
> > -#define EXT4_FL_USER_VISIBLE 0x705BDFFF /* User visible flags */
> > -#define EXT4_FL_USER_MODIFIABLE 0x604BC0FF /* User modifiable flags */
> > +#define EXT4_FL_USER_VISIBLE 0x715BDFFF /* User visible flags */
> > +#define EXT4_FL_USER_MODIFIABLE 0x614BC0FF /* User modifiable flags */
>
> Hum, I think this was already mentioned, but there are also definitions in
> include/uapi/linux/fs.h which should be kept in sync... Also, if the DAX flag
> gets modified through FS_IOC_SETFLAGS, we should call ext4_dax_dontcache() as
> well, shouldn't we?

Ah yea it was mentioned. Sorry.

>
> > @@ -802,6 +807,21 @@ static int ext4_ioctl_get_es_cache(struct file *filp, unsigned long arg)
> > return error;
> > }
> >
> > +static void ext4_dax_dontcache(struct inode *inode, unsigned int flags)
> > +{
> > + struct ext4_inode_info *ei = EXT4_I(inode);
> > +
> > + if (S_ISDIR(inode->i_mode))
> > + return;
> > +
> > + if (test_opt2(inode->i_sb, DAX_NEVER) ||
> > + test_opt(inode->i_sb, DAX_ALWAYS))
> > + return;
> > +
> > + if (((ei->i_flags ^ flags) & EXT4_DAX_FL) == EXT4_DAX_FL)
> > + d_mark_dontcache(inode);
> > +}
> > +
> > long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> > {
> > struct inode *inode = file_inode(filp);
> > @@ -1267,6 +1287,9 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> > return err;
> >
> > inode_lock(inode);
> > +
> > + ext4_dax_dontcache(inode, flags);
> > +
>
> I don't think we should set the dontcache flag when setting of the DAX flag
> fails (it could even be a security issue).

good point.

>
> So I think you'll have to check
> whether DAX flag is being changed,

ext4_dax_dontcache() does check if the flag is being changed.

> call vfs_ioc_fssetxattr_check(), and
> only if it succeeded and the DAX flag was changing call ext4_dax_dontcache().

Yes, I think it would be better to ensure the whole ioctl succeeds before
setting dontcache. That logic is easier to follow.

Ira

>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2020-05-14 06:44:23

by Jan Kara

Subject: Re: [PATCH 8/9] fs/ext4: Introduce DAX inode flag

On Wed 13-05-20 14:41:55, Ira Weiny wrote:
> On Wed, May 13, 2020 at 04:47:06PM +0200, Jan Kara wrote:
> >
> > So I think you'll have to check
> > whether DAX flag is being changed,
>
> ext4_dax_dontcache() does check if the flag is being changed.

Yes, but if you call it after inode flags change, you cannot determine that
just from flags and EXT4_I(inode)->i_flags. So that logic needs to change.
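
A tiny user-space illustration of the point, assuming a made-up helper dax_flag_differs() that mirrors the XOR test from the quoted patch: once the stored flags have been overwritten with the requested ones, the comparison can no longer see that DAX was toggled.

```c
#include <stdbool.h>

#define EXT4_DAX_FL 0x01000000u  /* value from the quoted ext4.h hunk */

/* Hypothetical helper: same XOR test as the quoted ext4_dax_dontcache(). */
static bool dax_flag_differs(unsigned int i_flags, unsigned int flags)
{
	return ((i_flags ^ flags) & EXT4_DAX_FL) != 0;
}
```

Before the update, dax_flag_differs(0, EXT4_DAX_FL) is true; after the stored value is set to EXT4_DAX_FL, dax_flag_differs(EXT4_DAX_FL, EXT4_DAX_FL) is false, so the decision has to be captured beforehand or the old flags kept around.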

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-05-14 06:55:55

by Ira Weiny

Subject: Re: [PATCH 8/9] fs/ext4: Introduce DAX inode flag

On Thu, May 14, 2020 at 08:43:35AM +0200, Jan Kara wrote:
> On Wed 13-05-20 14:41:55, Ira Weiny wrote:
> > On Wed, May 13, 2020 at 04:47:06PM +0200, Jan Kara wrote:
> > >
> > > So I think you'll have to check
> > > whether DAX flag is being changed,
> >
> > ext4_dax_dontcache() does check if the flag is being changed.
>
> Yes, but if you call it after inode flags change, you cannot determine that
> just from flags and EXT4_I(inode)->i_flags. So that logic needs to change.

I just caught this email... just after sending V1.

I've moved where ext4_dax_dontcache() is called. I think it is ok now with the
current check.

LMK if I've messed it up... :-/

Ira

>
> Honza
>
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2020-05-16 01:49:58

by Eric Biggers

Subject: Re: [PATCH 2/9] fs/ext4: Disallow verity if inode is DAX

On Tue, May 12, 2020 at 10:43:17PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> Verity and DAX are incompatible. Changing the DAX mode due to a verity
> flag change is wrong without a corresponding address_space_operations
> update.
>
> Make the 2 options mutually exclusive by returning an error if DAX was
> set first.
>
> (Setting DAX is already disabled if Verity is set first.)
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes:
> remove WARN_ON_ONCE
> Add documentation for DAX/Verity exclusivity
> ---
> Documentation/filesystems/ext4/verity.rst | 7 +++++++
> fs/ext4/verity.c | 3 +++
> 2 files changed, 10 insertions(+)
>
> diff --git a/Documentation/filesystems/ext4/verity.rst b/Documentation/filesystems/ext4/verity.rst
> index 3e4c0ee0e068..51ab1aa17e59 100644
> --- a/Documentation/filesystems/ext4/verity.rst
> +++ b/Documentation/filesystems/ext4/verity.rst
> @@ -39,3 +39,10 @@ is encrypted as well as the data itself.
>
> Verity files cannot have blocks allocated past the end of the verity
> metadata.
> +
> +Verity and DAX
> +--------------
> +
> +Verity and DAX are not compatible and attempts to set both of these flags on a
> +file will fail.
> +

If you build the documentation, this shows up as its own subsection
"2.13. Verity and DAX" alongside "2.12. Verity files", which looks odd.
I think you should delete this new subsection header so that this paragraph goes
in the existing "Verity files" subsection.

Also, Documentation/filesystems/fsverity.rst already mentions DAX (similar to
fscrypt.rst). Is it intentional that you added this to the ext4-specific
documentation instead?

- Eric

2020-05-16 02:04:05

by Eric Biggers

Subject: Re: [PATCH 3/9] fs/ext4: Disallow encryption if inode is DAX

On Tue, May 12, 2020 at 10:43:18PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> Encryption and DAX are incompatible. Changing the DAX mode due to a
> change in Encryption mode is wrong without a corresponding
> address_space_operations update.
>
> Make the 2 options mutually exclusive by returning an error if DAX was
> set first.
>
> Furthermore, clarify the documentation of the exclusivity and how that
> will work.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes:
> remove WARN_ON_ONCE
> Add documentation to the encrypt doc WRT DAX
> ---
> Documentation/filesystems/fscrypt.rst | 4 +++-
> fs/ext4/super.c | 10 +---------
> 2 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
> index aa072112cfff..1475b8d52fef 100644
> --- a/Documentation/filesystems/fscrypt.rst
> +++ b/Documentation/filesystems/fscrypt.rst
> @@ -1038,7 +1038,9 @@ astute users may notice some differences in behavior:
> - The ext4 filesystem does not support data journaling with encrypted
> regular files. It will fall back to ordered data mode instead.
>
> -- DAX (Direct Access) is not supported on encrypted files.
> +- DAX (Direct Access) is not supported on encrypted files. Attempts to enable
> + DAX on an encrypted file will fail. Mount options will _not_ enable DAX on
> + encrypted files.
>
> - The st_size of an encrypted symlink will not necessarily give the
> length of the symlink target as required by POSIX. It will actually
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index bf5fcb477f66..9873ab27e3fa 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1320,7 +1320,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> if (inode->i_ino == EXT4_ROOT_INO)
> return -EPERM;
>
> - if (WARN_ON_ONCE(IS_DAX(inode) && i_size_read(inode)))
> + if (IS_DAX(inode))
> return -EINVAL;
>
> res = ext4_convert_inline_data(inode);
> @@ -1344,10 +1344,6 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
> ext4_clear_inode_state(inode,
> EXT4_STATE_MAY_INLINE_DATA);
> - /*
> - * Update inode->i_flags - S_ENCRYPTED will be enabled,
> - * S_DAX may be disabled
> - */
> ext4_set_inode_flags(inode);
> }
> return res;
> @@ -1371,10 +1367,6 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> ctx, len, 0);
> if (!res) {
> ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
> - /*
> - * Update inode->i_flags - S_ENCRYPTED will be enabled,
> - * S_DAX may be disabled
> - */
> ext4_set_inode_flags(inode);
> res = ext4_mark_inode_dirty(handle, inode);
> if (res)

I'm confused by the ext4_set_context() change.

ext4_set_context() is only called when FS_IOC_SET_ENCRYPTION_POLICY sets an
encryption policy on an empty directory, *or* when a new inode (regular, dir, or
symlink) is created in an encrypted directory (thus inheriting encryption from
its parent).

So when is it reachable when IS_DAX()? Is the issue that the DAX flag can now
be set on directories? The commit message doesn't seem to be talking about
directories. Is the behavior we want that, on an (empty) directory with the
DAX flag set, FS_IOC_SET_ENCRYPTION_POLICY should fail with EINVAL?

I don't see why the i_size_read(inode) check is there though, so I think you're
at least right to remove that.

- Eric

2020-05-18 05:03:56

by Ira Weiny

Subject: Re: [PATCH 3/9] fs/ext4: Disallow encryption if inode is DAX

On Fri, May 15, 2020 at 07:02:53PM -0700, Eric Biggers wrote:
> On Tue, May 12, 2020 at 10:43:18PM -0700, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > Encryption and DAX are incompatible. Changing the DAX mode due to a
> > change in Encryption mode is wrong without a corresponding
> > address_space_operations update.
> >
> > Make the 2 options mutually exclusive by returning an error if DAX was
> > set first.
> >
> > Furthermore, clarify the documentation of the exclusivity and how that
> > will work.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> > ---
> > Changes:
> > remove WARN_ON_ONCE
> > Add documentation to the encrypt doc WRT DAX
> > ---
> > Documentation/filesystems/fscrypt.rst | 4 +++-
> > fs/ext4/super.c | 10 +---------
> > 2 files changed, 4 insertions(+), 10 deletions(-)
> >
> > diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
> > index aa072112cfff..1475b8d52fef 100644
> > --- a/Documentation/filesystems/fscrypt.rst
> > +++ b/Documentation/filesystems/fscrypt.rst
> > @@ -1038,7 +1038,9 @@ astute users may notice some differences in behavior:
> > - The ext4 filesystem does not support data journaling with encrypted
> > regular files. It will fall back to ordered data mode instead.
> >
> > -- DAX (Direct Access) is not supported on encrypted files.
> > +- DAX (Direct Access) is not supported on encrypted files. Attempts to enable
> > + DAX on an encrypted file will fail. Mount options will _not_ enable DAX on
> > + encrypted files.
> >
> > - The st_size of an encrypted symlink will not necessarily give the
> > length of the symlink target as required by POSIX. It will actually
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index bf5fcb477f66..9873ab27e3fa 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -1320,7 +1320,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> > if (inode->i_ino == EXT4_ROOT_INO)
> > return -EPERM;
> >
> > - if (WARN_ON_ONCE(IS_DAX(inode) && i_size_read(inode)))
> > + if (IS_DAX(inode))
> > return -EINVAL;
> >
> > res = ext4_convert_inline_data(inode);
> > @@ -1344,10 +1344,6 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> > ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
> > ext4_clear_inode_state(inode,
> > EXT4_STATE_MAY_INLINE_DATA);
> > - /*
> > - * Update inode->i_flags - S_ENCRYPTED will be enabled,
> > - * S_DAX may be disabled
> > - */
> > ext4_set_inode_flags(inode);
> > }
> > return res;
> > @@ -1371,10 +1367,6 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> > ctx, len, 0);
> > if (!res) {
> > ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
> > - /*
> > - * Update inode->i_flags - S_ENCRYPTED will be enabled,
> > - * S_DAX may be disabled
> > - */
> > ext4_set_inode_flags(inode);
> > res = ext4_mark_inode_dirty(handle, inode);
> > if (res)
>
> I'm confused by the ext4_set_context() change.
>
> ext4_set_context() is only called when FS_IOC_SET_ENCRYPTION_POLICY sets an
> encryption policy on an empty directory, *or* when a new inode (regular, dir, or
> symlink) is created in an encrypted directory (thus inheriting encryption from
> its parent).

I don't see the check which prevents FS_IOC_SET_ENCRYPTION_POLICY on a file?

On inode creation, encryption will always usurp S_DAX...

>
> So when is it reachable when IS_DAX()? Is the issue that the DAX flag can now
> be set on directories? The commit message doesn't seem to be talking about
> directories. Is the behavior we want that, on an (empty) directory with the
> DAX flag set, FS_IOC_SET_ENCRYPTION_POLICY should fail with EINVAL?

We would want that, but AFAIK S_DAX is never set on directories. Perhaps this
is another place where S_DAX needs to be changed to the new inode flag?
However, this would not be appropriate at this point in the series. At this
point in the series S_DAX is still set based on the mount option, and I'm 99%
sure that only happens on regular files, not directories. So I'm confused now.

This is, AFAICS, not going to affect correctness. It will only be confusing
because the user will be able to set both DAX and encryption on the directory
but files there will only see encryption being used... :-(

Assuming you are correct about this call path only being valid on directories,
it seems this IS_DAX() needs to be changed to check for EXT4_DAX_FL in
"fs/ext4: Introduce DAX inode flag"? Then, at that point, we can prevent DAX and
encryption on a directory... and the IS_DAX() check could be removed at this
point in the series???

>
> I don't see why the i_size_read(inode) check is there though, so I think you're
> at least right to remove that.

Agreed.
Ira

>
> - Eric

2020-05-18 05:32:51

by Ira Weiny

Subject: Re: [PATCH 2/9] fs/ext4: Disallow verity if inode is DAX

On Fri, May 15, 2020 at 06:49:16PM -0700, Eric Biggers wrote:
> On Tue, May 12, 2020 at 10:43:17PM -0700, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > Verity and DAX are incompatible. Changing the DAX mode due to a verity
> > flag change is wrong without a corresponding address_space_operations
> > update.
> >
> > Make the 2 options mutually exclusive by returning an error if DAX was
> > set first.
> >
> > (Setting DAX is already disabled if Verity is set first.)
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> > ---
> > Changes:
> > remove WARN_ON_ONCE
> > Add documentation for DAX/Verity exclusivity
> > ---
> > Documentation/filesystems/ext4/verity.rst | 7 +++++++
> > fs/ext4/verity.c | 3 +++
> > 2 files changed, 10 insertions(+)
> >
> > diff --git a/Documentation/filesystems/ext4/verity.rst b/Documentation/filesystems/ext4/verity.rst
> > index 3e4c0ee0e068..51ab1aa17e59 100644
> > --- a/Documentation/filesystems/ext4/verity.rst
> > +++ b/Documentation/filesystems/ext4/verity.rst
> > @@ -39,3 +39,10 @@ is encrypted as well as the data itself.
> >
> > Verity files cannot have blocks allocated past the end of the verity
> > metadata.
> > +
> > +Verity and DAX
> > +--------------
> > +
> > +Verity and DAX are not compatible and attempts to set both of these flags on a
> > +file will fail.
> > +
>
> If you build the documentation, this shows up as its own subsection
> "2.13. Verity and DAX" alongside "2.12. Verity files", which looks odd.
> I think you should delete this new subsection header so that this paragraph goes
> in the existing "Verity files" subsection.

Ok... I'll fix it up...

>
> Also, Documentation/filesystems/fsverity.rst already mentions DAX (similar to
> fscrypt.rst). Is it intentional that you added this to the ext4-specific
> documentation instead?

I proposed this text[1] and there were no objections... I was looking at ext4
because only ext4 supports verity and DAX. I think having this in both the
ext4 docs and the verity docs helps.

Ira

[1] https://lore.kernel.org/lkml/[email protected]/

>
> - Eric

2020-05-18 16:25:44

by Eric Biggers

Subject: Re: [PATCH 3/9] fs/ext4: Disallow encryption if inode is DAX

On Sun, May 17, 2020 at 10:03:15PM -0700, Ira Weiny wrote:
> On Fri, May 15, 2020 at 07:02:53PM -0700, Eric Biggers wrote:
> > On Tue, May 12, 2020 at 10:43:18PM -0700, [email protected] wrote:
> > > From: Ira Weiny <[email protected]>
> > >
> > > Encryption and DAX are incompatible. Changing the DAX mode due to a
> > > change in Encryption mode is wrong without a corresponding
> > > address_space_operations update.
> > >
> > > Make the 2 options mutually exclusive by returning an error if DAX was
> > > set first.
> > >
> > > Furthermore, clarify the documentation of the exclusivity and how that
> > > will work.
> > >
> > > Signed-off-by: Ira Weiny <[email protected]>
> > >
> > > ---
> > > Changes:
> > > remove WARN_ON_ONCE
> > > Add documentation to the encrypt doc WRT DAX
> > > ---
> > > Documentation/filesystems/fscrypt.rst | 4 +++-
> > > fs/ext4/super.c | 10 +---------
> > > 2 files changed, 4 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
> > > index aa072112cfff..1475b8d52fef 100644
> > > --- a/Documentation/filesystems/fscrypt.rst
> > > +++ b/Documentation/filesystems/fscrypt.rst
> > > @@ -1038,7 +1038,9 @@ astute users may notice some differences in behavior:
> > > - The ext4 filesystem does not support data journaling with encrypted
> > > regular files. It will fall back to ordered data mode instead.
> > >
> > > -- DAX (Direct Access) is not supported on encrypted files.
> > > +- DAX (Direct Access) is not supported on encrypted files. Attempts to enable
> > > + DAX on an encrypted file will fail. Mount options will _not_ enable DAX on
> > > + encrypted files.
> > >
> > > - The st_size of an encrypted symlink will not necessarily give the
> > > length of the symlink target as required by POSIX. It will actually
> > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > > index bf5fcb477f66..9873ab27e3fa 100644
> > > --- a/fs/ext4/super.c
> > > +++ b/fs/ext4/super.c
> > > @@ -1320,7 +1320,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> > > if (inode->i_ino == EXT4_ROOT_INO)
> > > return -EPERM;
> > >
> > > - if (WARN_ON_ONCE(IS_DAX(inode) && i_size_read(inode)))
> > > + if (IS_DAX(inode))
> > > return -EINVAL;
> > >
> > > res = ext4_convert_inline_data(inode);
> > > @@ -1344,10 +1344,6 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> > > ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
> > > ext4_clear_inode_state(inode,
> > > EXT4_STATE_MAY_INLINE_DATA);
> > > - /*
> > > - * Update inode->i_flags - S_ENCRYPTED will be enabled,
> > > - * S_DAX may be disabled
> > > - */
> > > ext4_set_inode_flags(inode);
> > > }
> > > return res;
> > > @@ -1371,10 +1367,6 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> > > ctx, len, 0);
> > > if (!res) {
> > > ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
> > > - /*
> > > - * Update inode->i_flags - S_ENCRYPTED will be enabled,
> > > - * S_DAX may be disabled
> > > - */
> > > ext4_set_inode_flags(inode);
> > > res = ext4_mark_inode_dirty(handle, inode);
> > > if (res)
> >
> > I'm confused by the ext4_set_context() change.
> >
> > ext4_set_context() is only called when FS_IOC_SET_ENCRYPTION_POLICY sets an
> > encryption policy on an empty directory, *or* when a new inode (regular, dir, or
> > symlink) is created in an encrypted directory (thus inheriting encryption from
> > its parent).
>
> I don't see the check which prevents FS_IOC_SET_ENCRYPTION_POLICY on a file?

It's in fscrypt_ioctl_set_policy().

>
> On inode creation, encryption will always usurp S_DAX...
>
> >
> > So when is it reachable when IS_DAX()? Is the issue that the DAX flag can now
> > be set on directories? The commit message doesn't seem to be talking about
> > directories. Is the behavior we want that, on an (empty) directory with the
> > DAX flag set, FS_IOC_SET_ENCRYPTION_POLICY should fail with EINVAL?
>
> We would want that but AFAIK S_DAX is never set on directories. Perhaps this
> is another place where S_DAX needs to be changed to the new inode flag?
> However, this would not be appropriate at this point in the series. At this
> point in the series S_DAX is still set based on the mount option and I'm 99%
> sure that only happens on regular files, not directories. So I'm confused now.

S_DAX is only set by ext4_set_inode_flags() which only sets it on regular files.

>
> This is, AFAICS, not going to affect correctness. It will only be confusing
> because the user will be able to set both DAX and encryption on the directory
> but files there will only see encryption being used... :-(
>
> Assuming you are correct about this call path only being valid on directories.
> It seems this IS_DAX() needs to be changed to check for EXT4_DAX_FL in
> "fs/ext4: Introduce DAX inode flag"? Then at that point we can prevent DAX and
> encryption on a directory. ... and at this point IS_DAX() could be removed at
> this point in the series???

I haven't read the whole series, but if you are indeed trying to prevent a
directory with EXT4_DAX_FL from being encrypted, then it does look like you'd
need to check EXT4_DAX_FL, not S_DAX.

The other question is what should happen when a file is created in an encrypted
directory when the filesystem is mounted with -o dax. Actually, I think I
missed something there. Currently (based on reading the code) the DAX flag will
get set first, and then ext4_set_context() will see IS_DAX() && i_size == 0 and
clear the DAX flag when setting the encrypt flag. So, the i_size == 0 check is
actually needed. Your patch (AFAICS) just makes creating an encrypted file fail
when the filesystem is mounted with '-o dax'. Is that intended? If not, maybe
you should change it to check S_NEW instead of i_size == 0 to make it clearer?

- Eric
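
[Illustration, not part of the original message.] The ordering question above
can be modeled in user space. This is a hypothetical stand-in, not kernel code:
struct sketch_inode, sketch_set_context() and the CHECK_* variants are invented
names, and is_new only loosely models what the message calls S_NEW. It compares
the original check, the patched check, and the suggested new-inode check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the inode state under discussion. */
struct sketch_inode {
	bool dax;        /* models S_DAX being set */
	bool is_new;     /* models an inode that is still being created */
	long long size;  /* models i_size_read(inode) */
	bool encrypted;  /* models the encrypt flag */
};

enum sketch_check { CHECK_ORIGINAL, CHECK_PATCHED, CHECK_NEW_INODE };

/* Returns 0 on success or -EINVAL, depending on which check is modeled. */
int sketch_set_context(struct sketch_inode *inode, enum sketch_check variant)
{
	assert(inode != NULL);

	switch (variant) {
	case CHECK_ORIGINAL:	/* IS_DAX(inode) && i_size_read(inode) */
		if (inode->dax && inode->size != 0)
			return -EINVAL;
		break;
	case CHECK_PATCHED:	/* IS_DAX(inode) */
		if (inode->dax)
			return -EINVAL;
		break;
	case CHECK_NEW_INODE:	/* reject DAX only on inodes that are not new */
		if (inode->dax && !inode->is_new)
			return -EINVAL;
		break;
	}

	/* Setting the encrypt flag and recomputing the flags drops DAX. */
	inode->encrypted = true;
	inode->dax = false;
	return 0;
}
```

For a new, zero-length inode that already has DAX set, CHECK_ORIGINAL and
CHECK_NEW_INODE succeed and drop DAX, while CHECK_PATCHED returns -EINVAL,
which is the file-creation failure described above.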

2020-05-18 20:03:58

by Eric Biggers

Subject: Re: [PATCH 3/9] fs/ext4: Disallow encryption if inode is DAX

On Mon, May 18, 2020 at 12:23:57PM -0700, Ira Weiny wrote:
> >
> > The other question is what should happen when a file is created in an encrypted
> > directory when the filesystem is mounted with -o dax. Actually, I think I
> > missed something there. Currently (based on reading the code) the DAX flag will
> > get set first, and then ext4_set_context()
>
> See this is where I am confused. Above you said that ext4_set_context() is only
> called on a directory. And I agree with you now having seen the check in
> fscrypt_ioctl_set_policy(). So what is the call path you are speaking of here?

Here's what I actually said:

ext4_set_context() is only called when FS_IOC_SET_ENCRYPTION_POLICY sets
an encryption policy on an empty directory, *or* when a new inode
(regular, dir, or symlink) is created in an encrypted directory (thus
inheriting encryption from its parent).

Just find the places where ->set_context() is called and follow them backwards.

- Eric
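
[Illustration, not part of the original message.] The two call paths quoted
above can be sketched as a small user-space model. All names here are
hypothetical, and the -ENOTDIR / -ENOTEMPTY error codes are assumptions made
for the sketch; the real checks live in fscrypt_ioctl_set_policy() and the
inode-creation path. The handling of an already-encrypted directory is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_type { SKETCH_REG, SKETCH_DIR, SKETCH_SYMLINK };

struct sketch_inode {
	enum sketch_type type;
	bool empty;             /* only meaningful for directories */
	bool encrypted;
	int set_context_calls;  /* counts how often the hook was reached */
};

/* Stand-in for the filesystem's ->set_context() hook. */
static int sketch_set_context(struct sketch_inode *inode)
{
	inode->encrypted = true;
	inode->set_context_calls++;
	return 0;
}

/* Path 1: the policy ioctl only reaches the hook for empty directories. */
int sketch_ioctl_set_policy(struct sketch_inode *inode)
{
	assert(inode != NULL);
	if (inode->type != SKETCH_DIR)
		return -ENOTDIR;
	if (!inode->empty)
		return -ENOTEMPTY;
	return sketch_set_context(inode);
}

/* Path 2: a new inode of any type inherits encryption from its parent. */
int sketch_inherit_context(const struct sketch_inode *parent,
			   struct sketch_inode *child)
{
	assert(parent != NULL && child != NULL);
	if (!parent->encrypted)
		return 0;
	return sketch_set_context(child);
}
```

In this model a regular file never reaches the hook through the ioctl path,
only through inheritance at creation time.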

2020-05-20 02:03:23

by Ira Weiny

Subject: Re: [PATCH 3/9] fs/ext4: Disallow encryption if inode is DAX

On Mon, May 18, 2020 at 09:24:47AM -0700, Eric Biggers wrote:
> On Sun, May 17, 2020 at 10:03:15PM -0700, Ira Weiny wrote:

First off... OMG...

I'm seeing some possible user pitfalls which are complicating things IMO. It
probably does not matter because most users don't care and have either enabled
DAX on _every_ mount or _not_ enabled DAX on _every_ mount, and have _not_
used verity or encryption while using DAX.

Verity is a bit easier because verity is not inherited and we only need to
protect against setting it if DAX is on.

However, it can be weird for the user thusly:

1) mount _without_ DAX
2) enable verity on individual inodes
3) unmount/mount _with_ DAX

Now the verity files are not enabled for DAX without any indication... <sigh>
This is still true with my patch. But at least it closes the hole of trying to
change the DAX flag after the fact (because verity was set).

Also, both this check and the verity check need to be maintained to keep the
mount option working as it was before...

For encryption it is more complicated because encryption can be set on
directories and inherited so the IS_DAX() check does nothing while '-o dax' is
used. Therefore users can:

1) mount _with_ DAX
2) enable encryption on a directory
3) files created in that directory will not have DAX set

And I now understand why the WARN_ON() was there... To tell users about this
craziness.

...

> > This is, AFAICS, not going to affect correctness. It will only be confusing
> > because the user will be able to set both DAX and encryption on the directory
> > but files there will only see encryption being used... :-(
> >
> > Assuming you are correct about this call path only being valid on directories.
> > It seems this IS_DAX() needs to be changed to check for EXT4_DAX_FL in
> > "fs/ext4: Introduce DAX inode flag"? Then at that point we can prevent DAX and
> > encryption on a directory. ... and at this point IS_DAX() could be removed at
> > this point in the series???
>
> I haven't read the whole series, but if you are indeed trying to prevent a
> directory with EXT4_DAX_FL from being encrypted, then it does look like you'd
> need to check EXT4_DAX_FL, not S_DAX.
>
> The other question is what should happen when a file is created in an encrypted
> directory when the filesystem is mounted with -o dax. Actually, I think I
> missed something there. Currently (based on reading the code) the DAX flag will
> get set first, and then ext4_set_context() will see IS_DAX() && i_size == 0 and
> clear the DAX flag when setting the encrypt flag.

I think you are correct.

>
> So, the i_size == 0 check is actually needed.
> Your patch (AFAICS) just makes creating an encrypted file fail
> when '-o dax'. Is that intended?

Yes, that is what I intended, but I see now that it is more complicated.

The intent is that IS_DAX() should _never_ be true on an encrypted or verity
file... even if -o dax is specified, because IS_DAX() should be a result of
the inode flags being checked. The order of the setting of those flags is a
bit odd for the encrypted case. I don't really like that DAX is set then
un-set. It is convoluted but I'm not clear right now how to fix it.

> If not, maybe you should change it to check
> S_NEW instead of i_size == 0 to make it clearer?

The patch is completely unnecessary.

It is much easier to make (EXT4_ENCRYPT_FL | EXT4_VERITY_FL) incompatible with
EXT4_DAX_FL when it is introduced later in the series. Furthermore, this mutual
exclusion can be done on directories in the encrypt case, which I think will
be nicer for the user: they get an error when trying to set one when the other
is set.

Ira
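
[Illustration, not part of the original message.] The flag-level mutual
exclusion proposed above could look roughly like the following user-space
sketch. The SKETCH_* bit values are placeholders chosen for the example (the
real flags are defined in fs/ext4/ext4.h), the function names are hypothetical,
and returning -EINVAL is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bits for this sketch only. */
#define SKETCH_ENCRYPT_FL 0x1u
#define SKETCH_VERITY_FL  0x2u
#define SKETCH_DAX_FL     0x4u
#define SKETCH_DAX_EXCL   (SKETCH_ENCRYPT_FL | SKETCH_VERITY_FL)

/* Returns 0 if the flag combination is allowed, -EINVAL otherwise. */
int sketch_validate_flags(uint32_t flags)
{
	if ((flags & SKETCH_DAX_FL) && (flags & SKETCH_DAX_EXCL))
		return -EINVAL;
	return 0;
}

/* Tries to set one flag; leaves *flags unchanged if it would conflict. */
int sketch_set_flag(uint32_t *flags, uint32_t bit)
{
	int err;

	assert(flags != NULL);
	err = sketch_validate_flags(*flags | bit);
	if (err)
		return err;
	*flags |= bit;
	return 0;
}
```

Because the check works on the flag word rather than on S_DAX, it applies to
directories as well as regular files, and the user gets an error whichever
flag is set second.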

2020-05-20 13:14:41

by Jan Kara

Subject: Re: [PATCH 3/9] fs/ext4: Disallow encryption if inode is DAX

On Tue 19-05-20 19:02:33, Ira Weiny wrote:
> On Mon, May 18, 2020 at 09:24:47AM -0700, Eric Biggers wrote:
> > On Sun, May 17, 2020 at 10:03:15PM -0700, Ira Weiny wrote:
>
> First off... OMG...
>
> I'm seeing some possible user pitfalls which are complicating things IMO. It
> probably does not matter because most users don't care and have either enabled
> DAX on _every_ mount or _not_ enabled DAX on _every_ mount. And have _not_
> used verity nor encryption while using DAX.
>
> Verity is a bit easier because verity is not inherited and we only need to
> protect against setting it if DAX is on.
>
> However, it can be weird for the user thusly:
>
> 1) mount _without_ DAX
> 2) enable verity on individual inodes
> 3) unmount/mount _with_ DAX
>
> Now the verity files are not enabled for DAX without any indication...
> <sigh> This is still true with my patch. But at least it closes the hole
> of trying to change the DAX flag after the fact (because verity was set).
>
> Also both this check and the verity need to be maintained to keep the mount
> option working as it was before...
>
> For encryption it is more complicated because encryption can be set on
> directories and inherited so the IS_DAX() check does nothing while '-o
> dax' is used. Therefore users can:
>
> 1) mount _with_ DAX
> 2) enable encryption on a directory
> 3) files created in that directory will not have DAX set
>
> And I now understand why the WARN_ON() was there... To tell users about this
> craziness.

Thanks for digging into this! I agree that just not setting S_DAX where
other inode features disallow it is probably best.

> > > This is, AFAICS, not going to affect correctness. It will only be confusing
> > > because the user will be able to set both DAX and encryption on the directory
> > > but files there will only see encryption being used... :-(
> > >
> > > Assuming you are correct about this call path only being valid on directories.
> > > It seems this IS_DAX() needs to be changed to check for EXT4_DAX_FL in
> > > "fs/ext4: Introduce DAX inode flag"? Then at that point we can prevent DAX and
> > > encryption on a directory. ... and at this point IS_DAX() could be removed at
> > > this point in the series???
> >
> > I haven't read the whole series, but if you are indeed trying to prevent a
> > directory with EXT4_DAX_FL from being encrypted, then it does look like you'd
> > need to check EXT4_DAX_FL, not S_DAX.
> >
> > The other question is what should happen when a file is created in an encrypted
> > directory when the filesystem is mounted with -o dax. Actually, I think I
> > missed something there. Currently (based on reading the code) the DAX flag will
> > get set first, and then ext4_set_context() will see IS_DAX() && i_size == 0 and
> > clear the DAX flag when setting the encrypt flag.
>
> I think you are correct.
>
> >
> > So, the i_size == 0 check is actually needed.
> > Your patch (AFAICS) just makes creating an encrypted file fail
> > when '-o dax'. Is that intended?
>
> Yes that is what I intended but it is more complicated I see now.
>
> The intent is that IS_DAX() should _never_ be true on an encrypted or verity
> file... even if -o dax is specified. Because IS_DAX() should be a result of
> the inode flags being checked. The order of the setting of those flags is a
> bit odd for the encrypted case. I don't really like that DAX is set then
> un-set. It is convoluted but I'm not clear right now how to fix it.
>
> > If not, maybe you should change it to check
> > S_NEW instead of i_size == 0 to make it clearer?
>
> The patch is completely unnecessary.
>
> It is much easier to make (EXT4_ENCRYPT_FL | EXT4_VERITY_FL) incompatible
> with EXT4_DAX_FL when it is introduced later in the series. Furthermore
> this mutual exclusion can be done on directories in the encrypt case.
> Which I think will be nicer for the user if they get an error when trying
> to set one when the other is set.

Agreed.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
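
[Illustration, not part of the original message.] The approach agreed on in
this reply, simply not setting S_DAX where other inode features disallow it,
can be sketched as a single decision function. This is a simplified user-space
model with hypothetical names; the real decision in ext4 considers more
conditions than the three shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the inode properties that matter here. */
struct sketch_inode {
	bool regular;    /* regular file, as opposed to a directory or symlink */
	bool encrypted;  /* encrypt flag set */
	bool verity;     /* verity flag set */
};

/* Decides whether the in-memory DAX flag should be set for this inode. */
bool sketch_should_use_dax(const struct sketch_inode *inode, bool mount_dax)
{
	assert(inode != NULL);

	if (!mount_dax)
		return false;
	if (!inode->regular)
		return false;
	/* Encryption and verity silently win over the mount option. */
	if (inode->encrypted || inode->verity)
		return false;
	return true;
}
```

This reproduces the two scenarios listed in the quoted text: a verity file on a
filesystem remounted with DAX, and a file created in an encrypted directory on
a DAX mount, both end up without DAX and without an error.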