2017-09-12 05:05:31

by Ross Zwisler

Subject: [PATCH v2 0/5] ext4: DAX data corruption fixes

This series prevents a pair of data corruptions with ext4 + DAX. The first
such corruption happens when combining the inline data feature with DAX,
and the second happens when combining data journaling with DAX.

Both can be reliably reproduced with the fstests that I have posted here:

https://patchwork.kernel.org/patch/9948377/
https://patchwork.kernel.org/patch/9948381/

My opinion is that the first three patches in this series should be applied
to the v4.14 RC series and backported to stable. The last two patches in
this series are just cleanup and can probably wait until v4.15.

Ross Zwisler (5):
ext4: prevent data corruption with inline data + DAX
ext4: prevent data corruption with journaling + DAX
ext4: add sanity check for encryption + DAX
ext4: add ext4_should_use_dax()
ext4: remove duplicate extended attributes defs

fs/ext4/ext4.h | 37 -------------------------------------
fs/ext4/inline.c | 10 ----------
fs/ext4/inode.c | 24 ++++++++++++++++--------
fs/ext4/ioctl.c | 16 +++++++++++++---
fs/ext4/super.c | 8 ++++++++
5 files changed, 37 insertions(+), 58 deletions(-)

--
2.9.5


2017-09-12 05:05:39

by Ross Zwisler

Subject: [PATCH v2 2/5] ext4: prevent data corruption with journaling + DAX

The current code has the potential for data corruption when changing an
inode's journaling mode, as that can result in a subsequent unsafe change
in S_DAX.

I've captured an instance of this data corruption in the following fstest:

https://patchwork.kernel.org/patch/9948377/

Prevent this data corruption from happening by disallowing changes to the
journaling mode if the '-o dax' mount option was used. This means that for
a given filesystem we could have a mix of inodes using either DAX or
data journaling, but whatever state the inodes are in will be held for the
duration of the mount.

Signed-off-by: Ross Zwisler <[email protected]>
Suggested-by: Jan Kara <[email protected]>
---
fs/ext4/inode.c | 5 -----
fs/ext4/ioctl.c | 16 +++++++++++++---
2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e963508..3207333 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5971,11 +5971,6 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
}
ext4_set_aops(inode);
- /*
- * Update inode->i_flags after EXT4_INODE_JOURNAL_DATA was updated.
- * E.g. S_DAX may get cleared / set.
- */
- ext4_set_inode_flags(inode);

jbd2_journal_unlock_updates(journal);
percpu_up_write(&sbi->s_journal_flag_rwsem);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index afb66d4..b0b754b 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -290,10 +290,20 @@ static int ext4_ioctl_setflags(struct inode *inode,
if (err)
goto flags_out;

- if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL))
+ if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL)) {
+ /*
+ * Changes to the journaling mode can cause unsafe changes to
+ * S_DAX if we are using the DAX mount option.
+ */
+ if (test_opt(inode->i_sb, DAX)) {
+ err = -EBUSY;
+ goto flags_out;
+ }
+
err = ext4_change_inode_journal_flag(inode, jflag);
- if (err)
- goto flags_out;
+ if (err)
+ goto flags_out;
+ }
if (migrate) {
if (flags & EXT4_EXTENTS_FL)
err = ext4_ext_migrate(inode);
--
2.9.5
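
Not part of the posted patch: a minimal userspace sketch of how the new
restriction would look to a caller, assuming the standard FS_IOC_GETFLAGS /
FS_IOC_SETFLAGS ioctls and FS_JOURNAL_DATA_FL from <linux/fs.h>. The helper
name toggle_journal_data() is hypothetical and exists only for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_JOURNAL_DATA_FL */

/*
 * Hypothetical helper: flip the per-file data journaling flag on `path`.
 * Returns 0 on success or a negative errno value.
 */
static int toggle_journal_data(const char *path)
{
	int fd, flags, ret = 0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* Read the current inode flags. */
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
		ret = -errno;
		goto out;
	}

	/* Request a change of journaling mode for this inode. */
	flags ^= FS_JOURNAL_DATA_FL;
	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
		ret = -errno;
out:
	close(fd);
	return ret;
}
```

With this patch applied, calling the helper on a file in an ext4 filesystem
mounted with '-o dax' should return -EBUSY, per the ioctl.c hunk above. Other
errors (for example from missing privileges) can occur before that check is
reached.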

2017-09-12 05:05:37

by Ross Zwisler

Subject: [PATCH v2 3/5] ext4: add sanity check for encryption + DAX

We prevent DAX from being used on inodes which are using ext4's built-in
encryption via a check in ext4_set_inode_flags(). We do have what appears
to be an unsafe transition of S_DAX in ext4_set_context(), though, where
S_DAX can get disabled without us doing a proper writeback + invalidate.

There are also issues with mm-level races when changing the value of S_DAX,
as well as issues with the VM_MIXEDMAP flag:

https://www.spinics.net/lists/linux-xfs/msg09859.html

I actually think we are safe in this case because of the following:

1) You can't encrypt an existing file. Encryption can only be set on an
empty directory, with new inodes in that directory being created with
encryption turned on, so I don't think it's possible to turn encryption on
for a file that has open DAX mmaps or outstanding I/Os.

2) There is no way to turn encryption off on a given file. Once an inode
is encrypted, it stays encrypted for the life of that inode, so we don't
have to worry about the case where we turn encryption off and S_DAX
suddenly turns on.

3) The only way we end up in ext4_set_context() to turn on encryption is
when we are creating a new file in the encrypted directory. This happens
as part of ext4_create() before the inode has been allowed to do any I/O.
Here's the call tree:

ext4_create()
__ext4_new_inode()
ext4_set_inode_flags() // sets S_DAX
fscrypt_inherit_context()
fscrypt_get_encryption_info();
ext4_set_context() // sets EXT4_INODE_ENCRYPT, clears S_DAX

So, I actually think it's safe to transition S_DAX in ext4_set_context()
without any locking, writebacks or invalidations. I've added a
WARN_ON_ONCE() sanity check to make sure that we are notified if we ever
encounter a case where we are encrypting an inode that already has data,
in which case we need to add code to safely transition S_DAX.

Signed-off-by: Ross Zwisler <[email protected]>
CC: [email protected]
---
fs/ext4/super.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4251e50..c090780 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1159,6 +1159,9 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
if (inode->i_ino == EXT4_ROOT_INO)
return -EPERM;

+ if (WARN_ON_ONCE(IS_DAX(inode) && i_size_read(inode)))
+ return -EINVAL;
+
res = ext4_convert_inline_data(inode);
if (res)
return res;
--
2.9.5
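
Not part of the posted patch: a toy userspace model of the argument in the
commit message, assuming only what the call tree states (S_DAX is set at inode
creation, then cleared when encryption is enabled). struct toy_inode and
toy_set_context() are hypothetical stand-ins, not kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the few inode properties the argument relies on. */
struct toy_inode {
	bool s_dax;	/* models IS_DAX(inode) */
	bool encrypted;	/* models EXT4_INODE_ENCRYPT */
	long long size;	/* models i_size_read(inode) */
};

/* Models the new sanity check plus the flag transition from the call tree. */
static int toy_set_context(struct toy_inode *inode)
{
	/* The real code also triggers WARN_ON_ONCE() in this case. */
	if (inode->s_dax && inode->size)
		return -EINVAL;

	inode->encrypted = true;
	inode->s_dax = false;	/* encrypted inodes do not use DAX */
	return 0;
}
```

In this model a freshly created, empty inode passes through the transition,
while an inode that already holds data while S_DAX is set is rejected with
-EINVAL, which is the case the WARN_ON_ONCE() is meant to flag.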

2017-09-12 05:06:08

by Ross Zwisler

Subject: [PATCH v2 4/5] ext4: add ext4_should_use_dax()

This helper, in the spirit of ext4_should_dioread_nolock() et al., replaces
the complex conditional in ext4_set_inode_flags().

Signed-off-by: Ross Zwisler <[email protected]>
---
fs/ext4/inode.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3207333..525dd63 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4577,6 +4577,21 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
!ext4_test_inode_state(inode, EXT4_STATE_XATTR));
}

+static bool ext4_should_use_dax(struct inode *inode)
+{
+ if (!test_opt(inode->i_sb, DAX))
+ return false;
+ if (!S_ISREG(inode->i_mode))
+ return false;
+ if (ext4_should_journal_data(inode))
+ return false;
+ if (ext4_has_inline_data(inode))
+ return false;
+ if (ext4_encrypted_inode(inode))
+ return false;
+ return true;
+}
+
void ext4_set_inode_flags(struct inode *inode)
{
unsigned int flags = EXT4_I(inode)->i_flags;
@@ -4592,9 +4607,7 @@ void ext4_set_inode_flags(struct inode *inode)
new_fl |= S_NOATIME;
if (flags & EXT4_DIRSYNC_FL)
new_fl |= S_DIRSYNC;
- if (test_opt(inode->i_sb, DAX) && S_ISREG(inode->i_mode) &&
- !ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) &&
- !ext4_encrypted_inode(inode))
+ if (ext4_should_use_dax(inode))
new_fl |= S_DAX;
inode_set_flags(inode, new_fl,
S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX);
--
2.9.5
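
Not part of the posted patch: a small userspace check that the early-return
helper is logically equivalent to the compound conditional it replaces. The
five kernel predicates are modeled as plain booleans; struct dax_inputs and
the function names are hypothetical.

```c
#include <stdbool.h>

/* One boolean per predicate used in the original conditional. */
struct dax_inputs {
	bool mount_dax;		/* test_opt(inode->i_sb, DAX) */
	bool is_reg;		/* S_ISREG(inode->i_mode) */
	bool journal_data;	/* ext4_should_journal_data(inode) */
	bool inline_data;	/* ext4_has_inline_data(inode) */
	bool encrypted;		/* ext4_encrypted_inode(inode) */
};

/* Shape of the conditional removed from ext4_set_inode_flags(). */
static bool old_conditional(const struct dax_inputs *in)
{
	return in->mount_dax && in->is_reg &&
	       !in->journal_data && !in->inline_data && !in->encrypted;
}

/* Shape of the new helper: one early return per disqualifying condition. */
static bool should_use_dax(const struct dax_inputs *in)
{
	if (!in->mount_dax)
		return false;
	if (!in->is_reg)
		return false;
	if (in->journal_data)
		return false;
	if (in->inline_data)
		return false;
	if (in->encrypted)
		return false;
	return true;
}

/* Walk all 32 input combinations and count any disagreement. */
static int count_mismatches(void)
{
	int bad = 0;

	for (unsigned int bits = 0; bits < 32; bits++) {
		struct dax_inputs in = {
			.mount_dax = bits & 1,
			.is_reg = bits & 2,
			.journal_data = bits & 4,
			.inline_data = bits & 8,
			.encrypted = bits & 16,
		};

		if (old_conditional(&in) != should_use_dax(&in))
			bad++;
	}
	return bad;
}
```

count_mismatches() returning zero confirms that the two forms agree on every
combination of inputs, so the refactor changes readability only.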

2017-09-12 05:06:07

by Ross Zwisler

Subject: [PATCH v2 5/5] ext4: remove duplicate extended attributes defs

The following commit:

commit 9b7365fc1c82 ("ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR
interface support")

added several defines related to extended attributes to ext4.h. They were
added within an #ifndef FS_IOC_FSGETXATTR block with the comment:

/* Until the uapi changes get merged for project quota... */

Those uapi changes were merged by this commit:

commit 334e580a6f97 ("fs: XFS_IOC_FS[SG]SETXATTR to FS_IOC_FS[SG]ETXATTR
promotion")

so all the definitions needed by ext4 are available in
include/uapi/linux/fs.h. Remove the duplicates from ext4.h.

Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Li Xi <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dave Chinner <[email protected]>
---
fs/ext4/ext4.h | 37 -------------------------------------
1 file changed, 37 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 84b9da1..83a857f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -644,43 +644,6 @@ enum {
#define EXT4_IOC_GET_ENCRYPTION_PWSALT FS_IOC_GET_ENCRYPTION_PWSALT
#define EXT4_IOC_GET_ENCRYPTION_POLICY FS_IOC_GET_ENCRYPTION_POLICY

-#ifndef FS_IOC_FSGETXATTR
-/* Until the uapi changes get merged for project quota... */
-
-#define FS_IOC_FSGETXATTR _IOR('X', 31, struct fsxattr)
-#define FS_IOC_FSSETXATTR _IOW('X', 32, struct fsxattr)
-
-/*
- * Structure for FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR.
- */
-struct fsxattr {
- __u32 fsx_xflags; /* xflags field value (get/set) */
- __u32 fsx_extsize; /* extsize field value (get/set)*/
- __u32 fsx_nextents; /* nextents field value (get) */
- __u32 fsx_projid; /* project identifier (get/set) */
- unsigned char fsx_pad[12];
-};
-
-/*
- * Flags for the fsx_xflags field
- */
-#define FS_XFLAG_REALTIME 0x00000001 /* data in realtime volume */
-#define FS_XFLAG_PREALLOC 0x00000002 /* preallocated file extents */
-#define FS_XFLAG_IMMUTABLE 0x00000008 /* file cannot be modified */
-#define FS_XFLAG_APPEND 0x00000010 /* all writes append */
-#define FS_XFLAG_SYNC 0x00000020 /* all writes synchronous */
-#define FS_XFLAG_NOATIME 0x00000040 /* do not update access time */
-#define FS_XFLAG_NODUMP 0x00000080 /* do not include in backups */
-#define FS_XFLAG_RTINHERIT 0x00000100 /* create with rt bit set */
-#define FS_XFLAG_PROJINHERIT 0x00000200 /* create with parents projid */
-#define FS_XFLAG_NOSYMLINKS 0x00000400 /* disallow symlink creation */
-#define FS_XFLAG_EXTSIZE 0x00000800 /* extent size allocator hint */
-#define FS_XFLAG_EXTSZINHERIT 0x00001000 /* inherit inode extent size */
-#define FS_XFLAG_NODEFRAG 0x00002000 /* do not defragment */
-#define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
-#define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
-#endif /* !defined(FS_IOC_FSGETXATTR) */
-
#define EXT4_IOC_FSGETXATTR FS_IOC_FSGETXATTR
#define EXT4_IOC_FSSETXATTR FS_IOC_FSSETXATTR

--
2.9.5

2017-09-12 05:06:44

by Ross Zwisler

Subject: [PATCH v2 1/5] ext4: prevent data corruption with inline data + DAX

If an inode has inline data it is currently prevented from using DAX by a
check in ext4_set_inode_flags(). When the inode grows inline data via
ext4_create_inline_data() or removes its inline data via
ext4_destroy_inline_data_nolock(), the value of S_DAX can change.

Currently these changes are unsafe because we don't hold off page faults
and I/O, write back dirty radix tree entries and invalidate all mappings.
There are also issues with mm-level races when changing the value of S_DAX,
as well as issues with the VM_MIXEDMAP flag:

https://www.spinics.net/lists/linux-xfs/msg09859.html

The unsafe transition of S_DAX can reliably cause data corruption, as shown
by the following fstest:

https://patchwork.kernel.org/patch/9948381/

Fix this issue by preventing the DAX mount option from being used on
filesystems that were created to support inline data. Inline data is an
option given to mkfs.ext4.

Signed-off-by: Ross Zwisler <[email protected]>
CC: [email protected]
---
fs/ext4/inline.c | 10 ----------
fs/ext4/super.c | 5 +++++
2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 28c5c3a..fd95019 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -302,11 +302,6 @@ static int ext4_create_inline_data(handle_t *handle,
EXT4_I(inode)->i_inline_size = len + EXT4_MIN_INLINE_DATA_SIZE;
ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA);
- /*
- * Propagate changes to inode->i_flags as well - e.g. S_DAX may
- * get cleared
- */
- ext4_set_inode_flags(inode);
get_bh(is.iloc.bh);
error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);

@@ -451,11 +446,6 @@ static int ext4_destroy_inline_data_nolock(handle_t *handle,
}
}
ext4_clear_inode_flag(inode, EXT4_INODE_INLINE_DATA);
- /*
- * Propagate changes to inode->i_flags as well - e.g. S_DAX may
- * get set.
- */
- ext4_set_inode_flags(inode);

get_bh(is.iloc.bh);
error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c9e7be5..4251e50 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3707,6 +3707,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
}

if (sbi->s_mount_opt & EXT4_MOUNT_DAX) {
+ if (ext4_has_feature_inline_data(sb)) {
+ ext4_msg(sb, KERN_ERR, "Cannot use DAX on a filesystem"
+ " that may contain inline data");
+ goto failed_mount;
+ }
err = bdev_dax_supported(sb, blocksize);
if (err)
goto failed_mount;
--
2.9.5
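
Not part of the posted patch: a toy userspace model of the mount-time rule
added to ext4_fill_super(). The bit values are placeholders rather than the
real EXT4_MOUNT_DAX or inline data feature constants, and the model ignores
the bdev_dax_supported() check that follows in the real code.

```c
#include <stdbool.h>

#define TOY_MOUNT_DAX		0x1u	/* placeholder for EXT4_MOUNT_DAX */
#define TOY_FEATURE_INLINE_DATA	0x1u	/* placeholder for the inline data feature bit */

/* Returns true if the mount may proceed under the rule from this patch. */
static bool toy_dax_mount_allowed(unsigned int mount_opt, unsigned int features)
{
	/* The restriction only applies when '-o dax' was requested. */
	if (!(mount_opt & TOY_MOUNT_DAX))
		return true;

	/* Refuse DAX when the filesystem may contain inline data. */
	return !(features & TOY_FEATURE_INLINE_DATA);
}
```

The point of the design is that the decision is made once per mount from the
superblock feature flag, so no inode can gain or lose inline data while S_DAX
is in effect.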

2017-09-12 06:38:42

by Jan Kara

Subject: Re: [PATCH v2 1/5] ext4: prevent data corruption with inline data + DAX

On Mon 11-09-17 23:05:22, Ross Zwisler wrote:
> If an inode has inline data it is currently prevented from using DAX by a
> check in ext4_set_inode_flags(). When the inode grows inline data via
> ext4_create_inline_data() or removes its inline data via
> ext4_destroy_inline_data_nolock(), the value of S_DAX can change.
>
> Currently these changes are unsafe because we don't hold off page faults
> and I/O, write back dirty radix tree entries and invalidate all mappings.
> There are also issues with mm-level races when changing the value of S_DAX,
> as well as issues with the VM_MIXEDMAP flag:
>
> https://www.spinics.net/lists/linux-xfs/msg09859.html
>
> The unsafe transition of S_DAX can reliably cause data corruption, as shown
> by the following fstest:
>
> https://patchwork.kernel.org/patch/9948381/
>
> Fix this issue by preventing the DAX mount option from being used on
> filesystems that were created to support inline data. Inline data is an
> option given to mkfs.ext4.
>
> Signed-off-by: Ross Zwisler <[email protected]>
> CC: [email protected]

Looks good. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/inline.c | 10 ----------
> fs/ext4/super.c | 5 +++++
> 2 files changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
> index 28c5c3a..fd95019 100644
> --- a/fs/ext4/inline.c
> +++ b/fs/ext4/inline.c
> @@ -302,11 +302,6 @@ static int ext4_create_inline_data(handle_t *handle,
> EXT4_I(inode)->i_inline_size = len + EXT4_MIN_INLINE_DATA_SIZE;
> ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
> ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA);
> - /*
> - * Propagate changes to inode->i_flags as well - e.g. S_DAX may
> - * get cleared
> - */
> - ext4_set_inode_flags(inode);
> get_bh(is.iloc.bh);
> error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
>
> @@ -451,11 +446,6 @@ static int ext4_destroy_inline_data_nolock(handle_t *handle,
> }
> }
> ext4_clear_inode_flag(inode, EXT4_INODE_INLINE_DATA);
> - /*
> - * Propagate changes to inode->i_flags as well - e.g. S_DAX may
> - * get set.
> - */
> - ext4_set_inode_flags(inode);
>
> get_bh(is.iloc.bh);
> error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index c9e7be5..4251e50 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3707,6 +3707,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> }
>
> if (sbi->s_mount_opt & EXT4_MOUNT_DAX) {
> + if (ext4_has_feature_inline_data(sb)) {
> + ext4_msg(sb, KERN_ERR, "Cannot use DAX on a filesystem"
> + " that may contain inline data");
> + goto failed_mount;
> + }
> err = bdev_dax_supported(sb, blocksize);
> if (err)
> goto failed_mount;
> --
> 2.9.5
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2017-09-12 06:41:47

by Jan Kara

Subject: Re: [PATCH v2 2/5] ext4: prevent data corruption with journaling + DAX

On Mon 11-09-17 23:05:23, Ross Zwisler wrote:
> The current code has the potential for data corruption when changing an
> inode's journaling mode, as that can result in a subsequent unsafe change
> in S_DAX.
>
> I've captured an instance of this data corruption in the following fstest:
>
> https://patchwork.kernel.org/patch/9948377/
>
> Prevent this data corruption from happening by disallowing changes to the
> journaling mode if the '-o dax' mount option was used. This means that for
> a given filesystem we could have a mix of inodes using either DAX or
> data journaling, but whatever state the inodes are in will be held for the
> duration of the mount.
>
> Signed-off-by: Ross Zwisler <[email protected]>
> Suggested-by: Jan Kara <[email protected]>

I guess this is fine for now to stop corrupting data so:

Reviewed-by: Jan Kara <[email protected]>

But I think we should work on a more user-friendly (i.e., permissive)
version.

Honza

> ---
> fs/ext4/inode.c | 5 -----
> fs/ext4/ioctl.c | 16 +++++++++++++---
> 2 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e963508..3207333 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5971,11 +5971,6 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
> }
> ext4_set_aops(inode);
> - /*
> - * Update inode->i_flags after EXT4_INODE_JOURNAL_DATA was updated.
> - * E.g. S_DAX may get cleared / set.
> - */
> - ext4_set_inode_flags(inode);
>
> jbd2_journal_unlock_updates(journal);
> percpu_up_write(&sbi->s_journal_flag_rwsem);
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index afb66d4..b0b754b 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -290,10 +290,20 @@ static int ext4_ioctl_setflags(struct inode *inode,
> if (err)
> goto flags_out;
>
> - if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL))
> + if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL)) {
> + /*
> + * Changes to the journaling mode can cause unsafe changes to
> + * S_DAX if we are using the DAX mount option.
> + */
> + if (test_opt(inode->i_sb, DAX)) {
> + err = -EBUSY;
> + goto flags_out;
> + }
> +
> err = ext4_change_inode_journal_flag(inode, jflag);
> - if (err)
> - goto flags_out;
> + if (err)
> + goto flags_out;
> + }
> if (migrate) {
> if (flags & EXT4_EXTENTS_FL)
> err = ext4_ext_migrate(inode);
> --
> 2.9.5
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2017-09-12 06:45:03

by Jan Kara

Subject: Re: [PATCH v2 3/5] ext4: add sanity check for encryption + DAX

On Mon 11-09-17 23:05:24, Ross Zwisler wrote:
> We prevent DAX from being used on inodes which are using ext4's built-in
> encryption via a check in ext4_set_inode_flags(). We do have what appears
> to be an unsafe transition of S_DAX in ext4_set_context(), though, where
> S_DAX can get disabled without us doing a proper writeback + invalidate.
>
> There are also issues with mm-level races when changing the value of S_DAX,
> as well as issues with the VM_MIXEDMAP flag:
>
> https://www.spinics.net/lists/linux-xfs/msg09859.html
>
> I actually think we are safe in this case because of the following:
>
> 1) You can't encrypt an existing file. Encryption can only be set on an
> empty directory, with new inodes in that directory being created with
> encryption turned on, so I don't think it's possible to turn encryption on
> for a file that has open DAX mmaps or outstanding I/Os.
>
> 2) There is no way to turn encryption off on a given file. Once an inode
> is encrypted, it stays encrypted for the life of that inode, so we don't
> have to worry about the case where we turn encryption off and S_DAX
> suddenly turns on.
>
> 3) The only way we end up in ext4_set_context() to turn on encryption is
> when we are creating a new file in the encrypted directory. This happens
> as part of ext4_create() before the inode has been allowed to do any I/O.
> Here's the call tree:
>
> ext4_create()
> __ext4_new_inode()
> ext4_set_inode_flags() // sets S_DAX
> fscrypt_inherit_context()
> fscrypt_get_encryption_info();
> ext4_set_context() // sets EXT4_INODE_ENCRYPT, clears S_DAX
>
> So, I actually think it's safe to transition S_DAX in ext4_set_context()
> without any locking, writebacks or invalidations. I've added a
> WARN_ON_ONCE() sanity check to make sure that we are notified if we ever
> encounter a case where we are encrypting an inode that already has data,
> in which case we need to add code to safely transition S_DAX.
>
> Signed-off-by: Ross Zwisler <[email protected]>
> CC: [email protected]

Looks good to me - and frankly I think we can drop the stable CC here...
Anyway, you can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/super.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 4251e50..c090780 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1159,6 +1159,9 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> if (inode->i_ino == EXT4_ROOT_INO)
> return -EPERM;
>
> + if (WARN_ON_ONCE(IS_DAX(inode) && i_size_read(inode)))
> + return -EINVAL;
> +
> res = ext4_convert_inline_data(inode);
> if (res)
> return res;
> --
> 2.9.5
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2017-09-12 06:46:16

by Jan Kara

Subject: Re: [PATCH v2 4/5] ext4: add ext4_should_use_dax()

On Mon 11-09-17 23:05:25, Ross Zwisler wrote:
> This helper, in the spirit of ext4_should_dioread_nolock() et al., replaces
> the complex conditional in ext4_set_inode_flags().
>
> Signed-off-by: Ross Zwisler <[email protected]>

Yeah, makes sense to me. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/inode.c | 19 ++++++++++++++++---
> 1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3207333..525dd63 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4577,6 +4577,21 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
> !ext4_test_inode_state(inode, EXT4_STATE_XATTR));
> }
>
> +static bool ext4_should_use_dax(struct inode *inode)
> +{
> + if (!test_opt(inode->i_sb, DAX))
> + return false;
> + if (!S_ISREG(inode->i_mode))
> + return false;
> + if (ext4_should_journal_data(inode))
> + return false;
> + if (ext4_has_inline_data(inode))
> + return false;
> + if (ext4_encrypted_inode(inode))
> + return false;
> + return true;
> +}
> +
> void ext4_set_inode_flags(struct inode *inode)
> {
> unsigned int flags = EXT4_I(inode)->i_flags;
> @@ -4592,9 +4607,7 @@ void ext4_set_inode_flags(struct inode *inode)
> new_fl |= S_NOATIME;
> if (flags & EXT4_DIRSYNC_FL)
> new_fl |= S_DIRSYNC;
> - if (test_opt(inode->i_sb, DAX) && S_ISREG(inode->i_mode) &&
> - !ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) &&
> - !ext4_encrypted_inode(inode))
> + if (ext4_should_use_dax(inode))
> new_fl |= S_DAX;
> inode_set_flags(inode, new_fl,
> S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX);
> --
> 2.9.5
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2017-09-12 15:39:28

by Ross Zwisler

Subject: Re: [PATCH v2 3/5] ext4: add sanity check for encryption + DAX

On Tue, Sep 12, 2017 at 08:45:00AM +0200, Jan Kara wrote:
> On Mon 11-09-17 23:05:24, Ross Zwisler wrote:
> > We prevent DAX from being used on inodes which are using ext4's built-in
> > encryption via a check in ext4_set_inode_flags(). We do have what appears
> > to be an unsafe transition of S_DAX in ext4_set_context(), though, where
> > S_DAX can get disabled without us doing a proper writeback + invalidate.
> >
> > There are also issues with mm-level races when changing the value of S_DAX,
> > as well as issues with the VM_MIXEDMAP flag:
> >
> > https://www.spinics.net/lists/linux-xfs/msg09859.html
> >
> > I actually think we are safe in this case because of the following:
> >
> > 1) You can't encrypt an existing file. Encryption can only be set on an
> > empty directory, with new inodes in that directory being created with
> > encryption turned on, so I don't think it's possible to turn encryption on
> > for a file that has open DAX mmaps or outstanding I/Os.
> >
> > 2) There is no way to turn encryption off on a given file. Once an inode
> > is encrypted, it stays encrypted for the life of that inode, so we don't
> > have to worry about the case where we turn encryption off and S_DAX
> > suddenly turns on.
> >
> > 3) The only way we end up in ext4_set_context() to turn on encryption is
> > when we are creating a new file in the encrypted directory. This happens
> > as part of ext4_create() before the inode has been allowed to do any I/O.
> > Here's the call tree:
> >
> > ext4_create()
> > __ext4_new_inode()
> > ext4_set_inode_flags() // sets S_DAX
> > fscrypt_inherit_context()
> > fscrypt_get_encryption_info();
> > ext4_set_context() // sets EXT4_INODE_ENCRYPT, clears S_DAX
> >
> > So, I actually think it's safe to transition S_DAX in ext4_set_context()
> > without any locking, writebacks or invalidations. I've added a
> > WARN_ON_ONCE() sanity check to make sure that we are notified if we ever
> > encounter a case where we are encrypting an inode that already has data,
> > in which case we need to add code to safely transition S_DAX.
> >
> > Signed-off-by: Ross Zwisler <[email protected]>
> > CC: [email protected]
>
> Looks good to me - and frankly I think we can drop the stable CC here...

Sure, I'm fine to drop the CC to stable.

> Anyway, you can add:
>
> Reviewed-by: Jan Kara <[email protected]>
>
> Honza
>
> > ---
> > fs/ext4/super.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index 4251e50..c090780 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -1159,6 +1159,9 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> > if (inode->i_ino == EXT4_ROOT_INO)
> > return -EPERM;
> >
> > + if (WARN_ON_ONCE(IS_DAX(inode) && i_size_read(inode)))
> > + return -EINVAL;
> > +
> > res = ext4_convert_inline_data(inode);
> > if (res)
> > return res;
> > --
> > 2.9.5
> >
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2017-09-29 17:37:55

by Ross Zwisler

Subject: Re: [PATCH v2 0/5] ext4: DAX data corruption fixes

On Mon, Sep 11, 2017 at 11:05:21PM -0600, Ross Zwisler wrote:
> This series prevents a pair of data corruptions with ext4 + DAX. The first
> such corruption happens when combining the inline data feature with DAX,
> and the second happens when combining data journaling with DAX.
>
> Both can be reliably reproduced with the fstests that I have posted here:
>
> https://patchwork.kernel.org/patch/9948377/
> https://patchwork.kernel.org/patch/9948381/
>
> My opinion is that the first three patches in this series should be applied
> to the v4.14 RC series and backported to stable. The last two patches in
> this series are just cleanup and can probably wait until v4.15.
>
> Ross Zwisler (5):
> ext4: prevent data corruption with inline data + DAX
> ext4: prevent data corruption with journaling + DAX
> ext4: add sanity check for encryption + DAX
> ext4: add ext4_should_use_dax()
> ext4: remove duplicate extended attributes defs
>
> fs/ext4/ext4.h | 37 -------------------------------------
> fs/ext4/inline.c | 10 ----------
> fs/ext4/inode.c | 24 ++++++++++++++++--------
> fs/ext4/ioctl.c | 16 +++++++++++++---
> fs/ext4/super.c | 8 ++++++++
> 5 files changed, 37 insertions(+), 58 deletions(-)

Hey Ted,

I just wanted to ping this series, and see if at least the data corruption
fixes were headed for v4.14? I don't think these have been merged into any of
the ext4 branches yet.

Thanks,
- Ross