2022-06-09 09:55:35

by Javier Martinez Canillas

Subject: [PATCH v5 0/4] fat: add support for the renameat2 RENAME_EXCHANGE flag

Hello,

This series adds support for the renameat2 system call's RENAME_EXCHANGE flag
(which atomically exchanges two paths) to the vfat filesystem code.

There are many use cases for this, but we are particularly interested in
making it possible for vfat filesystems to be part of OSTree [0] deployments.

Currently OSTree relies on symbolic links to make the deployment updates
an atomic transactional operation. But RENAME_EXCHANGE could be used [1]
to achieve a similar level of robustness when using a vfat filesystem.
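
For readers unfamiliar with the flag, here is a minimal userspace sketch of
how a caller could swap two existing paths. The helper name exchange_paths()
is hypothetical and not part of this series or of OSTree; the raw syscall is
used on the assumption that the libc in use may not provide a renameat2()
wrapper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <sys/syscall.h>  /* SYS_renameat2 */
#include <unistd.h>       /* syscall() */

#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)  /* same value as in <linux/fs.h> */
#endif

/*
 * Hypothetical helper: atomically swap what path_a and path_b refer to.
 * Both paths must already exist. Returns 0 on success or a negative errno
 * value (for example -EINVAL when the filesystem does not support
 * RENAME_EXCHANGE).
 */
int exchange_paths(const char *path_a, const char *path_b)
{
	long ret = syscall(SYS_renameat2, AT_FDCWD, path_a,
			   AT_FDCWD, path_b, RENAME_EXCHANGE);

	return ret == 0 ? 0 : -errno;
}
```

A deployment tool could, for example, prepare the new tree under a temporary
name and then call exchange_paths() on the temporary and the live names, so
that both names keep referring to a complete tree at every point in time.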

Patch #1 is a preparatory patch for the RENAME_EXCHANGE support, patch #2
moves some code blocks in vfat_rename() into a set of helper functions that
can be reused by vfat_rename_exchange(), which is added by patch #3, and
finally patch #4 adds some kselftests to test it.

This is a v5 that addresses issues pointed out in v4:

https://lore.kernel.org/lkml/[email protected]/

[0]: https://github.com/ostreedev/ostree
[1]: https://github.com/ostreedev/ostree/issues/1649

Changes in v5:
- Only update nlink for different parent dirs and file types (OGAWA Hirofumi).

Changes in v4:
- Add new patch from OGAWA Hirofumi to use the helpers in vfat_rename().
- Rebase the patch on top of OGAWA Hirofumi proposed changes.
- Drop iversion increment for old and new file inodes (OGAWA Hirofumi).
- Add Muhammad Usama Anjum Acked-by tag.

Changes in v3:
- Add a .gitignore for the rename_exchange binary (Muhammad Usama Anjum).
- Include $(KHDR_INCLUDES) instead of hardcoding a relative path in Makefile
(Muhammad Usama Anjum).

Changes in v2:
- Only update the new_dir inode version and timestamps if != old_dir
(Alex Larsson).
- Add some helper functions to avoid duplicating code (OGAWA Hirofumi).
- Use braces for multi-line blocks even if they are one statement (OGAWA Hirofumi).
- Mention in commit message that the operation is as transactional as possible
but within the vfat limitations of not having a journal (Colin Walters).
- Call sync to flush the page cache before checking the file contents
(Alex Larsson).
- Drop RFC prefix since the patches already got some review.

Javier Martinez Canillas (3):
  fat: add a vfat_rename2() and make existing .rename callback a helper
  fat: add renameat2 RENAME_EXCHANGE flag support
  selftests/filesystems: add a vfat RENAME_EXCHANGE test

OGAWA Hirofumi (1):
  fat: factor out reusable code in vfat_rename() as helper functions

 MAINTAINERS                                   |   1 +
 fs/fat/namei_vfat.c                           | 232 +++++++++++++++---
 tools/testing/selftests/Makefile              |   1 +
 .../selftests/filesystems/fat/.gitignore      |   2 +
 .../selftests/filesystems/fat/Makefile        |   7 +
 .../testing/selftests/filesystems/fat/config  |   2 +
 .../filesystems/fat/rename_exchange.c         |  37 +++
 .../filesystems/fat/run_fat_tests.sh          |  82 +++++++
 8 files changed, 325 insertions(+), 39 deletions(-)
 create mode 100644 tools/testing/selftests/filesystems/fat/.gitignore
 create mode 100644 tools/testing/selftests/filesystems/fat/Makefile
 create mode 100644 tools/testing/selftests/filesystems/fat/config
 create mode 100644 tools/testing/selftests/filesystems/fat/rename_exchange.c
 create mode 100755 tools/testing/selftests/filesystems/fat/run_fat_tests.sh

--
2.36.1


2022-06-09 10:15:08

by Javier Martinez Canillas

Subject: [PATCH v5 3/4] fat: add renameat2 RENAME_EXCHANGE flag support

The renameat2 RENAME_EXCHANGE flag allows two paths to be exchanged
atomically, but it is currently not supported by the Linux vfat filesystem
driver.

Add a vfat_rename_exchange() helper function that implements this support.

The super block lock is held during the operation to ensure atomicity, and
in the error path the changes already made are reverted, also with the
mutex held.

This makes the operation as transactional as possible within the limitations
imposed by vfat, which has no journal to replay.

Signed-off-by: Javier Martinez Canillas <[email protected]>
---

Changes in v5:
- Only update nlink for different parent dirs and file types (OGAWA Hirofumi).

Changes in v4:
- Rebase the patch on top of OGAWA Hirofumi proposed changes.
- Drop iversion increment for old and new file inodes (OGAWA Hirofumi).

Changes in v2:
- Only update the new_dir inode version and timestamps if != old_dir
(Alex Larsson).
- Add some helper functions to avoid duplicating code (OGAWA Hirofumi).
- Use braces for multi-line blocks even if they are one statement (OGAWA Hirofumi).
- Mention in commit message that the operation is as transactional as possible
but within the vfat limitations of not having a journal (Colin Walters).

 fs/fat/namei_vfat.c | 124 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 123 insertions(+), 1 deletion(-)

diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 9c04053a8f1c..4aa096a15e59 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -1042,13 +1042,135 @@ static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,
 	goto out;
 }

+static void vfat_exchange_ipos(struct inode *old_inode, struct inode *new_inode,
+			       loff_t old_i_pos, loff_t new_i_pos)
+{
+	fat_detach(old_inode);
+	fat_detach(new_inode);
+	fat_attach(old_inode, new_i_pos);
+	fat_attach(new_inode, old_i_pos);
+}
+
+static void vfat_update_nlink(struct inode *dir, struct inode *inode)
+{
+	if (S_ISDIR(inode->i_mode))
+		drop_nlink(dir);
+	else
+		inc_nlink(dir);
+}
+
+static int vfat_rename_exchange(struct inode *old_dir, struct dentry *old_dentry,
+				struct inode *new_dir, struct dentry *new_dentry)
+{
+	struct buffer_head *old_dotdot_bh = NULL, *new_dotdot_bh = NULL;
+	struct msdos_dir_entry *old_dotdot_de = NULL, *new_dotdot_de = NULL;
+	struct inode *old_inode, *new_inode;
+	struct timespec64 ts = current_time(old_dir);
+	loff_t old_i_pos, new_i_pos;
+	int err, corrupt = 0;
+	struct super_block *sb = old_dir->i_sb;
+
+	old_inode = d_inode(old_dentry);
+	new_inode = d_inode(new_dentry);
+
+	/* Acquire super block lock for the operation to be atomic */
+	mutex_lock(&MSDOS_SB(sb)->s_lock);
+
+	/* if directories are not the same, get ".." info to update */
+	if (old_dir != new_dir) {
+		err = vfat_get_dotdot_de(old_inode, &old_dotdot_bh,
+					 &old_dotdot_de);
+		if (err)
+			goto out;
+
+		err = vfat_get_dotdot_de(new_inode, &new_dotdot_bh,
+					 &new_dotdot_de);
+		if (err)
+			goto out;
+	}
+
+	old_i_pos = MSDOS_I(old_inode)->i_pos;
+	new_i_pos = MSDOS_I(new_inode)->i_pos;
+
+	vfat_exchange_ipos(old_inode, new_inode, old_i_pos, new_i_pos);
+
+	err = vfat_sync_ipos(old_dir, new_inode);
+	if (err)
+		goto error_exchange;
+	err = vfat_sync_ipos(new_dir, old_inode);
+	if (err)
+		goto error_exchange;
+
+	/* update ".." directory entry info */
+	if (old_dotdot_de) {
+		err = vfat_update_dotdot_de(new_dir, old_inode, old_dotdot_bh,
+					    old_dotdot_de);
+		if (err)
+			goto error_old_dotdot;
+	}
+
+	if (new_dotdot_de) {
+		err = vfat_update_dotdot_de(old_dir, new_inode, new_dotdot_bh,
+					    new_dotdot_de);
+		if (err)
+			goto error_new_dotdot;
+	}
+
+	vfat_update_dir_metadata(old_dir, &ts);
+	/* if directories are not the same, update new_dir as well */
+	if (old_dir != new_dir) {
+		vfat_update_dir_metadata(new_dir, &ts);
+		/* nlink only needs to be updated if the file types differ */
+		if (old_inode->i_mode != new_inode->i_mode) {
+			vfat_update_nlink(old_dir, old_inode);
+			vfat_update_nlink(new_dir, new_inode);
+		}
+	}
+
+out:
+	brelse(old_dotdot_bh);
+	brelse(new_dotdot_bh);
+	mutex_unlock(&MSDOS_SB(sb)->s_lock);
+
+	return err;
+
+error_new_dotdot:
+	if (new_dotdot_de) {
+		corrupt |= vfat_update_dotdot_de(new_dir, new_inode,
+						 new_dotdot_bh, new_dotdot_de);
+	}
+
+error_old_dotdot:
+	if (old_dotdot_de) {
+		corrupt |= vfat_update_dotdot_de(old_dir, old_inode,
+						 old_dotdot_bh, old_dotdot_de);
+	}
+
+error_exchange:
+	vfat_exchange_ipos(old_inode, new_inode, new_i_pos, old_i_pos);
+	corrupt |= vfat_sync_ipos(new_dir, new_inode);
+	corrupt |= vfat_sync_ipos(old_dir, old_inode);
+
+	if (corrupt < 0) {
+		fat_fs_error(new_dir->i_sb,
+			     "%s: Filesystem corrupted (i_pos %lld, %lld)",
+			     __func__, old_i_pos, new_i_pos);
+	}
+	goto out;
+}
+
 static int vfat_rename2(struct user_namespace *mnt_userns, struct inode *old_dir,
 			struct dentry *old_dentry, struct inode *new_dir,
 			struct dentry *new_dentry, unsigned int flags)
 {
-	if (flags & ~RENAME_NOREPLACE)
+	if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
 		return -EINVAL;

+	if (flags & RENAME_EXCHANGE) {
+		return vfat_rename_exchange(old_dir, old_dentry,
+					    new_dir, new_dentry);
+	}
+
 	/* VFS already handled RENAME_NOREPLACE, handle it as a normal rename */
 	return vfat_rename(old_dir, old_dentry, new_dir, new_dentry);
 }
--
2.36.1

2022-06-09 19:43:14

by OGAWA Hirofumi

Subject: Re: [PATCH v5 3/4] fat: add renameat2 RENAME_EXCHANGE flag support

Javier Martinez Canillas <[email protected]> writes:

> +static void vfat_update_nlink(struct inode *dir, struct inode *inode)
> +{
> +	if (S_ISDIR(inode->i_mode))
> +		drop_nlink(dir);
> +	else
> +		inc_nlink(dir);
> +}

[...]

> +	vfat_update_dir_metadata(old_dir, &ts);
> +	/* if directories are not the same, update new_dir as well */
> +	if (old_dir != new_dir) {
> +		vfat_update_dir_metadata(new_dir, &ts);
> +		/* nlink only needs to be updated if the file types differ */
> +		if (old_inode->i_mode != new_inode->i_mode) {
> +			vfat_update_nlink(old_dir, old_inode);
> +			vfat_update_nlink(new_dir, new_inode);
> +		}
> +	}

This looks unnecessarily complex (it also compares the raw i_mode instead
of using S_ISDIR(), and nlink is better changed before the dir is marked
dirty). How about this change? It is only lightly tested, though. Can you
review and test it?

Thanks.
--
OGAWA Hirofumi <[email protected]>


Signed-off-by: OGAWA Hirofumi <[email protected]>
---
 fs/fat/namei_vfat.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index c30f829..3aef834 100644
--- a/fs/fat/namei_vfat.c 2022-06-10 03:33:40.334212176 +0900
+++ b/fs/fat/namei_vfat.c 2022-06-10 04:11:30.817064192 +0900
@@ -1055,12 +1055,10 @@ static void vfat_exchange_ipos(struct in
 	fat_attach(new_inode, old_i_pos);
 }

-static void vfat_update_nlink(struct inode *dir, struct inode *inode)
+static void vfat_move_nlink(struct inode *src, struct inode *dst)
 {
-	if (S_ISDIR(inode->i_mode))
-		drop_nlink(dir);
-	else
-		inc_nlink(dir);
+	drop_nlink(src);
+	inc_nlink(dst);
 }

 static int vfat_rename_exchange(struct inode *old_dir, struct dentry *old_dentry,
@@ -1112,7 +1110,6 @@ static int vfat_rename_exchange(struct i
 		if (err)
 			goto error_old_dotdot;
 	}
-
 	if (new_dotdot_de) {
 		err = vfat_update_dotdot_de(old_dir, new_inode, new_dotdot_bh,
 					    new_dotdot_de);
@@ -1120,16 +1117,18 @@ static int vfat_rename_exchange(struct i
 			goto error_new_dotdot;
 	}

+	/* if cross directory and only one is a directory, adjust nlink */
+	if (!old_dotdot_de != !new_dotdot_de) {
+		if (old_dotdot_de)
+			vfat_move_nlink(old_dir, new_dir);
+		else
+			vfat_move_nlink(new_dir, old_dir);
+	}
+
 	vfat_update_dir_metadata(old_dir, &ts);
 	/* if directories are not the same, update new_dir as well */
-	if (old_dir != new_dir) {
+	if (old_dir != new_dir)
 		vfat_update_dir_metadata(new_dir, &ts);
-		/* nlink only needs to be updated if the file types differ */
-		if (old_inode->i_mode != new_inode->i_mode) {
-			vfat_update_nlink(old_dir, old_inode);
-			vfat_update_nlink(new_dir, new_inode);
-		}
-	}

 out:
 	brelse(old_dotdot_bh);
_
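
The nlink rule in the diff above can be modelled in userspace to make the
cases explicit. This is only a sketch under stated assumptions: struct
dir_model and exchange_adjust_nlink() are hypothetical names, the model
tracks nothing but a link count, and the two booleans stand in for
"old_dotdot_de / new_dotdot_de is non-NULL", which in the patch is only
possible for a directory being exchanged across different parents.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a parent directory inode: link count only. */
struct dir_model {
	unsigned int nlink;
};

/* Mirrors vfat_move_nlink(): one subdirectory leaves src and enters dst. */
void move_nlink(struct dir_model *src, struct dir_model *dst)
{
	src->nlink--;
	dst->nlink++;
}

/*
 * Adjust the parents' link counts for an exchange of two entries.
 * Nothing changes when both entries share a parent, or when both are
 * directories or both are non-directories: each parent then ends up
 * with the same number of subdirectories as before.
 */
void exchange_adjust_nlink(struct dir_model *old_dir, bool old_is_dir,
			   struct dir_model *new_dir, bool new_is_dir)
{
	if (old_dir == new_dir)
		return;
	if (old_is_dir == new_is_dir)
		return;

	if (old_is_dir)
		move_nlink(old_dir, new_dir);	/* directory moves to new_dir */
	else
		move_nlink(new_dir, old_dir);	/* directory moves to old_dir */
}
```

Note that the model compares file types only. Comparing raw i_mode values,
as the v5 code did, would also treat two regular files with different
permission bits as differing and would then increment both parents' counts.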

2022-06-09 22:03:54

by Javier Martinez Canillas

Subject: Re: [PATCH v5 3/4] fat: add renameat2 RENAME_EXCHANGE flag support

Hello OGAWA,

Thanks for your feedback.

On 6/9/22 21:17, OGAWA Hirofumi wrote:
> Javier Martinez Canillas <[email protected]> writes:
>
>> +static void vfat_update_nlink(struct inode *dir, struct inode *inode)
>> +{
>> +	if (S_ISDIR(inode->i_mode))
>> +		drop_nlink(dir);
>> +	else
>> +		inc_nlink(dir);
>> +}
>
> [...]
>
>> +	vfat_update_dir_metadata(old_dir, &ts);
>> +	/* if directories are not the same, update new_dir as well */
>> +	if (old_dir != new_dir) {
>> +		vfat_update_dir_metadata(new_dir, &ts);
>> +		/* nlink only needs to be updated if the file types differ */
>> +		if (old_inode->i_mode != new_inode->i_mode) {
>> +			vfat_update_nlink(old_dir, old_inode);
>> +			vfat_update_nlink(new_dir, new_inode);
>> +		}
>> +	}
>
> This looks unnecessarily complex (it also compares the raw i_mode instead
> of using S_ISDIR(), and nlink is better changed before the dir is marked
> dirty). How about this change? It is only lightly tested, though. Can you
> review and test it?
>

Your change looks good to me and indeed the logic is simpler than in mine.

I've also tested it and AFAICT it works correctly as well. Do you plan to
squash this or should I respin a new revision of the whole patch-set ? If
you want to post it as a follow-up I'm also OK with that.

--
Best regards,

Javier Martinez Canillas
Linux Engineering
Red Hat

2022-06-10 03:23:33

by OGAWA Hirofumi

Subject: Re: [PATCH v5 3/4] fat: add renameat2 RENAME_EXCHANGE flag support

Javier Martinez Canillas <[email protected]> writes:

>>
>> This looks unnecessarily complex (it also compares the raw i_mode instead
>> of using S_ISDIR(), and nlink is better changed before the dir is marked
>> dirty). How about this change? It is only lightly tested, though. Can you
>> review and test it?
>>
>
> Your change looks good to me and indeed the logic is simpler than in mine.
>
> I've also tested it and AFAICT it works correctly as well. Do you plan to
> squash this or should I respin a new revision of the whole patch-set ? If
> you want to post it as a follow-up I'm also OK with that.

Could you merge to your patchset, and re-send?

Thanks.
--
OGAWA Hirofumi <[email protected]>

2022-06-10 07:04:28

by Javier Martinez Canillas

Subject: Re: [PATCH v5 3/4] fat: add renameat2 RENAME_EXCHANGE flag support

On 6/10/22 05:17, OGAWA Hirofumi wrote:
> Javier Martinez Canillas <[email protected]> writes:
>
>>>
>>> This looks unnecessarily complex (it also compares the raw i_mode instead
>>> of using S_ISDIR(), and nlink is better changed before the dir is marked
>>> dirty). How about this change? It is only lightly tested, though. Can you
>>> review and test it?
>>>
>>
>> Your change looks good to me and indeed the logic is simpler than in mine.
>>
>> I've also tested it and AFAICT it works correctly as well. Do you plan to
>> squash this or should I respin a new revision of the whole patch-set ? If
>> you want to post it as a follow-up I'm also OK with that.
>
> Could you merge to your patchset, and re-send?
>

Sure thing, I'll do that.

--
Best regards,

Javier Martinez Canillas
Linux Engineering
Red Hat