2022-06-10 08:30:07

by Javier Martinez Canillas

[permalink] [raw]
Subject: [PATCH v6 0/4] fat: add support for the renameat2 RENAME_EXCHANGE flag

Hello,

The series adds support for the renameat2 system call RENAME_EXCHANGE flag
(which allows to atomically replace two paths) to the vfat filesystem code.

There are many use cases for this, but we are particularly interested in
making possible for vfat filesystems to be part of OSTree [0] deployments.

Currently OSTree relies on symbolic links to make the deployment updates
an atomic transactional operation. But RENAME_EXCHANGE could be used [1]
to achieve a similar level of robustness when using a vfat filesystem.

Patch #1 is just a preparatory patch to introduce the RENAME_EXCHANGE
support, patch #2 moves some code blocks in vfat_rename() to a set of
helper functions, that can be reused by tvfat_rename_exchange() that's
added by patch #3 and finally patch #4 adds some kselftests to test it.

This is a v6 that addresses issues pointed out in v5:

https://lkml.org/lkml/2022/6/9/361

[0]: https://github.com/ostreedev/ostree
[1]: https://github.com/ostreedev/ostree/issues/1649

Changes in v6:
- Simplify logic to determine if nlink have to modified (OGAWA Hirofumi).

Changes in v5:
- Only update nlink for different parent dirs and file types (OGAWA Hirofumi).

Changes in v4:
- Add new patch from OGAWA Hirofumi to use the helpers in vfat_rename().
- Rebase the patch on top of OGAWA Hirofumi proposed changes.
- Drop iversion increment for old and new file inodes (OGAWA Hirofumi).
- Add Muhammad Usama Anjum Acked-by tag.

Changes in v3:
- Add a .gitignore for the rename_exchange binary (Muhammad Usama Anjum).
- Include $(KHDR_INCLUDES) instead of hardcoding a relative path in Makefile
(Muhammad Usama Anjum).

Changes in v2:
- Only update the new_dir inode version and timestamps if != old_dir
(Alex Larsson).
- Add some helper functions to avoid duplicating code (OGAWA Hirofumi).
- Use braces for multi-lines blocks even if are one statement (OGAWA Hirofumi).
- Mention in commit message that the operation is as transactional as possible
but within the vfat limitations of not having a journal (Colin Walters).
- Call sync to flush the page cache before checking the file contents
(Alex Larsson).
- Drop RFC prefix since the patches already got some review.

Javier Martinez Canillas (3):
fat: add a vfat_rename2() and make existing .rename callback a helper
fat: add renameat2 RENAME_EXCHANGE flag support
selftests/filesystems: add a vfat RENAME_EXCHANGE test

OGAWA Hirofumi (1):
fat: factor out reusable code in vfat_rename() as helper functions

MAINTAINERS | 1 +
fs/fat/namei_vfat.c | 231 +++++++++++++++---
tools/testing/selftests/Makefile | 1 +
.../selftests/filesystems/fat/.gitignore | 2 +
.../selftests/filesystems/fat/Makefile | 7 +
.../testing/selftests/filesystems/fat/config | 2 +
.../filesystems/fat/rename_exchange.c | 37 +++
.../filesystems/fat/run_fat_tests.sh | 82 +++++++
8 files changed, 324 insertions(+), 39 deletions(-)
create mode 100644 tools/testing/selftests/filesystems/fat/.gitignore
create mode 100644 tools/testing/selftests/filesystems/fat/Makefile
create mode 100644 tools/testing/selftests/filesystems/fat/config
create mode 100644 tools/testing/selftests/filesystems/fat/rename_exchange.c
create mode 100755 tools/testing/selftests/filesystems/fat/run_fat_tests.sh

--
2.36.1


2022-06-10 09:05:57

by Javier Martinez Canillas

[permalink] [raw]
Subject: [PATCH v6 4/4] selftests/filesystems: add a vfat RENAME_EXCHANGE test

Add a test for the renameat2 RENAME_EXCHANGE support in vfat, but split it
in a tool that just does the rename exchange and a script that is run by
the kselftests framework on `make TARGETS="filesystems/fat" kselftest`.

That way the script can be easily extended to test other file operations.

The script creates a 1 MiB disk image, that is then formated with a vfat
filesystem and mounted using a loop device. That way all file operations
are done on an ephemeral filesystem.

Signed-off-by: Javier Martinez Canillas <[email protected]>
Acked-by: Muhammad Usama Anjum <[email protected]>
---

(no changes since v4)

Changes in v4:
- Add Muhammad Usama Anjum Acked-by tag.

Changes in v3:
- Add a .gitignore for the rename_exchange binary (Muhammad Usama Anjum).
- Include $(KHDR_INCLUDES) instead of hardcoding a relative path in Makefile
(Muhammad Usama Anjum).

Changes in v2:
- Call sync to flush the page cache before checking the file contents
(Alex Larsson).

MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 1 +
.../selftests/filesystems/fat/.gitignore | 2 +
.../selftests/filesystems/fat/Makefile | 7 ++
.../testing/selftests/filesystems/fat/config | 2 +
.../filesystems/fat/rename_exchange.c | 37 +++++++++
.../filesystems/fat/run_fat_tests.sh | 82 +++++++++++++++++++
7 files changed, 132 insertions(+)
create mode 100644 tools/testing/selftests/filesystems/fat/.gitignore
create mode 100644 tools/testing/selftests/filesystems/fat/Makefile
create mode 100644 tools/testing/selftests/filesystems/fat/config
create mode 100644 tools/testing/selftests/filesystems/fat/rename_exchange.c
create mode 100755 tools/testing/selftests/filesystems/fat/run_fat_tests.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 4fdbbd6c1984..158771bb7755 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20841,6 +20841,7 @@ M: OGAWA Hirofumi <[email protected]>
S: Maintained
F: Documentation/filesystems/vfat.rst
F: fs/fat/
+F: tools/testing/selftests/filesystems/fat/

VFIO DRIVER
M: Alex Williamson <[email protected]>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 0aedcd76cf0f..fc59ad849a90 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -16,6 +16,7 @@ TARGETS += exec
TARGETS += filesystems
TARGETS += filesystems/binderfs
TARGETS += filesystems/epoll
+TARGETS += filesystems/fat
TARGETS += firmware
TARGETS += fpu
TARGETS += ftrace
diff --git a/tools/testing/selftests/filesystems/fat/.gitignore b/tools/testing/selftests/filesystems/fat/.gitignore
new file mode 100644
index 000000000000..b89920ed841c
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fat/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+rename_exchange
diff --git a/tools/testing/selftests/filesystems/fat/Makefile b/tools/testing/selftests/filesystems/fat/Makefile
new file mode 100644
index 000000000000..902033f6ef09
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fat/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+
+TEST_PROGS := run_fat_tests.sh
+TEST_GEN_PROGS_EXTENDED := rename_exchange
+CFLAGS += -O2 -g -Wall $(KHDR_INCLUDES)
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/filesystems/fat/config b/tools/testing/selftests/filesystems/fat/config
new file mode 100644
index 000000000000..6cf95e787a17
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fat/config
@@ -0,0 +1,2 @@
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_VFAT_FS=y
diff --git a/tools/testing/selftests/filesystems/fat/rename_exchange.c b/tools/testing/selftests/filesystems/fat/rename_exchange.c
new file mode 100644
index 000000000000..e488ad354fce
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fat/rename_exchange.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Program that atomically exchanges two paths using
+ * the renameat2() system call RENAME_EXCHANGE flag.
+ *
+ * Copyright 2022 Red Hat Inc.
+ * Author: Javier Martinez Canillas <[email protected]>
+ */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+void print_usage(const char *program)
+{
+ printf("Usage: %s [oldpath] [newpath]\n", program);
+ printf("Atomically exchange oldpath and newpath\n");
+}
+
+int main(int argc, char *argv[])
+{
+ int ret;
+
+ if (argc != 3) {
+ print_usage(argv[0]);
+ exit(EXIT_FAILURE);
+ }
+
+ ret = renameat2(AT_FDCWD, argv[1], AT_FDCWD, argv[2], RENAME_EXCHANGE);
+ if (ret) {
+ perror("rename exchange failed");
+ exit(EXIT_FAILURE);
+ }
+
+ exit(EXIT_SUCCESS);
+}
diff --git a/tools/testing/selftests/filesystems/fat/run_fat_tests.sh b/tools/testing/selftests/filesystems/fat/run_fat_tests.sh
new file mode 100755
index 000000000000..7f35dc3d15df
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fat/run_fat_tests.sh
@@ -0,0 +1,82 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Run filesystem operations tests on an 1 MiB disk image that is formatted with
+# a vfat filesystem and mounted in a temporary directory using a loop device.
+#
+# Copyright 2022 Red Hat Inc.
+# Author: Javier Martinez Canillas <[email protected]>
+
+set -e
+set -u
+set -o pipefail
+
+BASE_DIR="$(dirname $0)"
+TMP_DIR="$(mktemp -d /tmp/fat_tests_tmp.XXXX)"
+IMG_PATH="${TMP_DIR}/fat.img"
+MNT_PATH="${TMP_DIR}/mnt"
+
+cleanup()
+{
+ mountpoint -q "${MNT_PATH}" && unmount_image
+ rm -rf "${TMP_DIR}"
+}
+trap cleanup SIGINT SIGTERM EXIT
+
+create_loopback()
+{
+ touch "${IMG_PATH}"
+ chattr +C "${IMG_PATH}" >/dev/null 2>&1 || true
+
+ truncate -s 1M "${IMG_PATH}"
+ mkfs.vfat "${IMG_PATH}" >/dev/null 2>&1
+}
+
+mount_image()
+{
+ mkdir -p "${MNT_PATH}"
+ sudo mount -o loop "${IMG_PATH}" "${MNT_PATH}"
+}
+
+rename_exchange_test()
+{
+ local rename_exchange="${BASE_DIR}/rename_exchange"
+ local old_path="${MNT_PATH}/old_file"
+ local new_path="${MNT_PATH}/new_file"
+
+ echo old | sudo tee "${old_path}" >/dev/null 2>&1
+ echo new | sudo tee "${new_path}" >/dev/null 2>&1
+ sudo "${rename_exchange}" "${old_path}" "${new_path}" >/dev/null 2>&1
+ sudo sync -f "${MNT_PATH}"
+ grep new "${old_path}" >/dev/null 2>&1
+ grep old "${new_path}" >/dev/null 2>&1
+}
+
+rename_exchange_subdir_test()
+{
+ local rename_exchange="${BASE_DIR}/rename_exchange"
+ local dir_path="${MNT_PATH}/subdir"
+ local old_path="${MNT_PATH}/old_file"
+ local new_path="${dir_path}/new_file"
+
+ sudo mkdir -p "${dir_path}"
+ echo old | sudo tee "${old_path}" >/dev/null 2>&1
+ echo new | sudo tee "${new_path}" >/dev/null 2>&1
+ sudo "${rename_exchange}" "${old_path}" "${new_path}" >/dev/null 2>&1
+ sudo sync -f "${MNT_PATH}"
+ grep new "${old_path}" >/dev/null 2>&1
+ grep old "${new_path}" >/dev/null 2>&1
+}
+
+unmount_image()
+{
+ sudo umount "${MNT_PATH}" &> /dev/null
+}
+
+create_loopback
+mount_image
+rename_exchange_test
+rename_exchange_subdir_test
+unmount_image
+
+exit 0
--
2.36.1

2022-06-10 09:06:51

by Javier Martinez Canillas

[permalink] [raw]
Subject: [PATCH v6 2/4] fat: factor out reusable code in vfat_rename() as helper functions

From: OGAWA Hirofumi <[email protected]>

The vfat_rename() function is quite big and there are code blocks that can
be moved into helper functions. This not only simplify the implementation
of that function but also allows these helpers to be reused.

For example, the helpers can be used by the handler of the RENAME_EXCHANGE
flag once this is implemented in a subsequent patch.

Signed-off-by: OGAWA Hirofumi <[email protected]>
Signed-off-by: Javier Martinez Canillas <[email protected]>
---

(no changes since v4)

Changes in v4:
- Add new patch from OGAWA Hirofumi to use the helpers in vfat_rename().

fs/fat/namei_vfat.c | 89 +++++++++++++++++++++++++++++----------------
1 file changed, 57 insertions(+), 32 deletions(-)

diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 88ccb2ee3537..9c04053a8f1c 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -889,16 +889,55 @@ static int vfat_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
return err;
}

+static int vfat_get_dotdot_de(struct inode *inode, struct buffer_head **bh,
+ struct msdos_dir_entry **de)
+{
+ if (S_ISDIR(inode->i_mode)) {
+ if (fat_get_dotdot_entry(inode, bh, de))
+ return -EIO;
+ }
+ return 0;
+}
+
+static int vfat_sync_ipos(struct inode *dir, struct inode *inode)
+{
+ if (IS_DIRSYNC(dir))
+ return fat_sync_inode(inode);
+ mark_inode_dirty(inode);
+ return 0;
+}
+
+static int vfat_update_dotdot_de(struct inode *dir, struct inode *inode,
+ struct buffer_head *dotdot_bh,
+ struct msdos_dir_entry *dotdot_de)
+{
+ fat_set_start(dotdot_de, MSDOS_I(dir)->i_logstart);
+ mark_buffer_dirty_inode(dotdot_bh, inode);
+ if (IS_DIRSYNC(dir))
+ return sync_dirty_buffer(dotdot_bh);
+ return 0;
+}
+
+static void vfat_update_dir_metadata(struct inode *dir, struct timespec64 *ts)
+{
+ inode_inc_iversion(dir);
+ fat_truncate_time(dir, ts, S_CTIME | S_MTIME);
+ if (IS_DIRSYNC(dir))
+ (void)fat_sync_inode(dir);
+ else
+ mark_inode_dirty(dir);
+}
+
static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry)
{
struct buffer_head *dotdot_bh;
- struct msdos_dir_entry *dotdot_de;
+ struct msdos_dir_entry *dotdot_de = NULL;
struct inode *old_inode, *new_inode;
struct fat_slot_info old_sinfo, sinfo;
struct timespec64 ts;
loff_t new_i_pos;
- int err, is_dir, update_dotdot, corrupt = 0;
+ int err, is_dir, corrupt = 0;
struct super_block *sb = old_dir->i_sb;

old_sinfo.bh = sinfo.bh = dotdot_bh = NULL;
@@ -909,15 +948,13 @@ static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,
if (err)
goto out;

- is_dir = S_ISDIR(old_inode->i_mode);
- update_dotdot = (is_dir && old_dir != new_dir);
- if (update_dotdot) {
- if (fat_get_dotdot_entry(old_inode, &dotdot_bh, &dotdot_de)) {
- err = -EIO;
+ if (old_dir != new_dir) {
+ err = vfat_get_dotdot_de(old_inode, &dotdot_bh, &dotdot_de);
+ if (err)
goto out;
- }
}

+ is_dir = S_ISDIR(old_inode->i_mode);
ts = current_time(old_dir);
if (new_inode) {
if (is_dir) {
@@ -938,21 +975,15 @@ static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,

fat_detach(old_inode);
fat_attach(old_inode, new_i_pos);
- if (IS_DIRSYNC(new_dir)) {
- err = fat_sync_inode(old_inode);
- if (err)
- goto error_inode;
- } else
- mark_inode_dirty(old_inode);
+ err = vfat_sync_ipos(new_dir, old_inode);
+ if (err)
+ goto error_inode;

- if (update_dotdot) {
- fat_set_start(dotdot_de, MSDOS_I(new_dir)->i_logstart);
- mark_buffer_dirty_inode(dotdot_bh, old_inode);
- if (IS_DIRSYNC(new_dir)) {
- err = sync_dirty_buffer(dotdot_bh);
- if (err)
- goto error_dotdot;
- }
+ if (dotdot_de) {
+ err = vfat_update_dotdot_de(new_dir, old_inode, dotdot_bh,
+ dotdot_de);
+ if (err)
+ goto error_dotdot;
drop_nlink(old_dir);
if (!new_inode)
inc_nlink(new_dir);
@@ -962,12 +993,7 @@ static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,
old_sinfo.bh = NULL;
if (err)
goto error_dotdot;
- inode_inc_iversion(old_dir);
- fat_truncate_time(old_dir, &ts, S_CTIME|S_MTIME);
- if (IS_DIRSYNC(old_dir))
- (void)fat_sync_inode(old_dir);
- else
- mark_inode_dirty(old_dir);
+ vfat_update_dir_metadata(old_dir, &ts);

if (new_inode) {
drop_nlink(new_inode);
@@ -987,10 +1013,9 @@ static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,
/* data cluster is shared, serious corruption */
corrupt = 1;

- if (update_dotdot) {
- fat_set_start(dotdot_de, MSDOS_I(old_dir)->i_logstart);
- mark_buffer_dirty_inode(dotdot_bh, old_inode);
- corrupt |= sync_dirty_buffer(dotdot_bh);
+ if (dotdot_de) {
+ corrupt |= vfat_update_dotdot_de(old_dir, old_inode, dotdot_bh,
+ dotdot_de);
}
error_inode:
fat_detach(old_inode);
--
2.36.1

2022-06-10 09:09:13

by Javier Martinez Canillas

[permalink] [raw]
Subject: [PATCH v6 1/4] fat: add a vfat_rename2() and make existing .rename callback a helper

Currently vfat only supports the RENAME_NOREPLACE flag which is handled by
the virtual file system layer but doesn't support the RENAME_EXCHANGE flag.

Add a vfat_rename2() function to be used as the .rename callback and move
the current vfat_rename() handler to a helper. This is in preparation for
implementing the RENAME_NOREPLACE flag using a different helper function.

Signed-off-by: Javier Martinez Canillas <[email protected]>
---

(no changes since v1)

fs/fat/namei_vfat.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index c573314806cf..88ccb2ee3537 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -889,9 +889,8 @@ static int vfat_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
return err;
}

-static int vfat_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
- struct dentry *old_dentry, struct inode *new_dir,
- struct dentry *new_dentry, unsigned int flags)
+static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
{
struct buffer_head *dotdot_bh;
struct msdos_dir_entry *dotdot_de;
@@ -902,9 +901,6 @@ static int vfat_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
int err, is_dir, update_dotdot, corrupt = 0;
struct super_block *sb = old_dir->i_sb;

- if (flags & ~RENAME_NOREPLACE)
- return -EINVAL;
-
old_sinfo.bh = sinfo.bh = dotdot_bh = NULL;
old_inode = d_inode(old_dentry);
new_inode = d_inode(new_dentry);
@@ -1021,13 +1017,24 @@ static int vfat_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
goto out;
}

+static int vfat_rename2(struct user_namespace *mnt_userns, struct inode *old_dir,
+ struct dentry *old_dentry, struct inode *new_dir,
+ struct dentry *new_dentry, unsigned int flags)
+{
+ if (flags & ~RENAME_NOREPLACE)
+ return -EINVAL;
+
+ /* VFS already handled RENAME_NOREPLACE, handle it as a normal rename */
+ return vfat_rename(old_dir, old_dentry, new_dir, new_dentry);
+}
+
static const struct inode_operations vfat_dir_inode_operations = {
.create = vfat_create,
.lookup = vfat_lookup,
.unlink = vfat_unlink,
.mkdir = vfat_mkdir,
.rmdir = vfat_rmdir,
- .rename = vfat_rename,
+ .rename = vfat_rename2,
.setattr = fat_setattr,
.getattr = fat_getattr,
.update_time = fat_update_time,
--
2.36.1

2022-06-12 14:15:07

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH v6 0/4] fat: add support for the renameat2 RENAME_EXCHANGE flag

Javier Martinez Canillas <[email protected]> writes:

> The series adds support for the renameat2 system call RENAME_EXCHANGE flag
> (which allows to atomically replace two paths) to the vfat filesystem code.
>
> There are many use cases for this, but we are particularly interested in
> making possible for vfat filesystems to be part of OSTree [0] deployments.
>
> Currently OSTree relies on symbolic links to make the deployment updates
> an atomic transactional operation. But RENAME_EXCHANGE could be used [1]
> to achieve a similar level of robustness when using a vfat filesystem.
>
> Patch #1 is just a preparatory patch to introduce the RENAME_EXCHANGE
> support, patch #2 moves some code blocks in vfat_rename() to a set of
> helper functions, that can be reused by tvfat_rename_exchange() that's
> added by patch #3 and finally patch #4 adds some kselftests to test it.
>
> This is a v6 that addresses issues pointed out in v5:
>
> https://lkml.org/lkml/2022/6/9/361
>
> [0]: https://github.com/ostreedev/ostree
> [1]: https://github.com/ostreedev/ostree/issues/1649

Looks good this patchset.

Acked-by: OGAWA Hirofumi <[email protected]>

Thanks.
--
OGAWA Hirofumi <[email protected]>