2024-02-21 15:30:15

by JunChao Sun

[permalink] [raw]
Subject: A problem about BLK_OPEN_RESTRICT_WRITES

Hi

I saw that ext4 has supported BLK_OPEN_RESTRICT_WRITES in commits
aca740cecbe("fs: open block device after superblock creation") and
afde134b5bd0("ext4: Block writes to journal device"). I'm not certain
whether these commits caused the following issue.

Environment:
6.8.0-rc3-00279-g4a7bbe7519b6-dirty(commit 4a7bbe7519b6a5).
sjc@sjc-laptop:~/linux$ mkfs.ext4 -V
mke2fs 1.47.0 (5-Feb-2023)
Using EXT2FS Library version 1.47.0
sjc@sjc-laptop:~/linux$ mount -V
mount from util-linux 2.39.1 (libmount 2.39.1: selinux, smack, btrfs,
verity, namespaces, idmapping, assert, debug)

Problem:
When I mounted the ext4 file system in the qemu system, I encountered
the following error:
root@q:~/linux# mount -t ext4 ext4.img /mnt/ext4/
[ 848.897532] loop1: detected capacity change from 0 to 2097152
[ 848.905535] /dev/loop1: Can't open blockdev
mount: /mnt/ext4: /dev/loop1 already mounted or mount point busy.
dmesg(1) may have more information after failed mount system call.

I reviewed the relevant code and found that the mount program first
calls the openat system call to open the /dev/loop1 file, followed by
the mount system call (with /dev/loop1 as the first parameter).

As for the former openat system call, it eventually reaches the chain
of (vfs_open->do_dentry_open->blkdev_open->bdev_open_by_dev->bdev_claim_write_access).
In bdev_claim_write_access, the following logic applies:
/* Claim exclusive or shared write access. */
if (mode & BLK_OPEN_RESTRICT_WRITES)
bdev_block_writes(bdev);
else if (mode & BLK_OPEN_WRITE)
bdev->bd_writers++;
The argument mode here doesn't set BLK_OPEN_RESTRICT_WRITES flag, so
goes bdev->bd_writers++.

And in the latter mount system call, the following logic is followed:
(vfs_get_tree->get_tree_bdev->setup_bdev_super->bdev_open_by_dev->bdev_may_open).
In bdev_may_open, the following logic applies:
if (mode & BLK_OPEN_RESTRICT_WRITES && bdev->bd_writers > 0)
return false;

Due to the fact that the argument mode has already been set with the
BLK_OPEN_RESTRICT_WRITES flag in the setup_bdev_super function, and
since bdev->bd_writers is already 1 at this point, the function
returns false. This ultimately leads to the mount system call
returning the EBUSY error.

Is this indeed a problem, or is there a misunderstanding in my
comprehension? If it is indeed a problem, can we resolve it by
removing the BLK_OPEN_RESTRICT_WRITES from the sb_open_mode macro
definition?

At last, here is a partial output of the trace captured using the
strace command:

openat(AT_FDCWD, "/home/sjc/linux/ext4.img", O_RDWR|O_CLOEXEC) = 3
openat(AT_FDCWD, "/dev/loop1", O_RDWR|O_CLOEXEC) = 4
ioctl(4, LOOP_CONFIGURE, {fd=3, block_size=0, info={lo_offset=0, lo_[
891.723213] /dev/loop1: Can't open blockdev
number=0, lo_flags=LO_FLAGS_AUTOCLEAR,
lo_file_name="/home/sjc/linux/ext4.img", ...}}) = 0
close(3) = 0
newfstatat(AT_FDCWD, "/dev/loop1", {st_mode=S_IFBLK|0600,
st_rdev=makedev(0x7, 0x1), ...}, 0) = 0
openat(AT_FDCWD, "/sys/dev/block/7:1", O_RDONLY|O_CLOEXEC) = 3
openat(3, "loop/autoclear", O_RDONLY|O_CLOEXEC) = 5
fcntl(5, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
newfstatat(5, "", {st_mode=S_IFREG|0444, st_size=4096, ...}, AT_EMPTY_PATH) = 0
read(5, "1\n", 4096) = 2
close(5) = 0
openat(3, "ro", O_RDONLY|O_CLOEXEC) = 5
fcntl(5, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
newfstatat(5, "", {st_mode=S_IFREG|0444, st_size=4096, ...}, AT_EMPTY_PATH) = 0
read(5, "0\n", 4096) = 2
close(5) = 0
close(3) = 0
statx(AT_FDCWD, "/sbin/mount.ext4",
AT_STATX_DONT_SYNC|AT_NO_AUTOMOUNT, STATX_TYPE|STATX_MODE|STATX_INO,
0x7ffd6c0b8ce0) = -1 ENOENT (No such file or directory)
statx(AT_FDCWD, "/sbin/fs.d/mount.ext4",
AT_STATX_DONT_SYNC|AT_NO_AUTOMOUNT, STATX_TYPE|STATX_MODE|STATX_INO,
0x7ffd6c0b8ce0) = -1 ENOENT (No such file or directory)
statx(AT_FDCWD, "/sbin/fs/mount.ext4",
AT_STATX_DONT_SYNC|AT_NO_AUTOMOUNT, STATX_TYPE|STATX_MODE|STATX_INO,
0x7ffd6c0b8ce0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/run/mount/utab", {st_mode=S_IFREG|0644,
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/run/mount/utab", {st_mode=S_IFREG|0644,
st_size=0, ...}, 0) = 0
geteuid() = 0
getegid() = 0
getuid() = 0
getgid() = 0
access("/run/mount/utab", R_OK|W_OK) = 0
mount("/dev/loop1", "/mnt/ext4", "ext4", 0, NULL) = -1 EBUSY (Device
or resource busy)


Best regards.


2024-02-21 16:20:04

by Jan Kara

[permalink] [raw]
Subject: Re: A problem about BLK_OPEN_RESTRICT_WRITES

Hello,

On Wed 21-02-24 23:29:39, JunChao Sun wrote:
> I saw that ext4 has supported BLK_OPEN_RESTRICT_WRITES in commits
> aca740cecbe("fs: open block device after superblock creation") and
> afde134b5bd0("ext4: Block writes to journal device"). I'm not certain
> whether these commits caused the following issue.
>
> Environment:
> 6.8.0-rc3-00279-g4a7bbe7519b6-dirty(commit 4a7bbe7519b6a5).
> sjc@sjc-laptop:~/linux$ mkfs.ext4 -V
> mke2fs 1.47.0 (5-Feb-2023)
> Using EXT2FS Library version 1.47.0
> sjc@sjc-laptop:~/linux$ mount -V
> mount from util-linux 2.39.1 (libmount 2.39.1: selinux, smack, btrfs,
> verity, namespaces, idmapping, assert, debug)
>
> Problem:
> When I mounted the ext4 file system in the qemu system, I encountered
> the following error:
> root@q:~/linux# mount -t ext4 ext4.img /mnt/ext4/
> [ 848.897532] loop1: detected capacity change from 0 to 2097152
> [ 848.905535] /dev/loop1: Can't open blockdev
> mount: /mnt/ext4: /dev/loop1 already mounted or mount point busy.
> dmesg(1) may have more information after failed mount system call.
>
> I reviewed the relevant code and found that the mount program first
> calls the openat system call to open the /dev/loop1 file, followed by
> the mount system call (with /dev/loop1 as the first parameter).
>
> As for the former openat system call, it eventually reaches the chain
> of (vfs_open->do_dentry_open->blkdev_open->bdev_open_by_dev->bdev_claim_write_access).
> In bdev_claim_write_access, the following logic applies:
> /* Claim exclusive or shared write access. */
> if (mode & BLK_OPEN_RESTRICT_WRITES)
> bdev_block_writes(bdev);
> else if (mode & BLK_OPEN_WRITE)
> bdev->bd_writers++;
> The argument mode here doesn't set BLK_OPEN_RESTRICT_WRITES flag, so
> goes bdev->bd_writers++.
>
> And in the latter mount system call, the following logic is followed:
> (vfs_get_tree->get_tree_bdev->setup_bdev_super->bdev_open_by_dev->bdev_may_open).
> In bdev_may_open, the following logic applies:
> if (mode & BLK_OPEN_RESTRICT_WRITES && bdev->bd_writers > 0)
> return false;
>
> Due to the fact that the argument mode has already been set with the
> BLK_OPEN_RESTRICT_WRITES flag in the setup_bdev_super function, and
> since bdev->bd_writers is already 1 at this point, the function
> returns false. This ultimately leads to the mount system call
> returning the EBUSY error.

Yes, this is correct. How mount(8) sets up loop devices gets broken by when
BLK_OPEN_RESTRICT_WRITES is enabled.

> Is this indeed a problem, or is there a misunderstanding in my
> comprehension? If it is indeed a problem, can we resolve it by
> removing the BLK_OPEN_RESTRICT_WRITES from the sb_open_mode macro
> definition?

No. Cases like above are the reason why there's still a config option
CONFIG_BLK_DEV_WRITE_MOUNTED and it defaults to 'y'. We need to be fixing
userspace - util-linux in this case - to avoid having writeable file handle
open to block devices that are being mounted.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-02-21 21:48:04

by Theodore Ts'o

[permalink] [raw]
Subject: Re: A problem about BLK_OPEN_RESTRICT_WRITES

On Wed, Feb 21, 2024 at 05:19:54PM +0100, Jan Kara wrote:
>
> No. Cases like above are the reason why there's still a config option
> CONFIG_BLK_DEV_WRITE_MOUNTED and it defaults to 'y'. We need to be fixing
> userspace - util-linux in this case - to avoid having writeable file handle
> open to block devices that are being mounted.

Note also that at least as far as ext4 is concerned, I don't recommend
that people use CONFIG_BLK_DEV_WRITE_MOUNTED on production systems.
This will break programs like tune2fs operating on mounted file
systems. There is a plan to add super to allow various superblock
tuning operations to bet set via ioctls, much like the new ioctl's
which allow the label and uuid to be set via an ioctl. This will
require users upgrade to newer kerrnels and newer versions of
e2fsprogs, so it will a while before we're at that point.

For now, the main use of CONFIG_BLK_DEV_WRITE_MOUNTED is to prevent
tools like syzbot from issuing false positives; I don't recommend that
it be used in other situations.

Cheers,

- Ted