Colin Ian King reported the following
1. create a minix file system and mount it
2. open a file on the file system with O_RDWR | O_CREAT | O_TRUNC | O_DIRECT
3. open fails with -EINVAL but leaves an empty file behind. All other open() failures don't leave the
failed open files behind.
The reason is because when checking the O_DIRECT in do_dentry_open, the inode has created, and later err
processing can't remove the inode.
The patch will remove the file in last step of open in do_open().
Signed-off-by: Qinghua Jin <[email protected]>
Reported-by: Colin Ian King <[email protected]>
---
fs/namei.c | 6 ++++++
fs/open.c | 6 ------
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 1f9d2187c765..081feb804154 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3426,6 +3426,12 @@ static int do_open(struct nameidata *nd,
error = vfs_open(&nd->path, file);
if (!error)
error = ima_file_check(file, op->acc_mode);
+ if (!error && (file->f_flags & O_DIRECT)) {
+ if (!file->f_mapping->a_ops || !file->f_mapping->a_ops->direct_IO) {
+ do_unlinkat(AT_FDCWD, getname_kernel(nd->name->name));
+ return -EINVAL;
+ }
+ }
if (!error && do_truncate)
error = handle_truncate(mnt_userns, file);
if (unlikely(error > 0)) {
diff --git a/fs/open.c b/fs/open.c
index f732fb94600c..2829c3613c0f 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -838,12 +838,6 @@ static int do_dentry_open(struct file *f,
file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
- /* NB: we're sure to have correct a_ops only after f_op->open */
- if (f->f_flags & O_DIRECT) {
- if (!f->f_mapping->a_ops || !f->f_mapping->a_ops->direct_IO)
- return -EINVAL;
- }
-
/*
* XXX: Huge page cache doesn't support writing yet. Drop all page
* cache for this file before processing writes.
--
2.30.2
On Wed, Jan 05, 2022 at 05:23:30AM +0800, Qinghua Jin wrote:
> OK, thanks.
> So, how to resolve this bug? I think we should check the param before
> creating the inode and dentry. Maybe the best place is in the lookup_open
> where the last component get created, and the parent inode and child inode
> is in the same filesystem, so I think we can use
> dir_inode->i_mapping->a_ops->direct_IO to check the O_DIRECT param. The
> code looks like this:
> if((open_flag & O_CREAT) && (open_flag & O_DIRECT)) {
> if (!dir_inode->i_mapping || !dir_inode->i_mapping->a_ops ||
> !dir_inode->i_mapping->a_ops->direct_IO)
> return ERR_PTR(-EINVAL);
> }
> should do the work.
Why would it? Seriously, why would directories' use of page cache resemble that
by the regular files on the same fs? Sure, for minixfs we have that - directory
contents there happens to be stored in exact same way as for regular files.
For many other filesystem types it's not true.
Incidentally, how would an inode possibly get NULL ->i_mapping? Or NULL
->a_ops, for that matter (IOW, the existing check is also partially
cargo-culted)...
Greeting,
FYI, we noticed the following commit (built with gcc-9):
commit: bf75e8b6842a7dba690263318503c228c1fb93b2 ("[PATCH v2] vfs: fix bug when opening a file with O_DIRECT on a file system that does not support it will leave an empty file")
url: https://github.com/0day-ci/linux/commits/Qinghua-Jin/vfs-fix-bug-when-opening-a-file-with-O_DIRECT-on-a-file-system-that-does-not-support-it-will-leave-an-empty-file/20220104-174321
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git c9e6606c7fe92b50a02ce51dda82586ebdf99b48
patch link: https://lore.kernel.org/linux-fsdevel/[email protected]
in testcase: trinity
version: trinity-static-i386-x86_64-1c734c75-1_2020-01-06
with following parameters:
runtime: 300s
group: group-02
test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>
[ 305.785680][ T3724] WARNING: possible recursive locking detected
[ 305.792683][ T3724] 5.16.0-rc8-00001-gbf75e8b6842a #1 Not tainted
[ 305.799629][ T3724] --------------------------------------------
[ 305.805972][ T3724] trinity-main/3724 is trying to acquire lock:
[ 305.829352][ T3724] ffff8881bc402448 (sb_writers#5){.+.+}-{0:0}, at: do_unlinkat (fs/namei.c:4152)
[ 305.836035][ T3724]
[ 305.836035][ T3724] but task is already holding lock:
[ 305.847224][ T3724] ffff8881bc402448 (sb_writers#5){.+.+}-{0:0}, at: do_open (fs/nfs/dir.c:998)
[ 305.853885][ T3724]
[ 305.853885][ T3724] other info that might help us debug this:
[ 305.865444][ T3724] Possible unsafe locking scenario:
[ 305.865444][ T3724]
[ 305.876999][ T3724] CPU0
[ 305.882412][ T3724] ----
[ 305.888119][ T3724] lock(sb_writers#5);
[ 305.893721][ T3724] lock(sb_writers#5);
[ 305.899301][ T3724]
[ 305.899301][ T3724] *** DEADLOCK ***
[ 305.899301][ T3724]
[ 305.914974][ T3724] May be due to missing lock nesting notation
[ 305.914974][ T3724]
[ 305.926124][ T3724] 1 lock held by trinity-main/3724:
[ 305.931835][ T3724] #0: ffff8881bc402448 (sb_writers#5){.+.+}-{0:0}, at: do_open (fs/nfs/dir.c:998)
[ 305.938387][ T3724]
[ 305.938387][ T3724] stack backtrace:
[ 305.948298][ T3724] CPU: 0 PID: 3724 Comm: trinity-main Not tainted 5.16.0-rc8-00001-gbf75e8b6842a #1
[ 305.954179][ T3724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 305.960312][ T3724] Call Trace:
[ 305.965171][ T3724] <TASK>
[ 305.979438][ T3724] dump_stack_lvl (lib/dump_stack.c:107)
[ 305.984394][ T3724] __lock_acquire.cold (kernel/locking/lockdep.c:2956 kernel/locking/lockdep.c:2999 kernel/locking/lockdep.c:3788 kernel/locking/lockdep.c:5027)
[ 305.989632][ T3724] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4885)
[ 305.994509][ T3724] ? lock_is_held_type (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5681)
[ 305.999423][ T3724] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283 kernel/rcu/update.c:125)
[ 306.004326][ T3724] lock_acquire (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5639 kernel/locking/lockdep.c:5602)
[ 306.008945][ T3724] ? do_unlinkat (fs/namei.c:4152)
[ 306.034198][ T3724] ? rcu_read_unlock (include/linux/rcupdate.h:717 (discriminator 5))
[ 306.040394][ T3724] ? lock_is_held_type (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5681)
[ 306.045137][ T3724] ? lock_is_held_type (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5681)
[ 306.049566][ T3724] mnt_want_write (include/linux/percpu-rwsem.h:53 include/linux/fs.h:1810 include/linux/fs.h:1880 fs/namespace.c:376)
[ 306.053945][ T3724] ? do_unlinkat (fs/namei.c:4152)
[ 306.058228][ T3724] do_unlinkat (fs/namei.c:4152)
[ 306.061922][ T3724] ? __x64_sys_rmdir (fs/namei.c:4134)
[ 306.065653][ T3724] ? rcu_read_lock_bh_held (kernel/rcu/update.c:120)
[ 306.069157][ T3724] ? __kasan_slab_alloc (mm/kasan/common.c:46 mm/kasan/common.c:434 mm/kasan/common.c:467)
[ 306.072498][ T3724] ? kmem_cache_alloc (include/trace/events/kmem.h:54 mm/slub.c:3249)
[ 306.075899][ T3724] ? process_measurement (security/integrity/ima/ima_main.c:512)
[ 306.079195][ T3724] ? memcpy (mm/kasan/shadow.c:65 (discriminator 1))
[ 306.082202][ T3724] do_open (fs/nfs/dir.c:1006)
[ 306.085397][ T3724] path_openat (fs/namei.c:3566)
[ 306.088636][ T3724] ? do_open (fs/nfs/dir.c:2048)
[ 306.091782][ T3724] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4885)
[ 306.094996][ T3724] ? lock_is_held_type (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5681)
[ 306.098025][ T3724] ? lock_is_held_type (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5681)
[ 306.101057][ T3724] do_filp_open (fs/namei.c:3592)
[ 306.104074][ T3724] ? expand_files (fs/file.c:206 (discriminator 2))
[ 306.106969][ T3724] ? path_openat (fs/namei.c:3586)
[ 306.109934][ T3724] ? do_raw_spin_lock (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:513 include/asm-generic/qspinlock.h:82 kernel/locking/spinlock_debug.c:115)
[ 306.112940][ T3724] ? rwlock_bug+0xc0/0xc0
[ 306.115909][ T3724] ? _raw_spin_unlock (arch/x86/include/asm/preempt.h:103 include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:186)
[ 306.118948][ T3724] ? alloc_fd (fs/file.c:526 (discriminator 10))
[ 306.122085][ T3724] do_sys_openat2 (fs/open.c:1206)
[ 306.125414][ T3724] ? file_open_root (fs/open.c:1192)
[ 306.128632][ T3724] ? lock_is_held_type (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5681)
[ 306.131859][ T3724] do_sys_open (fs/open.c:1220)
[ 306.134832][ T3724] ? filp_open (fs/open.c:1220)
[ 306.137880][ T3724] ? __ia32_compat_sys_ia32_stat64 (arch/x86/kernel/sys_ia32.c:177)
[ 306.141081][ T3724] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:4293 kernel/locking/lockdep.c:4244)
[ 306.144229][ T3724] ? syscall_enter_from_user_mode_prepare (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 kernel/entry/common.c:118)
[ 306.147464][ T3724] ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4356)
[ 306.150356][ T3724] __do_fast_syscall_32 (arch/x86/entry/common.c:112 arch/x86/entry/common.c:178)
[ 306.153198][ T3724] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283 kernel/rcu/update.c:125)
[ 306.156313][ T3724] ? rcu_read_lock_bh_held (kernel/rcu/update.c:120)
[ 306.159209][ T3724] ? __do_fast_syscall_32 (arch/x86/entry/common.c:183)
[ 306.162071][ T3724] ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4356)
[ 306.164820][ T3724] ? __do_fast_syscall_32 (arch/x86/entry/common.c:183)
[ 306.167728][ T3724] ? __do_fast_syscall_32 (arch/x86/entry/common.c:183)
[ 306.170564][ T3724] ? __do_fast_syscall_32 (arch/x86/entry/common.c:183)
[ 306.173390][ T3724] ? __do_fast_syscall_32 (arch/x86/entry/common.c:183)
[ 306.176149][ T3724] do_fast_syscall_32 (arch/x86/entry/common.c:203)
[ 306.178775][ T3724] entry_SYSENTER_compat_after_hwframe (arch/x86/entry/entry_64_compat.S:141)
[ 306.181608][ T3724] RIP: 0023:0xf7f9b549
[ 306.184195][ T3724] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
All code
========
0: 03 74 c0 01 add 0x1(%rax,%rax,8),%esi
4: 10 05 03 74 b8 01 adc %al,0x1b87403(%rip) # 0x1b8740d
a: 10 06 adc %al,(%rsi)
c: 03 74 b4 01 add 0x1(%rsp,%rsi,4),%esi
10: 10 07 adc %al,(%rdi)
12: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi
16: 10 08 adc %cl,(%rax)
18: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi
1c: 00 00 add %al,(%rax)
1e: 00 00 add %al,(%rax)
20: 00 51 52 add %dl,0x52(%rcx)
23: 55 push %rbp
24: 89 e5 mov %esp,%ebp
26: 0f 34 sysenter
28: cd 80 int $0x80
2a:* 5d pop %rbp <-- trapping instruction
2b: 5a pop %rdx
2c: 59 pop %rcx
2d: c3 retq
2e: 90 nop
2f: 90 nop
30: 90 nop
31: 90 nop
32: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
39: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
Code starting with the faulting instruction
===========================================
0: 5d pop %rbp
1: 5a pop %rdx
2: 59 pop %rcx
3: c3 retq
4: 90 nop
5: 90 nop
6: 90 nop
7: 90 nop
8: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
f: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
To reproduce:
# build kernel
cd linux
cp config-5.16.0-rc8-00001-gbf75e8b6842a .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang