2024-03-12 07:10:55

by Weiß, Simone

[permalink] [raw]
Subject: possible deadlock in ovl_llseek 27c1936af506

Dear Miklos and Amir,

For some experimentation, I have been running fuzzing campaigns and I
noticed a possible deadlock in ovl_llseek .

As there is a C reproducer, it could be bisected being introduced with:

commit 27c1936af5068b5367078a65df6a3d4de3e94e9a
Author: Miklos Szeredi <[email protected]>
Date: Mon Apr 12 12:00:37 2021 +0200

ovl: allow upperdir inside lowerdir

commit 708fa01597fa002599756bf56a96d0de1677375c upstream.

Commit 146d62e5a586 ("ovl: detect overlapping layers") made sure we don't
have overlapping layers, but it also broke the arguably valid use case of

mount -olowerdir=/,upperdir=/subdir,..

where upperdir overlaps lowerdir on the same filesystem. This has been
causing regressions.

Revert the check, but only for the specific case where upperdir and/or
workdir are subdirectories of lowerdir. Any other overlap (e.g. lowerdir
is subdirectory of upperdir, etc) case is crazy, so leave the check in
place for those.

Overlaps are detected at lookup time too, so reverting the mount time check
should be safe.

It was reproducible on v5.10.212 and a syz-crush check also found crashes on
v6.8-rc1.


The C reproducer is automatically generated by syzkaller and included below.

If you need any further information, just let me know.

Regards,
Simone

Log:
======================================================
WARNING: possible circular locking dependency detected
5.10.34-eb-corbos-standard-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor175/7735 is trying to acquire lock:
ffff00000c54a0a0
(&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_inode_lock
fs/overlayfs/overlayfs.h:362 [inline]
(&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_llseek+0xec/0x194
fs/overlayfs/file.c:207

but task is already holding lock:
ffff00000c60eca0 (&sb->s_type->i_mutex_key#15/5){+.+.}-{3:3}, at:
inode_lock_nested include/linux/fs.h:809 [inline]
ffff00000c60eca0 (&sb->s_type->i_mutex_key#15/5){+.+.}-{3:3}, at:
lock_rename+0x10c/0x144 fs/namei.c:2772

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&sb->s_type->i_mutex_key#15/5
){+.+.}-{3:3}:
__lock_release kernel/locking/lockdep.c:5160 [inline]
lock_release+0x244/0x390 kernel/locking/lockdep.c:5464
up_write+0x4c/0x154 kernel/locking/rwsem.c:1609
inode_unlock include/linux/fs.h:779 [inline]
unlock_rename+0x28/0x60 fs/namei.c:2779
ovl_workdir_ok fs/overlayfs/super.c:915 [inline]
ovl_get_workdir fs/overlayfs/super.c:1405 [inline]
ovl_fill_super+0x62c/0x28d0 fs/overlayfs/super.c:1965
mount_nodev+0x70/0xf0 fs/super.c:1465
ovl_mount+0x3c/0x50 fs/overlayfs/super.c:2050
legacy_get_tree+0x34/0xb0 fs/fs_context.c:592
vfs_get_tree+0x34/0xe0 fs/super.c:1549
do_new_mount fs/namespace.c:2881 [inline]
path_mount+0xd50/0x1600 fs/namespace.c:3211
do_mount fs/namespace.c:3224 [inline]
__do_sys_mount fs/namespace.c:3432 [inline]
__se_sys_mount fs/namespace.c:3409 [inline]
__arm64_sys_mount+0x680/0x7d0 fs/namespace.c:3409
__invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
do_el0_svc+0xe0/0x340 arch/arm64/kernel/syscall.c:197
el0_svc+0x24/0x34 arch/arm64/kernel/entry-common.c:367
el0_sync_handler+0xec/0x210 arch/arm64/kernel/entry-common.c:383
el0_sync+0x17c/0x180 arch/arm64/kernel/entry.S:672

-> #1 (&type->s_vfs_rename_key){+.+.}-{3:3}:
lock_acquire+0x68/0x84 kernel/locking/lockdep.c:5417
__mutex_lock_common kernel/locking/mutex.c:959 [inline]
__mutex_lock+0x84/0x730 kernel/locking/mutex.c:1106
mutex_lock_nested+0x40/0x50 kernel/locking/mutex.c:1121
lock_rename+0x3c/0x144 fs/namei.c:2755
ovl_copy_up_workdir fs/overlayfs/copy_up.c:595 [inline]
ovl_do_copy_up fs/overlayfs/copy_up.c:746 [inline]
ovl_copy_up_one+0x434/0x12ec fs/overlayfs/copy_up.c:916
ovl_copy_up_flags+0x100/0x164 fs/overlayfs/copy_up.c:961
ovl_maybe_copy_up+0x104/0x14c fs/overlayfs/copy_up.c:993
ovl_open+0x4c/0x110 fs/overlayfs/file.c:154
do_dentry_open+0x2a0/0x5c0 fs/open.c:817
vfs_open+0x38/0x50 fs/open.c:931
do_open fs/namei.c:3243 [inline]
path_openat+0xc88/0x1050 fs/namei.c:3360
do_filp_open+0x8c/0x170 fs/namei.c:3387
do_sys_openat2+0xf4/0x240 fs/open.c:1172
__do_sys_openat2 fs/open.c:1227 [inline]
__se_sys_openat2 fs/open.c:1207 [inline]
__arm64_sys_openat2+0x304/0x410 fs/open.c:1207
__invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
do_el0_svc+0xe0/0x340 arch/arm64/kernel/syscall.c:197
el0_svc+0x24/0x34 arch/arm64/kernel/entry-common.c:367
el0_sync_handler+0xec/0x210 arch/arm64/kernel/entry-common.c:383
el0_sync+0x17c/0x180 arch/arm64/kernel/entry.S:672

-> #0 (&ovl_i_lock_key[depth]){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:2869 [inline]
check_prevs_add kernel/locking/lockdep.c:2994 [inline]
validate_chain kernel/locking/lockdep.c:3609 [inline]
__lock_acquire+0x10ec/0x18a4 kernel/locking/lockdep.c:4834
lock_acquire.part.0+0xec/0x2e0 kernel/locking/lockdep.c:5444
lock_acquire+0x68/0x84 kernel/locking/lockdep.c:5417
__mutex_lock_common kernel/locking/mutex.c:959 [inline]
__mutex_lock+0x84/0x730 kernel/locking/mutex.c:1106
mutex_lock_nested+0x40/0x50 kernel/locking/mutex.c:1121
ovl_inode_lock fs/overlayfs/overlayfs.h:362 [inline]
ovl_llseek+0xec/0x194 fs/overlayfs/file.c:207
vfs_llseek+0x60/0x80 fs/read_write.c:300
ovl_copy_up_data+0x21c/0x390 fs/overlayfs/copy_up.c:199
ovl_copy_up_inode+0x258/0x2b4 fs/overlayfs/copy_up.c:507
ovl_copy_up_workdir fs/overlayfs/copy_up.c:609 [inline]
ovl_do_copy_up fs/overlayfs/copy_up.c:746 [inline]
ovl_copy_up_one+0x4cc/0x12ec fs/overlayfs/copy_up.c:916
ovl_copy_up_flags+0x100/0x164 fs/overlayfs/copy_up.c:961
ovl_maybe_copy_up+0x104/0x14c fs/overlayfs/copy_up.c:993
ovl_open+0x4c/0x110 fs/overlayfs/file.c:154
do_dentry_open+0x2a0/0x5c0 fs/open.c:817
vfs_open+0x38/0x50 fs/open.c:931
do_open fs/namei.c:3243 [inline]
path_openat+0xc88/0x1050 fs/namei.c:3360
do_filp_open+0x8c/0x170 fs/namei.c:3387
do_sys_openat2+0xf4/0x240 fs/open.c:1172
__do_sys_openat2 fs/open.c:1227 [inline]
__se_sys_openat2 fs/open.c:1207 [inline]
__arm64_sys_openat2+0x304/0x410 fs/open.c:1207
__invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
do_el0_svc+0xe0/0x340 arch/arm64/kernel/syscall.c:197
el0_svc+0x24/0x34 arch/arm64/kernel/entry-common.c:367
el0_sync_handler+0xec/0x210 arch/arm64/kernel/entry-common.c:383
el0_sync+0x17c/0x180 arch/arm64/kernel/entry.S:672

other info that might help us debug this:

Chain exists of:
&ovl_i_lock_key[depth] --> &type->s_vfs_rename_key --> &sb->s_type-
>i_mutex_key#15/5

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&sb->s_type->i_mutex_key#15/5);
lock(&type->s_vfs_rename_key);
lock(&sb->s_type->i_mutex_key#15/5);
lock(&ovl_i_lock_key[depth]);

*** DEADLOCK ***

5 locks held by syz-executor175/7735:
#0: ffff000005302438 (sb_writers#9){.+.+}-{0:0}, at: __sb_start_write
include/linux/fs.h:1594 [inline]
#0: ffff000005302438 (sb_writers#9){.+.+}-{0:0}, at: sb_start_write
include/linux/fs.h:1664 [inline]
#0: ffff000005302438 (sb_writers#9){.+.+}-{0:0}, at: mnt_want_write+0x24/0x80
fs/namespace.c:354
#1: ffff00000c54bcc0 (&ovl_i_lock_key[depth]#2){+.+.}-{3:3}, at:
ovl_inode_lock_interruptible fs/overlayfs/overlayfs.h:367 [inline]
#1: ffff00000c54bcc0 (&ovl_i_lock_key[depth]#2){+.+.}-{3:3}, at:
ovl_copy_up_start+0x34/0x160 fs/overlayfs/util.c:533
#2: ffff000005302720 (&type->s_vfs_rename_key){+.+.}-{3:3}, at:
lock_rename+0x3c/0x144 fs/namei.c:2755
#3: ffff000006711110 (&sb->s_type->i_mutex_key#15/1){+.+.}-{3:3}, at:
inode_lock_nested include/linux/fs.h:809 [inline]
#3: ffff000006711110 (&sb->s_type->i_mutex_key#15/1){+.+.}-{3:3}, at:
lock_rename+0xfc/0x144 fs/namei.c:2771
#4: ffff00000c60eca0 (&sb->s_type->i_mutex_key#15/5){+.+.}-{3:3}, at:
inode_lock_nested include/linux/fs.h:809 [inline]
#4: ffff00000c60eca0 (&sb->s_type->i_mutex_key#15/5){+.+.}-{3:3}, at:
lock_rename+0x10c/0x144 fs/namei.c:2772

stack backtrace:
CPU: 0 PID: 7735 Comm: syz-executor175 Not tainted 5.10.34-eb-corbos-standard-
syzkaller #0
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2e0 arch/arm64/include/asm/atomic_ll_sc.h:222
show_stack+0x2c/0x40 arch/arm64/kernel/stacktrace.c:196
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d4/0x26c lib/dump_stack.c:118
print_circular_bug+0x1f8/0x200 kernel/locking/lockdep.c:1997
check_noncircular+0x100/0x114 kernel/locking/lockdep.c:2118
check_prev_add kernel/locking/lockdep.c:2869 [inline]
check_prevs_add kernel/locking/lockdep.c:2994 [inline]
validate_chain kernel/locking/lockdep.c:3609 [inline]
__lock_acquire+0x10ec/0x18a4 kernel/locking/lockdep.c:4834
lock_acquire.part.0+0xec/0x2e0 kernel/locking/lockdep.c:5444
lock_acquire+0x68/0x84 kernel/locking/lockdep.c:5417
__mutex_lock_common kernel/locking/mutex.c:959 [inline]
__mutex_lock+0x84/0x730 kernel/locking/mutex.c:1106
mutex_lock_nested+0x40/0x50 kernel/locking/mutex.c:1121
ovl_inode_lock fs/overlayfs/overlayfs.h:362 [inline]
ovl_llseek+0xec/0x194 fs/overlayfs/file.c:207
vfs_llseek+0x60/0x80 fs/read_write.c:300
ovl_copy_up_data+0x21c/0x390 fs/overlayfs/copy_up.c:199
ovl_copy_up_inode+0x258/0x2b4 fs/overlayfs/copy_up.c:507
ovl_copy_up_workdir fs/overlayfs/copy_up.c:609 [inline]
ovl_do_copy_up fs/overlayfs/copy_up.c:746 [inline]
ovl_copy_up_one+0x4cc/0x12ec fs/overlayfs/copy_up.c:916
ovl_copy_up_flags+0x100/0x164 fs/overlayfs/copy_up.c:961
ovl_maybe_copy_up+0x104/0x14c fs/overlayfs/copy_up.c:993
ovl_open+0x4c/0x110 fs/overlayfs/file.c:154
do_dentry_open+0x2a0/0x5c0 fs/open.c:817
vfs_open+0x38/0x50 fs/open.c:931
do_open fs/namei.c:3243 [inline]
path_openat+0xc88/0x1050 fs/namei.c:3360
do_filp_open+0x8c/0x170 fs/namei.c:3387
do_sys_openat2+0xf4/0x240 fs/open.c:1172
__do_sys_openat2 fs/open.c:1227 [inline]
__se_sys_openat2 fs/open.c:1207 [inline]
__arm64_sys_openat2+0x304/0x410 fs/open.c:1207
__invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
do_el0_svc+0xe0/0x340 arch/arm64/kernel/syscall.c:197
el0_svc+0x24/0x34 arch/arm64/kernel/entry-common.c:367
el0_sync_handler+0xec/0x210 arch/arm64/kernel/entry-common.c:383
el0_sync+0x17c/0x180 arch/arm64/kernel/entry.S:672


C Reproducer:

// https://None.appspot.com/bug?id=f10e9988ed129179c80858a403259185ef332f5d
// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#include <linux/futex.h>

#ifndef __NR_chdir
#define __NR_chdir 49
#endif
#ifndef __NR_mkdirat
#define __NR_mkdirat 34
#endif
#ifndef __NR_mmap
#define __NR_mmap 222
#endif
#ifndef __NR_mount
#define __NR_mount 40
#endif
#ifndef __NR_openat
#define __NR_openat 56
#endif
#ifndef __NR_openat2
#define __NR_openat2 437
#endif
#ifndef __NR_write
#define __NR_write 64
#endif

static unsigned long long procid;

static void sleep_ms(uint64_t ms)
{
usleep(ms * 1000);
}

static uint64_t current_time_ms(void)
{
struct timespec ts;
if (clock_gettime(CLOCK_MONOTONIC, &ts))
exit(1);
return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static void use_temporary_dir(void)
{
char tmpdir_template[] = "./syzkaller.XXXXXX";
char* tmpdir = mkdtemp(tmpdir_template);
if (!tmpdir)
exit(1);
if (chmod(tmpdir, 0777))
exit(1);
if (chdir(tmpdir))
exit(1);
}

static void thread_start(void* (*fn)(void*), void* arg)
{
pthread_t th;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, 128 << 10);
int i = 0;
for (; i < 100; i++) {
if (pthread_create(&th, &attr, fn, arg) == 0) {
pthread_attr_destroy(&attr);
return;
}
if (errno == EAGAIN) {
usleep(50);
continue;
}
break;
}
exit(1);
}

typedef struct {
int state;
} event_t;

static void event_init(event_t* ev)
{
ev->state = 0;
}

static void event_reset(event_t* ev)
{
ev->state = 0;
}

static void event_set(event_t* ev)
{
if (ev->state)
exit(1);
__atomic_store_n(&ev->state, 1, __ATOMIC_RELEASE);
syscall(SYS_futex, &ev->state, FUTEX_WAKE | FUTEX_PRIVATE_FLAG, 1000000);
}

static void event_wait(event_t* ev)
{
while (!__atomic_load_n(&ev->state, __ATOMIC_ACQUIRE))
syscall(SYS_futex, &ev->state, FUTEX_WAIT | FUTEX_PRIVATE_FLAG, 0, 0);
}

static int event_isset(event_t* ev)
{
return __atomic_load_n(&ev->state, __ATOMIC_ACQUIRE);
}

static int event_timedwait(event_t* ev, uint64_t timeout)
{
uint64_t start = current_time_ms();
uint64_t now = start;
for (;;) {
uint64_t remain = timeout - (now - start);
struct timespec ts;
ts.tv_sec = remain / 1000;
ts.tv_nsec = (remain % 1000) * 1000 * 1000;
syscall(SYS_futex, &ev->state, FUTEX_WAIT | FUTEX_PRIVATE_FLAG, 0, &ts);
if (__atomic_load_n(&ev->state, __ATOMIC_ACQUIRE))
return 1;
now = current_time_ms();
if (now - start > timeout)
return 0;
}
}

static bool write_file(const char* file, const char* what, ...)
{
char buf[1024];
va_list args;
va_start(args, what);
vsnprintf(buf, sizeof(buf), what, args);
va_end(args);
buf[sizeof(buf) - 1] = 0;
int len = strlen(buf);
int fd = open(file, O_WRONLY | O_CLOEXEC);
if (fd == -1)
return false;
if (write(fd, buf, len) != len) {
int err = errno;
close(fd);
errno = err;
return false;
}
close(fd);
return true;
}

#define FS_IOC_SETFLAGS _IOW('f', 2, long)
static void remove_dir(const char* dir)
{
int iter = 0;
DIR* dp = 0;
retry:
while (umount2(dir, MNT_DETACH | UMOUNT_NOFOLLOW) == 0) {
}
dp = opendir(dir);
if (dp == NULL) {
if (errno == EMFILE) {
exit(1);
}
exit(1);
}
struct dirent* ep = 0;
while ((ep = readdir(dp))) {
if (strcmp(ep->d_name, ".") == 0 || strcmp(ep->d_name, "..") == 0)
continue;
char filename[FILENAME_MAX];
snprintf(filename, sizeof(filename), "%s/%s", dir, ep->d_name);
while (umount2(filename, MNT_DETACH | UMOUNT_NOFOLLOW) == 0) {
}
struct stat st;
if (lstat(filename, &st))
exit(1);
if (S_ISDIR(st.st_mode)) {
remove_dir(filename);
continue;
}
int i;
for (i = 0;; i++) {
if (unlink(filename) == 0)
break;
if (errno == EPERM) {
int fd = open(filename, O_RDONLY);
if (fd != -1) {
long flags = 0;
if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0) {
}
close(fd);
continue;
}
}
if (errno == EROFS) {
break;
}
if (errno != EBUSY || i > 100)
exit(1);
if (umount2(filename, MNT_DETACH | UMOUNT_NOFOLLOW))
exit(1);
}
}
closedir(dp);
for (int i = 0;; i++) {
if (rmdir(dir) == 0)
break;
if (i < 100) {
if (errno == EPERM) {
int fd = open(dir, O_RDONLY);
if (fd != -1) {
long flags = 0;
if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0) {
}
close(fd);
continue;
}
}
if (errno == EROFS) {
break;
}
if (errno == EBUSY) {
if (umount2(dir, MNT_DETACH | UMOUNT_NOFOLLOW))
exit(1);
continue;
}
if (errno == ENOTEMPTY) {
if (iter < 100) {
iter++;
goto retry;
}
}
}
exit(1);
}
}

static void kill_and_wait(int pid, int* status)
{
kill(-pid, SIGKILL);
kill(pid, SIGKILL);
for (int i = 0; i < 100; i++) {
if (waitpid(-1, status, WNOHANG | __WALL) == pid)
return;
usleep(1000);
}
DIR* dir = opendir("/sys/fs/fuse/connections");
if (dir) {
for (;;) {
struct dirent* ent = readdir(dir);
if (!ent)
break;
if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
continue;
char abort[300];
snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
ent->d_name);
int fd = open(abort, O_WRONLY);
if (fd == -1) {
continue;
}
if (write(fd, abort, 1) < 0) {
}
close(fd);
}
closedir(dir);
} else {
}
while (waitpid(-1, status, __WALL) != pid) {
}
}

static void setup_test()
{
prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
setpgrp();
write_file("/proc/self/oom_score_adj", "1000");
if (symlink("/dev/binderfs", "./binderfs")) {
}
}

struct thread_t {
int created, call;
event_t ready, done;
};

static struct thread_t threads[16];
static void execute_call(int call);
static int running;

static void* thr(void* arg)
{
struct thread_t* th = (struct thread_t*)arg;
for (;;) {
event_wait(&th->ready);
event_reset(&th->ready);
execute_call(th->call);
__atomic_fetch_sub(&running, 1, __ATOMIC_RELAXED);
event_set(&th->done);
}
return 0;
}

static void execute_one(void)
{
int i, call, thread;
for (call = 0; call < 12; call++) {
for (thread = 0; thread < (int)(sizeof(threads) / sizeof(threads[0]));
thread++) {
struct thread_t* th = &threads[thread];
if (!th->created) {
th->created = 1;
event_init(&th->ready);
event_init(&th->done);
event_set(&th->done);
thread_start(thr, th);
}
if (!event_isset(&th->done))
continue;
event_reset(&th->done);
th->call = call;
__atomic_fetch_add(&running, 1, __ATOMIC_RELAXED);
event_set(&th->ready);
event_timedwait(&th->done, 50);
break;
}
}
for (i = 0; i < 100 && __atomic_load_n(&running, __ATOMIC_RELAXED); i++)
sleep_ms(1);
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
int iter = 0;
for (;; iter++) {
char cwdbuf[32];
sprintf(cwdbuf, "./%d", iter);
if (mkdir(cwdbuf, 0777))
exit(1);
int pid = fork();
if (pid < 0)
exit(1);
if (pid == 0) {
if (chdir(cwdbuf))
exit(1);
setup_test();
execute_one();
exit(0);
}
int status = 0;
uint64_t start = current_time_ms();
for (;;) {
if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
break;
sleep_ms(1);
if (current_time_ms() - start < 5000)
continue;
kill_and_wait(pid, &status);
break;
}
remove_dir(cwdbuf);
}
}

uint64_t r[1] = {0xffffffffffffffff};

void execute_call(int call)
{
intptr_t res = 0;
switch (call) {
case 0:
memcpy((void*)0x20000000, "./file0\000", 8);
syscall(__NR_mkdirat, /*fd=*/0xffffff9c, /*path=*/0x20000000ul,
/*mode=*/0ul);
break;
case 1:
memcpy((void*)0x20000080, "./file0\000", 8);
memcpy((void*)0x200000c0, "ramfs\000", 6);
syscall(__NR_mount, /*src=*/0ul, /*dst=*/0x20000080ul,
/*type=*/0x200000c0ul, /*flags=*/0ul, /*data=*/0ul);
break;
case 2:
memcpy((void*)0x20000240, "./file0/file0\000", 14);
syscall(__NR_mkdirat, /*fd=*/0xffffff9c, /*path=*/0x20000240ul,
/*mode=*/0ul);
break;
case 3:
memcpy((void*)0x20000480, "./file0/file0\000", 14);
syscall(__NR_chdir, /*dir=*/0x20000480ul);
break;
case 4:
memcpy((void*)0x20000180, "./file0\000", 8);
syscall(__NR_mkdirat, /*fd=*/0xffffff9c, /*path=*/0x20000180ul,
/*mode=*/0ul);
break;
case 5:
memcpy((void*)0x20000080, "./file0/file1\000", 14);
res =
syscall(__NR_openat, /*fd=*/0xffffffffffffff9cul, /*file=*/0x20000080ul,
/*flags=O_CREAT|O_CLOEXEC|O_RDWR*/ 0x80042ul, /*mode=*/0ul);
if (res != -1)
r[0] = res;
break;
case 6:
syscall(__NR_write, /*fd=*/r[0], /*data=*/0x20000980ul, /*len=*/0x58ul);
break;
case 7:
memcpy((void*)0x200000c0, "./file0/file0\000", 14);
syscall(__NR_mkdirat, /*fd=*/0xffffff9c, /*path=*/0x200000c0ul,
/*mode=*/0ul);
break;
case 8:
memcpy((void*)0x20000280, "./file1\000", 8);
syscall(__NR_mkdirat, /*fd=*/0xffffff9c, /*path=*/0x20000280ul,
/*mode=*/0ul);
break;
case 9:
memcpy((void*)0x20000200, "./file0\000", 8);
memcpy((void*)0x200001c0, "overlay\000", 8);
memcpy((void*)0x20000340, "lowerdir", 8);
*(uint8_t*)0x20000348 = 0x3d;
memcpy((void*)0x20000349, "./file0", 7);
*(uint8_t*)0x20000350 = 0x2c;
memcpy((void*)0x20000351, "workdir", 7);
*(uint8_t*)0x20000358 = 0x3d;
memcpy((void*)0x20000359, "./file1", 7);
*(uint8_t*)0x20000360 = 0x2c;
memcpy((void*)0x20000361, "upperdir", 8);
*(uint8_t*)0x20000369 = 0x3d;
memcpy((void*)0x2000036a, "./file0/file0", 13);
*(uint8_t*)0x20000377 = 0x2c;
*(uint8_t*)0x20000378 = 0;
syscall(__NR_mount, /*src=*/0ul, /*dst=*/0x20000200ul,
/*type=*/0x200001c0ul, /*flags=*/0ul, /*opts=*/0x20000340ul);
break;
case 10:
memcpy((void*)0x20000200, "./file0\000", 8);
memcpy((void*)0x200001c0, "overlay\000", 8);
memcpy((void*)0x20000340, "lowerdir", 8);
*(uint8_t*)0x20000348 = 0x3d;
memcpy((void*)0x20000349, "./file0", 7);
*(uint8_t*)0x20000350 = 0x2c;
memcpy((void*)0x20000351, "workdir", 7);
*(uint8_t*)0x20000358 = 0x3d;
memcpy((void*)0x20000359, "./file1", 7);
*(uint8_t*)0x20000360 = 0x2c;
memcpy((void*)0x20000361, "upperdir", 8);
*(uint8_t*)0x20000369 = 0x3d;
memcpy((void*)0x2000036a, "./file0/file0", 13);
*(uint8_t*)0x20000377 = 0x2c;
*(uint8_t*)0x20000378 = 0;
syscall(__NR_mount, /*src=*/0ul, /*dst=*/0x20000200ul,
/*type=*/0x200001c0ul, /*flags=*/0ul, /*opts=*/0x20000340ul);
break;
case 11:
memcpy((void*)0x20000100, "./file0/file1\000", 14);
*(uint64_t*)0x20000140 = 0x80041;
*(uint64_t*)0x20000148 = 0;
*(uint64_t*)0x20000150 = 0;
syscall(__NR_openat2, /*fd=*/0xffffffffffffff9cul, /*file=*/0x20000100ul,
/*how=*/0x20000140ul, /*size=*/0x18ul);
break;
}
}
int main(void)
{
syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
/*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
/*offset=*/0ul);
syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
/*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
/*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
/*offset=*/0ul);
syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
/*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
/*offset=*/0ul);
for (procid = 0; procid < 6; procid++) {
if (fork() == 0) {
use_temporary_dir();
loop();
}
}
sleep(1000000);
return 0;

}


2024-03-12 10:55:34

by Hillf Danton

[permalink] [raw]
Subject: Re: possible deadlock in ovl_llseek 27c1936af506

On Tue, 12 Mar 2024 07:10:27 +0000 Weiß, Simone <[email protected]>
>
> For some experimentation, I have been running fuzzing campaigns and I
> noticed a possible deadlock in ovl_llseek .

Feel free to take a look at another case of deadlock [1]

[1] https://lore.kernel.org/lkml/ZPOtwcMHN_fpdrpt@boqun-archlinux/

2024-03-13 13:14:43

by Amir Goldstein

[permalink] [raw]
Subject: Re: possible deadlock in ovl_llseek 27c1936af506

On Tue, Mar 12, 2024 at 9:10 AM Weiß, Simone
<[email protected]> wrote:
>
> Dear Miklos and Amir,
>
> For some experimentation, I have been running fuzzing campaigns and I
> noticed a possible deadlock in ovl_llseek .
>
> As there is a C reproducer, it could be bisected being introduced with:
>
> commit 27c1936af5068b5367078a65df6a3d4de3e94e9a
> Author: Miklos Szeredi <[email protected]>
> Date: Mon Apr 12 12:00:37 2021 +0200
>
> ovl: allow upperdir inside lowerdir
>
> commit 708fa01597fa002599756bf56a96d0de1677375c upstream.
>
> Commit 146d62e5a586 ("ovl: detect overlapping layers") made sure we don't
> have overlapping layers, but it also broke the arguably valid use case of
>
> mount -olowerdir=/,upperdir=/subdir,..
>
> where upperdir overlaps lowerdir on the same filesystem. This has been
> causing regressions.
>
> Revert the check, but only for the specific case where upperdir and/or
> workdir are subdirectories of lowerdir. Any other overlap (e.g. lowerdir
> is subdirectory of upperdir, etc) case is crazy, so leave the check in
> place for those.
>
> Overlaps are detected at lookup time too, so reverting the mount time check
> should be safe.
>
> It was reproducible on v5.10.212 and a syz-crush check also found crashes on
> v6.8-rc1.
>
>

The reason for this report is calling llseek() on lower ovl from
ovl_copy_up_data() when ovl_copy_up_data() is called with upper
inode lock and the lower ovl uses the same upper fs.

It looks to me like the possible deadlock should have been solved by commit
c63e56a4a652 ovl: do not open/llseek lower file with upper sb_writers held
that moved ovl_copy_up_data() out of the inode_lock() scope.

I am not sure what the statement "a syz-crush check also found crashes
on v6.8-rc1."
means - does it mean that this reproducer produced this same lockdep warning
on upstream kernel?

Thanks,
Amir.


> The C reproducer is automatically generated by syzkaller and included below.
>
> If you need any further information, just let me know.
>
> Regards,
> Simone
>
> Log:
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.10.34-eb-corbos-standard-syzkaller #0 Not tainted
> ------------------------------------------------------
> syz-executor175/7735 is trying to acquire lock:
> ffff00000c54a0a0
> (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_inode_lock
> fs/overlayfs/overlayfs.h:362 [inline]
> (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_llseek+0xec/0x194
> fs/overlayfs/file.c:207
>
> but task is already holding lock:
> ffff00000c60eca0 (&sb->s_type->i_mutex_key#15/5){+.+.}-{3:3}, at:
> inode_lock_nested include/linux/fs.h:809 [inline]
> ffff00000c60eca0 (&sb->s_type->i_mutex_key#15/5){+.+.}-{3:3}, at:
> lock_rename+0x10c/0x144 fs/namei.c:2772
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&sb->s_type->i_mutex_key#15/5
> ){+.+.}-{3:3}:
> __lock_release kernel/locking/lockdep.c:5160 [inline]
> lock_release+0x244/0x390 kernel/locking/lockdep.c:5464
> up_write+0x4c/0x154 kernel/locking/rwsem.c:1609
> inode_unlock include/linux/fs.h:779 [inline]
> unlock_rename+0x28/0x60 fs/namei.c:2779
> ovl_workdir_ok fs/overlayfs/super.c:915 [inline]
> ovl_get_workdir fs/overlayfs/super.c:1405 [inline]
> ovl_fill_super+0x62c/0x28d0 fs/overlayfs/super.c:1965
> mount_nodev+0x70/0xf0 fs/super.c:1465
> ovl_mount+0x3c/0x50 fs/overlayfs/super.c:2050
> legacy_get_tree+0x34/0xb0 fs/fs_context.c:592
> vfs_get_tree+0x34/0xe0 fs/super.c:1549
> do_new_mount fs/namespace.c:2881 [inline]
> path_mount+0xd50/0x1600 fs/namespace.c:3211
> do_mount fs/namespace.c:3224 [inline]
> __do_sys_mount fs/namespace.c:3432 [inline]
> __se_sys_mount fs/namespace.c:3409 [inline]
> __arm64_sys_mount+0x680/0x7d0 fs/namespace.c:3409
> __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
> invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
> el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
> do_el0_svc+0xe0/0x340 arch/arm64/kernel/syscall.c:197
> el0_svc+0x24/0x34 arch/arm64/kernel/entry-common.c:367
> el0_sync_handler+0xec/0x210 arch/arm64/kernel/entry-common.c:383
> el0_sync+0x17c/0x180 arch/arm64/kernel/entry.S:672
>
> -> #1 (&type->s_vfs_rename_key){+.+.}-{3:3}:
> lock_acquire+0x68/0x84 kernel/locking/lockdep.c:5417
> __mutex_lock_common kernel/locking/mutex.c:959 [inline]
> __mutex_lock+0x84/0x730 kernel/locking/mutex.c:1106
> mutex_lock_nested+0x40/0x50 kernel/locking/mutex.c:1121
> lock_rename+0x3c/0x144 fs/namei.c:2755
> ovl_copy_up_workdir fs/overlayfs/copy_up.c:595 [inline]
> ovl_do_copy_up fs/overlayfs/copy_up.c:746 [inline]
> ovl_copy_up_one+0x434/0x12ec fs/overlayfs/copy_up.c:916
> ovl_copy_up_flags+0x100/0x164 fs/overlayfs/copy_up.c:961
> ovl_maybe_copy_up+0x104/0x14c fs/overlayfs/copy_up.c:993
> ovl_open+0x4c/0x110 fs/overlayfs/file.c:154
> do_dentry_open+0x2a0/0x5c0 fs/open.c:817
> vfs_open+0x38/0x50 fs/open.c:931
> do_open fs/namei.c:3243 [inline]
> path_openat+0xc88/0x1050 fs/namei.c:3360
> do_filp_open+0x8c/0x170 fs/namei.c:3387
> do_sys_openat2+0xf4/0x240 fs/open.c:1172
> __do_sys_openat2 fs/open.c:1227 [inline]
> __se_sys_openat2 fs/open.c:1207 [inline]
> __arm64_sys_openat2+0x304/0x410 fs/open.c:1207
> __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
> invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
> el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
> do_el0_svc+0xe0/0x340 arch/arm64/kernel/syscall.c:197
> el0_svc+0x24/0x34 arch/arm64/kernel/entry-common.c:367
> el0_sync_handler+0xec/0x210 arch/arm64/kernel/entry-common.c:383
> el0_sync+0x17c/0x180 arch/arm64/kernel/entry.S:672
>
> -> #0 (&ovl_i_lock_key[depth]){+.+.}-{3:3}:
> check_prev_add kernel/locking/lockdep.c:2869 [inline]
> check_prevs_add kernel/locking/lockdep.c:2994 [inline]
> validate_chain kernel/locking/lockdep.c:3609 [inline]
> __lock_acquire+0x10ec/0x18a4 kernel/locking/lockdep.c:4834
> lock_acquire.part.0+0xec/0x2e0 kernel/locking/lockdep.c:5444
> lock_acquire+0x68/0x84 kernel/locking/lockdep.c:5417
> __mutex_lock_common kernel/locking/mutex.c:959 [inline]
> __mutex_lock+0x84/0x730 kernel/locking/mutex.c:1106
> mutex_lock_nested+0x40/0x50 kernel/locking/mutex.c:1121
> ovl_inode_lock fs/overlayfs/overlayfs.h:362 [inline]
> ovl_llseek+0xec/0x194 fs/overlayfs/file.c:207
> vfs_llseek+0x60/0x80 fs/read_write.c:300
> ovl_copy_up_data+0x21c/0x390 fs/overlayfs/copy_up.c:199
> ovl_copy_up_inode+0x258/0x2b4 fs/overlayfs/copy_up.c:507
> ovl_copy_up_workdir fs/overlayfs/copy_up.c:609 [inline]
> ovl_do_copy_up fs/overlayfs/copy_up.c:746 [inline]
> ovl_copy_up_one+0x4cc/0x12ec fs/overlayfs/copy_up.c:916
> ovl_copy_up_flags+0x100/0x164 fs/overlayfs/copy_up.c:961
> ovl_maybe_copy_up+0x104/0x14c fs/overlayfs/copy_up.c:993
> ovl_open+0x4c/0x110 fs/overlayfs/file.c:154
> do_dentry_open+0x2a0/0x5c0 fs/open.c:817
> vfs_open+0x38/0x50 fs/open.c:931
> do_open fs/namei.c:3243 [inline]
> path_openat+0xc88/0x1050 fs/namei.c:3360
> do_filp_open+0x8c/0x170 fs/namei.c:3387
> do_sys_openat2+0xf4/0x240 fs/open.c:1172
> __do_sys_openat2 fs/open.c:1227 [inline]
> __se_sys_openat2 fs/open.c:1207 [inline]
> __arm64_sys_openat2+0x304/0x410 fs/open.c:1207
> __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
> invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
> el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
> do_el0_svc+0xe0/0x340 arch/arm64/kernel/syscall.c:197
> el0_svc+0x24/0x34 arch/arm64/kernel/entry-common.c:367
> el0_sync_handler+0xec/0x210 arch/arm64/kernel/entry-common.c:383
> el0_sync+0x17c/0x180 arch/arm64/kernel/entry.S:672
>
> other info that might help us debug this:
>
> Chain exists of:
> &ovl_i_lock_key[depth] --> &type->s_vfs_rename_key --> &sb->s_type-
> >i_mutex_key#15/5
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&sb->s_type->i_mutex_key#15/5);
> lock(&type->s_vfs_rename_key);
> lock(&sb->s_type->i_mutex_key#15/5);
> lock(&ovl_i_lock_key[depth]);
>
> *** DEADLOCK ***
>
> 5 locks held by syz-executor175/7735:
> #0: ffff000005302438 (sb_writers#9){.+.+}-{0:0}, at: __sb_start_write
> include/linux/fs.h:1594 [inline]
> #0: ffff000005302438 (sb_writers#9){.+.+}-{0:0}, at: sb_start_write
> include/linux/fs.h:1664 [inline]
> #0: ffff000005302438 (sb_writers#9){.+.+}-{0:0}, at: mnt_want_write+0x24/0x80
> fs/namespace.c:354
> #1: ffff00000c54bcc0 (&ovl_i_lock_key[depth]#2){+.+.}-{3:3}, at:
> ovl_inode_lock_interruptible fs/overlayfs/overlayfs.h:367 [inline]
> #1: ffff00000c54bcc0 (&ovl_i_lock_key[depth]#2){+.+.}-{3:3}, at:
> ovl_copy_up_start+0x34/0x160 fs/overlayfs/util.c:533
> #2: ffff000005302720 (&type->s_vfs_rename_key){+.+.}-{3:3}, at:
> lock_rename+0x3c/0x144 fs/namei.c:2755
> #3: ffff000006711110 (&sb->s_type->i_mutex_key#15/1){+.+.}-{3:3}, at:
> inode_lock_nested include/linux/fs.h:809 [inline]
> #3: ffff000006711110 (&sb->s_type->i_mutex_key#15/1){+.+.}-{3:3}, at:
> lock_rename+0xfc/0x144 fs/namei.c:2771
> #4: ffff00000c60eca0 (&sb->s_type->i_mutex_key#15/5){+.+.}-{3:3}, at:
> inode_lock_nested include/linux/fs.h:809 [inline]
> #4: ffff00000c60eca0 (&sb->s_type->i_mutex_key#15/5){+.+.}-{3:3}, at:
> lock_rename+0x10c/0x144 fs/namei.c:2772
>
> stack backtrace:
> CPU: 0 PID: 7735 Comm: syz-executor175 Not tainted 5.10.34-eb-corbos-standard-
> syzkaller #0
> Hardware name: linux,dummy-virt (DT)
> Call trace:
> dump_backtrace+0x0/0x2e0 arch/arm64/include/asm/atomic_ll_sc.h:222
> show_stack+0x2c/0x40 arch/arm64/kernel/stacktrace.c:196
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x1d4/0x26c lib/dump_stack.c:118
> print_circular_bug+0x1f8/0x200 kernel/locking/lockdep.c:1997
> check_noncircular+0x100/0x114 kernel/locking/lockdep.c:2118
> check_prev_add kernel/locking/lockdep.c:2869 [inline]
> check_prevs_add kernel/locking/lockdep.c:2994 [inline]
> validate_chain kernel/locking/lockdep.c:3609 [inline]
> __lock_acquire+0x10ec/0x18a4 kernel/locking/lockdep.c:4834
> lock_acquire.part.0+0xec/0x2e0 kernel/locking/lockdep.c:5444
> lock_acquire+0x68/0x84 kernel/locking/lockdep.c:5417
> __mutex_lock_common kernel/locking/mutex.c:959 [inline]
> __mutex_lock+0x84/0x730 kernel/locking/mutex.c:1106
> mutex_lock_nested+0x40/0x50 kernel/locking/mutex.c:1121
> ovl_inode_lock fs/overlayfs/overlayfs.h:362 [inline]
> ovl_llseek+0xec/0x194 fs/overlayfs/file.c:207
> vfs_llseek+0x60/0x80 fs/read_write.c:300
> ovl_copy_up_data+0x21c/0x390 fs/overlayfs/copy_up.c:199
> ovl_copy_up_inode+0x258/0x2b4 fs/overlayfs/copy_up.c:507
> ovl_copy_up_workdir fs/overlayfs/copy_up.c:609 [inline]
> ovl_do_copy_up fs/overlayfs/copy_up.c:746 [inline]
> ovl_copy_up_one+0x4cc/0x12ec fs/overlayfs/copy_up.c:916
> ovl_copy_up_flags+0x100/0x164 fs/overlayfs/copy_up.c:961
> ovl_maybe_copy_up+0x104/0x14c fs/overlayfs/copy_up.c:993
> ovl_open+0x4c/0x110 fs/overlayfs/file.c:154
> do_dentry_open+0x2a0/0x5c0 fs/open.c:817
> vfs_open+0x38/0x50 fs/open.c:931
> do_open fs/namei.c:3243 [inline]
> path_openat+0xc88/0x1050 fs/namei.c:3360
> do_filp_open+0x8c/0x170 fs/namei.c:3387
> do_sys_openat2+0xf4/0x240 fs/open.c:1172
> __do_sys_openat2 fs/open.c:1227 [inline]
> __se_sys_openat2 fs/open.c:1207 [inline]
> __arm64_sys_openat2+0x304/0x410 fs/open.c:1207
> __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
> invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
> el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
> do_el0_svc+0xe0/0x340 arch/arm64/kernel/syscall.c:197
> el0_svc+0x24/0x34 arch/arm64/kernel/entry-common.c:367
> el0_sync_handler+0xec/0x210 arch/arm64/kernel/entry-common.c:383
> el0_sync+0x17c/0x180 arch/arm64/kernel/entry.S:672
>
>
> C Reproducer:
>
> // https://None.appspot.com/bug?id=f10e9988ed129179c80858a403259185ef332f5d
> // autogenerated by syzkaller (https://github.com/google/syzkaller)
>
> #define _GNU_SOURCE
>
> #include <dirent.h>
> #include <endian.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <pthread.h>
> #include <signal.h>
> #include <stdarg.h>
> #include <stdbool.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/ioctl.h>
> #include <sys/mount.h>
> #include <sys/prctl.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <time.h>
> #include <unistd.h>
>
> #include <linux/futex.h>
>
> #ifndef __NR_chdir
> #define __NR_chdir 49
> #endif
> #ifndef __NR_mkdirat
> #define __NR_mkdirat 34
> #endif
> #ifndef __NR_mmap
> #define __NR_mmap 222
> #endif
> #ifndef __NR_mount
> #define __NR_mount 40
> #endif
> #ifndef __NR_openat
> #define __NR_openat 56
> #endif
> #ifndef __NR_openat2
> #define __NR_openat2 437
> #endif
> #ifndef __NR_write
> #define __NR_write 64
> #endif
>
> static unsigned long long procid;
>
> static void sleep_ms(uint64_t ms)
> {
> usleep(ms * 1000);
> }
>
> static uint64_t current_time_ms(void)
> {
> struct timespec ts;
> if (clock_gettime(CLOCK_MONOTONIC, &ts))
> exit(1);
> return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
> }
>
> static void use_temporary_dir(void)
> {
> char tmpdir_template[] = "./syzkaller.XXXXXX";
> char* tmpdir = mkdtemp(tmpdir_template);
> if (!tmpdir)
> exit(1);
> if (chmod(tmpdir, 0777))
> exit(1);
> if (chdir(tmpdir))
> exit(1);
> }
>
> static void thread_start(void* (*fn)(void*), void* arg)
> {
> pthread_t th;
> pthread_attr_t attr;
> pthread_attr_init(&attr);
> pthread_attr_setstacksize(&attr, 128 << 10);
> int i = 0;
> for (; i < 100; i++) {
> if (pthread_create(&th, &attr, fn, arg) == 0) {
> pthread_attr_destroy(&attr);
> return;
> }
> if (errno == EAGAIN) {
> usleep(50);
> continue;
> }
> break;
> }
> exit(1);
> }
>
> typedef struct {
> int state;
> } event_t;
>
> static void event_init(event_t* ev)
> {
> ev->state = 0;
> }
>
> static void event_reset(event_t* ev)
> {
> ev->state = 0;
> }
>
> static void event_set(event_t* ev)
> {
> if (ev->state)
> exit(1);
> __atomic_store_n(&ev->state, 1, __ATOMIC_RELEASE);
> syscall(SYS_futex, &ev->state, FUTEX_WAKE | FUTEX_PRIVATE_FLAG, 1000000);
> }
>
> static void event_wait(event_t* ev)
> {
> while (!__atomic_load_n(&ev->state, __ATOMIC_ACQUIRE))
> syscall(SYS_futex, &ev->state, FUTEX_WAIT | FUTEX_PRIVATE_FLAG, 0, 0);
> }
>
> static int event_isset(event_t* ev)
> {
> return __atomic_load_n(&ev->state, __ATOMIC_ACQUIRE);
> }
>
> static int event_timedwait(event_t* ev, uint64_t timeout)
> {
> uint64_t start = current_time_ms();
> uint64_t now = start;
> for (;;) {
> uint64_t remain = timeout - (now - start);
> struct timespec ts;
> ts.tv_sec = remain / 1000;
> ts.tv_nsec = (remain % 1000) * 1000 * 1000;
> syscall(SYS_futex, &ev->state, FUTEX_WAIT | FUTEX_PRIVATE_FLAG, 0, &ts);
> if (__atomic_load_n(&ev->state, __ATOMIC_ACQUIRE))
> return 1;
> now = current_time_ms();
> if (now - start > timeout)
> return 0;
> }
> }
>
> static bool write_file(const char* file, const char* what, ...)
> {
> char buf[1024];
> va_list args;
> va_start(args, what);
> vsnprintf(buf, sizeof(buf), what, args);
> va_end(args);
> buf[sizeof(buf) - 1] = 0;
> int len = strlen(buf);
> int fd = open(file, O_WRONLY | O_CLOEXEC);
> if (fd == -1)
> return false;
> if (write(fd, buf, len) != len) {
> int err = errno;
> close(fd);
> errno = err;
> return false;
> }
> close(fd);
> return true;
> }
>
> #define FS_IOC_SETFLAGS _IOW('f', 2, long)
> static void remove_dir(const char* dir)
> {
> int iter = 0;
> DIR* dp = 0;
> retry:
> while (umount2(dir, MNT_DETACH | UMOUNT_NOFOLLOW) == 0) {
> }
> dp = opendir(dir);
> if (dp == NULL) {
> if (errno == EMFILE) {
> exit(1);
> }
> exit(1);
> }
> struct dirent* ep = 0;
> while ((ep = readdir(dp))) {
> if (strcmp(ep->d_name, ".") == 0 || strcmp(ep->d_name, "..") == 0)
> continue;
> char filename[FILENAME_MAX];
> snprintf(filename, sizeof(filename), "%s/%s", dir, ep->d_name);
> while (umount2(filename, MNT_DETACH | UMOUNT_NOFOLLOW) == 0) {
> }
> struct stat st;
> if (lstat(filename, &st))
> exit(1);
> if (S_ISDIR(st.st_mode)) {
> remove_dir(filename);
> continue;
> }
> int i;
> for (i = 0;; i++) {
> if (unlink(filename) == 0)
> break;
> if (errno == EPERM) {
> int fd = open(filename, O_RDONLY);
> if (fd != -1) {
> long flags = 0;
> if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0) {
> }
> close(fd);
> continue;
> }
> }
> if (errno == EROFS) {
> break;
> }
> if (errno != EBUSY || i > 100)
> exit(1);
> if (umount2(filename, MNT_DETACH | UMOUNT_NOFOLLOW))
> exit(1);
> }
> }
> closedir(dp);
> for (int i = 0;; i++) {
> if (rmdir(dir) == 0)
> break;
> if (i < 100) {
> if (errno == EPERM) {
> int fd = open(dir, O_RDONLY);
> if (fd != -1) {
> long flags = 0;
> if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0) {
> }
> close(fd);
> continue;
> }
> }
> if (errno == EROFS) {
> break;
> }
> if (errno == EBUSY) {
> if (umount2(dir, MNT_DETACH | UMOUNT_NOFOLLOW))
> exit(1);
> continue;
> }
> if (errno == ENOTEMPTY) {
> if (iter < 100) {
> iter++;
> goto retry;
> }
> }
> }
> exit(1);
> }
> }
>
> static void kill_and_wait(int pid, int* status)
> {
> kill(-pid, SIGKILL);
> kill(pid, SIGKILL);
> for (int i = 0; i < 100; i++) {
> if (waitpid(-1, status, WNOHANG | __WALL) == pid)
> return;
> usleep(1000);
> }
> DIR* dir = opendir("/sys/fs/fuse/connections");
> if (dir) {
> for (;;) {
> struct dirent* ent = readdir(dir);
> if (!ent)
> break;
> if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
> continue;
> char abort[300];
> snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
> ent->d_name);
> int fd = open(abort, O_WRONLY);
> if (fd == -1) {
> continue;
> }
> if (write(fd, abort, 1) < 0) {
> }
> close(fd);
> }
> closedir(dir);
> } else {
> }
> while (waitpid(-1, status, __WALL) != pid) {
> }
> }
>
> static void setup_test()
> {
> prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
> setpgrp();
> write_file("/proc/self/oom_score_adj", "1000");
> if (symlink("/dev/binderfs", "./binderfs")) {
> }
> }
>
> struct thread_t {
> int created, call;
> event_t ready, done;
> };
>
> static struct thread_t threads[16];
> static void execute_call(int call);
> static int running;
>
> static void* thr(void* arg)
> {
> struct thread_t* th = (struct thread_t*)arg;
> for (;;) {
> event_wait(&th->ready);
> event_reset(&th->ready);
> execute_call(th->call);
> __atomic_fetch_sub(&running, 1, __ATOMIC_RELAXED);
> event_set(&th->done);
> }
> return 0;
> }
>
> static void execute_one(void)
> {
> int i, call, thread;
> for (call = 0; call < 12; call++) {
> for (thread = 0; thread < (int)(sizeof(threads) / sizeof(threads[0]));
> thread++) {
> struct thread_t* th = &threads[thread];
> if (!th->created) {
> th->created = 1;
> event_init(&th->ready);
> event_init(&th->done);
> event_set(&th->done);
> thread_start(thr, th);
> }
> if (!event_isset(&th->done))
> continue;
> event_reset(&th->done);
> th->call = call;
> __atomic_fetch_add(&running, 1, __ATOMIC_RELAXED);
> event_set(&th->ready);
> event_timedwait(&th->done, 50);
> break;
> }
> }
> for (i = 0; i < 100 && __atomic_load_n(&running, __ATOMIC_RELAXED); i++)
> sleep_ms(1);
> }
>
> static void execute_one(void);
>
> #define WAIT_FLAGS __WALL
>
> static void loop(void)
> {
> int iter = 0;
> for (;; iter++) {
> char cwdbuf[32];
> sprintf(cwdbuf, "./%d", iter);
> if (mkdir(cwdbuf, 0777))
> exit(1);
> int pid = fork();
> if (pid < 0)
> exit(1);
> if (pid == 0) {
> if (chdir(cwdbuf))
> exit(1);
> setup_test();
> execute_one();
> exit(0);
> }
> int status = 0;
> uint64_t start = current_time_ms();
> for (;;) {
> if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
> break;
> sleep_ms(1);
> if (current_time_ms() - start < 5000)
> continue;
> kill_and_wait(pid, &status);
> break;
> }
> remove_dir(cwdbuf);
> }
> }
>
> uint64_t r[1] = {0xffffffffffffffff};
>
> void execute_call(int call)
> {
> intptr_t res = 0;
> switch (call) {
> case 0:
> memcpy((void*)0x20000000, "./file0\000", 8);
> syscall(__NR_mkdirat, /*fd=*/0xffffff9c, /*path=*/0x20000000ul,
> /*mode=*/0ul);
> break;
> case 1:
> memcpy((void*)0x20000080, "./file0\000", 8);
> memcpy((void*)0x200000c0, "ramfs\000", 6);
> syscall(__NR_mount, /*src=*/0ul, /*dst=*/0x20000080ul,
> /*type=*/0x200000c0ul, /*flags=*/0ul, /*data=*/0ul);
> break;
> case 2:
> memcpy((void*)0x20000240, "./file0/file0\000", 14);
> syscall(__NR_mkdirat, /*fd=*/0xffffff9c, /*path=*/0x20000240ul,
> /*mode=*/0ul);
> break;
> case 3:
> memcpy((void*)0x20000480, "./file0/file0\000", 14);
> syscall(__NR_chdir, /*dir=*/0x20000480ul);
> break;
> case 4:
> memcpy((void*)0x20000180, "./file0\000", 8);
> syscall(__NR_mkdirat, /*fd=*/0xffffff9c, /*path=*/0x20000180ul,
> /*mode=*/0ul);
> break;
> case 5:
> memcpy((void*)0x20000080, "./file0/file1\000", 14);
> res =
> syscall(__NR_openat, /*fd=*/0xffffffffffffff9cul, /*file=*/0x20000080ul,
> /*flags=O_CREAT|O_CLOEXEC|O_RDWR*/ 0x80042ul, /*mode=*/0ul);
> if (res != -1)
> r[0] = res;
> break;
> case 6:
> syscall(__NR_write, /*fd=*/r[0], /*data=*/0x20000980ul, /*len=*/0x58ul);
> break;
> case 7:
> memcpy((void*)0x200000c0, "./file0/file0\000", 14);
> syscall(__NR_mkdirat, /*fd=*/0xffffff9c, /*path=*/0x200000c0ul,
> /*mode=*/0ul);
> break;
> case 8:
> memcpy((void*)0x20000280, "./file1\000", 8);
> syscall(__NR_mkdirat, /*fd=*/0xffffff9c, /*path=*/0x20000280ul,
> /*mode=*/0ul);
> break;
> case 9:
> memcpy((void*)0x20000200, "./file0\000", 8);
> memcpy((void*)0x200001c0, "overlay\000", 8);
> memcpy((void*)0x20000340, "lowerdir", 8);
> *(uint8_t*)0x20000348 = 0x3d;
> memcpy((void*)0x20000349, "./file0", 7);
> *(uint8_t*)0x20000350 = 0x2c;
> memcpy((void*)0x20000351, "workdir", 7);
> *(uint8_t*)0x20000358 = 0x3d;
> memcpy((void*)0x20000359, "./file1", 7);
> *(uint8_t*)0x20000360 = 0x2c;
> memcpy((void*)0x20000361, "upperdir", 8);
> *(uint8_t*)0x20000369 = 0x3d;
> memcpy((void*)0x2000036a, "./file0/file0", 13);
> *(uint8_t*)0x20000377 = 0x2c;
> *(uint8_t*)0x20000378 = 0;
> syscall(__NR_mount, /*src=*/0ul, /*dst=*/0x20000200ul,
> /*type=*/0x200001c0ul, /*flags=*/0ul, /*opts=*/0x20000340ul);
> break;
> case 10:
> memcpy((void*)0x20000200, "./file0\000", 8);
> memcpy((void*)0x200001c0, "overlay\000", 8);
> memcpy((void*)0x20000340, "lowerdir", 8);
> *(uint8_t*)0x20000348 = 0x3d;
> memcpy((void*)0x20000349, "./file0", 7);
> *(uint8_t*)0x20000350 = 0x2c;
> memcpy((void*)0x20000351, "workdir", 7);
> *(uint8_t*)0x20000358 = 0x3d;
> memcpy((void*)0x20000359, "./file1", 7);
> *(uint8_t*)0x20000360 = 0x2c;
> memcpy((void*)0x20000361, "upperdir", 8);
> *(uint8_t*)0x20000369 = 0x3d;
> memcpy((void*)0x2000036a, "./file0/file0", 13);
> *(uint8_t*)0x20000377 = 0x2c;
> *(uint8_t*)0x20000378 = 0;
> syscall(__NR_mount, /*src=*/0ul, /*dst=*/0x20000200ul,
> /*type=*/0x200001c0ul, /*flags=*/0ul, /*opts=*/0x20000340ul);
> break;
> case 11:
> memcpy((void*)0x20000100, "./file0/file1\000", 14);
> *(uint64_t*)0x20000140 = 0x80041;
> *(uint64_t*)0x20000148 = 0;
> *(uint64_t*)0x20000150 = 0;
> syscall(__NR_openat2, /*fd=*/0xffffffffffffff9cul, /*file=*/0x20000100ul,
> /*how=*/0x20000140ul, /*size=*/0x18ul);
> break;
> }
> }
> int main(void)
> {
> syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> /*offset=*/0ul);
> syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
> /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> /*offset=*/0ul);
> syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> /*offset=*/0ul);
> for (procid = 0; procid < 6; procid++) {
> if (fork() == 0) {
> use_temporary_dir();
> loop();
> }
> }
> sleep(1000000);
> return 0;
>
> }

2024-03-13 14:47:53

by Miklos Szeredi

[permalink] [raw]
Subject: Re: possible deadlock in ovl_llseek 27c1936af506

On Wed, 13 Mar 2024 at 14:14, Amir Goldstein <[email protected]> wrote:

> The reason for this report is calling llseek() on lower ovl from
> ovl_copy_up_data() when ovl_copy_up_data() is called with upper
> inode lock and the lower ovl uses the same upper fs.
>
> It looks to me like the possible deadlock should have been solved by commit
> c63e56a4a652 ovl: do not open/llseek lower file with upper sb_writers held
> that moved ovl_copy_up_data() out of the inode_lock() scope.

That commit is in v6.7, so something different must be happening on v6.8-rc1.

Simone, please send a new report for v6.8-rc1 if a lockdep splat can
be reproduced on that kernel.

Thanks,
Miklos

2024-03-13 15:50:28

by Weiß, Simone

[permalink] [raw]
Subject: Re: possible deadlock in ovl_llseek 27c1936af506

On Wed, 2024-03-13 at 15:47 +0100, Miklos Szeredi wrote:
> On Wed, 13 Mar 2024 at 14:14, Amir Goldstein <[email protected]> wrote:
>
> > The reason for this report is calling llseek() on lower ovl from
> > ovl_copy_up_data() when ovl_copy_up_data() is called with upper
> > inode lock and the lower ovl uses the same upper fs.
> >
> > It looks to me like the possible deadlock should have been solved by commit
> > c63e56a4a652 ovl: do not open/llseek lower file with upper sb_writers held
> > that moved ovl_copy_up_data() out of the inode_lock() scope.
>
> That commit is in v6.7, so something different must be happening on v6.8-rc1.
>
> Simone, please send a new report for v6.8-rc1 if a lockdep splat can
> be reproduced on that kernel.
>
> Thanks,
> Miklos

Sure, I will try to reproduce it again and send a new report if needed.

Thanks,
Simone