2019-06-05 13:55:52

by Mark Rutland

[permalink] [raw]
Subject: "Dentry still in use" splats in v5.2-rc3

Hi All,

While fuzzing arm64 v5.2-rc3, Syzkaller started hitting splats of the
form:

BUG: Dentry (____ptrval____){i=1,n=/} still in use (2) [unmount of bpf bpf]

... which I can reliably reproduce with the following C program
(partially minimized from what Syzkaller auto-generated).

It looks like any filesystem will do. I've seen splats with "bpf",
"hugetlbfs", "rpc_pipefs", and "tmpfs", and can reproduce the problem
with any of these.

Any ideas?

I'm using the config from my fuzzing/5.2-rc3 branch on kernel.org [1].

Thanks,
Mark.

----
#include <unistd.h>
#include <sys/syscall.h>

/*
* NOTE: these are the arm64 numbers
*/
#ifndef __NR_fsconfig
#define __NR_fsconfig 431
#endif
#ifndef __NR_fsmount
#define __NR_fsmount 432
#endif
#ifndef __NR_fsopen
#define __NR_fsopen 430
#endif

int main(void)
{
int fs, mnt;

fs = syscall(__NR_fsopen, "bpf", 0);
syscall(__NR_fsconfig, fs, 6, 0, 0, 0);
mnt = syscall(__NR_fsmount, fs, 0, 0);
fchdir(mnt);

close(fs);
close(mnt);
}

----

----
[ 29.746323][ T245] BUG: Dentry (____ptrval____){i=1,n=/} still in use (2) [unmount of bpf bpf]
[ 29.748645][ T245] WARNING: CPU: 3 PID: 245 at fs/dcache.c:1529 umount_check+0x170/0x1b8
[ 29.750313][ T245] CPU: 3 PID: 245 Comm: repro Not tainted 5.2.0-rc3-00004-gff694e8 #1
[ 29.752165][ T245] Hardware name: linux,dummy-virt (DT)
[ 29.753406][ T245] pstate: 80400005 (Nzcv daif +PAN -UAO)
[ 29.754640][ T245] pc : umount_check+0x170/0x1b8
[ 29.755708][ T245] lr : umount_check+0x170/0x1b8
[ 29.756821][ T245] sp : ffff8000647b7ac0
[ 29.757730][ T245] x29: ffff8000647b7ac0 x28: ffff20001073dc38
[ 29.759047][ T245] x27: ffff8000666f4788 x26: ffff800064732040
[ 29.760428][ T245] x25: ffff10000c8e6325 x24: ffff200014f62500
[ 29.761755][ T245] x23: 0000000000000001 x22: ffff200015041e80
[ 29.763061][ T245] x21: ffff8000647aec80 x20: 0000000000000002
[ 29.764441][ T245] x19: ffff8000666f4788 x18: 0000000000000000
[ 29.765764][ T245] x17: 0000000000000000 x16: 0000000000000000
[ 29.767064][ T245] x15: 0000000000000000 x14: ffff200014f70788
[ 29.768445][ T245] x13: 00000000f2000000 x12: ffff10000d566546
[ 29.769774][ T245] x11: 1ffff0000d566545 x10: ffff10000d566545
[ 29.771098][ T245] x9 : 1ffff0000d566545 x8 : dfff200000000000
[ 29.772484][ T245] x7 : ffff10000d566546 x6 : ffff80006ab32a2f
[ 29.773820][ T245] x5 : ffff10000d566546 x4 : ffff10000d566546
[ 29.775155][ T245] x3 : 1fffe40002d30afc x2 : 24cbddc7f4015a00
[ 29.776539][ T245] x1 : 0000000000000000 x0 : 000000000000004c
[ 29.777868][ T245] Call trace:
[ 29.778598][ T245] umount_check+0x170/0x1b8
[ 29.779574][ T245] d_walk.part.2+0x100/0x6a0
[ 29.780610][ T245] do_one_tree+0x34/0x58
[ 29.781577][ T245] shrink_dcache_for_umount+0x60/0x110
[ 29.782752][ T245] generic_shutdown_super+0x68/0x360
[ 29.783913][ T245] kill_anon_super+0x44/0x70
[ 29.784932][ T245] kill_litter_super+0x4c/0x60
[ 29.786054][ T245] deactivate_locked_super+0x8c/0xf0
[ 29.787214][ T245] deactivate_super+0xd8/0xf8
[ 29.788278][ T245] cleanup_mnt+0x90/0x128
[ 29.789388][ T245] __cleanup_mnt+0x1c/0x28
[ 29.790362][ T245] task_work_run+0x124/0x198
[ 29.791374][ T245] do_notify_resume+0x664/0x778
[ 29.792440][ T245] work_pending+0x8/0x14
[ 29.793439][ T245] irq event stamp: 1502
[ 29.794374][ T245] hardirqs last enabled at (1501): [<ffff2000102afd48>] console_unlock+0x700/0xcc0
[ 29.796424][ T245] hardirqs last disabled at (1502): [<ffff200010082110>] do_debug_exception+0x118/0x438
[ 29.798605][ T245] softirqs last enabled at (1498): [<ffff200010083574>] __do_softirq+0xbc4/0x10c8
[ 29.800639][ T245] softirqs last disabled at (1443): [<ffff20001019f96c>] irq_exit+0x2c4/0x338
[ 29.802581][ T245] ---[ end trace cd8baed7622b7c8b ]---
[ 29.804034][ T245] VFS: Busy inodes after unmount of bpf. Self-destruct in 5 seconds. Have a nice day...
----

[1] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git fuzzing/5.2-rc3


2019-06-10 18:31:26

by Eric Biggers

[permalink] [raw]
Subject: Re: "Dentry still in use" splats in v5.2-rc3

On Wed, Jun 05, 2019 at 02:54:01PM +0100, Mark Rutland wrote:
> Hi All,
>
> While fuzzing arm64 v5.2-rc3, Syzkaller started hitting splats of the
> form:
>
> BUG: Dentry (____ptrval____){i=1,n=/} still in use (2) [unmount of bpf bpf]
>
> ... which I can reliably reproduce with the following C program
> (partially minimized from what Syzkaller auto-generated).
>
> It looks like any filesystem will do. I've seen splats with "bpf",
> "hugetlbfs", "rpc_pipefs", and "tmpfs", and can reproduce the problem
> with any of these.
>
> Any ideas?
>
> I'm using the config from my fuzzing/5.2-rc3 branch on kernel.org [1].
>
> Thanks,
> Mark.
>
> ----
> #include <unistd.h>
> #include <sys/syscall.h>
>
> /*
> * NOTE: these are the arm64 numbers
> */
> #ifndef __NR_fsconfig
> #define __NR_fsconfig 431
> #endif
> #ifndef __NR_fsmount
> #define __NR_fsmount 432
> #endif
> #ifndef __NR_fsopen
> #define __NR_fsopen 430
> #endif
>
> int main(void)
> {
> int fs, mnt;
>
> fs = syscall(__NR_fsopen, "bpf", 0);
> syscall(__NR_fsconfig, fs, 6, 0, 0, 0);
> mnt = syscall(__NR_fsmount, fs, 0, 0);
> fchdir(mnt);
>
> close(fs);
> close(mnt);
> }
>

David and Al, is sys_fsmount() missing a call to mntget()?

- Eric

2019-06-12 18:44:34

by Eric Biggers

[permalink] [raw]
Subject: [PATCH] vfs: fsmount: add missing mntget()

From: Eric Biggers <[email protected]>

sys_fsmount() needs to take a reference to the new mount when adding it
to the anonymous mount namespace. Otherwise the filesystem can be
unmounted while it's still in use, as found by syzkaller.

Reported-by: Mark Rutland <[email protected]>
Reported-by: [email protected]
Reported-by: [email protected]
Fixes: 93766fbd2696 ("vfs: syscall: Add fsmount() to create a mount for a superblock")
Signed-off-by: Eric Biggers <[email protected]>
---
fs/namespace.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index b26778bdc236e..5dc137a22d406 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3445,6 +3445,7 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
ns->root = mnt;
ns->mounts = 1;
list_add(&mnt->mnt_list, &ns->list);
+ mntget(newmount.mnt);

/* Attach to an apparent O_PATH fd with a note that we need to unmount
* it, not just simply put it.
--
2.22.0.rc2.383.gf4fbbf30c2-goog

2019-06-13 15:55:37

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH] vfs: fsmount: add missing mntget()

On Wed, Jun 12, 2019 at 11:43:13AM -0700, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> sys_fsmount() needs to take a reference to the new mount when adding it
> to the anonymous mount namespace. Otherwise the filesystem can be
> unmounted while it's still in use, as found by syzkaller.
>
> Reported-by: Mark Rutland <[email protected]>
> Reported-by: [email protected]
> Reported-by: [email protected]
> Fixes: 93766fbd2696 ("vfs: syscall: Add fsmount() to create a mount for a superblock")
> Signed-off-by: Eric Biggers <[email protected]>

With this patch applied, my reproducer from [1] no longer triggers the
issue. I polled /proc/meminfo, and don't see any leak. FWIW:

Tested-by: Mark Rutland <[email protected]>

Thanks for fixing this!

Mark.

[1] https://lore.kernel.org/lkml/[email protected]/

> ---
> fs/namespace.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index b26778bdc236e..5dc137a22d406 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3445,6 +3445,7 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
> ns->root = mnt;
> ns->mounts = 1;
> list_add(&mnt->mnt_list, &ns->list);
> + mntget(newmount.mnt);
>
> /* Attach to an apparent O_PATH fd with a note that we need to unmount
> * it, not just simply put it.
> --
> 2.22.0.rc2.383.gf4fbbf30c2-goog
>

2019-06-13 16:05:39

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] vfs: fsmount: add missing mntget()

On Wed, Jun 12, 2019 at 11:43:13AM -0700, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> sys_fsmount() needs to take a reference to the new mount when adding it
> to the anonymous mount namespace. Otherwise the filesystem can be
> unmounted while it's still in use, as found by syzkaller.

So it needs one count for the file (which dentry_open() obtains) and one for the
attachment into the anonymous namespace. The latter one is dropped at cleanup
time, so your patch appears to be correct at getting that ref.

I wonder why such a blatant use-after-free was missed in normal testing. RCU
delayed freeing, I guess?

How about this additional sanity checking patch?

Thanks,
Miklos


diff --git a/fs/namespace.c b/fs/namespace.c
index b26778bdc236..c638f220805a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -153,10 +153,10 @@ static inline void mnt_add_count(struct mount *mnt, int n)
/*
* vfsmount lock must be held for write
*/
-unsigned int mnt_get_count(struct mount *mnt)
+int mnt_get_count(struct mount *mnt)
{
#ifdef CONFIG_SMP
- unsigned int count = 0;
+ int count = 0;
int cpu;

for_each_possible_cpu(cpu) {
@@ -1140,6 +1140,8 @@ static DECLARE_DELAYED_WORK(delayed_mntput_work, delayed_mntput);

static void mntput_no_expire(struct mount *mnt)
{
+ int count;
+
rcu_read_lock();
if (likely(READ_ONCE(mnt->mnt_ns))) {
/*
@@ -1162,11 +1164,13 @@ static void mntput_no_expire(struct mount *mnt)
*/
smp_mb();
mnt_add_count(mnt, -1);
- if (mnt_get_count(mnt)) {
+ count = mnt_get_count(mnt);
+ if (count > 0) {
rcu_read_unlock();
unlock_mount_hash();
return;
}
+ WARN_ON(count < 0);
if (unlikely(mnt->mnt.mnt_flags & MNT_DOOMED)) {
rcu_read_unlock();
unlock_mount_hash();
diff --git a/fs/pnode.h b/fs/pnode.h
index 49a058c73e4c..26f74e092bd9 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -44,7 +44,7 @@ int propagate_mount_busy(struct mount *, int);
void propagate_mount_unlock(struct mount *);
void mnt_release_group_id(struct mount *);
int get_dominating_id(struct mount *mnt, const struct path *root);
-unsigned int mnt_get_count(struct mount *mnt);
+int mnt_get_count(struct mount *mnt);
void mnt_set_mountpoint(struct mount *, struct mountpoint *,
struct mount *);
void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp,

2019-06-13 16:41:30

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH] vfs: fsmount: add missing mntget()

On Thu, Jun 13, 2019 at 10:47:28AM +0200, Miklos Szeredi wrote:
> On Wed, Jun 12, 2019 at 11:43:13AM -0700, Eric Biggers wrote:
> > From: Eric Biggers <[email protected]>
> >
> > sys_fsmount() needs to take a reference to the new mount when adding it
> > to the anonymous mount namespace. Otherwise the filesystem can be
> > unmounted while it's still in use, as found by syzkaller.
>
> So it needs one count for the file (which dentry_open() obtains) and one for the
> attachment into the anonymous namespace. The latter one is dropped at cleanup
> time, so your patch appears to be correct at getting that ref.

Yes.

>
> I wonder why such a blatant use-after-free was missed in normal testing. RCU
> delayed freeing, I guess?

It's because mount freeing is delayed by task_work_add(), so normally the refcnt
would be dropped to -1 when the file is closed without problems. The problems
only showed up if you took another reference, e.g. by fchdir().

>
> How about this additional sanity checking patch?

Seems like a good idea. Without my fix, the WARNING is triggered by the
following program (no fchdir() needed):

int main(void)
{
int fs;

fs = syscall(__NR_fsopen, "ramfs", 0);
syscall(__NR_fsconfig, fs, 6, 0, 0, 0);
syscall(__NR_fsmount, fs, 0, 0);
}

Can you send it as a formal patch?

- Eric

2019-06-17 21:35:14

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] vfs: fsmount: add missing mntget()

On Wed, Jun 12, 2019 at 11:43:13AM -0700, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> sys_fsmount() needs to take a reference to the new mount when adding it
> to the anonymous mount namespace. Otherwise the filesystem can be
> unmounted while it's still in use, as found by syzkaller.
>
> Reported-by: Mark Rutland <[email protected]>
> Reported-by: [email protected]
> Reported-by: [email protected]
> Fixes: 93766fbd2696 ("vfs: syscall: Add fsmount() to create a mount for a superblock")
> Signed-off-by: Eric Biggers <[email protected]>
> ---
> fs/namespace.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index b26778bdc236e..5dc137a22d406 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3445,6 +3445,7 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
> ns->root = mnt;
> ns->mounts = 1;
> list_add(&mnt->mnt_list, &ns->list);
> + mntget(newmount.mnt);
>
> /* Attach to an apparent O_PATH fd with a note that we need to unmount
> * it, not just simply put it.

Nice catch... Applied, with apologies for having been MIA lately.

2019-07-09 23:02:03

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH] vfs: fsmount: add missing mntget()

On Thu, Jun 13, 2019 at 10:47:28AM +0200, Miklos Szeredi wrote:
> On Wed, Jun 12, 2019 at 11:43:13AM -0700, Eric Biggers wrote:
> > From: Eric Biggers <[email protected]>
> >
> > sys_fsmount() needs to take a reference to the new mount when adding it
> > to the anonymous mount namespace. Otherwise the filesystem can be
> > unmounted while it's still in use, as found by syzkaller.
>
> So it needs one count for the file (which dentry_open() obtains) and one for the
> attachment into the anonymous namespace. The latter one is dropped at cleanup
> time, so your patch appears to be correct at getting that ref.
>
> I wonder why such a blatant use-after-free was missed in normal testing. RCU
> delayed freeing, I guess?
>
> How about this additional sanity checking patch?
>
> Thanks,
> Miklos
>
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index b26778bdc236..c638f220805a 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -153,10 +153,10 @@ static inline void mnt_add_count(struct mount *mnt, int n)
> /*
> * vfsmount lock must be held for write
> */
> -unsigned int mnt_get_count(struct mount *mnt)
> +int mnt_get_count(struct mount *mnt)
> {
> #ifdef CONFIG_SMP
> - unsigned int count = 0;
> + int count = 0;
> int cpu;
>
> for_each_possible_cpu(cpu) {
> @@ -1140,6 +1140,8 @@ static DECLARE_DELAYED_WORK(delayed_mntput_work, delayed_mntput);
>
> static void mntput_no_expire(struct mount *mnt)
> {
> + int count;
> +
> rcu_read_lock();
> if (likely(READ_ONCE(mnt->mnt_ns))) {
> /*
> @@ -1162,11 +1164,13 @@ static void mntput_no_expire(struct mount *mnt)
> */
> smp_mb();
> mnt_add_count(mnt, -1);
> - if (mnt_get_count(mnt)) {
> + count = mnt_get_count(mnt);
> + if (count > 0) {
> rcu_read_unlock();
> unlock_mount_hash();
> return;
> }
> + WARN_ON(count < 0);
> if (unlikely(mnt->mnt.mnt_flags & MNT_DOOMED)) {
> rcu_read_unlock();
> unlock_mount_hash();
> diff --git a/fs/pnode.h b/fs/pnode.h
> index 49a058c73e4c..26f74e092bd9 100644
> --- a/fs/pnode.h
> +++ b/fs/pnode.h
> @@ -44,7 +44,7 @@ int propagate_mount_busy(struct mount *, int);
> void propagate_mount_unlock(struct mount *);
> void mnt_release_group_id(struct mount *);
> int get_dominating_id(struct mount *mnt, const struct path *root);
> -unsigned int mnt_get_count(struct mount *mnt);
> +int mnt_get_count(struct mount *mnt);
> void mnt_set_mountpoint(struct mount *, struct mountpoint *,
> struct mount *);
> void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp,

Miklos, are you planning to send this as a formal patch?

- Eric

2019-07-10 00:34:30

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] vfs: fsmount: add missing mntget()

On Tue, Jul 09, 2019 at 04:00:29PM -0700, Eric Biggers wrote:

> > index 49a058c73e4c..26f74e092bd9 100644
> > --- a/fs/pnode.h
> > +++ b/fs/pnode.h
> > @@ -44,7 +44,7 @@ int propagate_mount_busy(struct mount *, int);
> > void propagate_mount_unlock(struct mount *);
> > void mnt_release_group_id(struct mount *);
> > int get_dominating_id(struct mount *mnt, const struct path *root);
> > -unsigned int mnt_get_count(struct mount *mnt);
> > +int mnt_get_count(struct mount *mnt);
> > void mnt_set_mountpoint(struct mount *, struct mountpoint *,
> > struct mount *);
> > void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp,
>
> Miklos, are you planning to send this as a formal patch?

Hold it for a while, OK? There's an unpleasant issue (a very long-standing
one) with boxen that have an obscene amount of RAM. Some of the counters
involved will need to become long. This is the coming cycle fodder (mounts
and inodes are relatively easy; it's dentry->d_count that brings arseloads
of fun) and I'd rather deal with that sanity check as part of the same series.
It's not forgotten... Patch series re limiting the number of negative
dentries is also getting into the same mix. Watch #work.dcache - what's
in there is basically prep work for the big pile for the next cycle; it'll
be interesting...

2019-10-16 07:50:23

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH] vfs: fsmount: add missing mntget()

On Wed, Jul 10, 2019 at 01:31:13AM +0100, Al Viro wrote:
> On Tue, Jul 09, 2019 at 04:00:29PM -0700, Eric Biggers wrote:
>
> > > index 49a058c73e4c..26f74e092bd9 100644
> > > --- a/fs/pnode.h
> > > +++ b/fs/pnode.h
> > > @@ -44,7 +44,7 @@ int propagate_mount_busy(struct mount *, int);
> > > void propagate_mount_unlock(struct mount *);
> > > void mnt_release_group_id(struct mount *);
> > > int get_dominating_id(struct mount *mnt, const struct path *root);
> > > -unsigned int mnt_get_count(struct mount *mnt);
> > > +int mnt_get_count(struct mount *mnt);
> > > void mnt_set_mountpoint(struct mount *, struct mountpoint *,
> > > struct mount *);
> > > void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp,
> >
> > Miklos, are you planning to send this as a formal patch?
>
> Hold it for a while, OK? There's an unpleasant issue (a very long-standing
> one) with boxen that have an obscene amount of RAM. Some of the counters
> involved will need to become long. This is the coming cycle fodder (mounts
> and inodes are relatively easy; it's dentry->d_count that brings arseloads
> of fun) and I'd rather deal with that sanity check as part of the same series.
> It's not forgotten... Patch series re limiting the number of negative
> dentries is also getting into the same mix. Watch #work.dcache - what's
> in there is basically prep work for the big pile for the next cycle; it'll
> be interesting...

Al, whatever happened to the refcounting patches you mentioned here?

- Eric