2021-04-28 17:46:22

by Pavel Skripkin

[permalink] [raw]
Subject: [PATCH] ext4: fix memory leak in ext4_fill_super

syzbot reported memory leak in ext4 subsyetem.
The problem appears, when thread_stop() call happens
before wake_up_process().

Normally, this data will be freed by
created thread, but if kthread_stop()
returned -EINTR, this data should be freed manually

Reported-by: [email protected]
Tested-by: [email protected]
Signed-off-by: Pavel Skripkin <[email protected]>
---
fs/ext4/super.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b9693680463a..9c33e97bd5c5 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5156,8 +5156,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
failed_mount3:
flush_work(&sbi->s_error_work);
del_timer_sync(&sbi->s_err_report);
- if (sbi->s_mmp_tsk)
- kthread_stop(sbi->s_mmp_tsk);
+ if (sbi->s_mmp_tsk) {
+ if (kthread_stop(sbi->s_mmp_tsk) == -EINTR)
+ kfree(kthread_data(sbi->s_mmp_tsk));
+ }
failed_mount2:
rcu_read_lock();
group_desc = rcu_dereference(sbi->s_group_desc);
--
2.31.1


2021-04-29 11:09:12

by Pavel Skripkin

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix memory leak in ext4_fill_super

On Thu, 29 Apr 2021 12:01:46 +0200
Vegard Nossum <[email protected]> wrote:

>
> On 2021-04-28 19:28, Pavel Skripkin wrote:
> > syzbot reported memory leak in ext4 subsyetem.
> > The problem appears, when thread_stop() call happens
> > before wake_up_process().
> >
> > Normally, this data will be freed by
> > created thread, but if kthread_stop()
> > returned -EINTR, this data should be freed manually
> >
> > Reported-by: [email protected]
> > Tested-by: [email protected]
> > Signed-off-by: Pavel Skripkin <[email protected]>
> > ---
> > fs/ext4/super.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index b9693680463a..9c33e97bd5c5 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -5156,8 +5156,10 @@ static int ext4_fill_super(struct
> > super_block *sb, void *data, int silent) failed_mount3:
> > flush_work(&sbi->s_error_work);
> > del_timer_sync(&sbi->s_err_report);
> > - if (sbi->s_mmp_tsk)
> > - kthread_stop(sbi->s_mmp_tsk);
> > + if (sbi->s_mmp_tsk) {
> > + if (kthread_stop(sbi->s_mmp_tsk) == -EINTR)
> > + kfree(kthread_data(sbi->s_mmp_tsk));
> > + }
> > failed_mount2:
> > rcu_read_lock();
> > group_desc = rcu_dereference(sbi->s_group_desc);
> >
>
> So I've looked at this, and the puzzling thing is that ext4 uses
> kthread_run() which immediately calls wake_up_process() -- according
> to the kerneldoc for kthread_stop(), it shouldn't return -EINTR in
> this case:
>
> * Returns the result of threadfn(), or %-EINTR if wake_up_process()
> * was never called.
> */
> int kthread_stop(struct task_struct *k)
>
> So it really looks like kthread_stop() can return -EINTR even when
> wake_up_process() has been called but the thread hasn't had a chance
> to run yet?
>
> If this is true, then we either have to fix kthread_create() to make
> sure it respects the behaviour that is claimed by the comment OR we
> have to audit every single kthread_stop() in the kernel which does
> not check for -EINTR.
>

Me and Vegard found the root case of this bug:

static int kthread(void *_create)
{
....
ret = -EINTR;
if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
cgroup_kthread_ready();
__kthread_parkme(self);
ret = threadfn(data);
}

do_exit(ret);
}

There is a change, that kthread_stop() call will happen before this
. It means, that all kthread_stop() return value must be checked
everywhere

Vegard wrote code snippet, which reproduces this behavior:

#include <linux/printk.h>
#include <linux/proc_fs.h>
#include <linux/kthread.h>

static int test_thread(void *data)
{
printk(KERN_ERR "test_thread()\n");
return 0;
}

static int test_show(struct seq_file *seq, void *data)
{
struct task_struct *t = kthread_run(test_thread, NULL, "test");
if (!IS_ERR(t)) {
int ret = kthread_stop(t);
printk(KERN_ERR "kthread_stop() = %d\n", ret);
}

return 0;
}

static void __init init_test(void)
{
proc_create_single("test", 0444, NULL, &test_show);
}

late_initcall(init_test);


>
> Vegard



With regards,
Pavel Skripkin

2021-04-29 11:37:48

by Pavel Skripkin

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix memory leak in ext4_fill_super

On Thu, 29 Apr 2021 12:01:46 +0200
Vegard Nossum <[email protected]> wrote:

>
> On 2021-04-28 19:28, Pavel Skripkin wrote:
> > syzbot reported memory leak in ext4 subsyetem.
> > The problem appears, when thread_stop() call happens
> > before wake_up_process().
> >
> > Normally, this data will be freed by
> > created thread, but if kthread_stop()
> > returned -EINTR, this data should be freed manually
> >
> > Reported-by: [email protected]
> > Tested-by: [email protected]
> > Signed-off-by: Pavel Skripkin <[email protected]>
> > ---
> > fs/ext4/super.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index b9693680463a..9c33e97bd5c5 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -5156,8 +5156,10 @@ static int ext4_fill_super(struct
> > super_block *sb, void *data, int silent) failed_mount3:
> > flush_work(&sbi->s_error_work);
> > del_timer_sync(&sbi->s_err_report);
> > - if (sbi->s_mmp_tsk)
> > - kthread_stop(sbi->s_mmp_tsk);
> > + if (sbi->s_mmp_tsk) {
> > + if (kthread_stop(sbi->s_mmp_tsk) == -EINTR)
> > + kfree(kthread_data(sbi->s_mmp_tsk));
> > + }
> > failed_mount2:
> > rcu_read_lock();
> > group_desc = rcu_dereference(sbi->s_group_desc);
> >
>
> So I've looked at this, and the puzzling thing is that ext4 uses
> kthread_run() which immediately calls wake_up_process() -- according
> to the kerneldoc for kthread_stop(), it shouldn't return -EINTR in
> this case:
>
> * Returns the result of threadfn(), or %-EINTR if wake_up_process()
> * was never called.
> */
> int kthread_stop(struct task_struct *k)
>
> So it really looks like kthread_stop() can return -EINTR even when
> wake_up_process() has been called but the thread hasn't had a chance
> to run yet?
>
> If this is true, then we either have to fix kthread_create() to make
> sure it respects the behaviour that is claimed by the comment OR we
> have to audit every single kthread_stop() in the kernel which does
> not check for -EINTR.
>
>
> Vegard

I am sorry for my complitely broken mail client :(

Me and Vegard found the root case of this bug:

static int kthread(void *_create)
{
....
ret = -EINTR;
if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
cgroup_kthread_ready();
__kthread_parkme(self);
ret = threadfn(data);
}

do_exit(ret);
}

There is a chance, that kthread_stop() call will happen before
threadfn call. It means, that kthread_stop() return value must be checked everywhere,
isn't it? Otherwise, there are a lot of potential memory leaks,
because some developers rely on the fact, that data allocated for the thread will
be freed _inside_ thread function.

Vegard wrote the code snippet, which reproduces this behavior:

#include <linux/printk.h>
#include <linux/proc_fs.h>
#include <linux/kthread.h>

static int test_thread(void *data)
{
printk(KERN_ERR "test_thread()\n");
return 0;
}

static int test_show(struct seq_file *seq, void *data)
{
struct task_struct *t = kthread_run(test_thread, NULL, "test");
if (!IS_ERR(t)) {
int ret = kthread_stop(t);
printk(KERN_ERR "kthread_stop() = %d\n", ret);
}

return 0;
}

static void __init init_test(void)
{
proc_create_single("test", 0444, NULL, &test_show);
}

late_initcall(init_test);

So, is this behavior is expected or not? Should maintainers rewrite
code, which doesn't check kthread_stop() return value?


With regards,
Pavel Skripkin

2021-04-29 17:06:09

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix memory leak in ext4_fill_super

On Thu, Apr 29, 2021 at 02:33:54PM +0300, Pavel Skripkin wrote:
>
> There is a chance, that kthread_stop() call will happen before
> threadfn call. It means, that kthread_stop() return value must be checked everywhere,
> isn't it? Otherwise, there are a lot of potential memory leaks,
> because some developers rely on the fact, that data allocated for the thread will
> be freed _inside_ thread function.

That's not the only potential way that we could leak memory. Earlier
in kthread(), if this memory allocation fails,

self = kzalloc(sizeof(*self), GFP_KERNEL);

we will exit with -ENOMEM. So at the very least all callers of
kthread_stop() also need to check for -ENOMEM as well as -EINTR ---
or, be somehow sure that the thread function was successfully called
and started. In this particular case, the ext4 mount code had just
started the kmmpd thread, and then detected that something else had
gone wrong, and failed the mount before the kmmpd thread ever had a
chance to run.

I think if we want to fix this more generally across the whole kernel,
we would need to have a variant of kthread_run which supplies two
functions --- one which is the thread function, and the other which is
a cleanup function. The cleanup function could just be kfree, but
there will be other cases where the cleanup function will need to do
other work before freeing the data structure (e.g., brelse((struct
mmpd_data *)data->bh)).

Is it worth it to provide such a cleanup function, which if present
would be called any time the thread exits or is killed? I dunno.
It's probably simpler to just strongly recommend that the cleanup work
should never be done in the thread function, but after kthread_stop()
is called, whether it returns an error or not. That's probably the
right fix for ext4, I think.

(Although note that kthread_stop(sbi->s_mmp_task) is called in
multiple places in fs/ext4/super.c, not just in the single location
which this patch touches.)

- Ted

2021-04-29 20:40:39

by Pavel Skripkin

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix memory leak in ext4_fill_super

On Thu, 29 Apr 2021 13:05:01 -0400
"Theodore Ts'o" <[email protected]> wrote:

> On Thu, Apr 29, 2021 at 02:33:54PM +0300, Pavel Skripkin wrote:
> >
> > There is a chance, that kthread_stop() call will happen before
> > threadfn call. It means, that kthread_stop() return value must be
> > checked everywhere, isn't it? Otherwise, there are a lot of
> > potential memory leaks, because some developers rely on the fact,
> > that data allocated for the thread will be freed _inside_ thread
> > function.
>
> That's not the only potential way that we could leak memory. Earlier
> in kthread(), if this memory allocation fails,
>
> self = kzalloc(sizeof(*self), GFP_KERNEL);
>
> we will exit with -ENOMEM. So at the very least all callers of
> kthread_stop() also need to check for -ENOMEM as well as -EINTR ---
> or, be somehow sure that the thread function was successfully called
> and started. In this particular case, the ext4 mount code had just
> started the kmmpd thread, and then detected that something else had
> gone wrong, and failed the mount before the kmmpd thread ever had a
> chance to run.

There is a small problem about -ENOMEM:

static int kmmpd(void *data)
{
...
retval = read_mmp_block(sb, &bh_check, mmp_block);
if (retval) {
ext4_error_err(sb, -retval,
"error reading MMP data: %d",
retval);
goto exit_thread;
}
...

exit_thread:
EXT4_SB(sb)->s_mmp_tsk = NULL;
kfree(data);
brelse(bh);
return retval;
}

read_mmp_block can return -ENOMEM. In this case double free will happen.
I believe, we can change `return retval;` to `return 0;`, because
kthread return value isn't checked anywhere.

What do You think about it?


With regards,
Pavel Skripkin

2021-04-29 21:41:55

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix memory leak in ext4_fill_super

On Thu, Apr 29, 2021 at 11:09:56PM +0300, Pavel Skripkin wrote:
> > we will exit with -ENOMEM. So at the very least all callers of
> > kthread_stop() also need to check for -ENOMEM as well as -EINTR ---
> > or, be somehow sure that the thread function was successfully called
> > and started. In this particular case, the ext4 mount code had just
> > started the kmmpd thread, and then detected that something else had
> > gone wrong, and failed the mount before the kmmpd thread ever had a
> > chance to run.
>
> There is a small problem about -ENOMEM...

What I'd suggest is that we simply move

> exit_thread:
> EXT4_SB(sb)->s_mmp_tsk = NULL;
> kfree(data);
> brelse(bh);
> return retval;
> }

out of the thread function. That means hanging struct mmpd_data off
the struct ext4_sb_info structure, and then adding a function like
this to fs/ext4/mmp.c

static void ext4_stop_mmpd(struct ext4_sb_info *sbi)
{
if (sbi->s_mmp_tsk) {
kthread_stop(sbi->s_mmp_tsk);
brelse(sbi->s_mmp_data->bh);
kfree(sbi->s_mmp_data);
sbi->s_mmp_data = NULL;
sbi->s_mmp_tsk = NULL;
}
}

Basically, just move all of the cleanup so it is done after the
kthread is stopped, so we don't have to do any fancy error checking.
We just do it unconditionally.

- Ted

P.S. Actually, we could drop the struct mmpd_data altogether, and
just hang the buffer head as sbi->s_mmp_bh. Then we can just pass the
struct super * as the private pointer to kmmpd().

2021-04-29 22:07:08

by Pavel Skripkin

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix memory leak in ext4_fill_super

On Thu, 29 Apr 2021 17:41:12 -0400
"Theodore Ts'o" <[email protected]> wrote:

> On Thu, Apr 29, 2021 at 11:09:56PM +0300, Pavel Skripkin wrote:
> > > we will exit with -ENOMEM. So at the very least all callers of
> > > kthread_stop() also need to check for -ENOMEM as well as -EINTR
> > > --- or, be somehow sure that the thread function was successfully
> > > called and started. In this particular case, the ext4 mount code
> > > had just started the kmmpd thread, and then detected that
> > > something else had gone wrong, and failed the mount before the
> > > kmmpd thread ever had a chance to run.
> >
> > There is a small problem about -ENOMEM...
>
> What I'd suggest is that we simply move
>
> > exit_thread:
> > EXT4_SB(sb)->s_mmp_tsk = NULL;
> > kfree(data);
> > brelse(bh);
> > return retval;
> > }
>
> out of the thread function. That means hanging struct mmpd_data off
> the struct ext4_sb_info structure, and then adding a function like
> this to fs/ext4/mmp.c
>
> static void ext4_stop_mmpd(struct ext4_sb_info *sbi)
> {
> if (sbi->s_mmp_tsk) {
> kthread_stop(sbi->s_mmp_tsk);
> brelse(sbi->s_mmp_data->bh);
> kfree(sbi->s_mmp_data);
> sbi->s_mmp_data = NULL;
> sbi->s_mmp_tsk = NULL;
> }
> }
>
> Basically, just move all of the cleanup so it is done after the
> kthread is stopped, so we don't have to do any fancy error checking.
> We just do it unconditionally.

This sound much better than my idea. Will do it in v2.

Thanks!

>
> - Ted
>
> P.S. Actually, we could drop the struct mmpd_data altogether, and
> just hang the buffer head as sbi->s_mmp_bh. Then we can just pass the
> struct super * as the private pointer to kmmpd().
>



With regards,
Pavel Skripkin

2021-04-30 03:45:40

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix memory leak in ext4_fill_super

On Fri, Apr 30, 2021 at 01:05:47AM +0300, Pavel Skripkin wrote:
> > out of the thread function. That means hanging struct mmpd_data off
> > the struct ext4_sb_info structure, and then adding a function like
> > this to fs/ext4/mmp.c
> >
> > static void ext4_stop_mmpd(struct ext4_sb_info *sbi)

That should "extern void ...", since it will be called from
fs/ext4/super.c. I had originally was thinking to put this function
in fs/ext4/super.c, but from the perspective of keeping the MMP code
in the same source file, it probably makes sense to keep this function
with the other MMP functions.

> > {
> > if (sbi->s_mmp_tsk) {
> > kthread_stop(sbi->s_mmp_tsk);
> > brelse(sbi->s_mmp_data->bh);
> > kfree(sbi->s_mmp_data);
> > sbi->s_mmp_data = NULL;
> > sbi->s_mmp_tsk = NULL;
> > }
> > }
> >
> > Basically, just move all of the cleanup so it is done after the
> > kthread is stopped, so we don't have to do any fancy error checking.
> > We just do it unconditionally.

Cheers,

- Ted

2021-04-30 18:51:30

by Pavel Skripkin

[permalink] [raw]
Subject: [PATCH v2] ext4: fix memory leak in ext4_fill_super

static int kthread(void *_create) will return -ENOMEM
or -EINTR in case of internal failure or
kthread_stop() call happens before threadfn call.

To prevent fancy error checking and make code
more straightforward we moved all cleanup code out
of kmmpd threadfn.

Also, dropped struct mmpd_data at all. Now struct super_block
is a threadfn data and struct buffer_head embedded into
struct ext4_sb_info.

Reported-by: [email protected]
Signed-off-by: Pavel Skripkin <[email protected]>
---
fs/ext4/ext4.h | 4 ++++
fs/ext4/mmp.c | 28 +++++++++++++---------------
fs/ext4/super.c | 10 ++++------
3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 826a56e3bbd2..62210cbea84b 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1490,6 +1490,7 @@ struct ext4_sb_info {
struct kobject s_kobj;
struct completion s_kobj_unregister;
struct super_block *s_sb;
+ struct buffer_head *s_mmp_bh;

/* Journaling */
struct journal_s *s_journal;
@@ -3663,6 +3664,9 @@ extern struct ext4_io_end_vec *ext4_last_io_end_vec(ext4_io_end_t *io_end);
/* mmp.c */
extern int ext4_multi_mount_protect(struct super_block *, ext4_fsblk_t);

+/* mmp.c */
+extern void ext4_stop_mmpd(struct ext4_sb_info *sbi);
+
/* verity.c */
extern const struct fsverity_operations ext4_verityops;

diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index 795c3ff2907c..623bad399612 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -127,9 +127,9 @@ void __dump_mmp_msg(struct super_block *sb, struct mmp_struct *mmp,
*/
static int kmmpd(void *data)
{
- struct super_block *sb = ((struct mmpd_data *) data)->sb;
- struct buffer_head *bh = ((struct mmpd_data *) data)->bh;
+ struct super_block *sb = (struct super_block *) data;
struct ext4_super_block *es = EXT4_SB(sb)->s_es;
+ struct buffer_head *bh = EXT4_SB(sb)->s_mmp_bh;
struct mmp_struct *mmp;
ext4_fsblk_t mmp_block;
u32 seq = 0;
@@ -245,12 +245,18 @@ static int kmmpd(void *data)
retval = write_mmp_block(sb, bh);

exit_thread:
- EXT4_SB(sb)->s_mmp_tsk = NULL;
- kfree(data);
- brelse(bh);
return retval;
}

+void ext4_stop_mmpd(struct ext4_sb_info *sbi)
+{
+ if (sbi->s_mmp_tsk) {
+ kthread_stop(sbi->s_mmp_tsk);
+ brelse(sbi->s_mmp_bh);
+ sbi->s_mmp_tsk = NULL;
+ }
+}
+
/*
* Get a random new sequence number but make sure it is not greater than
* EXT4_MMP_SEQ_MAX.
@@ -275,7 +281,6 @@ int ext4_multi_mount_protect(struct super_block *sb,
struct ext4_super_block *es = EXT4_SB(sb)->s_es;
struct buffer_head *bh = NULL;
struct mmp_struct *mmp = NULL;
- struct mmpd_data *mmpd_data;
u32 seq;
unsigned int mmp_check_interval = le16_to_cpu(es->s_mmp_update_interval);
unsigned int wait_time = 0;
@@ -364,24 +369,17 @@ int ext4_multi_mount_protect(struct super_block *sb,
goto failed;
}

- mmpd_data = kmalloc(sizeof(*mmpd_data), GFP_KERNEL);
- if (!mmpd_data) {
- ext4_warning(sb, "not enough memory for mmpd_data");
- goto failed;
- }
- mmpd_data->sb = sb;
- mmpd_data->bh = bh;
+ EXT4_SB(sb)->s_mmp_bh = bh;

/*
* Start a kernel thread to update the MMP block periodically.
*/
- EXT4_SB(sb)->s_mmp_tsk = kthread_run(kmmpd, mmpd_data, "kmmpd-%.*s",
+ EXT4_SB(sb)->s_mmp_tsk = kthread_run(kmmpd, sb, "kmmpd-%.*s",
(int)sizeof(mmp->mmp_bdevname),
bdevname(bh->b_bdev,
mmp->mmp_bdevname));
if (IS_ERR(EXT4_SB(sb)->s_mmp_tsk)) {
EXT4_SB(sb)->s_mmp_tsk = NULL;
- kfree(mmpd_data);
ext4_warning(sb, "Unable to create kmmpd thread for %s.",
sb->s_id);
goto failed;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b9693680463a..539f89c5431f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1244,8 +1244,8 @@ static void ext4_put_super(struct super_block *sb)
ext4_xattr_destroy_cache(sbi->s_ea_block_cache);
sbi->s_ea_block_cache = NULL;

- if (sbi->s_mmp_tsk)
- kthread_stop(sbi->s_mmp_tsk);
+ ext4_stop_mmpd(sbi);
+
brelse(sbi->s_sbh);
sb->s_fs_info = NULL;
/*
@@ -5156,8 +5156,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
failed_mount3:
flush_work(&sbi->s_error_work);
del_timer_sync(&sbi->s_err_report);
- if (sbi->s_mmp_tsk)
- kthread_stop(sbi->s_mmp_tsk);
+ ext4_stop_mmpd(sbi);
failed_mount2:
rcu_read_lock();
group_desc = rcu_dereference(sbi->s_group_desc);
@@ -5952,8 +5951,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
*/
ext4_mark_recovery_complete(sb, es);
}
- if (sbi->s_mmp_tsk)
- kthread_stop(sbi->s_mmp_tsk);
+ ext4_stop_mmpd(sbi);
} else {
/* Make sure we can mount this feature set readwrite */
if (ext4_has_feature_readonly(sb) ||
--
2.31.1

2021-05-17 14:19:47

by Pavel Skripkin

[permalink] [raw]
Subject: Re: [PATCH v2] ext4: fix memory leak in ext4_fill_super

Hi!

Is all ok with this one, or I should send v3? :)

With regards,
Pavel Skripkin

On Fri, 30 Apr 2021 21:50:46 +0300
Pavel Skripkin <[email protected]> wrote:
> static int kthread(void *_create) will return -ENOMEM
> or -EINTR in case of internal failure or
> kthread_stop() call happens before threadfn call.
>
> To prevent fancy error checking and make code
> more straightforward we moved all cleanup code out
> of kmmpd threadfn.
>
> Also, dropped struct mmpd_data at all. Now struct super_block
> is a threadfn data and struct buffer_head embedded into
> struct ext4_sb_info.
>
> Reported-by: [email protected]
> Signed-off-by: Pavel Skripkin <[email protected]>
> ---
> fs/ext4/ext4.h | 4 ++++
> fs/ext4/mmp.c | 28 +++++++++++++---------------
> fs/ext4/super.c | 10 ++++------
> 3 files changed, 21 insertions(+), 21 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 826a56e3bbd2..62210cbea84b 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1490,6 +1490,7 @@ struct ext4_sb_info {
> struct kobject s_kobj;
> struct completion s_kobj_unregister;
> struct super_block *s_sb;
> + struct buffer_head *s_mmp_bh;
>
> /* Journaling */
> struct journal_s *s_journal;
> @@ -3663,6 +3664,9 @@ extern struct ext4_io_end_vec
> *ext4_last_io_end_vec(ext4_io_end_t *io_end); /* mmp.c */
> extern int ext4_multi_mount_protect(struct super_block *,
> ext4_fsblk_t);
> +/* mmp.c */
> +extern void ext4_stop_mmpd(struct ext4_sb_info *sbi);
> +
> /* verity.c */
> extern const struct fsverity_operations ext4_verityops;
>
> diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
> index 795c3ff2907c..623bad399612 100644
> --- a/fs/ext4/mmp.c
> +++ b/fs/ext4/mmp.c
> @@ -127,9 +127,9 @@ void __dump_mmp_msg(struct super_block *sb,
> struct mmp_struct *mmp, */
> static int kmmpd(void *data)
> {
> - struct super_block *sb = ((struct mmpd_data *) data)->sb;
> - struct buffer_head *bh = ((struct mmpd_data *) data)->bh;
> + struct super_block *sb = (struct super_block *) data;
> struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> + struct buffer_head *bh = EXT4_SB(sb)->s_mmp_bh;
> struct mmp_struct *mmp;
> ext4_fsblk_t mmp_block;
> u32 seq = 0;
> @@ -245,12 +245,18 @@ static int kmmpd(void *data)
> retval = write_mmp_block(sb, bh);
>
> exit_thread:
> - EXT4_SB(sb)->s_mmp_tsk = NULL;
> - kfree(data);
> - brelse(bh);
> return retval;
> }
>
> +void ext4_stop_mmpd(struct ext4_sb_info *sbi)
> +{
> + if (sbi->s_mmp_tsk) {
> + kthread_stop(sbi->s_mmp_tsk);
> + brelse(sbi->s_mmp_bh);
> + sbi->s_mmp_tsk = NULL;
> + }
> +}
> +
> /*
> * Get a random new sequence number but make sure it is not greater
> than
> * EXT4_MMP_SEQ_MAX.
> @@ -275,7 +281,6 @@ int ext4_multi_mount_protect(struct super_block
> *sb, struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> struct buffer_head *bh = NULL;
> struct mmp_struct *mmp = NULL;
> - struct mmpd_data *mmpd_data;
> u32 seq;
> unsigned int mmp_check_interval =
> le16_to_cpu(es->s_mmp_update_interval); unsigned int wait_time = 0;
> @@ -364,24 +369,17 @@ int ext4_multi_mount_protect(struct super_block
> *sb, goto failed;
> }
>
> - mmpd_data = kmalloc(sizeof(*mmpd_data), GFP_KERNEL);
> - if (!mmpd_data) {
> - ext4_warning(sb, "not enough memory for mmpd_data");
> - goto failed;
> - }
> - mmpd_data->sb = sb;
> - mmpd_data->bh = bh;
> + EXT4_SB(sb)->s_mmp_bh = bh;
>
> /*
> * Start a kernel thread to update the MMP block
> periodically. */
> - EXT4_SB(sb)->s_mmp_tsk = kthread_run(kmmpd, mmpd_data,
> "kmmpd-%.*s",
> + EXT4_SB(sb)->s_mmp_tsk = kthread_run(kmmpd, sb, "kmmpd-%.*s",
> (int)sizeof(mmp->mmp_bdevname),
> bdevname(bh->b_bdev,
> mmp->mmp_bdevname));
> if (IS_ERR(EXT4_SB(sb)->s_mmp_tsk)) {
> EXT4_SB(sb)->s_mmp_tsk = NULL;
> - kfree(mmpd_data);
> ext4_warning(sb, "Unable to create kmmpd thread for
> %s.", sb->s_id);
> goto failed;
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index b9693680463a..539f89c5431f 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1244,8 +1244,8 @@ static void ext4_put_super(struct super_block
> *sb) ext4_xattr_destroy_cache(sbi->s_ea_block_cache);
> sbi->s_ea_block_cache = NULL;
>
> - if (sbi->s_mmp_tsk)
> - kthread_stop(sbi->s_mmp_tsk);
> + ext4_stop_mmpd(sbi);
> +
> brelse(sbi->s_sbh);
> sb->s_fs_info = NULL;
> /*
> @@ -5156,8 +5156,7 @@ static int ext4_fill_super(struct super_block
> *sb, void *data, int silent) failed_mount3:
> flush_work(&sbi->s_error_work);
> del_timer_sync(&sbi->s_err_report);
> - if (sbi->s_mmp_tsk)
> - kthread_stop(sbi->s_mmp_tsk);
> + ext4_stop_mmpd(sbi);
> failed_mount2:
> rcu_read_lock();
> group_desc = rcu_dereference(sbi->s_group_desc);
> @@ -5952,8 +5951,7 @@ static int ext4_remount(struct super_block *sb,
> int *flags, char *data) */
> ext4_mark_recovery_complete(sb, es);
> }
> - if (sbi->s_mmp_tsk)
> - kthread_stop(sbi->s_mmp_tsk);
> + ext4_stop_mmpd(sbi);
> } else {
> /* Make sure we can mount this feature set
> readwrite */ if (ext4_has_feature_readonly(sb) ||


2021-05-18 23:29:38

by Pavel Skripkin

[permalink] [raw]
Subject: Re: [PATCH v2] ext4: fix memory leak in ext4_fill_super

On Mon, 17 May 2021 16:40:34 +0300
Pavel Skripkin <[email protected]> wrote:
> Hi!
>
> Is all ok with this one, or I should send v3? :)
>

BTW, this patch fixes this bug as well
https://syzkaller.appspot.com/bug?id=e2765a883959fd094e6a1c40f3502114fa17c550


With regards,
Pavel Skripkin

> With regards,
> Pavel Skripkin
>
> On Fri, 30 Apr 2021 21:50:46 +0300
> Pavel Skripkin <[email protected]> wrote:
> > static int kthread(void *_create) will return -ENOMEM
> > or -EINTR in case of internal failure or
> > kthread_stop() call happens before threadfn call.
> >
> > To prevent fancy error checking and make code
> > more straightforward we moved all cleanup code out
> > of kmmpd threadfn.
> >
> > Also, dropped struct mmpd_data at all. Now struct super_block
> > is a threadfn data and struct buffer_head embedded into
> > struct ext4_sb_info.
> >
> > Reported-by: [email protected]
> > Signed-off-by: Pavel Skripkin <[email protected]>
> > ---
> > fs/ext4/ext4.h | 4 ++++
> > fs/ext4/mmp.c | 28 +++++++++++++---------------
> > fs/ext4/super.c | 10 ++++------
> > 3 files changed, 21 insertions(+), 21 deletions(-)
> >
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 826a56e3bbd2..62210cbea84b 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -1490,6 +1490,7 @@ struct ext4_sb_info {
> > struct kobject s_kobj;
> > struct completion s_kobj_unregister;
> > struct super_block *s_sb;
> > + struct buffer_head *s_mmp_bh;
> >
> > /* Journaling */
> > struct journal_s *s_journal;
> > @@ -3663,6 +3664,9 @@ extern struct ext4_io_end_vec
> > *ext4_last_io_end_vec(ext4_io_end_t *io_end); /* mmp.c */
> > extern int ext4_multi_mount_protect(struct super_block *,
> > ext4_fsblk_t);
> > +/* mmp.c */
> > +extern void ext4_stop_mmpd(struct ext4_sb_info *sbi);
> > +
> > /* verity.c */
> > extern const struct fsverity_operations ext4_verityops;
> >
> > diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
> > index 795c3ff2907c..623bad399612 100644
> > --- a/fs/ext4/mmp.c
> > +++ b/fs/ext4/mmp.c
> > @@ -127,9 +127,9 @@ void __dump_mmp_msg(struct super_block *sb,
> > struct mmp_struct *mmp, */
> > static int kmmpd(void *data)
> > {
> > - struct super_block *sb = ((struct mmpd_data *) data)->sb;
> > - struct buffer_head *bh = ((struct mmpd_data *) data)->bh;
> > + struct super_block *sb = (struct super_block *) data;
> > struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> > + struct buffer_head *bh = EXT4_SB(sb)->s_mmp_bh;
> > struct mmp_struct *mmp;
> > ext4_fsblk_t mmp_block;
> > u32 seq = 0;
> > @@ -245,12 +245,18 @@ static int kmmpd(void *data)
> > retval = write_mmp_block(sb, bh);
> >
> > exit_thread:
> > - EXT4_SB(sb)->s_mmp_tsk = NULL;
> > - kfree(data);
> > - brelse(bh);
> > return retval;
> > }
> >
> > +void ext4_stop_mmpd(struct ext4_sb_info *sbi)
> > +{
> > + if (sbi->s_mmp_tsk) {
> > + kthread_stop(sbi->s_mmp_tsk);
> > + brelse(sbi->s_mmp_bh);
> > + sbi->s_mmp_tsk = NULL;
> > + }
> > +}
> > +
> > /*
> > * Get a random new sequence number but make sure it is not greater
> > than
> > * EXT4_MMP_SEQ_MAX.
> > @@ -275,7 +281,6 @@ int ext4_multi_mount_protect(struct super_block
> > *sb, struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> > struct buffer_head *bh = NULL;
> > struct mmp_struct *mmp = NULL;
> > - struct mmpd_data *mmpd_data;
> > u32 seq;
> > unsigned int mmp_check_interval =
> > le16_to_cpu(es->s_mmp_update_interval); unsigned int wait_time = 0;
> > @@ -364,24 +369,17 @@ int ext4_multi_mount_protect(struct
> > super_block *sb, goto failed;
> > }
> >
> > - mmpd_data = kmalloc(sizeof(*mmpd_data), GFP_KERNEL);
> > - if (!mmpd_data) {
> > - ext4_warning(sb, "not enough memory for
> > mmpd_data");
> > - goto failed;
> > - }
> > - mmpd_data->sb = sb;
> > - mmpd_data->bh = bh;
> > + EXT4_SB(sb)->s_mmp_bh = bh;
> >
> > /*
> > * Start a kernel thread to update the MMP block
> > periodically. */
> > - EXT4_SB(sb)->s_mmp_tsk = kthread_run(kmmpd, mmpd_data,
> > "kmmpd-%.*s",
> > + EXT4_SB(sb)->s_mmp_tsk = kthread_run(kmmpd, sb,
> > "kmmpd-%.*s", (int)sizeof(mmp->mmp_bdevname),
> > bdevname(bh->b_bdev,
> > mmp->mmp_bdevname));
> > if (IS_ERR(EXT4_SB(sb)->s_mmp_tsk)) {
> > EXT4_SB(sb)->s_mmp_tsk = NULL;
> > - kfree(mmpd_data);
> > ext4_warning(sb, "Unable to create kmmpd thread for
> > %s.", sb->s_id);
> > goto failed;
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index b9693680463a..539f89c5431f 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -1244,8 +1244,8 @@ static void ext4_put_super(struct super_block
> > *sb) ext4_xattr_destroy_cache(sbi->s_ea_block_cache);
> > sbi->s_ea_block_cache = NULL;
> >
> > - if (sbi->s_mmp_tsk)
> > - kthread_stop(sbi->s_mmp_tsk);
> > + ext4_stop_mmpd(sbi);
> > +
> > brelse(sbi->s_sbh);
> > sb->s_fs_info = NULL;
> > /*
> > @@ -5156,8 +5156,7 @@ static int ext4_fill_super(struct super_block
> > *sb, void *data, int silent) failed_mount3:
> > flush_work(&sbi->s_error_work);
> > del_timer_sync(&sbi->s_err_report);
> > - if (sbi->s_mmp_tsk)
> > - kthread_stop(sbi->s_mmp_tsk);
> > + ext4_stop_mmpd(sbi);
> > failed_mount2:
> > rcu_read_lock();
> > group_desc = rcu_dereference(sbi->s_group_desc);
> > @@ -5952,8 +5951,7 @@ static int ext4_remount(struct super_block
> > *sb, int *flags, char *data) */
> > ext4_mark_recovery_complete(sb,
> > es); }
> > - if (sbi->s_mmp_tsk)
> > - kthread_stop(sbi->s_mmp_tsk);
> > + ext4_stop_mmpd(sbi);
> > } else {
> > /* Make sure we can mount this feature set
> > readwrite */ if (ext4_has_feature_readonly(sb) ||
>

2021-05-21 09:41:12

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix memory leak in ext4_fill_super

On Wed, Apr 28, 2021 at 10:19:28PM +0000, Alexey Makhalov wrote:
> I've recently discovered that doing infinite loop of
> systemctl start <ext4_on_lvm>.mount, and
> systemctl stop <ext4_on_lvm>.mount
> linearly increases percpu allocator memory consumption.
> In several hours, it might lead to system instability by
> consuming most of the memory.
>
> Bug is not reproducible when the ext4 filesystem is on
> physical partition, but it is persistent when ext4 is on
> logical volume.

Why is this the case? It sounds like we're looking a buffer for each
mount where the block size is not 1k. It shouldn't matter whether it
is a physical partition or not.

- Ted

2021-05-21 20:18:03

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix memory leak in ext4_fill_super

On Fri, May 21, 2021 at 12:43:46AM -0700, Alexey Makhalov wrote:
> Hi Ted,
>
> Good point! This paragraph can be just dropped as the next one
> describes the issue with superblock re-read. Will send v2 shortly.

Thanks; for what it's worth, I'm going to be editing the commit
description anyway; it's really helpful during the patch review to
understand how you found the problem, and how to reproduce it.
However, the commit description when it lands upstream will include a
link to this mail thread on lore.kernel.org, and so including a long
stack trace isn't really necessary.

I'm going to cut it down to something like this:

------------------------------
ext4: fix memory leak in ext4_fill_super

Buffer head references must be released before calling kill_bdev();
otherwise the buffer head (and its page referenced by b_data) will not
be freed by kill_bdev, and subsequently that bh will be leaked.

If blocksizes differ, sb_set_blocksize() will kill current buffers and
page cache by using kill_bdev(). And then super block will be reread
again but using correct blocksize this time. sb_set_blocksize() didn't
fully free superblock page and buffer head, and being busy, they were
not freed and instead leaked.

This can easily be reproduced by calling an infinite loop of:

systemctl start <ext4_on_lvm>.mount, and
systemctl stop <ext4_on_lvm>.mount

... since systemd creates a cgroup for each slice which it mounts, and
the bh leak get amplified by a dying memory cgroup that also never
gets freed, and memory consumption is much more easily noticed.

Signed-off-by: Alexey Makhalov <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Fixes: ce40733ce93d ("ext4: Check for return value from sb_set_blocksize")
Fixes: ac27a0ec112a ("ext4: initial copy of files from ext3")
Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: Andreas Dilger <[email protected]>
------------------------------

The commit description above is 18 lines (exclusive of trailers and
headers) versus 71 lines, and is much more to the point for someone
who might be doing code archeology via "git log".

When submitting this as a patch, you can either use a separate cover
letter (git format-patch --cover-letter) or including the explanation
after the --- in the diff, so that it disappears before the commit is
added via "git am". But it's not that hard for me to rework a commit
description, so I'll take care of it for this patch; no need to send a
V3.

Cheers,

- Ted

2021-05-21 20:20:46

by Alexey Makhalov

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix memory leak in ext4_fill_super

Sounds good! Thanks for the guidance and v3) —Alexey

> On May 21, 2021, at 7:29 AM, Theodore Y. Ts'o <[email protected]> wrote:
>
> On Fri, May 21, 2021 at 12:43:46AM -0700, Alexey Makhalov wrote:
>> Hi Ted,
>>
>> Good point! This paragraph can be just dropped as the next one
>> describes the issue with superblock re-read. Will send v2 shortly.
>
> Thanks; for what it's worth, I'm going to be editing the commit
> description anyway; it's really helpful during the patch review to
> understand how you found the problem, and how to reproduce it.
> However, the commit description when it lands upstream will include a
> link to this mail thread on lore.kernel.org, and so including a long
> stack trace isn't really necessary.
>
> I'm going to cut it down to something like this:
>
> ------------------------------
> ext4: fix memory leak in ext4_fill_super
>
> Buffer head references must be released before calling kill_bdev();
> otherwise the buffer head (and its page referenced by b_data) will not
> be freed by kill_bdev, and subsequently that bh will be leaked.
>
> If blocksizes differ, sb_set_blocksize() will kill current buffers and
> page cache by using kill_bdev(). And then super block will be reread
> again but using correct blocksize this time. sb_set_blocksize() didn't
> fully free superblock page and buffer head, and being busy, they were
> not freed and instead leaked.
>
> This can easily be reproduced by calling an infinite loop of:
>
> systemctl start <ext4_on_lvm>.mount, and
> systemctl stop <ext4_on_lvm>.mount
>
> ... since systemd creates a cgroup for each slice which it mounts, and
> the bh leak get amplified by a dying memory cgroup that also never
> gets freed, and memory consumption is much more easily noticed.
>
> Signed-off-by: Alexey Makhalov <[email protected]>
> Cc: [email protected]
> Link: https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fr%2F459B4724-842E-4B47-B2E7-D29805431E69%40vmware.com&amp;data=04%7C01%7Camakhalov%40vmware.com%7Ce5ae2270f1134d9a3cce08d91c64cb26%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637572041508854286%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=%2Fa%2FqTcBfL1tYkIq0byM46DXmpxTFOraAly2Ib9sbghE%3D&amp;reserved=0
> Fixes: ce40733ce93d ("ext4: Check for return value from sb_set_blocksize")
> Fixes: ac27a0ec112a ("ext4: initial copy of files from ext3")
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> Cc: Andreas Dilger <[email protected]>
> ------------------------------
>
> The commit description above is 18 lines (exclusive of trailers and
> headers) versus 71 lines, and is much more to the point for someone
> who might be doing code archeology via "git log".
>
> When submitting this as a patch, you can either use a separate cover
> letter (git format-patch --cover-letter) or including the explanation
> after the --- in the diff, so that it disappears before the commit is
> added via "git am". But it's not that hard for me to rework a commit
> description, so I'll take care of it for this patch; no need to send a
> V3.
>
> Cheers,
>
> - Ted


Attachments:
signature.asc (849.00 B)
Message signed with OpenPGP

2021-06-17 02:45:04

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2] ext4: fix memory leak in ext4_fill_super

On Fri, Apr 30, 2021 at 09:50:46PM +0300, Pavel Skripkin wrote:
> static int kthread(void *_create) will return -ENOMEM
> or -EINTR in case of internal failure or
> kthread_stop() call happens before threadfn call.
>
> To prevent fancy error checking and make code
> more straightforward we moved all cleanup code out
> of kmmpd threadfn.
>
> Also, dropped struct mmpd_data at all. Now struct super_block
> is a threadfn data and struct buffer_head embedded into
> struct ext4_sb_info.
>
> Reported-by: [email protected]
> Signed-off-by: Pavel Skripkin <[email protected]>

Applied, thanks! (And apologies for the delay)

- Ted