2009-03-11 14:32:18

by Curt Wohlgemuth

Subject: Use of kmalloc vs vmalloc in ext4?

I've been running various tests of ext4 partitions lately, and have
found that with very low memory situations, I'm getting intermittent
mount failures due to ENOMEM from ext4_mb_init() and
ext4_fill_flex_info(). Here's a typical dmesg from the latter:

EXT4-fs: not enough memory for 8198 flex groups
EXT4-fs: unable to initialize flex_bg meta info!

This is from a kzalloc() call of size ~64k. I think the
ext4_mb_init() calls to kmalloc() and alloc_percpu() are even smaller.

I was wondering why all the code in ext4 (and ext[23], for that
matter) uses kmalloc() and friends instead of vmalloc(), at least
where it's safe; is it just for performance reasons?

I've seen the above errors when I do a mount -a, causing several
partitions to be mounted; I can usually mount the failed ones by hand
right afterwards, but this is a big difference for us, in our
environment, compared to, say, ext2 partitions.

Thanks,
Curt


2009-04-06 06:45:15

by Michael Rubin

Subject: Re: Use of kmalloc vs vmalloc in ext4?

Anyone have any comments? Or historical reasons? We operate with some
constrained memory situations, and were wondering if a patch to move
from kmalloc to vmalloc would be well received.

mrubin

On Wed, Mar 11, 2009 at 7:32 AM, Curt Wohlgemuth <[email protected]> wrote:
> I've been running various tests of ext4 partitions lately, and have
> found that with very low memory situations, I'm getting intermittent
> mount failures due to ENOMEM from ext4_mb_init() and
> ext4_fill_flex_info(). Here's a typical dmesg from the latter:
>
>     EXT4-fs: not enough memory for 8198 flex groups
>     EXT4-fs: unable to initialize flex_bg meta info!
>
> This is from a kzalloc() call of size ~64k. I think the
> ext4_mb_init() calls to kmalloc() and alloc_percpu() are even smaller.
>
> I was wondering why all the code in ext4 (and ext[23], for that
> matter) uses kmalloc() and friends instead of vmalloc(), at least
> where it's safe; is it just for performance reasons?
>
> I've seen the above errors when I do a mount -a, causing several
> partitions to be mounted; I can usually mount the failed ones by hand
> right afterwards, but this is a big difference for us, in our
> environment, compared to, say, ext2 partitions.
>
> Thanks,
> Curt
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2009-04-06 09:33:20

by Andreas Dilger

Subject: Re: Use of kmalloc vs vmalloc in ext4?

On Apr 05, 2009 23:45 -0700, Michael Rubin wrote:
> Anyone have any comments? Or historical reasons? We operate with some
> constrained memory situations, and were wondering if a patch to move
> from kmalloc to vmalloc would be well received.

On 32-bit machines vmalloc space is tiny, and in all cases vmalloc
performance sucks, so traditionally very little kernel allocation
is done with vmalloc. For one-off allocations, such as the per-filesystem
ones, it is probably OK to change them to vmalloc.

> On Wed, Mar 11, 2009 at 7:32 AM, Curt Wohlgemuth <[email protected]> wrote:
> > I've been running various tests of ext4 partitions lately, and have
> > found that with very low memory situations, I'm getting intermittent
> > mount failures due to ENOMEM from ext4_mb_init() and
> > ext4_fill_flex_info(). Here's a typical dmesg from the latter:
> >
> >     EXT4-fs: not enough memory for 8198 flex groups
> >     EXT4-fs: unable to initialize flex_bg meta info!
> >
> > This is from a kzalloc() call of size ~64k. I think the
> > ext4_mb_init() calls to kmalloc() and alloc_percpu() are even smaller.
> >
> > I was wondering why all the code in ext4 (and ext[23], for that
> > matter) uses kmalloc() and friends instead of vmalloc(), at least
> > where it's safe; is it just for performance reasons?
> >
> > I've seen the above errors when I do a mount -a, causing several
> > partitions to be mounted; I can usually mount the failed ones by hand
> > right afterwards, but this is a big difference for us, in our
> > environment, compared to, say, ext2 partitions.
> >
> > Thanks,
> > Curt

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2009-04-25 03:01:04

by Theodore Ts'o

Subject: [PATCH] ext4: Fallback to vmalloc if kmalloc can't allocate s_flex_groups array

For very large filesystems, the s_flex_groups array can get quite big.
For example, a 16TB filesystem will have 8192 flex groups by default,
so the array is 96k, which is marginal for kmalloc(). On the other
hand, a 160GB filesystem will have 80 flex groups, so the array will
be 960 bytes. So we try to allocate the array first using kmalloc(),
and if that fails, we'll try to use vmalloc() instead.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/super.c | 15 ++++++++++++---
1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2958f4e..0682fe0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1620,6 +1620,7 @@ static int ext4_fill_flex_info(struct super_block *sb)
 	ext4_group_t flex_group_count;
 	ext4_group_t flex_group;
 	int groups_per_flex = 0;
+	size_t size;
 	int i;

 	if (!sbi->s_es->s_log_groups_per_flex) {
@@ -1634,8 +1635,13 @@ static int ext4_fill_flex_info(struct super_block *sb)
 	flex_group_count = ((sbi->s_groups_count + groups_per_flex - 1) +
 			((le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks) + 1) <<
 			      EXT4_DESC_PER_BLOCK_BITS(sb))) / groups_per_flex;
-	sbi->s_flex_groups = kzalloc(flex_group_count *
-				     sizeof(struct flex_groups), GFP_KERNEL);
+	size = flex_group_count * sizeof(struct flex_groups);
+	sbi->s_flex_groups = kzalloc(size, GFP_KERNEL);
+	if (sbi->s_flex_groups == NULL) {
+		sbi->s_flex_groups = vmalloc(size);
+		if (sbi->s_flex_groups)
+			memset(sbi->s_flex_groups, 0, size);
+	}
 	if (sbi->s_flex_groups == NULL) {
 		printk(KERN_ERR "EXT4-fs: not enough memory for "
 				"%u flex groups\n", flex_group_count);
@@ -2849,7 +2855,10 @@ failed_mount3:
 failed_mount2:
 	for (i = 0; i < db_count; i++)
 		brelse(sbi->s_group_desc[i]);
-	kfree(sbi->s_group_desc);
+	if (is_vmalloc_addr(sbi->s_group_desc))
+		vfree(sbi->s_group_desc);
+	else
+		kfree(sbi->s_group_desc);
 failed_mount:
 	if (sbi->s_proc) {
 		remove_proc_entry(sb->s_id, ext4_proc_root);
--
1.5.6.3


2009-04-25 03:07:06

by Theodore Ts'o

Subject: Re: Use of kmalloc vs vmalloc in ext4?

On Sun, Apr 05, 2009 at 11:45:10PM -0700, Michael Rubin wrote:
> Anyone have any comments? Or historical reasons? We operate with some
> constrained memory situations, and were wondering if a patch to move
> from kmalloc to vmalloc would be well received.

Sorry, I didn't have time to get to this until now. I've been burning
the midnight oil getting the e2fsprogs 1.41.5 release out, as well as
catching up after the Collab Summit.

The best thing to do I think is to try using kmalloc(), and if it
fails, to fall back to vmalloc(). We can use is_vmalloc_addr() to
decide whether to use vfree() or kfree(), a trick which I picked up
from fs/ntfs/malloc.h. I'm surprised this trick isn't used more in
the kernel. Perhaps there should be general-purpose infrastructure
for it, but for now I'll just open-code it.
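
A minimal userspace sketch of the try-then-fall-back pattern described
above, for readers who want to see the shape of it outside the kernel.
Nothing here is kernel API: primary_zalloc() stands in for kzalloc(),
arena_alloc() for vmalloc(), and is_arena_addr() for is_vmalloc_addr();
all names, the static arena, and the primary_limit knob are hypothetical
and exist only so the fallback path can be exercised.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the vmalloc address range: one static arena (hypothetical). */
#define ARENA_SIZE (1u << 20)
static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_used;	/* bytes handed out so far */
static size_t arena_live;	/* outstanding arena allocations */

/* Hypothetical knob: requests above this size make the primary path fail. */
static size_t primary_limit = 64 * 1024;

/* Plays the role of kzalloc(): zeroed memory, may fail for large sizes. */
static void *primary_zalloc(size_t size)
{
	if (size > primary_limit)
		return NULL;
	return calloc(1, size);
}

/* Plays the role of vmalloc(): bump allocation, contents not zeroed. */
static void *arena_alloc(size_t size)
{
	size_t aligned = (size + 15) & ~(size_t)15;
	void *p;

	if (aligned < size || aligned > ARENA_SIZE - arena_used)
		return NULL;
	p = arena + arena_used;
	arena_used += aligned;
	arena_live++;
	return p;
}

/* Plays the role of is_vmalloc_addr(): a pure address-range check. */
static int is_arena_addr(const void *p)
{
	uintptr_t a = (uintptr_t)p;

	return a >= (uintptr_t)arena && a < (uintptr_t)arena + ARENA_SIZE;
}

/* Try the primary allocator first; on failure fall back and zero by hand. */
static void *zalloc_with_fallback(size_t size)
{
	void *p = primary_zalloc(size);

	if (p == NULL) {
		p = arena_alloc(size);
		if (p)
			memset(p, 0, size);
	}
	return p;
}

/* Pick the matching release routine from the address alone. */
static void free_either(void *p)
{
	if (p == NULL)
		return;
	if (is_arena_addr(p)) {
		if (--arena_live == 0)
			arena_used = 0;	/* arena is empty again, reuse it */
	} else {
		free(p);
	}
}
```

The point of the address check is that the caller does not have to
remember which allocator succeeded: zalloc_with_fallback(960) would
normally come from the primary path and zalloc_with_fallback(96 * 1024)
from the arena, and free_either() handles both.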

- Ted

2009-04-25 03:28:05

by Eric Sandeen

Subject: Re: [PATCH] ext4: Fallback to vmalloc if kmalloc can't allocate s_flex_groups array

Theodore Ts'o wrote:
> For very large filesystems, the s_flex_groups array can get quite big.
> For example, a 16TB filesystem will have 8192 flex groups by default,
> so the array is 96k, which is marginal for kmalloc(). On the other
> hand, a 160GB filesystem will have 80 flex groups, so the array will
> be 960 bytes. So we try to allocate the array first using kmalloc(),
> and if that fails, we'll try to use vmalloc() instead.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/super.c | 15 ++++++++++++---
> 1 files changed, 12 insertions(+), 3 deletions(-)
>

...

> @@ -2849,7 +2855,10 @@ failed_mount3:
>  failed_mount2:
>  	for (i = 0; i < db_count; i++)
>  		brelse(sbi->s_group_desc[i]);
> -	kfree(sbi->s_group_desc);
> +	if (is_vmalloc_addr(sbi->s_group_desc))
> +		vfree(sbi->s_group_desc);
> +	else
> +		kfree(sbi->s_group_desc);
>  failed_mount:
>  	if (sbi->s_proc) {
>  		remove_proc_entry(sb->s_id, ext4_proc_root);

er, won't you need the same vfree/kfree treatment in ext4_put_super? :)

-Eric

2009-04-25 03:40:03

by Theodore Ts'o

Subject: Re: Use of kmalloc vs vmalloc in ext4?

P.S. What sort of flex_bg size are you using?

EXT4-fs: not enough memory for 8198 flex groups
EXT4-fs: unable to initialize flex_bg meta info!

Modern e2fsprogs default to using 16 block groups per flex_bg, which
means 8198 flex groups is a little over 16 TB --- which the mainline
e2fsprogs doesn't support yet. You wouldn't be using a smaller
flex_bg size for some reason, would you?

If so, can you say something about why? I've actually been thinking
that we might want to bump up the flex_bg size slightly more, but it
appears you're using something smaller; was this deliberate?
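
The arithmetic behind these figures can be sketched as follows. This
is only an illustration under stated assumptions: 4 KiB blocks, block
groups of 8 * block_size blocks (128 MiB each), "TB"/"GB" read as
binary units, and 12 bytes per s_flex_groups entry, a size inferred from
the "8192 flex groups -> 96k" figure in the patch description rather
than taken from the struct definition. The function names are made up
for the example.

```c
#include <stdint.h>

/* Assumption: bytes per array entry, inferred from 8192 entries == 96k. */
#define FLEX_ENTRY_BYTES 12u

/* Bytes covered by one block group: 8 * block_size blocks of block_size. */
static uint64_t group_bytes(uint32_t block_size)
{
	return (uint64_t)block_size * 8 * block_size;
}

/* Flex groups needed for a filesystem of fs_bytes (rounding up). */
static uint64_t flex_group_count_for(uint64_t fs_bytes, uint32_t block_size,
				     uint32_t groups_per_flex)
{
	uint64_t gb = group_bytes(block_size);
	uint64_t groups = (fs_bytes + gb - 1) / gb;

	return (groups + groups_per_flex - 1) / groups_per_flex;
}

/* Size of the flex group array for nr_flex entries. */
static uint64_t flex_array_bytes(uint64_t nr_flex)
{
	return nr_flex * FLEX_ENTRY_BYTES;
}

/* Inverse direction: filesystem bytes spanned by nr_flex full flex groups. */
static uint64_t bytes_covered(uint64_t nr_flex, uint32_t block_size,
			      uint32_t groups_per_flex)
{
	return nr_flex * groups_per_flex * group_bytes(block_size);
}
```

Under those assumptions a 16 TiB filesystem works out to 8192 flex
groups (a 96 KiB array), a 160 GiB filesystem to 80 flex groups (960
bytes), and 8198 flex groups span 16 TiB plus 12 GiB.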

- Ted

2009-04-25 03:57:47

by Theodore Ts'o

Subject: Re: [PATCH] ext4: Fallback to vmalloc if kmalloc can't allocate s_flex_groups array

On Fri, Apr 24, 2009 at 10:28:04PM -0500, Eric Sandeen wrote:
>
> er, won't you need the same vfree/kfree treatment in ext4_put_super? :)
>

Yeah, good catch. Time for me to go to bed... Here's the fixed
patch.

- Ted

From 1fbe30e0fed504fd021c9d35998b4815f34be73e Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <[email protected]>
Date: Fri, 24 Apr 2009 23:57:15 -0400
Subject: [PATCH] ext4: Fallback to vmalloc if kmalloc can't allocate s_flex_groups array

For very large filesystems, the s_flex_groups array can get quite big.
For example, a 16TB filesystem will have 8192 flex groups by default,
so the array is 96k, which is *very* marginal for kmalloc(). On the
other hand, a 160GB filesystem will have 80 flex groups, so the array
will be 960 bytes. So we try to allocate the array first using
kmalloc(), and if that fails, we'll try to use vmalloc() instead.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/super.c | 21 ++++++++++++++++++---
1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2958f4e..f19d8b8 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -586,7 +586,10 @@ static void ext4_put_super(struct super_block *sb)
 	for (i = 0; i < sbi->s_gdb_count; i++)
 		brelse(sbi->s_group_desc[i]);
 	kfree(sbi->s_group_desc);
-	kfree(sbi->s_flex_groups);
+	if (is_vmalloc_addr(sbi->s_flex_groups))
+		vfree(sbi->s_flex_groups);
+	else
+		kfree(sbi->s_flex_groups);
 	percpu_counter_destroy(&sbi->s_freeblocks_counter);
 	percpu_counter_destroy(&sbi->s_freeinodes_counter);
 	percpu_counter_destroy(&sbi->s_dirs_counter);
@@ -1620,6 +1623,7 @@ static int ext4_fill_flex_info(struct super_block *sb)
 	ext4_group_t flex_group_count;
 	ext4_group_t flex_group;
 	int groups_per_flex = 0;
+	size_t size;
 	int i;

 	if (!sbi->s_es->s_log_groups_per_flex) {
@@ -1634,8 +1638,13 @@ static int ext4_fill_flex_info(struct super_block *sb)
 	flex_group_count = ((sbi->s_groups_count + groups_per_flex - 1) +
 			((le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks) + 1) <<
 			      EXT4_DESC_PER_BLOCK_BITS(sb))) / groups_per_flex;
-	sbi->s_flex_groups = kzalloc(flex_group_count *
-				     sizeof(struct flex_groups), GFP_KERNEL);
+	size = flex_group_count * sizeof(struct flex_groups);
+	sbi->s_flex_groups = kzalloc(size, GFP_KERNEL);
+	if (sbi->s_flex_groups == NULL) {
+		sbi->s_flex_groups = vmalloc(size);
+		if (sbi->s_flex_groups)
+			memset(sbi->s_flex_groups, 0, size);
+	}
 	if (sbi->s_flex_groups == NULL) {
 		printk(KERN_ERR "EXT4-fs: not enough memory for "
 				"%u flex groups\n", flex_group_count);
@@ -2842,6 +2851,12 @@ failed_mount4:
 		sbi->s_journal = NULL;
 	}
 failed_mount3:
+	if (sbi->s_flex_groups) {
+		if (is_vmalloc_addr(sbi->s_flex_groups))
+			vfree(sbi->s_flex_groups);
+		else
+			kfree(sbi->s_flex_groups);
+	}
 	percpu_counter_destroy(&sbi->s_freeblocks_counter);
 	percpu_counter_destroy(&sbi->s_freeinodes_counter);
 	percpu_counter_destroy(&sbi->s_dirs_counter);
--
1.5.6.3


2009-04-26 02:12:27

by Theodore Ts'o

Subject: Re: Use of kmalloc vs vmalloc in ext4?

On Fri, Apr 24, 2009 at 11:39:55PM -0400, Theodore Tso wrote:
> P.S. What sort of flex_bg size are you using?
>
> EXT4-fs: not enough memory for 8198 flex groups
> EXT4-fs: unable to initialize flex_bg meta info!
>
> Modern e2fsprogs default to using 16 block groups per flex_bg, which
> means 8198 flex groups is a little over 16 TB --- which the mainline
> e2fsprogs doesn't support yet. You wouldn't be using a smaller
> flex_bg size for some reason, would you?

Oh, never mind. I didn't realize this last night, but we allocate
sbi->s_flex_groups so it is big enough in case the filesystem gets
resized to the maximum size. So that's why it was trying to allocate
that many flex groups. On the other hand, it means that it's much
more likely for us to need the extra memory.
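
To make the effect concrete, here is the flex_group_count expression
from ext4_fill_flex_info() in the patch, lifted into a standalone
function. The parameter names mirror the fields used in the patch; the
example values below (96 block groups, 1023 reserved GDT blocks, 128
descriptors per block, i.e. 7 bits) are purely hypothetical and are
not known to be the configuration from the original report.

```c
#include <stdint.h>

/*
 * Same arithmetic as the flex_group_count computation in the patch:
 * the current group count plus room for every group that the reserved
 * GDT blocks could ever describe, divided by groups_per_flex.
 */
static uint32_t flex_group_count(uint32_t groups_count,
				 uint32_t groups_per_flex,
				 uint32_t reserved_gdt_blocks,
				 uint32_t desc_per_block_bits)
{
	return ((groups_count + groups_per_flex - 1) +
		((reserved_gdt_blocks + 1) << desc_per_block_bits)) /
		groups_per_flex;
}

/* For comparison: flex groups actually occupied today (plain round-up). */
static uint32_t flex_groups_in_use(uint32_t groups_count,
				   uint32_t groups_per_flex)
{
	return (groups_count + groups_per_flex - 1) / groups_per_flex;
}
```

With the hypothetical values above, flex_group_count(96, 16, 1023, 7)
comes to 8198 even though flex_groups_in_use(96, 16) is only 6, which
shows how a modest filesystem can still ask for an array sized for a
16 TB one.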

- Ted