2016-01-07 00:46:47

by lokesh jaliminche

[permalink] [raw]
Subject: [PATCH v4] EXT4: optimizing group search for inode allocation

you are right flex groups cannot be completely empty as some
blocks are used for inode table and allocation bitmaps. I
missed that point. I have corrected that.Also I have tested
and validated this patch by putting some traces it works!
Here are corresponding logs:
commands:
=========
mkfs.ext4 -g 10000 -G 2 /dev/sdc
mount -t ext4 /dev/sdc /mnt/test_ext4/
cd /mnt/test_ext4
mkdir 1
cd 1

logs:
=====
Jan 7 03:36:45 root kernel: [ 64.792939] max_usable_blocks_per_flex_group = [19692]
Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_clusters = [19264]
Jan 7 03:36:45 root kernel: [ 64.792939] Inodes_per_flex_group = [4864]
Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_inodes = [4853]
Jan 7 03:36:45 root kernel: [ 64.792950] max_usable_blocks_per_flex_group = [19692]
Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_clusters = [19481]
Jan 7 03:36:45 root kernel: [ 64.792950] Inodes_per_flex_group = [4864]
Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_inodes = [4864]
Jan 7 03:36:45 root kernel: [ 64.792958] max_usable_blocks_per_flex_group = [19692]
Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_clusters = [19481]
Jan 7 03:36:45 root kernel: [ 64.792958] Inodes_per_flex_group = [4864]
Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_inodes = [4864]
Jan 7 03:36:45 root kernel: [ 64.792965] max_usable_blocks_per_flex_group = [19692]
Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_clusters = [19481]
Jan 7 03:36:45 root kernel: [ 64.792965] Inodes_per_flex_group = [4864]
Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_inodes = [4864]
Jan 7 03:36:45 root kernel: [ 64.792972] max_usable_blocks_per_flex_group = [19692]
Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_clusters = [19481]
Jan 7 03:36:45 root kernel: [ 64.792972] Inodes_per_flex_group = [4864]
Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_inodes = [4864]
Jan 7 03:36:45 root kernel: [ 64.792979] max_usable_blocks_per_flex_group = [19692]<<<<<<<<<<<
Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_clusters = [19692]<<<<<<<<<<<
Jan 7 03:36:45 root kernel: [ 64.792979] Inodes_per_flex_group = [4864]<<<<<<<<<<<<
Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_inodes = [4864]<<<<<<<<<<<<
Jan 7 03:36:45 root kernel: [ 64.792985] found_flex_bg >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>condition satisfied


[Patch v4]:
==========
From: Lokesh Nagappa Jaliminche <[email protected]>
Date: Thu, 7 Jan 2016 05:12:20 +0530
Subject: [PATCH] ext4: optimizing group serch for inode allocation

Added a check at the start of group search loop to
avoid looping unecessarily in case of empty group.
This also allow group search to jump directly to
"found_flex_bg" with "stats" and "group" already set,
so there is no need to go through the extra steps of
setting "best_desc" , "best_group" and then break
out of the loop just to set "stats" and "group" again.

Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
---
fs/ext4/ext4.h | 2 ++
fs/ext4/ialloc.c | 21 +++++++++++++++++++--
2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index cc7ca4e..fc48f9a 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -326,6 +326,8 @@ struct flex_groups {
#define EXT4_MAX_DESC_SIZE EXT4_MIN_BLOCK_SIZE
#define EXT4_DESC_SIZE(s) (EXT4_SB(s)->s_desc_size)
#ifdef __KERNEL__
+# define EXT4_INODE_TABLE_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_itb_per_group)
+# define EXT4_BLOCKS_PER_CLUSTER(s) (EXT4_SB(s)->s_cluster_ratio)
# define EXT4_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_blocks_per_group)
# define EXT4_CLUSTERS_PER_GROUP(s) (EXT4_SB(s)->s_clusters_per_group)
# define EXT4_DESC_PER_BLOCK(s) (EXT4_SB(s)->s_desc_per_block)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 1b8024d..dc66ad8 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -463,7 +463,6 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
sbi->s_log_groups_per_flex;
parent_group >>= sbi->s_log_groups_per_flex;
}
-
freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
avefreei = freei / ngroups;
freeb = EXT4_C2B(sbi,
@@ -477,7 +476,20 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
(ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
int best_ndir = inodes_per_group;
int ret = -1;
-
+ /* blocks used for inode table and allocation bitmaps */
+ unsigned int max_usable_blocks_per_flex_group;
+ unsigned int blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb);
+ unsigned int inode_table_blocks_per_group = \
+ EXT4_INODE_TABLE_BLOCKS_PER_GROUP(sb);
+ /* Number of blocks per cluster */
+ unsigned int cluster_ratio = EXT4_BLOCKS_PER_CLUSTER(sb);
+ unsigned int inodes_per_flex_group = inodes_per_group * \
+ flex_size;
+ max_usable_blocks_per_flex_group = \
+ ((blocks_per_group*flex_size) - \
+ (inode_table_blocks_per_group * \
+ flex_size + 2 * flex_size)) / \
+ cluster_ratio;
if (qstr) {
hinfo.hash_version = DX_HASH_HALF_MD4;
hinfo.seed = sbi->s_hash_seed;
@@ -489,6 +501,11 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
for (i = 0; i < ngroups; i++) {
g = (parent_group + i) % ngroups;
get_orlov_stats(sb, g, flex_size, &stats);
+ /* can't get better group than empty group */
+ if (max_usable_blocks_per_flex_group == \
+ stats.free_clusters && \
+ inodes_per_flex_group == stats.free_inodes)
+ goto found_flex_bg;
if (!stats.free_inodes)
continue;
if (stats.used_dirs >= best_ndir)
--
1.7.1

Regards,
Lokesh

On Sun, Jan 03, 2016 at 11:34:51AM -0700, Andreas Dilger wrote:
> On Dec 26, 2015, at 01:41, Lokesh Jaliminche <[email protected]> wrote:
> >
> >
> > From 5199982d29ff291b181c25af63a22d6f2d6d3f7b Mon Sep 17 00:00:00 2001
> > From: Lokesh Nagappa Jaliminche <[email protected]>
> > Date: Sat, 26 Dec 2015 13:19:36 +0530
> > Subject: [PATCH v3] ext4: optimizing group search for inode allocation
> >
> > Added a check at the start of group search loop to
> > avoid looping unecessarily in case of empty group.
> > This also allow group search to jump directly to
> > "found_flex_bg" with "stats" and "group" already set,
> > so there is no need to go through the extra steps of
> > setting "best_desc" and "best_group" and then break
> > out of the loop just to set "stats" and "group" again.
> >
> > Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
> > ---
> > fs/ext4/ialloc.c | 12 ++++++++++--
> > 1 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> > index 1b8024d..16f94db 100644
> > --- a/fs/ext4/ialloc.c
> > +++ b/fs/ext4/ialloc.c
> > @@ -473,10 +473,14 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> > ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
> >
> > if (S_ISDIR(mode) &&
> > - ((parent == d_inode(sb->s_root)) ||
> > - (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> > + ((parent == d_inode(sb->s_root)) ||
> > + (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
>
> I don't understand why you are changing the indentation here?
>
> > + unsigned int inodes_per_flex_group;
> > + unsigned long int clusters_per_flex_group;
> > int best_ndir = inodes_per_group;
> > int ret = -1;
> > + inodes_per_flex_group = inodes_per_group * flex_size;
> > + clusters_per_flex_group = sbi->s_clusters_per_group * flex_size;
>
> Have you actually tested if this code works? When I was looking at this
> code I was accidentally looking at the ext2 version, which only deals with
> individual groups, and not flex groups. I don't think that a flex group
> can actually be completely empty, since it will always contain at least
> an inode table and some bitmaps.
>
> This calculation needs to take this into account or it will just be
> overhead and not an optimization. Something like:
>
> /* account for inode table and allocation bitmaps */
> clusters_per_flex_group = sbi->s_clusters_per_group * flex_size -
> ((inodes_per_flex_group * EXT4_INODE_SIZE(sb) +
> (flex_size << sb->s_blocksize_bits) * 2 ) >>
> EXT4_SB(sb)->s_cluster_bits;
>
> It would be worthwhile to print this out and verify that there are actually
> groups with this many free blocks in a newly formatted filesystem.
>
> Cheers, Andreas
>
> > if (qstr) {
> > hinfo.hash_version = DX_HASH_HALF_MD4;
> > @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> > for (i = 0; i < ngroups; i++) {
> > g = (parent_group + i) % ngroups;
> > get_orlov_stats(sb, g, flex_size, &stats);
> > + /* the group can't get any better than empty */
> > + if (inodes_per_flex_group == stats.free_inodes &&
> > + clusters_per_flex_group == stats.free_clusters)
> > + goto found_flex_bg;
> > if (!stats.free_inodes)
> > continue;
> > if (stats.used_dirs >= best_ndir)
> > --
> > 1.7.1
> >
> >
> >> On Mon, Dec 21, 2015 at 10:27:37AM -0700, Andreas Dilger wrote:
> >>> On Dec 18, 2015, at 4:32 PM, lokesh jaliminche <[email protected]> wrote:
> >>>
> >>> From 9e09fef78b2fa552c883bf8124af873abfde0805 Mon Sep 17 00:00:00 2001
> >>> From: Lokesh Nagappa Jaliminche <[email protected]>
> >>
> >> In the patch summary there is a typo - s/serch/search/
> >>
> >> Also, it appears your email client has mangled the whitespace in the
> >> patch. Please use "git send-email" to send your patch to the list:
> >>
> >> git send-email --to [email protected] --cc [email protected] HEAD~1
> >>
> >> so that it arrives intact. You probably need to set it up in ~/.gitconfig:
> >>
> >> [sendemail]
> >> confirm = compose
> >> smtpdomain = {your client hostname}
> >> smtpserver = {your SMTP server hostname}
> >>
> >> Cheers, Andreas
> >>
> >>> Added a check at the start of group search loop to
> >>> avoid looping unecessarily in case of empty group.
> >>> This also allow group search to jump directly to
> >>> "found_flex_bg" with "stats" and "group" already set,
> >>> so there is no need to go through the extra steps of
> >>> setting "best_desc" and "best_group" and then break
> >>> out of the loop just to set "stats" and "group" again.
> >>>
> >>> Signed-off-by: Lokesh N Jaliminche <[email protected]>
> >>> ---
> >>> fs/ext4/ialloc.c | 8 ++++++++
> >>> 1 files changed, 8 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> >>> index 1b8024d..588bf8e 100644
> >>> --- a/fs/ext4/ialloc.c
> >>> +++ b/fs/ext4/ialloc.c
> >>> @@ -446,6 +446,8 @@ static int find_group_orlov(struct super_block
> >>> *sb, struct inode *parent,
> >>> struct ext4_sb_info *sbi = EXT4_SB(sb);
> >>> ext4_group_t real_ngroups = ext4_get_groups_count(sb);
> >>> int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
> >>> + unsigned int inodes_per_flex_group;
> >>> + long unsigned int blocks_per_clustre;
> >>> unsigned int freei, avefreei, grp_free;
> >>> ext4_fsblk_t freeb, avefreec;
> >>> unsigned int ndirs;
> >>> @@ -470,6 +472,8 @@ static int find_group_orlov(struct super_block
> >>> *sb, struct inode *parent,
> >>> percpu_counter_read_positive(&sbi->s_freeclusters_counter));
> >>> avefreec = freeb;
> >>> do_div(avefreec, ngroups);
> >>> + inodes_per_flex_group = inodes_per_group * flex_size;
> >>> + blocks_per_clustre = sbi->s_blocks_per_group * flex_size;
> >>> ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
> >>>
> >>> if (S_ISDIR(mode) &&
> >>> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block
> >>> *sb, struct inode *parent,
> >>> for (i = 0; i < ngroups; i++) {
> >>> g = (parent_group + i) % ngroups;
> >>> get_orlov_stats(sb, g, flex_size, &stats);
> >>> + /* the group can't get any better than empty */
> >>> + if (inodes_per_flex_group == stats.free_inodes &&
> >>> + blocks_per_clustre == stats.free_clusters)
> >>> + goto found_flex_bg;
> >>> if (!stats.free_inodes)
> >>> continue;
> >>> if (stats.used_dirs >= best_ndir)
> >>> --
> >>> 1.7.1
> >>> <0001-EXT4-optimizing-group-serch-for-inode-allocation.patch>
> >>
> >>
> >> Cheers, Andreas
> >
> >
> > <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>


Attachments:
(No filename) (13.63 kB)
0001-ext4-optimizing-group-serch-for-inode-allocation.patch (3.22 kB)
Download all attachments

2016-01-11 21:13:53

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH v4] EXT4: optimizing group search for inode allocation

On Jan 6, 2016, at 5:46 PM, Lokesh Jaliminche <[email protected]> wrote:
>
> you are right flex groups cannot be completely empty as some
> blocks are used for inode table and allocation bitmaps. I
> missed that point. I have corrected that. Also I have tested
> and validated this patch by putting some traces it works!

Good news. Thanks for updating the patch and testing it.
More comments below.

> Here are corresponding logs:
> commands:
> =========
> mkfs.ext4 -g 10000 -G 2 /dev/sdc
> mount -t ext4 /dev/sdc /mnt/test_ext4/
> cd /mnt/test_ext4
> mkdir 1
> cd 1
>
> logs:
> =====
> Jan 7 03:36:45 root kernel: [ 64.792939] max_usable_blocks_per_flex_group = [19692]
> Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_clusters = [19264]
> Jan 7 03:36:45 root kernel: [ 64.792939] Inodes_per_flex_group = [4864]
> Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_inodes = [4853]
> Jan 7 03:36:45 root kernel: [ 64.792950] max_usable_blocks_per_flex_group = [19692]
> Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_clusters = [19481]
> Jan 7 03:36:45 root kernel: [ 64.792950] Inodes_per_flex_group = [4864]
> Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_inodes = [4864]
> Jan 7 03:36:45 root kernel: [ 64.792958] max_usable_blocks_per_flex_group = [19692]
> Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_clusters = [19481]
> Jan 7 03:36:45 root kernel: [ 64.792958] Inodes_per_flex_group = [4864]
> Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_inodes = [4864]
> Jan 7 03:36:45 root kernel: [ 64.792965] max_usable_blocks_per_flex_group = [19692]
> Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_clusters = [19481]
> Jan 7 03:36:45 root kernel: [ 64.792965] Inodes_per_flex_group = [4864]
> Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_inodes = [4864]
> Jan 7 03:36:45 root kernel: [ 64.792972] max_usable_blocks_per_flex_group = [19692]
> Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_clusters = [19481]
> Jan 7 03:36:45 root kernel: [ 64.792972] Inodes_per_flex_group = [4864]
> Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_inodes = [4864]
> Jan 7 03:36:45 root kernel: [ 64.792979] max_usable_blocks_per_flex_group = [19692]<<<<<<<<<<<
> Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_clusters = [19692]<<<<<<<<<<<
> Jan 7 03:36:45 root kernel: [ 64.792979] Inodes_per_flex_group = [4864]<<<<<<<<<<<<
> Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_inodes = [4864]<<<<<<<<<<<<
> Jan 7 03:36:45 root kernel: [ 64.792985] found_flex_bg >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>condition satisfied
>
>
> [Patch v4]:
> ==========
> From: Lokesh Nagappa Jaliminche <[email protected]>
> Date: Thu, 7 Jan 2016 05:12:20 +0530
> Subject: [PATCH] ext4: optimizing group serch for inode allocation
>
> Added a check at the start of group search loop to
> avoid looping unecessarily in case of empty group.
> This also allow group search to jump directly to
> "found_flex_bg" with "stats" and "group" already set,
> so there is no need to go through the extra steps of
> setting "best_desc" , "best_group" and then break
> out of the loop just to set "stats" and "group" again.
>
> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
> ---
> fs/ext4/ext4.h | 2 ++
> fs/ext4/ialloc.c | 21 +++++++++++++++++++--
> 2 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index cc7ca4e..fc48f9a 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -326,6 +326,8 @@ struct flex_groups {
> #define EXT4_MAX_DESC_SIZE EXT4_MIN_BLOCK_SIZE
> #define EXT4_DESC_SIZE(s) (EXT4_SB(s)->s_desc_size)
> #ifdef __KERNEL__
> +# define EXT4_INODE_TABLE_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_itb_per_group)
> +# define EXT4_BLOCKS_PER_CLUSTER(s) (EXT4_SB(s)->s_cluster_ratio)

I don't think there is much value to these macros. They aren't much
shorter than the actual variable access, and they don't provide any
improved clarity for the reader.

I think they were introduced many years ago so that this header could
be shared between the kernel and e2fsprogs, but that is no longer true
so I don't think there is a need to add new ones.

Ted, any opinion on this?

> # define EXT4_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_blocks_per_group)
> # define EXT4_CLUSTERS_PER_GROUP(s) (EXT4_SB(s)->s_clusters_per_group)
> # define EXT4_DESC_PER_BLOCK(s) (EXT4_SB(s)->s_desc_per_block)
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 1b8024d..dc66ad8 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -463,7 +463,6 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> sbi->s_log_groups_per_flex;
> parent_group >>= sbi->s_log_groups_per_flex;
> }
> -
> freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
> avefreei = freei / ngroups;

Not sure why this line is being deleted?

> @@ -477,7 +476,20 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> int best_ndir = inodes_per_group;
> int ret = -1;
> -
> + /* blocks used for inode table and allocation bitmaps */
> + unsigned int max_usable_blocks_per_flex_group;

This could be shorter and still clear, like "max_blocks_per_flex".

> + unsigned int blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb);
> + unsigned int inode_table_blocks_per_group = \
> + EXT4_INODE_TABLE_BLOCKS_PER_GROUP(sb);
> + /* Number of blocks per cluster */
> + unsigned int cluster_ratio = EXT4_BLOCKS_PER_CLUSTER(sb);

There also isn't much value to making local variables that hold these
same values again. It increases stack space for little benefit. They
are only accessed once anyway.

> + unsigned int inodes_per_flex_group = inodes_per_group * \
> + flex_size;
> + max_usable_blocks_per_flex_group = \
> + ((blocks_per_group*flex_size) - \

Spaces around that '*'.

> + (inode_table_blocks_per_group * \
> + flex_size + 2 * flex_size)) / \

This is C and not a CPP macro, so no need for linefeed escapes.

> + cluster_ratio;

This should be ">> EXT4_SB(sb)->s_cluster_bits" to avoid the divide.

> if (qstr) {
> hinfo.hash_version = DX_HASH_HALF_MD4;
> hinfo.seed = sbi->s_hash_seed;
> @@ -489,6 +501,11 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> for (i = 0; i < ngroups; i++) {
> g = (parent_group + i) % ngroups;
> get_orlov_stats(sb, g, flex_size, &stats);
> + /* can't get better group than empty group */
> + if (max_usable_blocks_per_flex_group == \
> + stats.free_clusters && \

No need for linefeed escapes.

> + inodes_per_flex_group == stats.free_inodes)

Align continued line after '(' from the "if (" line.

> + goto found_flex_bg;

This is indented too much. It will work properly when you fix the
previous line.

> if (!stats.free_inodes)
> continue;
> if (stats.used_dirs >= best_ndir)
> --
> 1.7.1
>
> Regards,
> Lokesh
>
> On Sun, Jan 03, 2016 at 11:34:51AM -0700, Andreas Dilger wrote:
>> On Dec 26, 2015, at 01:41, Lokesh Jaliminche <[email protected]> wrote:
>>>
>>>
>>> From 5199982d29ff291b181c25af63a22d6f2d6d3f7b Mon Sep 17 00:00:00 2001
>>> From: Lokesh Nagappa Jaliminche <[email protected]>
>>> Date: Sat, 26 Dec 2015 13:19:36 +0530
>>> Subject: [PATCH v3] ext4: optimizing group search for inode allocation
>>>
>>> Added a check at the start of group search loop to
>>> avoid looping unecessarily in case of empty group.
>>> This also allow group search to jump directly to
>>> "found_flex_bg" with "stats" and "group" already set,
>>> so there is no need to go through the extra steps of
>>> setting "best_desc" and "best_group" and then break
>>> out of the loop just to set "stats" and "group" again.
>>>
>>> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
>>> ---
>>> fs/ext4/ialloc.c | 12 ++++++++++--
>>> 1 files changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>>> index 1b8024d..16f94db 100644
>>> --- a/fs/ext4/ialloc.c
>>> +++ b/fs/ext4/ialloc.c
>>> @@ -473,10 +473,14 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>> ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
>>>
>>> if (S_ISDIR(mode) &&
>>> - ((parent == d_inode(sb->s_root)) ||
>>> - (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
>>> + ((parent == d_inode(sb->s_root)) ||
>>> + (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
>>
>> I don't understand why you are changing the indentation here?
>>
>>> + unsigned int inodes_per_flex_group;
>>> + unsigned long int clusters_per_flex_group;
>>> int best_ndir = inodes_per_group;
>>> int ret = -1;
>>> + inodes_per_flex_group = inodes_per_group * flex_size;
>>> + clusters_per_flex_group = sbi->s_clusters_per_group * flex_size;
>>
>> Have you actually tested if this code works? When I was looking at this
>> code I was accidentally looking at the ext2 version, which only deals with
>> individual groups, and not flex groups. I don't think that a flex group
>> can actually be completely empty, since it will always contain at least
>> an inode table and some bitmaps.
>>
>> This calculation needs to take this into account or it will just be
>> overhead and not an optimization. Something like:
>>
>> /* account for inode table and allocation bitmaps */
>> clusters_per_flex_group = sbi->s_clusters_per_group * flex_size -
>> ((inodes_per_flex_group * EXT4_INODE_SIZE(sb) +
>> (flex_size << sb->s_blocksize_bits) * 2 ) >>
>> EXT4_SB(sb)->s_cluster_bits;
>>
>> It would be worthwhile to print this out and verify that there are actually
>> groups with this many free blocks in a newly formatted filesystem.
>>
>> Cheers, Andreas
>>
>>> if (qstr) {
>>> hinfo.hash_version = DX_HASH_HALF_MD4;
>>> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>> for (i = 0; i < ngroups; i++) {
>>> g = (parent_group + i) % ngroups;
>>> get_orlov_stats(sb, g, flex_size, &stats);
>>> + /* the group can't get any better than empty */
>>> + if (inodes_per_flex_group == stats.free_inodes &&
>>> + clusters_per_flex_group == stats.free_clusters)
>>> + goto found_flex_bg;
>>> if (!stats.free_inodes)
>>> continue;
>>> if (stats.used_dirs >= best_ndir)
>>> --
>>> 1.7.1
>>>
>>>
>>>> On Mon, Dec 21, 2015 at 10:27:37AM -0700, Andreas Dilger wrote:
>>>>> On Dec 18, 2015, at 4:32 PM, lokesh jaliminche <[email protected]> wrote:
>>>>>
>>>>> From 9e09fef78b2fa552c883bf8124af873abfde0805 Mon Sep 17 00:00:00 2001
>>>>> From: Lokesh Nagappa Jaliminche <[email protected]>
>>>>
>>>> In the patch summary there is a typo - s/serch/search/
>>>>
>>>> Also, it appears your email client has mangled the whitespace in the
>>>> patch. Please use "git send-email" to send your patch to the list:
>>>>
>>>> git send-email --to [email protected] --cc [email protected] HEAD~1
>>>>
>>>> so that it arrives intact. You probably need to set it up in ~/.gitconfig:
>>>>
>>>> [sendemail]
>>>> confirm = compose
>>>> smtpdomain = {your client hostname}
>>>> smtpserver = {your SMTP server hostname}
>>>>
>>>> Cheers, Andreas
>>>>
>>>>> Added a check at the start of group search loop to
>>>>> avoid looping unecessarily in case of empty group.
>>>>> This also allow group search to jump directly to
>>>>> "found_flex_bg" with "stats" and "group" already set,
>>>>> so there is no need to go through the extra steps of
>>>>> setting "best_desc" and "best_group" and then break
>>>>> out of the loop just to set "stats" and "group" again.
>>>>>
>>>>> Signed-off-by: Lokesh N Jaliminche <[email protected]>
>>>>> ---
>>>>> fs/ext4/ialloc.c | 8 ++++++++
>>>>> 1 files changed, 8 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>>>>> index 1b8024d..588bf8e 100644
>>>>> --- a/fs/ext4/ialloc.c
>>>>> +++ b/fs/ext4/ialloc.c
>>>>> @@ -446,6 +446,8 @@ static int find_group_orlov(struct super_block
>>>>> *sb, struct inode *parent,
>>>>> struct ext4_sb_info *sbi = EXT4_SB(sb);
>>>>> ext4_group_t real_ngroups = ext4_get_groups_count(sb);
>>>>> int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
>>>>> + unsigned int inodes_per_flex_group;
>>>>> + long unsigned int blocks_per_clustre;
>>>>> unsigned int freei, avefreei, grp_free;
>>>>> ext4_fsblk_t freeb, avefreec;
>>>>> unsigned int ndirs;
>>>>> @@ -470,6 +472,8 @@ static int find_group_orlov(struct super_block
>>>>> *sb, struct inode *parent,
>>>>> percpu_counter_read_positive(&sbi->s_freeclusters_counter));
>>>>> avefreec = freeb;
>>>>> do_div(avefreec, ngroups);
>>>>> + inodes_per_flex_group = inodes_per_group * flex_size;
>>>>> + blocks_per_clustre = sbi->s_blocks_per_group * flex_size;
>>>>> ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
>>>>>
>>>>> if (S_ISDIR(mode) &&
>>>>> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block
>>>>> *sb, struct inode *parent,
>>>>> for (i = 0; i < ngroups; i++) {
>>>>> g = (parent_group + i) % ngroups;
>>>>> get_orlov_stats(sb, g, flex_size, &stats);
>>>>> + /* the group can't get any better than empty */
>>>>> + if (inodes_per_flex_group == stats.free_inodes &&
>>>>> + blocks_per_clustre == stats.free_clusters)
>>>>> + goto found_flex_bg;
>>>>> if (!stats.free_inodes)
>>>>> continue;
>>>>> if (stats.used_dirs >= best_ndir)
>>>>> --
>>>>> 1.7.1
>>>>> <0001-EXT4-optimizing-group-serch-for-inode-allocation.patch>
>>>>
>>>>
>>>> Cheers, Andreas
>>>
>>>
>>> <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
> <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>


Cheers, Andreas






Attachments:
signature.asc (833.00 B)
Message signed with OpenPGP using GPGMail

2016-01-16 22:50:23

by lokesh jaliminche

[permalink] [raw]
Subject: Re: [PATCH v5] EXT4: optimizing group search for inode allocation

From: Lokesh Nagappa Jaliminche <[email protected]>
Date: Thu, 7 Jan 2016 05:12:20 +0530
Subject: [PATCH v5] ext4: optimizing group serch for inode allocation

Added a check at the start of group search loop to
avoid looping unecessarily in case of empty group.
This also allow group search to jump directly to
"found_flex_bg" with "stats" and "group" already set,
so there is no need to go through the extra steps of
setting "best_desc" , "best_group" and then break
out of the loop just to set "stats" and "group" again.

Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
---
fs/ext4/ialloc.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 1b8024d..d2835cc 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -477,6 +477,12 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
(ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
int best_ndir = inodes_per_group;
int ret = -1;
+ /* Maximum usable blocks per group as some blocks are used
+ * for inode tables and allocation bitmaps */
+ unsigned int max_blocks_per_flex =
+ ((sbi->s_blocks_per_group -
+ (sbi->s_itb_per_group + 2)) *
+ flex_size) >> sbi->s_cluster_bits;

if (qstr) {
hinfo.hash_version = DX_HASH_HALF_MD4;
@@ -489,6 +495,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
for (i = 0; i < ngroups; i++) {
g = (parent_group + i) % ngroups;
get_orlov_stats(sb, g, flex_size, &stats);
+ /* can't get better group than empty group */
+ if (max_blocks_per_flex == stats.free_clusters &&
+ (inodes_per_group * flex_size) == stats.free_inodes)
+ goto found_flex_bg;
if (!stats.free_inodes)
continue;
if (stats.used_dirs >= best_ndir)
--.
1.7.1

On Mon, Jan 11, 2016 at 02:13:46PM -0700, Andreas Dilger wrote:
> On Jan 6, 2016, at 5:46 PM, Lokesh Jaliminche <[email protected]> wrote:
> >
> > you are right flex groups cannot be completely empty as some
> > blocks are used for inode table and allocation bitmaps. I
> > missed that point. I have corrected that. Also I have tested
> > and validated this patch by putting some traces it works!
>
> Good news. Thanks for updating the patch and testing it.
> More comments below.
>
> > Here are corresponding logs:
> > commands:
> > =========
> > mkfs.ext4 -g 10000 -G 2 /dev/sdc
> > mount -t ext4 /dev/sdc /mnt/test_ext4/
> > cd /mnt/test_ext4
> > mkdir 1
> > cd 1
> >
> > logs:
> > =====
> > Jan 7 03:36:45 root kernel: [ 64.792939] max_usable_blocks_per_flex_group = [19692]
> > Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_clusters = [19264]
> > Jan 7 03:36:45 root kernel: [ 64.792939] Inodes_per_flex_group = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_inodes = [4853]
> > Jan 7 03:36:45 root kernel: [ 64.792950] max_usable_blocks_per_flex_group = [19692]
> > Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_clusters = [19481]
> > Jan 7 03:36:45 root kernel: [ 64.792950] Inodes_per_flex_group = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_inodes = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792958] max_usable_blocks_per_flex_group = [19692]
> > Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_clusters = [19481]
> > Jan 7 03:36:45 root kernel: [ 64.792958] Inodes_per_flex_group = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_inodes = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792965] max_usable_blocks_per_flex_group = [19692]
> > Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_clusters = [19481]
> > Jan 7 03:36:45 root kernel: [ 64.792965] Inodes_per_flex_group = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_inodes = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792972] max_usable_blocks_per_flex_group = [19692]
> > Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_clusters = [19481]
> > Jan 7 03:36:45 root kernel: [ 64.792972] Inodes_per_flex_group = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_inodes = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792979] max_usable_blocks_per_flex_group = [19692]<<<<<<<<<<<
> > Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_clusters = [19692]<<<<<<<<<<<
> > Jan 7 03:36:45 root kernel: [ 64.792979] Inodes_per_flex_group = [4864]<<<<<<<<<<<<
> > Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_inodes = [4864]<<<<<<<<<<<<
> > Jan 7 03:36:45 root kernel: [ 64.792985] found_flex_bg >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>condition satisfied
> >
> >
> > [Patch v4]:
> > ==========
> > From: Lokesh Nagappa Jaliminche <[email protected]>
> > Date: Thu, 7 Jan 2016 05:12:20 +0530
> > Subject: [PATCH] ext4: optimizing group serch for inode allocation
> >
> > Added a check at the start of group search loop to
> > avoid looping unecessarily in case of empty group.
> > This also allow group search to jump directly to
> > "found_flex_bg" with "stats" and "group" already set,
> > so there is no need to go through the extra steps of
> > setting "best_desc" , "best_group" and then break
> > out of the loop just to set "stats" and "group" again.
> >
> > Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
> > ---
> > fs/ext4/ext4.h | 2 ++
> > fs/ext4/ialloc.c | 21 +++++++++++++++++++--
> > 2 files changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index cc7ca4e..fc48f9a 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -326,6 +326,8 @@ struct flex_groups {
> > #define EXT4_MAX_DESC_SIZE EXT4_MIN_BLOCK_SIZE
> > #define EXT4_DESC_SIZE(s) (EXT4_SB(s)->s_desc_size)
> > #ifdef __KERNEL__
> > +# define EXT4_INODE_TABLE_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_itb_per_group)
> > +# define EXT4_BLOCKS_PER_CLUSTER(s) (EXT4_SB(s)->s_cluster_ratio)
>
> I don't think there is much value to these macros. They aren't much
> shorter than the actual variable access, and they don't provide any
> improved clarity for the reader.
>
> I think they were introduced many years ago so that this header could
> be shared between the kernel and e2fsprogs, but that is no longer true
> so I don't think there is a need to add new ones.
>
> Ted, any opinion on this?
>
> > # define EXT4_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_blocks_per_group)
> > # define EXT4_CLUSTERS_PER_GROUP(s) (EXT4_SB(s)->s_clusters_per_group)
> > # define EXT4_DESC_PER_BLOCK(s) (EXT4_SB(s)->s_desc_per_block)
> > diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> > index 1b8024d..dc66ad8 100644
> > --- a/fs/ext4/ialloc.c
> > +++ b/fs/ext4/ialloc.c
> > @@ -463,7 +463,6 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> > sbi->s_log_groups_per_flex;
> > parent_group >>= sbi->s_log_groups_per_flex;
> > }
> > -
> > freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
> > avefreei = freei / ngroups;
>
> Not sure why this line is being deleted?
>
> > @@ -477,7 +476,20 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> > (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> > int best_ndir = inodes_per_group;
> > int ret = -1;
> > -
> > + /* blocks used for inode table and allocation bitmaps */
> > + unsigned int max_usable_blocks_per_flex_group;
>
> This could be shorter and still clear, like "max_blocks_per_flex".
>
> > + unsigned int blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb);
> > + unsigned int inode_table_blocks_per_group = \
> > + EXT4_INODE_TABLE_BLOCKS_PER_GROUP(sb);
> > + /* Number of blocks per cluster */
> > + unsigned int cluster_ratio = EXT4_BLOCKS_PER_CLUSTER(sb);
>
> There also isn't much value to making local variables that hold these
> same values again. It increases stack space for little benefit. They
> are only accessed once anyway.
>
> > + unsigned int inodes_per_flex_group = inodes_per_group * \
> > + flex_size;
> > + max_usable_blocks_per_flex_group = \
> > + ((blocks_per_group*flex_size) - \
>
> Spaces around that '*'.
>
> > + (inode_table_blocks_per_group * \
> > + flex_size + 2 * flex_size)) / \
>
> This is C and not a CPP macro, so no need for linefeed escapes.
>
> > + cluster_ratio;
>
> This should be ">> EXT4_SB(sb)->s_cluster_bits" to avoid the divide.
>
> > if (qstr) {
> > hinfo.hash_version = DX_HASH_HALF_MD4;
> > hinfo.seed = sbi->s_hash_seed;
> > @@ -489,6 +501,11 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> > for (i = 0; i < ngroups; i++) {
> > g = (parent_group + i) % ngroups;
> > get_orlov_stats(sb, g, flex_size, &stats);
> > + /* can't get better group than empty group */
> > + if (max_usable_blocks_per_flex_group == \
> > + stats.free_clusters && \
>
> No need for linefeed escapes.
>
> > + inodes_per_flex_group == stats.free_inodes)
>
> Align continued line after '(' from the "if (" line.
>
> > + goto found_flex_bg;
>
> This is indented too much. It will work properly when you fix the
> previous line.
>
> > if (!stats.free_inodes)
> > continue;
> > if (stats.used_dirs >= best_ndir)
> > --
> > 1.7.1
> >
> > Regards,
> > Lokesh
> >
> > On Sun, Jan 03, 2016 at 11:34:51AM -0700, Andreas Dilger wrote:
> >> On Dec 26, 2015, at 01:41, Lokesh Jaliminche <[email protected]> wrote:
> >>>
> >>>
> >>> From 5199982d29ff291b181c25af63a22d6f2d6d3f7b Mon Sep 17 00:00:00 2001
> >>> From: Lokesh Nagappa Jaliminche <[email protected]>
> >>> Date: Sat, 26 Dec 2015 13:19:36 +0530
> >>> Subject: [PATCH v3] ext4: optimizing group search for inode allocation
> >>>
> >>> Added a check at the start of group search loop to
> >>> avoid looping unecessarily in case of empty group.
> >>> This also allow group search to jump directly to
> >>> "found_flex_bg" with "stats" and "group" already set,
> >>> so there is no need to go through the extra steps of
> >>> setting "best_desc" and "best_group" and then break
> >>> out of the loop just to set "stats" and "group" again.
> >>>
> >>> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
> >>> ---
> >>> fs/ext4/ialloc.c | 12 ++++++++++--
> >>> 1 files changed, 10 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> >>> index 1b8024d..16f94db 100644
> >>> --- a/fs/ext4/ialloc.c
> >>> +++ b/fs/ext4/ialloc.c
> >>> @@ -473,10 +473,14 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> >>> ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
> >>>
> >>> if (S_ISDIR(mode) &&
> >>> - ((parent == d_inode(sb->s_root)) ||
> >>> - (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> >>> + ((parent == d_inode(sb->s_root)) ||
> >>> + (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> >>
> >> I don't understand why you are changing the indentation here?
> >>
> >>> + unsigned int inodes_per_flex_group;
> >>> + unsigned long int clusters_per_flex_group;
> >>> int best_ndir = inodes_per_group;
> >>> int ret = -1;
> >>> + inodes_per_flex_group = inodes_per_group * flex_size;
> >>> + clusters_per_flex_group = sbi->s_clusters_per_group * flex_size;
> >>
> >> Have you actually tested if this code works? When I was looking at this
> >> code I was accidentally looking at the ext2 version, which only deals with
> >> individual groups, and not flex groups. I don't think that a flex group
> >> can actually be completely empty, since it will always contain at least
> >> an inode table and some bitmaps.
> >>
> >> This calculation needs to take this into account or it will just be
> >> overhead and not an optimization. Something like:
> >>
> >> /* account for inode table and allocation bitmaps */
> >> clusters_per_flex_group = sbi->s_clusters_per_group * flex_size -
> >> ((inodes_per_flex_group * EXT4_INODE_SIZE(sb) +
> >> (flex_size << sb->s_blocksize_bits) * 2 ) >>
> >> EXT4_SB(sb)->s_cluster_bits;
> >>
> >> It would be worthwhile to print this out and verify that there are actually
> >> groups with this many free blocks in a newly formatted filesystem.
> >>
> >> Cheers, Andreas
> >>
> >>> if (qstr) {
> >>> hinfo.hash_version = DX_HASH_HALF_MD4;
> >>> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> >>> for (i = 0; i < ngroups; i++) {
> >>> g = (parent_group + i) % ngroups;
> >>> get_orlov_stats(sb, g, flex_size, &stats);
> >>> + /* the group can't get any better than empty */
> >>> + if (inodes_per_flex_group == stats.free_inodes &&
> >>> + clusters_per_flex_group == stats.free_clusters)
> >>> + goto found_flex_bg;
> >>> if (!stats.free_inodes)
> >>> continue;
> >>> if (stats.used_dirs >= best_ndir)
> >>> --
> >>> 1.7.1
> >>>
> >>>
> >>>> On Mon, Dec 21, 2015 at 10:27:37AM -0700, Andreas Dilger wrote:
> >>>>> On Dec 18, 2015, at 4:32 PM, lokesh jaliminche <[email protected]> wrote:
> >>>>>
> >>>>> From 9e09fef78b2fa552c883bf8124af873abfde0805 Mon Sep 17 00:00:00 2001
> >>>>> From: Lokesh Nagappa Jaliminche <[email protected]>
> >>>>
> >>>> In the patch summary there is a typo - s/serch/search/
> >>>>
> >>>> Also, it appears your email client has mangled the whitespace in the
> >>>> patch. Please use "git send-email" to send your patch to the list:
> >>>>
> >>>> git send-email --to [email protected] --cc [email protected] HEAD~1
> >>>>
> >>>> so that it arrives intact. You probably need to set it up in ~/.gitconfig:
> >>>>
> >>>> [sendemail]
> >>>> confirm = compose
> >>>> smtpdomain = {your client hostname}
> >>>> smtpserver = {your SMTP server hostname}
> >>>>
> >>>> Cheers, Andreas
> >>>>
> >>>>> Added a check at the start of group search loop to
> >>>>> avoid looping unecessarily in case of empty group.
> >>>>> This also allow group search to jump directly to
> >>>>> "found_flex_bg" with "stats" and "group" already set,
> >>>>> so there is no need to go through the extra steps of
> >>>>> setting "best_desc" and "best_group" and then break
> >>>>> out of the loop just to set "stats" and "group" again.
> >>>>>
> >>>>> Signed-off-by: Lokesh N Jaliminche <[email protected]>
> >>>>> ---
> >>>>> fs/ext4/ialloc.c | 8 ++++++++
> >>>>> 1 files changed, 8 insertions(+), 0 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> >>>>> index 1b8024d..588bf8e 100644
> >>>>> --- a/fs/ext4/ialloc.c
> >>>>> +++ b/fs/ext4/ialloc.c
> >>>>> @@ -446,6 +446,8 @@ static int find_group_orlov(struct super_block
> >>>>> *sb, struct inode *parent,
> >>>>> struct ext4_sb_info *sbi = EXT4_SB(sb);
> >>>>> ext4_group_t real_ngroups = ext4_get_groups_count(sb);
> >>>>> int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
> >>>>> + unsigned int inodes_per_flex_group;
> >>>>> + long unsigned int blocks_per_clustre;
> >>>>> unsigned int freei, avefreei, grp_free;
> >>>>> ext4_fsblk_t freeb, avefreec;
> >>>>> unsigned int ndirs;
> >>>>> @@ -470,6 +472,8 @@ static int find_group_orlov(struct super_block
> >>>>> *sb, struct inode *parent,
> >>>>> percpu_counter_read_positive(&sbi->s_freeclusters_counter));
> >>>>> avefreec = freeb;
> >>>>> do_div(avefreec, ngroups);
> >>>>> + inodes_per_flex_group = inodes_per_group * flex_size;
> >>>>> + blocks_per_clustre = sbi->s_blocks_per_group * flex_size;
> >>>>> ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
> >>>>>
> >>>>> if (S_ISDIR(mode) &&
> >>>>> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block
> >>>>> *sb, struct inode *parent,
> >>>>> for (i = 0; i < ngroups; i++) {
> >>>>> g = (parent_group + i) % ngroups;
> >>>>> get_orlov_stats(sb, g, flex_size, &stats);
> >>>>> + /* the group can't get any better than empty */
> >>>>> + if (inodes_per_flex_group == stats.free_inodes &&
> >>>>> + blocks_per_clustre == stats.free_clusters)
> >>>>> + goto found_flex_bg;
> >>>>> if (!stats.free_inodes)
> >>>>> continue;
> >>>>> if (stats.used_dirs >= best_ndir)
> >>>>> --
> >>>>> 1.7.1
> >>>>> <0001-EXT4-optimizing-group-serch-for-inode-allocation.patch>
> >>>>
> >>>>
> >>>> Cheers, Andreas
> >>>
> >>>
> >>> <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
> > <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
>
>
> Cheers, Andreas
>
>
>
>
>



Attachments:
(No filename) (17.59 kB)
0001-ext4-optimizing-group-serch-for-inode-allocation.patch (1.87 kB)
Download all attachments

2016-01-16 23:02:39

by lokesh jaliminche

[permalink] [raw]
Subject: Re: [PATCH v5] EXT4: optimizing group search for inode allocation

From: Lokesh Nagappa Jaliminche <[email protected]>
Date: Thu, 7 Jan 2016 05:12:20 +0530
Subject: [PATCH v5] ext4: optimizing group serch for inode allocation

Added a check at the start of group search loop to
avoid looping unecessarily in case of empty group.
This also allow group search to jump directly to
"found_flex_bg" with "stats" and "group" already set,
so there is no need to go through the extra steps of
setting "best_desc" , "best_group" and then break
out of the loop just to set "stats" and "group" again.

Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
---
fs/ext4/ialloc.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 1b8024d..d2835cc 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -477,6 +477,12 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
(ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
int best_ndir = inodes_per_group;
int ret = -1;
+ /* Maximum usable blocks per group as some blocks are used
+ * for inode tables and allocation bitmaps */
+ unsigned int max_blocks_per_flex =
+ ((sbi->s_blocks_per_group -
+ (sbi->s_itb_per_group + 2)) *
+ flex_size) >> sbi->s_cluster_bits;

if (qstr) {
hinfo.hash_version = DX_HASH_HALF_MD4;
@@ -489,6 +495,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
for (i = 0; i < ngroups; i++) {
g = (parent_group + i) % ngroups;
get_orlov_stats(sb, g, flex_size, &stats);
+ /* can't get better group than empty group */
+ if (max_blocks_per_flex == stats.free_clusters &&
+ (inodes_per_group * flex_size) == stats.free_inodes)
+ goto found_flex_bg;
if (!stats.free_inodes)
continue;
if (stats.used_dirs >= best_ndir)
--
1.7.1



On Mon, Jan 11, 2016 at 02:13:46PM -0700, Andreas Dilger wrote:
> On Jan 6, 2016, at 5:46 PM, Lokesh Jaliminche <[email protected]> wrote:
> >
> > you are right flex groups cannot be completely empty as some
> > blocks are used for inode table and allocation bitmaps. I
> > missed that point. I have corrected that. Also I have tested
> > and validated this patch by putting some traces it works!
>
> Good news. Thanks for updating the patch and testing it.
> More comments below.
>
> > Here are corresponding logs:
> > commands:
> > =========
> > mkfs.ext4 -g 10000 -G 2 /dev/sdc
> > mount -t ext4 /dev/sdc /mnt/test_ext4/
> > cd /mnt/test_ext4
> > mkdir 1
> > cd 1
> >
> > logs:
> > =====
> > Jan 7 03:36:45 root kernel: [ 64.792939] max_usable_blocks_per_flex_group = [19692]
> > Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_clusters = [19264]
> > Jan 7 03:36:45 root kernel: [ 64.792939] Inodes_per_flex_group = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_inodes = [4853]
> > Jan 7 03:36:45 root kernel: [ 64.792950] max_usable_blocks_per_flex_group = [19692]
> > Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_clusters = [19481]
> > Jan 7 03:36:45 root kernel: [ 64.792950] Inodes_per_flex_group = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_inodes = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792958] max_usable_blocks_per_flex_group = [19692]
> > Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_clusters = [19481]
> > Jan 7 03:36:45 root kernel: [ 64.792958] Inodes_per_flex_group = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_inodes = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792965] max_usable_blocks_per_flex_group = [19692]
> > Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_clusters = [19481]
> > Jan 7 03:36:45 root kernel: [ 64.792965] Inodes_per_flex_group = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_inodes = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792972] max_usable_blocks_per_flex_group = [19692]
> > Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_clusters = [19481]
> > Jan 7 03:36:45 root kernel: [ 64.792972] Inodes_per_flex_group = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_inodes = [4864]
> > Jan 7 03:36:45 root kernel: [ 64.792979] max_usable_blocks_per_flex_group = [19692]<<<<<<<<<<<
> > Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_clusters = [19692]<<<<<<<<<<<
> > Jan 7 03:36:45 root kernel: [ 64.792979] Inodes_per_flex_group = [4864]<<<<<<<<<<<<
> > Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_inodes = [4864]<<<<<<<<<<<<
> > Jan 7 03:36:45 root kernel: [ 64.792985] found_flex_bg >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>condition satisfied
> >
> >
> > [Patch v4]:
> > ==========
> > From: Lokesh Nagappa Jaliminche <[email protected]>
> > Date: Thu, 7 Jan 2016 05:12:20 +0530
> > Subject: [PATCH] ext4: optimizing group serch for inode allocation
> >
> > Added a check at the start of group search loop to
> > avoid looping unecessarily in case of empty group.
> > This also allow group search to jump directly to
> > "found_flex_bg" with "stats" and "group" already set,
> > so there is no need to go through the extra steps of
> > setting "best_desc" , "best_group" and then break
> > out of the loop just to set "stats" and "group" again.
> >
> > Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
> > ---
> > fs/ext4/ext4.h | 2 ++
> > fs/ext4/ialloc.c | 21 +++++++++++++++++++--
> > 2 files changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index cc7ca4e..fc48f9a 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -326,6 +326,8 @@ struct flex_groups {
> > #define EXT4_MAX_DESC_SIZE EXT4_MIN_BLOCK_SIZE
> > #define EXT4_DESC_SIZE(s) (EXT4_SB(s)->s_desc_size)
> > #ifdef __KERNEL__
> > +# define EXT4_INODE_TABLE_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_itb_per_group)
> > +# define EXT4_BLOCKS_PER_CLUSTER(s) (EXT4_SB(s)->s_cluster_ratio)
>
> I don't think there is much value to these macros. They aren't much
> shorter than the actual variable access, and they don't provide any
> improved clarity for the reader.
>
> I think they were introduced many years ago so that this header could
> be shared between the kernel and e2fsprogs, but that is no longer true
> so I don't think there is a need to add new ones.
>
> Ted, any opinion on this?
>
> > # define EXT4_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_blocks_per_group)
> > # define EXT4_CLUSTERS_PER_GROUP(s) (EXT4_SB(s)->s_clusters_per_group)
> > # define EXT4_DESC_PER_BLOCK(s) (EXT4_SB(s)->s_desc_per_block)
> > diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> > index 1b8024d..dc66ad8 100644
> > --- a/fs/ext4/ialloc.c
> > +++ b/fs/ext4/ialloc.c
> > @@ -463,7 +463,6 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> > sbi->s_log_groups_per_flex;
> > parent_group >>= sbi->s_log_groups_per_flex;
> > }
> > -
> > freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
> > avefreei = freei / ngroups;
>
> Not sure why this line is being deleted?
>
> > @@ -477,7 +476,20 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> > (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> > int best_ndir = inodes_per_group;
> > int ret = -1;
> > -
> > + /* blocks used for inode table and allocation bitmaps */
> > + unsigned int max_usable_blocks_per_flex_group;
>
> This could be shorter and still clear, like "max_blocks_per_flex".
>
> > + unsigned int blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb);
> > + unsigned int inode_table_blocks_per_group = \
> > + EXT4_INODE_TABLE_BLOCKS_PER_GROUP(sb);
> > + /* Number of blocks per cluster */
> > + unsigned int cluster_ratio = EXT4_BLOCKS_PER_CLUSTER(sb);
>
> There also isn't much value to making local variables that hold these
> same values again. It increases stack space for little benefit. They
> are only accessed once anyway.
>
> > + unsigned int inodes_per_flex_group = inodes_per_group * \
> > + flex_size;
> > + max_usable_blocks_per_flex_group = \
> > + ((blocks_per_group*flex_size) - \
>
> Spaces around that '*'.
>
> > + (inode_table_blocks_per_group * \
> > + flex_size + 2 * flex_size)) / \
>
> This is C and not a CPP macro, so no need for linefeed escapes.
>
> > + cluster_ratio;
>
> This should be ">> EXT4_SB(sb)->s_cluster_bits" to avoid the divide.
>
> > if (qstr) {
> > hinfo.hash_version = DX_HASH_HALF_MD4;
> > hinfo.seed = sbi->s_hash_seed;
> > @@ -489,6 +501,11 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> > for (i = 0; i < ngroups; i++) {
> > g = (parent_group + i) % ngroups;
> > get_orlov_stats(sb, g, flex_size, &stats);
> > + /* can't get better group than empty group */
> > + if (max_usable_blocks_per_flex_group == \
> > + stats.free_clusters && \
>
> No need for linefeed escapes.
>
> > + inodes_per_flex_group == stats.free_inodes)
>
> Align continued line after '(' from the "if (" line.
>
> > + goto found_flex_bg;
>
> This is indented too much. It will work properly when you fix the
> previous line.
>
> > if (!stats.free_inodes)
> > continue;
> > if (stats.used_dirs >= best_ndir)
> > --
> > 1.7.1
> >
> > Regards,
> > Lokesh
> >
> > On Sun, Jan 03, 2016 at 11:34:51AM -0700, Andreas Dilger wrote:
> >> On Dec 26, 2015, at 01:41, Lokesh Jaliminche <[email protected]> wrote:
> >>>
> >>>
> >>> From 5199982d29ff291b181c25af63a22d6f2d6d3f7b Mon Sep 17 00:00:00 2001
> >>> From: Lokesh Nagappa Jaliminche <[email protected]>
> >>> Date: Sat, 26 Dec 2015 13:19:36 +0530
> >>> Subject: [PATCH v3] ext4: optimizing group search for inode allocation
> >>>
> >>> Added a check at the start of group search loop to
> >>> avoid looping unecessarily in case of empty group.
> >>> This also allow group search to jump directly to
> >>> "found_flex_bg" with "stats" and "group" already set,
> >>> so there is no need to go through the extra steps of
> >>> setting "best_desc" and "best_group" and then break
> >>> out of the loop just to set "stats" and "group" again.
> >>>
> >>> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
> >>> ---
> >>> fs/ext4/ialloc.c | 12 ++++++++++--
> >>> 1 files changed, 10 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> >>> index 1b8024d..16f94db 100644
> >>> --- a/fs/ext4/ialloc.c
> >>> +++ b/fs/ext4/ialloc.c
> >>> @@ -473,10 +473,14 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> >>> ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
> >>>
> >>> if (S_ISDIR(mode) &&
> >>> - ((parent == d_inode(sb->s_root)) ||
> >>> - (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> >>> + ((parent == d_inode(sb->s_root)) ||
> >>> + (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> >>
> >> I don't understand why you are changing the indentation here?
> >>
> >>> + unsigned int inodes_per_flex_group;
> >>> + unsigned long int clusters_per_flex_group;
> >>> int best_ndir = inodes_per_group;
> >>> int ret = -1;
> >>> + inodes_per_flex_group = inodes_per_group * flex_size;
> >>> + clusters_per_flex_group = sbi->s_clusters_per_group * flex_size;
> >>
> >> Have you actually tested if this code works? When I was looking at this
> >> code I was accidentally looking at the ext2 version, which only deals with
> >> individual groups, and not flex groups. I don't think that a flex group
> >> can actually be completely empty, since it will always contain at least
> >> an inode table and some bitmaps.
> >>
> >> This calculation needs to take this into account or it will just be
> >> overhead and not an optimization. Something like:
> >>
> >> /* account for inode table and allocation bitmaps */
> >> clusters_per_flex_group = sbi->s_clusters_per_group * flex_size -
> >> ((inodes_per_flex_group * EXT4_INODE_SIZE(sb) +
> >> (flex_size << sb->s_blocksize_bits) * 2 ) >>
> >> EXT4_SB(sb)->s_cluster_bits;
> >>
> >> It would be worthwhile to print this out and verify that there are actually
> >> groups with this many free blocks in a newly formatted filesystem.
> >>
> >> Cheers, Andreas
> >>
> >>> if (qstr) {
> >>> hinfo.hash_version = DX_HASH_HALF_MD4;
> >>> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> >>> for (i = 0; i < ngroups; i++) {
> >>> g = (parent_group + i) % ngroups;
> >>> get_orlov_stats(sb, g, flex_size, &stats);
> >>> + /* the group can't get any better than empty */
> >>> + if (inodes_per_flex_group == stats.free_inodes &&
> >>> + clusters_per_flex_group == stats.free_clusters)
> >>> + goto found_flex_bg;
> >>> if (!stats.free_inodes)
> >>> continue;
> >>> if (stats.used_dirs >= best_ndir)
> >>> --
> >>> 1.7.1
> >>>
> >>>
> >>>> On Mon, Dec 21, 2015 at 10:27:37AM -0700, Andreas Dilger wrote:
> >>>>> On Dec 18, 2015, at 4:32 PM, lokesh jaliminche <[email protected]> wrote:
> >>>>>
> >>>>> From 9e09fef78b2fa552c883bf8124af873abfde0805 Mon Sep 17 00:00:00 2001
> >>>>> From: Lokesh Nagappa Jaliminche <[email protected]>
> >>>>
> >>>> In the patch summary there is a typo - s/serch/search/
> >>>>
> >>>> Also, it appears your email client has mangled the whitespace in the
> >>>> patch. Please use "git send-email" to send your patch to the list:
> >>>>
> >>>> git send-email --to [email protected] --cc [email protected] HEAD~1
> >>>>
> >>>> so that it arrives intact. You probably need to set it up in ~/.gitconfig:
> >>>>
> >>>> [sendemail]
> >>>> confirm = compose
> >>>> smtpdomain = {your client hostname}
> >>>> smtpserver = {your SMTP server hostname}
> >>>>
> >>>> Cheers, Andreas
> >>>>
> >>>>> Added a check at the start of group search loop to
> >>>>> avoid looping unecessarily in case of empty group.
> >>>>> This also allow group search to jump directly to
> >>>>> "found_flex_bg" with "stats" and "group" already set,
> >>>>> so there is no need to go through the extra steps of
> >>>>> setting "best_desc" and "best_group" and then break
> >>>>> out of the loop just to set "stats" and "group" again.
> >>>>>
> >>>>> Signed-off-by: Lokesh N Jaliminche <[email protected]>
> >>>>> ---
> >>>>> fs/ext4/ialloc.c | 8 ++++++++
> >>>>> 1 files changed, 8 insertions(+), 0 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> >>>>> index 1b8024d..588bf8e 100644
> >>>>> --- a/fs/ext4/ialloc.c
> >>>>> +++ b/fs/ext4/ialloc.c
> >>>>> @@ -446,6 +446,8 @@ static int find_group_orlov(struct super_block
> >>>>> *sb, struct inode *parent,
> >>>>> struct ext4_sb_info *sbi = EXT4_SB(sb);
> >>>>> ext4_group_t real_ngroups = ext4_get_groups_count(sb);
> >>>>> int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
> >>>>> + unsigned int inodes_per_flex_group;
> >>>>> + long unsigned int blocks_per_clustre;
> >>>>> unsigned int freei, avefreei, grp_free;
> >>>>> ext4_fsblk_t freeb, avefreec;
> >>>>> unsigned int ndirs;
> >>>>> @@ -470,6 +472,8 @@ static int find_group_orlov(struct super_block
> >>>>> *sb, struct inode *parent,
> >>>>> percpu_counter_read_positive(&sbi->s_freeclusters_counter));
> >>>>> avefreec = freeb;
> >>>>> do_div(avefreec, ngroups);
> >>>>> + inodes_per_flex_group = inodes_per_group * flex_size;
> >>>>> + blocks_per_clustre = sbi->s_blocks_per_group * flex_size;
> >>>>> ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
> >>>>>
> >>>>> if (S_ISDIR(mode) &&
> >>>>> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block
> >>>>> *sb, struct inode *parent,
> >>>>> for (i = 0; i < ngroups; i++) {
> >>>>> g = (parent_group + i) % ngroups;
> >>>>> get_orlov_stats(sb, g, flex_size, &stats);
> >>>>> + /* the group can't get any better than empty */
> >>>>> + if (inodes_per_flex_group == stats.free_inodes &&
> >>>>> + blocks_per_clustre == stats.free_clusters)
> >>>>> + goto found_flex_bg;
> >>>>> if (!stats.free_inodes)
> >>>>> continue;
> >>>>> if (stats.used_dirs >= best_ndir)
> >>>>> --
> >>>>> 1.7.1
> >>>>> <0001-EXT4-optimizing-group-serch-for-inode-allocation.patch>
> >>>>
> >>>>
> >>>> Cheers, Andreas
> >>>
> >>>
> >>> <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
> > <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
>
>
> Cheers, Andreas
>
>
>
>
>



Attachments:
(No filename) (17.96 kB)
0001-ext4-optimizing-group-serch-for-inode-allocation.patch (1.87 kB)
Download all attachments

2016-01-18 21:50:59

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH v5] EXT4: optimizing group search for inode allocation

On Jan 16, 2016, at 4:02 PM, Lokesh Jaliminche <[email protected]> wrote:
>
> From: Lokesh Nagappa Jaliminche <[email protected]>
> Date: Thu, 7 Jan 2016 05:12:20 +0530
> Subject: [PATCH v5] ext4: optimizing group serch for inode allocation
>
> Added a check at the start of group search loop to
> avoid looping unecessarily in case of empty group.
> This also allow group search to jump directly to
> "found_flex_bg" with "stats" and "group" already set,
> so there is no need to go through the extra steps of
> setting "best_desc" , "best_group" and then break
> out of the loop just to set "stats" and "group" again.

This looks very good, just one minor potential issue below.

> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
> ---
> fs/ext4/ialloc.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 1b8024d..d2835cc 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -477,6 +477,12 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> int best_ndir = inodes_per_group;
> int ret = -1;
> + /* Maximum usable blocks per group as some blocks are used
> + * for inode tables and allocation bitmaps */
> + unsigned int max_blocks_per_flex =

This should be an unsigned long long or __u64, since the max_blocks_per_flex
calculation may overflow a 32-bit value on filesystems with larger block_size
and a large value for flex_size (larger than 2^13 for a 64KB blocksize).

The "stats.free_clusters" is already a __u64 for this reason.

> + ((sbi->s_blocks_per_group -
> + (sbi->s_itb_per_group + 2)) *
> + flex_size) >> sbi->s_cluster_bits;
>
> if (qstr) {
> hinfo.hash_version = DX_HASH_HALF_MD4;
> @@ -489,6 +495,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> for (i = 0; i < ngroups; i++) {
> g = (parent_group + i) % ngroups;
> get_orlov_stats(sb, g, flex_size, &stats);
> + /* can't get better group than empty group */
> + if (max_blocks_per_flex == stats.free_clusters &&
> + (inodes_per_group * flex_size) == stats.free_inodes)
> + goto found_flex_bg;
> if (!stats.free_inodes)
> continue;
> if (stats.used_dirs >= best_ndir)
> --
> 1.7.1
>
>
>
> On Mon, Jan 11, 2016 at 02:13:46PM -0700, Andreas Dilger wrote:
>> On Jan 6, 2016, at 5:46 PM, Lokesh Jaliminche <[email protected]> wrote:
>>>
>>> you are right flex groups cannot be completely empty as some
>>> blocks are used for inode table and allocation bitmaps. I
>>> missed that point. I have corrected that. Also I have tested
>>> and validated this patch by putting some traces it works!
>>
>> Good news. Thanks for updating the patch and testing it.
>> More comments below.
>>
>>> Here are corresponding logs:
>>> commands:
>>> =========
>>> mkfs.ext4 -g 10000 -G 2 /dev/sdc
>>> mount -t ext4 /dev/sdc /mnt/test_ext4/
>>> cd /mnt/test_ext4
>>> mkdir 1
>>> cd 1
>>>
>>> logs:
>>> =====
>>> Jan 7 03:36:45 root kernel: [ 64.792939] max_usable_blocks_per_flex_group = [19692]
>>> Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_clusters = [19264]
>>> Jan 7 03:36:45 root kernel: [ 64.792939] Inodes_per_flex_group = [4864]
>>> Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_inodes = [4853]
>>> Jan 7 03:36:45 root kernel: [ 64.792950] max_usable_blocks_per_flex_group = [19692]
>>> Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_clusters = [19481]
>>> Jan 7 03:36:45 root kernel: [ 64.792950] Inodes_per_flex_group = [4864]
>>> Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_inodes = [4864]
>>> Jan 7 03:36:45 root kernel: [ 64.792958] max_usable_blocks_per_flex_group = [19692]
>>> Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_clusters = [19481]
>>> Jan 7 03:36:45 root kernel: [ 64.792958] Inodes_per_flex_group = [4864]
>>> Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_inodes = [4864]
>>> Jan 7 03:36:45 root kernel: [ 64.792965] max_usable_blocks_per_flex_group = [19692]
>>> Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_clusters = [19481]
>>> Jan 7 03:36:45 root kernel: [ 64.792965] Inodes_per_flex_group = [4864]
>>> Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_inodes = [4864]
>>> Jan 7 03:36:45 root kernel: [ 64.792972] max_usable_blocks_per_flex_group = [19692]
>>> Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_clusters = [19481]
>>> Jan 7 03:36:45 root kernel: [ 64.792972] Inodes_per_flex_group = [4864]
>>> Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_inodes = [4864]
>>> Jan 7 03:36:45 root kernel: [ 64.792979] max_usable_blocks_per_flex_group = [19692]<<<<<<<<<<<
>>> Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_clusters = [19692]<<<<<<<<<<<
>>> Jan 7 03:36:45 root kernel: [ 64.792979] Inodes_per_flex_group = [4864]<<<<<<<<<<<<
>>> Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_inodes = [4864]<<<<<<<<<<<<
>>> Jan 7 03:36:45 root kernel: [ 64.792985] found_flex_bg >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>condition satisfied
>>>
>>>
>>> [Patch v4]:
>>> ==========
>>> From: Lokesh Nagappa Jaliminche <[email protected]>
>>> Date: Thu, 7 Jan 2016 05:12:20 +0530
>>> Subject: [PATCH] ext4: optimizing group serch for inode allocation
>>>
>>> Added a check at the start of group search loop to
>>> avoid looping unecessarily in case of empty group.
>>> This also allow group search to jump directly to
>>> "found_flex_bg" with "stats" and "group" already set,
>>> so there is no need to go through the extra steps of
>>> setting "best_desc" , "best_group" and then break
>>> out of the loop just to set "stats" and "group" again.
>>>
>>> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
>>> ---
>>> fs/ext4/ext4.h | 2 ++
>>> fs/ext4/ialloc.c | 21 +++++++++++++++++++--
>>> 2 files changed, 21 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>>> index cc7ca4e..fc48f9a 100644
>>> --- a/fs/ext4/ext4.h
>>> +++ b/fs/ext4/ext4.h
>>> @@ -326,6 +326,8 @@ struct flex_groups {
>>> #define EXT4_MAX_DESC_SIZE EXT4_MIN_BLOCK_SIZE
>>> #define EXT4_DESC_SIZE(s) (EXT4_SB(s)->s_desc_size)
>>> #ifdef __KERNEL__
>>> +# define EXT4_INODE_TABLE_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_itb_per_group)
>>> +# define EXT4_BLOCKS_PER_CLUSTER(s) (EXT4_SB(s)->s_cluster_ratio)
>>
>> I don't think there is much value to these macros. They aren't much
>> shorter than the actual variable access, and they don't provide any
>> improved clarity for the reader.
>>
>> I think they were introduced many years ago so that this header could
>> be shared between the kernel and e2fsprogs, but that is no longer true
>> so I don't think there is a need to add new ones.
>>
>> Ted, any opinion on this?
>>
>>> # define EXT4_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_blocks_per_group)
>>> # define EXT4_CLUSTERS_PER_GROUP(s) (EXT4_SB(s)->s_clusters_per_group)
>>> # define EXT4_DESC_PER_BLOCK(s) (EXT4_SB(s)->s_desc_per_block)
>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>>> index 1b8024d..dc66ad8 100644
>>> --- a/fs/ext4/ialloc.c
>>> +++ b/fs/ext4/ialloc.c
>>> @@ -463,7 +463,6 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>> sbi->s_log_groups_per_flex;
>>> parent_group >>= sbi->s_log_groups_per_flex;
>>> }
>>> -
>>> freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
>>> avefreei = freei / ngroups;
>>
>> Not sure why this line is being deleted?
>>
>>> @@ -477,7 +476,20 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>> (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
>>> int best_ndir = inodes_per_group;
>>> int ret = -1;
>>> -
>>> + /* blocks used for inode table and allocation bitmaps */
>>> + unsigned int max_usable_blocks_per_flex_group;
>>
>> This could be shorter and still clear, like "max_blocks_per_flex".
>>
>>> + unsigned int blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb);
>>> + unsigned int inode_table_blocks_per_group = \
>>> + EXT4_INODE_TABLE_BLOCKS_PER_GROUP(sb);
>>> + /* Number of blocks per cluster */
>>> + unsigned int cluster_ratio = EXT4_BLOCKS_PER_CLUSTER(sb);
>>
>> There also isn't much value to making local variables that hold these
>> same values again. It increases stack space for little benefit. They
>> are only accessed once anyway.
>>
>>> + unsigned int inodes_per_flex_group = inodes_per_group * \
>>> + flex_size;
>>> + max_usable_blocks_per_flex_group = \
>>> + ((blocks_per_group*flex_size) - \
>>
>> Spaces around that '*'.
>>
>>> + (inode_table_blocks_per_group * \
>>> + flex_size + 2 * flex_size)) / \
>>
>> This is C and not a CPP macro, so no need for linefeed escapes.
>>
>>> + cluster_ratio;
>>
>> This should be ">> EXT4_SB(sb)->s_cluster_bits" to avoid the divide.
>>
>>> if (qstr) {
>>> hinfo.hash_version = DX_HASH_HALF_MD4;
>>> hinfo.seed = sbi->s_hash_seed;
>>> @@ -489,6 +501,11 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>> for (i = 0; i < ngroups; i++) {
>>> g = (parent_group + i) % ngroups;
>>> get_orlov_stats(sb, g, flex_size, &stats);
>>> + /* can't get better group than empty group */
>>> + if (max_usable_blocks_per_flex_group == \
>>> + stats.free_clusters && \
>>
>> No need for linefeed escapes.
>>
>>> + inodes_per_flex_group == stats.free_inodes)
>>
>> Align continued line after '(' from the "if (" line.
>>
>>> + goto found_flex_bg;
>>
>> This is indented too much. It will work properly when you fix the
>> previous line.
>>
>>> if (!stats.free_inodes)
>>> continue;
>>> if (stats.used_dirs >= best_ndir)
>>> --
>>> 1.7.1
>>>
>>> Regards,
>>> Lokesh
>>>
>>> On Sun, Jan 03, 2016 at 11:34:51AM -0700, Andreas Dilger wrote:
>>>> On Dec 26, 2015, at 01:41, Lokesh Jaliminche <[email protected]> wrote:
>>>>>
>>>>>
>>>>> From 5199982d29ff291b181c25af63a22d6f2d6d3f7b Mon Sep 17 00:00:00 2001
>>>>> From: Lokesh Nagappa Jaliminche <[email protected]>
>>>>> Date: Sat, 26 Dec 2015 13:19:36 +0530
>>>>> Subject: [PATCH v3] ext4: optimizing group search for inode allocation
>>>>>
>>>>> Added a check at the start of group search loop to
>>>>> avoid looping unecessarily in case of empty group.
>>>>> This also allow group search to jump directly to
>>>>> "found_flex_bg" with "stats" and "group" already set,
>>>>> so there is no need to go through the extra steps of
>>>>> setting "best_desc" and "best_group" and then break
>>>>> out of the loop just to set "stats" and "group" again.
>>>>>
>>>>> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
>>>>> ---
>>>>> fs/ext4/ialloc.c | 12 ++++++++++--
>>>>> 1 files changed, 10 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>>>>> index 1b8024d..16f94db 100644
>>>>> --- a/fs/ext4/ialloc.c
>>>>> +++ b/fs/ext4/ialloc.c
>>>>> @@ -473,10 +473,14 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>>>> ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
>>>>>
>>>>> if (S_ISDIR(mode) &&
>>>>> - ((parent == d_inode(sb->s_root)) ||
>>>>> - (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
>>>>> + ((parent == d_inode(sb->s_root)) ||
>>>>> + (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
>>>>
>>>> I don't understand why you are changing the indentation here?
>>>>
>>>>> + unsigned int inodes_per_flex_group;
>>>>> + unsigned long int clusters_per_flex_group;
>>>>> int best_ndir = inodes_per_group;
>>>>> int ret = -1;
>>>>> + inodes_per_flex_group = inodes_per_group * flex_size;
>>>>> + clusters_per_flex_group = sbi->s_clusters_per_group * flex_size;
>>>>
>>>> Have you actually tested if this code works? When I was looking at this
>>>> code I was accidentally looking at the ext2 version, which only deals with
>>>> individual groups, and not flex groups. I don't think that a flex group
>>>> can actually be completely empty, since it will always contain at least
>>>> an inode table and some bitmaps.
>>>>
>>>> This calculation needs to take this into account or it will just be
>>>> overhead and not an optimization. Something like:
>>>>
>>>> /* account for inode table and allocation bitmaps */
>>>> clusters_per_flex_group = sbi->s_clusters_per_group * flex_size -
>>>> ((inodes_per_flex_group * EXT4_INODE_SIZE(sb) +
>>>> (flex_size << sb->s_blocksize_bits) * 2 ) >>
>>>> EXT4_SB(sb)->s_cluster_bits;
>>>>
>>>> It would be worthwhile to print this out and verify that there are actually
>>>> groups with this many free blocks in a newly formatted filesystem.
>>>>
>>>> Cheers, Andreas
>>>>
>>>>> if (qstr) {
>>>>> hinfo.hash_version = DX_HASH_HALF_MD4;
>>>>> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>>>> for (i = 0; i < ngroups; i++) {
>>>>> g = (parent_group + i) % ngroups;
>>>>> get_orlov_stats(sb, g, flex_size, &stats);
>>>>> + /* the group can't get any better than empty */
>>>>> + if (inodes_per_flex_group == stats.free_inodes &&
>>>>> + clusters_per_flex_group == stats.free_clusters)
>>>>> + goto found_flex_bg;
>>>>> if (!stats.free_inodes)
>>>>> continue;
>>>>> if (stats.used_dirs >= best_ndir)
>>>>> --
>>>>> 1.7.1
>>>>>
>>>>>
>>>>>> On Mon, Dec 21, 2015 at 10:27:37AM -0700, Andreas Dilger wrote:
>>>>>>> On Dec 18, 2015, at 4:32 PM, lokesh jaliminche <[email protected]> wrote:
>>>>>>>
>>>>>>> From 9e09fef78b2fa552c883bf8124af873abfde0805 Mon Sep 17 00:00:00 2001
>>>>>>> From: Lokesh Nagappa Jaliminche <[email protected]>
>>>>>>
>>>>>> In the patch summary there is a typo - s/serch/search/
>>>>>>
>>>>>> Also, it appears your email client has mangled the whitespace in the
>>>>>> patch. Please use "git send-email" to send your patch to the list:
>>>>>>
>>>>>> git send-email --to [email protected] --cc [email protected] HEAD~1
>>>>>>
>>>>>> so that it arrives intact. You probably need to set it up in ~/.gitconfig:
>>>>>>
>>>>>> [sendemail]
>>>>>> confirm = compose
>>>>>> smtpdomain = {your client hostname}
>>>>>> smtpserver = {your SMTP server hostname}
>>>>>>
>>>>>> Cheers, Andreas
>>>>>>
>>>>>>> Added a check at the start of group search loop to
>>>>>>> avoid looping unecessarily in case of empty group.
>>>>>>> This also allow group search to jump directly to
>>>>>>> "found_flex_bg" with "stats" and "group" already set,
>>>>>>> so there is no need to go through the extra steps of
>>>>>>> setting "best_desc" and "best_group" and then break
>>>>>>> out of the loop just to set "stats" and "group" again.
>>>>>>>
>>>>>>> Signed-off-by: Lokesh N Jaliminche <[email protected]>
>>>>>>> ---
>>>>>>> fs/ext4/ialloc.c | 8 ++++++++
>>>>>>> 1 files changed, 8 insertions(+), 0 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>>>>>>> index 1b8024d..588bf8e 100644
>>>>>>> --- a/fs/ext4/ialloc.c
>>>>>>> +++ b/fs/ext4/ialloc.c
>>>>>>> @@ -446,6 +446,8 @@ static int find_group_orlov(struct super_block
>>>>>>> *sb, struct inode *parent,
>>>>>>> struct ext4_sb_info *sbi = EXT4_SB(sb);
>>>>>>> ext4_group_t real_ngroups = ext4_get_groups_count(sb);
>>>>>>> int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
>>>>>>> + unsigned int inodes_per_flex_group;
>>>>>>> + long unsigned int blocks_per_clustre;
>>>>>>> unsigned int freei, avefreei, grp_free;
>>>>>>> ext4_fsblk_t freeb, avefreec;
>>>>>>> unsigned int ndirs;
>>>>>>> @@ -470,6 +472,8 @@ static int find_group_orlov(struct super_block
>>>>>>> *sb, struct inode *parent,
>>>>>>> percpu_counter_read_positive(&sbi->s_freeclusters_counter));
>>>>>>> avefreec = freeb;
>>>>>>> do_div(avefreec, ngroups);
>>>>>>> + inodes_per_flex_group = inodes_per_group * flex_size;
>>>>>>> + blocks_per_clustre = sbi->s_blocks_per_group * flex_size;
>>>>>>> ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
>>>>>>>
>>>>>>> if (S_ISDIR(mode) &&
>>>>>>> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block
>>>>>>> *sb, struct inode *parent,
>>>>>>> for (i = 0; i < ngroups; i++) {
>>>>>>> g = (parent_group + i) % ngroups;
>>>>>>> get_orlov_stats(sb, g, flex_size, &stats);
>>>>>>> + /* the group can't get any better than empty */
>>>>>>> + if (inodes_per_flex_group == stats.free_inodes &&
>>>>>>> + blocks_per_clustre == stats.free_clusters)
>>>>>>> + goto found_flex_bg;
>>>>>>> if (!stats.free_inodes)
>>>>>>> continue;
>>>>>>> if (stats.used_dirs >= best_ndir)
>>>>>>> --
>>>>>>> 1.7.1
>>>>>>> <0001-EXT4-optimizing-group-serch-for-inode-allocation.patch>
>>>>>>
>>>>>>
>>>>>> Cheers, Andreas
>>>>>
>>>>>
>>>>> <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
>>> <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
>>
>>
>> Cheers, Andreas
>>
>>
>>
>>
>>
>
>
> <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>


Cheers, Andreas






Attachments:
signature.asc (833.00 B)
Message signed with OpenPGP using GPGMail

2016-01-19 06:08:21

by lokesh jaliminche

[permalink] [raw]
Subject: Re: [PATCH v6] EXT4: optimizing group search for inode allocation

From: Lokesh Nagappa Jaliminche <[email protected]>
Subject: [PATCH] ext4: optimizing group serch for inode allocation

Added a check at the start of group search loop to
avoid looping unecessarily in case of empty group.
This also allow group search to jump directly to
"found_flex_bg" with "stats" and "group" already set,
so there is no need to go through the extra steps of
setting "best_desc" , "best_group" and then break
out of the loop just to set "stats" and "group" again.

Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
---
fs/ext4/ialloc.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 1b8024d..3463d30 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -477,6 +477,12 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
(ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
int best_ndir = inodes_per_group;
int ret = -1;
+ /* Maximum usable blocks per group as some blocks are used
+ * for inode tables and allocation bitmaps */
+ unsigned long long max_blocks_per_flex =
+ ((sbi->s_blocks_per_group -
+ (sbi->s_itb_per_group + 2)) *
+ flex_size) >> sbi->s_cluster_bits;

if (qstr) {
hinfo.hash_version = DX_HASH_HALF_MD4;
@@ -489,6 +495,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
for (i = 0; i < ngroups; i++) {
g = (parent_group + i) % ngroups;
get_orlov_stats(sb, g, flex_size, &stats);
+ /* can't get better group than empty group */
+ if (max_blocks_per_flex == stats.free_clusters &&
+ (inodes_per_group * flex_size) == stats.free_inodes)
+ goto found_flex_bg;
if (!stats.free_inodes)
continue;
if (stats.used_dirs >= best_ndir)
--
1.7.1



On Mon, Jan 18, 2016 at 02:50:48PM -0700, Andreas Dilger wrote:
> On Jan 16, 2016, at 4:02 PM, Lokesh Jaliminche <[email protected]> wrote:
> >
> > From: Lokesh Nagappa Jaliminche <[email protected]>
> > Date: Thu, 7 Jan 2016 05:12:20 +0530
> > Subject: [PATCH v5] ext4: optimizing group serch for inode allocation
> >
> > Added a check at the start of group search loop to
> > avoid looping unecessarily in case of empty group.
> > This also allow group search to jump directly to
> > "found_flex_bg" with "stats" and "group" already set,
> > so there is no need to go through the extra steps of
> > setting "best_desc" , "best_group" and then break
> > out of the loop just to set "stats" and "group" again.
>
> This looks very good, just one minor potential issue below.
>
> > Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
> > ---
> > fs/ext4/ialloc.c | 10 ++++++++++
> > 1 files changed, 10 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> > index 1b8024d..d2835cc 100644
> > --- a/fs/ext4/ialloc.c
> > +++ b/fs/ext4/ialloc.c
> > @@ -477,6 +477,12 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> > (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> > int best_ndir = inodes_per_group;
> > int ret = -1;
> > + /* Maximum usable blocks per group as some blocks are used
> > + * for inode tables and allocation bitmaps */
> > + unsigned int max_blocks_per_flex =
>
> This should be an unsigned long long or __u64, since the max_blocks_per_flex
> calculation may overflow a 32-bit value on filesystems with larger block_size
> and a large value for flex_size (larger than 2^13 for a 64KB blocksize).
>
> The "stats.free_clusters" is already a __u64 for this reason.
>
> > + ((sbi->s_blocks_per_group -
> > + (sbi->s_itb_per_group + 2)) *
> > + flex_size) >> sbi->s_cluster_bits;
> >
> > if (qstr) {
> > hinfo.hash_version = DX_HASH_HALF_MD4;
> > @@ -489,6 +495,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> > for (i = 0; i < ngroups; i++) {
> > g = (parent_group + i) % ngroups;
> > get_orlov_stats(sb, g, flex_size, &stats);
> > + /* can't get better group than empty group */
> > + if (max_blocks_per_flex == stats.free_clusters &&
> > + (inodes_per_group * flex_size) == stats.free_inodes)
> > + goto found_flex_bg;
> > if (!stats.free_inodes)
> > continue;
> > if (stats.used_dirs >= best_ndir)
> > --
> > 1.7.1
> >
> >
> >
> > On Mon, Jan 11, 2016 at 02:13:46PM -0700, Andreas Dilger wrote:
> >> On Jan 6, 2016, at 5:46 PM, Lokesh Jaliminche <[email protected]> wrote:
> >>>
> >>> you are right flex groups cannot be completely empty as some
> >>> blocks are used for inode table and allocation bitmaps. I
> >>> missed that point. I have corrected that. Also I have tested
> >>> and validated this patch by putting some traces it works!
> >>
> >> Good news. Thanks for updating the patch and testing it.
> >> More comments below.
> >>
> >>> Here are corresponding logs:
> >>> commands:
> >>> =========
> >>> mkfs.ext4 -g 10000 -G 2 /dev/sdc
> >>> mount -t ext4 /dev/sdc /mnt/test_ext4/
> >>> cd /mnt/test_ext4
> >>> mkdir 1
> >>> cd 1
> >>>
> >>> logs:
> >>> =====
> >>> Jan 7 03:36:45 root kernel: [ 64.792939] max_usable_blocks_per_flex_group = [19692]
> >>> Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_clusters = [19264]
> >>> Jan 7 03:36:45 root kernel: [ 64.792939] Inodes_per_flex_group = [4864]
> >>> Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_inodes = [4853]
> >>> Jan 7 03:36:45 root kernel: [ 64.792950] max_usable_blocks_per_flex_group = [19692]
> >>> Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_clusters = [19481]
> >>> Jan 7 03:36:45 root kernel: [ 64.792950] Inodes_per_flex_group = [4864]
> >>> Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_inodes = [4864]
> >>> Jan 7 03:36:45 root kernel: [ 64.792958] max_usable_blocks_per_flex_group = [19692]
> >>> Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_clusters = [19481]
> >>> Jan 7 03:36:45 root kernel: [ 64.792958] Inodes_per_flex_group = [4864]
> >>> Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_inodes = [4864]
> >>> Jan 7 03:36:45 root kernel: [ 64.792965] max_usable_blocks_per_flex_group = [19692]
> >>> Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_clusters = [19481]
> >>> Jan 7 03:36:45 root kernel: [ 64.792965] Inodes_per_flex_group = [4864]
> >>> Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_inodes = [4864]
> >>> Jan 7 03:36:45 root kernel: [ 64.792972] max_usable_blocks_per_flex_group = [19692]
> >>> Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_clusters = [19481]
> >>> Jan 7 03:36:45 root kernel: [ 64.792972] Inodes_per_flex_group = [4864]
> >>> Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_inodes = [4864]
> >>> Jan 7 03:36:45 root kernel: [ 64.792979] max_usable_blocks_per_flex_group = [19692]<<<<<<<<<<<
> >>> Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_clusters = [19692]<<<<<<<<<<<
> >>> Jan 7 03:36:45 root kernel: [ 64.792979] Inodes_per_flex_group = [4864]<<<<<<<<<<<<
> >>> Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_inodes = [4864]<<<<<<<<<<<<
> >>> Jan 7 03:36:45 root kernel: [ 64.792985] found_flex_bg >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>condition satisfied
> >>>
> >>>
> >>> [Patch v4]:
> >>> ==========
> >>> From: Lokesh Nagappa Jaliminche <[email protected]>
> >>> Date: Thu, 7 Jan 2016 05:12:20 +0530
> >>> Subject: [PATCH] ext4: optimizing group serch for inode allocation
> >>>
> >>> Added a check at the start of group search loop to
> >>> avoid looping unecessarily in case of empty group.
> >>> This also allow group search to jump directly to
> >>> "found_flex_bg" with "stats" and "group" already set,
> >>> so there is no need to go through the extra steps of
> >>> setting "best_desc" , "best_group" and then break
> >>> out of the loop just to set "stats" and "group" again.
> >>>
> >>> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
> >>> ---
> >>> fs/ext4/ext4.h | 2 ++
> >>> fs/ext4/ialloc.c | 21 +++++++++++++++++++--
> >>> 2 files changed, 21 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> >>> index cc7ca4e..fc48f9a 100644
> >>> --- a/fs/ext4/ext4.h
> >>> +++ b/fs/ext4/ext4.h
> >>> @@ -326,6 +326,8 @@ struct flex_groups {
> >>> #define EXT4_MAX_DESC_SIZE EXT4_MIN_BLOCK_SIZE
> >>> #define EXT4_DESC_SIZE(s) (EXT4_SB(s)->s_desc_size)
> >>> #ifdef __KERNEL__
> >>> +# define EXT4_INODE_TABLE_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_itb_per_group)
> >>> +# define EXT4_BLOCKS_PER_CLUSTER(s) (EXT4_SB(s)->s_cluster_ratio)
> >>
> >> I don't think there is much value to these macros. They aren't much
> >> shorter than the actual variable access, and they don't provide any
> >> improved clarity for the reader.
> >>
> >> I think they were introduced many years ago so that this header could
> >> be shared between the kernel and e2fsprogs, but that is no longer true
> >> so I don't think there is a need to add new ones.
> >>
> >> Ted, any opinion on this?
> >>
> >>> # define EXT4_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_blocks_per_group)
> >>> # define EXT4_CLUSTERS_PER_GROUP(s) (EXT4_SB(s)->s_clusters_per_group)
> >>> # define EXT4_DESC_PER_BLOCK(s) (EXT4_SB(s)->s_desc_per_block)
> >>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> >>> index 1b8024d..dc66ad8 100644
> >>> --- a/fs/ext4/ialloc.c
> >>> +++ b/fs/ext4/ialloc.c
> >>> @@ -463,7 +463,6 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> >>> sbi->s_log_groups_per_flex;
> >>> parent_group >>= sbi->s_log_groups_per_flex;
> >>> }
> >>> -
> >>> freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
> >>> avefreei = freei / ngroups;
> >>
> >> Not sure why this line is being deleted?
> >>
> >>> @@ -477,7 +476,20 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> >>> (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> >>> int best_ndir = inodes_per_group;
> >>> int ret = -1;
> >>> -
> >>> + /* blocks used for inode table and allocation bitmaps */
> >>> + unsigned int max_usable_blocks_per_flex_group;
> >>
> >> This could be shorter and still clear, like "max_blocks_per_flex".
> >>
> >>> + unsigned int blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb);
> >>> + unsigned int inode_table_blocks_per_group = \
> >>> + EXT4_INODE_TABLE_BLOCKS_PER_GROUP(sb);
> >>> + /* Number of blocks per cluster */
> >>> + unsigned int cluster_ratio = EXT4_BLOCKS_PER_CLUSTER(sb);
> >>
> >> There also isn't much value to making local variables that hold these
> >> same values again. It increases stack space for little benefit. They
> >> are only accessed once anyway.
> >>
> >>> + unsigned int inodes_per_flex_group = inodes_per_group * \
> >>> + flex_size;
> >>> + max_usable_blocks_per_flex_group = \
> >>> + ((blocks_per_group*flex_size) - \
> >>
> >> Spaces around that '*'.
> >>
> >>> + (inode_table_blocks_per_group * \
> >>> + flex_size + 2 * flex_size)) / \
> >>
> >> This is C and not a CPP macro, so no need for linefeed escapes.
> >>
> >>> + cluster_ratio;
> >>
> >> This should be ">> EXT4_SB(sb)->s_cluster_bits" to avoid the divide.
> >>
> >>> if (qstr) {
> >>> hinfo.hash_version = DX_HASH_HALF_MD4;
> >>> hinfo.seed = sbi->s_hash_seed;
> >>> @@ -489,6 +501,11 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> >>> for (i = 0; i < ngroups; i++) {
> >>> g = (parent_group + i) % ngroups;
> >>> get_orlov_stats(sb, g, flex_size, &stats);
> >>> + /* can't get better group than empty group */
> >>> + if (max_usable_blocks_per_flex_group == \
> >>> + stats.free_clusters && \
> >>
> >> No need for linefeed escapes.
> >>
> >>> + inodes_per_flex_group == stats.free_inodes)
> >>
> >> Align continued line after '(' from the "if (" line.
> >>
> >>> + goto found_flex_bg;
> >>
> >> This is indented too much. It will work properly when you fix the
> >> previous line.
> >>
> >>> if (!stats.free_inodes)
> >>> continue;
> >>> if (stats.used_dirs >= best_ndir)
> >>> --
> >>> 1.7.1
> >>>
> >>> Regards,
> >>> Lokesh
> >>>
> >>> On Sun, Jan 03, 2016 at 11:34:51AM -0700, Andreas Dilger wrote:
> >>>> On Dec 26, 2015, at 01:41, Lokesh Jaliminche <[email protected]> wrote:
> >>>>>
> >>>>>
> >>>>> From 5199982d29ff291b181c25af63a22d6f2d6d3f7b Mon Sep 17 00:00:00 2001
> >>>>> From: Lokesh Nagappa Jaliminche <[email protected]>
> >>>>> Date: Sat, 26 Dec 2015 13:19:36 +0530
> >>>>> Subject: [PATCH v3] ext4: optimizing group search for inode allocation
> >>>>>
> >>>>> Added a check at the start of group search loop to
> >>>>> avoid looping unecessarily in case of empty group.
> >>>>> This also allow group search to jump directly to
> >>>>> "found_flex_bg" with "stats" and "group" already set,
> >>>>> so there is no need to go through the extra steps of
> >>>>> setting "best_desc" and "best_group" and then break
> >>>>> out of the loop just to set "stats" and "group" again.
> >>>>>
> >>>>> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
> >>>>> ---
> >>>>> fs/ext4/ialloc.c | 12 ++++++++++--
> >>>>> 1 files changed, 10 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> >>>>> index 1b8024d..16f94db 100644
> >>>>> --- a/fs/ext4/ialloc.c
> >>>>> +++ b/fs/ext4/ialloc.c
> >>>>> @@ -473,10 +473,14 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> >>>>> ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
> >>>>>
> >>>>> if (S_ISDIR(mode) &&
> >>>>> - ((parent == d_inode(sb->s_root)) ||
> >>>>> - (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> >>>>> + ((parent == d_inode(sb->s_root)) ||
> >>>>> + (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> >>>>
> >>>> I don't understand why you are changing the indentation here?
> >>>>
> >>>>> + unsigned int inodes_per_flex_group;
> >>>>> + unsigned long int clusters_per_flex_group;
> >>>>> int best_ndir = inodes_per_group;
> >>>>> int ret = -1;
> >>>>> + inodes_per_flex_group = inodes_per_group * flex_size;
> >>>>> + clusters_per_flex_group = sbi->s_clusters_per_group * flex_size;
> >>>>
> >>>> Have you actually tested if this code works? When I was looking at this
> >>>> code I was accidentally looking at the ext2 version, which only deals with
> >>>> individual groups, and not flex groups. I don't think that a flex group
> >>>> can actually be completely empty, since it will always contain at least
> >>>> an inode table and some bitmaps.
> >>>>
> >>>> This calculation needs to take this into account or it will just be
> >>>> overhead and not an optimization. Something like:
> >>>>
> >>>> /* account for inode table and allocation bitmaps */
> >>>> clusters_per_flex_group = sbi->s_clusters_per_group * flex_size -
> >>>> ((inodes_per_flex_group * EXT4_INODE_SIZE(sb) +
> >>>> (flex_size << sb->s_blocksize_bits) * 2 ) >>
> >>>> EXT4_SB(sb)->s_cluster_bits;
> >>>>
> >>>> It would be worthwhile to print this out and verify that there are actually
> >>>> groups with this many free blocks in a newly formatted filesystem.
> >>>>
> >>>> Cheers, Andreas
> >>>>
> >>>>> if (qstr) {
> >>>>> hinfo.hash_version = DX_HASH_HALF_MD4;
> >>>>> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> >>>>> for (i = 0; i < ngroups; i++) {
> >>>>> g = (parent_group + i) % ngroups;
> >>>>> get_orlov_stats(sb, g, flex_size, &stats);
> >>>>> + /* the group can't get any better than empty */
> >>>>> + if (inodes_per_flex_group == stats.free_inodes &&
> >>>>> + clusters_per_flex_group == stats.free_clusters)
> >>>>> + goto found_flex_bg;
> >>>>> if (!stats.free_inodes)
> >>>>> continue;
> >>>>> if (stats.used_dirs >= best_ndir)
> >>>>> --
> >>>>> 1.7.1
> >>>>>
> >>>>>
> >>>>>> On Mon, Dec 21, 2015 at 10:27:37AM -0700, Andreas Dilger wrote:
> >>>>>>> On Dec 18, 2015, at 4:32 PM, lokesh jaliminche <[email protected]> wrote:
> >>>>>>>
> >>>>>>> From 9e09fef78b2fa552c883bf8124af873abfde0805 Mon Sep 17 00:00:00 2001
> >>>>>>> From: Lokesh Nagappa Jaliminche <[email protected]>
> >>>>>>
> >>>>>> In the patch summary there is a typo - s/serch/search/
> >>>>>>
> >>>>>> Also, it appears your email client has mangled the whitespace in the
> >>>>>> patch. Please use "git send-email" to send your patch to the list:
> >>>>>>
> >>>>>> git send-email --to [email protected] --cc [email protected] HEAD~1
> >>>>>>
> >>>>>> so that it arrives intact. You probably need to set it up in ~/.gitconfig:
> >>>>>>
> >>>>>> [sendemail]
> >>>>>> confirm = compose
> >>>>>> smtpdomain = {your client hostname}
> >>>>>> smtpserver = {your SMTP server hostname}
> >>>>>>
> >>>>>> Cheers, Andreas
> >>>>>>
> >>>>>>> Added a check at the start of group search loop to
> >>>>>>> avoid looping unecessarily in case of empty group.
> >>>>>>> This also allow group search to jump directly to
> >>>>>>> "found_flex_bg" with "stats" and "group" already set,
> >>>>>>> so there is no need to go through the extra steps of
> >>>>>>> setting "best_desc" and "best_group" and then break
> >>>>>>> out of the loop just to set "stats" and "group" again.
> >>>>>>>
> >>>>>>> Signed-off-by: Lokesh N Jaliminche <[email protected]>
> >>>>>>> ---
> >>>>>>> fs/ext4/ialloc.c | 8 ++++++++
> >>>>>>> 1 files changed, 8 insertions(+), 0 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> >>>>>>> index 1b8024d..588bf8e 100644
> >>>>>>> --- a/fs/ext4/ialloc.c
> >>>>>>> +++ b/fs/ext4/ialloc.c
> >>>>>>> @@ -446,6 +446,8 @@ static int find_group_orlov(struct super_block
> >>>>>>> *sb, struct inode *parent,
> >>>>>>> struct ext4_sb_info *sbi = EXT4_SB(sb);
> >>>>>>> ext4_group_t real_ngroups = ext4_get_groups_count(sb);
> >>>>>>> int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
> >>>>>>> + unsigned int inodes_per_flex_group;
> >>>>>>> + long unsigned int blocks_per_clustre;
> >>>>>>> unsigned int freei, avefreei, grp_free;
> >>>>>>> ext4_fsblk_t freeb, avefreec;
> >>>>>>> unsigned int ndirs;
> >>>>>>> @@ -470,6 +472,8 @@ static int find_group_orlov(struct super_block
> >>>>>>> *sb, struct inode *parent,
> >>>>>>> percpu_counter_read_positive(&sbi->s_freeclusters_counter));
> >>>>>>> avefreec = freeb;
> >>>>>>> do_div(avefreec, ngroups);
> >>>>>>> + inodes_per_flex_group = inodes_per_group * flex_size;
> >>>>>>> + blocks_per_clustre = sbi->s_blocks_per_group * flex_size;
> >>>>>>> ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
> >>>>>>>
> >>>>>>> if (S_ISDIR(mode) &&
> >>>>>>> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block
> >>>>>>> *sb, struct inode *parent,
> >>>>>>> for (i = 0; i < ngroups; i++) {
> >>>>>>> g = (parent_group + i) % ngroups;
> >>>>>>> get_orlov_stats(sb, g, flex_size, &stats);
> >>>>>>> + /* the group can't get any better than empty */
> >>>>>>> + if (inodes_per_flex_group == stats.free_inodes &&
> >>>>>>> + blocks_per_clustre == stats.free_clusters)
> >>>>>>> + goto found_flex_bg;
> >>>>>>> if (!stats.free_inodes)
> >>>>>>> continue;
> >>>>>>> if (stats.used_dirs >= best_ndir)
> >>>>>>> --
> >>>>>>> 1.7.1
> >>>>>>> <0001-EXT4-optimizing-group-serch-for-inode-allocation.patch>
> >>>>>>
> >>>>>>
> >>>>>> Cheers, Andreas
> >>>>>
> >>>>>
> >>>>> <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
> >>> <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
> >>
> >>
> >> Cheers, Andreas
> >>
> >>
> >>
> >>
> >>
> >
> >
> > <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
>
>
> Cheers, Andreas
>
>
>
>
>



Attachments:
(No filename) (21.60 kB)
0001-ext4-optimizing-group-serch-for-inode-allocation.patch (1.88 kB)
Download all attachments

2016-01-19 07:25:03

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH v6] EXT4: optimizing group search for inode allocation


> On Jan 18, 2016, at 11:08 PM, Lokesh Jaliminche <[email protected]> wrote:
>
> From: Lokesh Nagappa Jaliminche <[email protected]>
> Subject: [PATCH] ext4: optimizing group serch for inode allocation
>
> Added a check at the start of group search loop to
> avoid looping unecessarily in case of empty group.
> This also allow group search to jump directly to
> "found_flex_bg" with "stats" and "group" already set,
> so there is no need to go through the extra steps of
> setting "best_desc" , "best_group" and then break
> out of the loop just to set "stats" and "group" again.
>
> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>

Thanks for sticking with the patch. This one looks good.

Reviewed-by: Andreas Dilger <[email protected]>

> ---
> fs/ext4/ialloc.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 1b8024d..3463d30 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -477,6 +477,12 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
> int best_ndir = inodes_per_group;
> int ret = -1;
> + /* Maximum usable blocks per group as some blocks are used
> + * for inode tables and allocation bitmaps */
> + unsigned long long max_blocks_per_flex =
> + ((sbi->s_blocks_per_group -
> + (sbi->s_itb_per_group + 2)) *
> + flex_size) >> sbi->s_cluster_bits;
>
> if (qstr) {
> hinfo.hash_version = DX_HASH_HALF_MD4;
> @@ -489,6 +495,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> for (i = 0; i < ngroups; i++) {
> g = (parent_group + i) % ngroups;
> get_orlov_stats(sb, g, flex_size, &stats);
> + /* can't get better group than empty group */
> + if (max_blocks_per_flex == stats.free_clusters &&
> + (inodes_per_group * flex_size) == stats.free_inodes)
> + goto found_flex_bg;
> if (!stats.free_inodes)
> continue;
> if (stats.used_dirs >= best_ndir)
> --
> 1.7.1
>
>
>
> On Mon, Jan 18, 2016 at 02:50:48PM -0700, Andreas Dilger wrote:
>> On Jan 16, 2016, at 4:02 PM, Lokesh Jaliminche <[email protected]> wrote:
>>>
>>> From: Lokesh Nagappa Jaliminche <[email protected]>
>>> Date: Thu, 7 Jan 2016 05:12:20 +0530
>>> Subject: [PATCH v5] ext4: optimizing group serch for inode allocation
>>>
>>> Added a check at the start of group search loop to
>>> avoid looping unecessarily in case of empty group.
>>> This also allow group search to jump directly to
>>> "found_flex_bg" with "stats" and "group" already set,
>>> so there is no need to go through the extra steps of
>>> setting "best_desc" , "best_group" and then break
>>> out of the loop just to set "stats" and "group" again.
>>
>> This looks very good, just one minor potential issue below.
>>
>>> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
>>> ---
>>> fs/ext4/ialloc.c | 10 ++++++++++
>>> 1 files changed, 10 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>>> index 1b8024d..d2835cc 100644
>>> --- a/fs/ext4/ialloc.c
>>> +++ b/fs/ext4/ialloc.c
>>> @@ -477,6 +477,12 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>> (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
>>> int best_ndir = inodes_per_group;
>>> int ret = -1;
>>> + /* Maximum usable blocks per group as some blocks are used
>>> + * for inode tables and allocation bitmaps */
>>> + unsigned int max_blocks_per_flex =
>>
>> This should be an unsigned long long or __u64, since the max_blocks_per_flex
>> calculation may overflow a 32-bit value on filesystems with larger block_size
>> and a large value for flex_size (larger than 2^13 for a 64KB blocksize).
>>
>> The "stats.free_clusters" is already a __u64 for this reason.
>>
>>> + ((sbi->s_blocks_per_group -
>>> + (sbi->s_itb_per_group + 2)) *
>>> + flex_size) >> sbi->s_cluster_bits;
>>>
>>> if (qstr) {
>>> hinfo.hash_version = DX_HASH_HALF_MD4;
>>> @@ -489,6 +495,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>> for (i = 0; i < ngroups; i++) {
>>> g = (parent_group + i) % ngroups;
>>> get_orlov_stats(sb, g, flex_size, &stats);
>>> + /* can't get better group than empty group */
>>> + if (max_blocks_per_flex == stats.free_clusters &&
>>> + (inodes_per_group * flex_size) == stats.free_inodes)
>>> + goto found_flex_bg;
>>> if (!stats.free_inodes)
>>> continue;
>>> if (stats.used_dirs >= best_ndir)
>>> --
>>> 1.7.1
>>>
>>>
>>>
>>> On Mon, Jan 11, 2016 at 02:13:46PM -0700, Andreas Dilger wrote:
>>>> On Jan 6, 2016, at 5:46 PM, Lokesh Jaliminche <[email protected]> wrote:
>>>>>
>>>>> you are right flex groups cannot be completely empty as some
>>>>> blocks are used for inode table and allocation bitmaps. I
>>>>> missed that point. I have corrected that. Also I have tested
>>>>> and validated this patch by putting some traces it works!
>>>>
>>>> Good news. Thanks for updating the patch and testing it.
>>>> More comments below.
>>>>
>>>>> Here are corresponding logs:
>>>>> commands:
>>>>> =========
>>>>> mkfs.ext4 -g 10000 -G 2 /dev/sdc
>>>>> mount -t ext4 /dev/sdc /mnt/test_ext4/
>>>>> cd /mnt/test_ext4
>>>>> mkdir 1
>>>>> cd 1
>>>>>
>>>>> logs:
>>>>> =====
>>>>> Jan 7 03:36:45 root kernel: [ 64.792939] max_usable_blocks_per_flex_group = [19692]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_clusters = [19264]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792939] Inodes_per_flex_group = [4864]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_inodes = [4853]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792950] max_usable_blocks_per_flex_group = [19692]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_clusters = [19481]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792950] Inodes_per_flex_group = [4864]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_inodes = [4864]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792958] max_usable_blocks_per_flex_group = [19692]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_clusters = [19481]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792958] Inodes_per_flex_group = [4864]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_inodes = [4864]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792965] max_usable_blocks_per_flex_group = [19692]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_clusters = [19481]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792965] Inodes_per_flex_group = [4864]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_inodes = [4864]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792972] max_usable_blocks_per_flex_group = [19692]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_clusters = [19481]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792972] Inodes_per_flex_group = [4864]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_inodes = [4864]
>>>>> Jan 7 03:36:45 root kernel: [ 64.792979] max_usable_blocks_per_flex_group = [19692]<<<<<<<<<<<
>>>>> Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_clusters = [19692]<<<<<<<<<<<
>>>>> Jan 7 03:36:45 root kernel: [ 64.792979] Inodes_per_flex_group = [4864]<<<<<<<<<<<<
>>>>> Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_inodes = [4864]<<<<<<<<<<<<
>>>>> Jan 7 03:36:45 root kernel: [ 64.792985] found_flex_bg >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>condition satisfied
>>>>>
>>>>>
>>>>> [Patch v4]:
>>>>> ==========
>>>>> From: Lokesh Nagappa Jaliminche <[email protected]>
>>>>> Date: Thu, 7 Jan 2016 05:12:20 +0530
>>>>> Subject: [PATCH] ext4: optimizing group serch for inode allocation
>>>>>
>>>>> Added a check at the start of group search loop to
>>>>> avoid looping unecessarily in case of empty group.
>>>>> This also allow group search to jump directly to
>>>>> "found_flex_bg" with "stats" and "group" already set,
>>>>> so there is no need to go through the extra steps of
>>>>> setting "best_desc" , "best_group" and then break
>>>>> out of the loop just to set "stats" and "group" again.
>>>>>
>>>>> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
>>>>> ---
>>>>> fs/ext4/ext4.h | 2 ++
>>>>> fs/ext4/ialloc.c | 21 +++++++++++++++++++--
>>>>> 2 files changed, 21 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>>>>> index cc7ca4e..fc48f9a 100644
>>>>> --- a/fs/ext4/ext4.h
>>>>> +++ b/fs/ext4/ext4.h
>>>>> @@ -326,6 +326,8 @@ struct flex_groups {
>>>>> #define EXT4_MAX_DESC_SIZE EXT4_MIN_BLOCK_SIZE
>>>>> #define EXT4_DESC_SIZE(s) (EXT4_SB(s)->s_desc_size)
>>>>> #ifdef __KERNEL__
>>>>> +# define EXT4_INODE_TABLE_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_itb_per_group)
>>>>> +# define EXT4_BLOCKS_PER_CLUSTER(s) (EXT4_SB(s)->s_cluster_ratio)
>>>>
>>>> I don't think there is much value to these macros. They aren't much
>>>> shorter than the actual variable access, and they don't provide any
>>>> improved clarity for the reader.
>>>>
>>>> I think they were introduced many years ago so that this header could
>>>> be shared between the kernel and e2fsprogs, but that is no longer true
>>>> so I don't think there is a need to add new ones.
>>>>
>>>> Ted, any opinion on this?
>>>>
>>>>> # define EXT4_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_blocks_per_group)
>>>>> # define EXT4_CLUSTERS_PER_GROUP(s) (EXT4_SB(s)->s_clusters_per_group)
>>>>> # define EXT4_DESC_PER_BLOCK(s) (EXT4_SB(s)->s_desc_per_block)
>>>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>>>>> index 1b8024d..dc66ad8 100644
>>>>> --- a/fs/ext4/ialloc.c
>>>>> +++ b/fs/ext4/ialloc.c
>>>>> @@ -463,7 +463,6 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>>>> sbi->s_log_groups_per_flex;
>>>>> parent_group >>= sbi->s_log_groups_per_flex;
>>>>> }
>>>>> -
>>>>> freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
>>>>> avefreei = freei / ngroups;
>>>>
>>>> Not sure why this line is being deleted?
>>>>
>>>>> @@ -477,7 +476,20 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>>>> (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
>>>>> int best_ndir = inodes_per_group;
>>>>> int ret = -1;
>>>>> -
>>>>> + /* blocks used for inode table and allocation bitmaps */
>>>>> + unsigned int max_usable_blocks_per_flex_group;
>>>>
>>>> This could be shorter and still clear, like "max_blocks_per_flex".
>>>>
>>>>> + unsigned int blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb);
>>>>> + unsigned int inode_table_blocks_per_group = \
>>>>> + EXT4_INODE_TABLE_BLOCKS_PER_GROUP(sb);
>>>>> + /* Number of blocks per cluster */
>>>>> + unsigned int cluster_ratio = EXT4_BLOCKS_PER_CLUSTER(sb);
>>>>
>>>> There also isn't much value to making local variables that hold these
>>>> same values again. It increases stack space for little benefit. They
>>>> are only accessed once anyway.
>>>>
>>>>> + unsigned int inodes_per_flex_group = inodes_per_group * \
>>>>> + flex_size;
>>>>> + max_usable_blocks_per_flex_group = \
>>>>> + ((blocks_per_group*flex_size) - \
>>>>
>>>> Spaces around that '*'.
>>>>
>>>>> + (inode_table_blocks_per_group * \
>>>>> + flex_size + 2 * flex_size)) / \
>>>>
>>>> This is C and not a CPP macro, so no need for linefeed escapes.
>>>>
>>>>> + cluster_ratio;
>>>>
>>>> This should be ">> EXT4_SB(sb)->s_cluster_bits" to avoid the divide.
>>>>
>>>>> if (qstr) {
>>>>> hinfo.hash_version = DX_HASH_HALF_MD4;
>>>>> hinfo.seed = sbi->s_hash_seed;
>>>>> @@ -489,6 +501,11 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>>>> for (i = 0; i < ngroups; i++) {
>>>>> g = (parent_group + i) % ngroups;
>>>>> get_orlov_stats(sb, g, flex_size, &stats);
>>>>> + /* can't get better group than empty group */
>>>>> + if (max_usable_blocks_per_flex_group == \
>>>>> + stats.free_clusters && \
>>>>
>>>> No need for linefeed escapes.
>>>>
>>>>> + inodes_per_flex_group == stats.free_inodes)
>>>>
>>>> Align continued line after '(' from the "if (" line.
>>>>
>>>>> + goto found_flex_bg;
>>>>
>>>> This is indented too much. It will work properly when you fix the
>>>> previous line.
>>>>
>>>>> if (!stats.free_inodes)
>>>>> continue;
>>>>> if (stats.used_dirs >= best_ndir)
>>>>> --
>>>>> 1.7.1
>>>>>
>>>>> Regards,
>>>>> Lokesh
>>>>>
>>>>> On Sun, Jan 03, 2016 at 11:34:51AM -0700, Andreas Dilger wrote:
>>>>>> On Dec 26, 2015, at 01:41, Lokesh Jaliminche <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> From 5199982d29ff291b181c25af63a22d6f2d6d3f7b Mon Sep 17 00:00:00 2001
>>>>>>> From: Lokesh Nagappa Jaliminche <[email protected]>
>>>>>>> Date: Sat, 26 Dec 2015 13:19:36 +0530
>>>>>>> Subject: [PATCH v3] ext4: optimizing group search for inode allocation
>>>>>>>
>>>>>>> Added a check at the start of group search loop to
>>>>>>> avoid looping unecessarily in case of empty group.
>>>>>>> This also allow group search to jump directly to
>>>>>>> "found_flex_bg" with "stats" and "group" already set,
>>>>>>> so there is no need to go through the extra steps of
>>>>>>> setting "best_desc" and "best_group" and then break
>>>>>>> out of the loop just to set "stats" and "group" again.
>>>>>>>
>>>>>>> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>
>>>>>>> ---
>>>>>>> fs/ext4/ialloc.c | 12 ++++++++++--
>>>>>>> 1 files changed, 10 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>>>>>>> index 1b8024d..16f94db 100644
>>>>>>> --- a/fs/ext4/ialloc.c
>>>>>>> +++ b/fs/ext4/ialloc.c
>>>>>>> @@ -473,10 +473,14 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>>>>>> ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
>>>>>>>
>>>>>>> if (S_ISDIR(mode) &&
>>>>>>> - ((parent == d_inode(sb->s_root)) ||
>>>>>>> - (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
>>>>>>> + ((parent == d_inode(sb->s_root)) ||
>>>>>>> + (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
>>>>>>
>>>>>> I don't understand why you are changing the indentation here?
>>>>>>
>>>>>>> + unsigned int inodes_per_flex_group;
>>>>>>> + unsigned long int clusters_per_flex_group;
>>>>>>> int best_ndir = inodes_per_group;
>>>>>>> int ret = -1;
>>>>>>> + inodes_per_flex_group = inodes_per_group * flex_size;
>>>>>>> + clusters_per_flex_group = sbi->s_clusters_per_group * flex_size;
>>>>>>
>>>>>> Have you actually tested if this code works? When I was looking at this
>>>>>> code I was accidentally looking at the ext2 version, which only deals with
>>>>>> individual groups, and not flex groups. I don't think that a flex group
>>>>>> can actually be completely empty, since it will always contain at least
>>>>>> an inode table and some bitmaps.
>>>>>>
>>>>>> This calculation needs to take this into account or it will just be
>>>>>> overhead and not an optimization. Something like:
>>>>>>
>>>>>> /* account for inode table and allocation bitmaps */
>>>>>> clusters_per_flex_group = sbi->s_clusters_per_group * flex_size -
>>>>>> ((inodes_per_flex_group * EXT4_INODE_SIZE(sb) +
>>>>>> (flex_size << sb->s_blocksize_bits) * 2 ) >>
>>>>>> EXT4_SB(sb)->s_cluster_bits;
>>>>>>
>>>>>> It would be worthwhile to print this out and verify that there are actually
>>>>>> groups with this many free blocks in a newly formatted filesystem.
>>>>>>
>>>>>> Cheers, Andreas
>>>>>>
>>>>>>> if (qstr) {
>>>>>>> hinfo.hash_version = DX_HASH_HALF_MD4;
>>>>>>> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
>>>>>>> for (i = 0; i < ngroups; i++) {
>>>>>>> g = (parent_group + i) % ngroups;
>>>>>>> get_orlov_stats(sb, g, flex_size, &stats);
>>>>>>> + /* the group can't get any better than empty */
>>>>>>> + if (inodes_per_flex_group == stats.free_inodes &&
>>>>>>> + clusters_per_flex_group == stats.free_clusters)
>>>>>>> + goto found_flex_bg;
>>>>>>> if (!stats.free_inodes)
>>>>>>> continue;
>>>>>>> if (stats.used_dirs >= best_ndir)
>>>>>>> --
>>>>>>> 1.7.1
>>>>>>>
>>>>>>>
>>>>>>>> On Mon, Dec 21, 2015 at 10:27:37AM -0700, Andreas Dilger wrote:
>>>>>>>>> On Dec 18, 2015, at 4:32 PM, lokesh jaliminche <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> From 9e09fef78b2fa552c883bf8124af873abfde0805 Mon Sep 17 00:00:00 2001
>>>>>>>>> From: Lokesh Nagappa Jaliminche <[email protected]>
>>>>>>>>
>>>>>>>> In the patch summary there is a typo - s/serch/search/
>>>>>>>>
>>>>>>>> Also, it appears your email client has mangled the whitespace in the
>>>>>>>> patch. Please use "git send-email" to send your patch to the list:
>>>>>>>>
>>>>>>>> git send-email --to [email protected] --cc [email protected] HEAD~1
>>>>>>>>
>>>>>>>> so that it arrives intact. You probably need to set it up in ~/.gitconfig:
>>>>>>>>
>>>>>>>> [sendemail]
>>>>>>>> confirm = compose
>>>>>>>> smtpdomain = {your client hostname}
>>>>>>>> smtpserver = {your SMTP server hostname}
>>>>>>>>
>>>>>>>> Cheers, Andreas
>>>>>>>>
>>>>>>>>> Added a check at the start of group search loop to
>>>>>>>>> avoid looping unecessarily in case of empty group.
>>>>>>>>> This also allow group search to jump directly to
>>>>>>>>> "found_flex_bg" with "stats" and "group" already set,
>>>>>>>>> so there is no need to go through the extra steps of
>>>>>>>>> setting "best_desc" and "best_group" and then break
>>>>>>>>> out of the loop just to set "stats" and "group" again.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Lokesh N Jaliminche <[email protected]>
>>>>>>>>> ---
>>>>>>>>> fs/ext4/ialloc.c | 8 ++++++++
>>>>>>>>> 1 files changed, 8 insertions(+), 0 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>>>>>>>>> index 1b8024d..588bf8e 100644
>>>>>>>>> --- a/fs/ext4/ialloc.c
>>>>>>>>> +++ b/fs/ext4/ialloc.c
>>>>>>>>> @@ -446,6 +446,8 @@ static int find_group_orlov(struct super_block
>>>>>>>>> *sb, struct inode *parent,
>>>>>>>>> struct ext4_sb_info *sbi = EXT4_SB(sb);
>>>>>>>>> ext4_group_t real_ngroups = ext4_get_groups_count(sb);
>>>>>>>>> int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
>>>>>>>>> + unsigned int inodes_per_flex_group;
>>>>>>>>> + long unsigned int blocks_per_clustre;
>>>>>>>>> unsigned int freei, avefreei, grp_free;
>>>>>>>>> ext4_fsblk_t freeb, avefreec;
>>>>>>>>> unsigned int ndirs;
>>>>>>>>> @@ -470,6 +472,8 @@ static int find_group_orlov(struct super_block
>>>>>>>>> *sb, struct inode *parent,
>>>>>>>>> percpu_counter_read_positive(&sbi->s_freeclusters_counter));
>>>>>>>>> avefreec = freeb;
>>>>>>>>> do_div(avefreec, ngroups);
>>>>>>>>> + inodes_per_flex_group = inodes_per_group * flex_size;
>>>>>>>>> + blocks_per_clustre = sbi->s_blocks_per_group * flex_size;
>>>>>>>>> ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
>>>>>>>>>
>>>>>>>>> if (S_ISDIR(mode) &&
>>>>>>>>> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block
>>>>>>>>> *sb, struct inode *parent,
>>>>>>>>> for (i = 0; i < ngroups; i++) {
>>>>>>>>> g = (parent_group + i) % ngroups;
>>>>>>>>> get_orlov_stats(sb, g, flex_size, &stats);
>>>>>>>>> + /* the group can't get any better than empty */
>>>>>>>>> + if (inodes_per_flex_group == stats.free_inodes &&
>>>>>>>>> + blocks_per_clustre == stats.free_clusters)
>>>>>>>>> + goto found_flex_bg;
>>>>>>>>> if (!stats.free_inodes)
>>>>>>>>> continue;
>>>>>>>>> if (stats.used_dirs >= best_ndir)
>>>>>>>>> --
>>>>>>>>> 1.7.1
>>>>>>>>> <0001-EXT4-optimizing-group-serch-for-inode-allocation.patch>
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers, Andreas
>>>>>>>
>>>>>>>
>>>>>>> <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
>>>>> <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
>>>>
>>>>
>>>> Cheers, Andreas
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
>>
>>
>> Cheers, Andreas
>>
>>
>>
>>
>>
>
>
> <0001-ext4-optimizing-group-serch-for-inode-allocation.patch>


Cheers, Andreas






Attachments:
signature.asc (833.00 B)
Message signed with OpenPGP using GPGMail

2016-02-23 04:04:54

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v6] EXT4: optimizing group search for inode allocation

On Tue, Jan 19, 2016 at 11:38:12AM +0530, Lokesh Jaliminche wrote:
> From: Lokesh Nagappa Jaliminche <[email protected]>
> Subject: [PATCH] ext4: optimizing group serch for inode allocation
>
> Added a check at the start of group search loop to
> avoid looping unecessarily in case of empty group.
> This also allow group search to jump directly to
> "found_flex_bg" with "stats" and "group" already set,
> so there is no need to go through the extra steps of
> setting "best_desc" , "best_group" and then break
> out of the loop just to set "stats" and "group" again.
>
> Signed-off-by: Lokesh Nagappa Jaliminche <[email protected]>

Thanks, applied.

- Ted