2012-11-26 16:39:48

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 0/6 -v2] Optimize e2fsck for large file systems

As requested on the ext4 conference call, here is an updated release of
the e2fsck optimization patch series.


This patch series optimizes e2fsck for large file systems (where large
is 4TB or more). Previously checking a 4TB file system when it was
mostly full could take upwards of 3 minutes of wall clock time, and
e2fsck would be mostly CPU bound. With this patch series, the same 4TB
file system can now be checked in less than 50 seconds and approximately
20 seconds of userspace CPU time. (Previously, it was consuming over
8 times as much CPU time.)

The speed ups come in three places:

1) Reducing the CPU time while reading the block bitmap in from disk.
This was done by speeding up rb_set_bmap_range, and it significantly
improves e2fsck's pass 5 operation.

2) Reducing the CPU time in e2fsck pass1 while constructing the
block_found_map (which is the in-core block allocation bitmap as
found by interating over all of the inodes).

3) Further speed up e2fsck's pass 5 by comparing the block allocation
bitmap one bitmap block at a time, instead of a bit-at-a-time.

Before....

Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 916k/31500k (719k/198k), time: 96.48/81.26/ 0.10
Pass 1: I/O read: 8MB, write: 0MB, rate: 0.08MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 916k/62064k (704k/212k), time: 0.46/ 0.01/ 0.03
Pass 2: I/O read: 4MB, write: 0MB, rate: 8.72MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 916k/62064k (705k/212k), time: 97.56/81.88/ 0.13
Pass 3A: Memory used: 916k/62064k (736k/181k), time: 0.00/ 0.00/ 0.00
Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 3: Memory used: 916k/62064k (693k/223k), time: 0.00/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 1257.86MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 916k/936k (602k/315k), time: 6.59/ 6.55/ 0.02
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 1048k/936k (569k/480k), time: 93.39/80.92/ 0.69
Pass 5: I/O read: 117MB, write: 0MB, rate: 1.25MB/s
Memory used: 1048k/936k (569k/480k), time: 197.58/169.36/ 0.84
I/O read: 129MB, write: 0MB, rate: 0.65MB/s
169.36user 0.90system 3:17.66elapsed 86%CPU (0avgtext+0avgdata 254192maxresident)k
0inputs+0outputs (12major+16066minor)pagefaults 0swaps

.... and after....

Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 916k/31500k (719k/198k), time: 17.83/ 2.79/ 0.05
Pass 1: I/O read: 8MB, write: 0MB, rate: 0.45MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 916k/62064k (704k/212k), time: 0.45/ 0.01/ 0.01
Pass 2: I/O read: 4MB, write: 0MB, rate: 8.82MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 916k/62064k (704k/212k), time: 18.89/ 3.39/ 0.07
Pass 3A: Memory used: 916k/62064k (735k/181k), time: 0.00/ 0.00/ 0.00
Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 3: Memory used: 916k/62064k (693k/224k), time: 0.00/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 1265.82MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 916k/936k (601k/315k), time: 5.69/ 5.67/ 0.00
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 1048k/936k (569k/480k), time: 22.76/10.49/ 0.53
Pass 5: I/O read: 117MB, write: 0MB, rate: 5.14MB/s
Memory used: 1048k/936k (569k/480k), time: 47.38/19.57/ 0.60
I/O read: 129MB, write: 0MB, rate: 2.72MB/s

For slower CPU's (i.e., bookshelf NAS servers with underpowered, wimpy
ARM processors) or for larger RAID arrays, the speed ups would of course
be even better.


Theodore Ts'o (6):
libext2fs: optimize rb_set_bmap_range()
e2fsck: optimize pass1 for CPU time
libext2fs: add ext2fs_bitcount() function
libext2fs: optimize rb_get_bmap_range()
libext2fs: optimize rb_get_bmap_range() for mostly allocated bmaps
e2fsck: optimize pass 5 for CPU utilization

e2fsck/pass1.c | 18 ++++++++-
e2fsck/pass5.c | 55 ++++++++++++++++++++++++-
lib/ext2fs/bitops.c | 35 ++++++++++++++++
lib/ext2fs/bitops.h | 1 +
lib/ext2fs/blkmap64_rb.c | 76 +++++++++++++++++++++++++----------
lib/ext2fs/tst_bitmaps.c | 1 +
lib/ext2fs/tst_bitmaps_cmds | 42 ++++++++++++++++++++
lib/ext2fs/tst_bitmaps_exp | 97 +++++++++++++++++++++++++++++++++++++++++++++
8 files changed, 301 insertions(+), 24 deletions(-)

--
1.7.12.rc0.22.gcdd159b



2012-11-26 16:39:44

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 1/6 -v2] libext2fs: optimize rb_set_bmap_range()

This speeds up reading bitmaps from disk for very large (and full)
disks by significant amounts (i.e., up to two CPU minutes for a 4T
file system).

Addresses-Google-Bug: #7534813

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
lib/ext2fs/blkmap64_rb.c | 32 +++++++++++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/lib/ext2fs/blkmap64_rb.c b/lib/ext2fs/blkmap64_rb.c
index a42eda1..0fc7c57 100644
--- a/lib/ext2fs/blkmap64_rb.c
+++ b/lib/ext2fs/blkmap64_rb.c
@@ -674,16 +674,42 @@ static errcode_t rb_set_bmap_range(ext2fs_generic_bitmap bitmap,
__u64 start, size_t num, void *in)
{
struct ext2fs_rb_private *bp;
+ unsigned char *cp = in;
size_t i;
+ int first_set = -1;
int ret;

bp = (struct ext2fs_rb_private *) bitmap->private;

for (i = 0; i < num; i++) {
- ret = ext2fs_test_bit(i, in);
- if (ret)
- rb_insert_extent(start + i - bitmap->start, 1, bp);
+ if (i & 7 == 0) {
+ unsigned char c = cp[i/8];
+ if (c == 0xFF) {
+ if (first_set == -1)
+ first_set = i;
+ i += 7;
+ continue;
+ }
+ if ((c == 0x00) && (first_set == -1)) {
+ i += 7;
+ continue;
+ }
+ }
+ if (ext2fs_test_bit(i, in)) {
+ if (first_set == -1)
+ first_set = i;
+ continue;
+ }
+ if (first_set == -1)
+ continue;
+
+ rb_insert_extent(start + first_set - bitmap->start,
+ i - first_set, bp);
+ first_set = -1;
}
+ if (first_set != -1)
+ rb_insert_extent(start + first_set - bitmap->start,
+ num - first_set, bp);

return 0;
}
--
1.7.12.rc0.22.gcdd159b


2012-11-26 16:39:48

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 4/6 -v2] libext2fs: optimize rb_get_bmap_range()

This simplifies the rb_get_bmap_range() function and speeds it up for
the case where most of the bitmap is zero.

Signed-off-by: "Theodore Ts'o" <[email protected]>
Reviewed-by: Lukas Czerner <[email protected]>
---
lib/ext2fs/blkmap64_rb.c | 25 ++++++++-----------------
1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/lib/ext2fs/blkmap64_rb.c b/lib/ext2fs/blkmap64_rb.c
index 0fc7c57..eb56c85 100644
--- a/lib/ext2fs/blkmap64_rb.c
+++ b/lib/ext2fs/blkmap64_rb.c
@@ -741,32 +741,23 @@ static errcode_t rb_get_bmap_range(ext2fs_generic_bitmap bitmap,
break;
}

- pos = start;
+ memset(out, 0, (num + 7) >> 3);
+
for (; parent != NULL; parent = next) {
next = ext2fs_rb_next(parent);
ext = ext2fs_rb_entry(parent, struct bmap_rb_extent, node);

- while (((pos - start) < num) &&
- (pos < ext->start)) {
- ext2fs_fast_clear_bit64((pos - start), out);
- pos++;
- }
-
- if ((pos - start) >= num)
- return 0;
+ pos = ext->start;
+ if (pos < start)
+ pos = start;

- while (((pos - start) < num) &&
- (pos < (ext->start + ext->count))) {
+ while (pos < (ext->start + ext->count)) {
+ if ((pos - start) >= num)
+ return 0;
ext2fs_fast_set_bit64((pos - start), out);
pos++;
}
}
-
- while ((pos - start) < num) {
- ext2fs_fast_clear_bit64((pos - start), out);
- pos++;
- }

2012-11-26 16:39:48

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 3/6 -v3] libext2fs: add ext2fs_bitcount() function

This function efficiently counts the number of bits in a block of
memory.

Signed-off-by: "Theodore Ts'o" <[email protected]>
Reviewed-by: Lukas Czerner <[email protected]>
---
lib/ext2fs/bitops.c | 35 ++++++++++++++++
lib/ext2fs/bitops.h | 1 +
lib/ext2fs/tst_bitmaps.c | 1 +
lib/ext2fs/tst_bitmaps_cmds | 42 ++++++++++++++++++++
lib/ext2fs/tst_bitmaps_exp | 97 +++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 176 insertions(+)

diff --git a/lib/ext2fs/bitops.c b/lib/ext2fs/bitops.c
index 9322a35..bd9e768 100644
--- a/lib/ext2fs/bitops.c
+++ b/lib/ext2fs/bitops.c
@@ -116,3 +116,38 @@ int ext2fs_test_bit64(__u64 nr, const void * addr)
return (mask & *ADDR);
}

+static unsigned int popcount8(unsigned int w)
+{
+ unsigned int res = w - ((w >> 1) & 0x55);
+ res = (res & 0x33) + ((res >> 2) & 0x33);
+ return (res + (res >> 4)) & 0x0F;
+}
+
+static unsigned int popcount32(unsigned int w)
+{
+ unsigned int res = w - ((w >> 1) & 0x55555555);
+ res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
+ res = (res + (res >> 4)) & 0x0F0F0F0F;
+ res = res + (res >> 8);
+ return (res + (res >> 16)) & 0x000000FF;
+}
+
+unsigned int ext2fs_bitcount(const void *addr, unsigned int nbytes)
+{
+ const unsigned char *cp = addr;
+ const __u32 *p = addr;
+ unsigned int res = 0;
+
+ if ((((unsigned long) addr) & 3) == 0) {
+ while (nbytes > 4) {
+ res += popcount32(*p++);
+ nbytes -= 4;
+ }
+ cp = (const unsigned char *) p;
+ }
+ while (nbytes > 0) {
+ res += popcount8(*cp++);
+ nbytes--;
+ }
+ return res;
+}
diff --git a/lib/ext2fs/bitops.h b/lib/ext2fs/bitops.h
index 526870f..406999b 100644
--- a/lib/ext2fs/bitops.h
+++ b/lib/ext2fs/bitops.h
@@ -686,6 +686,7 @@ extern int ext2fs_test_bit(unsigned int nr, const void * addr);
extern int ext2fs_set_bit64(__u64 nr,void * addr);
extern int ext2fs_clear_bit64(__u64 nr, void * addr);
extern int ext2fs_test_bit64(__u64 nr, const void * addr);
+extern unsigned int ext2fs_bitcount(const void *addr, unsigned int nbytes);

#ifdef NO_INLINE_FUNCS
extern void ext2fs_fast_set_bit(unsigned int nr,void * addr);
diff --git a/lib/ext2fs/tst_bitmaps.c b/lib/ext2fs/tst_bitmaps.c
index 5da3693..2a76292 100644
--- a/lib/ext2fs/tst_bitmaps.c
+++ b/lib/ext2fs/tst_bitmaps.c
@@ -270,6 +270,7 @@ void dump_bitmap(ext2fs_generic_bitmap bmap, unsigned int start, unsigned num)
for (i=0; i < len; i++)
printf("%02x", buf[i]);
printf("\n");
+ printf("bits set: %u\n", ext2fs_bitcount(buf, len));
free(buf);
}

diff --git a/lib/ext2fs/tst_bitmaps_cmds b/lib/ext2fs/tst_bitmaps_cmds
index 9a4f3d0..31e2a60 100644
--- a/lib/ext2fs/tst_bitmaps_cmds
+++ b/lib/ext2fs/tst_bitmaps_cmds
@@ -53,5 +53,47 @@ cleari 5
testi 17
testi 6
testi 4
+clearb 7 12
+dump_bb
+setb 1
+dump_bb
+setb 2
+dump_bb
+setb 3
+dump_bb
+setb 4
+dump_bb
+setb 5
+dump_bb
+setb 6
+dump_bb
+setb 7
+dump_bb
+setb 8
+dump_bb
+setb 10
+setb 12
+setb 14
+setb 17
+setb 19
+setb 24
+setb 26
+setb 27
+setb 30
+setb 31
+setb 32
+setb 35
+setb 39
+setb 40
+setb 44
+setb 46
+setb 47
+setb 49
+setb 51
+setb 52
+clearb 2
+clearb 3
+clearb 7
+dump_bb
quit

diff --git a/lib/ext2fs/tst_bitmaps_exp b/lib/ext2fs/tst_bitmaps_exp
index 2d406ce..2d62b66 100644
--- a/lib/ext2fs/tst_bitmaps_exp
+++ b/lib/ext2fs/tst_bitmaps_exp
@@ -36,6 +36,7 @@ tst_bitmaps: testb 16
Block 16 is set
tst_bitmaps: dump_bb
block bitmap: 00f80000000000000000000000000000
+bits set: 5
tst_bitmaps: ffzb 11 16
First unmarked block is 11
tst_bitmaps: ffzb 12 16
@@ -64,6 +65,7 @@ tst_bitmaps: setb 12 7
Marking blocks 12 to 18
tst_bitmaps: dump_bb
block bitmap: 00f80300000000000000000000000000
+bits set: 7
tst_bitmaps: seti 2
Setting inode 2, was clear before
tst_bitmaps: seti 5
@@ -82,6 +84,7 @@ tst_bitmaps: testi 1
Inode 1 is clear
tst_bitmaps: dump_ib
inode bitmap: 1e000000
+bits set: 4
tst_bitmaps: ffzi 1 6
First unmarked inode is 1
tst_bitmaps: ffzi 2 5
@@ -110,5 +113,99 @@ tst_bitmaps: testi 6
Inode 6 is clear
tst_bitmaps: testi 4
Inode 4 is clear
+tst_bitmaps: clearb 7 12
+Clearing blocks 7 to 18
+tst_bitmaps: dump_bb
+block bitmap: 00000000000000000000000000000000
+bits set: 0
+tst_bitmaps: setb 1
+Setting block 1, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 01000000000000000000000000000000
+bits set: 1
+tst_bitmaps: setb 2
+Setting block 2, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 03000000000000000000000000000000
+bits set: 2
+tst_bitmaps: setb 3
+Setting block 3, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 07000000000000000000000000000000
+bits set: 3
+tst_bitmaps: setb 4
+Setting block 4, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 0f000000000000000000000000000000
+bits set: 4
+tst_bitmaps: setb 5
+Setting block 5, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 1f000000000000000000000000000000
+bits set: 5
+tst_bitmaps: setb 6
+Setting block 6, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 3f000000000000000000000000000000
+bits set: 6
+tst_bitmaps: setb 7
+Setting block 7, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 7f000000000000000000000000000000
+bits set: 7
+tst_bitmaps: setb 8
+Setting block 8, was clear before
+tst_bitmaps: dump_bb
+block bitmap: ff000000000000000000000000000000
+bits set: 8
+tst_bitmaps: setb 10
+Setting block 10, was clear before
+tst_bitmaps: setb 12
+Setting block 12, was clear before
+tst_bitmaps: setb 14
+Setting block 14, was clear before
+tst_bitmaps: setb 17
+Setting block 17, was clear before
+tst_bitmaps: setb 19
+Setting block 19, was clear before
+tst_bitmaps: setb 24
+Setting block 24, was clear before
+tst_bitmaps: setb 26
+Setting block 26, was clear before
+tst_bitmaps: setb 27
+Setting block 27, was clear before
+tst_bitmaps: setb 30
+Setting block 30, was clear before
+tst_bitmaps: setb 31
+Setting block 31, was clear before
+tst_bitmaps: setb 32
+Setting block 32, was clear before
+tst_bitmaps: setb 35
+Setting block 35, was clear before
+tst_bitmaps: setb 39
+Setting block 39, was clear before
+tst_bitmaps: setb 40
+Setting block 40, was clear before
+tst_bitmaps: setb 44
+Setting block 44, was clear before
+tst_bitmaps: setb 46
+Setting block 46, was clear before
+tst_bitmaps: setb 47
+Setting block 47, was clear before
+tst_bitmaps: setb 49
+Setting block 49, was clear before
+tst_bitmaps: setb 51
+Setting block 51, was clear before
+tst_bitmaps: setb 52
+Setting block 52, was clear before
+tst_bitmaps: clearb 2
+Clearing block 2, was set before
+tst_bitmaps: clearb 3
+Clearing block 3, was set before
+tst_bitmaps: clearb 7
+Clearing block 7, was set before
+tst_bitmaps: dump_bb
+block bitmap: b92a85e6c4680d000000000000000000
+bits set: 25
tst_bitmaps: quit
tst_bitmaps:
--
1.7.12.rc0.22.gcdd159b


2012-11-26 16:39:48

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 2/6 -v2] e2fsck: optimize pass1 for CPU time

Optimize e2fsck pass 1 by marking entire extents as being in use at a
time, instead of block by block. This optimization only works for
non-bigalloc file systems for now (it's tricky to handle bigalloc file
systems since this code is also responsible for dealing with blocks
that are not correctly aligned within a cluster). When the
optimization works, the CPU savings can be significant: ove a full CPU
minute for a mostly full 4T disk.

Addresses-Google-Bug: #7534813

Signed-off-by: "Theodore Ts'o" <[email protected]>
Reviewed-by: Lukas Czerner <[email protected]>
---
e2fsck/pass1.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 78fbe8d..cc00e0f 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -1432,6 +1432,16 @@ static _INLINE_ void mark_block_used(e2fsck_t ctx, blk64_t block)
}
}

+static _INLINE_ void mark_blocks_used(e2fsck_t ctx, blk64_t block,
+ unsigned int num)
+{
+ if (ext2fs_test_block_bitmap_range2(ctx->block_found_map, block, num))
+ ext2fs_mark_block_bitmap_range2(ctx->block_found_map, block, num);
+ else
+ while (num--)
+ mark_block_used(ctx, block++);
+}
+
/*
* Adjust the extended attribute block's reference counts at the end
* of pass 1, either by subtracting out references for EA blocks that
@@ -1867,11 +1877,15 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
goto failed_add_dir_block;
}
}
+ if (!ctx->fs->cluster_ratio_bits) {
+ mark_blocks_used(ctx, extent.e_pblk, extent.e_len);
+ pb->num_blocks += extent.e_len;
+ }
for (blk = extent.e_pblk, blockcnt = extent.e_lblk, i = 0;
i < extent.e_len;
blk++, blockcnt++, i++) {
- if (!(ctx->fs->cluster_ratio_bits &&
- pb->previous_block &&
+ if (ctx->fs->cluster_ratio_bits &&
+ !(pb->previous_block &&
(EXT2FS_B2C(ctx->fs, blk) ==
EXT2FS_B2C(ctx->fs, pb->previous_block)) &&
(blk & EXT2FS_CLUSTER_MASK(ctx->fs)) ==
--
1.7.12.rc0.22.gcdd159b


2012-11-26 16:45:52

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 5/6 -v2] libext2fs: optimize rb_get_bmap_range() for mostly allocated bmaps

This optimizies the CPU utilization of the rb_get_bmap_range()
function when most of the bitmap is allocated.

Signed-off-by: "Theodore Ts'o" <[email protected]>
Reviewed-by: Lukas Czerner <[email protected]>
---
lib/ext2fs/blkmap64_rb.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/lib/ext2fs/blkmap64_rb.c b/lib/ext2fs/blkmap64_rb.c
index eb56c85..7196e90 100644
--- a/lib/ext2fs/blkmap64_rb.c
+++ b/lib/ext2fs/blkmap64_rb.c
@@ -721,6 +721,7 @@ static errcode_t rb_get_bmap_range(ext2fs_generic_bitmap bitmap,
struct rb_node *parent = NULL, *next, **n;
struct ext2fs_rb_private *bp;
struct bmap_rb_extent *ext;
+ int count;
__u64 pos;

bp = (struct ext2fs_rb_private *) bitmap->private;
@@ -748,14 +749,32 @@ static errcode_t rb_get_bmap_range(ext2fs_generic_bitmap bitmap,
ext = ext2fs_rb_entry(parent, struct bmap_rb_extent, node);

pos = ext->start;
- if (pos < start)
+ count = ext->count;
+ if (pos >= start + num)
+ break;
+ if (pos < start) {
+ count -= start - pos;
+ if (count < 0)
+ continue;
pos = start;
-
- while (pos < (ext->start + ext->count)) {
- if ((pos - start) >= num)
- return 0;
+ }
+ if (pos + count > start + num)
+ count = start + num - pos;
+
+ while (count > 0) {
+ if ((count >= 8) &&
+ ((pos - start) % 8) == 0) {
+ int nbytes = count >> 3;
+ int offset = (pos - start) >> 3;
+
+ memset(out + offset, 0xFF, nbytes);
+ pos += nbytes << 3;
+ count -= nbytes << 3;
+ continue;
+ }
ext2fs_fast_set_bit64((pos - start), out);
pos++;
+ count--;
}
}
return 0;
--
1.7.12.rc0.22.gcdd159b


2012-11-26 16:45:56

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 6/6 -v2] e2fsck: optimize pass 5 for CPU utilization

Add a fast past optimization in e2fsck's pass 5 for the common case
where the block bitmap is correct. The optimization works by
extracting each block group's block allocation bitmap into a memory
buffer, and comparing it with the expected allocation bitmap using
memcmp(). If it matches, then we can just update the free block
counts and be on our way, and skip checking each bit individually.

Addresses-Google-Bug: #7534813

Signed-off-by: "Theodore Ts'o" <[email protected]>
Reviewed-by: Lukas Czerner <[email protected]>
---
e2fsck/pass5.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 53 insertions(+), 2 deletions(-)

diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
index 8312fe0..5aff55c 100644
--- a/e2fsck/pass5.c
+++ b/e2fsck/pass5.c
@@ -212,6 +212,12 @@ static void check_block_bitmaps(e2fsck_t ctx)
int cmp_block = 0;
int redo_flag = 0;
blk64_t super_blk, old_desc_blk, new_desc_blk;
+ char *actual_buf, *bitmap_buf;
+
+ actual_buf = (char *) e2fsck_allocate_memory(ctx, fs->blocksize,
+ "actual bitmap buffer");
+ bitmap_buf = (char *) e2fsck_allocate_memory(ctx, fs->blocksize,
+ "bitmap block buffer");

clear_problem_context(&pctx);
free_array = (unsigned int *) e2fsck_allocate_memory(ctx,
@@ -259,11 +265,53 @@ redo_counts:
for (i = B2C(fs->super->s_first_data_block);
i < ext2fs_blocks_count(fs->super);
i += EXT2FS_CLUSTER_RATIO(fs)) {
+ int first_block_in_bg = (B2C(i) -
+ B2C(fs->super->s_first_data_block)) %
+ fs->super->s_clusters_per_group == 0;
+ int n, nbytes = fs->super->s_clusters_per_group / 8;
+
actual = ext2fs_fast_test_block_bitmap2(ctx->block_found_map, i);

+ /*
+ * Try to optimize pass5 by extracting a bitmap block
+ * as expected from what we have on disk, and then
+ * comparing the two. If they are identical, then
+ * update the free block counts and go on to the next
+ * block group. This is much faster than doing the
+ * individual bit-by-bit comparison. The one downside
+ * is that this doesn't work if we are asking e2fsck
+ * to do a discard operation.
+ */
+ if (!first_block_in_bg ||
+ (group == (int)fs->group_desc_count - 1) ||
+ (ctx->options & E2F_OPT_DISCARD))
+ goto no_optimize;
+
+ retval = ext2fs_get_block_bitmap_range2(ctx->block_found_map,
+ B2C(i), fs->super->s_clusters_per_group,
+ actual_buf);
+ if (retval)
+ goto no_optimize;
+ if (ext2fs_bg_flags_test(fs, group, EXT2_BG_BLOCK_UNINIT))
+ memset(bitmap_buf, 0, nbytes);
+ else {
+ retval = ext2fs_get_block_bitmap_range2(fs->block_map,
+ B2C(i), fs->super->s_clusters_per_group,
+ bitmap_buf);
+ if (retval)
+ goto no_optimize;
+ }
+ if (memcmp(actual_buf, bitmap_buf, nbytes) != 0)
+ goto no_optimize;
+ n = ext2fs_bitcount(actual_buf, nbytes);
+ group_free = fs->super->s_clusters_per_group - n;
+ free_blocks += group_free;
+ i += fs->super->s_clusters_per_group - 1;
+ goto next_group;
+ no_optimize:
+
if (skip_group) {
- if ((B2C(i) - B2C(fs->super->s_first_data_block)) %
- fs->super->s_clusters_per_group == 0) {
+ if (first_block_in_bg) {
super_blk = 0;
old_desc_blk = 0;
new_desc_blk = 0;
@@ -401,6 +449,7 @@ redo_counts:
if (!bitmap && i >= first_free)
e2fsck_discard_blocks(ctx, first_free,
(i - first_free) + 1);
+ next_group:
first_free = ext2fs_blocks_count(fs->super);

free_array[group] = group_free;
@@ -475,6 +524,8 @@ redo_counts:
}
errout:
ext2fs_free_mem(&free_array);
+ ext2fs_free_mem(&actual_buf);
+ ext2fs_free_mem(&bitmap_buf);
}

static void check_inode_bitmaps(e2fsck_t ctx)
--
1.7.12.rc0.22.gcdd159b


2012-11-26 23:17:46

by Zach Brown

[permalink] [raw]
Subject: Re: [PATCH 3/6 -v3] libext2fs: add ext2fs_bitcount() function

> This function efficiently counts the number of bits in a block of
> memory.

Would it be worth the annoying build- and run-time machinery to detect
and use the -msse4.2 __builtin_popcount() gcc intrinsic?

> +static unsigned int popcount32(unsigned int w)
> +{
> + unsigned int res = w - ((w >> 1) & 0x55555555);
> + res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
> + res = (res + (res >> 4)) & 0x0F0F0F0F;
> + res = res + (res >> 8);
> + return (res + (res >> 16)) & 0x000000FF;
> +}

40042b: 89 c2 mov %eax,%edx
400432: d1 ea shr %edx
400434: 81 e2 55 55 55 55 and $0x55555555,%edx
40043a: 29 d0 sub %edx,%eax
40043c: 89 c2 mov %eax,%edx
40043e: c1 e8 02 shr $0x2,%eax
400441: 81 e2 33 33 33 33 and $0x33333333,%edx
400447: 25 33 33 33 33 and $0x33333333,%eax
40044c: 01 d0 add %edx,%eax
40044e: 89 c2 mov %eax,%edx
400450: c1 ea 04 shr $0x4,%edx
400453: 01 d0 add %edx,%eax
400455: 25 0f 0f 0f 0f and $0xf0f0f0f,%eax
40045a: 89 c2 mov %eax,%edx
40045c: c1 ea 08 shr $0x8,%edx
40045f: 01 d0 add %edx,%eax
400461: 89 c6 mov %eax,%esi
400463: c1 ee 10 shr $0x10,%esi
400468: 0f b6 f0 movzbl %al,%esi

__builtin_popcount():

40047e: f3 0f b8 f0 popcnt %eax,%esi

- z

2012-11-27 01:45:07

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 3/6 -v3] libext2fs: add ext2fs_bitcount() function

On Mon, Nov 26, 2012 at 03:17:45PM -0800, Zach Brown wrote:
> > This function efficiently counts the number of bits in a block of
> > memory.
>
> Would it be worth the annoying build- and run-time machinery to detect
> and use the -msse4.2 __builtin_popcount() gcc intrinsic?

I thought about doing it, but I was in a bit of a hurry implementing
this patch set, and I wasn't even sure how to correctly implement the
build- and run-time machinery (i.e., detecting whether the gcc you're
compiling with supports __builtin_popcount, and implementing a
run-time fallback is the CPU doesn't support popcount instruction ---
which by the way isn't properly part of SSE 4.2; it has its own
separate CPUID bit, IIRC). Is there some userspace application
licensed under LGPLv2 which does this cleanly from which I could
borrow code?

I suppose I should first check and see how much difference it makes to
with a hard-coded use __builtin_popcnt(). If it makes a sufficiently
large improvement, it's probably worth the hair of implementing the
fallback machinery.

- Ted

2012-11-27 05:16:20

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 3/6 -v3] libext2fs: add ext2fs_bitcount() function

On Mon, Nov 26, 2012 at 08:45:05PM -0500, Theodore Ts'o wrote:
> I suppose I should first check and see how much difference it makes to
> with a hard-coded use __builtin_popcnt(). If it makes a sufficiently
> large improvement, it's probably worth the hair of implementing the
> fallback machinery.

I did some quick benchmarking, and the difference it makes when
checking 4TB's worth of bitmaps is negligble:

slow popcount: 0.2623
fast popcount: 0.0700

For a 128TB's worth of bitmaps, the time difference is:

slow popcount: 8.0185
fast popcount: 2.2066

I measured running e2fsck on an empty 128TB file system, and that took
202 CPU seconds (assuming all of the fs metadata blocks are in cache),
so with this optimization we would save at most 3%. (For comparison,
using an unmodified 1.42.6 e2fsck, it burned 392.7 CPU seconds.)

My conclusion is that using __builtin_popcnt() is a nice-to-have, and
if someone sends me patches I'll probably take them as a optimization,
but it's not super high priority for me.

- Ted

2012-11-27 07:16:21

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 1/6 -v2] libext2fs: optimize rb_set_bmap_range()

On Mon, 26 Nov 2012, Theodore Ts'o wrote:

> Date: Mon, 26 Nov 2012 11:39:35 -0500
> From: Theodore Ts'o <[email protected]>
> To: Ext4 Developers List <[email protected]>
> Cc: Theodore Ts'o <[email protected]>
> Subject: [PATCH 1/6 -v2] libext2fs: optimize rb_set_bmap_range()
>
> This speeds up reading bitmaps from disk for very large (and full)
> disks by significant amounts (i.e., up to two CPU minutes for a 4T
> file system).
>
> Addresses-Google-Bug: #7534813

Looks good.

Reviewed-by: Lukas Czerner <[email protected]>

>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> lib/ext2fs/blkmap64_rb.c | 32 +++++++++++++++++++++++++++++---
> 1 file changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/lib/ext2fs/blkmap64_rb.c b/lib/ext2fs/blkmap64_rb.c
> index a42eda1..0fc7c57 100644
> --- a/lib/ext2fs/blkmap64_rb.c
> +++ b/lib/ext2fs/blkmap64_rb.c
> @@ -674,16 +674,42 @@ static errcode_t rb_set_bmap_range(ext2fs_generic_bitmap bitmap,
> __u64 start, size_t num, void *in)
> {
> struct ext2fs_rb_private *bp;
> + unsigned char *cp = in;
> size_t i;
> + int first_set = -1;
> int ret;
>
> bp = (struct ext2fs_rb_private *) bitmap->private;
>
> for (i = 0; i < num; i++) {
> - ret = ext2fs_test_bit(i, in);
> - if (ret)
> - rb_insert_extent(start + i - bitmap->start, 1, bp);
> + if (i & 7 == 0) {
> + unsigned char c = cp[i/8];
> + if (c == 0xFF) {
> + if (first_set == -1)
> + first_set = i;
> + i += 7;
> + continue;
> + }
> + if ((c == 0x00) && (first_set == -1)) {
> + i += 7;
> + continue;
> + }
> + }
> + if (ext2fs_test_bit(i, in)) {
> + if (first_set == -1)
> + first_set = i;
> + continue;
> + }
> + if (first_set == -1)
> + continue;
> +
> + rb_insert_extent(start + first_set - bitmap->start,
> + i - first_set, bp);
> + first_set = -1;
> }
> + if (first_set != -1)
> + rb_insert_extent(start + first_set - bitmap->start,
> + num - first_set, bp);
>
> return 0;
> }
>

2012-11-27 17:50:24

by Zach Brown

[permalink] [raw]
Subject: Re: [PATCH 3/6 -v3] libext2fs: add ext2fs_bitcount() function

On Tue, Nov 27, 2012 at 12:16:17AM -0500, Theodore Ts'o wrote:
> I did some quick benchmarking, and the difference it makes when
> checking 4TB's worth of bitmaps is negligble:
>
> slow popcount: 0.2623
> fast popcount: 0.0700
>
> For a 128TB's worth of bitmaps, the time difference is:
>
> slow popcount: 8.0185
> fast popcount: 2.2066
>
> I measured running e2fsck on an empty 128TB file system, and that took
> 202 CPU seconds (assuming all of the fs metadata blocks are in cache),
> so with this optimization we would save at most 3%. (For comparison,
> using an unmodified 1.42.6 e2fsck, it burned 392.7 CPU seconds.)

Nice, thanks for taking the time to get numbers.

> My conclusion is that using __builtin_popcnt() is a nice-to-have, and
> if someone sends me patches I'll probably take them as a optimization,
> but it's not super high priority for me.

Agreed. I'll chuck it at the end of my fun-projects-some-day list as
well, but getting it right for all the platforms that e2fsprogs
supports.. meh :).

- z

2012-11-27 17:54:23

by Zach Brown

[permalink] [raw]
Subject: Re: [PATCH 3/6 -v3] libext2fs: add ext2fs_bitcount() function

On Mon, Nov 26, 2012 at 08:45:05PM -0500, Theodore Ts'o wrote:
> build- and run-time machinery (i.e., detecting whether the gcc you're
> compiling with supports __builtin_popcount, and implementing a
> run-time fallback is the CPU doesn't support popcount instruction ---
> which by the way isn't properly part of SSE 4.2; it has its own
> separate CPUID bit, IIRC).

*nod*

> Is there some userspace application licensed under LGPLv2 which does
> this cleanly from which I could borrow code?

Not that I know of, off the top of my head. I think I'd first check the
usual crypto suspects :).

- z

2012-11-27 19:37:58

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 3/6 -v3] libext2fs: add ext2fs_bitcount() function

On Tue, Nov 27, 2012 at 09:50:23AM -0800, Zach Brown wrote:
>
> Agreed. I'll chuck it at the end of my fun-projects-some-day list as
> well, but getting it right for all the platforms that e2fsprogs
> supports.. meh :).

It's not strictly necessary to get things right for all platforms; we
already have some accelerations using asm statements which only work
for one platform already --- although it's already the case that I
didn't bother to make 64-bit set/clear/test bit optimizations for x86,
mainly because I didn't think it was worth it, especially on modern
CPU's. (And with the red/black tree backend for bitmaps, the asm
bitops are even less important.)

- Ted