2009-08-10 03:24:01

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH, RFC -V2 0/4] mballoc patches for ext4

This patch series adds better debugging support for mballoc, but the
main goal was to improve how small files are allocated.

These patches could use some testing and benchmarking; I suspect we will
a slight increase the CPU required to read and write small files, but
hopefully it won't be significant enough to be significantly
noticeable. More seriously, the hueristics that try to detect lock
contention and so we go back to using per-cpu group preallocation
extents may need some tuning. We may also want to change the group
preallocation code so that instead of requiring an aligned extent of 512
blocks which is completely unused, that a partially used extent can also
be used for group preallocation.

I used the attached test script, named test-allocate, to demonstrate the
benefits of the last two patches in this patch series. (The first two
simply add better debugging and make the flags field in mb_history
easier to understand.) Without the last two patches applied, the
results of a (bug-fixed) e2freefrag looks like this:

Device: /dev/sdc1
Blocksize: 1024 bytes
Total blocks: 5237156
Free blocks: 5117184 (97.7%)

Min. free extent: 5 KB
Max. free extent: 128992 KB
Avg. free extent: 85286 KB

HISTOGRAM OF FREE EXTENT SIZES:
Chunk Size Range : Free chunks Free Blocks Percent
4K... 8K- : 1 5 0.00%
8K... 16K- : 1 11 0.00%
16K... 32K- : 2 44 0.00%
128K... 256K- : 1 235 0.00%
256K... 512K- : 1 407 0.01%
4M... 8M- : 5 30268 0.59%
8M... 16M- : 5 79723 1.56%
16M... 32M- : 2 46795 0.91%
32M... 64M- : 3 153014 2.99%
64M... 128M- : 39 4806682 93.93%

With the full patch series applied the e2freefrag output looks like
this:

Device: /dev/sdc1
Blocksize: 1024 bytes
Total blocks: 5237156
Free blocks: 5117184 (97.7%)

Min. free extent: 1 KB
Max. free extent: 128992 KB
Avg. free extent: 88227 KB

HISTOGRAM OF FREE EXTENTS SIZES:
Chunk Size Range : Free chunks Free Blocks Percent
1K... 2K- : 2 2 0.00%
2K... 4K- : 1 3 0.00%
4K... 8K- : 1 5 0.00%
4M... 8M- : 5 30268 0.59%
8M... 16M- : 5 80415 1.57%
16M... 32M- : 2 46795 0.91%
32M... 64M- : 3 153014 2.99%
64M... 128M- : 39 4806682 93.93%

Compare the histogram of the sub-megabyte free chunks.

Here is the before output of mb_history:

pid inode original goal result found grps cr flags merge tail broken
1920 513 0/0/1@0 0/0/1@0 0/2371/1@0 1 1 1 0x0000 0 0
398 12 1/0/3@0 0/0/512@0 1/512/512@0 1 1 0 0x04a0 0 0
398 13 1/0/5@0 1/515/5@0
398 14 1/0/7@0 1/520/7@0
398 15 1/0/11@0 1/527/11@0
398 16 1/0/13@0 1/538/13@0
398 17 1/0/15@0 1/551/15@0
398 19 1/0/5@0 1/566/5@0
398 20 1/0/7@0 1/571/7@0
398 21 1/0/11@0 1/578/11@0
398 22 1/0/13@0 1/589/13@0
398 23 1/0/15@0 1/602/15@0
1933 18 1/0/3@0 1/512/512@0 1/1024/512@0 1 1 0 0x04a0 512 1024
398 13 1/0/2@0 1/1027/2@0
398 15 1/0/3@0 1/1029/3@0
398 17 1/0/5@0 1/1032/5@0
398 19 1/0/6@0 1/1037/6@0
398 21 1/0/9@0 1/1043/9@0
398 22 1/0/6@0 1/1052/6@0
398 24 1/0/2@0 1/1058/2@0
398 25 1/0/3@0 1/1060/3@0
398 26 1/0/5@0 1/1063/5@0
398 27 1/0/6@0 1/1068/6@0
398 28 1/0/9@0 1/1074/9@0
398 29 1/0/6@0 1/1083/6@0
.... and here is the after:

pid inode original goal result found grps cr flags merge tail broken
1825 513 0/0/1@0 0/0/1@0 0/2371/1@0 1 1 1 0x0000 0 0
397 12 1/0/3@0 1/0/3@0 1/277/3@0 1 1 1 0x0460 0 0
397 13 1/0/5@0 1/0/5@0 1/280/5@0 9 1 1 0x0460 5 8
397 14 1/0/7@0 1/0/7@0 1/285/7@0 8 1 1 0x0460 4 32
397 15 1/0/11@0 1/0/11@0 1/292/11@0 9 1 1 0x0460 7 8
397 16 1/0/13@0 1/0/13@0 1/303/13@0 8 1 1 0x0460 12 16
397 17 1/0/15@0 1/0/15@0 1/316/15@0 7 1 1 0x0460 11 64
397 18 1/0/3@0 1/0/3@0 1/331/3@0 9 1 1 0x0460 2 4
397 19 1/0/5@0 1/0/5@0 1/334/5@0 8 1 1 0x0460 3 16
397 20 1/0/7@0 1/0/7@0 1/339/7@0 8 1 1 0x0460 2 8
397 21 1/0/11@0 1/0/11@0 1/346/11@0 7 1 1 0x0460 5 32
397 22 1/0/13@0 1/0/13@0 1/357/13@0 7 1 1 0x0460 2 16
397 23 1/0/15@0 1/0/15@0 1/370/15@0 6 1 1 0x0460 1 128
1853 13 1/0/2@0 1/0/2@0 1/300/2@0 1 1 0 0x0460 0 0
1853 15 1/0/3@0 1/0/3@0 1/328/3@0 8 1 1 0x0460 0 0
1853 17 1/0/5@0 1/0/5@0 1/280/5@0 1 1 1 0x0460 0 0
1853 19 1/0/6@0 1/0/6@0 1/346/6@0 5 1 1 0x0460 0 0
1853 21 1/0/9@0 1/0/9@0 1/316/9@0 11 1 1 0x0460 5 8
1853 22 1/0/6@0 1/0/6@0 1/385/6@0 11 1 1 0x0460 3 4
1853 24 1/0/2@0 1/0/2@0 1/326/2@0 1 1 0 0x0460 0 0
1853 25 1/0/3@0 1/0/3@0 1/292/3@0 11 1 1 0x0460 3 4
1853 26 1/0/5@0 1/0/5@0 1/295/5@0 1 1 1 0x0460 0 0
1853 27 1/0/6@0 1/0/6@0 1/391/6@0 11 1 1 0x0460 5 8
1853 28 1/0/9@0 1/0/9@0 1/352/9@0 11 1 1 0x0460 9 16
1853 29 1/0/6@0 1/0/6@0 1/361/6@0 11 1 1 0x0460 3 4

- Ted

------------------------ begin test-allocate

#!/bin/bash

function gen_file()
{
dd if=/dev/zero of=$1 bs=1k count=$2
}

mkdir t
cd t
gen_file a 3
gen_file b 5
gen_file c 7
gen_file d 11
gen_file e 13
gen_file f 15
gen_file g 3
gen_file h 5
gen_file i 7
gen_file j 11
gen_file k 13
gen_file l 15
sync

rm b d f h j k
sync

gen_file m 2
gen_file n 3
gen_file o 5
gen_file p 6
gen_file q 9
gen_file r 6
gen_file s 2
gen_file t 3
gen_file u 5
gen_file v 6
gen_file w 9
gen_file x 6
sync
-------------------------------------------------


Theodore Ts'o (4):
ext4: Add configurable run-time mballoc debugging
ext4: Display the mballoc flags in mb_history in hex instead of
decimal
ext4: Fix bugs in mballoc's stream allocation mode
ext4: Avoid group preallocation for closed files

fs/ext4/Kconfig | 9 ++++
fs/ext4/ext4.h | 46 +++++++++++++++-----
fs/ext4/mballoc.c | 116 ++++++++++++++++++++++++++++++++++++-----------------
fs/ext4/mballoc.h | 16 +++++--
4 files changed, 134 insertions(+), 53 deletions(-)



2009-08-10 03:24:01

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH, RFC -V2 2/4] ext4: Display the mballoc flags in mb_history in hex instead of decimal

Displaying the flags in base 16 makes it easier to see which flags
have been set.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 22 +++++++++++-----------
fs/ext4/mballoc.c | 4 ++--
2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9714db3..e267727 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -67,27 +67,27 @@ typedef unsigned int ext4_group_t;


/* prefer goal again. length */
-#define EXT4_MB_HINT_MERGE 1
+#define EXT4_MB_HINT_MERGE 0x0001
/* blocks already reserved */
-#define EXT4_MB_HINT_RESERVED 2
+#define EXT4_MB_HINT_RESERVED 0x0002
/* metadata is being allocated */
-#define EXT4_MB_HINT_METADATA 4
+#define EXT4_MB_HINT_METADATA 0x0004
/* first blocks in the file */
-#define EXT4_MB_HINT_FIRST 8
+#define EXT4_MB_HINT_FIRST 0x0008
/* search for the best chunk */
-#define EXT4_MB_HINT_BEST 16
+#define EXT4_MB_HINT_BEST 0x0010
/* data is being allocated */
-#define EXT4_MB_HINT_DATA 32
+#define EXT4_MB_HINT_DATA 0x0020
/* don't preallocate (for tails) */
-#define EXT4_MB_HINT_NOPREALLOC 64
+#define EXT4_MB_HINT_NOPREALLOC 0x0040
/* allocate for locality group */
-#define EXT4_MB_HINT_GROUP_ALLOC 128
+#define EXT4_MB_HINT_GROUP_ALLOC 0x0080
/* allocate goal blocks or none */
-#define EXT4_MB_HINT_GOAL_ONLY 256
+#define EXT4_MB_HINT_GOAL_ONLY 0x0100
/* goal is meaningful */
-#define EXT4_MB_HINT_TRY_GOAL 512
+#define EXT4_MB_HINT_TRY_GOAL 0x0200
/* blocks already pre-reserved by delayed allocation */
-#define EXT4_MB_DELALLOC_RESERVED 1024
+#define EXT4_MB_DELALLOC_RESERVED 0x0400


struct ext4_allocation_request {
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 2c81240..f510a58 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2157,7 +2157,7 @@ static int ext4_mb_seq_history_show(struct seq_file *seq, void *v)

if (v == SEQ_START_TOKEN) {
seq_printf(seq, "%-5s %-8s %-23s %-23s %-23s %-5s "
- "%-5s %-2s %-5s %-5s %-5s %-6s\n",
+ "%-5s %-2s %-6s %-5s %-5s %-6s\n",
"pid", "inode", "original", "goal", "result", "found",
"grps", "cr", "flags", "merge", "tail", "broken");
return 0;
@@ -2165,7 +2165,7 @@ static int ext4_mb_seq_history_show(struct seq_file *seq, void *v)

if (hs->op == EXT4_MB_HISTORY_ALLOC) {
fmt = "%-5u %-8u %-23s %-23s %-23s %-5u %-5u %-2u "
- "%-5u %-5s %-5u %-6u\n";
+ "0x%04x %-5s %-5u %-6u\n";
sprintf(buf2, "%u/%d/%u@%u", hs->result.fe_group,
hs->result.fe_start, hs->result.fe_len,
hs->result.fe_logical);
--
1.6.3.2.1.gb9f7d.dirty


2009-08-10 03:24:01

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH, RFC -V2 4/4] ext4: Avoid group preallocation for closed files

Currently the group preallocation code tries to find a large (512)
free block from which to do per-cpu group allocation for small files.
The problem with this scheme is that it leaves the filesystem horribly
fragmented. In the worst case, if the filesystem is unmounted and
remounted (after a system shutdown, for example) we forget the fact
that wee were using a particular (now-partially filled) 512 block
extent. So the next time we try to allocate space for a small file,
we will find *another* completely free 512 block chunk to allocate
small files. Given that there are 32,768 blocks in a block group,
after 64 iterations of "mount, write one 4k file in a directory,
unmount", the block group will have 64 files, each separated by 511
blocks, and the block group will no longer have any free 512
completely free chunks of blocks for group preallocation space.

So if we try to allocate blocks for a file that has been closed, such
that we know the final size of the file, and the filesystem is not
busy, avoid using group preallocation.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 22 +++++++++++++++++++++-
fs/ext4/mballoc.c | 10 +++++++++-
2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 70aa951..a09ea10 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -952,6 +952,7 @@ struct ext4_sb_info {
atomic_t s_mb_lost_chunks;
atomic_t s_mb_preallocated;
atomic_t s_mb_discarded;
+ atomic_t s_lock_busy;

/* locality groups */
struct ext4_locality_group *s_locality_groups;
@@ -1593,15 +1594,34 @@ struct ext4_group_info {
#define EXT4_MB_GRP_NEED_INIT(grp) \
(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))

+#define EXT4_MAX_CONTENTION 8
+#define EXT4_CONTENTION_THRESHOLD 2
+
static inline spinlock_t *ext4_group_lock_ptr(struct super_block *sb,
ext4_group_t group)
{
return bgl_lock_ptr(EXT4_SB(sb)->s_blockgroup_lock, group);
}

+/*
+ * Returns true if the filesystem is busy enough that attempts to
+ * access the block group locks has run into contention.
+ */
+static inline int ext4_fs_is_busy(struct ext4_sb_info *sbi)
+{
+ return (atomic_read(&sbi->s_lock_busy) > EXT4_CONTENTION_THRESHOLD);
+}
+
static inline void ext4_lock_group(struct super_block *sb, ext4_group_t group)
{
- spin_lock(ext4_group_lock_ptr(sb, group));
+ spinlock_t *lock = ext4_group_lock_ptr(sb, group);
+ if (spin_trylock(lock))
+ atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, 1,
+ EXT4_MAX_CONTENTION);
+ else {
+ atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, -1, 0);
+ spin_lock(lock);
+ }
}

static inline void ext4_unlock_group(struct super_block *sb,
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a103cb0..6c7be0d 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4191,9 +4191,17 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac)
return;

size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
- isize = i_size_read(ac->ac_inode) >> bsbits;
+ isize = (i_size_read(ac->ac_inode) + ac->ac_sb->s_blocksize - 1)
+ >> bsbits;
size = max(size, isize);

+ if ((size == isize) &&
+ ext4_fs_is_busy(sbi) &&
+ (atomic_read(&ac->ac_inode->i_writecount) == 0)) {
+ ac->ac_flags |= EXT4_MB_HINT_NOPREALLOC;
+ return;
+ }
+
/* don't use group allocation for large files */
if (size >= sbi->s_mb_stream_request) {
ac->ac_flags |= EXT4_MB_STREAM_ALLOC;
--
1.6.3.2.1.gb9f7d.dirty


2009-08-10 03:24:01

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH, RFC -V2 3/4] ext4: Fix bugs in mballoc's stream allocation mode

The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all
screwed up. These fields were getting unconditionally all the time,
set even when stream allocation had not taken place, and if they were
being used when the file was smaller than s_mb_stream_request, which
is when the allocation should _not_ be doing stream allocation.

Fix this by determining whether or not we stream allocation should
take place once, in ext4_mb_group_or_file(), and setting a flag which
gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found().
This simplifies the code and assures that we are consistently using
(or not using) the stream allocation logic.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 2 ++
fs/ext4/mballoc.c | 23 ++++++++++-------------
2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index e267727..70aa951 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -88,6 +88,8 @@ typedef unsigned int ext4_group_t;
#define EXT4_MB_HINT_TRY_GOAL 0x0200
/* blocks already pre-reserved by delayed allocation */
#define EXT4_MB_DELALLOC_RESERVED 0x0400
+/* We are doing stream allocation */
+#define EXT4_MB_STREAM_ALLOC 0x0800


struct ext4_allocation_request {
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f510a58..a103cb0 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1361,7 +1361,7 @@ static void ext4_mb_use_best_found(struct ext4_allocation_context *ac,
ac->alloc_semp = e4b->alloc_semp;
e4b->alloc_semp = NULL;
/* store last allocated for subsequent stream allocation */
- if ((ac->ac_flags & EXT4_MB_HINT_DATA)) {
+ if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) {
spin_lock(&sbi->s_md_lock);
sbi->s_mb_last_group = ac->ac_f_ex.fe_group;
sbi->s_mb_last_start = ac->ac_f_ex.fe_start;
@@ -1939,7 +1939,6 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
struct ext4_sb_info *sbi;
struct super_block *sb;
struct ext4_buddy e4b;
- loff_t size, isize;

sb = ac->ac_sb;
sbi = EXT4_SB(sb);
@@ -1975,20 +1974,16 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
}

bsbits = ac->ac_sb->s_blocksize_bits;
- /* if stream allocation is enabled, use global goal */
- size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
- isize = i_size_read(ac->ac_inode) >> bsbits;
- if (size < isize)
- size = isize;

- if (size < sbi->s_mb_stream_request &&
- (ac->ac_flags & EXT4_MB_HINT_DATA)) {
+ /* if stream allocation is enabled, use global goal */
+ if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) {
/* TBD: may be hot point */
spin_lock(&sbi->s_md_lock);
ac->ac_g_ex.fe_group = sbi->s_mb_last_group;
ac->ac_g_ex.fe_start = sbi->s_mb_last_start;
spin_unlock(&sbi->s_md_lock);
}
+
/* Let's just scan groups to find more-less suitable blocks */
cr = ac->ac_2order ? 0 : 1;
/*
@@ -4192,16 +4187,18 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac)
if (!(ac->ac_flags & EXT4_MB_HINT_DATA))
return;

+ if (unlikely(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
+ return;
+
size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
isize = i_size_read(ac->ac_inode) >> bsbits;
size = max(size, isize);

/* don't use group allocation for large files */
- if (size >= sbi->s_mb_stream_request)
- return;

2009-08-10 03:24:01

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging

Allow mballoc debugging to be enabled at run-time instead of just at
compile time.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/Kconfig | 9 ++++++
fs/ext4/mballoc.c | 81 ++++++++++++++++++++++++++++++++++++++--------------
fs/ext4/mballoc.h | 16 ++++++++--
3 files changed, 80 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index 1523046..d5c0ea2 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -77,3 +77,12 @@ config EXT4_FS_SECURITY

If you are not using a security module that requires using
extended attributes for file security labels, say N.
+
+config EXT4_DEBUG
+ bool "EXT4 debugging support"
+ depends on EXT4_FS
+ help
+ Enables run-time debugging support for the ext4 filesystem.
+
+ If you select Y here, then you will be able to turn on debugging
+ with a command such as "echo 1 > /sys/kernel/debug/ext4/mballoc-debug"
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 68cde59..2c81240 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -22,6 +22,7 @@
*/

#include "mballoc.h"
+#include <linux/debugfs.h>
#include <trace/events/ext4.h>

/*
@@ -743,7 +744,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
char *data;
char *bitmap;

- mb_debug("init page %lu\n", page->index);
+ mb_debug(1, "init page %lu\n", page->index);

inode = page->mapping->host;
sb = inode->i_sb;
@@ -822,7 +823,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
set_bitmap_uptodate(bh[i]);
bh[i]->b_end_io = end_buffer_read_sync;
submit_bh(READ, bh[i]);
- mb_debug("read bitmap for group %u\n", first_group + i);
+ mb_debug(1, "read bitmap for group %u\n", first_group + i);
}

/* wait for I/O completion */
@@ -862,7 +863,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
if ((first_block + i) & 1) {
/* this is block of buddy */
BUG_ON(incore == NULL);
- mb_debug("put buddy for group %u in page %lu/%x\n",
+ mb_debug(1, "put buddy for group %u in page %lu/%x\n",
group, page->index, i * blocksize);
grinfo = ext4_get_group_info(sb, group);
grinfo->bb_fragments = 0;
@@ -878,7 +879,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
} else {
/* this is block of bitmap */
BUG_ON(incore != NULL);
- mb_debug("put bitmap for group %u in page %lu/%x\n",
+ mb_debug(1, "put bitmap for group %u in page %lu/%x\n",
group, page->index, i * blocksize);

/* see comments in ext4_mb_put_pa() */
@@ -922,7 +923,7 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
struct ext4_sb_info *sbi = EXT4_SB(sb);
struct inode *inode = sbi->s_buddy_cache;

- mb_debug("load group %u\n", group);
+ mb_debug(1, "load group %u\n", group);

blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
grp = ext4_get_group_info(sb, group);
@@ -1851,7 +1852,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
struct inode *inode = sbi->s_buddy_cache;
struct page *page = NULL, *bitmap_page = NULL;

- mb_debug("init group %u\n", group);
+ mb_debug(1, "init group %u\n", group);
blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
this_grp = ext4_get_group_info(sb, group);
/*
@@ -2739,7 +2740,7 @@ static void ext4_mb_cleanup_pa(struct ext4_group_info *grp)
kmem_cache_free(ext4_pspace_cachep, pa);
}
if (count)
- mb_debug("mballoc: %u PAs left\n", count);
+ mb_debug(1, "mballoc: %u PAs left\n", count);

}

@@ -2820,7 +2821,7 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
list_for_each_safe(l, ltmp, &txn->t_private_list) {
entry = list_entry(l, struct ext4_free_data, list);

- mb_debug("gonna free %u blocks in group %u (0x%p):",
+ mb_debug(1, "gonna free %u blocks in group %u (0x%p):",
entry->count, entry->group, entry);

err = ext4_mb_load_buddy(sb, entry->group, &e4b);
@@ -2855,9 +2856,43 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
ext4_mb_release_desc(&e4b);
}

- mb_debug("freed %u blocks in %u structures\n", count, count2);
+ mb_debug(1, "freed %u blocks in %u structures\n", count, count2);
}

+#ifdef CONFIG_EXT4_DEBUG
+u8 mb_enable_debug __read_mostly;
+
+static struct dentry *debugfs_dir;
+static struct dentry *debugfs_debug;
+
+static void __init ext4_create_debugfs_entry(void)
+{
+ debugfs_dir = debugfs_create_dir("ext4", NULL);
+ if (debugfs_dir)
+ debugfs_debug = debugfs_create_u8("mballoc-debug",
+ S_IRUGO | S_IWUSR,
+ debugfs_dir,
+ &mb_enable_debug);
+}
+
+static void __exit ext4_remove_debugfs_entry(void)
+{
+ debugfs_remove(debugfs_debug);
+ debugfs_remove(debugfs_dir);
+}
+
+#else
+
+static void __init ext4_create_debugfs_entry(void)
+{
+}
+
+static void __exit ext4_remove_debugfs_entry(void)
+{
+}
+
+#endif
+
int __init init_ext4_mballoc(void)
{
ext4_pspace_cachep =
@@ -2885,6 +2920,7 @@ int __init init_ext4_mballoc(void)
kmem_cache_destroy(ext4_ac_cachep);
return -ENOMEM;
}
+ ext4_create_debugfs_entry();
return 0;
}

@@ -2898,6 +2934,7 @@ void exit_ext4_mballoc(void)
kmem_cache_destroy(ext4_pspace_cachep);
kmem_cache_destroy(ext4_ac_cachep);
kmem_cache_destroy(ext4_free_ext_cachep);
+ ext4_remove_debugfs_entry();
}


@@ -3042,7 +3079,7 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac)
ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_stripe;
else
ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
- mb_debug("#%u: goal %u blocks for locality group\n",
+ mb_debug(1, "#%u: goal %u blocks for locality group\n",
current->pid, ac->ac_g_ex.fe_len);
}

@@ -3232,7 +3269,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL;
}

- mb_debug("goal: %u(was %u) blocks at %u\n", (unsigned) size,
+ mb_debug(1, "goal: %u(was %u) blocks at %u\n", (unsigned) size,
(unsigned) orig_size, (unsigned) start);
}

@@ -3281,7 +3318,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
BUG_ON(pa->pa_free < len);
pa->pa_free -= len;

- mb_debug("use %llu/%u from inode pa %p\n", start, len, pa);
+ mb_debug(1, "use %llu/%u from inode pa %p\n", start, len, pa);
}

/*
@@ -3305,7 +3342,7 @@ static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
* in on-disk bitmap -- see ext4_mb_release_context()
* Other CPUs are prevented from allocating from this pa by lg_mutex
*/
- mb_debug("use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa);
+ mb_debug(1, "use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa);
}

/*
@@ -3484,7 +3521,7 @@ void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
preallocated += len;
count++;
}
- mb_debug("prellocated %u for group %u\n", preallocated, group);
+ mb_debug(1, "prellocated %u for group %u\n", preallocated, group);
}

static void ext4_mb_pa_callback(struct rcu_head *head)
@@ -3619,7 +3656,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa->pa_deleted = 0;
pa->pa_type = MB_INODE_PA;

- mb_debug("new inode pa %p: %llu/%u for %u\n", pa,
+ mb_debug(1, "new inode pa %p: %llu/%u for %u\n", pa,
pa->pa_pstart, pa->pa_len, pa->pa_lstart);
trace_ext4_mb_new_inode_pa(ac, pa);

@@ -3679,7 +3716,7 @@ ext4_mb_new_group_pa(struct ext4_allocation_context *ac)
pa->pa_deleted = 0;
pa->pa_type = MB_GROUP_PA;

- mb_debug("new group pa %p: %llu/%u for %u\n", pa,
+ mb_debug(1, "new group pa %p: %llu/%u for %u\n", pa,
pa->pa_pstart, pa->pa_len, pa->pa_lstart);
trace_ext4_mb_new_group_pa(ac, pa);

@@ -3758,7 +3795,7 @@ ext4_mb_release_inode_pa(struct ext4_buddy *e4b, struct buffer_head *bitmap_bh,
next = mb_find_next_bit(bitmap_bh->b_data, end, bit);
start = group * EXT4_BLOCKS_PER_GROUP(sb) + bit +
le32_to_cpu(sbi->s_es->s_first_data_block);
- mb_debug(" free preallocated %u/%u in group %u\n",
+ mb_debug(1, " free preallocated %u/%u in group %u\n",
(unsigned) start, (unsigned) next - bit,
(unsigned) group);
free += next - bit;
@@ -3849,7 +3886,7 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
int busy = 0;
int free = 0;

- mb_debug("discard preallocation for group %u\n", group);
+ mb_debug(1, "discard preallocation for group %u\n", group);

if (list_empty(&grp->bb_prealloc_list))
return 0;
@@ -3973,7 +4010,7 @@ void ext4_discard_preallocations(struct inode *inode)
return;
}

- mb_debug("discard preallocation for inode %lu\n", inode->i_ino);
+ mb_debug(1, "discard preallocation for inode %lu\n", inode->i_ino);
trace_ext4_discard_preallocations(inode);

INIT_LIST_HEAD(&list);
@@ -4078,7 +4115,7 @@ static void ext4_mb_return_to_preallocation(struct inode *inode,
{
BUG_ON(!list_empty(&EXT4_I(inode)->i_prealloc_list));
}
-#ifdef MB_DEBUG
+#ifdef CONFIG_EXT4_DEBUG
static void ext4_mb_show_ac(struct ext4_allocation_context *ac)
{
struct super_block *sb = ac->ac_sb;
@@ -4227,7 +4264,7 @@ ext4_mb_initialize_context(struct ext4_allocation_context *ac,
* locality group. this is a policy, actually */
ext4_mb_group_or_file(ac);

- mb_debug("init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, "
+ mb_debug(1, "init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, "
"left: %u/%u, right %u/%u to %swritable\n",
(unsigned) ar->len, (unsigned) ar->logical,
(unsigned) ar->goal, ac->ac_flags, ac->ac_2order,
@@ -4249,7 +4286,7 @@ ext4_mb_discard_lg_preallocations(struct super_block *sb,
struct ext4_prealloc_space *pa, *tmp;
struct ext4_allocation_context *ac;

- mb_debug("discard locality group preallocation\n");
+ mb_debug(1, "discard locality group preallocation\n");

INIT_LIST_HEAD(&discard_list);
ac = kmem_cache_alloc(ext4_ac_cachep, GFP_NOFS);
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index c96bb19..28abb02 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -37,11 +37,19 @@

/*
*/
-#define MB_DEBUG__
-#ifdef MB_DEBUG
-#define mb_debug(fmt, a...) printk(fmt, ##a)
+#ifdef CONFIG_EXT4_DEBUG
+extern u8 mb_enable_debug;
+
+#define mb_debug(n, fmt, a...) \
+ do { \
+ if ((n) <= mb_enable_debug) { \
+ printk (KERN_DEBUG "(%s, %d): %s: ", \
+ __FILE__, __LINE__, __func__); \
+ printk (fmt, ## a); \
+ } \
+ } while (0)
#else
-#define mb_debug(fmt, a...)
+#define mb_debug(n, fmt, a...)
#endif

/*
--
1.6.3.2.1.gb9f7d.dirty


2009-08-10 03:42:19

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging

Theodore Ts'o wrote:
> Allow mballoc debugging to be enabled at run-time instead of just at
> compile time.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/Kconfig | 9 ++++++
> fs/ext4/mballoc.c | 81 ++++++++++++++++++++++++++++++++++++++--------------
> fs/ext4/mballoc.h | 16 ++++++++--
> 3 files changed, 80 insertions(+), 26 deletions(-)

This looks fine to me, though is there any reason to add the debug
verbosity level at this point, when it's all just "1?"

If you're going to do that I suppose I'd suggest some #defines that
give an idea of the expected range... at least low/medium/high... will
our debug go to 11? ;)

-Eric

> diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
> index 1523046..d5c0ea2 100644
> --- a/fs/ext4/Kconfig
> +++ b/fs/ext4/Kconfig
> @@ -77,3 +77,12 @@ config EXT4_FS_SECURITY
>
> If you are not using a security module that requires using
> extended attributes for file security labels, say N.
> +
> +config EXT4_DEBUG
> + bool "EXT4 debugging support"
> + depends on EXT4_FS
> + help
> + Enables run-time debugging support for the ext4 filesystem.
> +
> + If you select Y here, then you will be able to turn on debugging
> + with a command such as "echo 1 > /sys/kernel/debug/ext4/mballoc-debug"
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 68cde59..2c81240 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -22,6 +22,7 @@
> */
>
> #include "mballoc.h"
> +#include <linux/debugfs.h>
> #include <trace/events/ext4.h>
>
> /*
> @@ -743,7 +744,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
> char *data;
> char *bitmap;
>
> - mb_debug("init page %lu\n", page->index);
> + mb_debug(1, "init page %lu\n", page->index);
>
> inode = page->mapping->host;
> sb = inode->i_sb;
> @@ -822,7 +823,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
> set_bitmap_uptodate(bh[i]);
> bh[i]->b_end_io = end_buffer_read_sync;
> submit_bh(READ, bh[i]);
> - mb_debug("read bitmap for group %u\n", first_group + i);
> + mb_debug(1, "read bitmap for group %u\n", first_group + i);
> }
>
> /* wait for I/O completion */
> @@ -862,7 +863,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
> if ((first_block + i) & 1) {
> /* this is block of buddy */
> BUG_ON(incore == NULL);
> - mb_debug("put buddy for group %u in page %lu/%x\n",
> + mb_debug(1, "put buddy for group %u in page %lu/%x\n",
> group, page->index, i * blocksize);
> grinfo = ext4_get_group_info(sb, group);
> grinfo->bb_fragments = 0;
> @@ -878,7 +879,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
> } else {
> /* this is block of bitmap */
> BUG_ON(incore != NULL);
> - mb_debug("put bitmap for group %u in page %lu/%x\n",
> + mb_debug(1, "put bitmap for group %u in page %lu/%x\n",
> group, page->index, i * blocksize);
>
> /* see comments in ext4_mb_put_pa() */
> @@ -922,7 +923,7 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
> struct ext4_sb_info *sbi = EXT4_SB(sb);
> struct inode *inode = sbi->s_buddy_cache;
>
> - mb_debug("load group %u\n", group);
> + mb_debug(1, "load group %u\n", group);
>
> blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
> grp = ext4_get_group_info(sb, group);
> @@ -1851,7 +1852,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
> struct inode *inode = sbi->s_buddy_cache;
> struct page *page = NULL, *bitmap_page = NULL;
>
> - mb_debug("init group %u\n", group);
> + mb_debug(1, "init group %u\n", group);
> blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
> this_grp = ext4_get_group_info(sb, group);
> /*
> @@ -2739,7 +2740,7 @@ static void ext4_mb_cleanup_pa(struct ext4_group_info *grp)
> kmem_cache_free(ext4_pspace_cachep, pa);
> }
> if (count)
> - mb_debug("mballoc: %u PAs left\n", count);
> + mb_debug(1, "mballoc: %u PAs left\n", count);
>
> }
>
> @@ -2820,7 +2821,7 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
> list_for_each_safe(l, ltmp, &txn->t_private_list) {
> entry = list_entry(l, struct ext4_free_data, list);
>
> - mb_debug("gonna free %u blocks in group %u (0x%p):",
> + mb_debug(1, "gonna free %u blocks in group %u (0x%p):",
> entry->count, entry->group, entry);
>
> err = ext4_mb_load_buddy(sb, entry->group, &e4b);
> @@ -2855,9 +2856,43 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
> ext4_mb_release_desc(&e4b);
> }
>
> - mb_debug("freed %u blocks in %u structures\n", count, count2);
> + mb_debug(1, "freed %u blocks in %u structures\n", count, count2);
> }
>
> +#ifdef CONFIG_EXT4_DEBUG
> +u8 mb_enable_debug __read_mostly;
> +
> +static struct dentry *debugfs_dir;
> +static struct dentry *debugfs_debug;
> +
> +static void __init ext4_create_debugfs_entry(void)
> +{
> + debugfs_dir = debugfs_create_dir("ext4", NULL);
> + if (debugfs_dir)
> + debugfs_debug = debugfs_create_u8("mballoc-debug",
> + S_IRUGO | S_IWUSR,
> + debugfs_dir,
> + &mb_enable_debug);
> +}
> +
> +static void __exit ext4_remove_debugfs_entry(void)
> +{
> + debugfs_remove(debugfs_debug);
> + debugfs_remove(debugfs_dir);
> +}
> +
> +#else
> +
> +static void __init ext4_create_debugfs_entry(void)
> +{
> +}
> +
> +static void __exit ext4_remove_debugfs_entry(void)
> +{
> +}
> +
> +#endif
> +
> int __init init_ext4_mballoc(void)
> {
> ext4_pspace_cachep =
> @@ -2885,6 +2920,7 @@ int __init init_ext4_mballoc(void)
> kmem_cache_destroy(ext4_ac_cachep);
> return -ENOMEM;
> }
> + ext4_create_debugfs_entry();
> return 0;
> }
>
> @@ -2898,6 +2934,7 @@ void exit_ext4_mballoc(void)
> kmem_cache_destroy(ext4_pspace_cachep);
> kmem_cache_destroy(ext4_ac_cachep);
> kmem_cache_destroy(ext4_free_ext_cachep);
> + ext4_remove_debugfs_entry();
> }
>
>
> @@ -3042,7 +3079,7 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac)
> ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_stripe;
> else
> ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
> - mb_debug("#%u: goal %u blocks for locality group\n",
> + mb_debug(1, "#%u: goal %u blocks for locality group\n",
> current->pid, ac->ac_g_ex.fe_len);
> }
>
> @@ -3232,7 +3269,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
> ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL;
> }
>
> - mb_debug("goal: %u(was %u) blocks at %u\n", (unsigned) size,
> + mb_debug(1, "goal: %u(was %u) blocks at %u\n", (unsigned) size,
> (unsigned) orig_size, (unsigned) start);
> }
>
> @@ -3281,7 +3318,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
> BUG_ON(pa->pa_free < len);
> pa->pa_free -= len;
>
> - mb_debug("use %llu/%u from inode pa %p\n", start, len, pa);
> + mb_debug(1, "use %llu/%u from inode pa %p\n", start, len, pa);
> }
>
> /*
> @@ -3305,7 +3342,7 @@ static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
> * in on-disk bitmap -- see ext4_mb_release_context()
> * Other CPUs are prevented from allocating from this pa by lg_mutex
> */
> - mb_debug("use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa);
> + mb_debug(1, "use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa);
> }
>
> /*
> @@ -3484,7 +3521,7 @@ void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
> preallocated += len;
> count++;
> }
> - mb_debug("prellocated %u for group %u\n", preallocated, group);
> + mb_debug(1, "prellocated %u for group %u\n", preallocated, group);
> }
>
> static void ext4_mb_pa_callback(struct rcu_head *head)
> @@ -3619,7 +3656,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
> pa->pa_deleted = 0;
> pa->pa_type = MB_INODE_PA;
>
> - mb_debug("new inode pa %p: %llu/%u for %u\n", pa,
> + mb_debug(1, "new inode pa %p: %llu/%u for %u\n", pa,
> pa->pa_pstart, pa->pa_len, pa->pa_lstart);
> trace_ext4_mb_new_inode_pa(ac, pa);
>
> @@ -3679,7 +3716,7 @@ ext4_mb_new_group_pa(struct ext4_allocation_context *ac)
> pa->pa_deleted = 0;
> pa->pa_type = MB_GROUP_PA;
>
> - mb_debug("new group pa %p: %llu/%u for %u\n", pa,
> + mb_debug(1, "new group pa %p: %llu/%u for %u\n", pa,
> pa->pa_pstart, pa->pa_len, pa->pa_lstart);
> trace_ext4_mb_new_group_pa(ac, pa);
>
> @@ -3758,7 +3795,7 @@ ext4_mb_release_inode_pa(struct ext4_buddy *e4b, struct buffer_head *bitmap_bh,
> next = mb_find_next_bit(bitmap_bh->b_data, end, bit);
> start = group * EXT4_BLOCKS_PER_GROUP(sb) + bit +
> le32_to_cpu(sbi->s_es->s_first_data_block);
> - mb_debug(" free preallocated %u/%u in group %u\n",
> + mb_debug(1, " free preallocated %u/%u in group %u\n",
> (unsigned) start, (unsigned) next - bit,
> (unsigned) group);
> free += next - bit;
> @@ -3849,7 +3886,7 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
> int busy = 0;
> int free = 0;
>
> - mb_debug("discard preallocation for group %u\n", group);
> + mb_debug(1, "discard preallocation for group %u\n", group);
>
> if (list_empty(&grp->bb_prealloc_list))
> return 0;
> @@ -3973,7 +4010,7 @@ void ext4_discard_preallocations(struct inode *inode)
> return;
> }
>
> - mb_debug("discard preallocation for inode %lu\n", inode->i_ino);
> + mb_debug(1, "discard preallocation for inode %lu\n", inode->i_ino);
> trace_ext4_discard_preallocations(inode);
>
> INIT_LIST_HEAD(&list);
> @@ -4078,7 +4115,7 @@ static void ext4_mb_return_to_preallocation(struct inode *inode,
> {
> BUG_ON(!list_empty(&EXT4_I(inode)->i_prealloc_list));
> }
> -#ifdef MB_DEBUG
> +#ifdef CONFIG_EXT4_DEBUG
> static void ext4_mb_show_ac(struct ext4_allocation_context *ac)
> {
> struct super_block *sb = ac->ac_sb;
> @@ -4227,7 +4264,7 @@ ext4_mb_initialize_context(struct ext4_allocation_context *ac,
> * locality group. this is a policy, actually */
> ext4_mb_group_or_file(ac);
>
> - mb_debug("init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, "
> + mb_debug(1, "init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, "
> "left: %u/%u, right %u/%u to %swritable\n",
> (unsigned) ar->len, (unsigned) ar->logical,
> (unsigned) ar->goal, ac->ac_flags, ac->ac_2order,
> @@ -4249,7 +4286,7 @@ ext4_mb_discard_lg_preallocations(struct super_block *sb,
> struct ext4_prealloc_space *pa, *tmp;
> struct ext4_allocation_context *ac;
>
> - mb_debug("discard locality group preallocation\n");
> + mb_debug(1, "discard locality group preallocation\n");
>
> INIT_LIST_HEAD(&discard_list);
> ac = kmem_cache_alloc(ext4_ac_cachep, GFP_NOFS);
> diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
> index c96bb19..28abb02 100644
> --- a/fs/ext4/mballoc.h
> +++ b/fs/ext4/mballoc.h
> @@ -37,11 +37,19 @@
>
> /*
> */
> -#define MB_DEBUG__
> -#ifdef MB_DEBUG
> -#define mb_debug(fmt, a...) printk(fmt, ##a)
> +#ifdef CONFIG_EXT4_DEBUG
> +extern u8 mb_enable_debug;
> +
> +#define mb_debug(n, fmt, a...) \
> + do { \
> + if ((n) <= mb_enable_debug) { \
> + printk (KERN_DEBUG "(%s, %d): %s: ", \
> + __FILE__, __LINE__, __func__); \
> + printk (fmt, ## a); \
> + } \
> + } while (0)
> #else
> -#define mb_debug(fmt, a...)
> +#define mb_debug(n, fmt, a...)
> #endif
>
> /*


2009-08-10 03:44:45

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH, RFC -V2 2/4] ext4: Display the mballoc flags in mb_history in hex instead of decimal

Theodore Ts'o wrote:
> Displaying the flags in base 16 makes it easier to see which flags
> have been set.

Looks good to me,

Reviewed-by: Eric Sandeen <[email protected]>

> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/ext4.h | 22 +++++++++++-----------
> fs/ext4/mballoc.c | 4 ++--
> 2 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 9714db3..e267727 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -67,27 +67,27 @@ typedef unsigned int ext4_group_t;
>
>
> /* prefer goal again. length */
> -#define EXT4_MB_HINT_MERGE 1
> +#define EXT4_MB_HINT_MERGE 0x0001
> /* blocks already reserved */
> -#define EXT4_MB_HINT_RESERVED 2
> +#define EXT4_MB_HINT_RESERVED 0x0002
> /* metadata is being allocated */
> -#define EXT4_MB_HINT_METADATA 4
> +#define EXT4_MB_HINT_METADATA 0x0004
> /* first blocks in the file */
> -#define EXT4_MB_HINT_FIRST 8
> +#define EXT4_MB_HINT_FIRST 0x0008
> /* search for the best chunk */
> -#define EXT4_MB_HINT_BEST 16
> +#define EXT4_MB_HINT_BEST 0x0010
> /* data is being allocated */
> -#define EXT4_MB_HINT_DATA 32
> +#define EXT4_MB_HINT_DATA 0x0020
> /* don't preallocate (for tails) */
> -#define EXT4_MB_HINT_NOPREALLOC 64
> +#define EXT4_MB_HINT_NOPREALLOC 0x0040
> /* allocate for locality group */
> -#define EXT4_MB_HINT_GROUP_ALLOC 128
> +#define EXT4_MB_HINT_GROUP_ALLOC 0x0080
> /* allocate goal blocks or none */
> -#define EXT4_MB_HINT_GOAL_ONLY 256
> +#define EXT4_MB_HINT_GOAL_ONLY 0x0100
> /* goal is meaningful */
> -#define EXT4_MB_HINT_TRY_GOAL 512
> +#define EXT4_MB_HINT_TRY_GOAL 0x0200
> /* blocks already pre-reserved by delayed allocation */
> -#define EXT4_MB_DELALLOC_RESERVED 1024
> +#define EXT4_MB_DELALLOC_RESERVED 0x0400
>
>
> struct ext4_allocation_request {
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 2c81240..f510a58 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2157,7 +2157,7 @@ static int ext4_mb_seq_history_show(struct seq_file *seq, void *v)
>
> if (v == SEQ_START_TOKEN) {
> seq_printf(seq, "%-5s %-8s %-23s %-23s %-23s %-5s "
> - "%-5s %-2s %-5s %-5s %-5s %-6s\n",
> + "%-5s %-2s %-6s %-5s %-5s %-6s\n",
> "pid", "inode", "original", "goal", "result", "found",
> "grps", "cr", "flags", "merge", "tail", "broken");
> return 0;
> @@ -2165,7 +2165,7 @@ static int ext4_mb_seq_history_show(struct seq_file *seq, void *v)
>
> if (hs->op == EXT4_MB_HISTORY_ALLOC) {
> fmt = "%-5u %-8u %-23s %-23s %-23s %-5u %-5u %-2u "
> - "%-5u %-5s %-5u %-6u\n";
> + "0x%04x %-5s %-5u %-6u\n";
> sprintf(buf2, "%u/%d/%u@%u", hs->result.fe_group,
> hs->result.fe_start, hs->result.fe_len,
> hs->result.fe_logical);


2009-08-10 20:23:25

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging

On Sun, Aug 09, 2009 at 10:42:17PM -0500, Eric Sandeen wrote:
> Theodore Ts'o wrote:
> > Allow mballoc debugging to be enabled at run-time instead of just at
> > compile time.
> >
> > Signed-off-by: "Theodore Ts'o" <[email protected]>
> > ---
> > fs/ext4/Kconfig | 9 ++++++
> > fs/ext4/mballoc.c | 81 ++++++++++++++++++++++++++++++++++++++--------------
> > fs/ext4/mballoc.h | 16 ++++++++--
> > 3 files changed, 80 insertions(+), 26 deletions(-)
>
> This looks fine to me, though is there any reason to add the debug
> verbosity level at this point, when it's all just "1?"

What I'm thinking about doing is to use a bitmask instead of a
verbosity level. That way it will be possible to selectively enable
some debugging print messages and not others.

In the long run the right answer is to use ftrace instead, but this is
faster and more convenient when doing surgery/development on the
mballoc code.

- Ted

2009-08-11 18:16:02

by Xiang Wang

[permalink] [raw]
Subject: Re: [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging

Hi Ted,

I tried to apply this patch to our kernel, but I encountered some
problems in building the kernel:

FATAL: fs/ext4/ext4.o(.text+0x24f2d): Section mismatch in reference
from the function exit_ext4_mballoc() to the function
.exit.text:ext4_remove_debugfs_entry()

Looking at the code, the problem seems to be that, exit_ext4_mballoc
calls ext4_remove_debugfs_entry. And ext4_remove_debugfs_entry has the
"__exit" annotation while exit_ext4_mballoc does not. I tried removing
the "__exit" annotation from ext4_remove_debugfs_entry and it builds
well.

I am wondering whether you have seen any similar problems.

Thanks,
Xiang

On Sun, Aug 9, 2009 at 8:23 PM, Theodore Ts'o<[email protected]> wrote:
> Allow mballoc debugging to be enabled at run-time instead of just at
> compile time.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> ?fs/ext4/Kconfig ? | ? ?9 ++++++
> ?fs/ext4/mballoc.c | ? 81 ++++++++++++++++++++++++++++++++++++++--------------
> ?fs/ext4/mballoc.h | ? 16 ++++++++--
> ?3 files changed, 80 insertions(+), 26 deletions(-)
>
> diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
> index 1523046..d5c0ea2 100644
> --- a/fs/ext4/Kconfig
> +++ b/fs/ext4/Kconfig
> @@ -77,3 +77,12 @@ config EXT4_FS_SECURITY
>
> ? ? ? ? ?If you are not using a security module that requires using
> ? ? ? ? ?extended attributes for file security labels, say N.
> +
> +config EXT4_DEBUG
> + ? ? ? bool "EXT4 debugging support"
> + ? ? ? depends on EXT4_FS
> + ? ? ? help
> + ? ? ? ? Enables run-time debugging support for the ext4 filesystem.
> +
> + ? ? ? ? If you select Y here, then you will be able to turn on debugging
> + ? ? ? ? with a command such as "echo 1 > /sys/kernel/debug/ext4/mballoc-debug"
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 68cde59..2c81240 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -22,6 +22,7 @@
> ?*/
>
> ?#include "mballoc.h"
> +#include <linux/debugfs.h>
> ?#include <trace/events/ext4.h>
>
> ?/*
> @@ -743,7 +744,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
> ? ? ? ?char *data;
> ? ? ? ?char *bitmap;
>
> - ? ? ? mb_debug("init page %lu\n", page->index);
> + ? ? ? mb_debug(1, "init page %lu\n", page->index);
>
> ? ? ? ?inode = page->mapping->host;
> ? ? ? ?sb = inode->i_sb;
> @@ -822,7 +823,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
> ? ? ? ? ? ? ? ?set_bitmap_uptodate(bh[i]);
> ? ? ? ? ? ? ? ?bh[i]->b_end_io = end_buffer_read_sync;
> ? ? ? ? ? ? ? ?submit_bh(READ, bh[i]);
> - ? ? ? ? ? ? ? mb_debug("read bitmap for group %u\n", first_group + i);
> + ? ? ? ? ? ? ? mb_debug(1, "read bitmap for group %u\n", first_group + i);
> ? ? ? ?}
>
> ? ? ? ?/* wait for I/O completion */
> @@ -862,7 +863,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
> ? ? ? ? ? ? ? ?if ((first_block + i) & 1) {
> ? ? ? ? ? ? ? ? ? ? ? ?/* this is block of buddy */
> ? ? ? ? ? ? ? ? ? ? ? ?BUG_ON(incore == NULL);
> - ? ? ? ? ? ? ? ? ? ? ? mb_debug("put buddy for group %u in page %lu/%x\n",
> + ? ? ? ? ? ? ? ? ? ? ? mb_debug(1, "put buddy for group %u in page %lu/%x\n",
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?group, page->index, i * blocksize);
> ? ? ? ? ? ? ? ? ? ? ? ?grinfo = ext4_get_group_info(sb, group);
> ? ? ? ? ? ? ? ? ? ? ? ?grinfo->bb_fragments = 0;
> @@ -878,7 +879,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
> ? ? ? ? ? ? ? ?} else {
> ? ? ? ? ? ? ? ? ? ? ? ?/* this is block of bitmap */
> ? ? ? ? ? ? ? ? ? ? ? ?BUG_ON(incore != NULL);
> - ? ? ? ? ? ? ? ? ? ? ? mb_debug("put bitmap for group %u in page %lu/%x\n",
> + ? ? ? ? ? ? ? ? ? ? ? mb_debug(1, "put bitmap for group %u in page %lu/%x\n",
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?group, page->index, i * blocksize);
>
> ? ? ? ? ? ? ? ? ? ? ? ?/* see comments in ext4_mb_put_pa() */
> @@ -922,7 +923,7 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
> ? ? ? ?struct ext4_sb_info *sbi = EXT4_SB(sb);
> ? ? ? ?struct inode *inode = sbi->s_buddy_cache;
>
> - ? ? ? mb_debug("load group %u\n", group);
> + ? ? ? mb_debug(1, "load group %u\n", group);
>
> ? ? ? ?blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
> ? ? ? ?grp = ext4_get_group_info(sb, group);
> @@ -1851,7 +1852,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
> ? ? ? ?struct inode *inode = sbi->s_buddy_cache;
> ? ? ? ?struct page *page = NULL, *bitmap_page = NULL;
>
> - ? ? ? mb_debug("init group %u\n", group);
> + ? ? ? mb_debug(1, "init group %u\n", group);
> ? ? ? ?blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
> ? ? ? ?this_grp = ext4_get_group_info(sb, group);
> ? ? ? ?/*
> @@ -2739,7 +2740,7 @@ static void ext4_mb_cleanup_pa(struct ext4_group_info *grp)
> ? ? ? ? ? ? ? ?kmem_cache_free(ext4_pspace_cachep, pa);
> ? ? ? ?}
> ? ? ? ?if (count)
> - ? ? ? ? ? ? ? mb_debug("mballoc: %u PAs left\n", count);
> + ? ? ? ? ? ? ? mb_debug(1, "mballoc: %u PAs left\n", count);
>
> ?}
>
> @@ -2820,7 +2821,7 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
> ? ? ? ?list_for_each_safe(l, ltmp, &txn->t_private_list) {
> ? ? ? ? ? ? ? ?entry = list_entry(l, struct ext4_free_data, list);
>
> - ? ? ? ? ? ? ? mb_debug("gonna free %u blocks in group %u (0x%p):",
> + ? ? ? ? ? ? ? mb_debug(1, "gonna free %u blocks in group %u (0x%p):",
> ? ? ? ? ? ? ? ? ? ? ? ? entry->count, entry->group, entry);
>
> ? ? ? ? ? ? ? ?err = ext4_mb_load_buddy(sb, entry->group, &e4b);
> @@ -2855,9 +2856,43 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
> ? ? ? ? ? ? ? ?ext4_mb_release_desc(&e4b);
> ? ? ? ?}
>
> - ? ? ? mb_debug("freed %u blocks in %u structures\n", count, count2);
> + ? ? ? mb_debug(1, "freed %u blocks in %u structures\n", count, count2);
> ?}
>
> +#ifdef CONFIG_EXT4_DEBUG
> +u8 mb_enable_debug __read_mostly;
> +
> +static struct dentry *debugfs_dir;
> +static struct dentry *debugfs_debug;
> +
> +static void __init ext4_create_debugfs_entry(void)
> +{
> + ? ? ? debugfs_dir = debugfs_create_dir("ext4", NULL);
> + ? ? ? if (debugfs_dir)
> + ? ? ? ? ? ? ? debugfs_debug = debugfs_create_u8("mballoc-debug",
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? S_IRUGO | S_IWUSR,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? debugfs_dir,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? &mb_enable_debug);
> +}
> +
> +static void __exit ext4_remove_debugfs_entry(void)
> +{
> + ? ? ? debugfs_remove(debugfs_debug);
> + ? ? ? debugfs_remove(debugfs_dir);
> +}
> +
> +#else
> +
> +static void __init ext4_create_debugfs_entry(void)
> +{
> +}
> +
> +static void __exit ext4_remove_debugfs_entry(void)
> +{
> +}
> +
> +#endif
> +
> ?int __init init_ext4_mballoc(void)
> ?{
> ? ? ? ?ext4_pspace_cachep =
> @@ -2885,6 +2920,7 @@ int __init init_ext4_mballoc(void)
> ? ? ? ? ? ? ? ?kmem_cache_destroy(ext4_ac_cachep);
> ? ? ? ? ? ? ? ?return -ENOMEM;
> ? ? ? ?}
> + ? ? ? ext4_create_debugfs_entry();
> ? ? ? ?return 0;
> ?}
>
> @@ -2898,6 +2934,7 @@ void exit_ext4_mballoc(void)
> ? ? ? ?kmem_cache_destroy(ext4_pspace_cachep);
> ? ? ? ?kmem_cache_destroy(ext4_ac_cachep);
> ? ? ? ?kmem_cache_destroy(ext4_free_ext_cachep);
> + ? ? ? ext4_remove_debugfs_entry();
> ?}
>
>
> @@ -3042,7 +3079,7 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac)
> ? ? ? ? ? ? ? ?ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_stripe;
> ? ? ? ?else
> ? ? ? ? ? ? ? ?ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
> - ? ? ? mb_debug("#%u: goal %u blocks for locality group\n",
> + ? ? ? mb_debug(1, "#%u: goal %u blocks for locality group\n",
> ? ? ? ? ? ? ? ?current->pid, ac->ac_g_ex.fe_len);
> ?}
>
> @@ -3232,7 +3269,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
> ? ? ? ? ? ? ? ?ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL;
> ? ? ? ?}
>
> - ? ? ? mb_debug("goal: %u(was %u) blocks at %u\n", (unsigned) size,
> + ? ? ? mb_debug(1, "goal: %u(was %u) blocks at %u\n", (unsigned) size,
> ? ? ? ? ? ? ? ?(unsigned) orig_size, (unsigned) start);
> ?}
>
> @@ -3281,7 +3318,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
> ? ? ? ?BUG_ON(pa->pa_free < len);
> ? ? ? ?pa->pa_free -= len;
>
> - ? ? ? mb_debug("use %llu/%u from inode pa %p\n", start, len, pa);
> + ? ? ? mb_debug(1, "use %llu/%u from inode pa %p\n", start, len, pa);
> ?}
>
> ?/*
> @@ -3305,7 +3342,7 @@ static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
> ? ? ? ? * in on-disk bitmap -- see ext4_mb_release_context()
> ? ? ? ? * Other CPUs are prevented from allocating from this pa by lg_mutex
> ? ? ? ? */
> - ? ? ? mb_debug("use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa);
> + ? ? ? mb_debug(1, "use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa);
> ?}
>
> ?/*
> @@ -3484,7 +3521,7 @@ void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
> ? ? ? ? ? ? ? ?preallocated += len;
> ? ? ? ? ? ? ? ?count++;
> ? ? ? ?}
> - ? ? ? mb_debug("prellocated %u for group %u\n", preallocated, group);
> + ? ? ? mb_debug(1, "prellocated %u for group %u\n", preallocated, group);
> ?}
>
> ?static void ext4_mb_pa_callback(struct rcu_head *head)
> @@ -3619,7 +3656,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
> ? ? ? ?pa->pa_deleted = 0;
> ? ? ? ?pa->pa_type = MB_INODE_PA;
>
> - ? ? ? mb_debug("new inode pa %p: %llu/%u for %u\n", pa,
> + ? ? ? mb_debug(1, "new inode pa %p: %llu/%u for %u\n", pa,
> ? ? ? ? ? ? ? ? ? ? ? ?pa->pa_pstart, pa->pa_len, pa->pa_lstart);
> ? ? ? ?trace_ext4_mb_new_inode_pa(ac, pa);
>
> @@ -3679,7 +3716,7 @@ ext4_mb_new_group_pa(struct ext4_allocation_context *ac)
> ? ? ? ?pa->pa_deleted = 0;
> ? ? ? ?pa->pa_type = MB_GROUP_PA;
>
> - ? ? ? mb_debug("new group pa %p: %llu/%u for %u\n", pa,
> + ? ? ? mb_debug(1, "new group pa %p: %llu/%u for %u\n", pa,
> ? ? ? ? ? ? ? ? ? ? ? ?pa->pa_pstart, pa->pa_len, pa->pa_lstart);
> ? ? ? ?trace_ext4_mb_new_group_pa(ac, pa);
>
> @@ -3758,7 +3795,7 @@ ext4_mb_release_inode_pa(struct ext4_buddy *e4b, struct buffer_head *bitmap_bh,
> ? ? ? ? ? ? ? ?next = mb_find_next_bit(bitmap_bh->b_data, end, bit);
> ? ? ? ? ? ? ? ?start = group * EXT4_BLOCKS_PER_GROUP(sb) + bit +
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?le32_to_cpu(sbi->s_es->s_first_data_block);
> - ? ? ? ? ? ? ? mb_debug(" ? ?free preallocated %u/%u in group %u\n",
> + ? ? ? ? ? ? ? mb_debug(1, " ? ?free preallocated %u/%u in group %u\n",
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(unsigned) start, (unsigned) next - bit,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(unsigned) group);
> ? ? ? ? ? ? ? ?free += next - bit;
> @@ -3849,7 +3886,7 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
> ? ? ? ?int busy = 0;
> ? ? ? ?int free = 0;
>
> - ? ? ? mb_debug("discard preallocation for group %u\n", group);
> + ? ? ? mb_debug(1, "discard preallocation for group %u\n", group);
>
> ? ? ? ?if (list_empty(&grp->bb_prealloc_list))
> ? ? ? ? ? ? ? ?return 0;
> @@ -3973,7 +4010,7 @@ void ext4_discard_preallocations(struct inode *inode)
> ? ? ? ? ? ? ? ?return;
> ? ? ? ?}
>
> - ? ? ? mb_debug("discard preallocation for inode %lu\n", inode->i_ino);
> + ? ? ? mb_debug(1, "discard preallocation for inode %lu\n", inode->i_ino);
> ? ? ? ?trace_ext4_discard_preallocations(inode);
>
> ? ? ? ?INIT_LIST_HEAD(&list);
> @@ -4078,7 +4115,7 @@ static void ext4_mb_return_to_preallocation(struct inode *inode,
> ?{
> ? ? ? ?BUG_ON(!list_empty(&EXT4_I(inode)->i_prealloc_list));
> ?}
> -#ifdef MB_DEBUG
> +#ifdef CONFIG_EXT4_DEBUG
> ?static void ext4_mb_show_ac(struct ext4_allocation_context *ac)
> ?{
> ? ? ? ?struct super_block *sb = ac->ac_sb;
> @@ -4227,7 +4264,7 @@ ext4_mb_initialize_context(struct ext4_allocation_context *ac,
> ? ? ? ? * locality group. this is a policy, actually */
> ? ? ? ?ext4_mb_group_or_file(ac);
>
> - ? ? ? mb_debug("init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, "
> + ? ? ? mb_debug(1, "init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, "
> ? ? ? ? ? ? ? ? ? ? ? ?"left: %u/%u, right %u/%u to %swritable\n",
> ? ? ? ? ? ? ? ? ? ? ? ?(unsigned) ar->len, (unsigned) ar->logical,
> ? ? ? ? ? ? ? ? ? ? ? ?(unsigned) ar->goal, ac->ac_flags, ac->ac_2order,
> @@ -4249,7 +4286,7 @@ ext4_mb_discard_lg_preallocations(struct super_block *sb,
> ? ? ? ?struct ext4_prealloc_space *pa, *tmp;
> ? ? ? ?struct ext4_allocation_context *ac;
>
> - ? ? ? mb_debug("discard locality group preallocation\n");
> + ? ? ? mb_debug(1, "discard locality group preallocation\n");
>
> ? ? ? ?INIT_LIST_HEAD(&discard_list);
> ? ? ? ?ac = kmem_cache_alloc(ext4_ac_cachep, GFP_NOFS);
> diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
> index c96bb19..28abb02 100644
> --- a/fs/ext4/mballoc.h
> +++ b/fs/ext4/mballoc.h
> @@ -37,11 +37,19 @@
>
> ?/*
> ?*/
> -#define MB_DEBUG__
> -#ifdef MB_DEBUG
> -#define mb_debug(fmt, a...) ? ?printk(fmt, ##a)
> +#ifdef CONFIG_EXT4_DEBUG
> +extern u8 mb_enable_debug;
> +
> +#define mb_debug(n, fmt, a...) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? \
> + ? ? ? do { ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?\
> + ? ? ? ? ? ? ? if ((n) <= mb_enable_debug) { ? ? ? ? ? ? ? ? ? ? ? ? ? \
> + ? ? ? ? ? ? ? ? ? ? ? printk (KERN_DEBUG "(%s, %d): %s: ", ? ? ? ? ? ?\
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? __FILE__, __LINE__, __func__); ? ? ? ? ?\
> + ? ? ? ? ? ? ? ? ? ? ? printk (fmt, ## a); ? ? ? ? ? ? ? ? ? ? ? ? ? ? \
> + ? ? ? ? ? ? ? } ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? \
> + ? ? ? } while (0)
> ?#else
> -#define mb_debug(fmt, a...)
> +#define mb_debug(n, fmt, a...)
> ?#endif
>
> ?/*
> --
> 1.6.3.2.1.gb9f7d.dirty
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>

2009-08-11 18:53:58

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging

On Tue, Aug 11, 2009 at 11:15:53AM -0700, Xiang Wang wrote:
> Hi Ted,
>
> I tried to apply this patch to our kernel, but I encountered some
> problems in building the kernel:
>
> FATAL: fs/ext4/ext4.o(.text+0x24f2d): Section mismatch in reference
> from the function exit_ext4_mballoc() to the function
> .exit.text:ext4_remove_debugfs_entry()
>
> Looking at the code, the problem seems to be that, exit_ext4_mballoc
> calls ext4_remove_debugfs_entry. And ext4_remove_debugfs_entry has the
> "__exit" annotation while exit_ext4_mballoc does not. I tried removing
> the "__exit" annotation from ext4_remove_debugfs_entry and it builds
> well.

Good catch; I didn't notice because I wasn't compiling with
CONFIG_DEBUG_SECTION_MISMATCH=y. Your fix is the right one.

- Ted

2009-08-20 06:40:58

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH, RFC -V2 4/4] ext4: Avoid group preallocation for closed files

On Sun, Aug 09, 2009 at 11:23:55PM -0400, Theodore Ts'o wrote:
....

>
> static inline void ext4_unlock_group(struct super_block *sb,
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index a103cb0..6c7be0d 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4191,9 +4191,17 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac)
> return;
>
> size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
> - isize = i_size_read(ac->ac_inode) >> bsbits;
> + isize = (i_size_read(ac->ac_inode) + ac->ac_sb->s_blocksize - 1)
> + >> bsbits;
> size = max(size, isize);
>
> + if ((size == isize) &&

What is this check supposed to help us ?. This would also imply we disable prealloc
only if we are allocating the last chunk in the file. Why not just

if (atomic_read(&ac->ac_inode->i_writecount) == 0) && !ext4_fs_is_busy(sbi) {
ac->ac_flags |= EXT4_MB_HINT_NOPREALLOC;
}



> + ext4_fs_is_busy(sbi) &&
> + (atomic_read(&ac->ac_inode->i_writecount) == 0)) {
> + ac->ac_flags |= EXT4_MB_HINT_NOPREALLOC;
> + return;
> + }

shouldn't it be !ext4_fs_is_busy(sbi) ?. Can you also write function documentation
for ext4_fs_is_busy. I found in confusing that you are decrementing s_lock_busy
if we are going to spin on spin_lock.


> +
> /* don't use group allocation for large files */
> if (size >= sbi->s_mb_stream_request) {
> ac->ac_flags |= EXT4_MB_STREAM_ALLOC;
> --

-aneesh

2009-08-20 07:22:50

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH, RFC -V2 3/4] ext4: Fix bugs in mballoc's stream allocation mode

On Sun, Aug 09, 2009 at 11:23:54PM -0400, Theodore Ts'o wrote:
> The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all
> screwed up. These fields were getting unconditionally all the time,
> set even when stream allocation had not taken place, and if they were
> being used when the file was smaller than s_mb_stream_request, which
> is when the allocation should _not_ be doing stream allocation.
>
> Fix this by determining whether or not we stream allocation should
> take place once, in ext4_mb_group_or_file(), and setting a flag which
> gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found().
> This simplifies the code and assures that we are consistently using
> (or not using) the stream allocation logic.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/ext4.h | 2 ++
> fs/ext4/mballoc.c | 23 ++++++++++-------------
> 2 files changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index e267727..70aa951 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -88,6 +88,8 @@ typedef unsigned int ext4_group_t;
> #define EXT4_MB_HINT_TRY_GOAL 0x0200
> /* blocks already pre-reserved by delayed allocation */
> #define EXT4_MB_DELALLOC_RESERVED 0x0400
> +/* We are doing stream allocation */
> +#define EXT4_MB_STREAM_ALLOC 0x0800
>
>
> struct ext4_allocation_request {
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index f510a58..a103cb0 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1361,7 +1361,7 @@ static void ext4_mb_use_best_found(struct ext4_allocation_context *ac,
> ac->alloc_semp = e4b->alloc_semp;
> e4b->alloc_semp = NULL;
> /* store last allocated for subsequent stream allocation */
> - if ((ac->ac_flags & EXT4_MB_HINT_DATA)) {
> + if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) {
> spin_lock(&sbi->s_md_lock);
> sbi->s_mb_last_group = ac->ac_f_ex.fe_group;
> sbi->s_mb_last_start = ac->ac_f_ex.fe_start;
> @@ -1939,7 +1939,6 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
> struct ext4_sb_info *sbi;
> struct super_block *sb;
> struct ext4_buddy e4b;
> - loff_t size, isize;
>
> sb = ac->ac_sb;
> sbi = EXT4_SB(sb);
> @@ -1975,20 +1974,16 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
> }
>
> bsbits = ac->ac_sb->s_blocksize_bits;
> - /* if stream allocation is enabled, use global goal */
> - size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
> - isize = i_size_read(ac->ac_inode) >> bsbits;
> - if (size < isize)
> - size = isize;
>
> - if (size < sbi->s_mb_stream_request &&
> - (ac->ac_flags & EXT4_MB_HINT_DATA)) {
> + /* if stream allocation is enabled, use global goal */
> + if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) {
> /* TBD: may be hot point */
> spin_lock(&sbi->s_md_lock);
> ac->ac_g_ex.fe_group = sbi->s_mb_last_group;
> ac->ac_g_ex.fe_start = sbi->s_mb_last_start;
> spin_unlock(&sbi->s_md_lock);
> }
> +
> /* Let's just scan groups to find more-less suitable blocks */
> cr = ac->ac_2order ? 0 : 1;
> /*
> @@ -4192,16 +4187,18 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac)
> if (!(ac->ac_flags & EXT4_MB_HINT_DATA))
> return;
>
> + if (unlikely(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
> + return;
> +
> size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
> isize = i_size_read(ac->ac_inode) >> bsbits;
> size = max(size, isize);
>
> /* don't use group allocation for large files */
> - if (size >= sbi->s_mb_stream_request)
> - return;
> -
> - if (unlikely(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
> + if (size >= sbi->s_mb_stream_request) {
> + ac->ac_flags |= EXT4_MB_STREAM_ALLOC;
> return;
> + }
>
> BUG_ON(ac->ac_lg != NULL);
> /*

NAK.

This would give bad allocation pattern for large files. We should be using global goal only for small files
not for large files. Large files should be using neighbour allocated extent block as the goal, so that
we get contiguous blocks.

-aneesh

2009-08-20 18:20:10

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH, RFC -V2 3/4] ext4: Fix bugs in mballoc's stream allocation mode

On Thu, Aug 20, 2009 at 12:52:38PM +0530, Aneesh Kumar K.V wrote:
>
> This would give bad allocation pattern for large files. We should be
> using global goal only for small files not for large files.

Huh? Small files should be allocated within their flex_bg close to
their parent directories, right? Large files are supposed to
allocated globally, potentially outside of the flex_bg so they won't
chew up all of the space in the local flex_bg.

Also, the comments ext4.h for s_mb_last_group and s_mb_last_start
indicate: "where last allocation was done - for stream allocation".
If your interpretation was correct, the comment would be wrong.

> Large files should be using neighbour allocated extent block as the
> goal, so that we get contiguous blocks.

We do use the neighbour allocated extent block as the goal. The code
in question here is used only when the ext4_mb_find_by_goal() has
failed.

This brings up the larger problem which is the mballoc code is
extremely hard to understand, and not sufficiently documented, where
the algorithm is broken up so many pieces that unless you spend a long
time mind-melding with the code, it's sometimes very hard to get a
mental map of the forest versus the trees.

- Ted

2009-08-21 02:45:46

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH, RFC -V2 4/4] ext4: Avoid group preallocation for closed files

On Thu, Aug 20, 2009 at 12:10:35PM +0530, Aneesh Kumar K.V wrote:
> > + if ((size == isize) &&
>
> What is this check supposed to help us ?. This would also imply we
> disable prealloc only if we are allocating the last chunk in the
> file.

That was the idea, yes; the idea was to disable preallocation if the
file is small enough that it could be written in a single call to
ext4_da_writepages, or if we are allocating/writing the last chunk in
a file. Otherwise, preallocation would be a good thing.

> shouldn't it be !ext4_fs_is_busy(sbi) ?. Can you also write function
> documentation for ext4_fs_is_busy. I found in confusing that you are
> decrementing s_lock_busy if we are going to spin on spin_lock.

Um, oops. Yeah, good point. It should be !ext4_fs_is_busy(). I also
have the logic backwards in ext4_lock_group as well, so the two errors
mostly cancel each other out. I'll fix that in the patch.

- Ted

2009-08-21 12:23:15

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH, RFC -V2 4/4] ext4: Avoid group preallocation for closed files

On Thu, Aug 20, 2009 at 10:45:45PM -0400, Theodore Tso wrote:
> Um, oops. Yeah, good point. It should be !ext4_fs_is_busy(). I also
> have the logic backwards in ext4_lock_group as well, so the two errors
> mostly cancel each other out. I'll fix that in the patch.

... and here's the revised patch.

- Ted

ext4: Avoid group preallocation for closed files

Currently the group preallocation code tries to find a large (512)
free block from which to do per-cpu group allocation for small files.
The problem with this scheme is that it leaves the filesystem horribly
fragmented. In the worst case, if the filesystem is unmounted and
remounted (after a system shutdown, for example) we forget the fact
that wee were using a particular (now-partially filled) 512 block
extent. So the next time we try to allocate space for a small file,
we will find *another* completely free 512 block chunk to allocate
small files. Given that there are 32,768 blocks in a block group,
after 64 iterations of "mount, write one 4k file in a directory,
unmount", the block group will have 64 files, each separated by 511
blocks, and the block group will no longer have any free 512
completely free chunks of blocks for group preallocation space.

So if we try to allocate blocks for a file that has been closed, such
that we know the final size of the file, and the filesystem is not
busy, avoid using group preallocation.

Signed-off-by: "Theodore Ts'o" <[email protected]>
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 70aa951..b989920 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -952,6 +952,7 @@ struct ext4_sb_info {
atomic_t s_mb_lost_chunks;
atomic_t s_mb_preallocated;
atomic_t s_mb_discarded;
+ atomic_t s_lock_busy;

/* locality groups */
struct ext4_locality_group *s_locality_groups;
@@ -1593,15 +1594,42 @@ struct ext4_group_info {
#define EXT4_MB_GRP_NEED_INIT(grp) \
(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))

+#define EXT4_MAX_CONTENTION 8
+#define EXT4_CONTENTION_THRESHOLD 2
+
static inline spinlock_t *ext4_group_lock_ptr(struct super_block *sb,
ext4_group_t group)
{
return bgl_lock_ptr(EXT4_SB(sb)->s_blockgroup_lock, group);
}

+/*
+ * Returns true if the filesystem is busy enough that attempts to
+ * access the block group locks has run into contention.
+ */
+static inline int ext4_fs_is_busy(struct ext4_sb_info *sbi)
+{
+ return (atomic_read(&sbi->s_lock_busy) > EXT4_CONTENTION_THRESHOLD);
+}
+
static inline void ext4_lock_group(struct super_block *sb, ext4_group_t group)
{
- spin_lock(ext4_group_lock_ptr(sb, group));
+ spinlock_t *lock = ext4_group_lock_ptr(sb, group);
+ if (spin_trylock(lock))
+ /*
+ * We're able to grab the lock right away, so drop the
+ * lock contention counter.
+ */
+ atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, -1, 0);
+ else {
+ /*
+ * The lock is busy, so bump the contention counter,
+ * and then wait on the spin lock.
+ */
+ atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, 1,
+ EXT4_MAX_CONTENTION);
+ spin_lock(lock);
+ }
}

static inline void ext4_unlock_group(struct super_block *sb,
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a103cb0..ee563e2 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4191,9 +4191,17 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac)
return;

size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
- isize = i_size_read(ac->ac_inode) >> bsbits;
+ isize = (i_size_read(ac->ac_inode) + ac->ac_sb->s_blocksize - 1)
+ >> bsbits;
size = max(size, isize);

+ if ((size == isize) &&
+ !ext4_fs_is_busy(sbi) &&
+ (atomic_read(&ac->ac_inode->i_writecount) == 0)) {
+ ac->ac_flags |= EXT4_MB_HINT_NOPREALLOC;
+ return;
+ }
+
/* don't use group allocation for large files */
if (size >= sbi->s_mb_stream_request) {
ac->ac_flags |= EXT4_MB_STREAM_ALLOC;