2007-08-13 10:22:34

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [RFC] mballoc patches


Alex actually pointed me the new mballoc patches at
ftp://ftp.clusterfs.com/pub/people/alex/mballoc3

The series is the forward port of the same on top of
d4ac2477fad0f2680e84ec12e387ce67682c5c13 (v2.6.23-rc2)


I guess the mballoc3 patch at clusterfs.com is based on
a patched ext3(I guess it is ext3 + extent tree local to
clusterfs ? ). The following series is based on ext4.


I tested the changes with different blocks per group
combination and it seems to be working fine. (The last
series did panic with mke2fs -g 800 )


I will now look at splitting this to smaller patches.

NOTE : I am not sure whether patch 2 will be able to make the list.
if not is there a place i can ftp/scp them so that others can access
the same ?

I haven't split the patches in any order and didn't bothered to
write any meaningful commit messages since this is mostly work in
progress.

-aneesh


2007-08-13 10:24:22

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH 1/4] Add some new function for searching extent tree.

From: Alex Tomas <[email protected]>

ext4_ext_search_left
ext4_ext_search_right
---
fs/ext4/extents.c | 142 +++++++++++++++++++++++++++++++++++++++
include/linux/ext4_fs_extents.h | 2 +
2 files changed, 144 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 78beb09..3084e09 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1015,6 +1015,148 @@ out:
}

/*
+ * search the closest allocated block to the left for *logical
+ * and returns it at @logical + it's physical address at @phys
+ * if *logical is the smallest allocated block, the function
+ * returns 0 at @phys
+ * return value contains 0 (success) or error code
+ */
+int
+ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path,
+ ext4_fsblk_t *logical, ext4_fsblk_t *phys)
+{
+ struct ext4_extent_idx *ix;
+ struct ext4_extent *ex;
+ int depth;
+
+ BUG_ON(path == NULL);
+ depth = path->p_depth;
+ *phys = 0;
+
+ if (depth == 0 && path->p_ext == NULL)
+ return 0;
+
+ /* usually extent in the path covers blocks smaller
+ * then *logical, but it can be that extent is the
+ * first one in the file */
+
+ ex = path[depth].p_ext;
+ if (*logical < le32_to_cpu(ex->ee_block)) {
+ BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex);
+ while (--depth >= 0) {
+ ix = path[depth].p_idx;
+ BUG_ON(ix != EXT_FIRST_INDEX(path[depth].p_hdr));
+ }
+ return 0;
+ }
+
+ BUG_ON(*logical < le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len));
+
+ *logical = le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len) - 1;
+ *phys = ext_pblock(ex) + le16_to_cpu(ex->ee_len) - 1;
+ return 0;
+}
+
+/*
+ * search the closest allocated block to the right for *logical
+ * and returns it at @logical + it's physical address at @phys
+ * if *logical is the smallest allocated block, the function
+ * returns 0 at @phys
+ * return value contains 0 (success) or error code
+ */
+int
+ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path,
+ ext4_fsblk_t *logical, ext4_fsblk_t *phys)
+{
+ struct buffer_head *bh = NULL;
+ struct ext4_extent_header *eh;
+ struct ext4_extent_idx *ix;
+ struct ext4_extent *ex;
+ ext4_fsblk_t block;
+ int depth;
+
+ BUG_ON(path == NULL);
+ depth = path->p_depth;
+ *phys = 0;
+
+ if (depth == 0 && path->p_ext == NULL)
+ return 0;
+
+ /* usually extent in the path covers blocks smaller
+ * then *logical, but it can be that extent is the
+ * first one in the file */
+
+ ex = path[depth].p_ext;
+ if (*logical < le32_to_cpu(ex->ee_block)) {
+ BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex);
+ while (--depth >= 0) {
+ ix = path[depth].p_idx;
+ BUG_ON(ix != EXT_FIRST_INDEX(path[depth].p_hdr));
+ }
+ *logical = le32_to_cpu(ex->ee_block);
+ *phys = ext_pblock(ex);
+ return 0;
+ }
+
+ BUG_ON(*logical < le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len));
+
+ if (ex != EXT_LAST_EXTENT(path[depth].p_hdr)) {
+ /* next allocated block in this leaf */
+ ex++;
+ *logical = le32_to_cpu(ex->ee_block);
+ *phys = ext_pblock(ex);
+ return 0;
+ }
+
+ /* go up and search for index to the right */
+ while (--depth >= 0) {
+ ix = path[depth].p_idx;
+ if (ix != EXT_LAST_INDEX(path[depth].p_hdr))
+ break;
+ }
+
+ if (depth < 0) {
+ /* we've gone up to the root and
+ * found no index to the right */
+ return 0;
+ }
+
+ /* we've found index to the right, let's
+ * follow it and find the closest allocated
+ * block to the right */
+ ix++;
+ block = idx_pblock(ix);
+ while (++depth < path->p_depth) {
+ bh = sb_bread(inode->i_sb, block);
+ if (bh == NULL)
+ return -EIO;
+ eh = ext_block_hdr(bh);
+ if (ext4_ext_check_header(inode, eh, depth)) {
+ brelse(bh);
+ return -EIO;
+ }
+ ix = EXT_FIRST_INDEX(eh);
+ block = idx_pblock(ix);
+ brelse(bh);
+ }
+
+ bh = sb_bread(inode->i_sb, block);
+ if (bh == NULL)
+ return -EIO;
+ eh = ext_block_hdr(bh);
+ if (ext4_ext_check_header(inode, eh, depth)) {
+ brelse(bh);
+ return -EIO;
+ }
+ ex = EXT_FIRST_EXTENT(eh);
+ *logical = le32_to_cpu(ex->ee_block);
+ *phys = ext_pblock(ex);
+ brelse(bh);
+ return 0;
+
+}
+
+/*
* ext4_ext_next_allocated_block:
* returns allocated block in subsequent extent or EXT_MAX_BLOCK.
* NOTE: it considers block number from index entry as
diff --git a/include/linux/ext4_fs_extents.h b/include/linux/ext4_fs_extents.h
index 81406f3..5ed0891 100644
--- a/include/linux/ext4_fs_extents.h
+++ b/include/linux/ext4_fs_extents.h
@@ -236,5 +236,7 @@ extern int ext4_ext_insert_extent(handle_t *, struct inode *, struct ext4_ext_pa
extern int ext4_ext_walk_space(struct inode *, unsigned long, unsigned long, ext_prepare_callback, void *);
extern struct ext4_ext_path * ext4_ext_find_extent(struct inode *, int, struct ext4_ext_path *);

+extern int ext4_ext_search_left(struct inode *, struct ext4_ext_path *, ext4_fsblk_t *, ext4_fsblk_t *);
+extern int ext4_ext_search_right(struct inode *, struct ext4_ext_path *, ext4_fsblk_t *, ext4_fsblk_t *);
#endif /* _LINUX_EXT4_EXTENTS */

--
1.5.3.rc4.67.gf9286-dirty

2007-08-13 16:38:47

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [RFC] mballoc patches



Aneesh Kumar K.V wrote:
> Alex actually pointed me the new mballoc patches at
> ftp://ftp.clusterfs.com/pub/people/alex/mballoc3
>
> The series is the forward port of the same on top of
> d4ac2477fad0f2680e84ec12e387ce67682c5c13 (v2.6.23-rc2)
>
>
> I guess the mballoc3 patch at clusterfs.com is based on
> a patched ext3(I guess it is ext3 + extent tree local to
> clusterfs ? ). The following series is based on ext4.
>
>
> I tested the changes with different blocks per group
> combination and it seems to be working fine. (The last
> series did panic with mke2fs -g 800 )
>
>
> I will now look at splitting this to smaller patches.
>
> NOTE : I am not sure whether patch 2 will be able to make the list.
> if not is there a place i can ftp/scp them so that others can access
> the same ?
>
>

I am attaching below the PATCH 2/4 in .gz format. The uncompressed one
got dropped by the list.

-aneesh


Attachments:
0002-mballoc-core-patch.patch.gz (30.47 kB)

2007-08-21 21:07:16

by Eric Sandeen

[permalink] [raw]
Subject: Re: [RFC] mballoc patches

Aneesh Kumar K.V wrote:
>
>
> I am attaching below the PATCH 2/4 in .gz format. The uncompressed one
> got dropped by the list.
>
> -aneesh

hmm makes it hard to comment in-line though :) So basing this on the
patch currently in the git repo.

+/*
+ * default stripe size = 1MB
+ */
+#define MB_DEFAULT_STRIPE 256

Units? Doesn't seem to matter anyway as it's never referenced.

+ /* tunables */
+ unsigned long s_mb_factor;
+ unsigned long s_stripe;
+ unsigned long s_mb_small_req;
+ unsigned long s_mb_large_req;
+ unsigned long s_mb_max_to_scan;
+ unsigned long s_mb_min_to_scan;

could we get some comments here as to what these are, and what units?

Same is true many places... for example

+static int mb_find_extent(struct ext4_buddy *e3b, int order, int block,
+ int needed, struct ext4_free_extent *ex)

how many "what" are needed?

And perhaps an addition of the new mount options to
Documentation/fs/ext4.txt would be good.

+#define EXT4_MB_BITMAP(e3b) ((e3b)->bd_bitmap)
+#define EXT4_MB_BUDDY(e3b) ((e3b)->bd_buddy)

For the sake of consistency should these (and others) be e4b?

Also there are a *lot* of BUGs and BUG_ONs added in this patch... are
none of these recoverable?

Thanks,

-Eric

2007-09-10 12:30:39

by Alex Tomas

[permalink] [raw]
Subject: Re: [RFC] mballoc patches

Eric Sandeen wrote:
> +/*
> + * default stripe size = 1MB
> + */
> +#define MB_DEFAULT_STRIPE 256

agree, though seems we'd better make it blocksize-insensitive

>
> Units? Doesn't seem to matter anyway as it's never referenced.
>
> + /* tunables */
> + unsigned long s_mb_factor;
> + unsigned long s_stripe;
> + unsigned long s_mb_small_req;
> + unsigned long s_mb_large_req;
> + unsigned long s_mb_max_to_scan;
> + unsigned long s_mb_min_to_scan;
>
> could we get some comments here as to what these are, and what units?

OK, I'll do as well as policy (and tunning) description.

>
> Same is true many places... for example
>
> +static int mb_find_extent(struct ext4_buddy *e3b, int order, int block,
> + int needed, struct ext4_free_extent *ex)
>
> how many "what" are needed?

well, blocks :)

> And perhaps an addition of the new mount options to
> Documentation/fs/ext4.txt would be good.
>
> +#define EXT4_MB_BITMAP(e3b) ((e3b)->bd_bitmap)
> +#define EXT4_MB_BUDDY(e3b) ((e3b)->bd_buddy)
>
> For the sake of consistency should these (and others) be e4b?

OK

>
> Also there are a *lot* of BUGs and BUG_ONs added in this patch... are
> none of these recoverable?

well, I'll review the code in this regard again, but most of them are not.
not that I like kernel panics, but BUG_ON() are very helpful to maintain
code, especially in long-term.

thanks, Alex

2007-09-10 14:37:35

by Eric Sandeen

[permalink] [raw]
Subject: Re: [RFC] mballoc patches

Alex Tomas wrote:
> Eric Sandeen wrote:
>> +/*
>> + * default stripe size = 1MB
>> + */
>> +#define MB_DEFAULT_STRIPE 256
>
> agree, though seems we'd better make it blocksize-insensitive

Well, for now this isn't even used at *all* so might as well just remove it.

>> +static int mb_find_extent(struct ext4_buddy *e3b, int order, int block,
>> + int needed, struct ext4_free_extent *ex)
>>
>> how many "what" are needed?
>
> well, blocks :)

I figured, though bytes is a possibility. :) In general, being
explicit about units helps; xfs suffers a bit from byte/block/sector
confusion in the code, in some places.

>> Also there are a *lot* of BUGs and BUG_ONs added in this patch... are
>> none of these recoverable?
>
> well, I'll review the code in this regard again, but most of them are not.
> not that I like kernel panics, but BUG_ON() are very helpful to maintain
> code, especially in long-term.

Ok, I just think there might be upstream pushback on these, at one point
at least they were trying to reduce the nr. of BUG calls in the code...

Thanks,
-Eric