2011-03-24 16:58:28

by Amir G.

Subject: [PATCHSET v2] ext4: removal of alloc_sem locks from block allocation paths

The purpose of this patch set is the removal of grp->alloc_sem locks
from block allocation paths.

The resulting code is cleaner and should perform better under workloads
with concurrently allocating tasks.

I ran several xfstests runs with these patches (4K and 1K block size).
I tried several online resizes and verified that both in-core and on-disk
group counters are correct.

v2->v1:
- fix silly bug in patch 4/5 that triggers BUG_ON(incore == NULL)
- replace get_undo_access() with get_write_access()
- ran xfstests with block size 1K (where 2 groups share a buddy page)

[PATCH v2 1/5] ext4: move ext4_add_groupblocks() to mballoc.c
[PATCH v2 2/5] ext4: implement ext4_add_groupblocks() by freeing blocks
[PATCH v2 3/5] ext4: synchronize ext4_mb_init_group() with buddy page lock
[PATCH v2 4/5] ext4: teach ext4_mb_init_cache() to skip uptodate buddy caches
[PATCH v2 5/5] ext4: remove alloc_semp



2011-03-24 16:58:32

by Amir G.

Subject: [PATCH v2 2/5] ext4: implement ext4_add_groupblocks() by freeing blocks

From: Amir Goldstein <[email protected]>

The old implementation used to take grp->alloc_sem and set the
GROUP_NEED_INIT flag, so that the buddy cache would be reloaded.

The new implementation updates the buddy cache by freeing the added
blocks and making them available for use, so there is no need to
reload the buddy cache and there is no need to take grp->alloc_sem.

Signed-off-by: Amir Goldstein <[email protected]>
---
fs/ext4/mballoc.c | 35 ++++++++++++++++++-----------------
1 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 1e3c826..da52bd9 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4710,9 +4710,7 @@ error_return:
* @block: start physcial block to add to the block group
* @count: number of blocks to free
*
- * This marks the blocks as free in the bitmap. We ask the
- * mballoc to reload the buddy after this by setting group
- * EXT4_GROUP_INFO_NEED_INIT_BIT flag
+ * This marks the blocks as free in the bitmap and buddy.
*/
void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
ext4_fsblk_t block, unsigned long count)
@@ -4724,6 +4722,7 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
unsigned int i;
struct ext4_group_desc *desc;
struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_buddy e4b;
int err = 0, ret, blk_free_count;
ext4_grpblk_t blocks_freed;
struct ext4_group_info *grp;
@@ -4771,15 +4770,10 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
err = ext4_journal_get_write_access(handle, gd_bh);
if (err)
goto error_return;
- /*
- * make sure we don't allow a parallel init on other groups in the
- * same buddy cache
- */
- down_write(&grp->alloc_sem);
+
for (i = 0, blocks_freed = 0; i < count; i++) {
BUFFER_TRACE(bitmap_bh, "clear bit");
- if (!ext4_clear_bit_atomic(ext4_group_lock_ptr(sb, block_group),
- bit + i, bitmap_bh->b_data)) {
+ if (!mb_test_bit(bit + i, bitmap_bh->b_data)) {
ext4_error(sb, "bit already cleared for block %llu",
(ext4_fsblk_t)(block + i));
BUFFER_TRACE(bitmap_bh, "bit already cleared");
@@ -4787,7 +4781,19 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
blocks_freed++;
}
}
+
+ err = ext4_mb_load_buddy(sb, block_group, &e4b);
+ if (err)
+ goto error_return;
+
+ /*
+ * need to update group_info->bb_free and bitmap
+ * with group lock held. generate_buddy look at
+ * them with group lock_held
+ */
ext4_lock_group(sb, block_group);
+ mb_clear_bits(bitmap_bh->b_data, bit, count);
+ mb_free_blocks(NULL, &e4b, bit, count);
blk_free_count = blocks_freed + ext4_free_blks_count(sb, desc);
ext4_free_blks_set(sb, desc, blk_free_count);
desc->bg_checksum = ext4_group_desc_csum(sbi, block_group, desc);
@@ -4799,13 +4805,8 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
atomic_add(blocks_freed,
&sbi->s_flex_groups[flex_group].free_blocks);
}
- /*
- * request to reload the buddy with the
- * new bitmap information
- */
- set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &(grp->bb_state));
- grp->bb_free += blocks_freed;
- up_write(&grp->alloc_sem);
+
+ ext4_mb_unload_buddy(&e4b);

/* We dirtied the bitmap block */
BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
--
1.7.0.4


2011-03-24 16:58:30

by Amir G.

Subject: [PATCH v2 1/5] ext4: move ext4_add_groupblocks() to mballoc.c

From: Amir Goldstein <[email protected]>

In preparation for the next patch, the function ext4_add_groupblocks()
is moved to mballoc.c, where it could use some static functions.

I also fixed a checkpatch warning and replaced obsolete get_undo_access
for bitmap with get_write_access.

Signed-off-by: Amir Goldstein <[email protected]>
---
fs/ext4/balloc.c | 124 -----------------------------------------------------
fs/ext4/mballoc.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 120 insertions(+), 124 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index adf96b8..1288f80 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -359,130 +359,6 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
}

/**
- * ext4_add_groupblocks() -- Add given blocks to an existing group
- * @handle: handle to this transaction
- * @sb: super block
- * @block: start physcial block to add to the block group
- * @count: number of blocks to free
- *
- * This marks the blocks as free in the bitmap. We ask the
- * mballoc to reload the buddy after this by setting group
- * EXT4_GROUP_INFO_NEED_INIT_BIT flag
- */
-void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
- ext4_fsblk_t block, unsigned long count)
-{
- struct buffer_head *bitmap_bh = NULL;
- struct buffer_head *gd_bh;
- ext4_group_t block_group;
- ext4_grpblk_t bit;
- unsigned int i;
- struct ext4_group_desc *desc;
- struct ext4_sb_info *sbi = EXT4_SB(sb);
- int err = 0, ret, blk_free_count;
- ext4_grpblk_t blocks_freed;
- struct ext4_group_info *grp;
-
- ext4_debug("Adding block(s) %llu-%llu\n", block, block + count - 1);
-
- ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
- grp = ext4_get_group_info(sb, block_group);
- /*
- * Check to see if we are freeing blocks across a group
- * boundary.
- */
- if (bit + count > EXT4_BLOCKS_PER_GROUP(sb)) {
- goto error_return;
- }
- bitmap_bh = ext4_read_block_bitmap(sb, block_group);
- if (!bitmap_bh)
- goto error_return;
- desc = ext4_get_group_desc(sb, block_group, &gd_bh);
- if (!desc)
- goto error_return;
-
- if (in_range(ext4_block_bitmap(sb, desc), block, count) ||
- in_range(ext4_inode_bitmap(sb, desc), block, count) ||
- in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) ||
- in_range(block + count - 1, ext4_inode_table(sb, desc),
- sbi->s_itb_per_group)) {
- ext4_error(sb, "Adding blocks in system zones - "
- "Block = %llu, count = %lu",
- block, count);
- goto error_return;
- }
-
- /*
- * We are about to add blocks to the bitmap,
- * so we need undo access.
- */
- BUFFER_TRACE(bitmap_bh, "getting undo access");
- err = ext4_journal_get_undo_access(handle, bitmap_bh);
- if (err)
- goto error_return;
-
- /*
- * We are about to modify some metadata. Call the journal APIs
- * to unshare ->b_data if a currently-committing transaction is
- * using it
- */
- BUFFER_TRACE(gd_bh, "get_write_access");
- err = ext4_journal_get_write_access(handle, gd_bh);
- if (err)
- goto error_return;
- /*
- * make sure we don't allow a parallel init on other groups in the
- * same buddy cache
- */
- down_write(&grp->alloc_sem);
- for (i = 0, blocks_freed = 0; i < count; i++) {
- BUFFER_TRACE(bitmap_bh, "clear bit");
- if (!ext4_clear_bit_atomic(ext4_group_lock_ptr(sb, block_group),
- bit + i, bitmap_bh->b_data)) {
- ext4_error(sb, "bit already cleared for block %llu",
- (ext4_fsblk_t)(block + i));
- BUFFER_TRACE(bitmap_bh, "bit already cleared");
- } else {
- blocks_freed++;
- }
- }
- ext4_lock_group(sb, block_group);
- blk_free_count = blocks_freed + ext4_free_blks_count(sb, desc);
- ext4_free_blks_set(sb, desc, blk_free_count);
- desc->bg_checksum = ext4_group_desc_csum(sbi, block_group, desc);
- ext4_unlock_group(sb, block_group);
- percpu_counter_add(&sbi->s_freeblocks_counter, blocks_freed);
-
- if (sbi->s_log_groups_per_flex) {
- ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
- atomic_add(blocks_freed,
- &sbi->s_flex_groups[flex_group].free_blocks);
- }
- /*
- * request to reload the buddy with the
- * new bitmap information
- */
- set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &(grp->bb_state));
- grp->bb_free += blocks_freed;
- up_write(&grp->alloc_sem);
-
- /* We dirtied the bitmap block */
- BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
- err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
-
- /* And the group descriptor block */
- BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
- ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
- if (!err)
- err = ret;
-
-error_return:
- brelse(bitmap_bh);
- ext4_std_error(sb, err);
- return;
-}
-
-/**
* ext4_has_free_blocks()
* @sbi: in-core super block structure.
* @nblocks: number of needed blocks
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 2f6f0dd..1e3c826 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4704,6 +4704,126 @@ error_return:
}

/**
+ * ext4_add_groupblocks() -- Add given blocks to an existing group
+ * @handle: handle to this transaction
+ * @sb: super block
+ * @block: start physcial block to add to the block group
+ * @count: number of blocks to free
+ *
+ * This marks the blocks as free in the bitmap. We ask the
+ * mballoc to reload the buddy after this by setting group
+ * EXT4_GROUP_INFO_NEED_INIT_BIT flag
+ */
+void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
+ ext4_fsblk_t block, unsigned long count)
+{
+ struct buffer_head *bitmap_bh = NULL;
+ struct buffer_head *gd_bh;
+ ext4_group_t block_group;
+ ext4_grpblk_t bit;
+ unsigned int i;
+ struct ext4_group_desc *desc;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ int err = 0, ret, blk_free_count;
+ ext4_grpblk_t blocks_freed;
+ struct ext4_group_info *grp;
+
+ ext4_debug("Adding block(s) %llu-%llu\n", block, block + count - 1);
+
+ ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
+ grp = ext4_get_group_info(sb, block_group);
+ /*
+ * Check to see if we are freeing blocks across a group
+ * boundary.
+ */
+ if (bit + count > EXT4_BLOCKS_PER_GROUP(sb))
+ goto error_return;
+
+ bitmap_bh = ext4_read_block_bitmap(sb, block_group);
+ if (!bitmap_bh)
+ goto error_return;
+ desc = ext4_get_group_desc(sb, block_group, &gd_bh);
+ if (!desc)
+ goto error_return;
+
+ if (in_range(ext4_block_bitmap(sb, desc), block, count) ||
+ in_range(ext4_inode_bitmap(sb, desc), block, count) ||
+ in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) ||
+ in_range(block + count - 1, ext4_inode_table(sb, desc),
+ sbi->s_itb_per_group)) {
+ ext4_error(sb, "Adding blocks in system zones - "
+ "Block = %llu, count = %lu",
+ block, count);
+ goto error_return;
+ }
+
+ BUFFER_TRACE(bitmap_bh, "getting write access");
+ err = ext4_journal_get_write_access(handle, bitmap_bh);
+ if (err)
+ goto error_return;
+
+ /*
+ * We are about to modify some metadata. Call the journal APIs
+ * to unshare ->b_data if a currently-committing transaction is
+ * using it
+ */
+ BUFFER_TRACE(gd_bh, "get_write_access");
+ err = ext4_journal_get_write_access(handle, gd_bh);
+ if (err)
+ goto error_return;
+ /*
+ * make sure we don't allow a parallel init on other groups in the
+ * same buddy cache
+ */
+ down_write(&grp->alloc_sem);
+ for (i = 0, blocks_freed = 0; i < count; i++) {
+ BUFFER_TRACE(bitmap_bh, "clear bit");
+ if (!ext4_clear_bit_atomic(ext4_group_lock_ptr(sb, block_group),
+ bit + i, bitmap_bh->b_data)) {
+ ext4_error(sb, "bit already cleared for block %llu",
+ (ext4_fsblk_t)(block + i));
+ BUFFER_TRACE(bitmap_bh, "bit already cleared");
+ } else {
+ blocks_freed++;
+ }
+ }
+ ext4_lock_group(sb, block_group);
+ blk_free_count = blocks_freed + ext4_free_blks_count(sb, desc);
+ ext4_free_blks_set(sb, desc, blk_free_count);
+ desc->bg_checksum = ext4_group_desc_csum(sbi, block_group, desc);
+ ext4_unlock_group(sb, block_group);
+ percpu_counter_add(&sbi->s_freeblocks_counter, blocks_freed);
+
+ if (sbi->s_log_groups_per_flex) {
+ ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
+ atomic_add(blocks_freed,
+ &sbi->s_flex_groups[flex_group].free_blocks);
+ }
+ /*
+ * request to reload the buddy with the
+ * new bitmap information
+ */
+ set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &(grp->bb_state));
+ grp->bb_free += blocks_freed;
+ up_write(&grp->alloc_sem);
+
+ /* We dirtied the bitmap block */
+ BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
+ err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
+
+ /* And the group descriptor block */
+ BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
+ ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
+ if (!err)
+ err = ret;
+
+error_return:
+ brelse(bitmap_bh);
+ ext4_std_error(sb, err);
+ return;
+}
+
+/**
* ext4_trim_extent -- function to TRIM one single free extent in the group
* @sb: super block for the file system
* @start: starting block of the free extent in the alloc. group
--
1.7.0.4


2011-03-24 16:58:35

by Amir G.

Subject: [PATCH v2 3/5] ext4: synchronize ext4_mb_init_group() with buddy page lock

From: Amir Goldstein <[email protected]>

The old routines ext4_mb_[get|put]_buddy_cache_lock(), which used
to take grp->alloc_sem for all groups on the buddy page have been
replaced with the routines ext4_mb_[get|put]_buddy_page_lock().

The new routines take both buddy and bitmap page locks to protect
against concurrent init of groups on the same buddy page.

The GROUP_NEED_INIT flag is tested again under page lock to check
if the group was initialized by another caller.

Signed-off-by: Amir Goldstein <[email protected]>
---
fs/ext4/mballoc.c | 177 +++++++++++++++++++----------------------------------
1 files changed, 63 insertions(+), 114 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index da52bd9..69601b2 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -957,22 +957,21 @@ out:
}

/*
- * lock the group_info alloc_sem of all the groups
- * belonging to the same buddy cache page. This
- * make sure other parallel operation on the buddy
- * cache doesn't happen whild holding the buddy cache
- * lock
+ * Lock the buddy and bitmap pages. This make sure other parallel init_group
+ * on the same buddy page doesn't happen whild holding the buddy page lock.
+ * Return locked buddy and bitmap pages on e4b struct. If buddy and bitmap
+ * are on the same page e4b->bd_buddy_page is NULL and return value is 0.
*/
-static int ext4_mb_get_buddy_cache_lock(struct super_block *sb,
- ext4_group_t group)
+static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
+ ext4_group_t group, struct ext4_buddy *e4b)
{
- int i;
- int block, pnum;
+ struct inode *inode = EXT4_SB(sb)->s_buddy_cache;
+ int block, pnum, poff;
int blocks_per_page;
- int groups_per_page;
- ext4_group_t ngroups = ext4_get_groups_count(sb);
- ext4_group_t first_group;
- struct ext4_group_info *grp;
+ struct page *page;
+
+ e4b->bd_buddy_page = NULL;
+ e4b->bd_bitmap_page = NULL;

blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
/*
@@ -982,57 +981,40 @@ static int ext4_mb_get_buddy_cache_lock(struct super_block *sb,
*/
block = group * 2;
pnum = block / blocks_per_page;
- first_group = pnum * blocks_per_page / 2;
-
- groups_per_page = blocks_per_page >> 1;
- if (groups_per_page == 0)
- groups_per_page = 1;
- /* read all groups the page covers into the cache */
- for (i = 0; i < groups_per_page; i++) {
+ poff = block % blocks_per_page;
+ page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+ if (!page)
+ return -EIO;
+ BUG_ON(page->mapping != inode->i_mapping);
+ e4b->bd_bitmap_page = page;
+ e4b->bd_bitmap = page_address(page) + (poff * sb->s_blocksize);

- if ((first_group + i) >= ngroups)
- break;
- grp = ext4_get_group_info(sb, first_group + i);
- /* take all groups write allocation
- * semaphore. This make sure there is
- * no block allocation going on in any
- * of that groups
- */
- down_write_nested(&grp->alloc_sem, i);
+ if (blocks_per_page >= 2) {
+ /* buddy and bitmap are on the same page */
+ return 0;
}
- return i;
+
+ block++;
+ pnum = block / blocks_per_page;
+ poff = block % blocks_per_page;
+ page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+ if (!page)
+ return -EIO;
+ BUG_ON(page->mapping != inode->i_mapping);
+ e4b->bd_buddy_page = page;
+ return 0;
}

-static void ext4_mb_put_buddy_cache_lock(struct super_block *sb,
- ext4_group_t group, int locked_group)
+static void ext4_mb_put_buddy_page_lock(struct ext4_buddy *e4b)
{
- int i;
- int block, pnum;
- int blocks_per_page;
- ext4_group_t first_group;
- struct ext4_group_info *grp;
-
- blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
- /*
- * the buddy cache inode stores the block bitmap
- * and buddy information in consecutive blocks.
- * So for each group we need two blocks.
- */
- block = group * 2;
- pnum = block / blocks_per_page;
- first_group = pnum * blocks_per_page / 2;
- /* release locks on all the groups */
- for (i = 0; i < locked_group; i++) {
-
- grp = ext4_get_group_info(sb, first_group + i);
- /* take all groups write allocation
- * semaphore. This make sure there is
- * no block allocation going on in any
- * of that groups
- */
- up_write(&grp->alloc_sem);
+ if (e4b->bd_bitmap_page) {
+ unlock_page(e4b->bd_bitmap_page);
+ page_cache_release(e4b->bd_bitmap_page);
+ }
+ if (e4b->bd_buddy_page) {
+ unlock_page(e4b->bd_buddy_page);
+ page_cache_release(e4b->bd_buddy_page);
}
-
}

/*
@@ -1044,93 +1026,60 @@ static noinline_for_stack
int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
{

- int ret = 0;
- void *bitmap;
- int blocks_per_page;
- int block, pnum, poff;
- int num_grp_locked = 0;
struct ext4_group_info *this_grp;
- struct ext4_sb_info *sbi = EXT4_SB(sb);
- struct inode *inode = sbi->s_buddy_cache;
- struct page *page = NULL, *bitmap_page = NULL;
+ struct ext4_buddy e4b;
+ struct page *page;
+ int ret = 0;

mb_debug(1, "init group %u\n", group);
- blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
this_grp = ext4_get_group_info(sb, group);
/*
* This ensures that we don't reinit the buddy cache
* page which map to the group from which we are already
* allocating. If we are looking at the buddy cache we would
* have taken a reference using ext4_mb_load_buddy and that
- * would have taken the alloc_sem lock.
+ * would have pinned buddy page to page cache.
*/
- num_grp_locked = ext4_mb_get_buddy_cache_lock(sb, group);
- if (!EXT4_MB_GRP_NEED_INIT(this_grp)) {
+ ret = ext4_mb_get_buddy_page_lock(sb, group, &e4b);
+ if (ret || !EXT4_MB_GRP_NEED_INIT(this_grp)) {
/*
* somebody initialized the group
* return without doing anything
*/
- ret = 0;
goto err;
}
- /*
- * the buddy cache inode stores the block bitmap
- * and buddy information in consecutive blocks.
- * So for each group we need two blocks.
- */
- block = group * 2;
- pnum = block / blocks_per_page;
- poff = block % blocks_per_page;
- page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
- if (page) {
- BUG_ON(page->mapping != inode->i_mapping);
- ret = ext4_mb_init_cache(page, NULL);
- if (ret) {
- unlock_page(page);
- goto err;
- }
- unlock_page(page);
- }
- if (page == NULL || !PageUptodate(page)) {
+
+ page = e4b.bd_bitmap_page;
+ ret = ext4_mb_init_cache(page, NULL);
+ if (ret)
+ goto err;
+ if (!PageUptodate(page)) {
ret = -EIO;
goto err;
}
mark_page_accessed(page);
- bitmap_page = page;
- bitmap = page_address(page) + (poff * sb->s_blocksize);

- /* init buddy cache */
- block++;
- pnum = block / blocks_per_page;
- poff = block % blocks_per_page;
- page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
- if (page == bitmap_page) {
+ if (e4b.bd_buddy_page == NULL) {
/*
* If both the bitmap and buddy are in
* the same page we don't need to force
* init the buddy
*/
- unlock_page(page);
- } else if (page) {
- BUG_ON(page->mapping != inode->i_mapping);
- ret = ext4_mb_init_cache(page, bitmap);
- if (ret) {
- unlock_page(page);
- goto err;
- }
- unlock_page(page);
+ ret = 0;
+ goto err;
}
- if (page == NULL || !PageUptodate(page)) {
+ /* init buddy cache */
+ page = e4b.bd_buddy_page;
+ ret = ext4_mb_init_cache(page, e4b.bd_bitmap);
+ if (ret)
+ goto err;
+ if (!PageUptodate(page)) {
ret = -EIO;
goto err;
}
mark_page_accessed(page);
err:
- ext4_mb_put_buddy_cache_lock(sb, group, num_grp_locked);
- if (bitmap_page)
- page_cache_release(bitmap_page);
- if (page)
- page_cache_release(page);
+ ext4_mb_put_buddy_page_lock(&e4b);
return ret;
}

--
1.7.0.4


2011-03-24 16:58:36

by Amir G.

Subject: [PATCH v2 4/5] ext4: teach ext4_mb_init_cache() to skip uptodate buddy caches

From: Amir Goldstein <[email protected]>

After online resize which adds new groups, some of the groups
in a buddy page may be initialized and uptodate, while others
(the newly added ones) may be uninitialized.

The indication for init of new block groups is when ext4_mb_init_cache()
is called with an uptodate buddy page. In this case, initialized groups
on that buddy page must be skipped when initializing the buddy cache.

Signed-off-by: Amir Goldstein <[email protected]>
---
fs/ext4/mballoc.c | 33 +++++++++++++++++++++++++--------
1 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 69601b2..1ddb92b 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -787,6 +787,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
struct inode *inode;
char *data;
char *bitmap;
+ struct ext4_group_info *grinfo;

mb_debug(1, "init page %lu\n", page->index);

@@ -819,6 +820,18 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
if (first_group + i >= ngroups)
break;

+ grinfo = ext4_get_group_info(sb, first_group + i);
+ /*
+ * If page is uptodate then we came here after online resize
+ * which added some new uninitialized group info structs, so
+ * we must skip all initialized uptodate buddies on the page,
+ * which may be currently in use by an allocating task.
+ */
+ if (PageUptodate(page) && !EXT4_MB_GRP_NEED_INIT(grinfo)) {
+ bh[i] = NULL;
+ continue;
+ }
+
err = -EIO;
desc = ext4_get_group_desc(sb, first_group + i, NULL);
if (desc == NULL)
@@ -871,26 +884,28 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
}

/* wait for I/O completion */
- for (i = 0; i < groups_per_page && bh[i]; i++)
- wait_on_buffer(bh[i]);
+ for (i = 0; i < groups_per_page; i++)
+ if (bh[i])
+ wait_on_buffer(bh[i]);

err = -EIO;
- for (i = 0; i < groups_per_page && bh[i]; i++)
- if (!buffer_uptodate(bh[i]))
+ for (i = 0; i < groups_per_page; i++)
+ if (bh[i] && !buffer_uptodate(bh[i]))
goto out;

err = 0;
first_block = page->index * blocks_per_page;
- /* init the page */
- memset(page_address(page), 0xff, PAGE_CACHE_SIZE);
for (i = 0; i < blocks_per_page; i++) {
int group;
- struct ext4_group_info *grinfo;

group = (first_block + i) >> 1;
if (group >= ngroups)
break;

+ if (!bh[group - first_group])
+ /* skip initialized uptodate buddy */
+ continue;
+
/*
* data carry information regarding this
* particular group in the format specified
@@ -919,6 +934,8 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
* incore got set to the group block bitmap below
*/
ext4_lock_group(sb, group);
+ /* init the buddy */
+ memset(data, 0xff, blocksize);
ext4_mb_generate_buddy(sb, data, incore, group);
ext4_unlock_group(sb, group);
incore = NULL;
@@ -948,7 +965,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)

out:
if (bh) {
- for (i = 0; i < groups_per_page && bh[i]; i++)
+ for (i = 0; i < groups_per_page; i++)
brelse(bh[i]);
if (bh != &bhs)
kfree(bh);
--
1.7.0.4


2011-03-24 16:58:38

by Amir G.

Subject: [PATCH v2 5/5] ext4: remove alloc_semp

From: Amir Goldstein <[email protected]>

After taking care of all group init races, all that remains is to
remove alloc_semp from ext4_allocation_context and ext4_buddy structs.

Signed-off-by: Amir Goldstein <[email protected]>
---
fs/ext4/mballoc.c | 31 +------------------------------
fs/ext4/mballoc.h | 6 ------
2 files changed, 1 insertions(+), 36 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 1ddb92b..5449b1a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1130,24 +1130,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
e4b->bd_group = group;
e4b->bd_buddy_page = NULL;
e4b->bd_bitmap_page = NULL;
- e4b->alloc_semp = &grp->alloc_sem;
-
- /* Take the read lock on the group alloc
- * sem. This would make sure a parallel
- * ext4_mb_init_group happening on other
- * groups mapped by the page is blocked
- * till we are done with allocation
- */
-repeat_load_buddy:
- down_read(e4b->alloc_semp);

if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
- /* we need to check for group need init flag
- * with alloc_semp held so that we can be sure
- * that new blocks didn't get added to the group
- * when we are loading the buddy cache
- */
- up_read(e4b->alloc_semp);
/*
* we need full data about the group
* to make a good selection
@@ -1155,7 +1139,6 @@ repeat_load_buddy:
ret = ext4_mb_init_group(sb, group);
if (ret)
return ret;
- goto repeat_load_buddy;
}

/*
@@ -1245,9 +1228,6 @@ err:
page_cache_release(e4b->bd_buddy_page);
e4b->bd_buddy = NULL;
e4b->bd_bitmap = NULL;
-
- /* Done with the buddy cache */
- up_read(e4b->alloc_semp);
return ret;
}

@@ -1257,9 +1237,6 @@ static void ext4_mb_unload_buddy(struct ext4_buddy *e4b)
page_cache_release(e4b->bd_bitmap_page);
if (e4b->bd_buddy_page)
page_cache_release(e4b->bd_buddy_page);
- /* Done with the buddy cache */
- if (e4b->alloc_semp)
- up_read(e4b->alloc_semp);
}


@@ -1572,9 +1549,6 @@ static void ext4_mb_use_best_found(struct ext4_allocation_context *ac,
get_page(ac->ac_bitmap_page);
ac->ac_buddy_page = e4b->bd_buddy_page;
get_page(ac->ac_buddy_page);
- /* on allocation we use ac to track the held semaphore */
- ac->alloc_semp = e4b->alloc_semp;
- e4b->alloc_semp = NULL;
/* store last allocated for subsequent stream allocation */
if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) {
spin_lock(&sbi->s_md_lock);
@@ -4192,15 +4166,12 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac)
spin_unlock(&pa->pa_lock);
}
}
- if (ac->alloc_semp)
- up_read(ac->alloc_semp);
if (pa) {
/*
* We want to add the pa to the right bucket.
* Remove it from the list and while adding
* make sure the list to which we are adding
- * doesn't grow big. We need to release
- * alloc_semp before calling ext4_mb_add_n_trim()
+ * doesn't grow big.
*/
if ((pa->pa_type == MB_GROUP_PA) && likely(pa->pa_free)) {
spin_lock(pa->pa_obj_lock);
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index 22bd4d7..20b5e7b 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -193,11 +193,6 @@ struct ext4_allocation_context {
__u8 ac_op; /* operation, for history only */
struct page *ac_bitmap_page;
struct page *ac_buddy_page;
- /*
- * pointer to the held semaphore upon successful
- * block allocation
- */
- struct rw_semaphore *alloc_semp;
struct ext4_prealloc_space *ac_pa;
struct ext4_locality_group *ac_lg;
};
@@ -215,7 +210,6 @@ struct ext4_buddy {
struct super_block *bd_sb;
__u16 bd_blkbits;
ext4_group_t bd_group;
- struct rw_semaphore *alloc_semp;
};
#define EXT4_MB_BITMAP(e4b) ((e4b)->bd_bitmap)
#define EXT4_MB_BUDDY(e4b) ((e4b)->bd_buddy)
--
1.7.0.4


2011-05-03 15:47:19

by Amir G.

Subject: Re: [PATCHSET v2] ext4: removal of alloc_sem locks from block allocation paths

Hi Ted,

Ping ^2 on these patches.
You have asked me for mballoc cleanup and I gave you mballoc cleanup...
Do you still have them or do you want me to resend?

Yongqiang is working on cleaning up the snapshot patches and merging
his extent split
refactoring patches with Allison's punch hole patches.

We would like to start rebasing our patches on top of your big alloc patches.
Are the patches in ext4-patch-queue in good shape for testing (with
BIG_ALLOC feature off)?

Thanks,
Amir.

On Thu, Mar 24, 2011 at 7:58 PM, <[email protected]> wrote:
> The purpose of this patch set is the removal of grp->alloc_sem locks
> from block allocation paths.
>
> The resulting code is cleaner and should perform better under workloads
> with concurrently allocating tasks.
>
> I ran several xfstests runs with these patches (4K and 1K block size).
> I tried several online resizes and verified that both in-core and on-disk
> group counters are correct.
>
> v2->v1:
> - fix silly bug in patch 4/5 that triggers BUG_ON(incore == NULL)
> - replace get_undo_access() with get_write_access()
> - ran xfstests with block size 1K (where 2 groups share a buddy page)
>
> [PATCH v2 1/5] ext4: move ext4_add_groupblocks() to mballoc.c
> [PATCH v2 2/5] ext4: implement ext4_add_groupblocks() by freeing blocks
> [PATCH v2 3/5] ext4: synchronize ext4_mb_init_group() with buddy page lock
> [PATCH v2 4/5] ext4: teach ext4_mb_init_cache() to skip uptodate buddy caches
> [PATCH v2 5/5] ext4: remove alloc_semp
>
>

2011-05-09 14:54:56

by Theodore Ts'o

Subject: Re: [PATCH v2 1/5] ext4: move ext4_add_groupblocks() to mballoc.c

On Thu, Mar 24, 2011 at 06:58:09PM +0200, [email protected] wrote:
> From: Amir Goldstein <[email protected]>
>
> In preparation for the next patch, the function ext4_add_groupblocks()
> is moved to mballoc.c, where it could use some static functions.
>
> I also fixed a checkpatch warning and replaced obsolete get_undo_access
> for bitmap with get_write_access.
>
> Signed-off-by: Amir Goldstein <[email protected]>

Please don't move code and make changes in one patch. #1, it's hard
to review changes that happened in the middle of code movement. #2,
if there have been any changes in the source function caused by other
patches, I can't regenerate a patch by simply redoing the function
move --- I have to reverse engineer the change that happened under the
cover of code movement, regenerate the patch, and then redo the change.

I've split this into two patches, one which is just a simple code
movement (note that I also moved the function declaration in ext4.h so
the function is listed under the correct .c file), and the second which
changed the use of ext4_journal_get_undo_access to
ext4_journal_get_write_access. Since this was also the last use of
ext4_journal_get_undo_access(), I also removed the now-unneeded code
in ext4_jbd2.[ch].

- Ted

2011-05-09 14:59:33

by Theodore Ts'o

Subject: [PATCH 2/2] ext4: remove unneeded ext4_journal_get_undo_access

The block allocation code used to use jbd2_journal_get_undo_access as
a way to make changes that wouldn't show up until the commit took
place. The new multi-block allocation code has its own way of
preventing newly freed blocks from getting reused until the commit
takes place (it avoids updating the buddy bitmaps until the commit is
done), so we don't need to use jbd2_journal_get_undo_access(), which
has extra overhead compared to jbd2_journal_get_write_access().

There was one last vestigial use of ext4_journal_get_undo_access() in
ext4_add_groupblocks(); change it to use ext4_journal_get_write_access()
and then remove the ext4_journal_get_undo_access() support.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4_jbd2.c | 14 --------------
fs/ext4/ext4_jbd2.h | 5 -----
fs/ext4/mballoc.c | 12 ++++--------
3 files changed, 4 insertions(+), 27 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 6e272ef..f5240aa 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -6,20 +6,6 @@

#include <trace/events/ext4.h>

-int __ext4_journal_get_undo_access(const char *where, unsigned int line,
- handle_t *handle, struct buffer_head *bh)
-{
- int err = 0;
-
- if (ext4_handle_valid(handle)) {
- err = jbd2_journal_get_undo_access(handle, bh);
- if (err)
- ext4_journal_abort_handle(where, line, __func__, bh,
- handle, err);
- }
- return err;
-}
-
int __ext4_journal_get_write_access(const char *where, unsigned int line,
handle_t *handle, struct buffer_head *bh)
{
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index d0f5353..bb85757 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -126,9 +126,6 @@ void ext4_journal_abort_handle(const char *caller, unsigned int line,
const char *err_fn,
struct buffer_head *bh, handle_t *handle, int err);

-int __ext4_journal_get_undo_access(const char *where, unsigned int line,
- handle_t *handle, struct buffer_head *bh);
-
int __ext4_journal_get_write_access(const char *where, unsigned int line,
handle_t *handle, struct buffer_head *bh);

@@ -146,8 +143,6 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line,
int __ext4_handle_dirty_super(const char *where, unsigned int line,
handle_t *handle, struct super_block *sb);

-#define ext4_journal_get_undo_access(handle, bh) \
- __ext4_journal_get_undo_access(__func__, __LINE__, (handle), (bh))
#define ext4_journal_get_write_access(handle, bh) \
__ext4_journal_get_write_access(__func__, __LINE__, (handle), (bh))
#define ext4_forget(handle, is_metadata, inode, bh, block_nr) \
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 92aa05d..307d447 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4732,9 +4732,9 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
* Check to see if we are freeing blocks across a group
* boundary.
*/
- if (bit + count > EXT4_BLOCKS_PER_GROUP(sb)) {
+ if (bit + count > EXT4_BLOCKS_PER_GROUP(sb))
goto error_return;
- }
+
bitmap_bh = ext4_read_block_bitmap(sb, block_group);
if (!bitmap_bh)
goto error_return;
@@ -4753,12 +4753,8 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
goto error_return;
}

- /*
- * We are about to add blocks to the bitmap,
- * so we need undo access.
- */
- BUFFER_TRACE(bitmap_bh, "getting undo access");
- err = ext4_journal_get_undo_access(handle, bitmap_bh);
+ BUFFER_TRACE(bitmap_bh, "getting write access");
+ err = ext4_journal_get_write_access(handle, bitmap_bh);
if (err)
goto error_return;

--
1.7.3.1


2011-05-09 14:59:33

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 1/2] ext4: move ext4_add_groupblocks() to mballoc.c

From: Amir Goldstein <[email protected]>

In preparation for the next patch, the function ext4_add_groupblocks()
is moved to mballoc.c, where it could use some static functions.

Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/balloc.c | 124 -----------------------------------------------------
fs/ext4/ext4.h | 4 +-
fs/ext4/mballoc.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 126 insertions(+), 126 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 9c6cd51..b2d10da 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -362,130 +362,6 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
}

/**
- * ext4_add_groupblocks() -- Add given blocks to an existing group
- * @handle: handle to this transaction
- * @sb: super block
- * @block: start physcial block to add to the block group
- * @count: number of blocks to free
- *
- * This marks the blocks as free in the bitmap. We ask the
- * mballoc to reload the buddy after this by setting group
- * EXT4_GROUP_INFO_NEED_INIT_BIT flag
- */
-void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
- ext4_fsblk_t block, unsigned long count)
-{
- struct buffer_head *bitmap_bh = NULL;
- struct buffer_head *gd_bh;
- ext4_group_t block_group;
- ext4_grpblk_t bit;
- unsigned int i;
- struct ext4_group_desc *desc;
- struct ext4_sb_info *sbi = EXT4_SB(sb);
- int err = 0, ret, blk_free_count;
- ext4_grpblk_t blocks_freed;
- struct ext4_group_info *grp;
-
- ext4_debug("Adding block(s) %llu-%llu\n", block, block + count - 1);
-
- ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
- grp = ext4_get_group_info(sb, block_group);
- /*
- * Check to see if we are freeing blocks across a group
- * boundary.
- */
- if (bit + count > EXT4_BLOCKS_PER_GROUP(sb)) {
- goto error_return;
- }
- bitmap_bh = ext4_read_block_bitmap(sb, block_group);
- if (!bitmap_bh)
- goto error_return;
- desc = ext4_get_group_desc(sb, block_group, &gd_bh);
- if (!desc)
- goto error_return;
-
- if (in_range(ext4_block_bitmap(sb, desc), block, count) ||
- in_range(ext4_inode_bitmap(sb, desc), block, count) ||
- in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) ||
- in_range(block + count - 1, ext4_inode_table(sb, desc),
- sbi->s_itb_per_group)) {
- ext4_error(sb, "Adding blocks in system zones - "
- "Block = %llu, count = %lu",
- block, count);
- goto error_return;
- }
-
- /*
- * We are about to add blocks to the bitmap,
- * so we need undo access.
- */
- BUFFER_TRACE(bitmap_bh, "getting undo access");
- err = ext4_journal_get_undo_access(handle, bitmap_bh);
- if (err)
- goto error_return;
-
- /*
- * We are about to modify some metadata. Call the journal APIs
- * to unshare ->b_data if a currently-committing transaction is
- * using it
- */
- BUFFER_TRACE(gd_bh, "get_write_access");
- err = ext4_journal_get_write_access(handle, gd_bh);
- if (err)
- goto error_return;
- /*
- * make sure we don't allow a parallel init on other groups in the
- * same buddy cache
- */
- down_write(&grp->alloc_sem);
- for (i = 0, blocks_freed = 0; i < count; i++) {
- BUFFER_TRACE(bitmap_bh, "clear bit");
- if (!ext4_clear_bit_atomic(ext4_group_lock_ptr(sb, block_group),
- bit + i, bitmap_bh->b_data)) {
- ext4_error(sb, "bit already cleared for block %llu",
- (ext4_fsblk_t)(block + i));
- BUFFER_TRACE(bitmap_bh, "bit already cleared");
- } else {
- blocks_freed++;
- }
- }
- ext4_lock_group(sb, block_group);
- blk_free_count = blocks_freed + ext4_free_blks_count(sb, desc);
- ext4_free_blks_set(sb, desc, blk_free_count);
- desc->bg_checksum = ext4_group_desc_csum(sbi, block_group, desc);
- ext4_unlock_group(sb, block_group);
- percpu_counter_add(&sbi->s_freeblocks_counter, blocks_freed);
-
- if (sbi->s_log_groups_per_flex) {
- ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
- atomic_add(blocks_freed,
- &sbi->s_flex_groups[flex_group].free_blocks);
- }
- /*
- * request to reload the buddy with the
- * new bitmap information
- */
- set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &(grp->bb_state));
- grp->bb_free += blocks_freed;
- up_write(&grp->alloc_sem);
-
- /* We dirtied the bitmap block */
- BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
- err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
-
- /* And the group descriptor block */
- BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
- ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
- if (!err)
- err = ret;
-
-error_return:
- brelse(bitmap_bh);
- ext4_std_error(sb, err);
- return;
-}
-
-/**
* ext4_has_free_blocks()
* @sbi: in-core super block structure.
* @nblocks: number of needed blocks
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 076c5d2..47986ae 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1655,8 +1655,6 @@ extern unsigned long ext4_bg_num_gdb(struct super_block *sb,
extern ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, struct inode *inode,
ext4_fsblk_t goal, unsigned long *count, int *errp);
extern int ext4_claim_free_blocks(struct ext4_sb_info *sbi, s64 nblocks);
-extern void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
- ext4_fsblk_t block, unsigned long count);
extern ext4_fsblk_t ext4_count_free_blocks(struct super_block *);
extern void ext4_check_blocks_bitmap(struct super_block *);
extern struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
@@ -1721,6 +1719,8 @@ extern void ext4_free_blocks(handle_t *handle, struct inode *inode,
unsigned long count, int flags);
extern int ext4_mb_add_groupinfo(struct super_block *sb,
ext4_group_t i, struct ext4_group_desc *desc);
+extern void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
+ ext4_fsblk_t block, unsigned long count);
extern int ext4_trim_fs(struct super_block *, struct fstrim_range *);

/* inode.c */
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 730c1a7..92aa05d 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4700,6 +4700,130 @@ error_return:
}

/**
+ * ext4_add_groupblocks() -- Add given blocks to an existing group
+ * @handle: handle to this transaction
+ * @sb: super block
+ * @block: start physcial block to add to the block group
+ * @count: number of blocks to free
+ *
+ * This marks the blocks as free in the bitmap. We ask the
+ * mballoc to reload the buddy after this by setting group
+ * EXT4_GROUP_INFO_NEED_INIT_BIT flag
+ */
+void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
+ ext4_fsblk_t block, unsigned long count)
+{
+ struct buffer_head *bitmap_bh = NULL;
+ struct buffer_head *gd_bh;
+ ext4_group_t block_group;
+ ext4_grpblk_t bit;
+ unsigned int i;
+ struct ext4_group_desc *desc;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ int err = 0, ret, blk_free_count;
+ ext4_grpblk_t blocks_freed;
+ struct ext4_group_info *grp;
+
+ ext4_debug("Adding block(s) %llu-%llu\n", block, block + count - 1);
+
+ ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
+ grp = ext4_get_group_info(sb, block_group);
+ /*
+ * Check to see if we are freeing blocks across a group
+ * boundary.
+ */
+ if (bit + count > EXT4_BLOCKS_PER_GROUP(sb)) {
+ goto error_return;
+ }
+ bitmap_bh = ext4_read_block_bitmap(sb, block_group);
+ if (!bitmap_bh)
+ goto error_return;
+ desc = ext4_get_group_desc(sb, block_group, &gd_bh);
+ if (!desc)
+ goto error_return;
+
+ if (in_range(ext4_block_bitmap(sb, desc), block, count) ||
+ in_range(ext4_inode_bitmap(sb, desc), block, count) ||
+ in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) ||
+ in_range(block + count - 1, ext4_inode_table(sb, desc),
+ sbi->s_itb_per_group)) {
+ ext4_error(sb, "Adding blocks in system zones - "
+ "Block = %llu, count = %lu",
+ block, count);
+ goto error_return;
+ }
+
+ /*
+ * We are about to add blocks to the bitmap,
+ * so we need undo access.
+ */
+ BUFFER_TRACE(bitmap_bh, "getting undo access");
+ err = ext4_journal_get_undo_access(handle, bitmap_bh);
+ if (err)
+ goto error_return;
+
+ /*
+ * We are about to modify some metadata. Call the journal APIs
+ * to unshare ->b_data if a currently-committing transaction is
+ * using it
+ */
+ BUFFER_TRACE(gd_bh, "get_write_access");
+ err = ext4_journal_get_write_access(handle, gd_bh);
+ if (err)
+ goto error_return;
+ /*
+ * make sure we don't allow a parallel init on other groups in the
+ * same buddy cache
+ */
+ down_write(&grp->alloc_sem);
+ for (i = 0, blocks_freed = 0; i < count; i++) {
+ BUFFER_TRACE(bitmap_bh, "clear bit");
+ if (!ext4_clear_bit_atomic(ext4_group_lock_ptr(sb, block_group),
+ bit + i, bitmap_bh->b_data)) {
+ ext4_error(sb, "bit already cleared for block %llu",
+ (ext4_fsblk_t)(block + i));
+ BUFFER_TRACE(bitmap_bh, "bit already cleared");
+ } else {
+ blocks_freed++;
+ }
+ }
+ ext4_lock_group(sb, block_group);
+ blk_free_count = blocks_freed + ext4_free_blks_count(sb, desc);
+ ext4_free_blks_set(sb, desc, blk_free_count);
+ desc->bg_checksum = ext4_group_desc_csum(sbi, block_group, desc);
+ ext4_unlock_group(sb, block_group);
+ percpu_counter_add(&sbi->s_freeblocks_counter, blocks_freed);
+
+ if (sbi->s_log_groups_per_flex) {
+ ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
+ atomic_add(blocks_freed,
+ &sbi->s_flex_groups[flex_group].free_blocks);
+ }
+ /*
+ * request to reload the buddy with the
+ * new bitmap information
+ */
+ set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &(grp->bb_state));
+ grp->bb_free += blocks_freed;
+ up_write(&grp->alloc_sem);
+
+ /* We dirtied the bitmap block */
+ BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
+ err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
+
+ /* And the group descriptor block */
+ BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
+ ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
+ if (!err)
+ err = ret;
+
+error_return:
+ brelse(bitmap_bh);
+ ext4_std_error(sb, err);
+ return;
+}
+
+/**
* ext4_trim_extent -- function to TRIM one single free extent in the group
* @sb: super block for the file system
* @start: starting block of the free extent in the alloc. group
--
1.7.3.1


2011-05-09 16:01:24

by Amir G.

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] ext4: move ext4_add_groupblocks() to mballoc.c

On Mon, May 9, 2011 at 5:54 PM, Ted Ts'o <[email protected]> wrote:
> On Thu, Mar 24, 2011 at 06:58:09PM +0200, [email protected] wrote:
>> From: Amir Goldstein <[email protected]>
>>
>> In preparation for the next patch, the function ext4_add_groupblocks()
>> is moved to mballoc.c, where it could use some static functions.
>>
>> I also fixed a checkpatch warning and replaced obsolete get_undo_access
>> for bitmap with get_write_access.
>>
>> Signed-off-by: Amir Goldstein <[email protected]>
>
> Please don't move code and make changes in one patch. #1, it's hard
> to review changes that happened in the middle of code movement. #2,
> if there have been any changes in the source function caused by other
> patches, I can't regenerate a patch by simply redoing the function
> move --- I have to reverse engineer the change that happened under the
> cover of code movement, regenerate the patch, and then redo the change.
>

Sorry for the trouble. At least I (sort of) documented the change in the
commit description, so I hope it wasn't too hard to reverse engineer...
Fixing the checkpatch error was something I just felt obligated to do;
changing get_undo_access to get_write_access in this patch was just me
being lazy.


> I've split this into two patches, one which is just a simple code
> movement (note that I also moved the function declaration in ext4.h so
> the function is listed under the correct .c file), and the second which
> changed the use of ext4_journal_get_undo_access to
> ext4_journal_get_write_access. Since this was also the last use of
> ext4_journal_get_undo_access(), I also removed the now-unneeded code
> in ext4_jbd2.[ch].
>

Thanks. FYI, in one of the snapshot patches this get_write_access instance is
replaced with get_bitmap_access (which calls a different snapshot hook).
That patch also removes the get_undo_access function, but now you beat
me to it :-)

FYI2, the snapshot patches are waiting in my outbox for me to push send.
When running xfstests I caught a hang in test 225 with 1K blocksize
(all other tests were fine), so I asked Yongqiang to take a look at it,
because his patch (6d9c85) had fixed a problem in test 225. He said that
the hang was caused by a bug in his patch, that it also happens with the
tytso/master branch, and that he is working on a fix, so I may just go
ahead and send out the snapshot patches anyway.



>                                               - Ted
>

2011-05-09 16:28:23

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] ext4: move ext4_add_groupblocks() to mballoc.c

On Tue, May 10, 2011 at 12:01 AM, Amir G.
<[email protected]> wrote:
> On Mon, May 9, 2011 at 5:54 PM, Ted Ts'o <[email protected]> wrote:
>> On Thu, Mar 24, 2011 at 06:58:09PM +0200, [email protected] wrote:
>>> From: Amir Goldstein <[email protected]>
>>>
>>> In preparation for the next patch, the function ext4_add_groupblocks()
>>> is moved to mballoc.c, where it could use some static functions.
>>>
>>> I also fixed a checkpatch warning and replaced obsolete get_undo_access
>>> for bitmap with get_write_access.
>>>
>>> Signed-off-by: Amir Goldstein <[email protected]>
>>
>> Please don't move code and make changes in one patch. #1, it's hard
>> to review changes that happened in the middle of code movement. #2,
>> if there have been any changes in the source function caused by other
>> patches, I can't regenerate a patch by simply redoing the function
>> move --- I have to reverse engineer the change that happened under the
>> cover of code movement, regenerate the patch, and then redo the change.
>>
>
> Sorry for the trouble. At least I (sort of) documented the change in the commit
> description, so I hope it wasn't too hard to reverse engineer...
> Fixing the checkpatch error I just kind of felt obligated to do, changing
> get_undo_access to get_write_access in this patch was just me being lazy.
>
>
>> I've split this into two patches, one which is just a simple code
>> movement (note that I also moved the function declaration in ext4.h so
>> the function is listed under the correct .c file), and the second which
>> changed the use of ext4_journal_get_undo_access to
>> ext4_journal_get_write_access. Since this was also the last use of
>> ext4_journal_get_undo_access(), I also removed the now-unneeded code
>> in ext4_jbd2.[ch].
>>
>
> Thanks. FYI, in one of the snapshot patches this get_write_access instance is
> replaced with get_bitmap_access (which calls a different snapshot hook).
> That patch also removes the get_undo_access function, but now you beat
> me to it :-)
>
> FYI2, the snapshot patches are waiting in my outbox for me to push send.
> When running xfstests I caught a hang in test 225 with 1K blocksize
> (all other tests were fine), so I asked Yongqiang to take a look at it,
> because his patch (6d9c85) had fixed a problem in test 225. He said that
> the hang was caused by a bug in his patch, that it also happens with the
> tytso/master branch, and that he is working on a fix, so I may just go
> ahead and send out the snapshot patches anyway.

Fixed. You can send snapshot patches out.
>
>
>
>>                                               - Ted
>>
>



--
Best Wishes
Yongqiang Yang

2011-05-10 13:29:14

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] ext4: remove alloc_semp

On Thu, Mar 24, 2011 at 06:58:13PM +0200, [email protected] wrote:
> From: Amir Goldstein <[email protected]>
>
> After taking care of all group init races, all that remains is to
> remove alloc_semp from ext4_allocation_context and ext4_buddy structs.
>
> Signed-off-by: Amir Goldstein <[email protected]>

Hey Amir,

What sort of testing have you done with this patch series? In
particular, have you tried doing online resizes while the file system
is heavily loaded?

- Ted

2011-05-10 13:49:12

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCHSET v2] ext4: removal of alloc_sem locks from block allocation paths

On Thu, 24 Mar 2011, [email protected] wrote:

> The purpose of this patch set is the removal of grp->alloc_sem locks
> from block allocation paths.
>
> The resulting code is cleaner and should perform better in concurrent
> allocating tasks workloads.
Hi Amir,

Do you have any performance numbers showing a performance improvement
in concurrent allocations? The only place where I can see the write
semaphore being taken is in the filesystem resize code. Or am I missing
something?

Thanks!
-Lukas

>
> I ran several xfstests runs with these patches (4K and 1K block size).
> I tried several online resizes and verifyed that both in-core and on-disk
> group counters are correct.
>
> v2->v1:
> - fix silly bug in patch 4/5 that triggers BUG_ON(incore == NULL)
> - replace get_undo_access() with get_write_access()
> - ran xfstests with block size 1K (where 2 groups share a buddy page)
>
> [PATCH v2 1/5] ext4: move ext4_add_groupblocks() to mballoc.c
> [PATCH v2 2/5] ext4: implement ext4_add_groupblocks() by freeing blocks
> [PATCH v2 3/5] ext4: synchronize ext4_mb_init_group() with buddy page lock
> [PATCH v2 4/5] ext4: teach ext4_mb_init_cache() to skip uptodate buddy caches
> [PATCH v2 5/5] ext4: remove alloc_semp
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--

2011-05-10 18:20:59

by Amir G.

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] ext4: remove alloc_semp

On Tue, May 10, 2011 at 4:29 PM, Ted Ts'o <[email protected]> wrote:
> On Thu, Mar 24, 2011 at 06:58:13PM +0200, [email protected] wrote:
>> From: Amir Goldstein <[email protected]>
>>
>> After taking care of all group init races, all that remains is to
>> remove alloc_semp from ext4_allocation_context and ext4_buddy structs.
>>
>> Signed-off-by: Amir Goldstein <[email protected]>
>
> Hey Amir,
>
> What sort of testing have you done with this patch series? In
> particular, have you tried doing online resizes while the file system
> is heavily loaded?

I have tested many small resizes, not block-group aligned, and added
debug prints in mballoc.c to verify that the in-memory bb_free and the
on-disk group descriptor's bg_free_blocks counters remain in sync.

Sorry, I did not run the same tests under heavy load.
I would think that one would need to know exactly how to load the
system to make that load relevant to this test, rather than blindly
fsstressing the system.
If you have specific ideas for testing, please let us know.
I think Yongqiang could run these tests.

Regarding the "risk" of online resize with this patch:
if you look at ext4_add_groupblocks() in its new form,
you will surely notice that it is an exact replica of ext4_free_blocks()
with lots of irrelevant code removed, so the code path of online resize
is essentially the same as the code path of freeing any range of blocks
with regard to buddy cache init and locks.

It was actually possible to alter ext4_free_blocks() a bit and reuse it
to do most of the work done by ext4_add_groupblocks(), but I did not
want to make these changes to ext4_free_blocks(). It's your call if you
think that would be better.

Anyway, I was kind of hoping for another pair of eyes to validate my changes
when I first posted these patches. So what do you say?
Does the core change to synchronize ext4_mb_init_group() by locking
the buddy pages look reasonable to you?
You were the one who suggested that approach in the first place.

Amir.

2011-05-10 18:30:17

by Amir G.

[permalink] [raw]
Subject: Re: [PATCHSET v2] ext4: removal of alloc_sem locks from block allocation paths

On Tue, May 10, 2011 at 4:48 PM, Lukas Czerner <[email protected]> wrote:
> On Thu, 24 Mar 2011, [email protected] wrote:
>
>> The purpose of this patch set is the removal of grp->alloc_sem locks
>> from block allocation paths.
>>
>> The resulting code is cleaner and should perform better in concurrent
>> allocating tasks workloads.
> Hi Amir,
>
> Do you have any performance numbers indicating performance improvement
> in concurrent allocations ? The only point where I can see taking
> write semaphore is in filesystem resize code. Or am I missing something ?

Yes, you are. down_write is also taken when initializing a block group's
buddy cache for the first time (when the NEED_INIT flag is set).
Anyway, I do NOT have any performance numbers, since that wasn't the
purpose of this work. This work was done for snapshots, but I do think
that, as I wrote:
1. The resulting code is cleaner.
2. Getting rid of an unneeded semaphore in the allocation path can only
help performance, but I certainly don't have the kind of high-scalability
testing setup to show the performance improvements, if there are any.

>
> Thanks!
> -Lukas
>
>>
>> I ran several xfstests runs with these patches (4K and 1K block size).
>> I tried several online resizes and verifyed that both in-core and on-disk
>> group counters are correct.
>>
>> v2->v1:
>> - fix silly bug in patch 4/5 that triggers BUG_ON(incore == NULL)
>> - replace get_undo_access() with get_write_access()
>> - ran xfstests with block size 1K (where 2 groups share a buddy page)
>>
>> [PATCH v2 1/5] ext4: move ext4_add_groupblocks() to mballoc.c
>> [PATCH v2 2/5] ext4: implement ext4_add_groupblocks() by freeing blocks
>> [PATCH v2 3/5] ext4: synchronize ext4_mb_init_group() with buddy page lock
>> [PATCH v2 4/5] ext4: teach ext4_mb_init_cache() to skip uptodate buddy caches
>> [PATCH v2 5/5] ext4: remove alloc_semp
>>
>
> --
>

2011-05-10 18:43:57

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCHSET v2] ext4: removal of alloc_sem locks from block allocation paths

On Tue, 10 May 2011, Amir G. wrote:

> On Tue, May 10, 2011 at 4:48 PM, Lukas Czerner <[email protected]> wrote:
> > On Thu, 24 Mar 2011, [email protected] wrote:
> >
> >> The purpose of this patch set is the removal of grp->alloc_sem locks
> >> from block allocation paths.
> >>
> >> The resulting code is cleaner and should perform better in concurrent
> >> allocating tasks workloads.
> > Hi Amir,
> >
> > Do you have any performance numbers indicating performance improvement
> > in concurrent allocations ? The only point where I can see taking
> > write semaphore is in filesystem resize code. Or am I missing something ?
>
> Yes, you are. down_write is also taken when initializing a block group
> buddy cache for the first time (NEED_INIT flag is set).
> Anyway, I do NOT have any performance number since this wasn't the purpose of
> this work. This work was done for snapshots, but I do think that as I wrote:
> 1. The resulting code is cleaner
> 2. Getting rid of an unneeded semaphore in allocation path can only do good
> to performance, but I certainly don't have the kind of high scalability testing
> setup to show the performance improvements if there are any.

Well, everyone who wants to use that group has to wait for this
initialization anyway, right?

Of course I know that the purpose of this was to ease your snapshot
work, but I do not see that stated anywhere in the description ;).
Anyway, I was just curious what impact this has on performance, since
you mentioned it; if you do not have any numbers, I am OK with that.

Btw, the patches look good to me, but I have not done a _very_ deep review.

Thanks!
-Lukas

>
> >
> > Thanks!
> > -Lukas
> >
> >>
> >> I ran several xfstests runs with these patches (4K and 1K block size).
> >> I tried several online resizes and verifyed that both in-core and on-disk
> >> group counters are correct.
> >>
> >> v2->v1:
> >> - fix silly bug in patch 4/5 that triggers BUG_ON(incore == NULL)
> >> - replace get_undo_access() with get_write_access()
> >> - ran xfstests with block size 1K (where 2 groups share a buddy page)
> >>
> >> [PATCH v2 1/5] ext4: move ext4_add_groupblocks() to mballoc.c
> >> [PATCH v2 2/5] ext4: implement ext4_add_groupblocks() by freeing blocks
> >> [PATCH v2 3/5] ext4: synchronize ext4_mb_init_group() with buddy page lock
> >> [PATCH v2 4/5] ext4: teach ext4_mb_init_cache() to skip uptodate buddy caches
> >> [PATCH v2 5/5] ext4: remove alloc_semp
> >>
>

2011-05-10 18:58:27

by Amir G.

[permalink] [raw]
Subject: Re: [PATCHSET v2] ext4: removal of alloc_sem locks from block allocation paths

On Tue, May 10, 2011 at 9:43 PM, Lukas Czerner <[email protected]> wrote:
> On Tue, 10 May 2011, Amir G. wrote:
>
>> On Tue, May 10, 2011 at 4:48 PM, Lukas Czerner <[email protected]> wrote:
>> > On Thu, 24 Mar 2011, [email protected] wrote:
>> >
>> >> The purpose of this patch set is the removal of grp->alloc_sem locks
>> >> from block allocation paths.
>> >>
>> >> The resulting code is cleaner and should perform better in concurrent
>> >> allocating tasks workloads.
>> > Hi Amir,
>> >
>> > Do you have any performance numbers indicating performance improvement
>> > in concurrent allocations ? The only point where I can see taking
>> > write semaphore is in filesystem resize code. Or am I missing something ?
>>
>> Yes, you are. down_write is also taken when initializing a block group
>> buddy cache for the first time (NEED_INIT flag is set).
>> Anyway, I do NOT have any performance number since this wasn't the purpose of
>> this work. This work was done for snapshots, but I do think that as I wrote:
>> 1. The resulting code is cleaner
>> 2. Getting rid of an unneeded semaphore in allocation path can only do good
>> to performance, but I certainly don't have the kind of high scalability testing
>> setup to show the performance improvements if there are any.
>
> Well everyone who wants to use that group has to wait for this
> initialization anyway, right ?

Definitely right, but...
all the users that allocate blocks in any group, whether initialized or
not, needed to take the read side of the semaphore. That was totally
unneeded, especially in the common case of blocksize == pagesize.
Now, I don't know the overhead of taking down_read, but I am sure there
is some, which I will not be able to demonstrate on my system.

>
> Of course I know that the purpose of this was to ease your snapshot
> work, but I do not see that stated anywhere in the description ;).

As a matter of fact, I had a much smaller patch which bypassed taking
the semaphore when blocksize == pagesize. Since snapshots only work
with blocksize == pagesize, that was enough for me, but Ted requested
that I look into getting rid of alloc_semp altogether, so I did :-)

> Anyway I was just curious what impact does this have on the
> performance since you've mentioned that, if you do not have any I am ok
> with that.
>
> Btw, patches looks good to me, but I have not done _very_ deep review.
>
> Thanks!
> -Lukas
>
>>
>> >
>> > Thanks!
>> > -Lukas
>> >
>> >>
>> >> I ran several xfstests runs with these patches (4K and 1K block size).
>> >> I tried several online resizes and verifyed that both in-core and on-disk
>> >> group counters are correct.
>> >>
>> >> v2->v1:
>> >> - fix silly bug in patch 4/5 that triggers BUG_ON(incore == NULL)
>> >> - replace get_undo_access() with get_write_access()
>> >> - ran xfstests with block size 1K (where 2 groups share a buddy page)
>> >>
>> >> [PATCH v2 1/5] ext4: move ext4_add_groupblocks() to mballoc.c
>> >> [PATCH v2 2/5] ext4: implement ext4_add_groupblocks() by freeing blocks
>> >> [PATCH v2 3/5] ext4: synchronize ext4_mb_init_group() with buddy page lock
>> >> [PATCH v2 4/5] ext4: teach ext4_mb_init_cache() to skip uptodate buddy caches
>> >> [PATCH v2 5/5] ext4: remove alloc_semp
>> >>