Hi!
Here is a small patch that implements concurrent block allocation
for ext3. It removes lock_super() in ext3_new_block() and ext3_free_blocks().
Modifications of the counters in the superblock and group descriptors are
protected by a spinlock. Tested on SMP for several hours.
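(The s_alloc_lock spinlock itself lives in the in-core superblock info and is
initialized at mount time; that part is not in the balloc.c hunk below, it is
roughly just the following sketch -- exact placement may differ:)

	/* include/linux/ext3_fs_sb.h */
	struct ext3_sb_info {
		/* ... existing fields ... */
		spinlock_t s_alloc_lock;	/* guards the free-blocks counters */
	};

	/* fs/ext3/super.c, ext3_fill_super() */
	spin_lock_init(&EXT3_SB(sb)->s_alloc_lock);
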
--- linux/fs/ext3/balloc.c Thu Feb 20 16:19:06 2003
+++ balloc.c Mon Mar 10 16:00:49 2003
@@ -118,7 +118,6 @@
printk ("ext3_free_blocks: nonexistent device");
return;
}
- lock_super (sb);
es = EXT3_SB(sb)->s_es;
if (block < le32_to_cpu(es->s_first_data_block) ||
block + count < block ||
@@ -214,11 +213,13 @@
block + i);
BUFFER_TRACE(bitmap_bh, "bit already cleared");
} else {
+ spin_lock(&EXT3_SB(sb)->s_alloc_lock);
dquot_freed_blocks++;
gdp->bg_free_blocks_count =
cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count)+1);
es->s_free_blocks_count =
cpu_to_le32(le32_to_cpu(es->s_free_blocks_count)+1);
+ spin_unlock(&EXT3_SB(sb)->s_alloc_lock);
}
/* @@@ This prevents newly-allocated data from being
* freed and then reallocated within the same
@@ -267,7 +268,6 @@
error_return:
brelse(bitmap_bh);
ext3_std_error(sb, err);
- unlock_super(sb);
if (dquot_freed_blocks)
DQUOT_FREE_BLOCK(inode, dquot_freed_blocks);
return;
@@ -408,7 +408,6 @@
return 0;
}
- lock_super(sb);
es = EXT3_SB(sb)->s_es;
if (le32_to_cpu(es->s_free_blocks_count) <=
le32_to_cpu(es->s_r_blocks_count) &&
@@ -461,6 +460,7 @@
ext3_debug("Bit not found in block group %d.\n", group_no);
+repeat:
/*
* Now search the rest of the groups. We assume that
* i and gdp correctly point to the last group visited.
@@ -538,9 +538,9 @@
/* The superblock lock should guard against anybody else beating
* us to this point! */
- J_ASSERT_BH(bitmap_bh, !ext3_test_bit(ret_block, bitmap_bh->b_data));
BUFFER_TRACE(bitmap_bh, "setting bitmap bit");
- ext3_set_bit(ret_block, bitmap_bh->b_data);
+ if (ext3_set_bit(ret_block, bitmap_bh->b_data))
+ goto repeat;
performed_allocation = 1;
#ifdef CONFIG_JBD_DEBUG
@@ -586,11 +586,13 @@
ext3_debug("allocating block %d. Goal hits %d of %d.\n",
ret_block, goal_hits, goal_attempts);
+ spin_lock(&EXT3_SB(sb)->s_alloc_lock);
gdp->bg_free_blocks_count =
cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count) - 1);
es->s_free_blocks_count =
cpu_to_le32(le32_to_cpu(es->s_free_blocks_count) - 1);
-
+ spin_unlock(&EXT3_SB(sb)->s_alloc_lock);
+
BUFFER_TRACE(gdp_bh, "journal_dirty_metadata for group descriptor");
err = ext3_journal_dirty_metadata(handle, gdp_bh);
if (!fatal)
@@ -606,7 +608,6 @@
if (fatal)
goto out;
- unlock_super(sb);
*errp = 0;
brelse(bitmap_bh);
return ret_block;
@@ -618,7 +619,6 @@
*errp = fatal;
ext3_std_error(sb, fatal);
}
- unlock_super(sb);
/*
* Undo the block allocation
*/
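To spell out what happens without lock_super(): if two CPUs race for the same
bitmap bit, the loser sees that from the return value of ext3_set_bit() and
jumps back to the repeat: label to look for another free block, while the
counter updates are serialized by s_alloc_lock. Condensed (error handling and
journaling calls omitted, and assuming ext3_set_bit() acts as a test-and-set
that returns the old bit value), the ext3_new_block() claim path is now
roughly:

	repeat:
		/* scan this and the following groups for a candidate bit, as before */
		...
		BUFFER_TRACE(bitmap_bh, "setting bitmap bit");
		if (ext3_set_bit(ret_block, bitmap_bh->b_data))
			goto repeat;	/* lost the race: somebody claimed it first */
		performed_allocation = 1;
		...
		spin_lock(&EXT3_SB(sb)->s_alloc_lock);
		gdp->bg_free_blocks_count =
			cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count) - 1);
		es->s_free_blocks_count =
			cpu_to_le32(le32_to_cpu(es->s_free_blocks_count) - 1);
		spin_unlock(&EXT3_SB(sb)->s_alloc_lock);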
On Mar 10, 2003 18:41 +0300, Alex Tomas wrote:
> Here is a small patch that implements concurrent block allocation
> for ext3. It removes lock_super() in ext3_new_block() and ext3_free_blocks().
> Modifications of the counters in the superblock and group descriptors are
> protected by a spinlock. Tested on SMP for several hours.
Any idea how much this improves performance? What sort of tests were
you running? We could improve things a bit further by having separate
per-group locks for updating the group descriptor info, and updating the
superblock only lazily at statfs and unmount time (with a suitable feature
flag so e2fsck can fix it up at recovery time), but you seem to have gotten
the majority of the parallelism from this fix.
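Very roughly, the per-group variant would just trade the single lock for an
array indexed by group (the s_bg_locks name below is only illustrative,
nothing like it exists in the tree today) and stop touching
es->s_free_blocks_count in the fast path:

	/* sketch: one spinlock per block group instead of one per filesystem */
	struct ext3_sb_info {
		/* ... */
		spinlock_t *s_bg_locks;		/* one per group, allocated at mount */
	};

	spin_lock(&EXT3_SB(sb)->s_bg_locks[group_no]);
	gdp->bg_free_blocks_count =
		cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count) - 1);
	spin_unlock(&EXT3_SB(sb)->s_bg_locks[group_no]);
	/* es->s_free_blocks_count recomputed from the group descriptors at
	 * statfs/unmount time; needs a feature flag so e2fsck can fix it up */
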
> @@ -214,11 +213,13 @@
> block + i);
> BUFFER_TRACE(bitmap_bh, "bit already cleared");
> } else {
> + spin_lock(&EXT3_SB(sb)->s_alloc_lock);
> dquot_freed_blocks++;
> gdp->bg_free_blocks_count =
> cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count)+1);
> es->s_free_blocks_count =
> cpu_to_le32(le32_to_cpu(es->s_free_blocks_count)+1);
> + spin_unlock(&EXT3_SB(sb)->s_alloc_lock);
One minor nit is that you left an ext3_error() for the "bit already cleared"
case just above this patch hunk.
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
On Mon 10 Mar 03 17:25, Andreas Dilger wrote:
> One minor nit is that you left an ext3_error() for the "bit already
> cleared" case just above this patch hunk.
But that one belongs there, because no two threads should be trying to free
the same block at the same time.
Regards,
Daniel
>>>>> Andreas Dilger (AD) writes:
AD> Any idea how much this improves performance? What sort of tests
AD> were you running? We could improve things a bit further by having
AD> separate per-group locks for updating the group descriptor info,
AD> and updating the superblock only lazily at statfs and unmount time
AD> (with a suitable feature flag so e2fsck can fix it up at recovery
AD> time), but you seem to have gotten the majority of the parallelism
AD> from this fix.
I'm trying to measure the improvement.
The tests were:
1) on a big fs (1GB):
   lots of processes (up to 50) creating and removing directories and files, plus
   untarring a kernel tree and make -j4 bzImage, plus
   dd if=/dev/zero of=/mnt/dump.file bs=1M count=8000; rm -f /mnt/dump.file
2) on a small fs (64MB):
   20 processes creating and removing lots of files and directories
In fact, I caught dozens of debug messages about set_bit collisions. Then
I fsck'ed the fs to be sure all is OK.
>> @@ -214,11 +213,13 @@
>> block + i);
>> BUFFER_TRACE(bitmap_bh, "bit already cleared");
>> } else {
>> + spin_lock(&EXT3_SB(sb)->s_alloc_lock);
>> dquot_freed_blocks++;
>> gdp->bg_free_blocks_count =
>> cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count)+1);
>> es->s_free_blocks_count =
>> cpu_to_le32(le32_to_cpu(es->s_free_blocks_count)+1);
>> + spin_unlock(&EXT3_SB(sb)->s_alloc_lock);
AD> One minor nit is that you left an ext3_error() for the "bit
AD> already cleared" case just above this patch hunk.
Hmm, what's wrong with it?
with best regards, Alex
SDET on my machine (16x NUMA-Q) has fallen in love with your patch,
and has decided to elope with it to a small desert island. This is
despite its one disk hung off node 0, and the IO throughput of a
slightly damp piece of cotton thread. Apologies for the loss of your
patch as it gets whisked away ;-)
M.
PS. Oh, I had this bit in, per akpm's instructions: for best results, add ____cacheline_aligned_in_smp to struct ext2_bg_info (see the sketch below).
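(That is just padding the per-group info out to a cacheline so the counters
and locks of neighbouring groups don't false-share; the actual ext2_bg_info
fields in the -mjb tree may differ, this is only a sketch:)

	/* sketch: keep each block group's info on its own cacheline on SMP */
	struct ext2_bg_info {
		u32 free_blocks_count;		/* illustrative fields only */
		spinlock_t balloc_lock;
	} ____cacheline_aligned_in_smp;
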
PPS. I'll try to run some more focused tests with aim7 over the weekend.
As if we needed it ...
-------------------------
DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This
benchmarking was performed for research purposes only, and the run results
are non-compliant and not-comparable with any published results.
Results are shown as percentages of the first set displayed
SDET 1 (see disclaimer)
                        Throughput    Std. Dev
    2.5.64-bk3-mjb3         100.0%        1.8%
    2.5.64-mjb3-ext2        102.0%        1.1%

SDET 2 (see disclaimer)
                        Throughput    Std. Dev
    2.5.64-bk3-mjb3         100.0%        3.7%
    2.5.64-mjb3-ext2        106.1%        3.1%

SDET 4 (see disclaimer)
                        Throughput    Std. Dev
    2.5.64-bk3-mjb3         100.0%        1.5%
    2.5.64-mjb3-ext2        101.1%        2.1%

SDET 8 (see disclaimer)
                        Throughput    Std. Dev
    2.5.64-bk3-mjb3         100.0%        0.2%
    2.5.64-mjb3-ext2        113.3%        0.7%

SDET 16 (see disclaimer)
                        Throughput    Std. Dev
    2.5.64-bk3-mjb3         100.0%        1.1%
    2.5.64-mjb3-ext2        167.1%        0.8%

SDET 32 (see disclaimer)
                        Throughput    Std. Dev
    2.5.64-bk3-mjb3         100.0%        0.9%
    2.5.64-mjb3-ext2        170.7%        0.1%

SDET 64 (see disclaimer)
                        Throughput    Std. Dev
    2.5.64-bk3-mjb3         100.0%        0.7%
    2.5.64-mjb3-ext2        157.2%        0.5%

SDET 128 (see disclaimer)
                        Throughput    Std. Dev
    2.5.64-bk3-mjb3         100.0%        0.3%
    2.5.64-mjb3-ext2        151.3%        0.8%
> SDET on my machine (16x NUMA-Q) has fallen in love with your patch,
> and has decided to elope with it to a small desert island. This is
> despite its one disk hung off node 0, and the IO throughput of a
> slightly damp piece of cotton thread. Apologies for the loss of your
> patch as it gets whisked away ;-)
Dbench (1 disk, x440: 8 real CPUs, 16 with HT)
before:
Throughput 265.032 MB/sec (NB=331.29 MB/sec 2650.32 MBit/sec) 256 procs
after:
Throughput 381.964 MB/sec (NB=477.454 MB/sec 3819.64 MBit/sec) 256 procs
(I took the second run; the first ones are slower, but it seems to be stable after that.)
NUMA-Q 16-way (1 disk, 16 CPUs)
before:
Throughput 48.5304 MB/sec (NB=60.663 MB/sec 485.304 MBit/sec) 256 procs
after:
Throughput 58.8483 MB/sec (NB=73.5603 MB/sec 588.483 MBit/sec) 256 procs
NUMA-Q has slower disks, old adaptors, and a slow cross-node interconnect.
> before:
> Throughput 48.5304 MB/sec (NB=60.663 MB/sec 485.304 MBit/sec) 256 procs
> after:
> Throughput 58.8483 MB/sec (NB=73.5603 MB/sec 588.483 MBit/sec) 256 procs
OK, akpm wanted dbench 32 instead:
before:
Throughput 187.637 MB/sec (NB=234.546 MB/sec 1876.37 MBit/sec) 32 procs
after:
Throughput 378.664 MB/sec (NB=473.33 MB/sec 3786.64 MBit/sec) 32 procs
/me likes.
M.