The patches to follow clean up a lot of the ext3 online reservation
code in 2.6.9-rc2-mm4. There are a few minor fixes for things like
loglevels of printks and correcting some error returns, plus
refactoring a bit of existing ext3 code to allow resize to avoid dummy
on-stack inodes.
There's also a review of all the SMP locking in the resize code. Locking
is minimised: the impact on the hot path consists of nothing more than
an smp_rmb() before we test sb->s_groups_count. That's a noop on x86,
but is a bit expensive on archs with a weak memory order; I've tried to
minimise that by reading it just once where previously it was read each
time round a loop, but I don't see how to avoid the cost entirely.
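The "read it just once" change can be sketched in portable C11. This is only an illustration. `groups_count` and `group_free` are hypothetical stand-ins for `sb->s_groups_count` and per-group data, and an acquire load plays the role of `smp_rmb()`. The first loop pays for the ordered read on every pass. The second pays once and then iterates over a local copy.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for sb->s_groups_count and per-group data;
 * these are not the ext3 structures, just enough to show the pattern. */
static _Atomic unsigned long groups_count = 4;
static unsigned long group_free[16] = { 10, 20, 30, 40 };

/* Count re-read (with acquire ordering, standing in for smp_rmb())
 * on every pass round the loop. */
unsigned long sum_free_reread(void)
{
    unsigned long sum = 0;

    for (unsigned long i = 0;
         i < atomic_load_explicit(&groups_count, memory_order_acquire); i++)
        sum += group_free[i];
    return sum;
}

/* Count read and ordered once; the loop then uses a local copy. */
unsigned long sum_free_once(void)
{
    unsigned long ngroups =
        atomic_load_explicit(&groups_count, memory_order_acquire);
    unsigned long sum = 0;

    for (unsigned long i = 0; i < ngroups; i++)
        sum += group_free[i];
    return sum;
}
```

One side effect of hoisting the read is that a group added while the loop is running is not visited. That seems acceptable when the caller only needs a consistent snapshot of the groups that existed when it started.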
Finally, sb->s_debts is nuked from ext3. It's broken already, as per my
email a week or two ago --- the per-group s_debt[] counts never get
modified. We could probably do with nuking it from ext2 too, as it's
(differently) broken there: it performs unlocked byte inc/dec operations
on a shared array and is vulnerable to word-tearing problems.
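The word-tearing problem can be modelled with a small hypothetical example that runs sequentially. It assumes a CPU with no byte-sized stores (early Alpha parts without the byte/word extension are the usual example), so a byte increment becomes a whole-word load followed by a whole-word store. Two CPUs then bump adjacent per-group counters in the same word, with a fixed interleaving. Nothing here is ext2 code; it only shows how one CPU's update can be silently lost.

```c
#include <stdint.h>

/* Hypothetical model of a CPU with no byte-sized stores: a byte increment
 * is split into a whole-word load and a whole-word store. */
typedef struct {
    uint32_t loaded;   /* word value this "CPU" read */
} cpu_state;

static void byte_inc_load(cpu_state *cpu, const uint32_t *word)
{
    cpu->loaded = *word;
}

static void byte_inc_store(cpu_state *cpu, uint32_t *word, unsigned byte)
{
    uint32_t v = cpu->loaded;
    uint32_t b = (v >> (8 * byte)) & 0xff;

    v &= ~((uint32_t)0xff << (8 * byte));
    v |= ((b + 1) & 0xff) << (8 * byte);
    *word = v;   /* writes back ALL four bytes, not just this CPU's byte */
}

/* Two CPUs bump adjacent counters (bytes 0 and 1 of one word) with the
 * interleaving load, load, store, store. */
uint32_t racy_adjacent_increments(void)
{
    uint32_t word = 0;
    cpu_state a, b;

    byte_inc_load(&a, &word);
    byte_inc_load(&b, &word);
    byte_inc_store(&a, &word, 0);  /* word == 0x00000001 */
    byte_inc_store(&b, &word, 1);  /* word == 0x00000100: a's update lost */
    return word;
}
```

With no race the result would be `0x101`. The interleaving above yields `0x100`. Even on CPUs that do have byte stores, an unlocked `++` on the same byte from two CPUs is an ordinary lost-update race.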
This should address all of the points akpm had in his review of resize
a while back, except for the documentation/user space side of things
and the lack of error checking in certain ext3_journal_dirty_metadata
calls: I'm still fixing those up (I'll try to push out a working
user-space for this later today).
sct wrote:
>Locking
>is minimised: the impact on the hot path consists of nothing more than
>an smp_rmb() before we test sb->s_groups_count. That's a noop on x86,
>
No, wrong way around:
wmb() is empty. rmb() is either lfence or a locked dummy instruction.
--
Manfred
Hi,
On Thu, 2004-09-30 at 17:00, Manfred Spraul wrote:
> >Locking
> >is minimised: the impact on the hot path consists of nothing more than
> >an smp_rmb() before we test sb->s_groups_count. That's a noop on x86,
> >
> No, wrong way around:
> wmb() is empty. rmb() is either lfence or a locked dummy instruction.
Hmm. But I'm still not sure we can get away with anything
lighter-weight.
The basic construct we need to worry about is:
    new_group_table = kmalloc(...);
    memcpy(new_group_table, old_group_table, ...);
    new_group_table[new_group] = foo;
    sbi->s_group_desc = new_group_table;
    /* SMP WRITE BARRIER */
    sbi->s_group_count = new_group_count;
on the writer side, and
    ngroups = sbi->s_group_count;
    /* SMP READ BARRIER */
    for (i = 0; i < ngroups; i++)
            gdp = sbi->s_group_desc[i];
The latter is the worry --- we're doing a read that depends immediately
on "i" and s_group_desc, but not on sbi->s_group_count. There *IS* a
comparison between i and s_group_count, though, so the dependency is
implicit.
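The writer/reader construct above can be written as a user-space C11 sketch. This is an analogue, not the kernel code. A release fence stands in for the SMP write barrier and an acquire fence for the SMP read barrier. `group_desc`, `group_count`, `add_group` and `sum_groups` are hypothetical names. The sketch assumes a single writer, which the real code would serialise with a lock. If the reader sees the new count, the fence pairing guarantees it also sees the new table pointer and its contents.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for sbi->s_group_desc and sbi->s_group_count. */
static _Atomic(long *) group_desc;
static atomic_ulong group_count;

/* Writer (assumed single, externally serialised): build a larger copy of
 * the table, publish the pointer, then the count after a release fence. */
int add_group(long value)
{
    unsigned long old_count =
        atomic_load_explicit(&group_count, memory_order_relaxed);
    long *old_table = atomic_load_explicit(&group_desc, memory_order_relaxed);
    long *new_table = malloc((old_count + 1) * sizeof(*new_table));

    if (!new_table)
        return -1;
    if (old_count)
        memcpy(new_table, old_table, old_count * sizeof(*new_table));
    new_table[old_count] = value;

    atomic_store_explicit(&group_desc, new_table, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);      /* SMP WRITE BARRIER */
    atomic_store_explicit(&group_count, old_count + 1, memory_order_relaxed);
    /* The old table is deliberately leaked: freeing it while readers may
     * still use it is a separate problem this sketch does not address. */
    return 0;
}

/* Reader: load the count, then (after an acquire fence) the table; every
 * index below ngroups is then valid in the table we see. */
long sum_groups(void)
{
    unsigned long ngroups =
        atomic_load_explicit(&group_count, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);      /* SMP READ BARRIER */
    long *table = atomic_load_explicit(&group_desc, memory_order_relaxed);
    long sum = 0;

    for (unsigned long i = 0; i < ngroups; i++)
        sum += table[i];
    return sum;
}
```

The ordering only matters when `add_group()` and `sum_groups()` run on different CPUs. Called one after the other on a single thread, they simply behave like an ordinary growable array.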
I'm just not familiar enough with the architecture of weakly-ordered
platforms to know if we can get away with smp_read_barrier_depends() in
this particular case. If so, we can use that and be done with the extra
locked op on x86.
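The distinction being asked about is between an address dependency and a control dependency. `smp_read_barrier_depends()` orders a later load whose *address* comes from an earlier load. The `i < ngroups` comparison only gives a control dependency, and on weakly ordered CPUs loads can generally be speculated past a branch. One way to get a genuine address dependency is to publish the count and the entries together behind a single pointer, so that every reader load goes through that one pointer. This is a hypothetical C11 sketch, not what the posted patches do. In C11 the dependency-ordered load would be `memory_order_consume`, but mainstream compilers treat that as acquire, so the sketch uses acquire directly.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the count travels with the table it describes. */
struct group_table {
    unsigned long count;
    long desc[];            /* flexible array member: count entries */
};

static _Atomic(struct group_table *) groups;

/* Writer (assumed single, externally serialised). */
int append_group(long value)
{
    struct group_table *old =
        atomic_load_explicit(&groups, memory_order_relaxed);
    unsigned long n = old ? old->count : 0;
    struct group_table *next =
        malloc(sizeof(*next) + (n + 1) * sizeof(long));

    if (!next)
        return -1;
    if (n)
        memcpy(next->desc, old->desc, n * sizeof(long));
    next->desc[n] = value;
    next->count = n + 1;
    /* One release store publishes count and entries together; the old
     * table is deliberately leaked in this sketch. */
    atomic_store_explicit(&groups, next, memory_order_release);
    return 0;
}

/* Reader: every later load is addressed through t, the single ordered load. */
long sum_table(void)
{
    struct group_table *t =
        atomic_load_explicit(&groups, memory_order_acquire);
    long sum = 0;

    if (!t)
        return 0;
    for (unsigned long i = 0; i < t->count; i++)
        sum += t->desc[i];
    return sum;
}
```

The trade-off is that growing the table now copies a header as well as the entries. In exchange, the reader's bound and its data can never come from different generations of the table.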
--Stephen
On Sep 30, 2004 14:23 +0100, Stephen Tweedie wrote:
> The patches to follow clean up a lot of the ext3 online reservation
> code in 2.6.9-rc2-mm4. There are a few minor fixes for things like
> loglevels of printks and correcting some error returns, plus
> refactoring a bit of existing ext3 code to allow resize to avoid dummy
> on-stack inodes.
Many thanks to Stephen for putting in the effort to bring this into
shape. All of the patches look good.
Cheers, Andreas
--
Andreas Dilger
http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/