2001-11-04 02:27:30

by Daniel Phillips

Subject: Ext2 directory index, updated

***N.B.: still for use on test partitions only.***

This update mainly fixes a bug, a one-in-a-million occurrence on an untested
code path. This bug resulted in rm -rf deleting all files but one from a
million-file directory. I believe that's the last untested code path, and
otherwise it's been very stable.

I didn't expect highmem to work properly, and it didn't. It's on my to-do
list, but for now highmem has to be off or you will oops on boot.

I elaborated the dx_show_buckets debug output to dump the full index
tree instead of just one level. This function now serves as a capsule
summary of the index tree structure, and as you can see, it's simple.

I've done quite a bit more testing, including stress testing on a real
machine and I find that everything works quite comfortably up to about 2
million files, turning in an average time of about 50 microseconds/create and
300 microseconds/delete (1 GHz PIII). In the 4 million file range things go
pear-shaped, which I believe is not due to the index patch, but to rd. The
runs do complete, but require exponentially more time, with cpu 98% idle and
block throughput in the 300/second range. I'll look into that more later.
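
A rough way to reproduce per-file figures like these is sketched below. It is
not the harness used for the numbers above: the function name bench_dir and
the scratch-directory naming are made up for illustration, and it assumes GNU
date for nanosecond timestamps (%N). Shell loop overhead is included in the
create figure, so treat the output as a relative measure only.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): create COUNT empty files in a
# fresh scratch directory under DIR, delete them with rm -rf, and print the
# average cost per file in microseconds. Assumes GNU date (%N).
bench_dir() {
    dir=$1
    count=$2
    [ "$count" -gt 0 ] || return 1
    work=$dir/dxbench.$$
    mkdir -p "$work" || return 1

    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$count" ]; do
        : > "$work/file$i"      # builtin redirection, no fork per file
        i=$((i + 1))
    done
    mid=$(date +%s%N)
    rm -rf "$work"
    end=$(date +%s%N)

    # Timestamps are in nanoseconds; report microseconds per file.
    echo "create: $(( (mid - start) / 1000 / count )) us/file"
    echo "delete: $(( (end - mid) / 1000 / count )) us/file"
}

# Example: bench_dir /mnt/test 100000
```

Running it once on a volume mounted with -o index and once without gives a
quick side-by-side comparison.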

I did run into some bad mm behavior on 2.4.13. The icache seems to be too
severely throttled, resulting in delete performance being less than it should
be. I also find I am rarely able to create a million file test run on uml
(2.4.13) without oom-ing. In my experience, such problems are not due to
uml, but to the kernel's memory manager. These issues may have been
addressed in recent pre-patch kernels, but it seems there is still some
room for improvement in mm stability.

The patch is available at:

http://nl.linux.org/~phillips/htree/ext2.index-2.4.13

To apply:

cd /your/source/tree
patch -p0 <this.patch
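
A slightly more cautious variant checks that every hunk applies before
touching the tree. This is only a sketch: the function name
apply_patch_checked is made up for illustration, and it assumes GNU patch,
whose --dry-run option reports the result without modifying any files.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands above. Assumes GNU patch.
apply_patch_checked() {
    tree=$1
    patchfile=$2
    # Resolve the patch path before changing directory.
    patch_abs=$(cd "$(dirname "$patchfile")" && pwd)/$(basename "$patchfile")
    (
        cd "$tree" || exit 1
        # First pass changes nothing; it only checks that all hunks apply.
        patch -p0 --dry-run < "$patch_abs" >/dev/null || exit 1
        # Second pass actually modifies the source tree.
        patch -p0 < "$patch_abs"
    )
}

# Example: apply_patch_checked /your/source/tree this.patch
```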

--
Daniel


2001-11-04 02:43:42

by Daniel Phillips

Subject: Re: Ext2 directory index, updated

On November 4, 2001 03:28 am, I wrote:
> The patch is available at:
>
> http://nl.linux.org/~phillips/htree/ext2.index-2.4.13
>

Make that:

http://nl.linux.org/~phillips/htree/ext2.index-2.4.13-2

--
Daniel

2001-11-04 22:10:04

by Christian Laursen

Subject: Re: Ext2 directory index, updated

Daniel Phillips <[email protected]> writes:

> ***N.B.: still for use on test partitions only.***

It's the first time I've tried this patch, and I must say that
the first impression is very good indeed.

I took a real world directory (my linux-kernel MH folder containing
roughly 115000 files) and did a 'du -s' on it.

Without the patch it took a little more than 20 minutes to complete.

With the patch, it took less than 20 seconds. (And that was inside uml)


However, when I accidentally killed the uml, it left me with an unclean
filesystem which fsck refuses to touch because it has unsupported features.

Even the latest version does this.

Is there a patch for fsck, that fixes this somewhere?

--
Best regards
Christian Laursen

2001-11-04 22:23:38

by Daniel Phillips

Subject: Re: Ext2 directory index, updated

On November 4, 2001 11:09 pm, Christian Laursen wrote:
> Daniel Phillips <[email protected]> writes:
>
> > ***N.B.: still for use on test partitions only.***
>
> It's the first time, I've tried this patch and I must say, that
> the first impression is very good indeed.
>
> I took a real world directory (my linux-kernel MH folder containing
> roughly 115000 files) and did a 'du -s' on it.
>
> Without the patch it took a little more than 20 minutes to complete.
>
> With the patch, it took less than 20 seconds. (And that was inside uml)
>
>
> However, when I accidentally killed the uml, it left me with an unclean
> filesystem which fsck refuses to touch because it has unsupported features.
>
> Even the latest version does this.
>
> Is there a patch for fsck, that fixes this somewhere?

Ted Ts'o volunteered to do that but I failed to support him with proper
documentation so it hasn't been done yet.

However, it's very easy to get around this: just comment out the part of the
patch that sets the incompat flag. Then the indexed directories will
magically turn back into normal directories the next time you write to them
(it would be very good to give this feature a real-life test :-)

There is an easy way to turn that FEATURE_COMPAT flag back off so you can
fsck, but I don't know it and I should.

Andreas?

--
Daniel

2001-11-04 22:55:17

by Christian Laursen

Subject: Re: Ext2 directory index, updated

Daniel Phillips <[email protected]> writes:

> There is an easy way to turn that FEATURE_COMPAT flag back off so you can
> fsck, but I don't know it and I should.

I figured it out by myself. :)

$ debugfs -w root_fs
debugfs: feature -dir_index
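
The same request can probably be issued non-interactively with debugfs's -R
option, which runs a single command and exits. The wrapper below is a sketch
only: the function name clear_dir_index and the DRY_RUN variable are made up
for illustration, and it assumes a debugfs recent enough to know the
dir_index feature name.

```shell
#!/bin/sh
# Hypothetical wrapper: clear the dir_index feature flag on an ext2 image or
# device so that an older fsck will agree to check it.
# With DRY_RUN set, only print the command that would be run.
clear_dir_index() {
    image=$1
    if [ -n "${DRY_RUN:-}" ]; then
        echo "debugfs -w -R 'feature -dir_index' $image"
        return 0
    fi
    # -w opens the filesystem read-write; -R runs one request and exits.
    debugfs -w -R "feature -dir_index" "$image"
}

# Example: DRY_RUN=1 clear_dir_index root_fs
```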

--
Best regards
Christian Laursen

2001-11-04 23:01:17

by Daniel Phillips

Subject: Re: Ext2 directory index, updated

On November 4, 2001 11:24 pm, Daniel Phillips wrote:
> On November 4, 2001 11:09 pm, Christian Laursen wrote:
> > Daniel Phillips <[email protected]> writes:
> > However, when I accidentally killed the uml, it left me with an unclean
> > filesystem which fsck refuses to touch because it has unsupported features.
> >
> > Even the latest version does this.
> >
> > Is there a patch for fsck, that fixes this somewhere?
>
> [...]
> There is an easy way to turn that FEATURE_COMPAT flag back off so you can
> fsck, but I don't know it and I should.

It's debugfs, details to come.

The COMPAT_FEATURE thing is a problem: we *are* supposed to be able to fsck
a volume that has indexed directories on it with old versions of fsck, and
it's only the COMPAT_FEATURE flag that prevents this. You tried fsck -f
and it didn't work, right?

For using the -o index option on a non-throwaway volume, we should do this:

 void ext2_add_compat_feature (struct super_block *sb, unsigned feature)
 {
+	return;
 	if (!EXT2_HAS_COMPAT_FEATURE(sb, feature))
 	{

And afterwards you can rm -rf your test directory, though actually normal
ext2 shouldn't see anything unusual about it. The real reason for rm'ing the
test directory is so that I can tweak the index format in upcoming prerelease
versions.

I've disabled the add_compat_feature here for now, because until fsck can
handle it, it just causes trouble. I'll go read Andreas' writeup on the
COMPAT flags again and see if I can come up with a more friendly
interpretation.

--
Daniel

2001-11-04 23:10:19

by Gábor Lénárt

Subject: Re: Ext2 directory index, updated

Hmmm. Maybe some off-topic questions follow:

* Is there a patch for directory index and ext3 together?
* Will dirindex ever be a part of official kernels (eg: 2.5.x) ?

- Gabor

2001-11-05 01:42:52

by Daniel Phillips

Subject: Re: Ext2 directory index, updated

On November 4, 2001 11:09 pm, Christian Laursen wrote:
> Daniel Phillips <[email protected]> writes:
>
> > ***N.B.: still for use on test partitions only.***
>
> It's the first time, I've tried this patch and I must say, that
> the first impression is very good indeed.
>
> I took a real world directory (my linux-kernel MH folder containing
> roughly 115000 files) and did a 'du -s' on it.
>
> Without the patch it took a little more than 20 minutes to complete.
>
> With the patch, it took less than 20 seconds. (And that was inside uml)

Which kernel are you using? From 2.4.10 on ext2 has an accelerator in
ext2_find_entry - it caches the last lookup position. I'm wondering how that
affects this case.

--
Daniel

2001-11-05 07:48:53

by Ville Herva

Subject: Re: Ext2 directory index, updated

On Mon, Nov 05, 2001 at 02:43:28AM +0100, you [Daniel Phillips] claimed:
>
> Which kernel are you using? From 2.4.10 on ext2 has an accelerator in
> ext2_find_entry - it caches the last lookup position. I'm wondering how that
> affects this case.

Is that the same optimization Ted Ts'o implemented for ext3 around 0.9.10? I
thought it hadn't been ported to ext2...

BTW, I assume the ext2 dir index patch is roughly equivalent to FreeBSD
dirhash and the other patch resembles the FreeBSD dirperf patch?
Have you looked at them? [http://www.osnews.com/story.php?news_id=153]


-- v --

[email protected]

2001-11-05 09:53:15

by Daniel Phillips

Subject: Re: Ext2 directory index, updated

On November 5, 2001 08:48 am, Ville Herva wrote:
> On Mon, Nov 05, 2001 at 02:43:28AM +0100, you [Daniel Phillips] claimed:
> >
> > Which kernel are you using? From 2.4.10 on ext2 has an accelerator in
> > ext2_find_entry - it caches the last lookup position. I'm wondering how that
> > affects this case.
>
> Is that the same optimization Ted Ts'o implemented for ext3 around 0.9.10? I
> thought it hadn't been ported to ext2...

Yes, Ted did it, earlier this year.

> BTW, I assume the ext2 dir index patch is roughly equivalent to FreeBSD
> dirhash and the other patch resembles the FreeBSD dirperf patch?
> Have you looked at them? [http://www.osnews.com/story.php?news_id=153]

I *think* the performance of my dir index patch is roughly in line with BSD's
dirhash patch, for common cases. The big difference is that the BSD dirhash
is not persistent - the cache goes away when the directory is closed. So
there are loads that can break it badly, such as accessing files in large
directories randomly over a large disk. This forces the entire directory to
be read into cache, in the worst case, on every access. Another bad case is
first-time access. A million file directory is around 30 meg - it takes a
long time to read and hash all those blocks, just to open the first file.

They will have to implement a persistent index at some point. For common
cases though, the BSD approach is good.

I'll go into the gory details next week at ALS if people are interested.

--
Daniel

2001-11-05 22:11:17

by Andreas Dilger

Subject: Re: Ext2 directory index, updated

On Nov 05, 2001 00:01 +0100, Daniel Phillips wrote:
> For using the -o index option on a non-throwaway volume, we should do this:
>
> void ext2_add_compat_feature (struct super_block *sb, unsigned feature)
> {
> + return;
> if (!EXT2_HAS_COMPAT_FEATURE(sb, feature))
> {
>
> And afterwards you can rm -rf your test directory, though actually normal
> ext2 shouldn't see anything unusual about it. The real reason for rm'ing the
> test directory is so that I can tweak the index format in upcoming prerelease
> versions.

Well, e2fsck _should_ really know about the fact that there are indexed
directories in the filesystem, which is what the COMPAT flag is for.
The only current issue is that e2fsck doesn't understand this compat flag.

> I've disabled the add_compat_feature here for now, because until fsck can
> handle it, it just causes trouble. I'll go read Andreas' writeup on the
> COMPAT flags again and see if I can come up with a more friendly
> interpretation.

No, COMPAT is the friendliest. It means old kernels can read/write this
filesystem without problems, just that e2fsck can't/won't check it. Even
though an old fsck _probably_ won't break such a filesystem, there is no
guarantee of that, and it definitely won't validate the indexes, so a
"successful" fsck of an indexed directory doesn't mean anything until it
can understand this COMPAT flag.

That said, I agree that turning the COMPAT flag off for short term testing
is probably not fatal, but I thought we were not going to even suggest
using non-throwaway filesystems until the hash function was finalized? In
the end, if an updated e2fsck detects the DIR_INDEX flag (and valid indexes
therein) it will turn on the COMPAT flag for us, so all will be well. I
don't advise that we push for patch inclusion until e2fsck is done, however.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2001-11-05 22:59:52

by Christian Laursen

Subject: Re: Ext2 directory index, updated

Daniel Phillips <[email protected]> writes:

> On November 4, 2001 11:09 pm, Christian Laursen wrote:
> > Daniel Phillips <[email protected]> writes:
> >
> > > ***N.B.: still for use on test partitions only.***
> >
> > It's the first time, I've tried this patch and I must say, that
> > the first impression is very good indeed.
> >
> > I took a real world directory (my linux-kernel MH folder containing
> > roughly 115000 files) and did a 'du -s' on it.
> >
> > Without the patch it took a little more than 20 minutes to complete.
> >
> > With the patch, it took less than 20 seconds. (And that was inside uml)
>
> Which kernel are you using?

Actually, it was on a 2.2.20 kernel.

> From 2.4.10 on ext2 has an accelerator in
> ext2_find_entry - it caches the last lookup position. I'm wondering how that
> affects this case.

From the description I read a while ago, I believe it could cause a significant
speedup.

I'll have to try that out one of these days.

--
Best regards
Christian Laursen

2001-11-05 23:12:43

by Daniel Phillips

Subject: Re: Ext2 directory index, updated

On November 5, 2001 11:59 pm, Christian Laursen wrote:
> Daniel Phillips <[email protected]> writes:
>
> > On November 4, 2001 11:09 pm, Christian Laursen wrote:
> > > Daniel Phillips <[email protected]> writes:
> > >
> > > > ***N.B.: still for use on test partitions only.***
> > >
> > > It's the first time, I've tried this patch and I must say, that
> > > the first impression is very good indeed.
> > >
> > > I took a real world directory (my linux-kernel MH folder containing
> > > roughly 115000 files) and did a 'du -s' on it.
> > >
> > > Without the patch it took a little more than 20 minutes to complete.
> > >
> > > With the patch, it took less than 20 seconds. (And that was inside uml)
> >
> > Which kernel are you using?
>
> Actually, it was on a 2.2.20 kernel.

Yes, it's cool you can run 2.4 uml kernels on 2.2, isn't it? What I meant
was, which kernel is your uml built on?

> > From 2.4.10 on ext2 has an accelerator in
> > ext2_find_entry - it caches the last lookup position. I'm wondering how
> > that affects this case.
>
> From the description I read a while ago, I believe it could cause a
> significant speedup.
>
> I'll have to try that out one of these days.

I noticed split results with the find_entry accelerator, at least in its
current form: faster delete, slower create.

--
Daniel

2001-11-05 23:46:13

by Andreas Dilger

Subject: Re: Ext2 directory index, updated

On Nov 06, 2001 00:13 +0100, Daniel Phillips wrote:
> > > From 2.4.10 on ext2 has an accelerator in
> > > ext2_find_entry - it caches the last lookup position. I'm wondering how
> > > that affects this case.
> >
> > From the description I read a while ago, I believe it could cause a
> > significant speedup.
> >
> > I'll have to try that out one of these days.
>
> I noticed split results with the find_entry accelerator, at least in its
> current form: faster delete, slower create.

Well, according to reiserfs benchmarks at:
http://namesys.com/benchmarks/mongo/2.4.8_vs_2.4.9_vs_2.4.10_table.txt

the accelerator speeds up stat times (in all cases) by a factor of 3 to 5.
Create times are reduced as well (although not as much). In fact, it also
shows delete speed as being slower, but that is hard to quantify as the
reiserfs delete speed is slower also.

It actually looks like both ext2 and reiserfs took a hit in the read
department in 2.4.10 as well. Maybe a bad interaction with the page
cache or something? It would also be worthwhile to go back to the
addition of directories-in-pagecache as well, because I seem to recall
posting about a hit in read performance at that time as well, and never
really heard anything about it.

The bonnie++ benchmark doesn't show any obvious trends (incomplete tables):
http://namesys.com/benchmarks/bonnie/2.4.8_2.4.9_2.4.10.txt

I'll have to go and update my bonnie benchmarks for newer kernels (last
run when testing indexed directories and dir-in-pagecache at 2.4.5).

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2001-11-06 00:37:31

by Daniel Phillips

Subject: Re: Ext2 directory index, updated

On November 5, 2001 11:10 pm, Andreas Dilger wrote:
> On Nov 05, 2001 00:01 +0100, Daniel Phillips wrote:
> > For using the -o index option on a non-throwaway volume, we should do this:
> >
> > void ext2_add_compat_feature (struct super_block *sb, unsigned feature)
> > {
> > + return;
> > if (!EXT2_HAS_COMPAT_FEATURE(sb, feature))
> > {
> >
> > And afterwards you can rm -rf your test directory, though actually normal
> > ext2 shouldn't see anything unusual about it. The real reason for rm'ing
> > the test directory is so that I can tweak the index format in upcoming
> > prerelease versions.
>
> Well, e2fsck _should_ really know about the fact that there are indexed
> directories in the filesystem, which is what the COMPAT flag is for.
> The only current issue is that e2fsck doesn't understand this compat flag.
>
> > I've disabled the add_compat_feature here for now, because until fsck can
> > handle it, it just causes trouble. I'll go read Andreas' writeup on the
> > COMPAT flags again and see if I can come up with a more friendly
> > interpretation.
>
> No, COMPAT is the friendliest. It means old kernels can read/write this
> filesystem without problems, just that e2fsck can't/won't check it. Even
> though an old fsck _probably_ won't break such a filesystem, there is no
> guarantee of that,

Well, it's hard to see how the fsck could hurt, since all the blocks of the
directory look like legitimate empty blocks. When did

> and it definitely won't validate the indexes, so a
> "successful" fsck of an indexed directory doesn't mean anything until it
> can understand this COMPAT flag.
>
> That said, I agree that turning the COMPAT flag off for short term testing
> is probably not fatal, but I thought we were not going to even suggest
> using non-throwaway filesystems until the hash function was finalized?

True. Right now, I'm interested in finding out exactly how the old fscks are
going to behave when they run into indexed directories, so I'll leave the
COMPAT flag off for now and turn it back on when we hit the first
format-frozen release. The method of restoring a partition to a fsckable
state is easy to document:

# debugfs -w root_fs
debugfs: feature -dir_index

Anybody who's running the patch will have access to a recent version of
debugfs that knows about the dir_index flag.

> In
> the end, if an updated e2fsck detects the DIR_INDEX flag (and valid indexes
> therein) it will turn on the COMPAT flag for us, so all will be well. I
> don't advise that we push for patch inclusion until e2fsck is done, however.

Yes, as long as testers heed my warning and stick to test partitions there's
no particular danger. There's a simple recovery procedure for anyone who
doesn't want to bother re-mkfsing the partition. We're in pretty good shape.

My improved show_buckets routine is working code that could be used to get
started on the new fsck code. It walks the index in hash bucket order
dumping out statistics, and, together with the checks in dx_probe, basically
defines the index format.

--
Daniel

2001-11-08 07:22:06

by Christian Laursen

Subject: Re: Ext2 directory index, updated

Daniel Phillips <[email protected]> writes:

> On November 4, 2001 11:09 pm, Christian Laursen wrote:
> > Daniel Phillips <[email protected]> writes:
> >
> > > ***N.B.: still for use on test partitions only.***
> >
> > It's the first time, I've tried this patch and I must say, that
> > the first impression is very good indeed.
> >
> > I took a real world directory (my linux-kernel MH folder containing
> > roughly 115000 files) and did a 'du -s' on it.
> >
> Which kernel are you using? From 2.4.10 on ext2 has an accelerator in
> ext2_find_entry - it caches the last lookup position. I'm wondering how that
> affects this case.

I ran the tests again and got some real numbers this time.

The accelerator should work as normal, when the filesystem is not
mounted with -o index, shouldn't it (Although it's on a kernel
with the directory index patch)?

xi@tam:~/Mail > uname -a
Linux tam 2.4.13-3um #1 Sun Nov 4 14:29:19 CET 2001 i686 unknown

xi@tam:~/Mail > mount
/dev/ubd0 on / type ext2 (rw,index)
proc on /proc type proc (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/ubd2 on /mnt/flaf type ext2 (rw)

xi@tam:/mnt/flaf > time du -s linux-kernel/
685652 linux-kernel

real 19m14.689s
user 0m1.650s
sys 23m39.000s

xi@tam:~/Mail > time du -s linux-kernel/
686432 linux-kernel

real 1m8.363s
user 0m5.500s
sys 0m57.350s
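
For reference, the ratio between the two wall-clock ("real") times above can
be worked out with a few lines of shell. This is just arithmetic on the
figures already shown; the helper names to_seconds and speedup are made up
for illustration.

```shell
#!/bin/sh
# Convert a time(1)-style value such as 19m14.689s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times longer the first duration is than the second.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

# Unindexed mount vs. mount with -o index, "real" times from the runs above.
speedup 19m14.689s 1m8.363s    # prints 16.9
```

So on this run the indexed mount finished about 17 times sooner in
wall-clock terms.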


--
Best regards
Christian Laursen