2008-08-28 14:59:09

by Eric Sandeen

Subject: Do we need dump for ext4?

I was talking to Ric about dump benchmarks, and he was of the impression
that dump may not be used that often anymore, at least in the
enterprise. (Ric, hope I'm paraphrasing correctly)

Undaunted :) I ran off and tested an artificial backup scenario:

* Untar a kernel tree into 128 different top level dirs
* Make a level 0 backup
* Untar a kernel tree into 128 MORE different top level dirs
* Make a level 1 backup

128 kernel trees use about 6.5M inodes and about 80G of space.

I tested ext3 with dump, ext4 with tar, and xfs with xfsdump.

for ext3:
dump -1 -u -f $DUMPDIR/dump1 $DATADIR

for ext4:
tar --atime-preserve --xattr --after-date=$DUMPDIR/dump0.tar -cf \
    $DUMPDIR/dump1.tar $DATADIR

for xfs:
xfsdump -F -l 1 -f $DUMPDIR/dump0 $DATADIR

DUMPDIR and DATADIR were 2 partitions on the same (fast hardware) lun.
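The level 1 tar run selects files by date rather than from an inode map the way dump does. GNU tar documents `--after-date=FILE` as taking FILE's modification time as the cutoff and storing only files whose data modification or status change time is later. A minimal C sketch of that selection rule, assuming a POSIX `nftw()` walk (`count_newer_files()` and its helpers are hypothetical names, not tar's code):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <sys/stat.h>
#include <time.h>

/* nftw() passes no user-data pointer, so the walk state is file-scope. */
static struct timespec cutoff;   /* mtime of the reference archive */
static long newer_count;         /* regular files selected so far */

/* Nonzero if timestamp a is strictly later than b. */
static int ts_after(struct timespec a, struct timespec b)
{
    return a.tv_sec > b.tv_sec ||
           (a.tv_sec == b.tv_sec && a.tv_nsec > b.tv_nsec);
}

static int select_newer(const char *path, const struct stat *st, int type,
                        struct FTW *ftw)
{
    (void)path;
    (void)ftw;
    /* Select a regular file if its data (mtime) or its inode status
     * (ctime) changed after the cutoff, as documented for tar -N. */
    if (type == FTW_F && S_ISREG(st->st_mode) &&
        (ts_after(st->st_mtim, cutoff) || ts_after(st->st_ctim, cutoff)))
        newer_count++;
    return 0; /* keep walking */
}

/* Count regular files under dir that a level 1 run would pick up,
 * using ref_file's mtime as the cutoff; returns -1 on error. */
long count_newer_files(const char *dir, const char *ref_file)
{
    struct stat ref;
    if (stat(ref_file, &ref) != 0)
        return -1;
    cutoff = ref.st_mtim;
    newer_count = 0;
    /* FTW_PHYS: report symlinks instead of following them. */
    if (nftw(dir, select_newer, 64, FTW_PHYS) != 0)
        return -1;
    return newer_count;
}
```

Note that every file still has to be stat()ed to make this decision, so a date-based incremental still walks the whole tree.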

Results:

               level0   level1
               ------   ------
ext3 (dump)    38m52s   42m21s
ext4 (tar)     57m55s   69m35s
xfs (xfsdump)  25m18s   37m44s

Clearly tar on ext4, at least for my incantation, lags. Is dump for
ext4 anywhere on the todo list, or should it be? Or am I just running
tar wrong? :)

-Eric


2008-08-28 18:48:08

by Theodore Ts'o

Subject: Re: Do we need dump for ext4?

On Thu, Aug 28, 2008 at 09:58:10AM -0500, Eric Sandeen wrote:
> I was talking to Ric about dump benchmarks, and he was of the impression
> that dump may not be used that often anymore, at least in the
> enterprise.

Many people don't use the dump/restore program any more,
that's definitely true. Whether people use backups (as opposed to
large amounts of RAID) in the enterprise is a different question. I'm
not so sure about the second question.

So a couple of comments. First, it's probably not fair to use
different backup programs for the different filesystems. We probably
want to do one set of comparisons where we use tar for all three.
(Note: not all backup/dump programs are doing the right things with
xattrs, so we're not necessarily comparing programs with completely
identical functionality.)

Secondly, it really wouldn't be hard to update dump/restore for ext4.
It uses libext2fs, so the real problem is that it is explicitly
checking the feature flags. Removing those checks may be all that is
necessary, given that ext2fs_block_iterate() still works for
extent-based files. I just noted BTW that the dump/restore doesn't
seem to be TOTALLY abandoned. It was last updated in 2006, true, but
there is support for backing up and restoring extended attributes and
ACLs. I wonder if they broke format compatibility with BSD 4.4
format dump/restore backups when they did it --- and if anyone
still cares. :-)

Finally, I suspect most of the problem with using tar is the HTREE
dirent sorting problem. If we modify tar to sort the directory
entries before emitting the files, and then use that tar across all
the filesystems, I suspect the results would be much better for
ext3 and ext4.
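The sorting Ted describes amounts to reading a whole directory, sorting the entries by inode number, and only then opening the files, so reads follow the inode table instead of htree hash order. A minimal C sketch of that step, assuming plain POSIX `readdir()` and that `d_ino` is a usable proxy for on-disk position (`read_sorted_by_inode()` and `struct ent` are hypothetical names, not tar's or acp's code):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* One directory entry: inode number plus a private copy of the name. */
struct ent {
    ino_t ino;
    char *name;
};

/* qsort comparator: ascending inode number, without overflow-prone a - b. */
static int by_inode(const void *a, const void *b)
{
    const struct ent *x = a, *y = b;
    return (x->ino > y->ino) - (x->ino < y->ino);
}

void free_entries(struct ent *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(v[i].name);
    free(v);
}

/* Read all entries of path except "." and "..", sort them by inode and
 * return a malloc'd array (count in *n).  Returns NULL on error. */
struct ent *read_sorted_by_inode(const char *path, size_t *n)
{
    DIR *d = opendir(path);
    if (!d)
        return NULL;
    size_t len = 0, cap = 64;
    struct ent *v = malloc(cap * sizeof *v);
    struct dirent *de;
    while (v && (de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (len == cap) {
            struct ent *t = realloc(v, 2 * cap * sizeof *v);
            if (!t) {
                free_entries(v, len);
                v = NULL;
                break;
            }
            v = t;
            cap *= 2;
        }
        v[len].ino = de->d_ino;
        v[len].name = strdup(de->d_name);
        if (!v[len].name) {
            free_entries(v, len);
            v = NULL;
            break;
        }
        len++;
    }
    closedir(d);
    if (!v)
        return NULL;
    /* The caller now opens files in inode order rather than hash order. */
    qsort(v, len, sizeof *v, by_inode);
    *n = len;
    return v;
}
```

Sorting by `d_ino` is only a heuristic. On ext3/ext4 the inode number determines where the inode sits in the inode table, but file data placement is a separate matter.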

- Ted

2008-08-28 19:06:03

by Ric Wheeler

Subject: Re: Do we need dump for ext4?

Theodore Tso wrote:
> On Thu, Aug 28, 2008 at 09:58:10AM -0500, Eric Sandeen wrote:
>
>> I was talking to Ric about dump benchmarks, and he was of the impression
>> that dump may not be used that often anymore, at least in the
>> enterprise.
>>
>
> Many people don't use the dump/restore program any more,
> that's definitely true. Whether people use backups (as opposed to
> large amounts of RAID) in the enterprise is a different question. I'm
> not so sure about the second question.
>

I think a lot of high end customers still back up to tape (or virtual
tape which is basically a tape emulation on top of RAID arrays), but
they use commercial programs to do that.

> So a couple of comments. First, it's probably not fair to use
> different backup programs for the different filesystems. We probably
> want to do one set of comparisons where we use tar for all three.
> (Note: not all backup/dump programs are doing the right things with
> xattrs, so we're not necessarily comparing programs with completely
> identical functionality.)
>

I like Chris's acp program since it is heavily optimized (it reads files
in inode-sorted order) and is small enough to tweak.

> Secondly, it really wouldn't be hard to update dump/restore for ext4.
> It uses libext2fs, so the real problem is that it is explicitly
> checking the feature flags. Removing those checks may be all that is
> necessary, given that ext2fs_block_iterate() still works for
> extent-based files. I just noted BTW that the dump/restore doesn't
> seem to be TOTALLY abandoned. It was last updated in 2006, true, but
> there is support for backing up and restoring extended attributes and
> ACLs. I wonder if they broke format compatibility with BSD 4.4
> format dump/restore backups when they did it --- and if anyone
> still cares. :-)
>
>

We may as well just time "tar" as an easy baseline.

> Finally, I suspect most of the problem with using tar is the HTREE
> dirent sorting problem. If we modify tar to sort the directory
> entries before emitting the files, and then use that tar across all
> the filesystems, I suspect the results would be much better for
> ext3 and ext4.
>
> - Ted
>

Like acp ;-)

ric


2008-08-28 19:15:34

by Eric Sandeen

Subject: Re: Do we need dump for ext4?

Theodore Tso wrote:
> On Thu, Aug 28, 2008 at 09:58:10AM -0500, Eric Sandeen wrote:
>> I was talking to Ric about dump benchmarks, and he was of the impression
>> that dump may not be used that often anymore, at least in the
>> enterprise.
>
> Many people don't use the dump/restore program any more,
> that's definitely true. Whether people use backups (as opposed to
> large amounts of RAID) in the enterprise is a different question. I'm
> not so sure about the second question.
>
> So a couple of comments. First, it's probably not fair to use
> different backup programs for the different filesystems.

Well, if a filesystem has a dedicated, presumably optimized backup
utility, why would you not benchmark that as part of the mix? :)

> We probably
> want to do one set of comparisons where we use tar for all three.

Yep, I'm doing that now ... also realized I tested on an inappropriate
elevator (cfq) for a fancy RAID. I'll resend in a bit.

> (Note: not all backup/dump programs are doing the right things with
> xattrs, so we're not necessarily comparing programs with completely
> identical functionality.)

Well, tar supposedly does with the options I gave it, xfsdump certainly
does, and dump, I dunno offhand.

> Secondly, it really wouldn't be hard to update dump/restore for ext4.
> It uses libext2fs, so the real problem is that it is explicitly
> checking the feature flags. Removing those checks may be all that is
> necessary, given that ext2fs_block_iterate() still works for
> extent-based files.

Eh, I'll test that then.

> I just noted BTW that the dump/restore doesn't
> seem to be TOTALLY abandoned. It was last updated in 2006, true, but
> there is support for backing up and restoring extended attributes and
> ACLs.

Ah, ok, so they all should be backing up acls/attrs then.

> I wonder if they broke format compatibility with BSD 4.4
> format dump/restore backups when they did it --- and if anyone
> still cares. :-)
>
> Finally, I suspect most of the problem with using tar is the HTREE
> dirent sorting problem. If we modify tar to sort the directory
> entries before emitting the files, and then use that tar across all
> the filesystems, I suspect the results would be much better for
> ext3 and ext4.

True enough, just testing what we have now. I can play with acp...

-Eric

2008-08-28 20:05:01

by Andreas Dilger

Subject: Re: Do we need dump for ext4?

On Aug 28, 2008 15:03 -0400, Ric Wheeler wrote:
>> Finally, I suspect most of the problem with using tar is the HTREE
>> dirent sorting problem. If we modify tar to sort the directory
>> entries before emitting the files, and then use that tar across all
>> the filesystems, I suspect the results would be much better for
>> ext3 and ext4.
>
> Like acp ;-)

I think there is little benefit to fixing each program to do sorting.
Either your LD_PRELOAD library should become more standard (e.g. put
into glibc), or we use something like e2scan or a modified find to
generate filenames in inode order.

NB: e2scan itself scans the inode table in order, but it makes no effort
to generate filenames in that order, since Lustre doesn't care about
that. It likely gets it largely right by coincidence.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-08-28 20:35:56

by Theodore Ts'o

Subject: Re: Do we need dump for ext4?

On Thu, Aug 28, 2008 at 02:04:48PM -0600, Andreas Dilger wrote:
>
> I think there is little benefit to fixing each program to do sorting.
> Either your LD_PRELOAD library should become more standard (e.g. put
> into glibc)

Yeah, but that requires dealing with Ulrich and for my own mental
health I try to avoid that as much as possible. :-)

This idea is something that has been in my "if only I had time or some
minions to dispatch" category for quite some time. We can actually do
this in the kernel.

For small directories which could potentially get converted into htree
format, we are already sucking the entire directory and putting it into an
rbtree. We could just do this for all directories less than or equal
to 32k, but have them returned sorted by inode instead of by hash
value. At least on my laptop, this accounts for 99.93% of the
directories on my root filesystem.
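To check the 99.93% figure on another filesystem, one could walk the tree and count directories whose size is at most 32k. A rough C sketch, assuming that `st_size` of an ext3/ext4 directory reflects its allocated directory blocks, and using `nftw()` with `FTW_MOUNT` to stay on one filesystem (`percent_small_dirs()` is a hypothetical helper):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <sys/stat.h>

/* Threshold from the discussion: directories of at most 32k bytes. */
#define SMALL_DIR_BYTES (32 * 1024)

/* nftw() has no user-data argument, so tallies live in file-scope statics. */
static long total_dirs, small_dirs;

static int tally(const char *path, const struct stat *st, int type,
                 struct FTW *ftw)
{
    (void)path;
    (void)ftw;
    /* FTW_DNR directories are unreadable but still have valid stat data. */
    if (type == FTW_D || type == FTW_DNR) {
        total_dirs++;
        if (st->st_size <= SMALL_DIR_BYTES)
            small_dirs++;
    }
    return 0;
}

/* Percentage of directories under root that are <= 32k, or -1.0 on
 * error.  FTW_PHYS: don't follow symlinks; FTW_MOUNT: one filesystem. */
double percent_small_dirs(const char *root)
{
    total_dirs = small_dirs = 0;
    if (nftw(root, tally, 64, FTW_PHYS | FTW_MOUNT) != 0)
        return -1.0;
    return total_dirs ? 100.0 * small_dirs / total_dirs : 0.0;
}
```

Running it against `/` on a given machine gives that machine's number; the result will of course differ from Ted's laptop.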

There are some fancy things that would have to be done to make
telldir/seekdir work, but the basic idea is pretty simple.

- Ted

2008-08-28 22:24:08

by Andreas Dilger

Subject: Re: Do we need dump for ext4?

On Aug 28, 2008 16:35 -0400, Theodore Ts'o wrote:
> Yeah, but that requires dealing with Ulrich and for my own mental
> health I try to avoid that as much as possible. :-)
>
> This idea is something that has been in my "if only I had time or some
> minions to dispatch" category for quite some time. We can actually do
> this in the kernel.
>
> For small directories which could potentially get converted into htree
> format, we are already sucking the entire directory and putting it into an
> rbtree. We could just do this for all directories less than or equal
> to 32k, but have them returned sorted by inode instead of by hash
> value. At least on my laptop, this accounts for 99.93% of the
> directories on my root filesystem.

What happens if the directory is grown at that point? I thought the
reason for keeping it sorted in hash order was to deal with the
telldir headache? I guess if the whole thing is in memory then it
can be attached to the fd and discarded once read or seeked-on
(and POSIX doesn't require reporting new entries after the start
of the read).

Doing this at the VFS level would also benefit _most_ filesystems,
though maybe not ones like XFS or btrfs that have their own preferred
order.

Cheers, Andreas


2008-08-28 22:34:57

by Theodore Ts'o

Subject: Re: Do we need dump for ext4?

On Thu, Aug 28, 2008 at 04:23:48PM -0600, Andreas Dilger wrote:
> > For small directories which could potentially get converted into htree
> > format, we are already sucking the entire directory and putting it into an
> > rbtree. We could just do this for all directories less than or equal
> > to 32k, but have them returned sorted by inode instead of by hash
> > value. At least on my laptop, this accounts for 99.93% of the
> > directories on my root filesystem.
>
> What happens if the directory is grown at that point? I thought the
> reason for keeping it sorted in hash order was to deal with the
> telldir headache? I guess if the whole thing is in memory then it
> can be attached to the fd and discarded once read or seeked-on
> (and POSIX doesn't require reporting new entries after the start
> of the read).

It's fine, because according to POSIX it's undefined what happens to
files that are created or deleted after the last opendir() or
rewinddir(). So basically, the b-tree is attached to the opendir, and
we discard it and re-create it if we ever seek to the beginning of the
directory. This logic is already in the kernel, since it's exactly
the same situation if the file is grown past the point where it
becomes an HTREE directory.
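A userspace analogue of this scheme, assuming the snapshot is taken at open time and simply rebuilt on rewind: entries are read once, sorted by inode, and handed out from an array, so a "tell" position is just an array index. This is a hypothetical sketch (`sorted_dir`, `sd_open()` and friends are made-up names); an in-kernel version would reuse the existing rbtree and hash-based positions instead.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

struct sd_entry {
    ino_t ino;
    char name[256];          /* Linux NAME_MAX is 255 */
};

struct sorted_dir {
    char *path;
    struct sd_entry *ents;   /* snapshot, sorted by inode */
    size_t n, pos;           /* entry count and read position */
};

static int cmp_ino(const void *a, const void *b)
{
    const struct sd_entry *x = a, *y = b;
    return (x->ino > y->ino) - (x->ino < y->ino);
}

/* (Re)build the snapshot.  Entries created or removed afterwards are not
 * seen until the next rewind, which POSIX allows.  On error the old
 * snapshot is left untouched and -1 is returned. */
static int sd_load(struct sorted_dir *sd)
{
    DIR *d = opendir(sd->path);
    if (!d)
        return -1;
    size_t n = 0, cap = 64;
    struct sd_entry *v = malloc(cap * sizeof *v);
    struct dirent *de;
    while (v && (de = readdir(d)) != NULL) {
        if (n == cap) {
            struct sd_entry *t = realloc(v, 2 * cap * sizeof *v);
            if (!t) {
                free(v);
                v = NULL;
                break;
            }
            v = t;
            cap *= 2;
        }
        v[n].ino = de->d_ino;
        strncpy(v[n].name, de->d_name, sizeof v[n].name - 1);
        v[n].name[sizeof v[n].name - 1] = '\0';
        n++;
    }
    closedir(d);
    if (!v)
        return -1;
    qsort(v, n, sizeof *v, cmp_ino);
    free(sd->ents);
    sd->ents = v;
    sd->n = n;
    sd->pos = 0;
    return 0;
}

void sd_close(struct sorted_dir *sd)
{
    if (sd) {
        free(sd->path);
        free(sd->ents);
        free(sd);
    }
}

struct sorted_dir *sd_open(const char *path)
{
    struct sorted_dir *sd = calloc(1, sizeof *sd);
    if (!sd || !(sd->path = strdup(path)) || sd_load(sd) != 0) {
        sd_close(sd);
        return NULL;
    }
    return sd;
}

/* Next entry in inode order, or NULL once the snapshot is exhausted. */
const struct sd_entry *sd_read(struct sorted_dir *sd)
{
    return sd->pos < sd->n ? &sd->ents[sd->pos++] : NULL;
}

/* telldir/seekdir equivalents: a position is just an array index. */
long sd_tell(const struct sorted_dir *sd) { return (long)sd->pos; }

void sd_seek(struct sorted_dir *sd, long pos)
{
    if (pos >= 0 && (size_t)pos <= sd->n)
        sd->pos = (size_t)pos;
}

/* rewinddir equivalent: discard the snapshot and re-read the directory. */
int sd_rewind(struct sorted_dir *sd) { return sd_load(sd); }
```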

> Doing this at the VFS level would also benefit _most_ filesystems,
> though maybe not ones like XFS or btrfs that have their own preferred
> order.

Possibly, but it's a lot easier inside ext3/ext4 because we have most
of the machinery in place and in use already.

- Ted

2008-08-28 23:14:25

by Andreas Dilger

Subject: Re: Do we need dump for ext4?

On Aug 28, 2008 18:34 -0400, Theodore Ts'o wrote:
> It's fine, because according to POSIX it's undefined what happens to
> files that are created or deleted after the last opendir() or
> rewinddir(). So basically, the b-tree is attached to the opendir, and
> we discard it and re-create it if we ever seek to the beginning of the
> directory.

... presumably only if the directory has been modified in the meantime?
It seems like this is missing from the ext3_dx_readdir() code, and could
be easily achieved by checking inode->i_version vs. filp->f_version or
similar. It looks like this is checked later on, but by that time we've
already discarded everything.
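The same "only rebuild if modified" idea can be illustrated in userspace. The kernel would compare `inode->i_version` with `filp->f_version`; a userspace program has no direct equivalent, so this sketch assumes the directory's mtime and ctime as a stand-in version and rebuilds a cached snapshot only when they differ. `dir_stamp` and `snapshot_stale()` are hypothetical names, and timestamps are a weaker signal than a version counter, since two changes within one clock tick can leave them unchanged.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Stand-in "version" of a directory: its mtime and ctime. */
struct dir_stamp {
    struct timespec mtime, ctime;
};

static int ts_eq(struct timespec a, struct timespec b)
{
    return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}

/* Record the current stamp of path; returns 0 on success, -1 on error. */
int dir_stamp_get(const char *path, struct dir_stamp *out)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    out->mtime = st.st_mtim;
    out->ctime = st.st_ctim;
    return 0;
}

/* On a rewind, decide whether the cached, sorted snapshot must be
 * rebuilt: 1 = rebuild, 0 = keep.  Errors err on the side of rebuilding. */
int snapshot_stale(const char *path, const struct dir_stamp *cached)
{
    struct dir_stamp now;
    if (dir_stamp_get(path, &now) != 0)
        return 1;
    return !(ts_eq(now.mtime, cached->mtime) &&
             ts_eq(now.ctime, cached->ctime));
}
```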


Cheers, Andreas


2008-08-29 14:18:05

by Theodore Ts'o

Subject: Re: Do we need dump for ext4?

Just for completeness's sake, can you apply the following patch to
dump, and then try doing a benchmark run using dump on ext4?

I'm curious how it would compare.

- Ted

Index: dump-0.4b41/dump/traverse.c
===================================================================
--- dump-0.4b41.orig/dump/traverse.c
+++ dump-0.4b41/dump/traverse.c
@@ -157,14 +157,6 @@ int dump_fs_open(const char *disk, ext2_
retval = EXT2_ET_UNSUPP_FEATURE;
ext2fs_close(*fs);
}
- else if ((retval = es->s_feature_incompat &
- ~(EXT2_LIB_FEATURE_INCOMPAT_SUPP |
- EXT3_FEATURE_INCOMPAT_RECOVER))) {
- msg("Unsupported feature(s) 0x%x in filesystem\n",
- retval);
- retval = EXT2_ET_UNSUPP_FEATURE;
- ext2fs_close(*fs);
- }
else {
if (es->s_feature_compat &
EXT3_FEATURE_COMPAT_HAS_JOURNAL &&
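The block the patch deletes is a plain bitmask test: any incompat feature bit in the superblock that is neither in the library's supported set nor the RECOVER flag makes dump refuse the filesystem. A small C sketch of the same arithmetic, using the on-disk incompat bit values from the ext2/3/4 superblock format; `OLD_LIB_INCOMPAT_SUPP` is a hypothetical mask standing in for a pre-ext4 library, not the real `EXT2_LIB_FEATURE_INCOMPAT_SUPP` of any particular e2fsprogs release:

```c
#include <assert.h>
#include <stdint.h>

/* Incompat feature bits from the ext2/3/4 on-disk superblock format. */
#define INCOMPAT_FILETYPE 0x0002u
#define INCOMPAT_RECOVER  0x0004u  /* journal needs recovery */
#define INCOMPAT_META_BG  0x0010u
#define INCOMPAT_EXTENTS  0x0040u
#define INCOMPAT_64BIT    0x0080u
#define INCOMPAT_FLEX_BG  0x0200u

/* Hypothetical supported set for a library that predates ext4. */
#define OLD_LIB_INCOMPAT_SUPP (INCOMPAT_FILETYPE | INCOMPAT_META_BG)

/* Same expression as the removed check: the bits left over are the
 * "Unsupported feature(s)" dump would report; 0 means it proceeds. */
uint32_t unsupported_incompat(uint32_t s_feature_incompat, uint32_t lib_supp)
{
    return s_feature_incompat & ~(lib_supp | INCOMPAT_RECOVER);
}
```

For a superblock with filetype, extents and flex_bg set (0x0242), the hypothetical old mask leaves 0x0240, which is what the removed `msg()` call would have printed.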

2008-08-29 12:37:09

by Eric Sandeen

Subject: Re: Do we need dump for ext4?

Eric Sandeen wrote:
> I was talking to Ric about dump benchmarks, and he was of the impression
> that dump may not be used that often anymore, at least in the
> enterprise. (Ric, hope I'm paraphrasing correctly)
>
> Undaunted :) I ran off and tested an artificial backup scenario:
>
> * Untar a kernel tree into 128 different top level dirs
> * Make a level 0 backup
> * Untar a kernel tree into 128 MORE different top level dirs
> * Make a level 1 backup
>
> 128 kernel trees use about 6.5M inodes and about 80G of space.
>
> I tested ext3 with dump, ext4 with tar, and xfs with xfsdump.
>
> for ext3:
> dump -1 -u -f $DUMPDIR/dump1 $DATADIR
>
> for ext4:
> tar --atime-preserve --xattr --after-date=$DUMPDIR/dump0.tar -cf \
>     $DUMPDIR/dump1.tar $DATADIR
>
> for xfs:
> xfsdump -F -l 1 -f $DUMPDIR/dump0 $DATADIR
>
> DUMPDIR and DATADIR were 2 partitions on the same (fast hardware) lun.
>
> Results:

At Ric's and hch's request, here is tar on the other filesystems as well,
re-sorted by level 0 time. I also put acp into the mix.

Oh, and this time I remembered to set the elevator to something sane
(noop) for this storage (oops, it was cfq last time).

Also, this time the dump/tar/acp output was written to /dev/null rather than
another filesystem. Interesting how routing to /dev/null alone changed
the ranking quite a bit.

             level0   level1
             ======   ======
ext4-acp     12m22s   ------
ext3-acp     14m11s   ------
ext4-tar     18m24s   34m56s
ext3-dump    19m30s   35m30s
xfs-dump     20m07s   40m24s
ext3-tar     21m16s   42m41s
xfs-tar      21m19s   46m13s
xfs-acp      29m38s   ------

-Eric

2008-08-29 18:16:31

by Eric Sandeen

Subject: Re: Do we need dump for ext4?

Theodore Tso wrote:
> Just for completeness's sake, can you apply the following patch to
> dump, and then try doing a benchmark run using dump on ext4?
>
> I'm curious how it would compare.
>
> - Ted
>
> Index: dump-0.4b41/dump/traverse.c
> ===================================================================
> --- dump-0.4b41.orig/dump/traverse.c
> +++ dump-0.4b41/dump/traverse.c
> @@ -157,14 +157,6 @@ int dump_fs_open(const char *disk, ext2_
> retval = EXT2_ET_UNSUPP_FEATURE;
> ext2fs_close(*fs);
> }
> - else if ((retval = es->s_feature_incompat &
> - ~(EXT2_LIB_FEATURE_INCOMPAT_SUPP |
> - EXT3_FEATURE_INCOMPAT_RECOVER))) {
> - msg("Unsupported feature(s) 0x%x in filesystem\n",
> - retval);
> - retval = EXT2_ET_UNSUPP_FEATURE;
> - ext2fs_close(*fs);
> - }
> else {
> if (es->s_feature_compat &
> EXT3_FEATURE_COMPAT_HAS_JOURNAL &&

Yep, will do. I should have included it in the run (last, in case
anything exploded...) :)

I'm traveling but will give it a shot soon.

-Eric

2008-08-31 02:43:13

by Andreas Dilger

Subject: Re: Do we need dump for ext4?

On Aug 29, 2008 07:36 -0500, Eric Sandeen wrote:
> at Ric & hch's request here is tar on the other fs's as well, re-sorted
> by level 0 dump time. I put acp into the mix as well.
>
> Oh, and this time I remembered to set the elevator to something sane
> (noop) for this storage, oops (was cfq last time)
>
> Also, this time the dump/tar/acp output was written to /dev/null rather than
> another filesystem. Interesting how routing to /dev/null alone changed
> the ranking quite a bit.

Note that tar has a (questionable) optimization when writing to /dev/null.
It will NOT open the file or read the data, and just do the filename
traversal to generate the file list and total file size. It does this by
comparing the output file to "/dev/null":


$ strace tar cvf /dev/null /tmp/scheduler.pdf
:
[opening libs and stuff]
:
creat("/dev/null", 0666) = 3
fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
write(1, "/tmp/scheduler.pdf\n", 19/tmp/scheduler.pdf
) = 19
lstat("/tmp/scheduler.pdf", {st_mode=S_IFREG|0664, st_size=150262, ...}) = 0
close(3) = 0
close(1) = 0
munmap(0x7f2c00dd7000, 4096) = 0
close(2) = 0
exit_group(0) = ?



A way to get around this is to create a copy of the /dev/null device node (a new inode with the same major/minor numbers, not a hard link) and use that as the output:

# cp -a /dev/null /tmp/foo
$ ls -l /tmp/foo
0 crw-rw-rw- 1 root root 1, 3 2008-08-18 18:15 /tmp/foo
$ strace tar cvf /tmp/foo /tmp/scheduler.pdf
:
[opening libs and stuff]
:
creat("/tmp/foo", 0666) = 3
fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
stat("/dev/null", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
lstat("/tmp/scheduler.pdf", {st_mode=S_IFREG|0664, st_size=150262, ...}) = 0
open("/tmp/scheduler.pdf", O_RDONLY) = 4
write(1, "/tmp/scheduler.pdf\n", 19/tmp/scheduler.pdf) = 19
read(4, "%PDF-1.4\n%\303\244\303\274\303\266\303\237\n2 0 obj\n<</Le"..., 9728) = 9728
write(3, "tmp/scheduler.pdf\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 10240) = 10240
read(4, "\5\351\204\216\220N\224\304\344\331\355\366\264\216\311n\345\365\224\370\352.\205\354\341\0237\306\31>\201"..., 10240) = 10240
:
[repeats]
:
write(3, "\253\2708\264\262\275\36\335\323\212\262\33\244\301\226\246\256d^\214n\246\220BW\363\306\25b%_L"..., 10240) = 10240
read(4, "i\264\364\243\264\220\327+u\3B\1\21\210\376\23"..., 7414) = 7414
fstat(4, {st_mode=S_IFREG|0664, st_size=150262, ...}) = 0
close(4) = 0
write(3, "i\264\364\243\264\220\327+u\3B\1\21\210\376\23"..., 10240) = 10240
close(3) = 0
close(1) = 0
munmap(0x7fb194cea000, 4096) = 0
close(2) = 0
exit_group(0) = ?
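The two traces suggest that tar first compares the archive name with "/dev/null" and otherwise stat()s /dev/null to compare it with the archive, and that a separate device node is not recognized. A minimal C sketch of such a check, assuming it compares device and inode numbers (the exact test inside GNU tar is not shown in the traces); `output_is_dev_null()`, `output_is_null_device()` and `path_is_dev_null()` are hypothetical helpers. The second variant compares the character device number instead and would also catch the /tmp/foo copy.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Identity check: is fd the very same inode as /dev/null?  A copy made
 * with cp -a is a different inode and therefore fails this test. */
int output_is_dev_null(int fd)
{
    struct stat out, null;
    if (fstat(fd, &out) != 0 || stat("/dev/null", &null) != 0)
        return 0;
    return S_ISCHR(out.st_mode) &&
           out.st_dev == null.st_dev && out.st_ino == null.st_ino;
}

/* Device-number check: any character device with /dev/null's major and
 * minor numbers (1, 3 on Linux) matches, including the copy. */
int output_is_null_device(int fd)
{
    struct stat out, null;
    if (fstat(fd, &out) != 0 || stat("/dev/null", &null) != 0)
        return 0;
    return S_ISCHR(out.st_mode) && S_ISCHR(null.st_mode) &&
           out.st_rdev == null.st_rdev;
}

/* Convenience wrapper that checks a path such as tar's -f argument. */
int path_is_dev_null(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    int r = output_is_dev_null(fd);
    close(fd);
    return r;
}
```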



Cheers, Andreas


2008-09-02 15:53:41

by Eric Sandeen

Subject: Re: Do we need dump for ext4?

Andreas Dilger wrote:
> On Aug 29, 2008 07:36 -0500, Eric Sandeen wrote:
>> at Ric & hch's request here is tar on the other fs's as well, re-sorted
>> by level 0 dump time. I put acp into the mix as well.
>>
>> Oh, and this time I remembered to set the elevator to something sane
>> (noop) for this storage, oops (was cfq last time)
>>
>> Also, this time the dump/tar/acp output was written to /dev/null rather than
>> another filesystem. Interesting how routing to /dev/null alone changed
>> the ranking quite a bit.
>
> Note that tar has a (questionable) optimization when writing to /dev/null.
> It will NOT open the file or read the data, and just do the filename
> traversal to generate the file list and total file size. It does this by
> comparing the output file to "/dev/null":

Ah, crud.

-Eric