Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* next (Sun Oct 21 21:03:14 2007 -0500) 6 commits
(plus those commits pulled in from the topics branches, below)
+ check_ea_in_inode: Cleanup e2fsck_write_inode_full() call
+ blkid.pc, ss.pc: Move private libs from "Libs:" to "Libs.private:"
+ texinfo: Fix directory entries
+ blkid/test_probe.in: Fix temporary files handling
+ Check fgets(3) return value
+ e2image: Fix potential off-by-one fs_device_name buffer overflow
+ Use blk_t type instead of unsigned int or __u32
I'm using a slightly different system than git, where new
minor commits are going into the "next" branch, and where
master is always a strict subset of next. (With git topic
branches get pulled into master, and some branches live in
next for a long time. In contrast, for now I'm only pulling
branches into 'next' when I'm reasonably certain that they can
go into 'master' fairly quickly.) This is an experiment, but
if it's too much trouble to keep 'next' close to 'master', I
might move over to the 'git' scheme.
In any case, everything in 'next' will proably go into
'master' fairly quickly. Pleas test it and let me know if you
find any problems!
* tt/badblocks-cleanup (Mon Oct 22 10:19:20 2007 -0400) 3 commits
+ badblocks: Factor out calls to strtoul to a helper function
+ badblocks: Change unsigned long to blk_t and unsigned int, as
appropriate
+ badblocks: Use unsigned int instead of unsigned long for test
patterns
This branch has been completely moved into 'next', and is
ready to graduate to 'master'.
* tt/chattr-error-code (Mon Oct 22 08:51:39 2007 -0400) 3 commits
+ chattr: provide an exit code in case of failure and add -f flag
+ libe2p: Change iterate_on_dir so that it counts non-zero returns
+ libe2p: Use lstat() instead of stat() in fsetflags() and
fgetflags()
This branch has been completely moved into 'next', and is
ready to graduate to 'master'.
* js/uninit (Sun Oct 21 21:04:24 2007 -0500) 14 commits
- Add m_uninit test case.
- Add new mm_lazy test case.
- Fix test cases.
- Update uninit block group documetation for some of the utilities.
- Make e2fsck uninit block group aware.
- Make debugfs uninit block group aware.
- Make resize2fs uninit block group aware.
- Make dumpe2fs uninit block group aware.
- Make tune2fs uninit block group aware.
- Add support for creating filesystems using uninit block group.
- Rename feature name from gdt_checksum to uninit_groups.
- Add uninit block group support on libe2fs.
- Add initial checksum support.
+ Reorder some of the $(SRCS) in alphabetical order.
This still doesn't pass "make check" in lib/ext2fs, but I'll
take a closer look at this in the near future.
* tt/64bit-bitmaps (Sun Oct 14 22:51:51 2007 -0400) 1 commit
- Initial design for 64-bit bitmaps
I haven't done anything with this yet; hopefully, I'll have
more time in the next two weeks.
* cl/remove-masix (Thu Aug 23 15:09:03 2007 +0800) 5 commits
+ mke2fs: remove masix support
+ e2fsck: remove masix support
+ libext2fs: remove masix support
+ document: remove masix from data structure
+ debugfs: remove masix support
This branch has been completely moved into 'next', and is
ready to graduate to 'master'.
* tt/extents (Mon Aug 20 21:31:11 2007 -0400) 5 commits
- e2fsck: Add support for extents
- Add support for extents to libext2fs
- e2fsck: factor out code to clear an inode into
e2fsck_clear_inode()
- Don't byte swap extents information in the inode
- Allow debugfs to be extended for use by test programs
I haven't done anything with this yet; hopefully, I'll have
more time in the next two weeks.
* js/flex-bg (Mon Aug 13 23:33:14 2007 -0500) 4 commits
- New bitmap and inode table allocation for FLEX_BG
+ Enable FLEX_BG feature support
+ Relax group descriptor checking for FLEX_BG
+ Reserve the INCOMPAT feature number for FLEX_BG.
The basic flex-bg support has been merged into 'master', but
the allocation changes need some more testing, especially for
resize2fs.
* ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 6 commits
- e2fsprogs: Add test case for undoe2fs
- e2fsprogs: Fix the resize inode test case
- e2fsprogs: Support for large inode migration.
- e2fsprogs: Make mke2fs use undo I/O manager.
- e2fsprogs: Add undoe2fs
- e2fsprogs: Add undo I/O manager
I just need to review these changes more. It's on my queue...
* ad/extents-testcases (Thu Jul 12 11:20:14 2007 -0400) 19 commits
. Add extent test: f_extents_shrd_blk
. Add extent test: f_extents_unsorted
. Add extent test: f_extents_res_blk
. Add extent test: f_extents_overlap
. Add extent test: f_extents_orphan_blks
. Add extent test: f_extents_inrlevel-incons
. Add extent test: f_extents_imbalanced_tree
. Add extent test: f_extents_ei_leaf
. Add extent test: f_extents_ei_block
. Add extent test: f_extents_eh_max
. Add extent test: f_extents_eh_magic
. Add extent test: f_extents_eh_entries
. Add extent test: f_extents_eh_depth
. Add extent test: f_extents_ee_start
. Add extent test: f_extents_ee_len
. Add extent test: f_extents_ee_block
. Add extent test: f_extents_bad_blk
. Add extent test: f_extents
. Add extent test: f_bad_disconnected_inode
These are parked until the extents support is more mature. As
the extents code is better suited, these tests will get pulled
into the extents branch.
On Nov 04, 2007 17:42 -0500, Theodore Ts'o wrote:
> * js/uninit (Sun Oct 21 21:04:24 2007 -0500) 14 commits
> - Add m_uninit test case.
> - Add new mm_lazy test case.
> - Fix test cases.
Do you have all of the f_uninit test cases we made?
f_uninit_bad_free_inodes
f_uninit_blk_used_not_set
f_uninit_checksum_bad
f_uninit_disable
f_uninit_enable
f_uninit_inode_past_unused
f_uninit_last_uninit
f_uninit_set_inode_not_set
> - Update uninit block group documetation for some of the utilities.
> - Make e2fsck uninit block group aware.
> - Make debugfs uninit block group aware.
> - Make resize2fs uninit block group aware.
> - Make dumpe2fs uninit block group aware.
> - Make tune2fs uninit block group aware.
> - Add support for creating filesystems using uninit block group.
> - Rename feature name from gdt_checksum to uninit_groups.
> - Add uninit block group support on libe2fs.
> - Add initial checksum support.
> + Reorder some of the $(SRCS) in alphabetical order.
>
> This still doesn't pass "make check" in lib/ext2fs, but I'll
> take a closer look at this in the near future.
That's a bit of a surprise, since the e2fsprogs-uninit patches in
our own CVS repo pass "make check".
Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Mon, Nov 05, 2007 at 10:15:33AM +0800, Andreas Dilger wrote:
> On Nov 04, 2007 17:42 -0500, Theodore Ts'o wrote:
> > * js/uninit (Sun Oct 21 21:04:24 2007 -0500) 14 commits
> > - Add m_uninit test case.
> > - Add new mm_lazy test case.
> > - Fix test cases.
>
> Do you have all of the f_uninit test cases we made?
>
> f_uninit_bad_free_inodes
> f_uninit_blk_used_not_set
> f_uninit_checksum_bad
> f_uninit_disable
> f_uninit_enable
> f_uninit_inode_past_unused
> f_uninit_last_uninit
> f_uninit_set_inode_not_set
Nope, it wasn't in the patches that Jose sent out. I assume they're
in the CFS e2fsprogs?
> > This still doesn't pass "make check" in lib/ext2fs, but I'll
> > take a closer look at this in the near future.
>
> That's a bit of a surprise, since the e2fsprogs-uninit patches in
> our own CVS repo pass "make check".
making check in lib/ext2fs
make[1]: Entering directory `/usr/projects/e2fsprogs/e2fsprogs/build/lib/ext2fs'
CC ../../../lib/ext2fs/tst_bitops.c
LD tst_bitops
CC ../../../lib/ext2fs/tst_badblocks.c
LD tst_badblocks
rw_bitmaps.o: In function `read_bitmaps':
/usr/projects/e2fsprogs/e2fsprogs/build/lib/ext2fs/../../../lib/ext2fs/rw_bitmaps.c:237: undefined reference to `ext2fs_group_desc_csum_verify'
/usr/projects/e2fsprogs/e2fsprogs/build/lib/ext2fs/../../../lib/ext2fs/rw_bitmaps.c:260: undefined reference to `ext2fs_group_desc_csum_verify'
collect2: ld returned 1 exit status
I'm pretty sure it's just a Makefile.in screwup; I just haven't had
time to seriously look at the uninit patch series yet.
- Ted
On Nov 04, 2007 23:32 -0500, Theodore Tso wrote:
> On Mon, Nov 05, 2007 at 10:15:33AM +0800, Andreas Dilger wrote:
> > On Nov 04, 2007 17:42 -0500, Theodore Ts'o wrote:
> > > * js/uninit (Sun Oct 21 21:04:24 2007 -0500) 14 commits
> > > - Add m_uninit test case.
> > > - Add new mm_lazy test case.
> > > - Fix test cases.
> >
> > Do you have all of the f_uninit test cases we made?
> >
> > f_uninit_bad_free_inodes
> > f_uninit_blk_used_not_set
> > f_uninit_checksum_bad
> > f_uninit_disable
> > f_uninit_enable
> > f_uninit_inode_past_unused
> > f_uninit_last_uninit
> > f_uninit_set_inode_not_set
>
> Nope, it wasn't in the patches that Jose sent out. I assume they're
> in the CFS e2fsprogs?
Yes. You can get our full patch series at:
ftp://ftp.lustre.org/pub/lustre/other/e2fsprogs/e2fsprogs-1.40.2-cfs1.patches.tgz
Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* js/uninit (Sun Oct 21 21:04:24 2007 -0500) 14 commits
- Add m_uninit test case.
- Add new mm_lazy test case.
- Fix test cases.
- Update uninit block group documetation for some of the utilities.
- Make e2fsck uninit block group aware.
- Make debugfs uninit block group aware.
- Make resize2fs uninit block group aware.
- Make dumpe2fs uninit block group aware.
- Make tune2fs uninit block group aware.
- Add support for creating filesystems using uninit block group.
- Rename feature name from gdt_checksum to uninit_groups.
- Add uninit block group support on libe2fs.
- Add initial checksum support.
+ Reorder some of the $(SRCS) in alphabetical order.
The uninit patch set is the next major set on my "to-merge"
list. I've fixed this patch set so that "make check" works.
* tt/64bit-bitmaps (Sun Oct 14 22:51:51 2007 -0400) 1 commit
- Initial design for 64-bit bitmaps
No change; still needs work, obviously.
* tt/extents (Mon Aug 20 21:31:11 2007 -0400) 5 commits
- e2fsck: Add support for extents
- Add support for extents to libext2fs
- e2fsck: factor out code to clear an inode into
e2fsck_clear_inode()
- Don't byte swap extents information in the inode
- Allow debugfs to be extended for use by test programs
I've started adding some additional checking into e2fsck, but
it still needs work.
* js/flex-bg (Mon Aug 13 23:33:14 2007 -0500) 1 commit
- New bitmap and inode table allocation for FLEX_BG
* ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 6 commits
- e2fsprogs: Add test case for undoe2fs
- e2fsprogs: Fix the resize inode test case
- e2fsprogs: Support for large inode migration.
- e2fsprogs: Make mke2fs use undo I/O manager.
- e2fsprogs: Add undoe2fs
- e2fsprogs: Add undo I/O manager
* ad/extents-testcases (Thu Jul 12 11:20:14 2007 -0400) 19 commits
. Add extent test: f_extents_shrd_blk
. Add extent test: f_extents_unsorted
. Add extent test: f_extents_res_blk
. Add extent test: f_extents_overlap
. Add extent test: f_extents_orphan_blks
. Add extent test: f_extents_inrlevel-incons
. Add extent test: f_extents_imbalanced_tree
. Add extent test: f_extents_ei_leaf
. Add extent test: f_extents_ei_block
. Add extent test: f_extents_eh_max
. Add extent test: f_extents_eh_magic
. Add extent test: f_extents_eh_entries
. Add extent test: f_extents_eh_depth
. Add extent test: f_extents_ee_start
. Add extent test: f_extents_ee_len
. Add extent test: f_extents_ee_block
. Add extent test: f_extents_bad_blk
. Add extent test: f_extents
. Add extent test: f_bad_disconnected_inode
These tests haven't been merged in yet, pending improvement of
the extents patches.
On Dec 17, 2007 12:11 -0500, Theodore Tso wrote:
> * ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 6 commits
> - e2fsprogs: Make mke2fs use undo I/O manager.
Did you see Eric's report that using the undo manager for mke2fs caused the
performance to completely tank? There is already enough memory pressure
caused by a regular mke2fs that having to save the blocks into tdb for a
large filesystem makes it unbearably slow.
We had also wanted to move from using db4 to tdb for the Lustre lfsck data
(collection of EA information for distributed fsck) but even at 10000 files
the tdb performance was growing exponentially slower than db4 and we gave up.
I suspect the same problem hits undo manager when the number of blocks to
save is very high.
I think it might be viable to use undo manager for mke2fs when e.g.
uninit_groups and lazy_bg are enabled, since the amount of device IO is
very small then. Otherwise, it may be that undo manager should be reserved
for e2fsck right now, or possibly reworked to just do IO to a sparse file
similar to doing an e2image before the mke2fs.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Mon, Dec 17, 2007 at 03:34:55PM -0700, Andreas Dilger wrote:
> Did you see Eric's report that using the undo manager for mke2fs caused the
> performance to completely tank? There is already enough memory pressure
> caused by a regular mke2fs that having to save the blocks into tdb for a
> large filesystem makes it unbearably slow.
Yup, I saw that, and had come to the same consluion (only use it when
uninit_groups/lazy_bg is enabled).
> We had also wanted to move from using db4 to tdb for the Lustre lfsck data
> (collection of EA information for distributed fsck) but even at 10000 files
> the tdb performance was growing exponentially slower than db4 and we gave up.
> I suspect the same problem hits undo manager when the number of blocks to
> save is very high.
Hm. I was very concerned about using db4, mainly because of the ABI
and on-disk format compatibility nightmare, which is why I chose tdb.
But the performance problems are starting to make me worry. Do you
know how many tdb entries you had before tdb performance started going
really badly down the toilet? I wonder if there are some tuning knobs
we could tweak to the performance numbers.
- Ted
On Dec 17, 2007 17:59 -0500, Theodore Tso wrote:
> On Mon, Dec 17, 2007 at 03:34:55PM -0700, Andreas Dilger wrote:
> > We had also wanted to move from using db4 to tdb for the Lustre lfsck data
> > (collection of EA information for distributed fsck) but even at 10000 files
> > the tdb performance was growing exponentially slower than db4 and we gave up.
> > I suspect the same problem hits undo manager when the number of blocks to
> > save is very high.
>
> Hm. I was very concerned about using db4, mainly because of the ABI
> and on-disk format compatibility nightmare, which is why I chose tdb.
Yes, we have had all sorts of compatibility problems using db4 (e.g.
RHEL and SLES ship different package names, put the libraries and headers
in different locations, don't support overlapping sets of db4 libraries
between releases, etc), which is why we were hoping to be able to use tdb.
> But the performance problems are starting to make me worry. Do you
> know how many tdb entries you had before tdb performance started going
> really badly down the toilet? I wonder if there are some tuning knobs
> we could tweak to the performance numbers.
There is some test data at https://bugzilla.lustre.org/attachment.cgi?id=13924
which is a PDF file. This shows 1000 items is reasonable, and 10000 is not.
The majority of the time is taken looking up existing entries, and this
is due to one unusual requirement of the Lustre usage to be notified
of duplicate insertions to detect duplicate use of objects, so this may
have been a major factor in the slowdown. It isn't really practical to
use a regular libext2fs bitmap for our case, since the collision space
is a 64-bit integer, but maybe we could have done this with an RB tree
or some other mechanism.
So, your mileage may vary with the undo manager usage, but it is
definitely worth writing a test case (e.g. time creation of filesystems of
progressively larger size on a large device) and seeing how bad it gets.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Mon, Dec 17, 2007 at 04:36:34PM -0700, Andreas Dilger wrote:
> > But the performance problems are starting to make me worry. Do you
> > know how many tdb entries you had before tdb performance started going
> > really badly down the toilet? I wonder if there are some tuning knobs
> > we could tweak to the performance numbers.
>
> There is some test data at
> https://bugzilla.lustre.org/attachment.cgi?id=13924 which is a PDF
> file. This shows 1000 items is reasonable, and 10000 is not.
I did some research, and the problem is that tdb uses a fixed number
of buckets for its hash size. By default it is 131 hash buckets, but
you can pass in a different hash size when you create the tdb table.
So with 10,000 items, you will have an average of 76 objects per hash
chain, and that doesn't work terribly well, obviously. Berkdb's hash
method uses an extensible hashing system which increases number of
bits that are used in the hash, and breaks up the hash buckets as they
get too big, which is a much nicer self-tuning algorithm. With tdb,
you need to know from the get-go how much stuff you're likely going to
be storing in the tdb system, and adjust your hash size accordingly.
> The majority of the time is taken looking up existing entries, and this
> is due to one unusual requirement of the Lustre usage to be notified
> of duplicate insertions to detect duplicate use of objects, so this may
> have been a major factor in the slowdown. It isn't really practical to
> use a regular libext2fs bitmap for our case, since the collision space
> is a 64-bit integer, but maybe we could have done this with an RB tree
> or some other mechanism.
Well, if you only need an in-core data structure, and it doesn't need
to be stored on disk, have you looked at e2fsck/dict.c, which was
lifted from Kazlib? It's a userspace, single file, in-memory only RB
tree implementation.
Regards,
- Ted
* Theodore Tso:
> Hm. I was very concerned about using db4, mainly because of the ABI
> and on-disk format compatibility nightmare, which is why I chose tdb.
If you don't use the transactional data store, the B-tree disk format
hasn't changed in years. The API is fairly stable, too (not the ABI,
of course, because it's deliberately tied to a particular version by
most distros).
The main issues I see with Berkeley DB are: it assumes atomic page
writes (even in transactional data store mode), it writes database
files in a way that maximizes file-system level fragmentation, and
sequential scans are unbearably slow. Oh, and most applications using
TDS do not handle recovery and upgrades properly. (For them, SQLite
would probably have been a better choice. 8-/)
--
Florian Weimer <[email protected]>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstra?e 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
On Mon, 17 Dec 2007 12:11:00 -0500
Theodore Tso <[email protected]> wrote:
> Here are the topics that have been cooking. Commits prefixed
> with '-' are only in 'pu' while commits prefixed with '+' are
> in 'next'. The topics list the commits in reverse chronological
> order.
...
> * js/flex-bg (Mon Aug 13 23:33:14 2007 -0500) 1 commit
> - New bitmap and inode table allocation for FLEX_BG
I've started fixing this patch in order to address resize2fs problems
with the previous patch. I've got a patch that seems to do the right
thing now. Let me know if you agree with the general approach of the
patch and I'll fix it to put it in shape to get it added to the next
branch. Still incomplete, but there is enough for you to let me know
if you like this approach better than the previous patch.
-JRS
commit 37570ae6196045ce02a25f6c95fbdd103633bfb5
Author: Jose R. Santos <[email protected]>
Date: Sat Dec 15 08:09:35 2007 -0600
New bitmap and inode table allocation for FLEX_BG
Signed-off-by: Jose R. Santos <[email protected]>
diff --git a/lib/ext2fs/alloc_tables.c b/lib/ext2fs/alloc_tables.c
index 4ad2ba9..598a360 100644
--- a/lib/ext2fs/alloc_tables.c
+++ b/lib/ext2fs/alloc_tables.c
@@ -27,18 +27,55 @@
#include "ext2_fs.h"
#include "ext2fs.h"
+blk_t ext2fs_flexbg_offset(ext2_filsys fs, dgrp_t group, int flexbg_size,
+ ext2fs_block_bitmap bmap, int offset, int size)
+{
+ int flexbg;
+ errcode_t retval;
+ blk_t start_blk, last_blk, first_free = 0;
+ dgrp_t last_grp;
+
+ flexbg = group/flexbg_size;
+
+ last_grp = (group + (flexbg_size - (group % flexbg_size) - 1));
+ start_blk = ext2fs_group_first_block(fs, flexbg_size * flexbg);
+ last_blk = ext2fs_group_last_block(fs, last_grp);
+ if (last_grp > fs->group_desc_count)
+ last_grp = fs->group_desc_count;
+
+ if (ext2fs_get_free_blocks(fs, start_blk, last_blk, 1, bmap,
+ &first_free))
+ return first_free;
+
+ if (ext2fs_get_free_blocks(fs, first_free + offset, last_blk, size,
+ bmap, &first_free))
+ return first_free;
+
+ return first_free;
+}
+
errcode_t ext2fs_allocate_group_table(ext2_filsys fs, dgrp_t group,
ext2fs_block_bitmap bmap)
{
errcode_t retval;
blk_t group_blk, start_blk, last_blk, new_blk, blk;
- int j;
+ dgrp_t last_grp;
+ int j, rem_grps, flexbg_size = 0;
group_blk = ext2fs_group_first_block(fs, group);
last_blk = ext2fs_group_last_block(fs, group);
if (!bmap)
bmap = fs->block_map;
+
+ if (EXT2_HAS_INCOMPAT_FEATURE (fs->super,
+ EXT4_FEATURE_INCOMPAT_FLEX_BG)) {
+ flexbg_size = ext2fs_swab16(fs->super->s_flex_bg_size);
+ rem_grps = flexbg_size - (group % flexbg_size);
+ last_grp = group + rem_grps - 1;
+ if (last_grp > fs->group_desc_count)
+ last_grp = fs->group_desc_count;
+ }
/*
* Allocate the block and inode bitmaps, if necessary
@@ -56,6 +93,12 @@ errcode_t ext2fs_allocate_group_table(ext2_filsys fs, dgrp_t group,
} else
start_blk = group_blk;
+ if (flexbg_size) {
+ start_blk = ext2fs_flexbg_offset (fs, group, flexbg_size, bmap,
+ 0, rem_grps);
+ last_blk = ext2fs_group_last_block(fs, last_grp);
+ }
+
if (!fs->group_desc[group].bg_block_bitmap) {
retval = ext2fs_get_free_blocks(fs, start_blk, last_blk,
1, bmap, &new_blk);
@@ -68,6 +111,12 @@ errcode_t ext2fs_allocate_group_table(ext2_filsys fs, dgrp_t group,
fs->group_desc[group].bg_block_bitmap = new_blk;
}
+ if (flexbg_size) {
+ start_blk = ext2fs_flexbg_offset (fs, group, flexbg_size, bmap,
+ flexbg_size, rem_grps);
+ last_blk = ext2fs_group_last_block(fs, last_grp);
+ }
+
if (!fs->group_desc[group].bg_inode_bitmap) {
retval = ext2fs_get_free_blocks(fs, start_blk, last_blk,
1, bmap, &new_blk);
@@ -83,6 +132,13 @@ errcode_t ext2fs_allocate_group_table(ext2_filsys fs, dgrp_t group,
/*
* Allocate the inode table
*/
+ if (flexbg_size) {
+ group_blk = ext2fs_flexbg_offset (fs, group, flexbg_size, bmap,
+ flexbg_size * 2,
+ fs->inode_blocks_per_group * rem_grps);
+ last_blk = ext2fs_group_last_block(fs, last_grp);
+ }
+
if (!fs->group_desc[group].bg_inode_table) {
retval = ext2fs_get_free_blocks(fs, group_blk, last_blk,
fs->inode_blocks_per_group,
@@ -112,6 +168,7 @@ errcode_t ext2fs_allocate_tables(ext2_filsys fs)
if (retval)
return retval;
}
+
return 0;
}
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index 2394857..7528707 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -577,7 +577,9 @@ struct ext2_super_block {
__u16 s_mmp_interval; /* # seconds to wait in MMP checking */
__u64 s_mmp_block; /* Block for multi-mount protection */
__u32 s_raid_stripe_width; /* blocks on all data disks (N*stride)*/
- __u32 s_reserved[163]; /* Padding to the end of the block */
+ __u16 s_flex_bg_size; /* FLEX_BG group size */
+ __u16 padding; /* Padding to next 32bits */
+ __u32 s_reserved[162]; /* Padding to the end of the block */
};
/*
diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
index 16e9eaa..935a013 100644
--- a/lib/ext2fs/initialize.c
+++ b/lib/ext2fs/initialize.c
@@ -156,6 +156,7 @@ errcode_t ext2fs_initialize(const char *name, int flags,
set_field(s_feature_incompat, 0);
set_field(s_feature_ro_compat, 0);
set_field(s_first_meta_bg, 0);
+ set_field(s_flex_bg_size, 0);
if (super->s_feature_incompat & ~EXT2_LIB_FEATURE_INCOMPAT_SUPP) {
retval = EXT2_ET_UNSUPP_FEATURE;
goto cleanup;
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 9e2d7a8..7319832 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -96,7 +96,7 @@ static void usage(void)
{
fprintf(stderr, _("Usage: %s [-c|-t|-l filename] [-b block-size] "
"[-f fragment-size]\n\t[-i bytes-per-inode] [-I inode-size] "
- "[-j] [-J journal-options]\n"
+ "[-j] [-J journal-options] [-G meta group size]\n"
"\t[-N number-of-inodes] [-m reserved-blocks-percentage] "
"[-o creator-os]\n\t[-g blocks-per-group] [-L volume-label] "
"[-M last-mounted-directory]\n\t[-O feature[,...]] "
@@ -912,6 +912,7 @@ static void PRS(int argc, char *argv[])
int blocksize = 0;
int inode_ratio = 0;
int inode_size = 0;
+ unsigned long flex_bg_size = 0;
double reserved_ratio = 5.0;
int sector_size = 0;
int show_version_only = 0;
@@ -994,7 +995,7 @@ static void PRS(int argc, char *argv[])
}
while ((c = getopt (argc, argv,
- "b:cf:g:i:jl:m:no:qr:s:tvE:FI:J:L:M:N:O:R:ST:V")) != EOF) {
+ "b:cf:g:G:i:jl:m:no:qr:s:tvE:FI:J:L:M:N:O:R:ST:V")) != EOF) {
switch (c) {
case 'b':
blocksize = strtol(optarg, &tmp, 0);
@@ -1045,6 +1046,19 @@ static void PRS(int argc, char *argv[])
exit(1);
}
break;
+ case 'G':
+ flex_bg_size = strtoul(optarg, &tmp, 0);
+ if (*tmp) {
+ com_err(program_name, 0,
+ _("Illegal number for Flex_BG size"));
+ exit(1);
+ }
+ if ((flex_bg_size & (flex_bg_size-1)) != 0) {
+ com_err(program_name, 0,
+ _("Flex_BG size must be a power of 2"));
+ exit(1);
+ }
+ break;
case 'i':
inode_ratio = strtoul(optarg, &tmp, 0);
if (inode_ratio < EXT2_MIN_BLOCK_SIZE ||
@@ -1444,6 +1458,10 @@ static void PRS(int argc, char *argv[])
}
}
+ if(flex_bg_size) {
+ fs_param.s_flex_bg_size = ext2fs_swab16(flex_bg_size);
+ }
+
if (!force && fs_param.s_blocks_count >= ((unsigned) 1 << 31)) {
com_err(program_name, 0,
_("Filesystem too large. No more than 2**31-1 blocks\n"
I'ts been a while since I've pushed one of these out. This updates pu
against the latest "next" branch, so we can make sure all of the
changes that had been folded into "maint" is included on the pu
branch. It also includes more work on the tt/extents patches.
Tune2fs and e2image are now functional with extents, and e2fsck has
more extents checking added. Still a bit more work to be done there,
and I also have to add code to deal with symlinks that have the
EXTENTS_FL flag set.
I've dropped the js/flex-bg branch since it had conflicts with the
on-disk bg flags, and it had conflicts with the exsting patches that
caused me concern. Jose, once you regenerate the patch and deal with
comments, I'll add it back in.
My TODO list:
finish e2fsck extents support
more flexible mke2fs.conf handling
EXT4 -- allow mke2fs 8TB without -f
reserve a few extra resize blocks for small filesystems
disable metabg/resize combination
adjust big, medium, small, thresholds
256 inode size
flexbg patch from Jose
- Ted
Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* js/uninit (Sun Oct 21 21:04:24 2007 -0500) 14 commits
- Add m_uninit test case.
- Add new mm_lazy test case.
- Fix test cases.
- Update uninit block group documetation for some of the utilities.
- Make e2fsck uninit block group aware.
- Make debugfs uninit block group aware.
- Make resize2fs uninit block group aware.
- Make dumpe2fs uninit block group aware.
- Make tune2fs uninit block group aware.
- Add support for creating filesystems using uninit block group.
- Rename feature name from gdt_checksum to uninit_groups.
- Add uninit block group support on libe2fs.
- Add initial checksum support.
* tt/64bit-bitmaps (Sun Oct 14 22:51:51 2007 -0400) 2 commits
- Initial design for 64-bit bitmaps
* tt/extents (Mon Aug 20 21:31:11 2007 -0400) 8 commits
- e2fsck: Add support for extents
- Use BLOCK_FLAG_READ_ONLY flag in debugfs, e2image, and tune2fs
- Add read-only extents support to ext2fs_block_iterate2()
- Add support for extents to libext2fs
- e2fsck: factor out code to clear an inode into
e2fsck_clear_inode()
- Don't byte swap extents information in the inode
- Allow debugfs to be extended for use by test programs
* ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 7 commits
- e2fsprogs: Add test case for undoe2fs
- e2fsprogs: Fix the resize inode test case
- e2fsprogs: Support for large inode migration.
- e2fsprogs: Make mke2fs use undo I/O manager.
- e2fsprogs: Add undoe2fs
- e2fsprogs: Add undo I/O manager
Theodore Tso wrote:
> I'ts been a while since I've pushed one of these out. This updates pu
> against the latest "next" branch, so we can make sure all of the
> changes that had been folded into "maint" is included on the pu
> branch. It also includes more work on the tt/extents patches.
> Tune2fs and e2image are now functional with extents, and e2fsck has
> more extents checking added. Still a bit more work to be done there,
> and I also have to add code to deal with symlinks that have the
> EXTENTS_FL flag set.
>
> I've dropped the js/flex-bg branch since it had conflicts with the
> on-disk bg flags, and it had conflicts with the exsting patches that
> caused me concern. Jose, once you regenerate the patch and deal with
> comments, I'll add it back in.
>
> My TODO list:
>
> finish e2fsck extents support
> more flexible mke2fs.conf handling
> EXT4 -- allow mke2fs 8TB without -f
ext3 should be fine as well, right? And I think Josef sent a patch...
-Eric
On Sun, Feb 10, 2008 at 11:08:11PM -0600, Eric Sandeen wrote:
> Theodore Tso wrote:
> > I'ts been a while since I've pushed one of these out. This updates pu
> > against the latest "next" branch, so we can make sure all of the
> > changes that had been folded into "maint" is included on the pu
> > branch. It also includes more work on the tt/extents patches.
> > Tune2fs and e2image are now functional with extents, and e2fsck has
> > more extents checking added. Still a bit more work to be done there,
> > and I also have to add code to deal with symlinks that have the
> > EXTENTS_FL flag set.
> >
> > I've dropped the js/flex-bg branch since it had conflicts with the
> > on-disk bg flags, and it had conflicts with the exsting patches that
> > caused me concern. Jose, once you regenerate the patch and deal with
> > comments, I'll add it back in.
> >
> > My TODO list:
> >
> > finish e2fsck extents support
> > more flexible mke2fs.conf handling
> > EXT4 -- allow mke2fs 8TB without -f
>
> ext3 should be fine as well, right? And I think Josef sent a patch...
I think so, yes.
- Ted
The big news here is that extents branch is sufficiently finished that
I've promoted it to the next branch.
- Ted
Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* tt/64bit-bitmaps (Mon Feb 18 23:54:53 2008 -0500) 1 commit
- Initial design for 64-bit bitmaps
* next (Mon Feb 18 23:01:43 2008 -0500) 3 commits
+ e2fsprogs-tests-f_ea_checks.patch
+ e2fsprogs-tests-f_unsorted_EAs.patch
+ Improve support for in-inode EA's
* tt/extents (Mon Feb 18 20:11:07 2008 -0500) 10 commits
+ Activate basic f_extents test case
+ e2fsck: Add support for extents
+ Add read/only support for extents to ext2fs_bmap()
+ Use BLOCK_FLAG_READ_ONLY flag in debugfs, e2image, and tune2fs
+ Add read-only extents support to ext2fs_block_iterate2()
+ Add support for extents to libext2fs
+ e2fsck: factor out code to clear an inode into
e2fsck_clear_inode()
+ Don't byte swap extents information in the inode
+ Allow debugfs to be extended for use by test programs
+ debugfs: Fix error handling in strtoblk() and
common_block_args_process()
* js/flex-bg (Wed Feb 13 20:47:50 2008 -0600) 1 commit
- e2fsprogs: New bitmap and inode table allocation for FLEX_BG v2
* js/uninit (Sun Oct 21 21:04:24 2007 -0500) 13 commits
- Add m_uninit test case.
- Add new mm_lazy test case.
- Fix test cases.
- Update uninit block group documetation for some of the utilities.
- Make e2fsck uninit block group aware.
- Make debugfs uninit block group aware.
- Make resize2fs uninit block group aware.
- Make dumpe2fs uninit block group aware.
- Make tune2fs uninit block group aware.
- Add support for creating filesystems using uninit block group.
- Rename feature name from gdt_checksum to uninit_groups.
- Add uninit block group support on libe2fs.
- Add initial checksum support.
* ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 6 commits
- e2fsprogs: Add test case for undoe2fs
- e2fsprogs: Fix the resize inode test case
- e2fsprogs: Support for large inode migration.
- e2fsprogs: Make mke2fs use undo I/O manager.
- e2fsprogs: Add undoe2fs
- e2fsprogs: Add undo I/O manager
Theodore Tso wrote:
> The big news here is that extents branch is sufficiently finished that
> I've promoted it to the next branch.
Hey, that's great news. :)
Out of curiosity; what is the plan for behavior when finding symlinks
with the extents flag set? Last I checked e2fsprogs-interim, they got
clobbered. Is that on the TODO before extents support goes "live?"
Thanks,
-Eric
On Wed, Feb 20, 2008 at 12:46:57PM -0600, Eric Sandeen wrote:
> Theodore Tso wrote:
> > The big news here is that extents branch is sufficiently finished that
> > I've promoted it to the next branch.
>
> Hey, that's great news. :)
>
> Out of curiosity; what is the plan for behavior when finding symlinks
> with the extents flag set? Last I checked e2fsprogs-interim, they got
> clobbered. Is that on the TODO before extents support goes "live?"
Right now on e2fpsrogs 'next' the extents flag being set on symlinks,
block/char devices (in general inodes for which
ext2fs_has_valid_blocks returns NULL) are ignored. I need to make
sure this does the right thing with long symlinks --- and I'd argue
that given that long symlinks can only *ever* be one block long, it's
pointless to use the extents format, since it's just too complicated,
and I *think* that's what the kernel code is doing, but I need to
check to be sure.
Eventually I'll add code to mainline to clear EXTENTS_FL from inodes
that shouldn't have it, but the kernel patches need to hit mainline first.
At this point, I think the only thing the 'pu' branch is missing is
the DIR_NLINKS patch, and I'll get that pulled in fairly quickly. At
that point I believe the 'pu' branch and e2fsprogs-interim should be
of roughly the same functionality, except of course that the patches
which compose e2fsprogs-interim have gotten a lot more testing thanks
to Clusterfs using it in with Lustre, and the fact that at the moment,
if there are blocks that are claimed by multiple inodes, the 'clone'
feature which clones the block so that both inodes get their own copy
is not supported. The filesystem can be made consistent by deleting
one of the files, which is all UFS-style fsck's shipping with
professional Unix systems like Solaris support :-P, but obviously
that's not ideal. :-)
So one question which Eric you'll want to consider is at what point
you want to switch the FC users from e2fsprogs-interim to the
bleeding-edge e2fsprogs mainline code, since eventually this is the
branch that will have blk64_t support.
One related question is whether we want to try to get support for full
64-bit physical block numbers into ext4. I think there were some
rough draft patches floating about, but IIRC they didn't
simultaneously support the 48-bit extent format. The e2fsprogs
mainline implementation of extents makes it fairly easy to support new
extents formats --- it's minimal code changes in a single file,
lib/ext2fs/extents.c, and made sure all of the infrastructure was
present to make this easy --- but I believe that trying to support
multiple formats in the kernel given the current ext4 code would be
fairly invasive. Given the timeline, I'm assuming that our current
path is that we aren't planning on pushing full 64-bit physical block
support into the extents format for what we hope will make it into the
next round of enterprise kernels, but I thought I should throw out the
question so we make the decision one way or the other, explicitly.
Comments?
- Ted
Theodore Tso wrote:
> On Wed, Feb 20, 2008 at 12:46:57PM -0600, Eric Sandeen wrote:
>> Theodore Tso wrote:
>>> The big news here is that extents branch is sufficiently finished that
>>> I've promoted it to the next branch.
>> Hey, that's great news. :)
>>
>> Out of curiosity; what is the plan for behavior when finding symlinks
>> with the extents flag set? Last I checked e2fsprogs-interim, they got
>> clobbered. Is that on the TODO before extents support goes "live?"
>
> Right now on e2fpsrogs 'next' the extents flag being set on symlinks,
> block/char devices (in general inodes for which
> ext2fs_has_valid_blocks returns NULL) are ignored. I need to make
> sure this does the right thing with long symlinks --- and I'd argue
> that given that long symlinks can only *ever* be one block long, it's
> pointless to use the extents format, since it's just too complicated,
> and I *think* that's what the kernel code is doing, but I need to
> check to be sure.
After the patches I sent a couple days ago, which Aneesh then collapsed
into the ext4_new_inode logic, that should be the case (for both types
of symlinks)
> Eventually I'll add code to mainline to clear EXTENTS_FL from inodes
> that shouldn't have it, but the kernel patches need to hit mainline first.
Ok, but my concern is what happens to those long symlinks when they
really truly are in extents format. One option is to say "hey it was
ext4DEV, deal with it" and nuke the symlink, but if possible, converting
back to the proper format would be nice.
> ...
> So one question which Eric you'll want to consider is at what point
> you want to switch the FC users from e2fsprogs-interim to the
> bleeding-edge e2fsprogs mainline code, since eventually this is the
> branch that will have blk64_t support.
Well, honestly I haven't even put -interim into rawhide yet, although
I'm fairly close to that (or -next, whichever makes the most sense)
In my testing a few days ago, e2fsck from -interim blew away the long
symlinks, and I didn't want to foist that upon the folks using ext4dev
today...
> One related question is whether we want to try to get support for full
> 64-bit physical block numbers into ext4. I think there were some
> rough draft patches floating about, but IIRC they didn't
> simultaneously support the 48-bit extent format. The e2fsprogs
> mainline implementation of extents makes it fairly easy to support new
> extents formats --- it's minimal code changes in a single file,
> lib/ext2fs/extents.c, and made sure all of the infrastructure was
> present to make this easy --- but I believe that trying to support
> multiple formats in the kernel given the current ext4 code would be
> fairly invasive. Given the timeline, I'm assuming that our current
> path is that we aren't planning on pushing full 64-bit physical block
> support into the extents format for what we hope will make it into the
> next round of enterprise kernels, but I thought I should throw out the
> question so we make the decision one way or the other, explicitly.
I too had assumed that 48 bits would be it for now; it should be
sufficient for a good while. I guess what I'd like to see if a usable
ext4 out there in the near future, with stuff added on later as
necessary; delalloc, flex_bg (if that doesn't make 2.6.25...) etc.
Oh, speaking of all this - what do you think the criteria are for
dropping the "dev" from ext4dev? How do we decide that it's cooked
enough? :)
Thanks,
-Eric
On Feb 21, 2008 10:40 -0600, Eric Sandeen wrote:
> Ok, but my concern is what happens to those long symlinks when they
> really truly are in extents format. One option is to say "hey it was
> ext4DEV, deal with it" and nuke the symlink, but if possible, converting
> back to the proper format would be nice.
Is that actually the case though? That should be pretty easy to massage
into storing a block number. The difficulty is if the long symlink block
is beyond 32-bit blocknr, in which case it actually needs extents format.
We may as well bite the bullet and fix the code to be the same as with
htree fakeroot index block reading and use the proper mapping to find
the symlink block. See htree_blk_iter_cb() for how to do that.
> > One related question is whether we want to try to get support for full
> > 64-bit physical block numbers into ext4. I think there were some
> > rough draft patches floating about, but IIRC they didn't
> > simultaneously support the 48-bit extent format.
>
> I too had assumed that 48 bits would be it for now; it should be
> sufficient for a good while. I guess what I'd like to see if a usable
> ext4 out there in the near future, with stuff added on later as
> necessary; delalloc, flex_bg (if that doesn't make 2.6.25...) etc.
At some point 32-bit logical block numbers will also be an issue, but
the need for 16TB+ non-sparse single files is rare even in my world.
> Oh, speaking of all this - what do you think the criteria are for
> dropping the "dev" from ext4dev? How do we decide that it's cooked
> enough? :)
I'd say when e2fsprogs has an official release with extents support,
and there are no show-stopping bugs in the existing code... I don't
think that is too far off anymore.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Fri, Feb 22, 2008 at 04:14:34PM -0700, Andreas Dilger wrote:
> On Feb 21, 2008 10:40 -0600, Eric Sandeen wrote:
> > Ok, but my concern is what happens to those long symlinks when they
> > really truly are in extents format. One option is to say "hey it was
> > ext4DEV, deal with it" and nuke the symlink, but if possible, converting
> > back to the proper format would be nice.
>
> Is that actually the case though? That should be pretty easy to massage
> into storing a block number. The difficulty is if the long symlink block
> is beyond 32-bit blocknr, in which case it actually needs extents format.
> We may as well bite the bullet and fix the code to be the same as with
> htree fakeroot index block reading and use the proper mapping to find
> the symlink block. See htree_blk_iter_cb() for how to do that.
So before the recent patch were we actually creating long symlinks in
extents format? Or were we just setting the flag but still treating
them as a block number? If it was the latter, I guess we can put in
code into e2fsck to detect that case, and convert it back to a
singleton block number.
> > I too had assumed that 48 bits would be it for now; it should be
> > sufficient for a good while. I guess what I'd like to see if a usable
> > ext4 out there in the near future, with stuff added on later as
> > necessary; delalloc, flex_bg (if that doesn't make 2.6.25...) etc.
Flex_bg is already in 2.6.25. What's in the patch queue are changes
to the allocation algorithms to make them smarter if flex_bg is
enabled.
> At some point 32-bit logical block numbers will also be an issue, but
> the need for 16TB+ non-sparse single files is rare even in my world.
Yeah, I think the real issue is unless someone is willing to sign up
to support a new extent format (which will require significant code
revamp in the kernel, since right now it pretty much assumes only a
single extent format from a code structure point of view), it's
probably not going to happen in the near future.
> > Oh, speaking of all this - what do you think the criteria are for
> > dropping the "dev" from ext4dev? How do we decide that it's cooked
> > enough? :)
>
> I'd say when e2fsprogs has an official release with extents support,
> and there are no show-stopping bugs in the existing code... I don't
> think that is too far off anymore.
I guess I'd be a *bit* more cautious. We still have some code patches
such as the delayed allocation and to a lesser extent the online
defrag patches which have the possibility of introducing bugs. Once
all of those get merged and we have a full kernel release cycle to fix
the last remaining bugs, that's when I would drop the -dev from the
name.
Speaking of which, that's probably one of the things we should start
concentrating on the kernel side, which is preparing what's left in
the unstable part of the queue to be cleaned up and ready for
submission during the next merge window. Yeah, it's a ways, off, but
some of the patches left in the unstable series probably need quite a
bit of work. :-)
- Ted
On Feb 22, 2008 19:15 -0500, Theodore Ts'o wrote:
> So before the recent patch were we actually creating long symlinks in
> extents format? Or were we just setting the flag but still treating
> them as a block number? If it was the latter, I guess we can put in
> code into e2fsck to detect that case, and convert it back to a
> singleton block number.
Eric informed me that the long symlinks were actually stored in extent
mapped blocks. That is not harmful, because it can only be a single
block and it will always fit into the inode. The other thing to note
is that extent mapping is REQUIRED for > 32-bit blocknumbers, so we
may as well fix e2fsprogs to allow these symlinks to be handled normally.
> > I'd say when e2fsprogs has an official release with extents support,
> > and there are no show-stopping bugs in the existing code... I don't
> > think that is too far off anymore.
>
> I guess I'd be a *bit* more cautious. We still have some code patches
> such as the delayed allocation and to a lesser extent the online
> defrag patches which have the possibility of introducing bugs. Once
> all of those get merged and we have a full kernel release cycle to fix
> the last remaining bugs, that's when I would drop the -dev from the
> name.
Well, there isn't any hard requirement to include delalloc into the
first non-dev ext4 release. Yes, it would be desirable, but I don't
think we HAVE to have it. I think the important thing is that the
on-disk format is no longer changing.
One thing that still needs to be done (AFAIK) is the removal of the
"auto-setting" of filesystem feature flags, and allow tune2fs/e2fsck
to set/leave the flags that we currently set automatically. I'd
hazard that for feature flags which have been around a LONG time (like
EAs and LARGEFILE) that these should be enabled by default.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Sun, Feb 24, 2008 at 09:20:50PM -0700, Andreas Dilger wrote:
> On Feb 22, 2008 19:15 -0500, Theodore Ts'o wrote:
> > So before the recent patch were we actually creating long symlinks in
> > extents format? Or were we just setting the flag but still treating
> > them as a block number? If it was the latter, I guess we can put in
> > code into e2fsck to detect that case, and convert it back to a
> > singleton block number.
>
> Eric informed me that the long symlinks were actually stored in extent
> mapped blocks. That is not harmful, because it can only be a single
> block and it will always fit into the inode. The other thing to note
> is that extent mapping is REQUIRED for > 32-bit blocknumbers, so we
> may as well fix e2fsprogs to allow these symlinks to be handled normally.
Well, at least some kernel versions (as of sometime just before
2.6.25, iirc) were storing the long symlink as a single block in
i_block[0], despite EXTENTS_FL being set. Valerie noticed this, and I
confirmed it, as it caused the mainline e2fsck extents support to core
dump. Basically, what this means is that e2fsprogs can't trust
EXTENTS_FL for long symlinks.
But you do raise a good point that we need to support using the
extents format in order to support blocks > 2**32, so we can't just
arbitrary convert all symlinks to the old-style direct block maps.
- Ted
On Mon, Feb 25, 2008 at 10:13:39AM -0500, Theodore Tso wrote:
> On Sun, Feb 24, 2008 at 09:20:50PM -0700, Andreas Dilger wrote:
> > On Feb 22, 2008 19:15 -0500, Theodore Ts'o wrote:
> > > So before the recent patch were we actually creating long symlinks in
> > > extents format? Or were we just setting the flag but still treating
> > > them as a block number? If it was the latter, I guess we can put in
> > > code into e2fsck to detect that case, and convert it back to a
> > > singleton block number.
> >
> > Eric informed me that the long symlinks were actually stored in extent
> > mapped blocks. That is not harmful, because it can only be a single
> > block and it will always fit into the inode. The other thing to note
> > is that extent mapping is REQUIRED for > 32-bit blocknumbers, so we
> > may as well fix e2fsprogs to allow these symlinks to be handled normally.
>
> Well, at least some kernel versions (as of sometime just before
> 2.6.25, iirc) were storing the long symlink as a single block in
> i_block[0], despite EXTENTS_FL being set. Valerie noticed this, and I
> confirmed it, as it caused the mainline e2fsck extents support to core
> dump. Basically, what this means is that e2fsprogs can't trust
> EXTENTS_FL for long symlinks.
>
> But you do raise a good point that we need to support using the
> extents format in order to support blocks > 2**32, so we can't just
> arbitrary convert all symlinks to the old-style direct block maps.
How about the patch like below on top of the patch queue. Patch queue
currently enable extent flag only for directory and file . This patch
add it to normal symlink. With this fast symlink still have the extent
format enabled. I guess this would need a patch to the interim branch of
e2fsprogs to allow normal symlink to have extent format.
-aneesh
ext4: Enable extent format for symlink.
From: Aneesh Kumar K.V <[email protected]>
This patch enable extent format for normal symlink. Extent format enables
to refere file system blocks > 32 bits. Enabling extent format for symlink
enables to have symlink block beyond 2**32 blocks. We still don't enable
extent format for fast symlink.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/ialloc.c | 4 ++--
fs/ext4/namei.c | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 78d1094..1462189 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -842,8 +842,8 @@ got:
goto fail_free_drop;
}
if (test_opt(sb, EXTENTS)) {
- /* set extent flag only for diretory and file */
- if (S_ISDIR(mode) || S_ISREG(mode)) {
+ /* set extent flag only for diretory, file and normal symlink*/
+ if (S_ISDIR(mode) || S_ISREG(mode) || S_ISLNK(mode)) {
EXT4_I(inode)->i_flags |= EXT4_EXTENTS_FL;
ext4_ext_tree_init(handle, inode);
err = ext4_update_incompat_feature(handle, sb,
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index da942bc..63c33e0 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2222,6 +2222,8 @@ retry:
goto out_stop;
}
} else {
+ /* clear the extent format for fast symlink */
+ EXT4_I(inode)->i_flags &= ~EXT4_EXTENTS_FL;
inode->i_op = &ext4_fast_symlink_inode_operations;
memcpy((char*)&EXT4_I(inode)->i_data,symname,l);
inode->i_size = l-1;
Theodore Tso wrote:
> On Sun, Feb 24, 2008 at 09:20:50PM -0700, Andreas Dilger wrote:
>> On Feb 22, 2008 19:15 -0500, Theodore Ts'o wrote:
>>> So before the recent patch were we actually creating long symlinks in
>>> extents format? Or were we just setting the flag but still treating
>>> them as a block number? If it was the latter, I guess we can put in
>>> code into e2fsck to detect that case, and convert it back to a
>>> singleton block number.
>> Eric informed me that the long symlinks were actually stored in extent
>> mapped blocks. That is not harmful, because it can only be a single
>> block and it will always fit into the inode. The other thing to note
>> is that extent mapping is REQUIRED for > 32-bit blocknumbers, so we
>> may as well fix e2fsprogs to allow these symlinks to be handled normally.
>
> Well, at least some kernel versions (as of sometime just before
> 2.6.25, iirc) were storing the long symlink as a single block in
> i_block[0], despite EXTENTS_FL being set. Valerie noticed this, and I
> confirmed it, as it caused the mainline e2fsck extents support to core
> dump. Basically, what this means is that e2fsprogs can't trust
> EXTENTS_FL for long symlinks.
Are you sure? This was her patch comment, from
[PATCH] ext4: Don't set EXTENTS_FL flag for fast symlinks:
> For fast symbolic links, the file content is stored in the i_block[]
> array, which is not compatible with the new file extents format.
> e2fsck reports error on such files because EXTENTS_FL is set.
> Don't set the EXTENTS_FL flag when creating fast symlinks.
so this was for *fast* symlinks, stored internally on top of the i_block
array, right? In that case trying to read it as extents would certainly
cause problems. But, for *long* symlinks I think they truly were stored
in extent format, and as you say...
> But you do raise a good point that we need to support using the
> extents format in order to support blocks > 2**32, so we can't just
> arbitrary convert all symlinks to the old-style direct block maps.
... so I think we really *should* be unconditionally storing *long*
symlinks in extent format, on ext4... right?
-Eric
On Mon, Feb 25, 2008 at 11:32:34AM -0600, Eric Sandeen wrote:
> Are you sure? This was her patch comment, from
> [PATCH] ext4: Don't set EXTENTS_FL flag for fast symlinks:
You're right, I confused myself.
> > But you do raise a good point that we need to support using the
> > extents format in order to support blocks > 2**32, so we can't just
> > arbitrary convert all symlinks to the old-style direct block maps.
>
> ... so I think we really *should* be unconditionally storing *long*
> symlinks in extent format, on ext4... right?
Yes, I think so. Being liberal in what you receive is probably a good
idea, but we should only store new symlinks in one standard format,
and the extents format makes sense since it will allow us to support
48-bit block numbers.
- Ted
The primary changes since last time is more fixups on tt/mkfs-types,
and a refactoring of the patches in js/uninit.
In addition, I've done a lot of analysis and made comments below
regarding what needs to be done before each of these topics are ready
to be merged. If you are are responsible for any of the patches
below, please do take a look.
Although I've checked to make sure that each of these topic branches
passes "make check" on their own, the combined pu branch is failing
due to what appears to be unfortunate interactions between js/flex-bg
and js/uninit. This *may* be because I botched the merge, which got
tricky when I merged in js/uninit into the pu branch which already had
js/flex-bg. If someone could take a look at the misc/mke2fs.c changes
in "git show bffaaf74", I'd much appreciate it.
- Ted
Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* tt/mkfs-types (Tue Feb 19 08:32:58 2008 -0500) 1 commit
- New mke2fs types parsing --- IN PROGRESS
This is missing only documentation updates at this point
* tt/64bit-bitmaps (Mon Feb 18 23:54:53 2008 -0500) 1 commit
- Initial design for 64-bit bitmaps
Untouched since last time
* js/flex-bg (Wed Feb 13 20:47:50 2008 -0600) 1 commit
- e2fsprogs: New bitmap and inode table allocation for FLEX_BG v2
I've noticed that this patch is slightly different from what
Jose sent in for the e2fsprogs-interim branch, so I'm a little
concerned about which is the latest, or whether the
differences are intentional. Jose, if you have time, could
you take a look at commits cb676995 and 8072fe6 and perhaps comment?
* ad/nlinks-dir (Sat Feb 2 01:25:03 2008 -0700) 1 commit
- e2fsprogs-nlinks.patch
I'm really not sure about this change in e2fsck/pass4.c:
- if (fix_problem(ctx, PR_4_BAD_REF_COUNT, &pctx)) {
+ /* i_link_count was previously exceeded, but no longer
+ * is, fix this but don't consider it an error */
+ if ((LINUX_S_ISDIR(inode->i_mode) && link_counted > 1 &&
+ (inode->i_flags & EXT2_INDEX_FL) &&
+ link_count == 1 && !(ctx->options & E2F_OPT_NO)) ||
+ (fix_problem(ctx, PR_4_BAD_REF_COUNT, &pctx))) {
inode->i_links_count = link_counted;
e2fsck_write_inode(ctx, i, inode, "pass4");
}
Why do we require EXT2_INDEX_FL to be set before deciding that
it's OK if the i_link_count is 1 but we now have less than
EXT2_LINK_MAX links?
* js/uninit (Sun Oct 21 21:04:24 2007 -0500) 11 commits
- Add m_uninit test case
- Add new m_lazy test case
- Make e2fsck uninit block group aware
- Make debugfs uninit block group aware
- Make resize2fs uninit block group aware
- Make dumpe2fs uninit block group aware
- Make tune2fs uninit block group aware
- Add support for creating filesystems using uninit block group
- Rename feature name from gdt_checksum to uninit_groups
- Add uninit block group support on libe2fs
- Add initial checksum support
I've refactored this patch series, so that the documentation
changes for mke2fs and tune2fs are with their respective
patches. I also moved the test cases fixups into the "make
e2fsck uninit block group aware".
As a result of the "make check" failures which I mentioned at
the beginning of this note, I noticed the following
unfortunate problem in how errors are getting reported in
e2fsck pass 5. As a sample:
Block bitmap differences: +8195Group 3 block(s) in use but group is marked BLO
CK_UNINIT
Fix? yes
This mangling is happening because error reporting for
PR_5_INODE_UNINIT is getting intermingled with the
PR_5_INODE_USED/PR_5_INODE_UNUSED reporting, which isn't a
good idea. The comment around the code says:
/*
* We should never hit this, because it means that
* inodes were marked in use that weren't noticed
* in pass1 or pass 2. It is easier to fix the problem
* than to kill e2fsck and leave the user stuck.
*/
So I'm guessing there's something else wrong going on here....
* ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 6 commits
- e2fsprogs: Add test case for undoe2fs
- e2fsprogs: Fix the resize inode test case
- e2fsprogs: Support for large inode migration.
- e2fsprogs: Make mke2fs use undo I/O manager.
- e2fsprogs: Add undoe2fs
- e2fsprogs: Add undo I/O manager
The test cases are missing an expected file
Also, we need to chage "mke2fs use undo I/O manager" so it
only does it if the uninit group patch is enabled. So this is
blocked on the uninit topic branch getting merged (or at least
the mke2fs) changes are so blocked.
----
URL's for the e2fsprogs repository:
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
http://www.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
On Feb 29, 2008 10:43 -0500, Theodore Ts'o wrote:
> * ad/nlinks-dir (Sat Feb 2 01:25:03 2008 -0700) 1 commit
> - e2fsprogs-nlinks.patch
>
> I'm really not sure about this change in e2fsck/pass4.c:
>
> - if (fix_problem(ctx, PR_4_BAD_REF_COUNT, &pctx)) {
> + /* i_link_count was previously exceeded, but no longer
> + * is, fix this but don't consider it an error */
> + if ((LINUX_S_ISDIR(inode->i_mode) && link_counted > 1 &&
> + (inode->i_flags & EXT2_INDEX_FL) &&
> + link_count == 1 && !(ctx->options & E2F_OPT_NO)) ||
> + (fix_problem(ctx, PR_4_BAD_REF_COUNT, &pctx))) {
> inode->i_links_count = link_counted;
> e2fsck_write_inode(ctx, i, inode, "pass4");
> }
>
> Why do we require EXT2_INDEX_FL to be set before deciding that
> it's OK if the i_link_count is 1 but we now have less than
> EXT2_LINK_MAX links?
The reason that INDEX_FL is important here is because only indexed
directories are allowed to exceed 65000 entries in the kernel. This is a
"save users from themselves" measure, because of the O(n^2) operations
needed to create/delete entries in unindexed directories. It also helps
detect the difference between corruption and expected behaviour.
> As a result of the "make check" failures which I mentioned at
> the beginning of this note, I noticed the following
> unfortunate problem in how errors are getting reported in
> e2fsck pass 5. As a sample:
>
> Block bitmap differences: +8195Group 3 block(s) in use but group is marked BLO
> CK_UNINIT
> Fix? yes
>
> This mangling is happening because error reporting for
> PR_5_INODE_UNINIT is getting intermingled with the
> PR_5_INODE_USED/PR_5_INODE_UNUSED reporting, which isn't a
> good idea. The comment around the code says:
>
> /*
> * We should never hit this, because it means that
> * inodes were marked in use that weren't noticed
> * in pass1 or pass 2. It is easier to fix the problem
> * than to kill e2fsck and leave the user stuck.
> */
>
> So I'm guessing there's something else wrong going on here....
Does this test case have both flexbg and uninit_groups? Alternately,
maybe some part of the e2fsck fixup code is allocating blocks in the
group, but doesn't know that the UNINIT flag needs to be cleared.
Instead of burning a lot of time on diagnosing this, I'd suggest to
try using the original uninit_groups patchset + Jose's patch on top of
that series to see if that works better?
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Fri, Feb 29, 2008 at 11:59:48AM -0800, Andreas Dilger wrote:
> Does this test case have both flexbg and uninit_groups? Alternately,
> maybe some part of the e2fsck fixup code is allocating blocks in the
> group, but doesn't know that the UNINIT flag needs to be cleared.
Yes, it does include both; this describes what was going on in the
'pu' branch.
- Ted
On Fri, 29 Feb 2008 10:43:33 -0500
Theodore Tso <[email protected]> wrote:
> Although I've checked to make sure that each of these topic branches
> passes "make check" on their own, the combined pu branch is failing
> due to what appears to be unfortunate interactions between js/flex-bg
> and js/uninit. This *may* be because I botched the merge, which got
> tricky when I merged in js/uninit into the pu branch which already had
> js/flex-bg. If someone could take a look at the misc/mke2fs.c changes
> in "git show bffaaf74", I'd much appreciate it.
The misc/mke2fs.c changes look sane to me.
> * js/flex-bg (Wed Feb 13 20:47:50 2008 -0600) 1 commit
> - e2fsprogs: New bitmap and inode table allocation for FLEX_BG v2
>
> I've noticed that this patch is slightly different from what
> Jose sent in for the e2fsprogs-interim branch, so I'm a little
> concerned about which is the latest, or whether the
> differences are intentional. Jose, if you have time, could
> you take a look at commits cb676995 and 8072fe6 and perhaps comment?
There where slight differences in misc/mke2fs.c that required making a
different patch for the e2fsprogs-interim branch. The patches are
functionally the same.
-JRS
FYI,
today I really had to use e2fsck from your e2fsprogs.git "pu" tree.
After upgrading to 2.6.25-rc3, I noticed that one (raid0-)device has
not been mounted, syslog told me:
EXT4-fs: md4: not marked OK to use with test code.
First I thought there is something wrong with the fs, so I ran it:
http://nerdbynature.de/bits/2.6.25-rc3/e2fsck_md4.log
...and when I could not mount the device afterwards, I stumpbled upon
http://fedoraproject.org/wiki/FedoraExt4, giving the exact commands which
made my filesystem usable again :)
Thanks,
Christian.
--
BOFH excuse #415:
Maintenance window broken
On Fri, 29 Feb 2008 10:43:33 -0500
Theodore Tso <[email protected]> wrote:
> Block bitmap differences: +8195Group 3 block(s) in use but group is marked BLO
> CK_UNINIT
> Fix? yes
>
> This mangling is happening because error reporting for
> PR_5_INODE_UNINIT is getting intermingled with the
> PR_5_INODE_USED/PR_5_INODE_UNUSED reporting, which isn't a
> good idea. The comment around the code says:
>
> /*
> * We should never hit this, because it means that
> * inodes were marked in use that weren't noticed
> * in pass1 or pass 2. It is easier to fix the problem
> * than to kill e2fsck and leave the user stuck.
> */
>
> So I'm guessing there's something else wrong going on here....
After looking at the misc/mke2fs.c a second time, I notice that there
is one chunk missing in setup_lazy_bg from the 6270612c commit:
+
+ /* Skip groups with GDT backups because the resize
+ * inode has blocks allocated in them, and the last
+ * group because it needs block bitmap padding. */
+ if ((ext2fs_bg_has_super(fs, i) &&
+ sb->s_reserved_gdt_blocks) ||
+ i == fs->group_desc_count - 1)
+ continue;
+
This should fix the problem you see above. Now, I wonder if you would
see a similar problem if the filesystems is created with the meta_bg
option since we don't check to see if the block groups has a bgd.
-JRS
Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* js/64bitio (Mon Mar 3 10:41:29 2008 -0600) 3 commits
+ Add {read,write}_blk64 to inode_io.c
+ Add {read,write}_blk64 to unix_io.c
+ Add 64-bit IO manager operations to struct_io_manager.
Looks good; I've already put it into the next branch.
* tt/mkfs-types (Tue Feb 19 08:32:58 2008 -0500) 1 commit
- New mke2fs types parsing --- IN PROGRESS
No change since list time; still needs documentation.
* tt/64bit-bitmaps (Mon Feb 18 23:54:53 2008 -0500) 1 commit
- Initial design for 64-bit bitmaps
No change since last time.
* js/flex-bg (Wed Feb 13 20:47:50 2008 -0600) 1 commit
- e2fsprogs: New bitmap and inode table allocation for FLEX_BG v2
Jose, have you had a chance to sanity check these compared to
what is in e2fsprogs-interim?
* ad/nlinks-dir (Sat Feb 2 01:25:03 2008 -0700) 2 commits
- Add support for the DIR_NLINK feature.
- libext2fs: Make icount use a 32-bit counter
This has been reworked per a note to linux-ext4 sent earlier
today. If there are no comments, these should be ready to go
in.
* js/uninit (Sun Oct 21 21:04:24 2007 -0500) 11 commits
- Add m_uninit test case
- Add new m_lazy test case
- Make e2fsck uninit block group aware
- Make debugfs uninit block group aware
- Make resize2fs uninit block group aware
- Make dumpe2fs uninit block group aware
- Make tune2fs uninit block group aware
- Add support for creating filesystems using uninit block group
- Rename feature name from gdt_checksum to uninit_groups
- Add uninit block group support on libe2fs
- Add initial checksum support
The pu merge problem seems to have been solved; thanks to Jose
for pointing out the missing patch hunk.
This should be ready to go in soon. I just need time to do a
final audit.
* ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 6 commits
- e2fsprogs: Add test case for undoe2fs
- e2fsprogs: Fix the resize inode test case
- e2fsprogs: Support for large inode migration.
- e2fsprogs: Make mke2fs use undo I/O manager.
- e2fsprogs: Add undoe2fs
- e2fsprogs: Add undo I/O manager
The test cases are missing an expected file
Also, we need to chage "mke2fs use undo I/O manager" so it
only does it if the uninit group patch is enabled. So this is
blocked on the uninit topic branch getting merged (or at least
the mke2fs) changes are so blocked.
The undoe2fs program is missing a man page.
----
URL's for the e2fsprogs repository:
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
http://www.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* tt/mkfs-types (Tue Feb 19 08:32:58 2008 -0500) 1 commit
- New mke2fs types parsing --- IN PROGRESS
Eric is working on the documentation and reworking the way we
handle the command parsing.
* tt/64bit-bitmaps (Mon Feb 18 23:54:53 2008 -0500) 1 commit
- Initial design for 64-bit bitmaps
Not touched.
* js/flex-bg (Wed Feb 13 20:47:50 2008 -0600) 1 commit
- e2fsprogs: New bitmap and inode table allocation for FLEX_BG v2
Next on my list to audit.
* js/uninit (Sun Oct 21 21:04:24 2007 -0500) 11 commits
+ Add m_uninit test case
+ Add new m_lazy test case
+ Make e2fsck uninit block group aware
+ Make debugfs uninit block group aware
+ Make resize2fs uninit block group aware
+ Make dumpe2fs uninit block group aware
+ Make tune2fs uninit block group aware
+ Add support for creating filesystems using uninit block group
+ Rename feature name from gdt_checksum to uninit_groups
+ Add uninit block group support to various libext2fs functions
+ Add initial checksum support for the gdt_checksum/uninit_group
feature
Merged into next. At this point e2fsck on the 'next' branch
should be able to deal with any valid ext4 filesystem; if it
doesn't, that's a bug.
* ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 6 commits
- e2fsprogs: Add test case for undoe2fs
- e2fsprogs: Fix the resize inode test case
- e2fsprogs: Support for large inode migration.
- e2fsprogs: Make mke2fs use undo I/O manager.
- e2fsprogs: Add undoe2fs
- e2fsprogs: Add undo I/O manager
Unchanged from last time.
The test cases are missing an expected file.
Also, we need to chage "mke2fs use undo I/O manager" so it
only does it if the uninit group patch is enabled.
----
URL's for the e2fsprogs repository:
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
http://www.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* js/flex-bg (Mon Mar 31 22:13:11 2008 -0500) 1 commit
- mke2fs: New bitmap and inode table allocation for FLEX_BG
Updated with Jose's latest patch.
* tt/mkfs-types (Tue Feb 19 08:32:58 2008 -0500) 1 commit
- New mke2fs types parsing --- IN PROGRESS
I'm taking this back, will work on this soon.
* tt/64bit-bitmaps (Mon Feb 18 23:54:53 2008 -0500) 1 commit
- Initial design for 64-bit bitmaps
Unchanged since last iteration.
* ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 6 commits
- e2fsprogs: Add test case for undoe2fs
- e2fsprogs: Fix the resize inode test case
- e2fsprogs: Support for large inode migration.
- e2fsprogs: Make mke2fs use undo I/O manager.
- e2fsprogs: Add undoe2fs
- e2fsprogs: Add undo I/O manager
Unchanged since last iteration; previously remarked defects:
The test cases are missing an expected file
Also, we need to chage "mke2fs use undo I/O manager" so it
only does it if the uninit group patch is enabled. So this is
blocked on the uninit topic branch getting merged (or at least
the mke2fs) changes are so blocked.
In addition, the following changes were added to the "next" branch.
At this point, I believe all of the Lustre code changes (not including
the new test cases; that's still on the todo list) have been accounted
for.
Andreas Dilger (8):
debugfs: Add support for "set_block_group <bg_num> checksum calc"
ext2fs_set_gdt_csum(): Clean up superblock dirty flag handling
ext2fs_set_gdt_csum(): Force the last block group to have a valid block bitmap
ext2fs_set_gdt_csum(): Return an error code on errors instead of void
libext2fs: Micro-optimization in inode scan code
Split the m_lazy test case into two cases: m_lazy and m_lazy_resize
e2fsck: Add check to enforce a valid block bitmap in last block group
Add new regression test: f_uninit_last_uninit
Eric Sandeen (1):
e2fsck: Fix extent flag validity tests in pass1 on big endian boxes.
Theodore Ts'o (5):
Fix trailing whitespace in e2fsck/problem.[ch]
dumpe2fs: Print the group checksum and the block options in a nicer way
libext2fs: Make all test programs link only against the static library
Add dependency rule so that static library when necessary for "make check"
libext2fs: Fix big-endian bug in tst_csum.c
----
URL's for the e2fsprogs repository:
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
http://www.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* ak/undo-mgr (Sun Apr 6 18:08:12 2008 -0400) 7 commits
- Allow user to disable Undo manager through MKE2FS_SCRATCH_DIR
- e2fsprogs: Add test case for undoe2fs
- e2fsprogs: Fix the resize inode test case
- e2fsprogs: Support for large inode migration.
- e2fsprogs: Make mke2fs use undo I/O manager.
- e2fsprogs: Add undoe2fs
- e2fsprogs: Add undo I/O manager
Still needs a lot of polish. I'll be working on this myself.
Patches to clean this up are welcome...
* js/flex-bg (Mon Mar 31 22:13:11 2008 -0500) 1 commit
- mke2fs: New bitmap and inode table allocation for FLEX_BG
This patch has problems with combinations with meta_bg; some
of it may be bugs or a misdesign in the meta_bg or
uninit_groups support code. In particular I suspect I'll need
to redesign a new interface for ext2fs_super_and_bgd_loc2()
that makes this all cleaner.
We also need to make uninit_groups truly *be* uninit_groups,
and do the lazy intable initialziation (which we don't do
now).
* es/warnings-fix (Sat Mar 29 23:01:52 2008 -0500) 1 commit
+ Fix a couple of implicit function declarations
Merged into next and master; will be dropped as a branch.
* tt/mkfs-types (Tue Feb 19 08:32:58 2008 -0500) 1 commit
+ New mke2fs filesystem and usage types support
Merged into next and master; will be dropped as a branch.
* tt/64bit-bitmaps (Mon Feb 18 23:54:53 2008 -0500) 1 commit
- Initial design for 64-bit bitmaps
No change
----
URL's for the e2fsprogs repository:
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
http://www.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* tt/huge_file (Wed Apr 9 11:39:11 2008 -0400) 1 commit
+ Add support for the HUGE_FILE feature
* ak/undo-mgr (Sun Apr 6 18:08:12 2008 -0400) 7 commits
- Allow user to disable Undo manager through MKE2FS_SCRATCH_DIR
- e2fsprogs: Add test case for undoe2fs
- e2fsprogs: Fix the resize inode test case
- e2fsprogs: Support for large inode migration.
- e2fsprogs: Make mke2fs use undo I/O manager.
- e2fsprogs: Add undoe2fs
- e2fsprogs: Add undo I/O manager
* js/flex-bg (Mon Mar 31 22:13:11 2008 -0500) 1 commit
- mke2fs: New bitmap and inode table allocation for FLEX_BG
* tt/64bit-bitmaps (Mon Feb 18 23:54:53 2008 -0500) 1 commit
- Initial design for 64-bit bitmaps
In addition, the folllowing changes were dropped into the 'next' branch:
Eric Sandeen (5):
print a bit more info for the tst_extents info command
Fix ext2fs_extent_insert when at last extent in node
Make extent_goto() deterministic when logical block not found
e2fsck: Only check PR_1_EXTENT_ENDS_BEYOND for leaf nodes
blkid: more sanity checks for swap v1
Theodore Ts'o (4):
Fix tst_extents build when building w/o dynamic libraries
Rename the feature uninit_groups to uninit_bg
Change the primary name of the extents feature to be 'extent'
mke2fs: Disentangle lazy_bg from uninit_bg's lazy inode table initialization
----
URL's for the e2fsprogs repository:
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
http://www.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* tt/uninit_fixup (Mon Apr 21 01:29:01 2008 -0400) 8 commits
+ Transfer responsibility of setting the *_UNINIT flags to libext2fs
+ Remove LAZY_BG feature
+ Add comments documenting ext2fs_[reserve_]super_and_bgd_loc()
+ e2fsck: Fix pass5 handling of meta_bg and uninit_bg combination
+ ext2fs_set_gdt_csum: Remove setting of BLOCK_UNINIT
+ ext2fs_set_gdt_csum: Remove bogus setting of ITABLE_ZEROED
This is in the 'next' branch already. It significantly cleans
up how the BLOCK_UNINIT, INODE_UNINIT, and ITABLE_ZEROED flags
are handled. The resulting code is much more robust, so when
the flex_bg allocation patch is re-introduced, it should be
much more robust.
* tt/inode_size (Sun Apr 20 16:10:07 2008 -0400) 2 commits
+ libe2p: Print the s_min_extra_isize and s_wanted_extra_isize
fields
+ libext2fs: Initialize s_min_extra_isize and s_wanted_extra_isize
This is in the 'next' branch already. It turns out we weren't
initializing these fields in the superblock; oops.
Still to do is code in e2fsck so we can fix up inodes which
have less exra inode space allocated than what is requested by
the superblock.
* js/flex-bg (Mon Mar 31 22:13:11 2008 -0500) 1 commit
. mke2fs: New bitmap and inode table allocation for FLEX_BG
This has been temporarily dropped from the 'pu' branch,
because of patch conflicts introduced by the tt/uninit_fixup
branch. This patch needs to be broken up into smaller patches
(header file and base support in debugfs for the new
superblock fields, followed by the needed changes to the
library, and the the mke2fs changes., etc.)
* tt/64bit-bitmaps (Mon Feb 18 23:54:53 2008 -0500) 1 commit
- Initial design for 64-bit bitmaps
No change from last time.
* ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 6 commits
- Add test cases for undoe2fs: u_undoe2fs_mke2fs and
u_undoe2fs_tune2fs
- Fix the resize inode test case
- tune2fs: Support for large inode migration.
- mke2fs: Add support for the undo I/O manager.
- Add undoe2fs command
- libext2fs: Add undo I/O manager
These patches have been *significantly* rototilled.
* The test cases weren't correctly setting and deleting the
test_name.ok and test_name.failed files
* mke2fs would bomb out with an incomprehensible error message
if run twice in a row, or if the user didn't have write access
to /var/lib/e2fsprogs (some users run mke2fs as a non-root
user!)
* the undoe2fs program was calling com_err and passing
in uninitialized retval values, and was otherwise confused
about how to do proper error handling, resulting in garbage
getting printed if the file passed in didn't exist
* there was a terrible performance problem because in the
mke2fs case, the undo manager was using a tdb_data_size of
512.
* I added the ability to configure the location of the scratch
dirctory via mke2fs.conf for mk2efs. What we should do for
tune2fs is less clear, and still needs to be addressed. It
is also possible to force an undo file to be always created,
and not just when doing a lazy inode table initialization.
By using a 4k instead of 512 tdb_data_size, the time to
speed up an mke2fs was cut in half, while still using an
undo file. I suspect if we force the tdb_data_size to be
even larger, say 32k or 64k the overhead would shrink even
more.
Unfortunately, there appears to be some kind of data
corruption bug if I force a tdb_data_size of 32768, so I'm not
entirely sure I trust the undo manager to be working
correctly. The undo_manager code itself needs a pretty
serious auditing and checking to make sure it's doing the
right thing with large tdb_data_sizes.
----
URL's for the e2fsprogs repository:
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
http://www.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
On Mon, Apr 21, 2008 at 12:41:44PM -0400, Theodore Tso wrote:
>
> * ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 6 commits
> - Add test cases for undoe2fs: u_undoe2fs_mke2fs and
> u_undoe2fs_tune2fs
> - Fix the resize inode test case
> - tune2fs: Support for large inode migration.
> - mke2fs: Add support for the undo I/O manager.
> - Add undoe2fs command
> - libext2fs: Add undo I/O manager
>
> These patches have been *significantly* rototilled.
>
> * The test cases weren't correctly setting and deleting the
> test_name.ok and test_name.failed files
>
> * mke2fs would bomb out with an incomprehensible error message
> if run twice in a row, or if the user didn't have write access
> to /var/lib/e2fsprogs (some users run mke2fs as a non-root
> user!)
>
> * the undoe2fs program was calling com_err and passing
> in uninitialized retval values, and was otherwise confused
> about how to do proper error handling, resulting in garbage
> getting printed if the file passed in didn't exist
>
> * there was a terrible performance problem because in the
> mke2fs case, the undo manager was using a tdb_data_size of
> 512.
>
> * I added the ability to configure the location of the scratch
> dirctory via mke2fs.conf for mk2efs. What we should do for
> tune2fs is less clear, and still needs to be addressed. It
> is also possible to force an undo file to be always created,
> and not just when doing a lazy inode table initialization.
> By using a 4k instead of 512 tdb_data_size, the time to
> speed up an mke2fs was cut in half, while still using an
> undo file. I suspect if we force the tdb_data_size to be
> even larger, say 32k or 64k the overhead would shrink even
> more.
>
> Unfortunately, there appears to be some kind of data
> corruption bug if I force a tdb_data_size of 32768, so I'm not
> entirely sure I trust the undo manager to be working
> correctly. The undo_manager code itself needs a pretty
> serious auditing and checking to make sure it's doing the
> right thing with large tdb_data_sizes.
>
undo-mgr is using the first blocks size used to write. This was done as
per the discussion at
http://thread.gmane.org/gmane.comp.file-systems.ext4/1942
http://thread.gmane.org/gmane.comp.file-systems.ext4/2056
I am not sure how you are forcing larger tdb_data_size. Can you send me
the changes so that I can test it and see what is wrong.
-aneesh
On Wed, Apr 23, 2008 at 01:02:31PM +0530, Aneesh Kumar K.V wrote:
> > Unfortunately, there appears to be some kind of data
> > corruption bug if I force a tdb_data_size of 32768, so I'm not
> > entirely sure I trust the undo manager to be working
> > correctly. The undo_manager code itself needs a pretty
> > serious auditing and checking to make sure it's doing the
> > right thing with large tdb_data_sizes.
> >
>
> undo-mgr is using the first blocks size used to write. This was done as
> per the discussion at
>
> http://thread.gmane.org/gmane.comp.file-systems.ext4/1942
> http://thread.gmane.org/gmane.comp.file-systems.ext4/2056
>
> I am not sure how you are forcing larger tdb_data_size. Can you send me
> the changes so that I can test it and see what is wrong.
There's a reason why I publish the "What's Cooking" messages, so
people can see what's in the master, next, and pu branches, and can
examine them to see what's in there. I find the easiest way to look
at what's gone into the git repository is by using the gitk command,
i.e., "gitk pu", or "gitk maint", etc. If you like terminal based
approaches, you can also use "git log pu", or "git log -p pu" if you
want to see the patches along side the changelog commit.
(BTW, it's good to browse the Linux kernel tree using git in this way
so you can get a good feel for the level of verbosity generally
favored by kernel maintainers, and match this in the ext4 patch
queue....)
------
[This begins a git tutorial for those who don't know how to dig out
the changes out of the 'pu' branch in the e2fsprogs git tree.
Given that you asked me to send me the changes without realizing I
had already pushed them out, I'm going to assume that you didn't
know how to do this. My apologies if you already did, and please
skip to the next section; maybe someone else will find this mini
git tutorial useful.]
I don't normally push out the heads of the individual components which
make up the 'pu' branch (mainly to avoid confusion), but it's normally
pretty easy to isolate the beginning of the ak/undo-mgr branch simply
by looking at the gitk output, or by doing something like this: "git
log --abbrev-commit --pretty=oneline --parents pu", which will show
you something like this:
0f83612... 2934064... 61639ff... Merge branch 'tt/64bit-bitmaps' into pu
2934064... 1140583... 9ba4000... Merge branch 'tt/flex-bg' into pu
1140583... d11736c... 625d7bf... Merge branch 'ak/undo-mgr' into pu
9ba4000... 494a1da... mke2fs: New bitmap and inode table allocation for FLEX_BG
494a1da... d11736c... Basic flexible block group support
d11736c... 0f2794c... ext2fs_open_inode_scan: Handle an non-zero bg_itable_used
0f2794c... 4476105... mke2fs/libext2fs: Fix lazy inode table initialization
625d7bf... 954b213... Add test cases for undoe2fs: u_undoe2fs_mke2fs and u_undo
954b213... cd81b90... Fix the resize inode test case
cd81b90... eadc218... tune2fs: Support for large inode migration.
eadc218... 5af269b... mke2fs: Add support for the undo I/O manager.
5af269b... 31db6f7... Add undoe2fs command
31db6f7... 4476105... libext2fs: Add undo I/O manager
4476105... 5711ed2... 748d071... Merge branch 'maint'
Either by looking at the "Merge branch 'ak/undo-mgr' into pu" line, or
by matching the subject lines from the "What's Cooking" e-mail, you
can easily determine that the head of the ak/undo-mgr branch is
625d7bf. So you can, if you like set a branch point for ak/undo-mgr
in your own git repository and change to it, like this:
git branch -f ak/undo-mgr 625d7bf
git checkout ak/undo-mgr
You can then rebase all of the patches on that branch to be against
the latest master branch, like this:
git rebase master
You can also look at the patches as ascii text via a command like this:
git format-patch --subject-prefix="E2FSPROGS" -o /tmp/patches master
Or, you can edit individual patches, like this:
git rebase --interactive master
This will throw up an editor window with the individual patches,
between the head of the branch and "master"; if you reorder the
patches by cutting and pasting lines, the patches will be reordered in
the branch. If you delete a line, the patch will be dropped from the
branch. If you change the first word in the editor window from "pick"
to "edit", during the rebasing process, it will stop, and you can look
at the tree right after the patch was applied, modify files, and then
use "git commit --amend -a" to ammend the current patch, and then type
"git rebase --continue" to continue the rebasing process. It is
possible to edit multiple patches in a branch this way; DO NOT do this
to try to modify commits that have already been pushed out on the
'master' or the 'next' branches. One of the reasons why we use the
'pu' branch is specifically for patches that need refinement in this
way, since commits between master/next and 'pu' aren't guaranteed to
be invariant. It is therefore a bad idea to base patches off of the
'pu' branch, and this is why in general all of the topics branches
shown in the 'What's Cooking' reports are based off of 'master'.
And finally, when you want to send out a patch series for people to
see, you can use the git-format-patch command above (make sure you
clear out the /tmp/patches directory first, or use a new, non-existent
directory name), and then use the command:
git send-email --to [email protected] /tmp/patches
You can use the --compose option if you want to compose an
introductory message describing the patchset first. Or use --dry-run
if you're not sure it will do what you want, first. See the man pages
for all of these commands for more details.
-----------
Anyway, getting back to the subject at hand, how I am forcing the
larger tdb_data_size is in the modified ak/undo-mgr patches which are
in the 'pu' branch. Most of the key bits are in the undo_io manager;
note especially:
* tdb_data_size is now stored in the private data
section of the structure, instead of being a global static
variable. (Why this is so much better than a static variable
is left as an exercise to the reader.)
* setting the tdb_data_size has been moved from undo_write_tdb()
to undo_set_blksize().
* new support added for the tdb_data_size option in undo_set_option()
The other key bit is in misc/mke2fs.c, in the main() function:
#if 0 /* XXX bug in undo_io.c if we set this? */
io_channel_set_options(fs->io, "tdb_data_size=32768");
#endif
Since ext2fs_initialize() calls io_channel_set_blksize(), this forces
tdb_data_size to be fs->blocksize, or 4096. The call to
io_channel_set_options(), above forces the tdb_data_size to another
value. If you set it to be 512, you will get the old, horrible
performance of the small tdb_data_size. If you set it to 32768, then
when you use undoe2fs to replay the undo log, you get back a different
md5sum checksum, so the u_undoe2fs_mke2fs test ends up failing.
- Ted
P.S. Unfortunately, ext2fs_open() first sets the blocksize to 1024,
in order to read the superblock. This means we need to either add an
io_channel_set_options(io, "tdb_data_size=<fs->blocksize>") to callers
of ext2fs_open(), or to ext2fs_open() itself. I've held off on that
because for the tune2fs -I case, it may make more sense to set the
tdb_data_size to something really big, like 32768, once we figure out
why that isn't working in the mke2fs case.
On Mon, Apr 21, 2008 at 12:41:44PM -0400, Theodore Tso wrote:
>
> * ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 6 commits
> - Add test cases for undoe2fs: u_undoe2fs_mke2fs and
> u_undoe2fs_tune2fs
> - Fix the resize inode test case
> - tune2fs: Support for large inode migration.
> - mke2fs: Add support for the undo I/O manager.
> - Add undoe2fs command
> - libext2fs: Add undo I/O manager
>
> These patches have been *significantly* rototilled.
>
> * The test cases weren't correctly setting and deleting the
> test_name.ok and test_name.failed files
>
> * mke2fs would bomb out with an incomprehensible error message
> if run twice in a row, or if the user didn't have write access
> to /var/lib/e2fsprogs (some users run mke2fs as a non-root
> user!)
>
> * the undoe2fs program was calling com_err and passing
> in uninitialized retval values, and was otherwise confused
> about how to do proper error handling, resulting in garbage
> getting printed if the file passed in didn't exist
>
> * there was a terrible performance problem because in the
> mke2fs case, the undo manager was using a tdb_data_size of
> 512.
>
> * I added the ability to configure the location of the scratch
> dirctory via mke2fs.conf for mk2efs. What we should do for
> tune2fs is less clear, and still needs to be addressed. It
> is also possible to force an undo file to be always created,
> and not just when doing a lazy inode table initialization.
> By using a 4k instead of 512 tdb_data_size, the time to
> speed up an mke2fs was cut in half, while still using an
> undo file. I suspect if we force the tdb_data_size to be
> even larger, say 32k or 64k the overhead would shrink even
> more.
>
> Unfortunately, there appears to be some kind of data
> corruption bug if I force a tdb_data_size of 32768, so I'm not
> entirely sure I trust the undo manager to be working
> correctly. The undo_manager code itself needs a pretty
> serious auditing and checking to make sure it's doing the
> right thing with large tdb_data_sizes.
>
The bug was in the changes added to support block size via set_options
We had two records in the data for blocksize. one 1024. configured via
set_blk_size and other 32K configured via set_options. So the undoe2fs
was replaying with 1K as the block size.
The below changes get everything working for me. The patch is not clean.
so they are not to apply. I still need to think a little bit more about
what to do when we attempt to read tdb_data_size from the end of file
system. Will send the cleanup patch later.
Let me know whether you want to keep the debug printf
diff --git a/lib/ext2fs/undo_io.c b/lib/ext2fs/undo_io.c
index 9bee1b6..4ad0fd6 100644
--- a/lib/ext2fs/undo_io.c
+++ b/lib/ext2fs/undo_io.c
@@ -169,6 +169,9 @@ static errcode_t write_block_size(TDB_CONTEXT *tdb, int block_size)
tdb_key.dsize = sizeof("filesystem BLKSIZE");
tdb_data.dptr = (unsigned char *)&(block_size);
tdb_data.dsize = sizeof(block_size);
+#ifdef DEBUG
+ printf("Setting blocksize %d\n", block_size);
+#endif
retval = tdb_store(tdb, tdb_key, tdb_data, TDB_INSERT);
if (retval == -1) {
@@ -182,13 +185,14 @@ static errcode_t undo_write_tdb(io_channel channel,
unsigned long block, int count)
{
- int size, loop_count = 0, i;
+ int size, i;
unsigned long block_num, backing_blk_num;
errcode_t retval = 0;
ext2_loff_t offset;
struct undo_private_data *data;
TDB_DATA tdb_key, tdb_data;
char *read_ptr;
+ unsigned long end_block;
data = (struct undo_private_data *) channel->private_data;
@@ -208,21 +212,26 @@ static errcode_t undo_write_tdb(io_channel channel,
size = count * channel->block_size;
}
+#ifdef DEBUG
+ printf("Writing at %ld of size %ld\n", block, size);
+#endif
/*
* Data is stored in tdb database as blocks of tdb_data_size size
* This helps in efficient lookup further.
*
* We divide the disk to blocks of tdb_data_size.
*/
+ /*FIXME need to check end of file system. */
- block_num = ((block*channel->block_size)+data->offset) /
- data->tdb_data_size;
-
- loop_count = (size + data->tdb_data_size -1) /
- data->tdb_data_size;
+ offset = (block * channel->block_size) + data->offset ;
+ block_num = offset / data->tdb_data_size;
+ end_block = (offset + size) / data->tdb_data_size;
+#ifdef DEBUG
+ printf("Writing from %ld to %ld\n", block_num, end_block);
+#endif
tdb_transaction_start(data->tdb);
- for (i = 0; i < loop_count; i++) {
+ while (block_num <= end_block ) {
tdb_key.dptr = (unsigned char *)&block_num;
tdb_key.dsize = sizeof(block_num);
@@ -231,8 +240,11 @@ static errcode_t undo_write_tdb(io_channel channel,
* Check if we have the record already
*/
if (tdb_exists(data->tdb, tdb_key)) {
-
/* Try the next block */
+#ifdef DEBUG
+ printf("The block %d already exist in database\n",
+ block_num);
+#endif
block_num++;
continue;
}
@@ -258,6 +270,9 @@ static errcode_t undo_write_tdb(io_channel channel,
}
memset(read_ptr, 0, count);
+#ifdef DEBUG
+ printf("Reading from %ld to %ld\n", backing_blk_num, count);
+#endif
retval = io_channel_read_blk(data->real, backing_blk_num,
-count, read_ptr);
@@ -276,10 +291,22 @@ static errcode_t undo_write_tdb(io_channel channel,
printf("Printing with key %ld data %x and size %d\n",
block_num,
tdb_data.dptr,
- channel->tdb_data_size);
+ data->tdb_data_size);
#endif
- data->tdb_written = 1;
+ if (!data->tdb_written) {
+ data->tdb_written = 1;
+
+ /* Write the blocksize to tdb file */
+ retval = write_block_size(data->tdb,
+ data->tdb_data_size);
+ if (retval) {
+ tdb_transaction_cancel(data->tdb);
+ retval = EXT2_ET_TDB_ERR_IO;
+ free(read_ptr);
+ return retval;
+ }
+ }
retval = tdb_store(data->tdb, tdb_key, tdb_data, TDB_INSERT);
if (retval == -1) {
/*
@@ -417,11 +444,6 @@ static errcode_t undo_set_blksize(io_channel channel, int blksize)
*/
if (!data->tdb_data_size) {
data->tdb_data_size = blksize;
-
- /* Write it to tdb file */
- retval = write_block_size(data->tdb, data->tdb_data_size);
- if (retval)
- return retval;
}
channel->block_size = blksize;
@@ -540,12 +562,6 @@ static errcode_t undo_set_option(io_channel channel, const char *option,
return EXT2_ET_INVALID_ARGUMENT;
if (!data->tdb_data_size || !data->tdb_written) {
data->tdb_data_size = tmp;
-
- /* Write it to tdb file */
- retval = write_block_size(data->tdb,
- data->tdb_data_size);
- if (retval)
- return retval;
}
return 0;
}
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 593b743..be772ee 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1745,6 +1745,7 @@ static int should_do_undo(const char *name)
io_manager manager = unix_io_manager;
int csum_flag, force_undo;
+
csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(&fs_param,
EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
force_undo = get_int_from_profile(fs_types, "force_undo", 0);
@@ -1873,7 +1874,7 @@ int main (int argc, char *argv[])
com_err(device_name, retval, _("while setting up superblock"));
exit(1);
}
-#if 0 /* XXX bug in undo_io.c if we set this? */
+#if 1 /* XXX bug in undo_io.c if we set this? */
io_channel_set_options(fs->io, "tdb_data_size=32768");
#endif
So let me talk about what were some of the last couple of things that
I did before I folded the undo-mgr into the tree, since it might be
helpful for other people who are into hacking e2fsprogs.
1) I used "make gcc-wall" (or if you only want to scan for changes in
e2undo.c, you can use the command; "rm e2undo.o; make gcc-wall-new")
to scan for warnings; this turned up some interesting bugs. Please,
make sure your code is gcc-Wall clean. In some cases the pre-existing
code isn't, due to folks being lazy or gcc -Wall becoming more strict.
But your patch shouldn't make things worse.
2) Compiling with the --enable-testio-debug configure option, I used
the command "export TEST_IO_FLAGS=0xffff" to set the environment
variable to see the I/O pattern of various programs like "mke2fs" with
the config option defaults.force_undo=1 set. This is how I discovered
that we were using a pathetically small tdb_data_size, and changing it
to 32768 cut the mke2fs time in half. Some more investigation showed
that part of the problem was we were initializing the journal one
block at a time, causing significant overhead. When I changed
lib/ext2fs/mkjournal.c to zero out the journal 8 blocks at a time,
aligned on 8 block boundary to make life easier for the undo manager,
and made a similar change when zero'ing out the inode table so it
would also be issue writes aligned on 8 block boundaries, I was able
to cut the mke2fs time in half again.
So for everyone who had been complaining about mke2fs taking too much
time with the undo option --- we do have tools in e2fsprogs that can
be used to help profile its operation and speed things up. You can
also use seekwatcher/blktrace, of course, but there are also other
userspace tools that can be used.
3) I also changed the name of the files created by the undo manager to
include an .e2undo suffix. This makes it much more likely that system
administrators will understand what the file is and what it is used
for.
If people are looking for things to do with the undo manager --- it
would be good to wire up e2fsck and resize2fs to use it --- and at
this point, it would probably make sense to factor out common code for
setting up the undo manager into undo_mgr.c itself (although remember
the rules, nothing in libext2fs should try to write to stdout; library
routines must return error codes, and leave the printing of any status
or error messages to the application).
- Ted
The two topics I really want to get done before an e2fsprogs 1.41
release is tt/journal-checksum and es/setbmap. Fortunately both are
pretty close to being done.
- Ted
Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* tt/journal-checksum (Fri May 23 01:00:19 2008 -0400) 3 commits
- e2fsck: Add support to check journal checksums
- Add CRC32 (big-endian) routine for use by journal checksums
- Re-enable byte-swapping functions for use by journal checksums
This basically works, but it currently stops without applying
a commit which has a failed checksum. Probably not the right
approach, as I discussed in a recent note to linux-ext4.
* js/64-bit (Wed May 21 12:54:28 2008 -0500) 10 commits
. Add 64-bit openfs interface.
. Use new ext2fs_super_and_bgd_loc2 call in libext2fs.
. Add 64-bit closefs interface.
. Add 64-bit ext_attr interface.
. Add 64-bit alloc interface.
. Add 64-bit alloc_stats interface.
. Add 64-bit dirblock interface.
. Use blk64_t for blocks in struct ext2_file.
. Add new blk64_t handling functions
. Add ext2_off64_t type.
I did not include this in the 'pu' branch because without the
undo manager support, it could break mke2fs.
* es/setbmap (Tue May 20 16:13:41 2008 -0500) 4 commits
- delete unused nodes in ext2fs_extent_delete
- libext2fs: add ext2fs_extent_set_bmap
- libext2fs: allow ext2fs_extent_insert to split if needed
- libext2fs: ext2fs_node_split
This needs a once-over, and then we need to wire up the
ext2fs_extent_set_bmap() funcion so it is called by
ext2fs_block_iterate() and ext2fs_bmap().
* tt/64bit-bitmaps (Mon Feb 18 23:54:53 2008 -0500) 1 commit
- Initial design for 64-bit bitmaps
Still on my todo list.
----
URL's for the e2fsprogs repository:
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
http://www.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* js/64-bit (Mon Jun 2 22:29:26 2008 -0400) 11 commits
- libext2fs: Add 64-bit support to the undo manager
- Add 64-bit openfs interface.
- Use new ext2fs_super_and_bgd_loc2 call in libext2fs.
- Add 64-bit closefs interface.
- Add 64-bit ext_attr interface.
- Add 64-bit alloc interface.
- Add 64-bit alloc_stats interface.
- Add 64-bit dirblock interface.
- Use blk64_t for blocks in struct ext2_file.
- Add new blk64_t handling functions
- Add ext2_off64_t type.
* es/setbmap (Mon Jun 2 20:36:52 2008 -0400) 4 commits
+ Teach ext2fs_extent_delete() to remove an empty extent node from
the tree
+ libext2fs: add new function ext2fs_extent_set_bmap()
+ Teach ext2fs_extent_insert() to split the current node if
necessary
+ libext2fs: Teach extent.c how to split nodes
* tt/journal-checksum (Fri May 23 01:00:19 2008 -0400) 3 commits
- e2fsck: Add support to check journal checksums
- Add CRC32 (big-endian) routine for use by journal checksums
- Re-enable byte-swapping functions for use by journal checksums
* tt/64bit-bitmaps (Mon Feb 18 23:54:53 2008 -0500) 1 commit
- Initial design for 64-bit bitmaps
* next (Mon Jun 2 21:40:02 2008 -0400) 6 commits
+ f_extents2: Add new test case testing e2fsck's support for extents
+ e2fsck: Detect unordered extents in an extent node
+ e2fsck: Wire up callback functions for _alloc_block() and
_block_alloc_stats()
+ libext2fs: Add callback functions for _alloc_block() and
_block_alloc_stats()
+ Wire ext2fs_bmap2() to use ext2fs_extent_set_bmap()
+ Wire ext2fs_block_iterate2() to use ext2fs_extent_set_bmap()
Here are the topics that have been cooking. Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'. The topics list the commits in reverse chronological
order.
* js/64-bit (Wed May 21 12:54:28 2008 -0500) 11 commits
- Add 64-bit openfs interface
- Use new ext2fs_super_and_bgd_loc2 call in libext2fs
- Add 64-bit closefs interface
- Add 64-bit ext_attr interface
- Add 64-bit alloc interface
- Add 64-bit alloc_stats interface
- Add 64-bit dirblock interface
- Use blk64_t for blocks in struct ext2_file
- libext2fs: Add 64-bit support to the undo manager
- Add new blk64_t handling functions
- Add ext2_off64_t type
I've massaged the patch comments to avoid duplication, and
applied Jose's fix to the 64-bit alloc interface patch, which
had been badly merged.
* tt/64bit-bitmaps (Mon Feb 18 23:54:53 2008 -0500) 1 commit
- Initial design for 64-bit bitmaps
Some additional small bits of work on the infrastructure
----
URL's for the e2fsprogs repository:
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
http://www.kernel.org/pub/scm/fs/ext2/e2fsprogs.git