2008-04-15 16:52:40

by Jose R. Santos

Subject: Initial e2fsprogs TODO list (please expand)

As discussed on the call yesterday, some folks (myself included) really
want a TODO list to help them keep track of what is left undone
in e2fsprogs as we try to get ext4 out the door. Here is my initial
list of items that still need addressing. Hopefully we can expand this
list and document it somewhere like the ext4 wiki or the SourceForge
bug tracker.

- Rename uninit_groups to uninit_bg to be consistent with other
defined features. Retain the old name for historical purposes.

- The return value of ext2fs_super_and_bgd_loc() is not to be trusted.
Document this in the source code.

- Make sure ext2fs_super_and_bgd_loc() does not get used anywhere where
the return value is expected to be accurate (aside from mke2fs).

- Stop mke2fs from setting the lazy_bg feature. The feature has been
declared a dangerous hack by its creator; remove it so that people do
not build on top of it.

- Add flex_bg meta-data grouping support.

- Remove support for not zeroing the inode tables from the
uninit_groups patches. This support is dangerous without a proper
kernel thread that zeros them in the background when the filesystem is
mounted. Depends on the lazy_bg removal.

- Activate undo-manager in mke2fs only when inode tables are not being
zeroed. Undo-manager is horribly slow if we need to store the
information of all the blocks that have been zeroed during mke2fs. The
amount of storage needed for the undo on a 16TB filesystem could be
problematic. Depends on kernel thread inode table zeroing.

- Make a 64-bit clean API that extends the existing one. The current
API cannot support block numbers larger than 32 bits, so a new set of
API calls is needed in order to provide large filesystem support and
retain backwards compatibility with the old API.

- 64-bit bitmap interface. To support block numbers larger than
32 bits, a new bitmap interface is needed that can retain ABI
compatibility with the old one.
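
One common way to approach the last two items, sketched here in C, is to
make the 64-bit entry point the real implementation and keep the old
32-bit call as a thin wrapper with its original signature. Every name in
this sketch (demo_blk_t, demo_blk64_t, struct demo_bitmap, and the
demo_* functions) is made up for illustration; none of it is the actual
e2fsprogs API, and the flat one-bit-per-block layout is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_blk_t;    /* hypothetical: old 32-bit block number */
typedef uint64_t demo_blk64_t;  /* hypothetical: new 64-bit block number */

/* Hypothetical flat bitmap: one bit per block in [start, end]. */
struct demo_bitmap {
    demo_blk64_t start;
    demo_blk64_t end;
    unsigned char *bits;
};

/* New 64-bit call: does the real work. Returns 1 if the bit is set. */
int demo_test_block64(const struct demo_bitmap *bm, demo_blk64_t blk)
{
    assert(bm != NULL && bm->bits != NULL);
    if (blk < bm->start || blk > bm->end)
        return 0;               /* out of range counts as "not set" */
    demo_blk64_t off = blk - bm->start;
    return (bm->bits[off >> 3] >> (off & 7)) & 1;
}

/* New 64-bit call: sets the bit; out-of-range requests are ignored. */
void demo_mark_block64(struct demo_bitmap *bm, demo_blk64_t blk)
{
    assert(bm != NULL && bm->bits != NULL);
    if (blk < bm->start || blk > bm->end)
        return;
    demo_blk64_t off = blk - bm->start;
    bm->bits[off >> 3] |= (unsigned char)(1u << (off & 7));
}

/* Old 32-bit calls: unchanged signatures, widened and forwarded. */
int demo_test_block(const struct demo_bitmap *bm, demo_blk_t blk)
{
    return demo_test_block64(bm, (demo_blk64_t)blk);
}

void demo_mark_block(struct demo_bitmap *bm, demo_blk_t blk)
{
    demo_mark_block64(bm, (demo_blk64_t)blk);
}
```

Existing callers keep using the 32-bit names unchanged, while new code
can pass block numbers above 2^32 through the 64-bit variants.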



-JRS


2008-04-16 03:30:05

by Andreas Dilger

Subject: Re: Initial e2fsprogs TODO list (please expand)

On Apr 15, 2008 11:52 -0500, Jose R. Santos wrote:
> As discussed on the call yesterday, some folks (myself included) really
> want a TODO list to help them keep track of what is left undone
> in e2fsprogs as we try to get ext4 out the door. Here is my initial
> list of items that still need addressing. Hopefully we can expand this
> list and document it somewhere like the ext4 wiki or the SourceForge
> bug tracker.
>
> - Rename uninit_groups to uninit_bg to be consistent with other
> defined features. Retain the old name for historical purposes.
>
> - The return value of ext2fs_super_and_bgd_loc() is not to be trusted.
> Document this in the source code.
>
> - Make sure ext2fs_super_and_bgd_loc() does not get used anywhere where
> the return value is expected to be accurate (aside from mke2fs).
>
> - Stop mke2fs from setting the lazy_bg feature. The feature has been
> declared a dangerous hack by its creator; remove it so that people do
> not build on top of it.
>
> - Add flex_bg meta-data grouping support.
>
> - Remove support for not zeroing the inode tables from the
> uninit_groups patches. This support is dangerous without a proper
> kernel thread that zeros them in the background when the filesystem is
> mounted. Depends on the lazy_bg removal.

Something was lost in translation here. The uninit_groups feature DOES
zero the inode tables by default, and marks the groups with ITABLE_ZEROED.
It is only if "-O uninit_groups,lazy_bg" are both given at the same time
that the itable is not initialized. That is no different than if lazy_bg
was given by itself.

So nothing needs to be done in e2fsprogs until some time after the kernel
is updated to do the zeroing.
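
The behaviour described above can be written down as a small decision
function. This is only a sketch of the rule as stated in this message:
the DEMO_* constants, their values, and the function names are invented
for illustration and are not the real e2fsprogs feature or flag
definitions. It also assumes the ITABLE_ZEROED mark is only recorded
when uninit_groups is in use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits, for illustration only. */
#define DEMO_FEAT_UNINIT_GROUPS 0x1u
#define DEMO_FEAT_LAZY_BG       0x2u
#define DEMO_FEAT_ALL (DEMO_FEAT_UNINIT_GROUPS | DEMO_FEAT_LAZY_BG)

/* Hypothetical per-group flag standing in for ITABLE_ZEROED. */
#define DEMO_BG_ITABLE_ZEROED   0x4u

/* The inode table is zeroed unless lazy_bg was requested, whether or
 * not uninit_groups is also set. */
bool demo_should_zero_itable(unsigned features)
{
    assert((features & ~DEMO_FEAT_ALL) == 0);
    return (features & DEMO_FEAT_LAZY_BG) == 0;
}

/* With uninit_groups, a group whose inode table was zeroed is marked
 * so that later tools can rely on it. */
unsigned demo_initial_bg_flags(unsigned features)
{
    unsigned flags = 0;
    if ((features & DEMO_FEAT_UNINIT_GROUPS) &&
        demo_should_zero_itable(features))
        flags |= DEMO_BG_ITABLE_ZEROED;
    return flags;
}
```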

> - Activate undo-manager in mke2fs only when inode tables are not being
> zeroed. Undo-manager is horribly slow if we need to store the
> information of all the blocks that have been zeroed during mke2fs. The
> amount of storage needed for the undo on a 16TB filesystem could be
> problematic. Depends on kernel thread inode table zeroing.
>
> - Make a 64-bit clean API that extends the existing one. The current
> API cannot support block numbers larger than 32 bits, so a new set of
> API calls is needed in order to provide large filesystem support and
> retain backwards compatibility with the old API.
>
> - 64-bit bitmap interface. To support block numbers larger than
> 32 bits, a new bitmap interface is needed that can retain ABI
> compatibility with the old one.

There are some notes on implementing more efficient bitmaps in
https://bugzilla.lustre.org/show_bug.cgi?id=12202

Even without 64-bit filesystems the memory consumption of e2fsck
can be quite high (2^32 blocks ~= 2^32 bytes of RAM for e2fsck).
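
For a rough sense of scale, the arithmetic can be sketched in C. The
assumptions are a flat bitmap at one bit per block and a caller-supplied
number of such bitmaps; the function names are hypothetical, and how
many bitmaps e2fsck really keeps is not taken from this thread. The
"one byte per block" figure above corresponds to eight bits of
bookkeeping per block.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for one flat bitmap with one bit per block. */
uint64_t demo_bitmap_bytes(uint64_t nblocks)
{
    return (nblocks + 7) / 8;
}

/* Bytes needed when nbitmaps such bitmaps are held in memory at once.
 * nbitmaps is an input to the estimate, not a measured value. */
uint64_t demo_total_bitmap_bytes(uint64_t nblocks, unsigned nbitmaps)
{
    assert(nbitmaps > 0);
    return (uint64_t)nbitmaps * demo_bitmap_bytes(nblocks);
}
```

With 2^32 blocks, a single bitmap is 2^29 bytes (512 MiB), and eight
bits per block adds up to 2^32 bytes (4 GiB).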

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-04-16 04:35:38

by Jose R. Santos

Subject: Re: Initial e2fsprogs TODO list (please expand)

On Tue, 15 Apr 2008 21:30:02 -0600
Andreas Dilger <[email protected]> wrote:

> On Apr 15, 2008 11:52 -0500, Jose R. Santos wrote:
> > As discussed on the call yesterday, some folks (myself included) really
> > want a TODO list to help them keep track of what is left undone
> > in e2fsprogs as we try to get ext4 out the door. Here is my initial
> > list of items that still need addressing. Hopefully we can expand this
> > list and document it somewhere like the ext4 wiki or the SourceForge
> > bug tracker.
> >
> > - Rename uninit_groups to uninit_bg to be consistent with other
> > defined features. Retain the old name for historical purposes.
> >
> > - The return value of ext2fs_super_and_bgd_loc() is not to be trusted.
> > Document this in the source code.
> >
> > - Make sure ext2fs_super_and_bgd_loc() does not get used anywhere where
> > the return value is expected to be accurate (aside from mke2fs).
> >
> > - Stop mke2fs from setting the lazy_bg feature. The feature has been
> > declared a dangerous hack by its creator; remove it so that people do
> > not build on top of it.
> >
> > - Add flex_bg meta-data grouping support.
> >
> > - Remove support for not zeroing the inode tables from the
> > uninit_groups patches. This support is dangerous without a proper
> > kernel thread that zeros them in the background when the filesystem is
> > mounted. Depends on the lazy_bg removal.
>
> Something was lost in translation here. The uninit_groups feature DOES
> zero the inode tables by default, and marks the groups with ITABLE_ZEROED.
> It is only if "-O uninit_groups,lazy_bg" are both given at the same time
> that the itable is not initialized. That is no different than if lazy_bg
> was given by itself.

Yes, I understand this part.

> So nothing needs to be done in e2fsprogs until some time after the kernel
> is updated to do the zeroing.

The problem is that skipping inode table initialization in the uninit
block group patch depends on a feature (lazy_bg) that Ted wants
removed. I believe that just removing the lazy_bg feature would be
enough to remove this capability from the uninit patch, but I was not
entirely sure, so I added the item just to keep track of it.

If lazy_bg is in fact removed from e2fsprogs, I suppose we need to add
another item to enable lazy setup of the inode tables once the proper
support in the kernel is established.

> > - Activate undo-manager in mke2fs only when inode tables are not being
> > zeroed. Undo-manager is horribly slow if we need to store the
> > information of all the blocks that have been zeroed during mke2fs. The
> > amount of storage needed for the undo on a 16TB filesystem could be
> > problematic. Depends on kernel thread inode table zeroing.
> >
> > - Make a 64-bit clean API that extends the existing one. The current
> > API cannot support block numbers larger than 32 bits, so a new set of
> > API calls is needed in order to provide large filesystem support and
> > retain backwards compatibility with the old API.
> >
> > - 64-bit bitmap interface. To support block numbers larger than
> > 32 bits, a new bitmap interface is needed that can retain ABI
> > compatibility with the old one.
>
> There are some notes on implementing more efficient bitmaps in
> https://bugzilla.lustre.org/show_bug.cgi?id=12202
>
> Even without 64-bit filesystems the memory consumption of e2fsck
> can be quite high (2^32 blocks ~= 2^32 bytes of RAM for e2fsck).
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>



-JRS

2008-04-17 03:26:58

by Andreas Dilger

Subject: Re: Initial e2fsprogs TODO list (please expand)

On Apr 15, 2008 23:35 -0500, Jose R. Santos wrote:
> On Tue, 15 Apr 2008 21:30:02 -0600
> Andreas Dilger <[email protected]> wrote:
> > Something was lost in translation here. The uninit_groups feature DOES
> > zero the inode tables by default, and marks the groups with ITABLE_ZEROED.
> > It is only if "-O uninit_groups,lazy_bg" are both given at the same time
> > that the itable is not initialized. That is no different than if lazy_bg
> > was given by itself.
>
> Yes, I understand this part.
>
> > So nothing needs to be done in e2fsprogs until some time after the kernel
> > is updated to do the zeroing.
>
> The problem is that skipping inode table initialization in the uninit
> block group patch depends on a feature (lazy_bg) that Ted wants
> removed. I believe that just removing the lazy_bg feature would be
> enough to remove this capability from the uninit patch, but I was not
> entirely sure, so I added the item just to keep track of it.
>
> If lazy_bg is in fact removed from e2fsprogs, I suppose we need to add
> another item to enable lazy setup of the inode tables once the proper
> support in the kernel is established.

Yes, the "lazy init" for uninit_groups will essentially be identical to
what we have in lazy_bg today. So if we are disabling lazy_bg as a
user-selectable option, we should leave the code in place for later use.
I wouldn't object to requiring a user to specify "mke2fs -O FEATURE_C6"
to enable it. That keeps it out of the hands of newbies, but leaves the
capability to test large filesystems w/o 45 minute mke2fs times.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-04-17 03:36:24

by Theodore Ts'o

Subject: Re: Initial e2fsprogs TODO list (please expand)

On Tue, Apr 15, 2008 at 11:52:16AM -0500, Jose R. Santos wrote:
> As discussed on the call yesterday, some folks (myself included) really
> want a TODO list to help them keep track of what is left undone
> in e2fsprogs as we try to get ext4 out the door. Here is my initial
> list of items that still need addressing. Hopefully we can expand this
> list and document it somewhere like the ext4 wiki or the SourceForge
> bug tracker.
>
> - Rename uninit_groups to uninit_bg to be consistent with other
> defined features. Retain the old name for historical purposes.

Yes. Although until we actually do lazy initialization of the
inode table, I still think the name is a bit of a misnomer. It really
is more about checksumming the block group descriptors and a faster
fsck; whether or not we initialize the block groups is
pretty much a non-issue.

> - The return value of ext2fs_super_and_bgd_loc() is not to be trusted.
> Document this in the source code.
>
> - Make sure ext2fs_super_and_bgd_loc() does not get used anywhere where
> the return value is expected to be accurate (aside from mke2fs).
>
> - Stop mke2fs from setting the lazy_bg feature. The feature has been
> declared a dangerous hack by its creator; remove it so that people do
> not build on top of it.

... and to replace it, add a configuration parameter to
/etc/e2fsck.conf which controls whether or not the inode table and
bitmap blocks should be uninitialized when using uninit groups. It
will default to off for now, until the kernel support can be
implemented.

> - Add flex_bg meta-data grouping support.

Once it is demonstrated to work correctly in all circumstances. :-)

- Ted

2008-04-20 23:47:21

by Theodore Ts'o

Subject: Re: Initial e2fsprogs TODO list (please expand)

I found a badly out-of-date e2fsprogs todo page on the ext4 wiki, and
I've updated it with the todo items from this list.

http://ext4.wiki.kernel.org/index.php/E2fsprogs_features_and_patches

Some of the items marked "DONE" are in my tree and haven't been pushed
out yet, but I'll make sure that happens by Monday. Note that I am
taking the red eye from Sao Paulo tonight, and if all goes well, am
scheduled to arrive in Boston at 10:15am Eastern. If the flight gets
delayed, there is a chance that I may end up being late or missing the
ext4 call.

My intention is to try to get enough of the serious bugs fixed that we
can release 1.41-rc0 early this week. The other thing that needs to
happen is preparing the ext4 queue for pushing to Linus.

- Ted

2008-04-21 13:18:18

by Theodore Ts'o

Subject: Re: Initial e2fsprogs TODO list (please expand)

On Sun, Apr 20, 2008 at 07:47:07PM -0400, Theodore Tso wrote:
> Some of the items marked "DONE" are in my tree and haven't been pushed
> out yet, but I'll make sure that happens by Monday. Note that I am
> taking the red eye from Sao Paulo tonight, and if all goes well, am
> scheduled to arrive in Boston at 10:15am Eastern. If the flight gets
> delayed, there is a chance that I may end up being late or missing the
> ext4 call.

Unfortunately, we were delayed in Sao Paulo for over three hours;
something about a problem with one of the fuel pumps.... So I've been
rebooked onto another flight which means I'll be in the air at the
time of the ext4 call.

While I was stuck on the airplane, I spent some time doing more fixups
on the uninit_bg code to make it much cleaner and more robust, and I
also started rototilling the undo_mgr patches.

In addition to fixing numerous style and usability problems, I also
found the design problem which caused it to be so slow. It is using
the first blocksize used to write to the device as the tdb_data_size.
For mke2fs, this is 512 bytes, which means that for every single 4k
inode table block write, *eight* entries were getting made into the
tdb database and the old contents of the filesystem were getting
stored in 512-byte chunks. No wonder it was so slow!! I was able to
show significant speedups by forcing the tdb_data_size to be the
filesystem blocksize, and I suspect that for mke2fs, if it is
initializing the inode table, using a tdb_data_size of something like
32k or 64k would be even better.
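
The effect of the chunk size on the number of undo records can be
checked with a little arithmetic. The sketch below assumes each
tdb_data_size-sized chunk of an overwritten region is saved as one
record and that writes are aligned to the chunk size; the function name
is hypothetical and this is not the undo manager's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Number of undo records needed to save `write_bytes` of old data when
 * the old contents are stored in chunks of `tdb_data_size` bytes. */
uint64_t demo_undo_records(uint64_t write_bytes, uint64_t tdb_data_size)
{
    assert(tdb_data_size > 0);
    return (write_bytes + tdb_data_size - 1) / tdb_data_size;
}
```

A single 4k block at 512 bytes per record needs 8 records, and 1 record
at a 4k record size. Zeroing 1 GiB of inode tables would need 2097152
records at 512 bytes, 262144 at 4k, and 16384 at 64k.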

Unfortunately I haven't made any progress on quality-checking
the patches in the patch queue, since I found so much new code that
just screamed out for fixing in e2fsprogs. Eric, if you have time,
could you look through the patch queue and help out with
sanity-checking the patches and making sure the patch descriptions are
suitably well-written without version control logs, XXX FIXME
comments, or other things that would make Linus vomit? If you could,
I'd really appreciate it. Thanks!!

- Ted

2008-04-21 21:37:25

by Eric Sandeen

Subject: Re: Initial e2fsprogs TODO list (please expand)

Theodore Tso wrote:

> Unfortunately I haven't made any progress on quality-checking
> the patches in the patch queue, since I found so much new code that
> just screamed out for fixing in e2fsprogs. Eric, if you have time,
> could you look through the patch queue and help out with
> sanity-checking the patches and making sure the patch descriptions are
> suitably well-written without version control logs, XXX FIXME
> comments, or other things that would make Linus vomit? If you could,
> I'd really appreciate it. Thanks!!

I'll put it on the list :) spent today doing more RHEL-related stuff...

-Eric