LinuxLists.cc - A tool that allows changing inode table sizes

2014-01-15 13:36:58

Subject: A tool that allows changing inode table sizes

Hi all!

As I understand it, it is a well-known fact that ext2/3/4 does not allow
changing the inode table size without recreating the filesystem. I had
no experience with Linux filesystem internals until recently, when I
discovered that inode tables take up 45 GB on one of my hard drives
(3 TB in size) :-) That hard drive is, of course, full of movies, not
16 KB files, so the inode tables are almost 100% unused.
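
For scale, a figure of about 45 GB is roughly what the usual mke2fs
defaults would produce on a drive of that size. The back-of-the-envelope
check below assumes one inode per 16 KiB of disk and 256-byte inodes;
both values are assumptions, since the actual settings of that drive are
not stated, and the function name is made up for illustration.

```python
def inode_table_bytes(fs_bytes, bytes_per_inode=16384, inode_size=256):
    """Approximate total inode table size for a filesystem of fs_bytes.

    Ignores per-group rounding, so the real mke2fs result differs slightly.
    """
    inode_count = fs_bytes // bytes_per_inode
    return inode_count * inode_size


fs_bytes = 3 * 10**12  # a "3 TB" drive in decimal units
print(f"{inode_table_bytes(fs_bytes) / 10**9:.1f} GB")  # prints 46.9 GB
```

With one inode per 1 MiB instead (bytes_per_inode=1048576), the same
estimate drops below 1 GB, which is the kind of saving the tool is after.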

So I thought it would be good if it were possible to change inode table
sizes. I've written a tool that does exactly that, and I want to
present it to the community! :)

Anyone is welcome to test it, of course, if it's of any interest to you.
The source is here:
http://svn.yourcmc.ru/viewvc.py/vitalif/trunk/ext4-realloc-inodes/
('download tarball'). Maybe it would be better to move it into a
separate git repo, of course.

I haven't tested it on a real hard drive yet :-D, only on small fs images
with different settings (block, block group and flex_bg size, ext2/3/4,
bigalloc, etc.). There are even some automated tests (run by 'make
test'). The tool works without problems on all the small test images
that I've created, though I haven't tried to run it on bigger
filesystems (of course I'll do that in the near future).

As this is a highly destructive process that involves overwriting ALL
inode numbers in ALL directory entries across the whole filesystem, I've
also implemented a simple method of safely applying/rolling back
changes. First I tried to use undo_io_manager, but it appears to be
very slow because of frequent commits, which are of course needed for it
to be safe. My method is called patch_io_manager and does a different
thing: it does not overwrite the initial FS image, but writes all
modified blocks into a separate sparse file, and writes a bitmap of
modified blocks at the end when it finishes. That is, the initial
filesystem stays unmodified.
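
The idea can be sketched as a small copy-on-write overlay. This is only
an illustration of the scheme described above, not the actual
patch_io_manager code: the class and method names are hypothetical, and
the bitmap goes to a separate file here for simplicity, whereas the post
only says it is written at the end.

```python
import os


class PatchOverlay:
    """Hypothetical overlay: writes land in a sparse patch file,
    reads fall through to the untouched base image."""

    def __init__(self, base_path, patch_path, block_size=4096):
        self.block_size = block_size
        self.base = open(base_path, "rb")      # opened read-only, never written
        self.patch = open(patch_path, "w+b")   # sparse file of modified blocks
        self.modified = set()                  # block numbers held in the patch

    def write_block(self, blk, data):
        assert len(data) == self.block_size
        self.patch.seek(blk * self.block_size)
        self.patch.write(data)
        self.modified.add(blk)

    def read_block(self, blk):
        # Modified blocks come from the patch, everything else from the base.
        src = self.patch if blk in self.modified else self.base
        src.seek(blk * self.block_size)
        return src.read(self.block_size)

    def finish(self, bitmap_path):
        """Write the bitmap last, so its presence marks a complete patch.
        Assumes every modified block lies inside the base image."""
        nblocks = os.path.getsize(self.base.name) // self.block_size
        bitmap = bytearray((nblocks + 7) // 8)
        for blk in self.modified:
            bitmap[blk // 8] |= 1 << (blk % 8)
        self.patch.flush()
        os.fsync(self.patch.fileno())
        with open(bitmap_path, "wb") as f:
            f.write(bitmap)
        self.patch.close()
        self.base.close()
        return bytes(bitmap)
```

Because the base image is only ever opened for reading, a crash at any
point before the patch is applied leaves the filesystem exactly as it
was.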

Then, using the e2patch utility (it's in the same repository), you can
a) back up the blocks that will be modified into another patch file
(e2patch backup <fs> <patch> <backup>) and b) apply the patch to the
real filesystem. If the apply process gets interrupted (for example by a
power outage), it can be restarted from the beginning because it does
nothing except overwrite some blocks. And if the FS changes turn out to
be bad, you can restore the backup in the same way. So the process
should be safe, at least to some extent.
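
A minimal sketch of why backup, apply and restore can share one code
path and why a restarted apply is harmless. The function names and the
on-disk layout (patch blocks stored at their own offsets, bitmap passed
in as bytes) are assumptions made for illustration, not the real e2patch
format.

```python
def read_bit(bitmap, blk):
    """True if block blk is marked as modified in the bitmap."""
    return bool(bitmap[blk // 8] & (1 << (blk % 8)))


def backup_blocks(fs_path, bitmap, backup_path, block_size=4096):
    """Save the current contents of every block the patch will overwrite."""
    with open(fs_path, "rb") as fs, open(backup_path, "wb") as out:
        for blk in range(len(bitmap) * 8):
            if read_bit(bitmap, blk):
                fs.seek(blk * block_size)
                out.seek(blk * block_size)
                out.write(fs.read(block_size))


def apply_patch(fs_path, patch_path, bitmap, block_size=4096):
    """Copy the marked blocks from the patch over the filesystem.

    Each step is a plain overwrite with fixed data, so running this again
    from the start after an interruption gives the same final state.
    """
    with open(fs_path, "r+b") as fs, open(patch_path, "rb") as patch:
        for blk in range(len(bitmap) * 8):
            if read_bit(bitmap, blk):
                patch.seek(blk * block_size)
                fs.seek(blk * block_size)
                fs.write(patch.read(block_size))
```

Restoring is then just apply_patch(fs, backup, bitmap) with the backup
file in place of the patch.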

--
With best regards,
Vitaliy Filippov

2014-01-17 00:05:50

by Andreas Dilger

Subject: Re: A tool that allows changing inode table sizes

On Jan 15, 2014, at 6:28 AM, [email protected] wrote:
> As I understand it was a well-known fact that ext2/3/4 does not allow changing inode table size without recreating the filesystem. And I didn't have any experience in linux filesystem internals until recently, when I've discovered that inode tables take 45 GB on one of my hard drives (3 TB in size) :-):-) that hard drive is, of course, full of movies, not 16Kb files, so the inode tables are almost 100% unused.
>
> So, I've thought it would be good if it it would possible to change inode table sizes. So I've written a tool that in fact allows to do it, and I want to present it to the community! :)

Interesting. I did something years ago for ext2/3 filesystem resizing
(ext2resize), but that has since become obsolete as the functionality
was included into e2fsprogs. I'd recommend that you also work to get
your functionality included into e2fsprogs sooner rather than later.

Ideally this would be part of resize2fs, but I'm not sure it would be
easily implemented there.

> Anyone is welcome to test it of course if it's of any interest for you - the source is here http://svn.yourcmc.ru/viewvc.py/vitalif/trunk/ext4-realloc-inodes/ ('download tarball') (maybe it would be better to move it into a separate git repo, of course)
>
> I didn't test it on a real hard drive yet :-D, only on small fs images with different settings (block, block group, flex_bg size, ext2/3/4, bigalloc and etc). There are even some auto-tests (ran by 'make test').

Note that it is critical to refuse to do anything on filesystems that
have any feature that your tool doesn't understand. Otherwise, it has
a good possibility to corrupt the filesystem.
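
Such a check could look roughly like the sketch below, which reads the
feature words straight from the on-disk superblock and reports any bit
outside a whitelist. The field offsets and bit values are the standard
ext2/3/4 ones; the whitelist itself and the function name are
hypothetical and would have to match what the tool has actually been
tested with.

```python
import struct

SUPERBLOCK_OFFSET = 1024
EXT2_MAGIC = 0xEF53

# Hypothetical whitelist of features this imaginary tool understands.
SUPPORTED_INCOMPAT = 0x0002 | 0x0040 | 0x0080 | 0x0200   # filetype, extents, 64bit, flex_bg
SUPPORTED_RO_COMPAT = 0x0001 | 0x0002 | 0x0008 | 0x0010  # sparse_super, large_file, huge_file, gdt_csum


def unsupported_features(image):
    """Return (incompat, ro_compat) bits that are set but not whitelisted.

    image is the raw start of the device, at least 2048 bytes long.
    """
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    # s_feature_compat, s_feature_incompat, s_feature_ro_compat
    _compat, incompat, ro_compat = struct.unpack_from("<III", sb, 0x5C)
    return incompat & ~SUPPORTED_INCOMPAT, ro_compat & ~SUPPORTED_RO_COMPAT
```

A tool that rewrites metadata would refuse to run unless both returned
values are zero, since for a writer an unknown ro_compat feature is as
dangerous as an unknown incompat one.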

> The tools works without problem on all small test images that I've created, though I didn't try to run it on bigger filesystems (of course I'll do it in the nearest future).
>
> As this is a highly destructive process that involves overwriting ALL inode numbers in ALL directory entries across the whole filesystem, I've also implemented a simple method of safely applying/rolling back changes. First I've tried to use undo_io_manager, but it appears to be very slow because of frequent commits, which are of course needed for it to be safe.

Would it be possible to speed up undo_io_manager if it had larger IO
groups or similar? How does the speed of running with undo_io_manager
compare to running your patch_io_manager doing both a backup and apply?

> My method is called patch_io_manager and does a different thing - it does not overwrite the initial FS image, but writes all modified blocks into a separate sparse file + writes a bitmap of modified blocks in the end when it finishes. I.e. the initial filesystem stays unmodified.

This is essentially implementing a journal in userspace for e2fsprogs.
You could even use the journal file in the filesystem. The journal
MUST be clean before the inode renumbering, or journal replay will
corrupt the filesystem after your resize. Does your tool check this?

That said, there may not be enough space in the journal for full data
journaling, but it might be enough for logical journaling of the inodes
to be moved and the directories that need to be updated?

> Then, using e2patch utility (it's in the same repository), you can a) backup the blocks that will be modified into another patch file (e2patch backup <fs> <patch> <backup>) and b) apply the patch to real filesystem. If the applying process gets interrupted (for example by the power outage) it can be restarted from the beginning because it does nothing except just overwriting some blocks.

This is exactly like journal replay.

> And if the FS changes appear to be bad at all, you can restore the backup in a same way. So the process should be safe at least to some extent.

Looks interesting. Of course, I always recommend doing a full backup
before any operation like this. At that point, it would also be
possible to just format a new filesystem and copy the data over. That
has the advantage of also allowing other filesystem features to be
enabled and defragmenting the data, but could be slower if the files
are large (as in your case) and relatively few inodes are moved.

Cheers, Andreas

2014-01-17 00:25:36

by Darrick J. Wong

Subject: Re: A tool that allows changing inode table sizes

On Thu, Jan 16, 2014 at 05:05:45PM -0700, Andreas Dilger wrote:
>
> On Jan 15, 2014, at 6:28 AM, [email protected] wrote:
> > As I understand it was a well-known fact that ext2/3/4 does not allow changing inode table size without recreating the filesystem. And I didn't have any experience in linux filesystem internals until recently, when I've discovered that inode tables take 45 GB on one of my hard drives (3 TB in size) :-):-) that hard drive is, of course, full of movies, not 16Kb files, so the inode tables are almost 100% unused.
> >
> > So, I've thought it would be good if it it would possible to change inode table sizes. So I've written a tool that in fact allows to do it, and I want to present it to the community! :)
>
> Interesting. I did something years ago for ext2/3 filesystem resizing
> (ext2resize), but that has since become obsolete as the functionality
> was included into e2fsprogs. I'd recommend that you also work to get
> your functionality included into e2fsprogs sooner rather than later.
>
> Ideally this would be part of resize2fs, but I'm not sure it would be
> easily implemented there.

I don't think it would be too difficult, since there's already code to move
blocks and inodes around. I guess the big question is how well does it respond
to having inodes_per_group change?
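
The reason inodes_per_group matters so much is that an ext2/3/4 inode
number encodes its block group and its index within that group's table.
The sketch below shows the arithmetic, under the assumption that each
inode stays in its group at the same index; the function name is made up
and this is not necessarily how either tool renumbers.

```python
def renumber(ino, old_ipg, new_ipg):
    """Map an inode number to its value after inodes_per_group changes.

    Inode numbers start at 1: group = (ino - 1) // ipg, index = (ino - 1) % ipg.
    """
    group, index = divmod(ino - 1, old_ipg)
    if index >= new_ipg:
        # The slot no longer exists in the smaller table, so the inode
        # would first have to be moved to a free slot elsewhere.
        raise ValueError(f"inode {ino} does not fit into the shrunken table")
    return group * new_ipg + index + 1


print(renumber(2, 8192, 512))     # root inode in group 0 keeps its number: 2
print(renumber(8193, 8192, 512))  # first inode of group 1 becomes 513
```

Every inode outside group 0 gets a new number, which is why all
directory entries across the filesystem have to be rewritten.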

<shrug>

--D

2014-01-17 13:21:12

by Виталий Филиппов

Subject: Re: A tool that allows changing inode table sizes

Hi!

Thanks for answering!

> Interesting. I did something years ago for ext2/3 filesystem resizing
> (ext2resize), but that has since become obsolete as the functionality
> was included into e2fsprogs. I'd recommend that you also work to get
> your functionality included into e2fsprogs sooner rather than later.
>
> Ideally this would be part of resize2fs, but I'm not sure it would be
> easily implemented there.

I agree that inclusion into e2fsprogs would be the best option! I'm only
slightly afraid of the contribution process because I haven't tried it
(particularly with this project :)). The experience I've mostly had so
far - contributing to MediaWiki - isn't easy... :(

I first thought of tune2fs (inode count is an fs option?), but it
seems you're right and resize2fs is more similar in terms of code logic.

My main concern about resize2fs, though, is that right now it's suited
to just one specific task, and as I understand it, a big part of its
code flow would need to be rearranged to do inode table resizing instead
of device resizing... And I don't know how Theodore, as the e2fsprogs
maintainer, would like such a patch. :)

>> Anyone is welcome to test it of course if it's of any interest for you
>> - the source is here
>> http://svn.yourcmc.ru/viewvc.py/vitalif/trunk/ext4-realloc-inodes/
>> ('download tarball') (maybe it would be better to move it into a
>> separate git repo, of course)
>>
>> I didn't test it on a real hard drive yet :-D, only on small fs images
>> with different settings (block, block group, flex_bg size, ext2/3/4,
>> bigalloc and etc). There are even some auto-tests (ran by 'make
>> test').
>
> Note that it is critical to refuse to do anything on filesystems that
> have any feature that your tool doesn't understand. Otherwise, it has
> a good possibility to corrupt the filesystem.

Didn't check it, thanks. As I understand it, some compatibility checks
are already done by libext2fs, but they're not enough, as libext2fs may
support more features than the tool.

Also I have a question: check_block_uninit() and check_inode_uninit()
are copy-pasted into my tool from libext2fs alloc.c. Some code in
check_block_uninit() looks to me like a duplicate of
ext2fs_reserve_super_and_bgd() - am I correct?

>> The tools works without problem on all small test images that I've
>> created, though I didn't try to run it on bigger filesystems (of
>> course I'll do it in the nearest future).
>>
>> As this is a highly destructive process that involves overwriting ALL
>> inode numbers in ALL directory entries across the whole filesystem,
>> I've also implemented a simple method of safely applying/rolling back
>> changes. First I've tried to use undo_io_manager, but it appears to be
>> very slow because of frequent commits, which are of course needed for
>> it to be safe.
>
> Would it be possible to speed up undo_io_manager if it had larger IO
> groups or similar? How does the speed of running with undo_io_manager
> compare to running your patch_io_manager doing both a backup and apply?

As I understand it, undo_io_manager needs to commit each write to the
TDB database just before issuing the write request to the underlying I/O
manager, because otherwise a block backup might not really be written to
disk while the block itself is already overwritten... So you're correct
about larger IO groups - I think the only way to make it faster is to
buffer write requests and do only one commit operation for many blocks.
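
A sketch of that batching idea, using a flat append-only log instead of
TDB to keep it short. Everything here (class name, log record format,
batch size) is hypothetical; it only illustrates the ordering rule that
the old contents must be durable before the new ones are written.

```python
import os
import struct


class BatchedUndoWriter:
    """Buffers writes and commits the undo records once per batch."""

    def __init__(self, fs_path, undo_path, block_size=4096, batch=64):
        self.fs = open(fs_path, "r+b")
        self.undo = open(undo_path, "ab")
        self.block_size = block_size
        self.batch = batch
        self.pending = {}    # blk -> new data not yet written to the fs
        self.saved = set()   # blocks whose original contents are already logged
        self.commits = 0

    def write_block(self, blk, data):
        # Note: a real I/O manager would also have to serve reads of
        # pending blocks from this buffer.
        self.pending[blk] = data
        if len(self.pending) >= self.batch:
            self.flush()

    def flush(self):
        # 1. Log the original contents of every block not saved before.
        for blk in self.pending:
            if blk not in self.saved:
                self.fs.seek(blk * self.block_size)
                old = self.fs.read(self.block_size)
                self.undo.write(struct.pack("<Q", blk) + old)
                self.saved.add(blk)
        # 2. One durable commit for the whole batch.
        self.undo.flush()
        os.fsync(self.undo.fileno())
        self.commits += 1
        # 3. Only now overwrite the blocks in the filesystem.
        for blk, data in self.pending.items():
            self.fs.seek(blk * self.block_size)
            self.fs.write(data)
        self.fs.flush()
        self.pending.clear()

    def close(self):
        if self.pending:
            self.flush()
        self.fs.close()
        self.undo.close()
```

With batch=64, rewriting 6400 blocks costs 100 fsync calls instead of
6400, at the price of holding up to 64 blocks in memory.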

About the performance: I only tested it on small images, because after
that the undo_io code was removed from my tool. On such images (32M
and 128M) the inode table resizing operation normally finishes almost
instantly, both without any undo method and under patch_io. But the same
operation under undo_io took a couple of seconds (maybe tens). This
was very slow for such small images, so I didn't run further tests and
immediately decided to implement patch_io... :)

In fact I also think patch_io is better because the idea of writing
modifications to a separate file is inherently safer...

>> My method is called patch_io_manager and does a different thing - it
>> does not overwrite the initial FS image, but writes all modified
>> blocks into a separate sparse file + writes a bitmap of modified
>> blocks in the end when it finishes. I.e. the initial filesystem stays
>> unmodified.
>
> This is essentially implementing a journal in userspace for e2fsprogs.
> You could even use the journal file in the filesystem. The journal
> MUST be clean before the inode renumbering, or journal replay will
> corrupt the filesystem after your resize. Does your tool check this?

I've copied a check from resize2fs code - it checks for !EXT2_ERROR_FS
&& EXT2_VALID_FS and suggests running e2fsck if the check fails. Is this
check sufficient to guarantee that the journal is empty?
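
For reference, a stricter variant of that check might also look at the
needs_recovery incompat flag (EXT3_FEATURE_INCOMPAT_RECOVER, 0x4), which
is the bit that marks a journal with transactions still to be replayed.
The sketch below reads both fields from a raw superblock; the function
name is hypothetical, and whether this is enough for the tool is exactly
the open question above.

```python
import struct

EXT2_VALID_FS = 0x0001                  # s_state: cleanly unmounted
EXT2_ERROR_FS = 0x0002                  # s_state: errors detected
EXT3_FEATURE_INCOMPAT_RECOVER = 0x0004  # s_feature_incompat: journal needs replay


def safe_to_renumber(sb):
    """sb is the 1024-byte superblock read from byte offset 1024."""
    (state,) = struct.unpack_from("<H", sb, 0x3A)     # s_state
    (incompat,) = struct.unpack_from("<I", sb, 0x60)  # s_feature_incompat
    clean = bool(state & EXT2_VALID_FS) and not (state & EXT2_ERROR_FS)
    journal_pending = bool(incompat & EXT3_FEATURE_INCOMPAT_RECOVER)
    return clean and not journal_pending
```

If this returns False, the sensible reaction is the same as in
resize2fs: ask the user to run e2fsck first, which also replays the
journal.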

> That said, there may not be enough space in the journal for full data
> journaling, but it might be enough for logical journaling of the inodes
> to be moved and the directories that need to be updated?

It may be sufficient, but just updating the directory blocks without
moving the inode tables and updating the block group descriptors and
superblock would also ruin the filesystem... So even if you were able to
run the inode number change operation through the journal, it wouldn't
really make the process safer.

>> Then, using e2patch utility (it's in the same repository), you can a)
>> backup the blocks that will be modified into another patch file
>> (e2patch backup <fs> <patch> <backup>) and b) apply the patch to real
>> filesystem. If the applying process gets interrupted (for example by
>> the power outage) it can be restarted from the beginning because it
>> does nothing except just overwriting some blocks.
>
> This is exactly like journal replay.

Overall you're right about the "userspace journal". I also thought of
using the real journal, but then rejected the idea because a) as you
said, the journal is likely to be too small to hold all inode tables
during the move and b) the journal inode may be moved during the
process, and sometimes journal data and extent blocks may also be moved.
In the latter case my tool will also fragment the journal, which is
probably bad for performance (am I correct here?), so I have a TODO item
for fixing it...

In fact I think there should be a way to resize inode tables safely
using only the journal - for example: first free inodes/blocks, then
shrink inode tables without moving them, then <strike>haha, exit :D as I
understand it's not mandatory to move inode tables at all</strike> move
them one flex_bg at a time, all using the journal. Or, in the case of
growing, move inode tables one flex_bg at a time and grow them
afterwards. But I think it would be harder to implement (is there any
journal write code in libext2fs?) and you'd still have problems if the
journal isn't big enough to hold the inode tables for a single flex_bg
(although that should be a very rare case).

One more feature that closely resembles patch_io is LVM snapshots, which
I thought of only after posting my message here :) If they worked well,
they would of course be better and more convenient than patch_io (for
example, you can run e2fsck on a writable snapshot and you can't do
that on a 'patched' device). But right after thinking of snapshots, I
tried to test them by resizing inode tables on that 3 TB hard drive +
an LVM snapshot on a loopback COW device... and I ended up with a frozen
./realloc-inodes process and had to reboot :)

I.e. there was no problem until it started to move inode tables; maybe
it even managed to move some - but then ./realloc-inodes hung in 'D'
state (with the system being more or less responsive overall). Details
are in my post to linux-lvm:
http://www.redhat.com/archives/linux-lvm/2014-January/msg00016.html -
but there has been no answer so far.

>> And if the FS changes appear to be bad at all, you can restore the
>> backup in a same way. So the process should be safe at least to some
>> extent.
>
> Looks interesting. Of course, I always recommend doing a full backup
> before any operation like this. At that point, it would also be
> possible to just format a new filesystem and copy the data over. That
> has the advantage of also allowing other filesystem features to be
> enabled and defragmenting the data, but could be slower if the files
> are large (as in your case) and relatively few inodes are moved.

As I understand, the resize2fs utility also isn't totally safe [in case
of an interrupt]?

2014-02-27 16:35:30

by Phillip Susi

Subject: Re: A tool that allows changing inode table sizes

Wow! That is quite a project, and this patch manager sounds very
nice. Good work!

2014-02-27 21:12:07

by Виталий Филиппов

Subject: Re: A tool that allows changing inode table sizes

> Wow! That is quite a project, and this patch manager sounds very
> nice. Good work!

Thanks :)

Since that initial post I've also implemented a simple 'patchbd' kernel
module - http://svn.yourcmc.ru/viewvc.py/vitalif/trunk/sftl/patchbd.c
(don't look at sftl*, it's an old academic attempt to implement a
software FTL); it now uses a different patch file format, so
patch_io_manager has also been rewritten... It's still quite a hack of
course: block maps, for example, are logically stored twice - once
explicitly and once in the underlying FS extent tree... But it works, as
opposed to an LVM snapshot that made my system hang when being actively
written; it's fast since there are no additional kernel threads or work
queues (except the loop thread of course), it just proxies bios to other
devices... and you can finally mount and test a 'patched' block
device... :)

Also I've done testing on a real hard drive (that 3 TB one) and actually
fixed one bug after that testing :) I also left the patch block device
mounted for about a week, ... and the patch file grew to 20 GB. Then
I applied it with e2patch. Everything went OK. :)

What I'll do next is try to actually port it into resize2fs...

--
With best regards,
Vitaliy Filippov

2016-09-25 21:32:03

by Виталий Филиппов

Subject: Re: A tool that allows changing inode table sizes

Hi everyone!

(After > 2.5 years have passed :)!)

At last, I've integrated my inode table resize tool into resize2fs. The
code is not very clean yet so I'm not sending it here, but it seems to
work on the test cases that I used to test my previous tool. It also
includes the patch_io manager as a `-T` option for e2fsck and resize2fs
(in separate commits, of course).

It can even change inode count and resize the filesystem simultaneously
(shrink fs + extend inode tables gives some errors now, but I think it
should also be easy to fix and I'll do it shortly).

I'll clean it up and send it for review, but for now you can take a look
at it here: https://github.com/vitalif/e2fsprogs (better to check a
squashed diff of the last commits if you want to look at the code...
some lines are rewritten, I'll send it squashed)

My only questions are:
1) Now I'm using my own move_inode_tables() instead of move_itables(). I
see there's some strange logic in there - which cases does it handle? Is
it something to handle overlaps during the move?
2) An interesting point of inode table allocation is that in the case of
a bigalloc FS it's done based on a per-block (not per-cluster) bitmap
during mke2fs. So, to reproduce that behavior I should either allocate a
similar per-block bitmap or force their position with a kind of hack (do
not mark allocated inode tables in the bitmap and set inode_table_loc to
the inode_table_loc of the previous BG + inode_blocks_per_group). The
second works, but the first seems more correct, so should I use that
approach?
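
For readers unfamiliar with the second approach in question 2, the
placement rule amounts to the arithmetic below: each group's table
starts right after the previous one, at block (not cluster)
granularity. This is only a sketch of that rule with made-up function
names; it says nothing about how the bitmaps should be handled, which is
the actual question.

```python
def inode_blocks_per_group(inodes_per_group, inode_size, block_size):
    """Number of blocks one group's inode table occupies (rounded up)."""
    return (inodes_per_group * inode_size + block_size - 1) // block_size


def packed_table_locations(first_loc, groups, inodes_per_group,
                           inode_size=256, block_size=4096):
    """inode_table_loc for each group when tables are packed back to back."""
    per_group = inode_blocks_per_group(inodes_per_group, inode_size, block_size)
    return [first_loc + g * per_group for g in range(groups)]


# 512 inodes of 256 bytes in 4096-byte blocks take 32 blocks per group.
print(packed_table_locations(1000, 4, 512))  # [1000, 1032, 1064, 1096]
```

On a bigalloc filesystem with, say, 16 blocks per cluster, a 32-block
table spans two clusters exactly, but other inode counts can leave
tables sharing a cluster, which a per-cluster bitmap cannot express.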

--
With best regards,
Vitaliy Filippov

2016-09-28 12:14:54

by Виталий Филиппов

Subject: Re: A tool that allows changing inode table sizes

Hi!

I want to send a patch for e2fsprogs - I've implemented inode table
resizing in resize2fs.

To what mailing list should I send it?

I've written something to linux-ext4, but got no answer...

--
With best regards,
Vitaliy Filippov

2016-09-28 14:46:52

by Andreas Dilger

Subject: Re: A tool that allows changing inode table sizes

On Sep 28, 2016, at 6:14 AM, [email protected] wrote:
>
> Hi!
>
> I want to send a patch for e2fsprogs - I've implemented inode table resizing in resize2fs.
>
> To what mailing list should I send it?
>
> I've written something to linux-ext4, but got no answer...

linux-ext4 is the right list.

Cheers, Andreas

2017-01-22 09:13:19

by Виталий Филиппов

Subject: Re: A tool that allows changing inode table sizes

Hi again,

I've had no answer since this email...

But I still would like to have my patch merged. What should I do now?

> Ok, thank you :-)
>
> On September 28, 2016, 20:37:37 GMT+03:00, Andreas Dilger
> <[email protected]> wrote:
>> On Sep 28, 2016, at 10:00 AM, Виталий Филиппов <[email protected]>
>> wrote:
>>> Thanks, did you see that email? Or was it lost somewhere in between?
>>
>> I did, but just didn't reply since resize2fs and bigalloc are Ted's
>> area.
>> He is often travelling, so it may take some time for him to reply.
>>
>> Cheers, Andreas