2008-11-01 19:10:08

by Michael B. Trausch

[permalink] [raw]
Subject: "tune2fs -I 256" runtime---can it be interrupted safely?

Hello,

I have a drive (a 250 GB external USB hard drive) which I wanted to
convert to 256 byte inodes and start using as an ext4 drive (currently
I use it as ext3). However, I started the tune2fs operation on 01 Nov
2008 02:27:34 -0400, and it is currently nearly 13 hours later, and the
operation is _still_ going at 100% of one core and operating on the
drive.

Three questions:

* Is this normal?
* How long can I expect this process to continue?
* Can it be (safely) interrupted if it will continue for a long time?

I converted my home directory to ext4 without issue and so I (stupidly,
I admit) did not take a backup of this 250 GB drive's contents before I
started the conversion process. Oops. Lesson learned. My thinking at
this point is that it would take _far_ less time to interrupt, backup,
and just use mkfs.ext4 on the drive and then restore (about 2 hours
instead of an unknown quantity which is already nearing 13 hours).

Thanks,
Mike Trausch

--
My sigfile ran away and is on hiatus.
http://www.trausch.us/


Attachments:
signature.asc (197.00 B)

2008-11-01 23:46:55

by Michael B. Trausch

[permalink] [raw]
Subject: Re: "tune2fs -I 256" runtime---can it be interrupted safely?

On Sat, 1 Nov 2008 15:10:02 -0400
"Michael B. Trausch" <[email protected]> wrote:
>
> I converted my home directory to ext4 without issue and so I
> (stupidly, I admit) did not take a backup of this 250 GB drive's
> contents before I started the conversion process. Oops. Lesson
> learned. My thinking at this point is that it would take _far_ less
> time to interrupt, backup, and just use mkfs.ext4 on the drive and
> then restore (about 2 hours instead of an unknown quantity which is
> already nearing 13 hours).
>

At 18:01:34 (after 15 hours and change) I decided to give it up and
sent tune2fs a SIGINT. It happily aborted, and fsck appears to have
fixed whatever inconsistencies were introduced during the tune2fs run
and the files on the volume appear to be perfectly intact.

Am going to go the route of simply pulling the data off and using
mkfs.ext4 to create a new filesystem on the device. But, I am still
curious as to why this would run for so long. Some volume statistics,
in case they may be of help in trying to determine why this process was
running for so long; I don't even know how much longer it would have
run for.

Stats from fsck:
651384 inodes used (4.27%)
788 non-contiguous inodes (0.1%)
# of inodes with ind/dind/tind blocks: 27974/1745/8
28464107 blocks used (46.62%)
0 bad blocks
14 large files

553474 regular files
90764 directories
0 character device files
0 block device files
0 fifos
3 links
7136 symbolic links (7093 fast symbolic links)
1 socket
--------
651378 files

Stats from dumpe2fs:
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 95904edb-b501-4c3c-9bec-b9f046b04e3b
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file Filesystem
flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 15269888
Block count: 61049596
Reserved block count: 3052479
Free blocks: 32585489
Free inodes: 14618504
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1009
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 256
Filesystem created: Mon Jul 7 23:09:06 2008
Last mount time: Sat Nov 1 18:20:45 2008
Last write time: Sat Nov 1 18:20:45 2008
Mount count: 1
Maximum mount count: 27
Last checked: Sat Nov 1 18:18:24 2008
Check interval: 15552000 (6 months)
Next check after: Thu Apr 30 18:18:24 2009
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
Default directory hash: tea
Directory Hash Seed: 52aba0f3-6da7-4e1a-b4b3-e8c51e8168fc
Journal backup: inode blocks
Journal size: 128M

--- Mike

--
My sigfile ran away and is on hiatus.
http://www.trausch.us/


Attachments:
signature.asc (197.00 B)

2008-11-07 21:16:51

by Andreas Dilger

[permalink] [raw]
Subject: Re: "tune2fs -I 256" runtime---can it be interrupted safely?

On Nov 01, 2008 19:46 -0400, Michael B. Trausch wrote:
> On Sat, 1 Nov 2008 15:10:02 -0400
> "Michael B. Trausch" <[email protected]> wrote:
> > I converted my home directory to ext4 without issue and so I
> > (stupidly, I admit) did not take a backup of this 250 GB drive's
> > contents before I started the conversion process. Oops. Lesson
> > learned. My thinking at this point is that it would take _far_ less
> > time to interrupt, backup, and just use mkfs.ext4 on the drive and
> > then restore (about 2 hours instead of an unknown quantity which is
> > already nearing 13 hours).
>
> At 18:01:34 (after 15 hours and change) I decided to give it up and
> sent tune2fs a SIGINT. It happily aborted, and fsck appears to have
> fixed whatever inconsistencies were introduced during the tune2fs run
> and the files on the volume appear to be perfectly intact.

That is good news. I would have worried that interrupting this operation
isn't very safe, because it is moving some core filesystem data around.

> Am going to go the route of simply pulling the data off and using
> mkfs.ext4 to create a new filesystem on the device. But, I am still
> curious as to why this would run for so long. Some volume statistics,
> in case they may be of help in trying to determine why this process was
> running for so long; I don't even know how much longer it would have
> run for.

> Inode size: 128

It looks like it did not finish the inode resize operation, for whatever
reason.

If you have time, after the backup is done could you please re-try the
"tune2fs -I 256" operation and let it go for a while, then "ltrace -p {pid}"
on the process from another window to see what it is doing (if anything).
You may need to have the "e2fsprogs-debuginfo" RPM installed to get useful
data there.

After grabbing some of the data with ltrace, also run "strace -p {pid}"
to get disk IO information, and finally "gdb -p {pid}" and grab a
stack trace with "bt", let it continue for a few seconds with "c", and
then CTRL-C again and grab another stack trace with "bt".

I'm assuming the code shouldn't actually have taken 13h to complete, but
is instead confused about something in the filesystem and is looping.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-11-11 16:09:22

by Michael B. Trausch

[permalink] [raw]
Subject: Re: "tune2fs -I 256" runtime---can it be interrupted safely?

On Fri, 07 Nov 2008 14:16:43 -0700
Andreas Dilger <[email protected]> wrote:

> If you have time, after the backup is done could you please re-try the
> "tune2fs -I 256" operation and let it go for a while, then "ltrace -p
> {pid}" on the process from another window to see what it is doing (if
> anything). You may need to have the "e2fsprogs-debuginfo" RPM
> installed to get useful data there.
>
> After grabbing some of the data with ltrace, also run "strace -p
> {pid}" to get disk IO information, and finally "gdb -p {pid}" and
> grab a stack trace with "bt", let it continue for a few seconds with
> "c", and then CTRL-C again and grab another stack trace with "bt".

Ugh, I forgot to do the gdb backtraces. *headdesk* I will do another
run with them later.

I do have an ltrace up and online at:

http://fd0man.theunixplace.com/tune2fs.ltrace.bz2

The bzip2 file is 44 MB, and the uncompressed text is _huge_ (about 3
GB). This is a two hour run of "tune2fs -I 256 /dev/sdb" on an
external 250 GB (235 GiB) drive formatted with the ext3 filesystem.
I aborted the tune2fs run after 2 hours, though I am sure it would have
run for far longer. Hopefully, this will give some insight as to what
the problem might be.

I _was_ hoping that it would make a difference in runtime that the
filesystem was empty (I did this right after "mkfs.ext3 -I
128 /dev/sdb"), but it was still going very slowly...

--- Mike

--
My sigfile ran away and is on hiatus.
http://www.trausch.us/


Attachments:
signature.asc (197.00 B)

2008-11-14 08:30:19

by Andreas Dilger

[permalink] [raw]
Subject: Re: "tune2fs -I 256" runtime---can it be interrupted safely?

On Nov 11, 2008 11:09 -0500, Michael B. Trausch wrote:
> I do have an ltrace up and online at:
>
> http://fd0man.theunixplace.com/tune2fs.ltrace.bz2
>
> The bzip2 file is 44 MB, and the uncompressed text is _huge_ (about 3
> GB). This is a two hour run of "tune2fs -I 256 /dev/sdb" on an
> external 250 GB (235 GiB) drive formatted with the ext3 filesystem.
> I aborted the tune2fs run after 2 hours, though I am sure it would have
> run for far longer. Hopefully, this will give some insight as to what
> the problem might be.

It looks like the code is just spending hours looking at the bitmaps,
but I have no idea why:

ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e1, 1, 0, 0x1f418b0) = 0
ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e2, 2, 1, 0x1f418b0) = 0
ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e3, 4, 2, 0x1f418b0) = 0
ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e4, 8, 3, 0x1f418b0) = 0
ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e5, 16, 4, 0x1f418b0) = 0
ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e6, 32, 5, 0x1f418b0) = 0
ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e7, 64, 6, 0x1f418b0) = 0
ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e8, 128, 7, 0x1f418b0) = 0


Possibly the code is very inefficient and is scanning every bit in
the bitmap individually to see if an inode is in use, instead of doing
something fast like just checking an entire bitmap word == 0 and
skipping it.

Unfortunately, ltrace doesn't show what is calling this code, so
a stack trace is needed from GDB.



Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-11-14 13:09:24

by Andreas Schultz

[permalink] [raw]
Subject: Re: "tune2fs -I 256" runtime---can it be interrupted safely?

Hi,

[...]

> Unfortunately, ltrace doesn't show what is calling this code, so
> a stack trace is needed from GDB.

Got the same problem, here is a GDB backtrace done on Ubuntu Intrepid, e2fsprogs 1.41.3:

(gdb) bt
#0 0x00007fa619228bcc in ext2fs_new_block (fs=0x1fea860, goal=263012, map=0x1feaa30, ret=0x7fff21668cac) at /build/buildd/e2fsprogs-1.41.3/lib/ext2fs/alloc.c:166
#1 0x000000000040408d in resize_inode (fs=0x1fea860, new_size=<value optimized out>) at /build/buildd/e2fsprogs-1.41.3/misc/tune2fs.c:1020
#2 0x0000000000404d83 in main (argc=<value optimized out>, argv=<value optimized out>) at /build/buildd/e2fsprogs-1.41.3/misc/tune2fs.c:1638

Cheers
Andreas


Attachments:
(No filename) (676.00 B)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-11-14 21:06:41

by Andreas Dilger

[permalink] [raw]
Subject: Re: "tune2fs -I 256" runtime---can it be interrupted safely?

On Nov 14, 2008 13:49 +0100, Andreas Schultz wrote:
> > Unfortunately, ltrace doesn't show what is calling this code, so
> > a stack trace is needed from GDB.
>
> Got the same problem, here is a GDB backtrace done on Ubuntu Intrepid,
> e2fsprogs 1.41.3:
>
> (gdb) bt
> #0 0x00007fa619228bcc in ext2fs_new_block (fs=0x1fea860, goal=263012, map=0x1feaa30, ret=0x7fff21668cac) at /build/buildd/e2fsprogs-1.41.3/lib/ext2fs/alloc.c:166
> #1 0x000000000040408d in resize_inode (fs=0x1fea860, new_size=<value optimized out>) at /build/buildd/e2fsprogs-1.41.3/misc/tune2fs.c:1020
> #2 0x0000000000404d83 in main (argc=<value optimized out>, argv=<value optimized out>) at /build/buildd/e2fsprogs-1.41.3/misc/tune2fs.c:1638

Ah, it is trying to find free blocks, but the e2fsprogs allocator is very
inefficient, IIRC, doing a linear scan of the filesystem. We probably
would be far better off to generate an RB tree of the block maps so that
it is easier to work with lists of blocks that need to be moved or marked
in use for the resized inode table.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-11-15 05:50:24

by Theodore Ts'o

[permalink] [raw]
Subject: Re: "tune2fs -I 256" runtime---can it be interrupted safely?

On Fri, Nov 14, 2008 at 02:06:12PM -0700, Andreas Dilger wrote:
>
> Ah, it is trying to find free blocks, but the e2fsprogs allocator is very
> inefficient, IIRC, doing a linear scan of the filesystem. We probably
> would be far better off to generate an RB tree of the block maps so that
> it is easier to work with lists of blocks that need to be moved or marked
> in use for the resized inode table.

We eventually could use a more sophisticated block allocator in the
ext2fs userspace library, but there's a quick fix should do the job
without having to do something more complicated. The problem is that
the goal block is always reset to blk, which makes this an O(n**2)
operation in the number of in-use blocks in the filesystem. This
following patch makes it be O(n), which should make it be a marked
improvement.

Running a profile analysis on tune2fs, there's another O(n*m)
operation in transalate_block() (yes, the function name is mispelled),
where n is the number of blocks that needs to be moved, and m is the
number of in-use blocks in the filesystems.

Fixing both of these should remove most of the user-space time that's
being burned by tune2fs -I 256. There is still some CPU burned by the
undo file, and indeed we can probably speed this up some more by
omitting creating an undo entry when we're copying a data block to a
previously unused block. But hopefully this should be enough to speed
up "tune2fs -I 256" for most people.

- Ted

commit 27c6de45a4187a348ec0960472d4a113ee6ea425
Author: Theodore Ts'o <[email protected]>
Date: Sat Nov 15 00:32:39 2008 -0500

tune2fs: Fix inefficient O(n**2) algorithms when expanding the inode size

When running "tune2fs -I 256" on moderate to large filesystems, the
time required to run tune2fs can take many hours (20+ before some
users gave up in disgust). This was due to some O(n**2) and O(n*m)
algorithms in move_block() and inode_scan_and_fix(), respectively.

Signed-off-by: "Theodore Ts'o" <[email protected]>

diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index b29b344..e72518a 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -1011,13 +1011,13 @@ static int move_block(ext2_filsys fs, ext2fs_block_bitmap bmap)
if (retval)
return retval;

- for (blk = fs->super->s_first_data_block;
- blk < fs->super->s_blocks_count; blk++) {
+ for (new_blk = blk = fs->super->s_first_data_block;
+ blk < fs->super->s_blocks_count; blk++) {

if (!ext2fs_test_block_bitmap(bmap, blk))
continue;

- retval = ext2fs_new_block(fs, blk, NULL, &new_blk);
+ retval = ext2fs_new_block(fs, new_blk, NULL, &new_blk);
if (retval)
goto err_out;

@@ -1068,12 +1068,14 @@ static int process_block(ext2_filsys fs EXT2FS_ATTR((unused)),
e2_blkcnt_t blockcnt EXT2FS_ATTR((unused)),
blk_t ref_block EXT2FS_ATTR((unused)),
int ref_offset EXT2FS_ATTR((unused)),
- void *priv_data EXT2FS_ATTR((unused)))
+ void *priv_data)
{
int ret = 0;
blk_t new_blk;
+ ext2fs_block_bitmap bmap = (ext2fs_block_bitmap) priv_data;

-
+ if (!ext2fs_test_block_bitmap(bmap, *block_nr))
+ return 0;
new_blk = transalate_block(*block_nr);
if (new_blk) {
*block_nr = new_blk;
@@ -1086,7 +1088,7 @@ static int process_block(ext2_filsys fs EXT2FS_ATTR((unused)),
return ret;
}

-static int inode_scan_and_fix(ext2_filsys fs)
+static int inode_scan_and_fix(ext2_filsys fs, ext2fs_block_bitmap bmap)
{
errcode_t retval = 0;
ext2_ino_t ino;
@@ -1122,8 +1124,8 @@ static int inode_scan_and_fix(ext2_filsys fs)
* Do we need to fix this ??
*/

- if (inode.i_file_acl) {
-
+ if (inode.i_file_acl &&
+ ext2fs_test_block_bitmap(bmap, inode.i_file_acl)) {
blk = transalate_block(inode.i_file_acl);
if (!blk)
continue;
@@ -1142,9 +1144,8 @@ static int inode_scan_and_fix(ext2_filsys fs)
if (!ext2fs_inode_has_valid_blocks(&inode))
continue;

- retval = ext2fs_block_iterate2(fs, ino, 0,
- block_buf, process_block,
- 0);
+ retval = ext2fs_block_iterate2(fs, ino, 0, block_buf,
+ process_block, bmap);
if (retval)
goto err_out;

@@ -1344,7 +1345,7 @@ static int resize_inode(ext2_filsys fs, unsigned long new_size)
if (retval)
goto err_out;

- retval = inode_scan_and_fix(fs);
+ retval = inode_scan_and_fix(fs, bmap);
if (retval)
goto err_out;



2008-11-16 12:11:42

by Andreas Schultz

[permalink] [raw]
Subject: Re: "tune2fs -I 256" runtime---can it be interrupted safely?

Hi,

On Saturday 15 November 2008 06:50:17 Theodore Tso wrote:

> Fixing both of these should remove most of the user-space time that's
> being burned by tune2fs -I 256. There is still some CPU burned by the
> undo file, and indeed we can probably speed this up some more by
> omitting creating an undo entry when we're copying a data block to a
> previously unused block. But hopefully this should be enough to speed
> up "tune2fs -I 256" for most people.

it does speed up things, but it is still quite slow. I tried it on a 250GB partition which is about 86% filled:

[email protected]:~# df -i /usr/src2/
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sdb1 30523392 1401574 29121818 5% /usr/src2
[email protected]:~# df /usr/src2/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb1 240308484 196091320 32010176 86% /usr/src2
[email protected]:~# umount /usr/src2
[email protected]:~# time tune2fs -I 256 /dev/sdb1
tune2fs 1.41.3 (12-Oct-2008)
Setting inode size 256

real 64m31.189s
user 43m46.708s
sys 1m50.627s

64min for 250GB seems to be to much for the average user.

From watching the disc activity and the output of 'top', it seems that the operation periodically consumes 100%CPU for extended periods without any disc activity, followed by a brief period of about 20%CPU load and light disc activity.

Andreas


Attachments:
(No filename) (1.37 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments