It's taken way too long, but I've finally finished integrating the
64-bit patches into e2fsprogs's mainline repository. All of the
necessary patches should now be in the master branch for e2fsprogs.
The big change from before is that I replaced Val's changes for fixing
up how mke2fs picked the correct fs-type profile from mke2fs.conf with
something that I think works much better and leaves the code much
cleaner. With this change you need to add the following to your
/etc/mke2fs.conf file if you want to enable the 64-bit feature flag
automatically for a big disk:
[fs_types]
	ext4 = {
		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
		auto_64-bit_support = 1 # <---- add this line
		inode_size = 256
	}
Alternatively you can change the features line to include the feature
"64bit"; this will force the use of the 64-bit fields, and double the
size of the block group descriptors, even for smaller file systems that
don't require the 64-bit support. (This was one of my problems with
Val's implementation; it forced the mke2fs.conf file to always enable
the 64-bit feature flag, which would cause backwards compatibility
issues.) This might be a good thing to do for debugging purposes,
though, so this is an option which I left open, but the better way of
doing things is to use the auto_64-bit_support flag.
Should the default for auto_64-bit_support be on or off? For now I've
left it to be defaulted to "off", on the theory that it might be useful
for distributions that aren't quite ready to enable 64-bit support until
we do a lot more testing. But I may very well change this default
before 1.42 ships, on the theory that people who want to disable this
can just ship an edited mke2fs.conf file. (Users can always explicitly
request 64bit support by using "mke2fs -O 64bit", of course.) Comments
on this would be appreciated.
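A minimal sketch of the decision described above, modelled on the mke2fs.c check quoted later in this thread. The names prefixed with sketch_ or SKETCH_ are hypothetical stand-ins rather than e2fsprogs API, and the boolean parameter stands for the value read from auto_64-bit_support:

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest block count that fits in 32 bits (stand-in for MAX_32_NUM). */
#define SKETCH_MAX_32_NUM 0xFFFFFFFFULL

/* Hypothetical stand-in for EXT4_FEATURE_INCOMPAT_64BIT. */
#define SKETCH_INCOMPAT_64BIT 0x0080u

/*
 * Return the incompat feature mask after the automatic 64-bit check.
 *   blocks     - number of blocks in the new filesystem
 *   incompat   - incompat features already requested (e.g. via -O 64bit)
 *   auto_64bit - assumed value of auto_64-bit_support from mke2fs.conf
 */
static uint32_t sketch_pick_64bit(uint64_t blocks, uint32_t incompat,
                                  bool auto_64bit)
{
    /* Only turn the flag on when the block count needs it, the flag is
     * not already set, and the profile allows automatic selection. */
    if (blocks > SKETCH_MAX_32_NUM &&
        !(incompat & SKETCH_INCOMPAT_64BIT) &&
        auto_64bit)
        incompat |= SKETCH_INCOMPAT_64BIT;
    return incompat;
}
```

In this sketch a small filesystem never gains the flag automatically, while a flag requested explicitly is left untouched whatever the size.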
The other thing I've added to the mke2fs.conf handling is two
additional automatically selected fs-types, which work like
"floppy" and "small". These are "big", which is automatically selected
for filesystems >= 4TB, and "huge", which is selected for
filesystems >= 16TB. I'm not 100% sure this will be useful, but it
seemed like it might be worth having. Again, comments appreciated; the
names and the cutoff points may change before the 1.42 release.
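The size-based selection can be sketched roughly as follows. Only the 4TB and 16TB cutoffs come from this message; the "floppy" and "small" cutoffs, the "default" label, and the function name are assumptions made for illustration:

```c
#include <stdint.h>

/*
 * Pick a size-based fs-type name for a filesystem of fs_bytes bytes.
 * Hypothetical helper; not the actual mke2fs code.
 */
static const char *sketch_size_type(uint64_t fs_bytes)
{
    const uint64_t mib = 1024ULL * 1024ULL;
    const uint64_t tib = mib * mib;

    if (fs_bytes < 3 * mib)
        return "floppy";   /* assumed cutoff */
    if (fs_bytes < 512 * mib)
        return "small";    /* assumed cutoff */
    if (fs_bytes < 4 * tib)
        return "default";  /* assumed label for the middle range */
    if (fs_bytes < 16 * tib)
        return "big";      /* >= 4TB, per the message above */
    return "huge";         /* >= 16TB, per the message above */
}
```

The cutoffs are treated here as binary units (TiB); whether the real code does the same is also an assumption.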
What things are still left to be done before 64-bit support
is complete? Just a few things:
* Currently the badblocks list mechanism only supports 32-bit blocks.
This may be OK, since running "badblocks" on a really large disk is
probably a fool's errand. But how we handle this is an open question;
should we just refuse "mke2fs -c" or "e2fsck -c" for really big file
systems? Should we deprecate the badblocks inode altogether?
* The online resizing code, which relies on using a resize inode and
indirect blocks, will not scale to 64-bit filesystems. We have the
beginnings of support for the "meta_bg" style of resizing: the meta_bg
layout is supported by the kernel and the e2fsprogs code, but resizing
with it hasn't been implemented in the kernel yet. We need to add that.
As a related note, currently the online resizing code doesn't
understand flex_bg, so the filesystem layout for filesystems
which are grown using online resizing is definitely not optimized for
flex_bg. This is something that we would probably want to fix at the
same time, since it means adding a new ioctl interface between the
kernel and the resize2fs program.
- Ted
Theodore Ts'o wrote:
...
> What are things that are still left to be done before we 64-bit support
> is completely supported? Just a few things:
>
> * Currently the badblocks list mechanism only supports 32-bit blocks.
> This may be OK, since running "badblocks" on a really large disk is
> probably a fool's errand. But how we handle this is an open question;
> should we just refuse "mke2fs -c" or "e2fsck -c" for really big file
> systems? Should we deprecate the badblocks inode altogether?
>
> * The online resizing code, which relies on using a resize inode and
> indirect blocks, will not scale to 64-bit filesystems. We have the
> beginnings of support for the "meta_bg" style of resizing, which is
> supported by the kernel and the e2fsprogs code --- but it hasn't been
> implemented in the kernel yet. We need to add that.
* Lazy inode table initialization so that large fs mkfs time is acceptable.
-Eric
On 06/14/2010 09:39 AM, Theodore Ts'o wrote:
> It's taken way too long, but I've finally finished integrating the
> 64-bit patches into e2fsprogs's mainline repository. All of the
> necessary patches should now be in the master branch for e2fsprogs.
>
> The big change from before is that I replaced Val's changes for fixing
> up how mke2fs picked the correct fs-type profile from mke2fs.conf with
> something that I think works much better and leaves the code much
> cleaner. With this change you need to add the following to your
> /etc/mke2fs.conf file if you want to enable the 64-bit feature flag
> automatically for a big disk:
>
> [fs_types]
> ext4 = {
> features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
> auto_64-bit_support = 1 #<---- add this line
> inode_size = 256
> }
>
> Alternatively you can change the features line to include the feature
> "64bit"; this will force the use of the 64-bit fields, and double the
> size of the block group descriptors, even for smaller file systems that
> don't require the 64-bit support. (This was one of my problems with
> Val's implementation; it forced the mke2fs.conf file to always enable
> the 64-bit feature flag, which would cause backwards compatibility
> issues.) This might be a good thing to do for debugging purposes,
> though, so this is an option which I left open, but the better way of
> doing things is to use the auto_64-bit-support flag.
>
> Should the default for auto_64-bit-support be on or off? For now I've
> left it to be defaulted to "off", on the theory that it might be useful
> for distributions that aren't quite ready to enable 64-bit support until
> we do a lot more testing. But I may very well change this default
> before 1.42 ships, on the theory that people who want to disable this
> just ship an edited mke2fs.conf file. (Users can always explicit
> request 64bit support by using "mke2fs -O 64bit", of course.) Comments
> on this would be appreciated.
>
> The other support which I've added into mke2fs.conf handling is I've
> added two additional automatically selected fs-types, which work like
> "floppy" and "small". These are "big" which is automatically selected
> for filesystems >= 4TB, and and "huge" which is elected for filesystems
> >= 16TB. I'm not 100% sure this will be useful, but it seemed like
> it might be useful to have these. Again, comments appreciated it; the
> names and the cutoff points may change before the 1.42 release.
>
> What are things that are still left to be done before we 64-bit support
> is completely supported? Just a few things:
>
> * Currently the badblocks list mechanism only supports 32-bit blocks.
> This may be OK, since running "badblocks" on a really large disk is
> probably a fool's errand. But how we handle this is an open question;
> should we just refuse "mke2fs -c" or "e2fsck -c" for really big file
> systems? Should we deprecate the badblocks inode altogether?
>
I think that badblocks is pretty much a legacy item at this point.
Certainly not common on really large devices which are almost always
RAID'ed in some form.
Thanks!
Ric
> * The online resizing code, which relies on using a resize inode and
> indirect blocks, will not scale to 64-bit filesystems. We have the
> beginnings of support for the "meta_bg" style of resizing, which is
> supported by the kernel and the e2fsprogs code --- but it hasn't been
> implemented in the kernel yet. We need to add that.
>
> As a related note, currently the online resizing code doesn't
> understand about flex_bg, so the filesystem layout for filesystems
> which are grown using online resizing is definitely not optimized for
> flex_bg. This is something that we would probably want to fix at the
> same time, since it means adding a new ioctl interface between the
> kernel and the resize2fs program.
>
> - Ted
>
On 06/14/2010 08:39 AM, Theodore Ts'o wrote:
> It's taken way too long, but I've finally finished integrating the
> 64-bit patches into e2fsprogs's mainline repository. All of the
> necessary patches should now be in the master branch for e2fsprogs.
FWIW, this:
commit cf828f1a72ec1eb0c1e819307137879447c909b7
Author: Theodore Ts'o <[email protected]>
Date: Sun Oct 25 21:46:01 2009 -0400
libext2fs: Byte-swap 64-bit block group descriptors
Signed-off-by: "Theodore Ts'o" <[email protected]>
is blowing up all over on ppc, with glibc-detected memory problems like:
Running e2fsprogs test suite...
d_loaddump: debugfs load/dump test: *** glibc detected *** ../debugfs/debugfs: free(): invalid next size (fast): 0x000001001bb03130 ***
======= Backtrace: =========
/lib64/libc.so.6[0x803da3f744]
../debugfs/debugfs[0x100213e4]
../debugfs/debugfs[0x100037c8]
../debugfs/debugfs[0x100040b4]
/lib64/libc.so.6[0x803d9dbc78]
/lib64/libc.so.6(__libc_start_main-0x184e60)[0x803d9dbe70]
======= Memory map: ========
10000000-10040000 r-xp 00000000 fd:00 1443282 /root/e2fsprogs/debugfs/debugfs
10040000-10050000 rw-p 00040000 fd:00 1443282 /root/e2fsprogs/debugfs/debugfs
803d920000-803d950000 r-xp 00000000 fd:00 134600 /lib64/ld-2.12.so
803d950000-803d960000 r--p 00020000 fd:00 134600 /lib64/ld-2.12.so
803d960000-803d970000 rw-p 00030000 fd:00 134600 /lib64/ld-2.12.so
803d990000-803db50000 r-xp 00000000 fd:00 134601 /lib64/libc-2.12.so
803db50000-803db60000 r--p 001b0000 fd:00 134601 /lib64/libc-2.12.so
803db60000-803db70000 rw-p 001c0000 fd:00 134601 /lib64/libc-2.12.so
803db70000-803db80000 rw-p 00000000 00:00 0
803db80000-803db90000 r-xp 00000000 fd:00 134602 /lib64/libdl-2.12.so
803db90000-803dba0000 r--p 00000000 fd:00 134602 /lib64/libdl-2.12.so
803dba0000-803dbb0000 rw-p 00010000 fd:00 134602 /lib64/libdl-2.12.so
803dbb0000-803dbd0000 r-xp 00000000 fd:00 134613 /lib64/libpthread-2.12.so
803dbd0000-803dbe0000 r--p 00010000 fd:00 134613 /lib64/libpthread-2.12.so
803dbe0000-803dbf0000 rw-p 00020000 fd:00 134613 /lib64/libpthread-2.12.so
1001bb00000-1001bb30000 rw-p 00000000 00:00 0 [heap]
fffab260000-fffab280000 r-xp 00000000 00:00 0 [vdso]
ffffb920000-ffffba70000 rw-p 00000000 00:00 0 [stack]
./d_loaddump/script: line 22: 2743 Aborted (core dumped) $DEBUGFS -R "write $TEST_DATA test_data" -w $TMPFILE >> $OUT.new 2>&1
On Mon, Jun 14, 2010 at 03:15:40PM -0500, Eric Sandeen wrote:
> On 06/14/2010 08:39 AM, Theodore Ts'o wrote:
> > It's taken way too long, but I've finally finished integrating the
> > 64-bit patches into e2fsprogs's mainline repository. All of the
> > necessary patches should now be in the master branch for e2fsprogs.
>
> FWIW, this:
>
> commit cf828f1a72ec1eb0c1e819307137879447c909b7
> Author: Theodore Ts'o <[email protected]>
> Date: Sun Oct 25 21:46:01 2009 -0400
>
> libext2fs: Byte-swap 64-bit block group descriptors
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
>
> is blowing up all over on ppc, with glibc-detected memory problems like:
I took a quick look at this patch and saw one obvious thing:
diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
index 52f56c0..7b325a1 100644
--- a/lib/ext2fs/openfs.c
+++ b/lib/ext2fs/openfs.c
@@ -322,7 +322,7 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
#ifdef WORDS_BIGENDIAN
gdp = (struct ext2_group_desc *) dest;
for (j=0; j < groups_per_block*first_meta_bg; j++)
- ext2fs_swap_group_desc(gdp++);
+ ext2fs_swap_group_desc2(fs, gdp++);
#endif
dest += fs->blocksize*first_meta_bg;
}
@@ -332,9 +332,11 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
if (retval)
goto cleanup;
#ifdef WORDS_BIGENDIAN
- gdp = (struct ext2_group_desc *) dest;
- for (j=0; j < groups_per_block; j++)
- ext2fs_swap_group_desc(gdp++);
+ for (j=0; j < groups_per_block; j++) {
+ /* The below happens to work... be careful. */
+ gdp = ext2fs_group_desc(fs, blk, j);
+ ext2fs_swap_group_desc2(fs, gdp);
+ }
#endif
dest += fs->blocksize;
}
I think the first hunk should use the same code as the second hunk -
the first bit is always incrementing by the size of struct
ext2_group_desc, when it needs to increment by the size of struct
ext4_group_desc on 64-bit file systems. ext2fs_group_desc() does the
right thing.
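The stride problem can be illustrated with a small stand-alone sketch. It does not use the libext2fs types; the helper names and the 32-byte and 64-byte descriptor sizes passed in are assumptions for illustration. The point is that the step between descriptors must come from the on-disk descriptor size, not from the size of a fixed C struct:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t sketch_bswap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Address of descriptor idx in a raw buffer, stepping by desc_size bytes
 * (hypothetical analogue of looking a descriptor up by index). */
static void *sketch_group_desc(void *buf, size_t desc_size, size_t idx)
{
    return (unsigned char *)buf + idx * desc_size;
}

/* Byte-swap the first 32-bit field of each of count descriptors. A loop
 * doing gdp++ on a 32-byte struct pointer would land in the middle of
 * every 64-byte descriptor instead. */
static void sketch_swap_all(void *buf, size_t desc_size, size_t count)
{
    for (size_t j = 0; j < count; j++) {
        unsigned char *d = sketch_group_desc(buf, desc_size, j);
        uint32_t v;

        memcpy(&v, d, sizeof(v));
        v = sketch_bswap32(v);
        memcpy(d, &v, sizeof(v));
    }
}
```

Only one field is swapped here to keep the sketch short; a real routine would walk every multi-byte field of the descriptor.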
Also, there's a teensy bit of whitespace damage in the second hunk in
csum.c.
Looks like there's a lot of low-hanging fruit just compiling for
big-endian.
-VAL
Hi Ted,
Resize seems not work when the size is bigger than 16TB (offline resize).
My test machine:
x64 platform 2.6.32 kernel + this newest patch
1. <16TB ext4 enlarge to >16TB (offline)
a. I use "8 x 2TB WD disks" and "mdadm" build linear raid
b. then use mkfs.ext4 to make ext4 file system
c. grow the linear raid to "10 X 2TB"
d. finally it grow to "2.X TB" smaller than before
2. >16TB offline resize, the steps is similiar as before.
a. I use "9 x 2TB WD disks" build linear raid
b. mkfs.ext4 and not mount
c. grow the linear raid to "10 X 2TB"
d. do resize
e. finally it grow to "2.X TB" smaller than before
*. Base on my trace, seems the "EXT4_FEATURE_INCOMPAT_64BIT"
not on when mkfs.ext4. So the new_size is wrong when do "resize",
*. When do resize, "EXT2_FLAG_64BITS" not add to fs->flags.
So when execute "ext2fs_allocate_block_bitmap" function in resize process,
it won't go to 64bit check path.
*. And in "adjust_fs_info" function, I think should modify the following code
- fs->desc_blocks = ext2fs_div_ceil(fs->group_desc_count,
+ fs->desc_blocks = ext2fs_div64_ceil(fs->group_desc_count,
EXT2_DESC_PER_BLOCK(fs->super));
I try to on "EXT4_FEATURE_INCOMPAT_64BIT" and "EXT2_FLAG_64BITS"
when mkfs and resize.
And modify ext2fs_div_ceil code to ext2fs_div64_ceil.
It seems work something, the fs size isn't grow but also not deduce,
remain the same.
If you have any new patch, I can help to test. Thanks.
-HsuanTing
Best Regards,
Sigh, it seems I was unsubscribed a while ago from linux-fsdevel and linux-ext4 for some reason. I was CC'd on enough emails that I didn't notice it until the call today.
On 06/14/2010 09:39 AM, Theodore Ts'o wrote:
> With this change you need to add the following to your
> /etc/mke2fs.conf file if you want to enable the 64-bit feature flag automatically for a big disk:
>
> [fs_types]
> ext4 = {
> features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
> auto_64-bit_support = 1 #<---- add this line
It is kind of awkward to have both underscores and hyphens in the same parameter. What about just using "auto_64bit_support", since the feature is also called "64bit" and not "64-bit"?
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
On Mon, Jun 21, 2010 at 09:44:31PM +0800, Hsuan-Ting wrote:
> Hi Ted,
>
> Resize seems not work when the size is bigger than 16TB (offline resize).
>
> My test machine:
> x64 platform 2.6.32 kernel + this newest patch
>
> 1. <16TB ext4 enlarge to >16TB (offline)
> a. I use "8 x 2TB WD disks" and "mdadm" build linear raid
> b. then use mkfs.ext4 to make ext4 file system
> c. grow the linear raid to "10 X 2TB"
> d. finally it grow to "2.X TB" smaller than before
This doesn't surprise me. We should add some checks to simply not
allow the file system growing greater than 16TB if the 64-bit feature
is not set for now. Making this work is going to be tricky, because
enabling the 64-bit feature doubles the size of the block group
descriptors, which means we have to make room for them. This could
involve moving files out of the way, as well as moving the inode
table.
This means that we may want to enable the 64-bit feature flag if there
is an expectation that the filesystem might be grown to a size large
enough where this would be an issue.
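The 16TB limit and the cost of doubling the descriptors follow from simple arithmetic. A rough sketch, assuming 4KiB blocks, 32768 blocks per group, and 32-byte versus 64-byte group descriptors (the helper names are hypothetical, not e2fsprogs API):

```c
#include <stdint.h>

/* Upper bound on filesystem size in bytes when block numbers are 32 bits:
 * 2^32 blocks of block_size bytes each. */
static uint64_t sketch_max_bytes_32bit(uint32_t block_size)
{
    return (uint64_t)block_size << 32;
}

/* Number of blocks needed to store the descriptors for `groups` block
 * groups, with desc_size-byte descriptors, rounded up. */
static uint64_t sketch_desc_blocks(uint64_t groups, uint32_t block_size,
                                   uint32_t desc_size)
{
    uint64_t per_block = block_size / desc_size;

    return (groups + per_block - 1) / per_block;
}
```

Under these assumptions the bound is 2^44 bytes, i.e. 16TiB, which corresponds to 131072 block groups. Their descriptors take 1024 blocks at 32 bytes each and 2048 blocks at 64 bytes each, and that extra space is what a resize would have to free up.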
> 2. >16TB offline resize, the steps is similiar as before.
> a. I use "9 x 2TB WD disks" build linear raid
> b. mkfs.ext4 and not mount
> c. grow the linear raid to "10 X 2TB"
> d. do resize
> e. finally it grow to "2.X TB" smaller than before
>
> I try to on "EXT4_FEATURE_INCOMPAT_64BIT" and "EXT2_FLAG_64BITS"
> when mkfs and resize.
> And modify ext2fs_div_ceil code to ext2fs_div64_ceil.
> It seems work something, the fs size isn't grow but also not deduce,
> remain the same.
I'm not sure I understand that last sentence; it's not parsing as an
understandable English sentence, sorry. Are you saying that both
attempts to grow and shrink the filesystem are failing? If so, how?
Are you getting an error message? Is it appearing to succeed but the
file system size isn't changing?
Thanks for the bug report,
- Ted
2010/6/22 <[email protected]>
>
> On Mon, Jun 21, 2010 at 09:44:31PM +0800, Hsuan-Ting wrote:
> > Hi Ted,
> >
> > Resize seems not work when the size is bigger than 16TB (offline resize).
> >
> > My test machine:
> > x64 platform 2.6.32 kernel + this newest patch
> >
> > 1. <16TB ext4 enlarge to >16TB (offline)
> >     a. I use "8 x 2TB WD disks" and "mdadm" build linear raid
> >     b. then use mkfs.ext4 to make ext4 file system
> >     c. grow the linear raid to "10 X 2TB"
> >     d. finally it grow to "2.X TB" smaller than before
>
> This doesn't surprise me.  We should add some checks to simply not
> allow the file system growing greater than 16TB if the 64-bit feature
> is not set for now.  Making this work is going to be tricky, because
> enabling the 64-bit feature doubles the size of the block group
> descriptors, which means we have to make room for them.  This could
> involve moving files out of the way, as well as moving the inode
> table.
>
> This means that we may want to enable the 64-bit feature flag if there
> is an expectation that the filesystem might be grown to a size large
> enough where this would be an issue.
It sounds like I must enable the 64-bit feature at mkfs time.
Then it will work, right?
But based on my test, the resize then dumps core:
(gdb) bt
#0 0x00000000004160bf in ext2fs_test_bit64 ()
#1 0x0000000000416318 in ba_test_bmap ()
#2 0x0000000000410629 in ext2fs_test_generic_bmap ()
#3 0x0000000000410656 in ext2fs_test_block_bitmap_range2 ()
#4 0x000000000040873d in ext2fs_get_free_blocks2 ()
#5 0x000000000040936d in ext2fs_allocate_group_table ()
#6 0x0000000000404456 in adjust_fs_info ()
#7 0x0000000000404a81 in resize_fs ()
#8 0x00000000004069c7 in main ()
I made the following modifications
(to enable "EXT4_FEATURE_INCOMPAT_64BIT" and "EXT2_FLAG_64BITS"):
misc/mke2fs.c :
@@ -1530,6 +1945,8 @@ static void PRS(int argc, char *argv[])
EXT2_BLOCK_SIZE(&fs_param));
exit(1);
}
+ fs_param.s_feature_incompat |= EXT4_FEATURE_INCOMPAT_64BIT;
ext2fs_blocks_count_set(&fs_param, fs_blocks_count);
resize/resize2fs.c :
@@ -585,6 +598,9 @@ static errcode_t adjust_superblock(ext2_resize_t
rfs, blk64_t new_size)
if (retval)
return retval;
+ fs->super->s_feature_incompat |= EXT4_FEATURE_INCOMPAT_64BIT;
retval = adjust_fs_info(fs, rfs->old_fs, rfs->reserve_blocks, new_size);
lib/ext2fs/openfs.c :
@@ -109,6 +109,8 @@ errcode_t ext2fs_open2(const char *name, const
char *io_options,
memset(fs, 0, sizeof(struct struct_ext2_filsys));
fs->magic = EXT2_ET_MAGIC_EXT2FS_FILSYS;
fs->flags = flags;
+ fs->flags |= EXT2_FLAG_64BITS;
Did I mistake something?
>
> > 2. >16TB offline resize, the steps is similiar as before.
> >    a. I use "9 x 2TB WD disks" build linear raid
> >    b. mkfs.ext4 and not mount
> >    c. grow the linear raid to "10 X 2TB"
> >    d. do resize
> >    e. finally it grow to "2.X TB" smaller than before
> >
> > I try to on "EXT4_FEATURE_INCOMPAT_64BIT" and "EXT2_FLAG_64BITS"
> > when mkfs and resize.
> > And modify ext2fs_div_ceil code to ext2fs_div64_ceil.
> > It seems work something, the fs size isn't grow but also not deduce,
> > remain the same.
>
> I'm not sure I understand that last sentence; it's not parsing as an
> understandable English sentence, sorry.  Are you saying that both
> attempts to grow and shrink the filesystem is failing?  If so, how?
> Are you getting an error message?  Is it appearing to succeed but the
> file system size isn't changing?
Sorry for my poor English. The last sentence means "succeed but the
file system size isn't changing".
I also removed the "flex_bg" feature in this case.
Thanks.
-HsuanTing
>
> Thanks for the bug report,
>
>                                        - Ted
On 2010-06-22, at 03:15, Hsuan-Ting wrote:
2010/6/22 <[email protected]>
>> This means that we may want to enable the 64-bit feature flag if there
>> is an expectation that the filesystem might be grown to a size large
>> enough where this would be an issue.
>
> Sounds like I must enable 64-bit feature when mkfs.
> Then it will work, right?
>
> But base on my test, it will occur core dump when resize:
> (gdb) bt
> #0 0x00000000004160bf in ext2fs_test_bit64 ()
> #1 0x0000000000416318 in ba_test_bmap ()
> #2 0x0000000000410629 in ext2fs_test_generic_bmap ()
> #3 0x0000000000410656 in ext2fs_test_block_bitmap_range2 ()
> #4 0x000000000040873d in ext2fs_get_free_blocks2 ()
> #5 0x000000000040936d in ext2fs_allocate_group_table ()
> #6 0x0000000000404456 in adjust_fs_info ()
> #7 0x0000000000404a81 in resize_fs ()
> #8 0x00000000004069c7 in main ()
>
> I do the following modification
> (to enable "EXT4_FEATURE_INCOMPAT_64BIT" and "EXT2_FLAG_64BITS"):
>
> misc/mke2fs.c :
> @@ -1530,6 +1945,8 @@ static void PRS(int argc, char *argv[])
> EXT2_BLOCK_SIZE(&fs_param));
> exit(1);
> }
> + fs_param.s_feature_incompat |= EXT4_FEATURE_INCOMPAT_64BIT;
You don't need to modify mke2fs for this, just run "mke2fs -O 64bit ..." to tell it to create the filesystem with this feature flag set.
> resize/resize2fs.c :
> @@ -585,6 +598,9 @@ static errcode_t adjust_superblock(ext2_resize_t
> rfs, blk64_t new_size)
> + fs->super->s_feature_incompat |= EXT4_FEATURE_INCOMPAT_64BIT;
> retval = adjust_fs_info(fs, rfs->old_fs, rfs->reserve_blocks, new_size);
You can't simply set this flag on an existing filesystem and expect anything except corruption to result.
> lib/ext2fs/openfs.c :
> @@ -109,6 +109,8 @@ errcode_t ext2fs_open2(const char *name, const
> char *io_options,
> memset(fs, 0, sizeof(struct struct_ext2_filsys));
> fs->magic = EXT2_ET_MAGIC_EXT2FS_FILSYS;
> fs->flags = flags;
> + fs->flags |= EXT2_FLAG_64BITS;
>
> Did I mistake something?
Yes, that you can't just set this flag on an existing filesystem and expect it to work. The only possibility to do this is to have resize2fs move the inode tables in groups where there are group descriptor tables (as if it were growing the filesystem) and then write 64-byte group descriptors.
Cheers, Andreas
2010/6/23 Andreas Dilger <[email protected]>:
> On 2010-06-22, at 03:15, Hsuan-Ting wrote:
> 2010/6/22 <[email protected]>
>>> This means that we may want to enable the 64-bit feature flag if there
>>> is an expectation that the filesystem might be grown to a size large
>>> enough where this would be an issue.
>>
>> Sounds like I must enable 64-bit feature when mkfs.
>> Then it will work, right?
>>
>> But base on my test, it will occur core dump when resize:
>> (gdb) bt
>> #0  0x00000000004160bf in ext2fs_test_bit64 ()
>> #1  0x0000000000416318 in ba_test_bmap ()
>> #2  0x0000000000410629 in ext2fs_test_generic_bmap ()
>> #3  0x0000000000410656 in ext2fs_test_block_bitmap_range2 ()
>> #4  0x000000000040873d in ext2fs_get_free_blocks2 ()
>> #5  0x000000000040936d in ext2fs_allocate_group_table ()
>> #6  0x0000000000404456 in adjust_fs_info ()
>> #7  0x0000000000404a81 in resize_fs ()
>> #8  0x00000000004069c7 in main ()
>>
>> I do the following modification
>> (to enable "EXT4_FEATURE_INCOMPAT_64BIT" and "EXT2_FLAG_64BITS"):
>>
>> misc/mke2fs.c :
>> @@ -1530,6 +1945,8 @@ static void PRS(int argc, char *argv[])
>>                        EXT2_BLOCK_SIZE(&fs_param));
>>                exit(1);
>>        }
>> +       fs_param.s_feature_incompat |= EXT4_FEATURE_INCOMPAT_64BIT;
>
>
> You don't need to modify mke2fs for this, just run "mke2fs -O 64bit ..." to tell it to create the filesystem with this feature flag set.
Hi Andreas ,
If the size is not big enough, "-O 64bit" will get the following error messages:
"/dev/sda3: Cannot create filesystem with requested number of inodes
while setting up superblock"
And based on my trace, I think "-O 64bit" won't always work.
mke2fs.c checks the block count with the following code:
if ((fs_blocks_count > MAX_32_NUM) &&
!(fs_param.s_feature_incompat & EXT4_FEATURE_INCOMPAT_64BIT) &&
get_bool_from_profile(fs_types, "auto_64-bit_support", 0)) {
fs_param.s_feature_incompat |= EXT4_FEATURE_INCOMPAT_64BIT;
fs_param.s_feature_compat &= ~EXT2_FEATURE_COMPAT_RESIZE_INODE;
}
So I made the above modification to force the 64bit feature on.
I also found something strange with mkfs:
If the size (e.g. 91.3G) isn't big enough, it uses the "floppy" settings
in mke2fs.conf instead of "ext4".
I did the following steps; the resize seems to work but still dumps core:
1. remove the "resize_inode" and "flex_bg" features, and make the "floppy"
settings the same as "ext4" in mke2fs.conf
2. build a linear raid from a small partition of 1 disk (91.3G)
3. mkfs.ext4 on this raid
4. grow this linear raid (91.3G + 9 x 2TB disks)
5. do resize2fs
6. get the following core dump:
[458783.472100] resize2fs[27376]: segfault at 7fa420f14000 ip
0000000000416055 sp 00007fff533a99e8 error 4 in
resize2fs[400000+1e000]
Segmentation fault (core dumped)
(gdb) bt
#0 0x000000000041606f in ext2fs_test_bit64 ()
Cannot access memory at address 0x7fff2be12328
7. but the fs size has grown (16.3T)
So I think if the 64bit feature is enabled when mkfs, it will reserve
more room in file system
And we can grow it up bigger than 16TB.
It seems there are some issues in "master" still, I'll try "pu" branch later.
Thanks.
-HsuanTing
Hi Andreas,
Sorry, please forget what I said before.
Based on the log (some raid build errors), it seems
I didn't stop the old raid before starting this new test,
so this result isn't correct.
-HsuanTing
Thanks.
2010/6/23 Hsuan-Ting <[email protected]>:
> 2010/6/23 Andreas Dilger <[email protected]>:
>> On 2010-06-22, at 03:15, Hsuan-Ting wrote:
>> 2010/6/22 <[email protected]>
>>>> This means that we may want to enable the 64-bit feature flag if there
>>>> is an expectation that the filesystem might be grown to a size large
>>>> enough where this would be an issue.
>>>
>>> Sounds like I must enable 64-bit feature when mkfs.
>>> Then it will work, right?
>>>
>>> But base on my test, it will occur core dump when resize:
>>> (gdb) bt
>>> #0  0x00000000004160bf in ext2fs_test_bit64 ()
>>> #1  0x0000000000416318 in ba_test_bmap ()
>>> #2  0x0000000000410629 in ext2fs_test_generic_bmap ()
>>> #3  0x0000000000410656 in ext2fs_test_block_bitmap_range2 ()
>>> #4  0x000000000040873d in ext2fs_get_free_blocks2 ()
>>> #5  0x000000000040936d in ext2fs_allocate_group_table ()
>>> #6  0x0000000000404456 in adjust_fs_info ()
>>> #7  0x0000000000404a81 in resize_fs ()
>>> #8  0x00000000004069c7 in main ()
>>>
>>> I do the following modification
>>> (to enable "EXT4_FEATURE_INCOMPAT_64BIT" and "EXT2_FLAG_64BITS"):
>>>
>>> misc/mke2fs.c :
>>> @@ -1530,6 +1945,8 @@ static void PRS(int argc, char *argv[])
>>>                        EXT2_BLOCK_SIZE(&fs_param));
>>>                exit(1);
>>>        }
>>> +       fs_param.s_feature_incompat |= EXT4_FEATURE_INCOMPAT_64BIT;
>>
>>
>> You don't need to modify mke2fs for this, just run "mke2fs -O 64bit ..." to tell it to create the filesystem with this feature flag set.
>
> Hi Andreas ,
>
> If the size is not big enough, "-O 64bit" will get the following error messages:
> "/dev/sda3: Cannot create filesystem with requested number of inodes
> while setting up superblock"
>
> And base on my trace, I think "-O 64bit" won't always work.
> It will check the blocks count as the following code in mke2fs.c:
>    if ((fs_blocks_count > MAX_32_NUM) &&
>        !(fs_param.s_feature_incompat & EXT4_FEATURE_INCOMPAT_64BIT) &&
>        get_bool_from_profile(fs_types, "auto_64-bit_support", 0)) {
>        fs_param.s_feature_incompat |= EXT4_FEATURE_INCOMPAT_64BIT;
>        fs_param.s_feature_compat &= ~EXT2_FEATURE_COMPAT_RESIZE_INODE;
>    }
> So I do the above modification to force 64bit feature enabling.
>
> I also find some strange things when mkfs:
> If the size(ex. 91.3G) isn't big enough it will use "floppy" settings
> in mke2fs.conf instead of "ext4".
>
> I do the following steps, it seems work but still has core dump messages:
> 1. remove "resize_inode" and "flex_bg" features ,and let "floppy"
> settings the same as "ext4" in mke2fs.conf
> 2. build linera raid with a small partition of 1 disks (91.3G)
> 3. mkfs.ext4 to this raid
> 5. grow this linear raid (91.3G + 9 x 2TB disks)
> 6. do resize2fs
> 6. get the following core dump:
> [458783.472100] resize2fs[27376]: segfault at 7fa420f14000 ip
> 0000000000416055 sp 00007fff533a99e8 error 4 in
> resize2fs[400000+1e000]
> Segmentation fault (core dumped)
> (gdb) bt
> #0  0x000000000041606f in ext2fs_test_bit64 ()
> Cannot access memory at address 0x7fff2be12328
> 8. but the fs size have grown (16.3T)
>
> So I think if the 64bit feature is enabled when mkfs, it will reserve
> more room in file system
> And we can grow it up bigger than 16TB.
> It seems there are some issues in "master" still, I'll try "pu" branch later.
>
> Thanks.
>
>                                       -HsuanTing
>
Hi Ted,
I tried to:
1. force the 64-bit feature on even when the size is <16TB
2. fix the "block_bmap" overflow issue when resizing.
Resizing to >16TB then seems OK
(the content of the test file is correct).
But after the file system has grown, fsck reports errors.
My modifications are as follows:
lib/ext2fs/openfs.c
@@ -109,6 +109,8 @@ errcode_t ext2fs_open2(const char *name, const
char *io_options,
memset(fs, 0, sizeof(struct struct_ext2_filsys));
fs->magic = EXT2_ET_MAGIC_EXT2FS_FILSYS;
fs->flags = flags;
+ fs->flags |= EXT2_FLAG_64BITS;
/* don't overwrite sb backups unless flag is explicitly cleared */
fs->flags |= EXT2_FLAG_MASTER_SB_ONLY;
fs->umask = 022;
misc/mke2fs.c
@@ -1530,6 +1531,8 @@ static void PRS(int argc, char *argv[])
EXT2_BLOCK_SIZE(&fs_param));
exit(1);
}
+ fs_param.s_feature_incompat |= EXT4_FEATURE_INCOMPAT_64BIT;
ext2fs_blocks_count_set(&fs_param, fs_blocks_count);
resize/resize2fs.c
index 064c4c4..e28f5f2 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -294,7 +294,8 @@ errcode_t adjust_fs_info(ext2_filsys fs, ext2_filsys old_fs,
blk64_t overhead = 0;
blk64_t rem;
blk64_t blk, group_block;
- ext2_ino_t real_end;
+ __u64 real_end;
blk64_t adj, old_numblocks, numblocks, adjblocks;
unsigned long i, j, old_desc_blocks, max_group;
unsigned int meta_bg, meta_bg_size;
@@ -381,9 +382,9 @@ retry:
fs->inode_map);
if (retval) goto errout;
- real_end = ((EXT2_BLOCKS_PER_GROUP(fs->super)
- * fs->group_desc_count)) - 1 +
- fs->super->s_first_data_block;
+ real_end = ((__u64)(EXT2_BLOCKS_PER_GROUP(fs->super)
+ * (__u64)fs->group_desc_count)) - 1ULL +
+ (__u64)fs->super->s_first_data_block;
retval = ext2fs_resize_block_bitmap2(ext2fs_blocks_count(fs->super)-1,
real_end, fs->block_map);
@@ -585,6 +586,8 @@ static errcode_t adjust_superblock(ext2_resize_t
rfs, blk64_t new_size)
if (retval)
return retval;
+ fs->super->s_feature_incompat |= EXT4_FEATURE_INCOMPAT_64BIT;
retval = adjust_fs_info(fs, rfs->old_fs, rfs->reserve_blocks, new_size);
if (retval)
goto errout;
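The real_end change above is about 32-bit wraparound. A minimal sketch of the effect, using hypothetical helper names and plain integers in place of the superblock fields:

```c
#include <stdint.h>

/* Last block number computed entirely in 32-bit arithmetic; the product
 * wraps modulo 2^32 once the filesystem passes 2^32 blocks. */
static uint32_t sketch_real_end_32(uint32_t blocks_per_group,
                                   uint32_t groups,
                                   uint32_t first_data_block)
{
    return blocks_per_group * groups - 1 + first_data_block;
}

/* Same computation widened to 64 bits before multiplying, so the
 * product is never truncated. */
static uint64_t sketch_real_end_64(uint32_t blocks_per_group,
                                   uint32_t groups,
                                   uint32_t first_data_block)
{
    return (uint64_t)blocks_per_group * groups - 1 + first_data_block;
}
```

With 32768 blocks per group and 140000 groups (an example value, a little over 17TiB at 4KiB blocks), the 64-bit version yields 4587519999 while the 32-bit version wraps to 292552703, which is the kind of too-small value that would shrink a bitmap instead of growing it.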
My test case:
1. build a linear raid (1 x 2TB disk)
2. mkfs.ext4, mount it, and run "echo 123 > test" to
create a test file.
3. grow the linear raid to >16TB (9 x 2TB + 1 x 1.5TB)
4. do the resize ( resize -fpF /dev/md2 )
After resizing, the content of the test file is correct.
But "fsck -nyv" reports the following error:
e2fsck 1.41.12 (17-May-2010)
###open|= EXT2_FLAG_64BITS
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: +(244154882--244187135) +(244187650--244188163)
Fix? no
/dev/md2: ********** WARNING: Filesystem still has errors **********
12 inodes used (0.00%)
0 non-contiguous files (0.0%)
0 non-contiguous directories (0.0%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 2
74638645 blocks used (1.57%)
0 bad blocks
0 large files
1 regular file
2 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
3 files
I think maybe I should change the "ext2_ino_t" type from
"__u32" to "__u64".
Maybe this modification would fix many of the overflow issues.
Do you have any idea what causes this fsck error, or any opinion on this approach?
Thanks
-HsuanTing
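
The real_end change in the resize2fs.c hunk above guards against 32-bit
wraparound. A minimal sketch of that effect follows; the helper names
real_end_32 and real_end_64 are hypothetical and only mirror the
expression in adjust_fs_info(), they are not part of e2fsprogs:

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not e2fsprogs functions. */

/* Product evaluated in 32-bit unsigned arithmetic: wraps modulo 2^32. */
static uint32_t real_end_32(uint32_t blocks_per_group, uint32_t group_count,
                            uint32_t first_data_block)
{
    return blocks_per_group * group_count - 1 + first_data_block;
}

/* Widen one operand first so the whole expression is computed in 64 bits. */
static uint64_t real_end_64(uint32_t blocks_per_group, uint32_t group_count,
                            uint32_t first_data_block)
{
    return (uint64_t)blocks_per_group * group_count - 1 + first_data_block;
}
```

Assuming 4 KiB blocks (32768 blocks per group, first data block 0), a
20 TiB device has 163840 groups. The 32-bit version then yields
1073741823, while the 64-bit version yields 5368709119.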
2010/6/22 Hsuan-Ting <[email protected]>:
> 2010/6/22 <[email protected]>
>>
>> On Mon, Jun 21, 2010 at 09:44:31PM +0800, Hsuan-Ting wrote:
>> > Hi Ted,
>> >
>> >    Resize seems not work when the size is bigger than 16TB (offline resize).
>> >
>> > My test machine:
>> > x64 platform 2.6.32 kernel + this newest patch
>> >
>> > 1. <16TB ext4 enlarge to >16TB (offline)
>> >     a. I use "8 x 2TB WD disks" and "mdadm" build linear raid
>> >     b. then use mkfs.ext4 to make ext4 file system
>> >     c. grow the linear raid to "10 X 2TB"
>> >     d. finally it grow to "2.X TB" smaller than before
>>
>> This doesn't surprise me.  We should add some checks to simply not
>> allow the file system growing greater than 16TB if the 64-bit feature
>> is not set for now.  Making this work is going to be tricky, because
>> enabling the 64-bit feature doubles the size of the block group
>> descriptors, which means we have to make room for them.  This could
>> involve moving files out of the way, as well as moving the inode
>> table.
>>
>> This means that we may want to enable the 64-bit feature flag if there
>> is an expectation that the filesystem might be grown to a size large
>> enough where this would be an issue.
>
> Sounds like I must enable 64-bit feature when mkfs.
> Then it will work, right?
>
> But base on my test, it will occur core dump when resize:
> (gdb) bt
> #0  0x00000000004160bf in ext2fs_test_bit64 ()
> #1  0x0000000000416318 in ba_test_bmap ()
> #2  0x0000000000410629 in ext2fs_test_generic_bmap ()
> #3  0x0000000000410656 in ext2fs_test_block_bitmap_range2 ()
> #4  0x000000000040873d in ext2fs_get_free_blocks2 ()
> #5  0x000000000040936d in ext2fs_allocate_group_table ()
> #6  0x0000000000404456 in adjust_fs_info ()
> #7  0x0000000000404a81 in resize_fs ()
> #8  0x00000000004069c7 in main ()
>
> I do the following modification
> (to enable "EXT4_FEATURE_INCOMPAT_64BIT" and "EXT2_FLAG_64BITS"):
>
> misc/mke2fs.c :
> @@ -1530,6 +1945,8 @@ static void PRS(int argc, char *argv[])
>                        EXT2_BLOCK_SIZE(&fs_param));
>                exit(1);
>        }
> +       fs_param.s_feature_incompat |= EXT4_FEATURE_INCOMPAT_64BIT;
>
>        ext2fs_blocks_count_set(&fs_param, fs_blocks_count);
>
>
> resize/resize2fs.c :
> @@ -585,6 +598,9 @@ static errcode_t adjust_superblock(ext2_resize_t
> rfs, blk64_t new_size)
>        if (retval)
>                return retval;
>
> +       fs->super->s_feature_incompat |= EXT4_FEATURE_INCOMPAT_64BIT;
>        retval = adjust_fs_info(fs, rfs->old_fs, rfs->reserve_blocks, new_size);
>
>
> lib/ext2fs/openfs.c :
> @@ -109,6 +109,8 @@ errcode_t ext2fs_open2(const char *name, const
> char *io_options,
>        memset(fs, 0, sizeof(struct struct_ext2_filsys));
>        fs->magic = EXT2_ET_MAGIC_EXT2FS_FILSYS;
>        fs->flags = flags;
> +       fs->flags |= EXT2_FLAG_64BITS;
>
>
> Did I mistake something?
>
>>
>> > 2. >16TB offline resize, the steps are similar to before.
>> >    a. I use "9 x 2TB WD disks" build linear raid
>> >    b. mkfs.ext4 and not mount
>> >    c. grow the linear raid to "10 X 2TB"
>> >    d. do resize
>> >    e. finally it grow to "2.X TB" smaller than before
>> >
>> > I try to on "EXT4_FEATURE_INCOMPAT_64BIT" and "EXT2_FLAG_64BITS"
>> > when mkfs and resize.
>> > And modify ext2fs_div_ceil code to ext2fs_div64_ceil.
>> > It seems work something, the fs size isn't grow but also not deduce,
>> > remain the same.
>>
>> I'm not sure I understand that last sentence; it's not parsing as an
>> understandable English sentence, sorry.  Are you saying that both
>> attempts to grow and shrink the filesystem is failing?  If so, how?
>> Are you getting an error message?  Is it appearing to succeed but the
>> file system size isn't changing?
>
> Sorry for my poor English. The last sentence means "succeed but the
> file system size isn't changing".
> I also remove "flex_bg" feature in this case.
>
> Thanks.
>
>                                        -HsuanTing
>
>>
>> Thanks for the bug report,
>>
>>                                        - Ted
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
On 2010-06-25, at 04:33, Hsuan-Ting wrote:
> My test case:
> 1. build a linear raid (1 x 2TB disk)
> 2. mkfs.ext4, mount it and"echo 123 > test" to
> touch a test file.
> 3. grown the linear raid to >16TB (9 x 2TB + 1 x 1.5TB)
> 4. do resize ( resize -fpF /dev/md2 )
> After resizing, the content of the test file is correct.
This is mostly unsurprising, since there is very little chance that the single file is corrupted by a resize. Better would be to fill nearly the whole filesystem (e.g. llverfs, previously posted to this list) and verify the file contents after the resize.
> But "fsck -nyv" will get the following error:
> I think maybe I should modify "ext2_ino_t" type from
> "__u32" to "__u64".
> Maybe this modification will fix many overflow issue.
No, this will completely break the ext2/3/4 on-disk format. What you need to make sure is that when resize2fs is resizing the filesystem that it limits the total number of inodes in the filesystem to 2^32-1. I guess that means the groups beyond the 2^32nd inode will have no inode table at all, which is a bit strange, but something that we need to expect in e2fsck.
I guess the alternative would be to allocate the inode table, but we couldn't (yet?) use those inodes without significant work to support 64-bit inode numbers. Probably the first step in that direction would be the "dirdata" patch that we have to allow storing extra data in directory entries.
Cheers, Andreas
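
The inode cap described above can be sketched as follows. This is only
an illustration of the arithmetic, not what resize2fs does today; the
helper names groups_with_inode_tables and capped_inode_count are
hypothetical:

```c
#include <stdint.h>

/* Largest inode count representable with 32-bit inode numbers: 2^32 - 1. */
#define MAX_INODES 0xFFFFFFFFULL

/* Hypothetical helper: how many leading groups can carry an inode table
 * before the total inode count would exceed 2^32 - 1. */
static uint32_t groups_with_inode_tables(uint32_t inodes_per_group,
                                         uint32_t group_count)
{
    uint64_t max_groups;

    if (inodes_per_group == 0)
        return 0;
    max_groups = MAX_INODES / inodes_per_group;
    return group_count < max_groups ? group_count : (uint32_t)max_groups;
}

/* Hypothetical helper: total inode count after applying the cap. */
static uint32_t capped_inode_count(uint32_t inodes_per_group,
                                   uint32_t group_count)
{
    uint64_t groups = groups_with_inode_tables(inodes_per_group, group_count);

    /* groups * inodes_per_group <= 2^32 - 1 by construction. */
    return (uint32_t)(groups * inodes_per_group);
}
```

With 8192 inodes per group and 163840 groups the total stays below 2^32,
so every group keeps its inode table. With 32768 inodes per group only
the first 131071 groups would get one, and the remaining groups would
have none.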
Hi Andreas,
I've followed your suggestion and filled nearly the whole filesystem before resizing.
After resizing and running fsck, the files seem OK.
My test steps are as follows:
1. build a linear raid ( 1 X 2TB disk )
2. fill nearly the whole filesystem
(copy 133 * "test folders" to this volume;
each test folder includes "kernel source" + 10G HD video + pdf files + small video)
3. grow the linear raid to >16TB (10 x 2TB)
4. do resize ( resize -fpF /dev/md2 )
5. after the resize, the "df" result isn't correct, and "rm" on files
fails with an error
("df" shows "114.3M" in its "Used" column; it should actually be "1.5T")
(the "rm error" is appended after these steps)
6. do "fsck.ext4 -yvf", then "df" is correct
7. copy 30 * "test folders" again to fill the new space
8. do some rough verification; the content of the files and
the rm command seem OK:
(rough verification: "diff -r" to compare one test kernel source
with the original, play the video, and open the pdf files)
Now I'm running "llverfs -l" and a script that recursively does "diff -r"
to verify all of the
test kernel sources. If any error occurs, I'll send an update.
If you have any new ideas about these errors (df and fsck) or any opinions,
please let me know.
I'm still trying to find the root cause of these errors.
Thanks.
"rm error":
[511874.472848] EXT4-fs error (device md2): mb_free_blocks:
double-free of inode 16391's block 66051(bit 515 in group 2)
[511874.483885] Aborting journal on device md2-8.
[511874.488741] EXT4-fs (md2): Remounting filesystem read-only
[511874.494928] EXT4-fs error (device md2) in
ext4_reserve_inode_write: Journal has aborted
[511874.503288] EXT4-fs error (device md2) in
ext4_reserve_inode_write: Journal has aborted
[511874.511791] EXT4-fs error (device md2) in ext4_ext_remove_space:
Journal has aborted
[511874.520125] EXT4-fs error (device md2) in
ext4_reserve_inode_write: Journal has aborted
[511874.528676] EXT4-fs error (device md2) in ext4_ext_truncate:
Journal has aborted
[511874.536581] EXT4-fs error (device md2) in
ext4_reserve_inode_write: Journal has aborted
[511874.545186] EXT4-fs error (device md2) in ext4_orphan_del: Journal
has aborted
[511874.552786] EXT4-fs error (device md2) in
ext4_reserve_inode_write: Journal has aborted
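
The block/group/bit numbers in the first log line can be cross-checked
with a short sketch. It assumes 4 KiB blocks, i.e. 32768 blocks per
group and a first data block of 0, which the thread does not state
explicitly; the struct and function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical result type: a block's group number and its bit offset
 * inside that group's block bitmap. */
struct group_pos {
    uint64_t group;
    uint32_t bit;
};

/* Hypothetical helper: map an absolute block number to (group, bit). */
static struct group_pos block_to_group_pos(uint64_t block,
                                           uint32_t first_data_block,
                                           uint32_t blocks_per_group)
{
    struct group_pos pos;
    uint64_t rel = block - first_data_block;

    pos.group = rel / blocks_per_group;
    pos.bit = (uint32_t)(rel % blocks_per_group);
    return pos;
}
```

Under those assumptions, block 66051 maps to group 2, bit 515, which
matches the kernel message. The same arithmetic places the end of the
first range from the earlier fsck output (block 244187135) on the last
bit of group 7451.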
2010/6/26 Andreas Dilger <[email protected]>:
> On 2010-06-25, at 04:33, Hsuan-Ting wrote:
>> My test case:
>> 1. build a linear raid (1 x 2TB disk)
>> 2. mkfs.ext4, mount it and"echo 123 > test" to
>> touch a test file.
>> 3. grown the linear raid to >16TB (9 x 2TB + 1 x 1.5TB)
>> 4. do resize ( resize -fpF /dev/md2 )
>> After resizing, the content of the test file is correct.
>
> This is mostly unsurprising, since there is very little chance that the single file is corrupted by a resize.  Better would be to fill nearly the whole filesystem (e.g. llverfs, previously posted to this list) and verify the file contents after the resize.
>
>> But "fsck -nyv" will get the following error:
>> I think maybe I should modify "ext2_ino_t" type from
>> "__u32" to "__u64".
>> Maybe this modification will fix many overflow issue.
>
> No, this will completely break the ext2/3/4 on-disk format.  What you need to make sure is that when resize2fs is resizing the filesystem that it limits the total number of inodes in the filesystem to 2^32-1.  I guess that means the groups beyond the 2^32nd inode will have no inode table at all, which is a bit strange, but something that we need to expect in e2fsck.
>
> I guess the alternative would be to allocate the inode table, but we couldn't (yet?) use those inodes without significant work to support 64-bit inode numbers.  Probably the first step in that direction would be the "dirdata" patch that we have to allow storing extra data in directory entries.
>
> Cheers, Andreas