Hi,
I wanted to try out ext4 on my shiny new 9+TB RAID5 device
(11 x 1TB disks in md raid5).
I obtained the 1.39-tyt3 version of e2fsprogs, and did:
./mkfs.ext3 -j -m 0 -N 1000000000 -O dir_index,filetype,resize_inode -E stride=65536,resize=5120000000 -J device=/dev/mapper/vg11-md15--journal -L data2 /dev/md15
(If using a separate device for the journal is inadvisable, please let
me know; this is on a different set of spindles than md15 is running on.)
The stride was calculated from the 64k chunk of the raid5 device.
Mainly a guess, as I couldn't find any clear reference on how to plug in
the values to fill this in.
Anyway, that did:
| mke2fs 1.38 (30-Jun-2005)
| Filesystem label=data2
| OS type: Linux
| Block size=4096 (log=2)
| Fragment size=4096 (log=2)
| 1000204128 inodes, 2441859680 blocks
| 0 blocks (0.00%) reserved for the super user
| First data block=0
| Maximum filesystem blocks=5485408000
| 65527 block groups
| 37265 blocks per group, 37265 fragments per group
| 15264 inodes per group
| Superblock backups stored on blocks:
| 37265, 111795, 186325, 260855, 335385, 931625, 1006155, 1825985,
| 3018465, 4658125, 9055395, 12781895, 23290625, 27166185, 81498555,
| 89473265, 116453125, 244495665, 582265625, 626312855, 733486995,
| 2200460985
|
| Writing inode tables: done
| Adding journal to device /dev/mapper/vg11-md15--journal: done
| Writing superblocks and filesystem accounting information: done
|
| This filesystem will be automatically checked every 27 mounts or
| 180 days, whichever comes first. Use tune2fs -c or -i to override.
Note the "37265 blocks per group".
Trying to mount this (after of course "tune2fs -E test_fs /dev/md15",
which BTW didn't work with the 1.39-tyt3 version) gave a kernel error:
| EXT4-fs: #blocks per group too big: 37265
I then tried adding "-g 16384" to the mkfs.ext3 invocation, but that
apparently got ignored and still showed "37265 blocks per group".
Removing the stride option didn't help. Removing all options didn't
help...
Is it impossible to create ext4 on such a device? I must be overlooking
something... the Documentation/filesystems/ext4.txt file doesn't help
(2.6.25-rc8).
thanks,
Paul Slootman
Paul Slootman wrote:
> Hi,
> I wanted to try out ext4 on my shiny new 9+TB RAID5 device
> (11 x 1TB disks in md raid5).
>
> I obtained the 1.39-tyt3 version of e2fsprogs, and did:
>
> ./mkfs.ext3 -j -m 0 -N 1000000000 -O dir_index,filetype,resize_inode -E stride=65536,resize=5120000000 -J device=/dev/mapper/vg11-md15--journal -L data2 /dev/md15
>
> (If using a separate device for the journal is inadvisable, please let
> me know; this is on a different set of spindles than md15 is running on.)
>
> The stride was calculated from the 64k chunk of the raid5 device.
> Mainly a guess, as I couldn't find any clear reference on how to plug in
> the values to fill this in.
>
> Anyway, that did:
>
> | mke2fs 1.38 (30-Jun-2005)
> | Filesystem label=data2
> | OS type: Linux
> | Block size=4096 (log=2)
> | Fragment size=4096 (log=2)
> | 1000204128 inodes, 2441859680 blocks
> | 0 blocks (0.00%) reserved for the super user
> | First data block=0
> | Maximum filesystem blocks=5485408000
> | 65527 block groups
> | 37265 blocks per group, 37265 fragments per group
I'd probably not use 1.39-tyt3... that's pretty old. (see the 2005?) :)
I did some >8T work that didn't officially make it in 'til 1.40... I'm
not sure if it's in 1.39-tyt3 or not, I'd guess not.
Also, stride=65536 isn't going to give you what you want, at a minimum
because it's stored in a __u16, and it'll wrap around to 0. (Newer
e2fsprogs rejects this value, though the error it prints doesn't make
it clear that the wraparound is the reason.)
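To illustrate the wraparound, here is a minimal sketch in Python. It only models truncation to 16 bits; store_in_u16 is a made-up helper for illustration, not a function from e2fsprogs.

```python
U16_MASK = 0xFFFF  # a __u16 field can hold 0..65535


def store_in_u16(value: int) -> int:
    """Return what is left of value after truncation to 16 bits."""
    return value & U16_MASK


# Show what a few requested stride values turn into once stored.
for requested in (16, 65535, 65536, 65537):
    print(f"stride={requested} is stored as {store_in_u16(requested)}")
```

Under this model 65536 is stored as 0, and any larger value silently becomes a much smaller one.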
But, if I try bleeding edge e2fsprogs on a semi-similar fs (smaller
stride value just so it doesn't fail):
[tmp]$ /src2/e2fsprogs-git/e2fsprogs/misc/mke2fs -F -j -m 0 -N 1000000000 -O dir_index,filetype,resize_inode -E stride=13172,resize=5120000000 -J device=journal -L data2 testfsfile
mke2fs 1.40.8 (13-Mar-2008)
Filesystem label=data2
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
1001548800 inodes, 2441859680 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
74520 block groups
32768 blocks per group, 32768 fragments per group
13440 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544, 1934917632
I at least get a sane blocks per group.
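As a cross-check on that output, the backup superblock list can be reproduced with a short Python sketch. It assumes the sparse_super layout, in which backups live in group 1 and in groups that are powers of 3, 5 and 7; backup_groups and backup_blocks are illustrative names, not e2fsprogs functions.

```python
def backup_groups(group_count: int) -> list[int]:
    """Groups holding a backup superblock under the assumed sparse_super rule."""
    groups = {1}
    for base in (3, 5, 7):
        power = base
        while power < group_count:
            groups.add(power)
            power *= base
    # Drop anything beyond the last group (only matters for tiny filesystems).
    return sorted(g for g in groups if g < group_count)


def backup_blocks(group_count: int, blocks_per_group: int,
                  first_data_block: int = 0) -> list[int]:
    """Block number at which each backup group starts."""
    return [first_data_block + g * blocks_per_group
            for g in backup_groups(group_count)]


# Values taken from the mke2fs 1.40.8 output above.
print(backup_blocks(74520, 32768))
```

With 74520 groups of 32768 blocks this yields 22 locations starting at 32768 and ending at 1934917632, matching the list printed by mke2fs.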
-Eric
On Thu, Apr 03, 2008 at 06:19:04PM +0200, Paul Slootman wrote:
> Hi,
> I wanted to try out ext4 on my shiny new 9+TB RAID5 device
> (11 x 1TB disks in md raid5).
>
> I obtained the 1.39-tyt3 version of e2fsprogs, and did:
>
> ./mkfs.ext3 -j -m 0 -N 1000000000 -O dir_index,filetype,resize_inode -E stride=65536,resize=5120000000 -J device=/dev/mapper/vg11-md15--journal -L data2 /dev/md15
>
> (If using a separate device for the journal is inadvisable, please let
> me know; this is on a different set of spindles than md15 is running on.)
>
> The stride was calculated from the 64k chunk of the raid5 device.
> Mainly a guess, as I couldn't find any clear reference on how to plug in
> the values to fill this in.
The stride parameter is the problem. Newer versions of e2fsprogs
don't allow a stride parameter which is too big. If you want to do
the perfect calculation, you take the 64k chunk size, and divide it by
the 4k blocksize to yield a stride parameter of 16. In practice, though,
simply using a non-zero stride size is good enough, and if
you have an even number of RAID-5 disks, you might not need this
parameter at all. (Its only purpose is to perturb the location of
the block bitmaps so that all of the bitmaps don't end up on a single
hard drive.)
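The calculation above can be sketched in a few lines of Python. The function name raid_stride is made up for illustration; the 64k chunk and 4k block size are the values from this thread.

```python
def raid_stride(chunk_bytes: int, block_bytes: int) -> int:
    """Stride in filesystem blocks: RAID chunk size divided by block size."""
    if block_bytes <= 0 or chunk_bytes % block_bytes != 0:
        raise ValueError("chunk size must be a whole multiple of the block size")
    return chunk_bytes // block_bytes


stride = raid_stride(64 * 1024, 4096)  # 64k chunk, 4k blocks
print(f"-E stride={stride}")
```

The result is the value to pass as the stride extended option, in blocks rather than bytes.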
BTW, we will be making a new snapshot for people who want to test ext4
soon....
- Ted
On Thu, Apr 03, 2008 at 12:58:30PM -0500, Eric Sandeen wrote:
> > I obtained the 1.39-tyt3 version of e2fsprogs, and did:
> > | mke2fs 1.38 (30-Jun-2005)
> I'd probably not use 1.39-tyt3... that's pretty old. (see the 2005?) :)
Actually, e2fsprogs 1.39-tyt isn't that old (April 2007). The 2005 is
because the mke2fs in use was 1.38, not 1.39. (Which was probably
another problem.)
As I mentioned, we will have a new
e2fsprogs release out soon. What's in the git repository is probably
better than 1.39-tyt3 in most cases, but we want to fix a few final
things before we cut a 1.41-rc0 release.
- Ted
Theodore Tso wrote:
> On Thu, Apr 03, 2008 at 12:58:30PM -0500, Eric Sandeen wrote:
>>> I obtained the 1.39-tyt3 version of e2fsprogs, and did:
>
>>> | mke2fs 1.38 (30-Jun-2005)
>
>> I'd probably not use 1.39-tyt3... that's pretty old. (see the 2005?) :)
>
> Actually, e2fsprogs 1.39-tyt isn't that old (April 2007). The 2005 is
> because the mke2fs in use was 1.38, not 1.39. (Which was probably
> another problem.)
Ah, missed that.
Ted, it looks like the stride value is still getting % 65536, just FWIW.
[e2fsprogs]$ misc/mke2fs -F -E stride=65536 testit
mke2fs 1.40.8 (13-Mar-2008)
Invalid stride parameter: 65536
Bad option(s) specified:
...
[e2fsprogs]$ misc/mke2fs -F -E stride=65537 testit
mke2fs 1.40.8 (13-Mar-2008)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
...
I'll send a patch for it in a bit, trying not to get too distracted
right now ;)
-Eric
> The stride parameter is the problem. Newer versions of e2fsprogs
> don't allow a stride parameter which is too big. If you want to do
> the perfect calculation, you take the 64k chunk size, and divide it by
> the 4k blocksize to yield a stride parameter of 16. In practice, though,
> simply using a non-zero stride size is good enough, and if
> you have an even number of RAID-5 disks, you might not need this
> parameter at all. (Its only purpose is to perturb the location of
> the block bitmaps so that all of the bitmaps don't end up on a single
> hard drive.)
Actually, as I wrote:
>> Removing the stride option didn't help. Removing all options didn't
>> help...
I still end up with a "blocks per group" of 37265, and when mounted I'm
greeted with the message "EXT4-fs: #blocks per group too big: 37265".
Is the ext4 code in the 2.6.25-rc8 kernel too old? According to the
source the number of blocks per group must be <= 8 * blocksize; with 4k
blocks that would mean 32768, not 37265.
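A small Python sketch of that check, using the numbers from the mke2fs output. It assumes the limit comes from each group having a single block bitmap with one bit per block; the helper names are illustrative and not taken from the kernel source.

```python
BLOCK_SIZE = 4096          # bytes, from "Block size=4096"
TOTAL_BLOCKS = 2441859680  # from "2441859680 blocks"


def max_blocks_per_group(block_size: int) -> int:
    # Assumed: one bitmap block per group, one bit per block, 8 bits per byte.
    return 8 * block_size


def group_count(total_blocks: int, blocks_per_group: int) -> int:
    # Ceiling division without floating point.
    return -(-total_blocks // blocks_per_group)


limit = max_blocks_per_group(BLOCK_SIZE)
for bpg in (37265, 32768):
    verdict = "ok" if bpg <= limit else "too big"
    print(f"{bpg} blocks per group: {verdict}, "
          f"{group_count(TOTAL_BLOCKS, bpg)} groups")
```

For this device, 37265 blocks per group corresponds to 65527 groups and exceeds the limit, while 32768 blocks per group needs 74520 groups.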
Even passing the -g option to explicitly set the blocks per group gets
ignored.
> BTW, we will be making a new snapshot for people who want to test ext4
> soon....
Kernel code and userspace utils? Or just kernel code?
Where can I find the most recent version of both? I looked at
Documentation/filesystems/ext4.txt, but I feel that's a bit outdated:
- It's still mke2fs -j /dev/hda1
- mount /dev/hda1 /wherever -t ext4dev
This ignores the fact you need to set the testing option...
Thanks,
Paul Slootman
On Fri, Apr 04, 2008 at 11:21:20AM +0200, Paul Slootman wrote:
> I still end up with a "blocks per group" of 37265, and when mounted I'm
> greeted with the message "EXT4-fs: #blocks per group too big: 37265".
> Is the ext4 code in the 2.6.25-rc8 kernel too old? According to the
> source the number of blocks per group must be <= 8 * blocksize; with 4k
> blocks that would mean 32768, not 37265.
>
> Even passing the -g option to explicitly set the blocks per group gets
> ignored.
I didn't notice it initially, but it looks like you're using a 1.38
mke2fs program and 1.39-tyt3 libraries. I'm guessing that was
responsible for the weird results, since mke2fs and e2fsck are much
more sensitive to library versions than most other libext2fs library
programs.
So that's probably the real proximate cause of your problems.
> > BTW, we will be making a new snapshot for people who want to test ext4
> > soon....
>
> Kernel code and userspace utils? Or just kernel code?
> Where can I find the most recent version of both?
I was referring to e2fsprogs. The latest version is available in the
git repository, at:
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=summary
The 'master' branch is pretty stable, the 'next' branch is a bit more
exciting, and the 'pu' (proposed update) branch is constantly getting
rebased and mainly for ext4 developers.
The mainline Linux kernel has fairly recent ext4 code. There are
also more recent kernel patches in the ext4 patch queue, which is
available here:
git://repo.or.cz/ext4-patch-queue.git
http://repo.or.cz/w/ext4-patch-queue.git
The patch queue is really intended mostly for ext4 developers, though.
So if you want to use the latest bleeding-edge development code,
that's where to find it. (Please think very carefully before doing
anything with production data, though! We try to be very careful, but
it's your data on the line at the end of the day. :-)
The main issue is that e2fsprogs-1.39-tyt3 is about 5-6 months
old, and so it's a bit out of sync with the latest kernel code if you
are using the latest kernel release. That will be remedied once we
can get e2fsprogs 1.41-rc0 out the door. I had been hoping to get it
out this week, but I'm guessing it will probably slip until next week.
At that point we can update Documentation/filesystems/ext4.txt, and
things will be much easier to set up.
- Ted
On Fri 04 Apr 2008, Theodore Tso wrote:
> On Fri, Apr 04, 2008 at 11:21:20AM +0200, Paul Slootman wrote:
> > I still end up with a "blocks per group" of 37265, and when mounted I'm
> > greeted with the message "EXT4-fs: #blocks per group too big: 37265".
> > Is the ext4 code in the 2.6.25-rc8 kernel too old? According to the
> > source the number of blocks per group must be <= 8 * blocksize; with 4k
> > blocks that would mean 32768, not 37265.
> >
> > Even passing the -g option to explicitly set the blocks per group gets
> > ignored.
>
> I didn't notice it initially, but it looks like you're using a 1.38
> mke2fs program and 1.39-tyt3 libraries. I'm guessing that was
I downloaded e2fsprogs-64bit-x86_64.tar.bz2 from somewhere
(unfortunately I can't recall from where, but it was some link from a
page about ext4) and used the mkfs.ext3 from there, which ldd tells me
is statically linked. Apparently that version is too old.
> I was referring to e2fsprogs. The latest version is available in the
> git repository, at:
>
> git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
> http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=summary
Thanks for your help, I'll get that version and try again :-)
> that's where to find it. (Please think very carefully before doing
> anything with production data, though! We try to be very careful, but
> it's your data on the line at the end of the day. :-)
This is on a system that provides 2nd stage backup (long term storage).
As I've just replaced the old 400G disks with 1TB disks, all the old
data is gone anyway, so any crashes or whatever won't immediately lead
to problems :-)
Paul Slootman