2021-07-23 15:32:09

by Mikhail Morfikov

Subject: Is it safe to use the bigalloc feature in the case of ext4 filesystem?

In the ext4(5) man page we can read the following:

Warning: The bigalloc feature is still under development,
and may not be fully supported with your kernel or may
have various bugs. Please see the web page
http://ext4.wiki.kernel.org/index.php/Bigalloc for details.
May clash with delayed allocation (see nodelalloc mount
option).

According to the link above, the information dates back to 2013,
which is a little bit ancient.

What's the current status of the feature? Is it safe to use
bigalloc on several TiB hard disks where only big files will be
stored?


2021-07-27 06:25:00

by Andreas Dilger

Subject: Re: Is it safe to use the bigalloc feature in the case of ext4 filesystem?

On Jul 23, 2021, at 9:30 AM, Mikhail Morfikov wrote:
>
> In the man ext4(5) we can read the following:
>
> Warning: The bigalloc feature is still under development,
> and may not be fully supported with your kernel or may
> have various bugs. Please see the web page
> http://ext4.wiki.kernel.org/index.php/Bigalloc for details.
> May clash with delayed allocation (see nodelalloc mount
> option).
>
> According to the link above, the info is dated back to 2013,
> which is a little bit ancient.
>
> What's the current status of the feature? Is it safe to use
> bigalloc on several TiB hard disks where only big files will be
> stored?

Hi Mikhail,
I am not using bigalloc myself (and I'm not aware of its use with
any Lustre-related ext4 filesystems), but I believe that bigalloc
is in use at some other large storage sites. Hopefully someone
that is using it can respond here (this may be slow due to summer
vacation).

Cheers, Andreas

2021-07-27 23:01:54

by Theodore Ts'o

Subject: Re: Is it safe to use the bigalloc feature in the case of ext4 filesystem?

On Fri, Jul 23, 2021 at 05:30:13PM +0200, Mikhail Morfikov wrote:
> In the man ext4(5) we can read the following:
>
> Warning: The bigalloc feature is still under development,
> and may not be fully supported with your kernel or may
> have various bugs. Please see the web page
> http://ext4.wiki.kernel.org/index.php/Bigalloc for details.
> May clash with delayed allocation (see nodelalloc mount
> option).
>
> According to the link above, the info is dated back to 2013,
> which is a little bit ancient.
>
> What's the current status of the feature? Is it safe to use
> bigalloc on several TiB hard disks where only big files will be
> stored?

Yes; the places where bigalloc is perhaps not as well tested are
support for FALLOC_FL_COLLAPSE_RANGE, FALLOC_FL_INSERT_RANGE, and
FALLOC_FL_PUNCH_HOLE. Bigalloc is also not very efficient for large
directories (where we allocate a full cluster for each directory
block). Older kernels did not handle ENOSPC errors when delayed
allocation was enabled, but that has since been fixed, and bigalloc is
passing file system regression tests, so it should be safe to use as
you've described.

Cheers,

- Ted
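
To put a number on the directory overhead mentioned above, here is a
minimal sketch (an illustration, not code from e2fsprogs). It assumes the
simple model stated in the reply, namely that every allocation is rounded
up to a whole cluster. The function name allocated_bytes and the sample
sizes are made up for the example.

```python
def allocated_bytes(size: int, cluster_size: int) -> int:
    """Bytes consumed when `size` bytes are rounded up to whole clusters."""
    if size < 0 or cluster_size <= 0:
        raise ValueError("size must be >= 0 and cluster_size must be > 0")
    clusters = -(-size // cluster_size)  # ceiling division
    return clusters * cluster_size


BLOCK = 4096                 # one 4 KiB directory block
BIG_FILE = 10 * 1024**3 + 1  # hypothetical file just over 10 GiB

for cluster in (64 * 1024, 4 * 1024 * 1024):
    dir_cost = allocated_bytes(BLOCK, cluster)
    file_waste = allocated_bytes(BIG_FILE, cluster) - BIG_FILE
    print(f"cluster={cluster:>7}: directory block uses {dir_cost} bytes "
          f"({dir_cost // BLOCK}x), big file wastes {file_waste} bytes")
```

Under this model the rounding loss on a large file is always less than
one cluster, while each directory block pays for a full cluster, which is
why the feature suits a disk holding only big files.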


2021-07-28 09:36:56

by Mikhail Morfikov

Subject: Re: Is it safe to use the bigalloc feature in the case of ext4 filesystem?

Thanks for the answer.

I have one question. Basically there's the /etc/mke2fs.conf file and
I've created the following stanza in it:

bigdata = {
    errors = remount-ro
    features = has_journal,extent,huge_file,flex_bg,metadata_csum,64bit,dir_nlink,extra_isize,bigalloc,^uninit_bg,sparse_super2
    inode_size = 256
    inode_ratio = 4194304
    cluster_size = 4M
    reserved_ratio = 0
    lazy_itable_init = 0
    lazy_journal_init = 0
}

It looks like the cluster_size parameter is ignored in this case (I've
tried both 4M and 4194304 values), and the filesystem was created with
a 64K cluster size (via mkfs -t bigdata -L bigdata /dev/sdb1), which is
the default when the bigalloc feature is set.

So it looks like cluster_size doesn't do anything when set in
/etc/mke2fs.conf. When I used the -C 4M flag (i.e.
mkfs -t bigdata -L bigdata -C 4M /dev/sdb1), the cluster size was set to
4M as it should be.

Is something wrong with the cluster_size parameter set in the
/etc/mke2fs.conf file?

----
# mkfs -V
mkfs from util-linux 2.36.1





2021-07-29 18:00:49

by Theodore Ts'o

Subject: Re: Is it safe to use the bigalloc feature in the case of ext4 filesystem?

On Wed, Jul 28, 2021 at 11:36:27AM +0200, Mikhail Morfikov wrote:
> Thanks for the answer.
>
> I have one question. Basically there's the /etc/mke2fs.conf file and
> I've created the following stanza in it:
>
> bigdata = {
> errors = remount-ro
> features = has_journal,extent,huge_file,flex_bg,metadata_csum,64bit,dir_nlink,extra_isize,bigalloc,^uninit_bg,sparse_super2
> inode_size = 256
> inode_ratio = 4194304
> cluster_size = 4M
> reserved_ratio = 0
> lazy_itable_init = 0
> lazy_journal_init = 0
> }
>
> It looks like the cluster_size parameter is ignored in such case (I've
> tried both 4M and 4194304 values), and the filesystem was created with
> 64K cluster size (via mkfs -t bigdata -L bigdata /dev/sdb1 ), which is
> the default when the bigalloc feature is set.

It does work, but you need to use an integer value for cluster_size,
and it needs to be in the [fs_types] section. So something like what I
have attached below.

And then try using the command "mke2fs -t ext4 -T bigdata -L bigdata
/dev/sdb1".
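
Since the config file wants a plain integer while -C accepted 4M in the
report above, a small helper can do the conversion before the value is
pasted into the stanza. This is only a sketch: the function names and the
1024-based suffix table are assumptions made for illustration and are not
part of mke2fs.

```python
# Binary multipliers; "4M" is taken to mean 4 * 1024 * 1024, matching the
# 4M / 4194304 pair used in this thread.
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def size_to_bytes(text: str) -> int:
    """Convert strings such as '4M', '32k' or '4194304' to an integer."""
    text = text.strip().upper()
    if text and text[-1] in SUFFIXES:
        value = int(text[:-1]) * SUFFIXES[text[-1]]
    else:
        value = int(text)
    if value <= 0:
        raise ValueError("size must be positive")
    return value


def cluster_size_line(text: str) -> str:
    """Format a cluster_size relation with a plain integer value."""
    return f"cluster_size = {size_to_bytes(text)}"


print(cluster_size_line("4M"))  # prints: cluster_size = 4194304
```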

If you look at the hugefile and hugefiles stanzas below, they are an
example of one way bigalloc has gotten a fair amount of use. In this use
case mke2fs has pre-allocated the huge data files, guaranteeing that they
will be 100% contiguous. We're using a 32k cluster because there are
some metadata files where better allocation efficiency is desired.

Cheers,

- Ted

[defaults]
base_features = sparse_super,large_file,filetype,resize_inode,dir_index,ext_attr
default_mntopts = acl,user_xattr
enable_periodic_fsck = 0
blocksize = 4096
inode_size = 256
inode_ratio = 16384
undo_dir = /var/lib/e2fsprogs/undo

[fs_types]
ext3 = {
    features = has_journal
}
ext4 = {
    features = has_journal,extent,huge_file,flex_bg,metadata_csum,64bit,dir_nlink,extra_isize
    inode_size = 256
}
small = {
    blocksize = 1024
    inode_size = 128
    inode_ratio = 4096
}
floppy = {
    blocksize = 1024
    inode_size = 128
    inode_ratio = 8192
}
big = {
    inode_ratio = 32768
}
huge = {
    inode_ratio = 65536
}
news = {
    inode_ratio = 4096
}
largefile = {
    inode_ratio = 1048576
    blocksize = -1
}
largefile4 = {
    inode_ratio = 4194304
    blocksize = -1
}
hurd = {
    blocksize = 4096
    inode_size = 128
}
hugefiles = {
    features = extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,^resize_inode,sparse_super2
    hash_alg = half_md4
    reserved_ratio = 0.0
    num_backup_sb = 0
    packed_meta_blocks = 1
    make_hugefiles = 1
    inode_ratio = 4194304
    hugefiles_dir = /storage
    hugefiles_name = chunk-
    hugefiles_digits = 5
    hugefiles_size = 4G
    hugefiles_align = 256M
    hugefiles_align_disk = true
    zero_hugefiles = false
    flex_bg_size = 262144
}

hugefile = {
    features = extent,huge_file,bigalloc,flex_bg,uninit_bg,dir_nlink,extra_isize,^resize_inode,sparse_super2
    cluster_size = 32768
    hash_alg = half_md4
    reserved_ratio = 0.0
    num_backup_sb = 0
    packed_meta_blocks = 1
    make_hugefiles = 1
    inode_ratio = 4194304
    hugefiles_dir = /storage
    hugefiles_name = huge-file
    hugefiles_digits = 0
    hugefiles_size = 0
    hugefiles_align = 256M
    hugefiles_align_disk = true
    num_hugefiles = 1
    zero_hugefiles = false
}
bigdata = {
    errors = remount-ro
    features = has_journal,extent,huge_file,flex_bg,metadata_csum,64bit,dir_nlink,extra_isize,bigalloc,^uninit_bg,sparse_super2
    inode_size = 256
    inode_ratio = 4194304
    cluster_size = 4194304
    reserved_ratio = 0
    lazy_itable_init = 0
    lazy_journal_init = 0
}
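
As a rough illustration of what the inode_ratio values in this file mean,
the sketch below treats inode_ratio as bytes of filesystem space per
inode and compares the [defaults] value with the bigdata one. The 4 TiB
size is a hypothetical stand-in for "several TiB", the function names are
invented for the example, and mke2fs rounds the real count to block-group
boundaries, so actual numbers will differ somewhat.

```python
TIB = 1024**4


def estimated_inodes(fs_bytes: int, inode_ratio: int) -> int:
    """Approximate inode count: one inode per `inode_ratio` bytes."""
    if inode_ratio <= 0:
        raise ValueError("inode_ratio must be positive")
    return fs_bytes // inode_ratio


def inode_table_bytes(inodes: int, inode_size: int = 256) -> int:
    """Space taken by the inode tables for the given inode size."""
    return inodes * inode_size


FS_BYTES = 4 * TIB  # hypothetical drive size
for label, ratio in (("defaults", 16384), ("bigdata", 4194304)):
    inodes = estimated_inodes(FS_BYTES, ratio)
    table_mib = inode_table_bytes(inodes) // 1024**2
    print(f"{label:>8}: about {inodes} inodes, {table_mib} MiB of inode tables")
```

The larger ratio trades inode count for space, which fits a filesystem
that will only ever hold a modest number of large files.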

2021-07-29 18:33:17

by Mikhail Morfikov

Subject: Re: Is it safe to use the bigalloc feature in the case of ext4 filesystem?

On 29/07/2021 19.59, Theodore Ts'o wrote:
> On Wed, Jul 28, 2021 at 11:36:27AM +0200, Mikhail Morfikov wrote:
>> Thanks for the answer.
>>
>> I have one question. Basically there's the /etc/mke2fs.conf file and
>> I've created the following stanza in it:
>>
>> bigdata = {
>> errors = remount-ro
>> features = has_journal,extent,huge_file,flex_bg,metadata_csum,64bit,dir_nlink,extra_isize,bigalloc,^uninit_bg,sparse_super2
>> inode_size = 256
>> inode_ratio = 4194304
>> cluster_size = 4M
>> reserved_ratio = 0
>> lazy_itable_init = 0
>> lazy_journal_init = 0
>> }
>>
>> It looks like the cluster_size parameter is ignored in such case (I've
>> tried both 4M and 4194304 values), and the filesystem was created with
>> 64K cluster size (via mkfs -t bigdata -L bigdata /dev/sdb1 ), which is
>> the default when the bigalloc feature is set.
>
> It does work, but you need to use an integer value for cluster_size,
> and it needs to be in the [fs_types] section. So something like what I
> have attached below.
>
> And then try using the command "mke2fs -t ext4 -T bigdata -L bigdata
> /dev/sdb1".

Yes, this helped and the cluster size was set to 4194304 as it should be.

>
> If you see the hugefile and hugefiles stanzas below, that's an example
> of one way bigalloc has gotten a fair amount of use. In this use case
> mke2fs has pre-allocated the huge data files guaranteeing that they
> will be 100% contiguous. We're using a 32k cluster because there are
> some metadata files where better allocation efficiency is desired.

I'll try them both and see whether I could use either one of them on
my drive.


2021-07-30 13:59:19

by Mikhail Morfikov

Subject: Re: Is it safe to use the bigalloc feature in the case of ext4 filesystem?

I have a question concerning the *hugefiles* parameters. When a
filesystem is created using the hugefiles stanza, it also creates
lots of chunk-* files inside the /storage/ dir. You say that it
guarantees the huge files to be 100% contiguous. But if I create
the filesystem with the preallocated files that consume the whole
drive, how am I supposed to use that drive? :) Are the files only
temporary, and should they be removed once the filesystem is
"ready"? What's the purpose of such options? Do they affect the
EXT4 metadata in some way? I mean, what's the change compared to
not using the options?

Also I have a question concerning the hugefiles stanza
itself -- it's missing the bigalloc feature, should it be there?
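
For a sense of scale, the sketch below estimates how many chunk-* files
the hugefiles stanza would produce. It relies on my reading of
mke2fs.conf(5), which describes an unset num_hugefiles with a non-zero
hugefiles_size as "create as many files as fit", and it ignores the space
taken by metadata. The 4 TiB figure and the function name are
hypothetical.

```python
GIB = 1024**3
TIB = 1024**4


def hugefile_count(usable_bytes: int, hugefiles_size: int) -> tuple[int, int]:
    """Return (number of whole files, leftover bytes)."""
    if hugefiles_size <= 0:
        raise ValueError("hugefiles_size must be positive")
    return divmod(usable_bytes, hugefiles_size)


# hugefiles_size = 4G from the stanza, on a hypothetical 4 TiB drive.
count, leftover = hugefile_count(4 * TIB, 4 * GIB)
print(f"{count} files of 4 GiB each, {leftover} bytes left over")
```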


On 29/07/2021 19.59, Theodore Ts'o wrote:
>...
> If you see the hugefile and hugefiles stanzas below, that's an example
> of one way bigalloc has gotten a fair amount of use. In this use case
> mke2fs has pre-allocated the huge data files guaranteeing that they
> will be 100% contiguous. We're using a 32k cluster because there are
> some metadata files where better allocation efficiency is desired.
>
>...
>
> hugefiles = {
> features = extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,^resize_inode,sparse_super2
> hash_alg = half_md4
> reserved_ratio = 0.0
> num_backup_sb = 0
> packed_meta_blocks = 1
> make_hugefiles = 1
> inode_ratio = 4194304
> hugefiles_dir = /storage
> hugefiles_name = chunk-
> hugefiles_digits = 5
> hugefiles_size = 4G
> hugefiles_align = 256M
> hugefiles_align_disk = true
> zero_hugefiles = false
> flex_bg_size = 262144
> }
>
> hugefile = {
> features = extent,huge_file,bigalloc,flex_bg,uninit_bg,dir_nlink,extra_isize,^resize_inode,sparse_super2
> cluster_size = 32768
> hash_alg = half_md4
> reserved_ratio = 0.0
> num_backup_sb = 0
> packed_meta_blocks = 1
> make_hugefiles = 1
> inode_ratio = 4194304
> hugefiles_dir = /storage
> hugefiles_name = huge-file
> hugefiles_digits = 0
> hugefiles_size = 0
> hugefiles_align = 256M
> hugefiles_align_disk = true
> num_hugefiles = 1
> zero_hugefiles = false
> }