Under what circumstances can an inode ever end up with EXT4_HUGE_FILE_FL
set? (Other than in an artificially constructed filesystem.)
As far as I can tell, extents don't allow a file to get bigger than
2**32 filesystem blocks (because they store block offsets in an le32),
which with the maximum filesystem block size of 65536 would be 2**48
bytes.
That's lower than the file size limit that EXT4_HUGE_FILE_FL seems to
exist to surpass; even without EXT4_HUGE_FILE_FL, the 48-bit "block"
count in the inode would allow a file to have 2**48 512-byte "blocks" in
it, or 2**57 bytes.
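The arithmetic behind these two limits, as a minimal Python sketch. It only restates the figures above: a 32-bit extent logical block number, a 64 KiB maximum block size, and a 48-bit i_blocks count in 512-byte units. The constant names are illustrative, not kernel identifiers.

```python
# Arithmetic behind the two size limits discussed above.
EXTENT_LBLK_BITS = 32     # extent logical block number is an le32
MAX_BLOCK_SIZE = 65536    # largest ext4 filesystem block size, in bytes
I_BLOCKS_BITS = 48        # i_blocks_lo (32 bits) + i_blocks_high (16 bits)
SECTOR_SIZE = 512         # i_blocks unit when EXT4_HUGE_FILE_FL is clear

# Upper bound through extents: 2**32 blocks of 64 KiB each.
extent_limit = (1 << EXTENT_LBLK_BITS) * MAX_BLOCK_SIZE

# Upper bound on what i_blocks can describe in 512-byte units.
i_blocks_limit = (1 << I_BLOCKS_BITS) * SECTOR_SIZE

print(f"extent limit:   2**{extent_limit.bit_length() - 1} bytes")
print(f"i_blocks limit: 2**{i_blocks_limit.bit_length() - 1} bytes")
```

So the extent format, not i_blocks, is the binding limit today.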
Was EXT4_HUGE_FILE_FL just added for future extensibility, in case a
future file storage mechanism allows storing files bigger than 2**32
blocks?
How extensively has it been tested?
(Related: are there any plans or discussions regarding a future extent
format? Not necessarily just for that reason, but there are other limits
in the existing extent format, such as the limit of 32768 contiguous
blocks in one extent.)
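Both extent limits are visible in a single on-disk extent entry. The sketch below follows the layout of struct ext4_extent (a 32-bit logical block, a 16-bit length, and a 48-bit physical start split into high and low halves) and the kernel's convention that lengths above 32768 mark an unwritten extent. The decoder itself is illustrative Python, not code from the kernel or e2fsprogs.

```python
import struct

EXT_INIT_MAX_LEN = 1 << 15   # 32768: longest initialized extent

def decode_extent(raw: bytes) -> dict:
    """Decode a 12-byte on-disk struct ext4_extent (little-endian)."""
    ee_block, ee_len, ee_start_hi, ee_start_lo = struct.unpack("<IHHI", raw)
    # ee_len is only 16 bits and values above 32768 flag an unwritten
    # (preallocated) extent, hence the 32768-block cap (32767 if unwritten).
    unwritten = ee_len > EXT_INIT_MAX_LEN
    length = ee_len - EXT_INIT_MAX_LEN if unwritten else ee_len
    return {
        "logical_block": ee_block,                  # 32 bits: the 2**32-block ceiling
        "length": length,
        "unwritten": unwritten,
        "physical_block": (ee_start_hi << 32) | ee_start_lo,  # 48 bits
    }

# Logical block 0, 32768 blocks long, starting at physical block 2**32 + 5.
raw = struct.pack("<IHHI", 0, 32768, 1, 5)
print(decode_extent(raw))
```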
Thanks,
Josh Triplett
On Mon, Apr 06, 2020 at 03:45:34PM -0700, Josh Triplett wrote:
> Under what circumstances can an inode ever end up with EXT4_HUGE_FILE_FL
> set? (Other than in an artificially constructed filesystem.)
>
> Was EXT4_HUGE_FILE_FL just added for future extensibility, in case a
> future file storage mechanism allows storing files bigger than 2**32
> blocks?
Yes, basically. When we added the huge_file feature, which introduced
the i_blocks_hi field, the thinking was to add EXT4_HUGE_FILE_FL so
that we could painlessly upgrade a file system from ext3 (w/o the huge
file feature) to enabling the feature without having to rewrite all of
the inodes. However, we also didn't want to artificially limit
ourselves to 2**57 file sizes, so we also added the EXT4_HUGE_FILE_FL
flag.
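To make the two units concrete, here is a sketch of how a reader interprets i_blocks, modeled on the kernel's ext4_inode_blocks() in fs/ext4/inode.c. The function name and its byte-valued return are hypothetical; the flag value matches EXT4_HUGE_FILE_FL in fs/ext4/ext4.h. With the flag set, the same 48-bit field counts filesystem blocks, so on a 64 KiB-block filesystem it can describe close to 2**64 bytes instead of 2**57.

```python
EXT4_HUGE_FILE_FL = 0x00040000  # inode flag value from fs/ext4/ext4.h

def inode_bytes_allocated(i_blocks_lo: int, i_blocks_high: int, i_flags: int,
                          block_size: int, huge_file_feature: bool) -> int:
    """Allocated bytes described by an inode's i_blocks fields.

    Hypothetical helper mirroring the branches of ext4_inode_blocks().
    """
    if not huge_file_feature:
        # Without the feature only the low 32 bits count, in 512-byte units.
        return i_blocks_lo * 512
    i_blocks = (i_blocks_high << 32) | i_blocks_lo   # combined 48-bit field
    if i_flags & EXT4_HUGE_FILE_FL:
        # Flag set: the unit becomes the filesystem block size.
        return i_blocks * block_size
    return i_blocks * 512

# Largest value each encoding can express on a 64 KiB-block filesystem.
lo, hi = 0xFFFFFFFF, 0xFFFF
print(inode_bytes_allocated(lo, hi, 0, 65536, True).bit_length())                  # 57
print(inode_bytes_allocated(lo, hi, EXT4_HUGE_FILE_FL, 65536, True).bit_length())  # 64
```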
It hasn't gotten a huge amount of testing in a while, but it would be
relatively easy to add debugging code (triggered via a mount option or
a sysfs file) which forces the use of EXT4_HUGE_FILE_FL all the time.
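A minimal sketch of what such a debugging mode would exercise, following the branch structure of the kernel's ext4_inode_blocks_set(). The force_huge parameter is hypothetical and stands in for the suggested mount option or sysfs knob, which does not exist today; the whole-block check is an addition not present in the kernel code.

```python
def encode_i_blocks(sectors, block_size, huge_file_feature, force_huge=False):
    """Encode an allocation given in 512-byte sectors.

    Returns (i_blocks_lo, i_blocks_high, huge_file_flag_set). force_huge is a
    hypothetical debugging knob, not an existing ext4 option.
    """
    force = force_huge and huge_file_feature
    if sectors <= 0xFFFFFFFF and not force:
        return sectors, 0, False                      # 32 bits of 512-byte units
    if not huge_file_feature:
        raise OverflowError("needs huge_file (the kernel returns -EFBIG)")
    if sectors <= 0xFFFFFFFFFFFF and not force:
        return sectors & 0xFFFFFFFF, sectors >> 32, False  # 48 bits, 512-byte units
    per_block = block_size // 512
    if sectors % per_block:
        # Added guard: refuse to silently drop a partial block when switching units.
        raise ValueError("allocation is not a whole number of filesystem blocks")
    blocks = sectors // per_block                     # unit is now the filesystem block
    if blocks > 0xFFFFFFFFFFFF:
        raise OverflowError("too large even in filesystem-block units")
    return blocks & 0xFFFFFFFF, blocks >> 32, True

print(encode_i_blocks(8, 4096, True))                   # (8, 0, False)
print(encode_i_blocks(8, 4096, True, force_huge=True))  # (1, 0, True)
```

Forcing the flag sends every inode down the last branch, which otherwise only runs for allocations above 2**48 sectors.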
> (Related: are there any plans or discussions regarding a future extent
> format? Not necessarily just for that reason, but there are other limits
> in the existing extent format, such as the limit of 32768 contiguous
> blocks in one extent.)
We've talked about it, and when I implemented the e2fsprogs support
for extents, I deliberately implemented it so we could more easily
support multiple extent tree formats. Unfortunately, the kernel code
wasn't written to do this easily. So we would either need to fork a
large portion of fs/ext4/extents.c, or we would have to refactor the
code to allow supporting multiple extent formats at the same time.
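One way to picture "multiple extent formats at the same time" is a parser chosen per tree from its header. This is only an illustration of the idea, not how e2fsprogs or the kernel is structured. The 12-byte header layout, the 0xF30A magic, and the classic 12-byte entry match today's ext4; the second magic and its 20-byte entry layout are invented for the example.

```python
import struct
from typing import Callable, Dict, List, NamedTuple

class Extent(NamedTuple):
    logical: int
    length: int
    physical: int

EXT4_EXT_MAGIC = 0xF30A            # eh_magic of today's extent header
HYPOTHETICAL_WIDE_MAGIC = 0xF30B   # invented for illustration only

def parse_classic(body: bytes, count: int) -> List[Extent]:
    out = []
    for i in range(count):
        blk, ln, hi, lo = struct.unpack_from("<IHHI", body, 12 * i)
        if ln > 32768:             # unwritten extent: strip the marker offset
            ln -= 32768
        out.append(Extent(blk, ln, (hi << 32) | lo))
    return out

def parse_wide(body: bytes, count: int) -> List[Extent]:
    # Hypothetical entry: le64 logical block, le32 length, le64 physical block.
    return [Extent(*struct.unpack_from("<QIQ", body, 20 * i)) for i in range(count)]

PARSERS: Dict[int, Callable[[bytes, int], List[Extent]]] = {
    EXT4_EXT_MAGIC: parse_classic,
    HYPOTHETICAL_WIDE_MAGIC: parse_wide,
}

def parse_leaf(node: bytes) -> List[Extent]:
    """Dispatch on the header: magic, entries, max, depth, generation."""
    magic, entries, _max, depth, _gen = struct.unpack_from("<HHHHI", node, 0)
    if depth != 0:
        raise ValueError("only leaf nodes are handled in this sketch")
    try:
        parser = PARSERS[magic]
    except KeyError:
        raise ValueError(f"unknown extent format magic {magic:#06x}") from None
    return parser(node[12:], entries)
```

The rest of the code would then work only with Extent values and never touch either on-disk layout directly.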
A related project would be to create a more general btree library
which understands and supports journalled changes using jbd2, but which
is general enough that it could support the extent tree code, and
might also be usable to support a tree-based extent allocation tree with
refcounts to replace the block allocation bitmaps, to enable ext4 to
support copy-on-write reflinks and snapshots.
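A toy illustration of why refcounts matter for reflinks: a bitmap has one bit per block, so it cannot record that a block shared by two files must stay allocated when only one of them drops it. The class below is purely illustrative; it keeps per-block counts in a Python Counter rather than a journalled on-disk btree, and none of its names come from ext4 or jbd2.

```python
from collections import Counter
from typing import List

class RefcountMap:
    """Toy block allocator tracking a reference count per block."""

    def __init__(self, nblocks: int):
        self.nblocks = nblocks
        self.refs = Counter()          # block number -> number of owners

    def is_free(self, block: int) -> bool:
        return self.refs[block] == 0   # missing keys read as 0

    def allocate(self, count: int) -> List[int]:
        """Hand out the lowest-numbered free blocks, each with one owner."""
        free = [b for b in range(self.nblocks) if self.is_free(b)][:count]
        if len(free) < count:
            raise RuntimeError("no space left")
        for b in free:
            self.refs[b] = 1
        return free

    def share(self, blocks: List[int]) -> None:
        """Reflink: another file now points at the same blocks."""
        for b in blocks:
            if self.is_free(b):
                raise ValueError(f"block {b} is not allocated")
            self.refs[b] += 1

    def release(self, blocks: List[int]) -> None:
        """One owner drops its blocks; a block is freed only at refcount 0."""
        for b in blocks:
            if self.is_free(b):
                raise ValueError(f"block {b} is already free")
            self.refs[b] -= 1
            if self.refs[b] == 0:
                del self.refs[b]

m = RefcountMap(8)
a = m.allocate(3)     # blocks 0, 1, 2
m.share(a)            # a reflinked copy shares them
m.release(a)          # deleting the original...
print(m.is_free(0))   # False: the copy still owns block 0
```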
It just hasn't been high enough priority for anyone to get their
company to fund that kind of work --- and it's complex enough that it
would be hard to make it fit within an intern project or a Google
Summer of Code project. Maybe if we assumed that the intern already
was familiar with kernel programming, but that's in general not a safe
assumption that we can make.
Cheers,
- Ted
On Mon, Apr 06, 2020 at 11:30:31PM -0400, Theodore Y. Ts'o wrote:
> On Mon, Apr 06, 2020 at 03:45:34PM -0700, Josh Triplett wrote:
> > Under what circumstances can an inode ever end up with EXT4_HUGE_FILE_FL
> > set? (Other than in an artificially constructed filesystem.)
> >
> > Was EXT4_HUGE_FILE_FL just added for future extensibility, in case a
> > future file storage mechanism allows storing files bigger than 2**32
> > blocks?
>
> Yes, basically. When we added the huge_file feature, which introduced
> the i_blocks_hi field, the thinking was to add EXT4_HUGE_FILE_FL so
> that we could painlessly upgrade a file system from ext3 (w/o the huge
> file feature) to enabling the feature without having to rewrite all of
> the inodes. However, we also didn't want to artificially limit
> ourselves to 2**57 file sizes, so we also added the EXT4_HUGE_FILE_FL
> flag.
Thanks for the explanation! That makes sense.
> It hasn't gotten a huge amount of testing in a while, but it would be
> relatively easy to add debugging code (triggered via a mount option or
> a sysfs file) which forces the use of EXT4_HUGE_FILE_FL all the time.
That does seem like a good idea. It would also be nice to have an e2fsck
option to rewrite all inodes to use EXT4_HUGE_FILE_FL.
I think I'll avoid poking that code for now, though, since I don't
currently have a need for files anywhere near that large.