2017-03-20 10:29:03

by Artem Blagodarenko

Subject: 64-bit inode number

Hello,

This topic was mentioned in "Add largedir feature", but it needs to be discussed in a separate thread.
Increasing the maximum inode count is useful with and without large_dir. There is at least one user who needs 64-bit inode numbers: Lustre FS.

As mentioned, the MDS holds zero-size files that store information about Lustre FS files. Current MDS disk sizes allow storing a large number of such files, but EXT4 limits this number to ~4 billion.
Lustre FS has features like DNE to distribute the MDS over many targets (disks), but the disks are not used effectively. It would be great to be able to store more than ~4 billion inodes on one
EXT4 file system.

I know there is the dirdata feature, which allows storing the high 32 bits of the inode number in the ext4 dirent. As far as I know, dirdata has not been merged yet because nobody needed it. Quote from Andreas in "Add largedir feature":

> Mostly because there hasn't been any interest for it whenever I proposed
> merging it in the past. If there is some renewed interest in merging it
> I could look into it …

It looks like Lustre FS requires this feature now.

There is another approach to solving this problem. It is obvious, but requires an on-disk format change. Theodore's quote from "Add largedir feature":

> I can imagine a new feature flag which defines the use a 64-bit inode
> number, but that's more for people who are creating a file system that
> takes advantage of 64-bit block numbers, and they are intending on
> using all of that space to store small (< 4k or < 8k) files.


This is exactly the Lustre FS MDS case: many small inodes. If it is possible to add a new feature flag, this is probably the best solution: simple, obvious, fast.

Please help with these questions:
1. Do we need 64-bit inode numbers now? (My opinion: we need them.)
2. Which of the two solutions above should we choose? Or is there another solution?

Thanks.

Artem Blagodarenko


2017-03-21 16:13:43

by Andreas Dilger

Subject: Re: 64-bit inode number

On Mar 20, 2017, at 6:22 AM, Благодаренко Артём <[email protected]> wrote:
>
> Hello,
>
> This topic was mentioned in "Add largedir feature", but it needs to be discussed in
> a separate thread.
> Increasing the maximum inode count is useful with and without large_dir. There is at
> least one user who needs 64-bit inode numbers: Lustre FS.
>
> As mentioned, the MDS holds zero-size files that store information about Lustre FS
> files. Current MDS disk sizes allow storing a large number of such files, but
> EXT4 limits this number to ~4 billion.
>
> Lustre FS has features like DNE to distribute the MDS over many targets (disks),
> but the disks are not used effectively. It would be great to be able to
> store more than ~4 billion inodes on one EXT4 file system.

I guess the major potential problem with more than 4B inodes in a single
filesystem (and also the large 300TB+ filesystems you are using) is that
running e2fsck could take a very long time. Conversely, using DNE to spread
the metadata across multiple filesystems/servers allows e2fsck to run in
parallel and limits any failures to a smaller subset of the filesystem.

That doesn't mean I'm totally against this feature, since > 8TB disks are
becoming common.

> I know there is the dirdata feature, which allows storing the high 32 bits of the inode
> number in the ext4 dirent. As far as I know, dirdata has not been merged yet because
> nobody needed it. Quote from Andreas in "Add largedir feature":
>
>> Mostly because there hasn't been any interest for it whenever I proposed
>> merging it in the past. If there is some renewed interest in merging it
>> I could look into it …
>
> It looks like Lustre FS requires this feature now.
>
> There is another approach to solving this problem. It is obvious, but
> requires an on-disk format change. Theodore's quote from "Add largedir feature":
>
>> I can imagine a new feature flag which defines the use a 64-bit inode
>> number, but that's more for people who are creating a file system that
>> takes advantage of 64-bit block numbers, and they are intending on
>> using all of that space to store small (< 4k or < 8k) files.
>
> This is exactly the Lustre FS MDS case: many small inodes. If it is possible to add a new feature flag, this is probably the best solution: simple, obvious, fast.
>
> Please help with these questions:
> 1. Do we need 64-bit inode numbers now? (My opinion: we need them.)
> 2. Which of the two solutions above should we choose? Or is there another solution?

It wasn't clear from Ted's comments whether he was proposing the feature
flag to store 64-bit inode numbers directly in a new ext4_dir_entry64,
or to use dir_data to hold the high 32 bits. My preference would be to
store the high bits of the inode number in dir_data. The reasons are:
- this won't use more space for 64-bit inodes than ext4_dir_entry64
- 32-bit inode numbers will have smaller dirents
- significantly more 32-bit dirents can fit into a leaf block (roughly 10-25% more)
- it is backwards compatible with existing directories and can transparently
  store 64-bit inode numbers in 32-bit directories without a full update
- it avoids duplicate code paths for ext4_dir_entry vs. ext4_dir_entry64
- it would be possible to store only the high 16 bits (2^48 inodes), which
  may be enough for ext4: ext4_extent can only address 2^48 blocks
  (2^60 bytes with 4 KiB blocks), and is there much value in more inodes than blocks?

Cheers, Andreas





