2019-09-03 21:32:34

by Deepa Dinamani

Subject: Re: "beyond 2038" warnings from loopback mount is noisy

On Tue, Sep 3, 2019 at 2:18 PM Theodore Y. Ts'o wrote:
>
> On Tue, Sep 03, 2019 at 09:18:44AM -0700, Deepa Dinamani wrote:
> >
> > This prints a warning for each inode that doesn't extend limits beyond
> > 2038. It is rate limited by ext4_warning_inode().
> > Looks like your filesystem has inodes that cannot be extended.
> > We could use a different rate limit or ignore this corner case. Do the
> > maintainers have a preference?
>
> We need to drop this commit (ext4: Initialize timestamps limits), or
> at least the portion which adds the call to the EXT4_INODE_SET_XTIME
> macro in ext4.h.

As Arnd said, I think this can be fixed by warning only when the inode
size is not uniformly 128 bytes in ext4.h. Is this an acceptable
solution, or do we want to drop this warning altogether?
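
For illustration, the proposed policy could be sketched roughly as
below. This is a standalone userspace sketch, not the kernel code: the
struct and the function name are hypothetical, and only the 128 byte
"good old" inode size comes from ext4 itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Classic ext2/ext3 on-disk inode size, in bytes. */
#define GOOD_OLD_INODE_SIZE 128u

/* Hypothetical summary of the state the warning decision depends on. */
struct inode_info {
    uint32_t fs_inode_size;     /* inode size the file system was created with */
    bool has_extra_time_field;  /* this inode has room for the extra timestamp bits */
};

/*
 * Return true only for the unexpected case: the file system uses larger
 * inodes, yet this particular inode cannot hold the extended timestamp.
 * With uniformly 128 byte inodes no inode can ever be extended, so the
 * single mount-time warning is assumed to be enough.
 */
static bool should_warn_per_inode(const struct inode_info *ino)
{
    if (ino->fs_inode_size == GOOD_OLD_INODE_SIZE)
        return false;
    return !ino->has_extra_time_field;
}
```

Under this sketch, a file system created with 128 byte inodes never
triggers the per-inode warning, however many timestamps are updated.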

Arnd, should I send a new pull request with the fix? Or should we
drop the ext4 patch, so that I can send it to the maintainers directly?

> I know of a truly vast number of servers in production all over the
> world which are using 128 byte inodes, and spamming warnings at the
> maximum rate limit is a really bad idea. This includes some major
> cloud data centers where the life of individual servers in their data
> centers is well understood (they're not going to last until 2038) and
> nothing stored on the local Linux file systems is long-lived ---
> that's all stored in the cluster file systems. The 128 byte inode
> size was deliberately chosen to minimize storage TCO, and so spamming
> a warning at high rates is going to be extremely unfriendly.
>
> In cases where the inode size is such that there is no chance at all
> to support timestamps beyond 2038, a single warning at mount time, or
> maybe a warning at mkfs time might be acceptable. But there's no
> point printing a warning each time we set a timestamp on such a
> file system. It's not going to change, and past a certain point, we
> need to trust that people who are using 128 byte inodes did so knowing
> what the tradeoffs might be. After all, it is *not* the default.

We have a single mount time warning already in place here. I did not
realize some people actually chose to use 128 byte inodes on purpose.

-Deepa


2019-09-03 22:18:14

by Theodore Ts'o

Subject: Re: "beyond 2038" warnings from loopback mount is noisy

On Tue, Sep 03, 2019 at 02:31:06PM -0700, Deepa Dinamani wrote:
> > We need to drop this commit (ext4: Initialize timestamps limits), or
> > at least the portion which adds the call to the EXT4_INODE_SET_XTIME
> > macro in ext4.h.
>
> As Arnd said, I think this can be fixed by warning only when the inode
> size is not uniformly 128 bytes in ext4.h. Is this an acceptable
> solution, or do we want to drop this warning altogether?

If we have a mount-time warning, I really don't think a per-inode warning
in the kernel is going to be helpful. It's only going to catch the most
extreme cases --- specifically, a file system originally created and
written using ext3 (real ext3; even before we dropped ext3 from the
upstream kernel, most distributions were using ext4 to provide ext3
support) and which includes enough extended attributes that there is
no room, in either the inode or the external xattr block, to make
space for the extra timestamp. That is an extremely rare edge case, and
I don't think it's worth trying to catch it in the kernel.
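
To make "space for the extra timestamp" concrete: each ext4 timestamp
has a 32-bit seconds field that every inode carries, plus an optional
32-bit extra field that only fits when the inode extends beyond 128
bytes. The low 2 bits of the extra field extend the seconds, and the
upper 30 bits hold nanoseconds. The following is a rough userspace
sketch of that encoding; the struct and function names are hypothetical,
and it assumes the caller passes seconds within the representable range.

```c
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

/* Latest second a 128 byte inode can store (January 2038). */
#define OLD_TIME_MAX ((int64_t)INT32_MAX)
/* Latest second with the extra field: 2^34 - 2^31 - 1 (year 2446). */
#define EXTRA_TIME_MAX ((((int64_t)1) << 34) - 1 - (((int64_t)1) << 31))

/* Hypothetical in-memory view of one on-disk timestamp. */
struct disk_time {
    uint32_t seconds_low; /* classic field, present in every inode */
    uint32_t extra;       /* epoch bits + nanoseconds, needs a larger inode */
};

/* Interpret the low 32 bits of a value as a signed 32-bit number. */
static int64_t sign_extend_low32(int64_t seconds)
{
    uint32_t low = (uint32_t)seconds;
    return low >= 0x80000000u ? (int64_t)low - ((int64_t)1 << 32)
                              : (int64_t)low;
}

/* Assumes -2^31 <= seconds <= EXTRA_TIME_MAX and nsec < 1000000000. */
static struct disk_time encode_time(int64_t seconds, uint32_t nsec)
{
    struct disk_time t;
    int64_t low = sign_extend_low32(seconds);
    /* Whatever the signed low field cannot express goes into 2 epoch bits. */
    uint32_t epoch = (uint32_t)((seconds - low) >> 32) & EPOCH_MASK;

    t.seconds_low = (uint32_t)seconds;
    t.extra = epoch | (nsec << EPOCH_BITS);
    return t;
}

static int64_t decode_seconds(struct disk_time t)
{
    int64_t low = sign_extend_low32((int64_t)t.seconds_low);
    return low + ((int64_t)(t.extra & EPOCH_MASK) << 32);
}
```

Without the extra field there is nowhere to put the epoch bits, so a
128 byte inode is limited to OLD_TIME_MAX regardless of what the kernel
does.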

The right place to catch this is rather in e2fsck, I think.

> We have a single mount time warning already in place here. I did not
> realize some people actually chose to use 128 byte inodes on purpose.

Yes, there are definitely some people who are still doing this. The
other case, as noted on this thread, is that file systems smaller than
512 MiB are treated as type "small" (and file systems smaller than
3 MiB are treated as type "floppy"), and today, we are still using 128
byte inodes to minimize the overhead of the inode table. It's
probably time to reconsider these defaults, but that's an e2fsprogs
level change. And that's not going to change the fact that there are
people who are deliberately choosing to use 128 byte inodes.
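
As a rough illustration of the size-based defaults described above, the
selection could be sketched as below. This is not the mke2fs source: the
struct and function names are hypothetical, the larger "big" and "huge"
types are left out, and the 3 megabyte cutoff for "floppy" is the one
documented in the mke2fs(8) man page.

```c
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Hypothetical pairing of a usage type with its default inode size. */
struct fs_profile {
    const char *type;
    unsigned inode_size; /* bytes */
};

/* Pick the usage type, and with it the inode size, from the size alone. */
static struct fs_profile pick_profile(uint64_t fs_bytes)
{
    if (fs_bytes < 3 * MIB)
        return (struct fs_profile){ "floppy", 128 };
    if (fs_bytes < 512 * MIB)
        return (struct fs_profile){ "small", 128 };
    return (struct fs_profile){ "default", 256 };
}
```

So a 100 MiB loopback image ends up with 128 byte inodes without anyone
having asked for them, which is how such file systems hit the warning.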

Changes that we could consider:

1) Change the default for types "small" and "floppy" to be 256 byte inodes.

2) Have mke2fs print a warning when creating a file system with 128
byte inodes.

3) Add code to e2fsck to automatically make room for the timestamp if
possible.

4) Add code to e2fsck so that, at some pre-determined point in the
future (maybe 5 years before 2038?), it prints warnings for file
systems using 128 byte inodes, and for file systems with 256+ byte
inodes where there isn't enough space in the inode for expanded
timestamps.

Cheers,

- Ted