2024-05-18 00:09:03

by Jeff Layton

Subject: [PATCH] fs: switch timespec64 fields in inode to discrete integers

Adjacent struct timespec64's pack poorly. Switch them to discrete
integer fields for the seconds and nanoseconds. This shaves 8 bytes off
struct inode with a garden-variety Fedora Kconfig on x86_64, but that
also moves the i_lock into the previous cacheline, away from the fields
it protects.

To remedy that, move i_generation above the i_lock, which moves the new
4-byte hole to just after the i_fsnotify_mask in my setup. Amir has
plans to use that to expand the i_fsnotify_mask, so add a comment to
that effect as well.

Cc: Amir Goldstein <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
For reference (according to pahole):

Before: /* size: 624, cachelines: 10, members: 53 */
After: /* size: 616, cachelines: 10, members: 56 */

I've done some lightweight testing with this and it seems fine, but I
wouldn't mind seeing it sit in -next for a bit to make sure I haven't
missed anything.
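
If it helps to visualize the packing effect in isolation, here's a
throwaway userspace sketch (not part of the patch; kernel types are
approximated with stdint, and x86_64 sizes and alignment are assumed):

#include <stdint.h>
#include <stdio.h>

/* approximation of the kernel's struct timespec64 on 64-bit */
struct timespec64 { int64_t tv_sec; long tv_nsec; };

/* old layout fragment: three adjacent timespec64s + i_generation */
struct old_frag {
	struct timespec64 atime, mtime, ctime;
	uint32_t i_generation;
};

/* new layout fragment: discrete sec/nsec fields, with i_generation
 * pulled up to fill what would otherwise be a 4-byte hole */
struct new_frag {
	int64_t  atime_sec, mtime_sec, ctime_sec;
	uint32_t atime_nsec, mtime_nsec, ctime_nsec;
	uint32_t i_generation;
};

int main(void)
{
	/* prints "old: 56 new: 40" on x86_64: each 8-byte tv_nsec
	 * only ever holds 30 bits of nanoseconds */
	printf("old: %zu new: %zu\n",
	       sizeof(struct old_frag), sizeof(struct new_frag));
	return 0;
}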
---
include/linux/fs.h | 48 ++++++++++++++++++++++++++++++++----------------
1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index b7b9b3b79acc..45e8766de7d8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -660,9 +660,13 @@ struct inode {
};
dev_t i_rdev;
loff_t i_size;
- struct timespec64 __i_atime;
- struct timespec64 __i_mtime;
- struct timespec64 __i_ctime; /* use inode_*_ctime accessors! */
+ time64_t i_atime_sec;
+ time64_t i_mtime_sec;
+ time64_t i_ctime_sec;
+ u32 i_atime_nsec;
+ u32 i_mtime_nsec;
+ u32 i_ctime_nsec;
+ u32 i_generation;
spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */
unsigned short i_bytes;
u8 i_blkbits;
@@ -719,10 +723,10 @@ struct inode {
unsigned i_dir_seq;
};

- __u32 i_generation;

#ifdef CONFIG_FSNOTIFY
__u32 i_fsnotify_mask; /* all events this inode cares about */
+ /* 32-bit hole reserved for expanding i_fsnotify_mask */
struct fsnotify_mark_connector __rcu *i_fsnotify_marks;
#endif

@@ -1544,23 +1548,27 @@ struct timespec64 inode_set_ctime_current(struct inode *inode);

static inline time64_t inode_get_atime_sec(const struct inode *inode)
{
- return inode->__i_atime.tv_sec;
+ return inode->i_atime_sec;
}

static inline long inode_get_atime_nsec(const struct inode *inode)
{
- return inode->__i_atime.tv_nsec;
+ return inode->i_atime_nsec;
}

static inline struct timespec64 inode_get_atime(const struct inode *inode)
{
- return inode->__i_atime;
+ struct timespec64 ts = { .tv_sec = inode_get_atime_sec(inode),
+ .tv_nsec = inode_get_atime_nsec(inode) };
+
+ return ts;
}

static inline struct timespec64 inode_set_atime_to_ts(struct inode *inode,
struct timespec64 ts)
{
- inode->__i_atime = ts;
+ inode->i_atime_sec = ts.tv_sec;
+ inode->i_atime_nsec = ts.tv_nsec;
return ts;
}

@@ -1569,28 +1577,32 @@ static inline struct timespec64 inode_set_atime(struct inode *inode,
{
struct timespec64 ts = { .tv_sec = sec,
.tv_nsec = nsec };
+
return inode_set_atime_to_ts(inode, ts);
}

static inline time64_t inode_get_mtime_sec(const struct inode *inode)
{
- return inode->__i_mtime.tv_sec;
+ return inode->i_mtime_sec;
}

static inline long inode_get_mtime_nsec(const struct inode *inode)
{
- return inode->__i_mtime.tv_nsec;
+ return inode->i_mtime_nsec;
}

static inline struct timespec64 inode_get_mtime(const struct inode *inode)
{
- return inode->__i_mtime;
+ struct timespec64 ts = { .tv_sec = inode_get_mtime_sec(inode),
+ .tv_nsec = inode_get_mtime_nsec(inode) };
+ return ts;
}

static inline struct timespec64 inode_set_mtime_to_ts(struct inode *inode,
struct timespec64 ts)
{
- inode->__i_mtime = ts;
+ inode->i_mtime_sec = ts.tv_sec;
+ inode->i_mtime_nsec = ts.tv_nsec;
return ts;
}

@@ -1604,23 +1616,27 @@ static inline struct timespec64 inode_set_mtime(struct inode *inode,

static inline time64_t inode_get_ctime_sec(const struct inode *inode)
{
- return inode->__i_ctime.tv_sec;
+ return inode->i_ctime_sec;
}

static inline long inode_get_ctime_nsec(const struct inode *inode)
{
- return inode->__i_ctime.tv_nsec;
+ return inode->i_ctime_nsec;
}

static inline struct timespec64 inode_get_ctime(const struct inode *inode)
{
- return inode->__i_ctime;
+ struct timespec64 ts = { .tv_sec = inode_get_ctime_sec(inode),
+ .tv_nsec = inode_get_ctime_nsec(inode) };
+
+ return ts;
}

static inline struct timespec64 inode_set_ctime_to_ts(struct inode *inode,
struct timespec64 ts)
{
- inode->__i_ctime = ts;
+ inode->i_ctime_sec = ts.tv_sec;
+ inode->i_ctime_nsec = ts.tv_nsec;
return ts;
}


---
base-commit: 7ee332c9f12bc5b380e36919cd7d056592a7073f
change-id: 20240517-amtime-68a7ebc76f45

Best regards,
--
Jeff Layton <[email protected]>



2024-05-18 10:12:28

by Matthew Wilcox

Subject: Re: [PATCH] fs: switch timespec64 fields in inode to discrete integers

On Fri, May 17, 2024 at 08:08:40PM -0400, Jeff Layton wrote:
> For reference (according to pahole):
>
> Before: /* size: 624, cachelines: 10, members: 53 */
> After: /* size: 616, cachelines: 10, members: 56 */

Smaller is always better, but for a meaningful improvement, we'd want
to see more. On my laptop running a Debian 6.6.15 kernel, I see:

inode_cache 11398 11475 640 25 4 : tunables 0 0 0 : slabdata 459 459 0

so there's 25 inodes per 4 pages. Going down to 632 is still 25 per 4
pages. At 628 bytes, we get 26 per 4 pages. At 604 bytes, we're at 27.
And at 584 bytes, we get 28.
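
That arithmetic as a throwaway sketch, for anyone who wants to try other
sizes (assuming 4 KiB pages and no in-slab metadata for these caches):

#include <stdio.h>

int main(void)
{
	const unsigned int slab = 4 * 4096;	/* order-2 slab, 4 KiB pages */
	const unsigned int sizes[] = { 640, 632, 628, 604, 584 };

	/* prints 25, 25, 26, 27, 28 objects respectively */
	for (unsigned int i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("%u bytes -> %u per slab\n", sizes[i], slab / sizes[i]);
	return 0;
}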

Of course, struct inode gets embedded in a lot of filesystem inodes.
xfs_inode 142562 142720 1024 32 8 : tunables 0 0 0 : slabdata 4460 4460 0
ext4_inode_cache 81 81 1184 27 8 : tunables 0 0 0 : slabdata 3 3 0
sock_inode_cache 2123 2223 832 39 8 : tunables 0 0 0 : slabdata 57 57 0

So any of them might cross a magic boundary where we suddenly get more
objects per slab.

Not trying to diss the work you've done here, just pointing out the
limits for anyone who's trying to do something similar. Or maybe
inspire someone to do more reductions ;-)

2024-05-18 10:48:43

by Jeff Layton

Subject: Re: [PATCH] fs: switch timespec64 fields in inode to discrete integers

On Sat, 2024-05-18 at 06:23 +0100, Matthew Wilcox wrote:
> On Fri, May 17, 2024 at 08:08:40PM -0400, Jeff Layton wrote:
> > For reference (according to pahole):
> >
> > Before: /* size: 624, cachelines: 10, members: 53 */
> > After: /* size: 616, cachelines: 10, members: 56 */
>
> Smaller is always better, but for a meaningful improvement, we'd want
> to see more. On my laptop running a Debian 6.6.15 kernel, I see:
>
> inode_cache 11398 11475 640 25 4 : tunables 0 0 0 : slabdata 459 459 0
>
> so there's 25 inodes per 4 pages. Going down to 632 is still 25 per 4
> pages. At 628 bytes, we get 26 per 4 pages. At 604 bytes, we're at 27.
> And at 584 bytes, we get 28.
>
> Of course, struct inode gets embedded in a lot of filesystem inodes.
> xfs_inode 142562 142720 1024 32 8 : tunables 0 0 0 : slabdata 4460 4460 0
> ext4_inode_cache 81 81 1184 27 8 : tunables 0 0 0 : slabdata 3 3 0
> sock_inode_cache 2123 2223 832 39 8 : tunables 0 0 0 : slabdata 57 57 0
>
> So any of them might cross a magic boundary where we suddenly get more
> objects per slab.
>
> Not trying to diss the work you've done here, just pointing out the
> limits for anyone who's trying to do something similar. Or maybe
> inspire someone to do more reductions ;-)

Way to bust my bubble, Willy. ;-)

On a more serious note, I may be able to squeeze out another 4 bytes by
moving __i_ctime to a single 8-byte word. It's never settable from
userland, so we probably don't need the full range of times that a
timespec64 gives us there. Shrinking that may also make the multigrain
time rework simpler.
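
Roughly what I have in mind, as a sketch only (the 34/30 bit split and
the helper names here are made up, not a final design):

#include <stdint.h>

#define CTIME_NSEC_BITS	30	/* 0..999999999 fits in 30 bits */
#define CTIME_NSEC_MASK	((1ULL << CTIME_NSEC_BITS) - 1)

/* pack seconds + nanoseconds into one 64-bit word; leaves 34 signed
 * bits of seconds, i.e. roughly +/- 272 years around the epoch */
static inline uint64_t ctime_pack(int64_t sec, uint32_t nsec)
{
	return ((uint64_t)sec << CTIME_NSEC_BITS) | (nsec & CTIME_NSEC_MASK);
}

static inline int64_t ctime_sec(uint64_t packed)
{
	/* arithmetic shift restores the sign of the seconds field */
	return (int64_t)packed >> CTIME_NSEC_BITS;
}

static inline uint32_t ctime_nsec(uint64_t packed)
{
	return packed & CTIME_NSEC_MASK;
}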

David Howells was also looking at removing the i_private field as well.
Since these structs are usually embedded in a larger structure, it's
not clear that we need that field. If we can make that work, it'll mean
another 8 bytes goes away on 64-bit arches.

IOW, I think there may be some other opportunities for shrinkage in the
future.
--
Jeff Layton <[email protected]>

2024-05-18 16:04:17

by Linus Torvalds

Subject: Re: [PATCH] fs: switch timespec64 fields in inode to discrete integers

On Fri, 17 May 2024 at 22:23, Matthew Wilcox <[email protected]> wrote:
>
> Smaller is always better, but for a meaningful improvement, we'd want
> to see more.

I think one of the more interesting metrics for inodes is actually not
necessarily size per se, but cache footprint.

A *lot* of the inode is never actually touched in normal operation.
Inodes have all these fields that are only used for certain types, or
perhaps only for IO.

So inodes are big, but more important than shrinking them is likely to
try to make them dense in the cache for normal operations (ie
open/close/stat in particular). They cache very well, and actual
memory use - while still somewhat relevant - is less relevant than
cache misses.

Linus

2024-05-20 10:53:00

by Jan Kara

Subject: Re: [PATCH] fs: switch timespec64 fields in inode to discrete integers

On Fri 17-05-24 20:08:40, Jeff Layton wrote:
> Adjacent struct timespec64's pack poorly. Switch them to discrete
> integer fields for the seconds and nanoseconds. This shaves 8 bytes off
> struct inode with a garden-variety Fedora Kconfig on x86_64, but that
> also moves the i_lock into the previous cacheline, away from the fields
> it protects.
>
> To remedy that, move i_generation above the i_lock, which moves the new
> 4-byte hole to just after the i_fsnotify_mask in my setup. Amir has
> plans to use that to expand the i_fsnotify_mask, so add a comment to
> that effect as well.
>
> Cc: Amir Goldstein <[email protected]>
> Suggested-by: Linus Torvalds <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>

I like this! Although, as Matthew points out, it may not result in less
memory being used by the slab cache in the end, it still makes the struct
denser, and if nothing else it gives us more space for future growth ;).
So feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-05-21 14:29:09

by Christian Brauner

Subject: Re: [PATCH] fs: switch timespec64 fields in inode to discrete integers

On Fri, 17 May 2024 20:08:40 -0400, Jeff Layton wrote:
> Adjacent struct timespec64's pack poorly. Switch them to discrete
> integer fields for the seconds and nanoseconds. This shaves 8 bytes off
> struct inode with a garden-variety Fedora Kconfig on x86_64, but that
> also moves the i_lock into the previous cacheline, away from the fields
> it protects.
>
> To remedy that, move i_generation above the i_lock, which moves the new
> 4-byte hole to just after the i_fsnotify_mask in my setup. Amir has
> plans to use that to expand the i_fsnotify_mask, so add a comment to
> that effect as well.
>
> [...]

Let's see what the performance bot thing has to say...

---

Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
reply to the original patch series, allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/1] fs: switch timespec64 fields in inode to discrete integers
https://git.kernel.org/vfs/vfs/c/ad401d976810

2024-05-21 16:08:28

by David Sterba

Subject: Re: [PATCH] fs: switch timespec64 fields in inode to discrete integers

On Sat, May 18, 2024 at 06:48:30AM -0400, Jeff Layton wrote:
> On Sat, 2024-05-18 at 06:23 +0100, Matthew Wilcox wrote:
> > On Fri, May 17, 2024 at 08:08:40PM -0400, Jeff Layton wrote:
> > > For reference (according to pahole):
> > >
> > > Before: /* size: 624, cachelines: 10, members: 53 */
> > > After: /* size: 616, cachelines: 10, members: 56 */
> >
> > Smaller is always better, but for a meaningful improvement, we'd want
> > to see more. On my laptop running a Debian 6.6.15 kernel, I see:
> >
> > inode_cache 11398 11475 640 25 4 : tunables 0 0 0 : slabdata 459 459 0
> >
> > so there's 25 inodes per 4 pages. Going down to 632 is still 25 per 4
> > pages. At 628 bytes, we get 26 per 4 pages. At 604 bytes, we're at 27.
> > And at 584 bytes, we get 28.
> >
> > Of course, struct inode gets embedded in a lot of filesystem inodes.
> > xfs_inode 142562 142720 1024 32 8 : tunables 0 0 0 : slabdata 4460 4460 0
> > ext4_inode_cache 81 81 1184 27 8 : tunables 0 0 0 : slabdata 3 3 0
> > sock_inode_cache 2123 2223 832 39 8 : tunables 0 0 0 : slabdata 57 57 0
> >
> > So any of them might cross a magic boundary where we suddenly get more
> > objects per slab.
> >
> > Not trying to diss the work you've done here, just pointing out the
> > limits for anyone who's trying to do something similar. Or maybe
> > inspire someone to do more reductions ;-)
>
> Way to bust my bubble, Willy. ;-)
>
> On a more serious note, I may be able to squeeze out another 4 bytes by
> moving __i_ctime to a single 8-byte word. It's never settable from
> userland, so we probably don't need the full range of times that a
> timespec64 gives us there. Shrinking that may also make the multigrain
> time rework simpler.
>
> David Howells was also looking at removing the i_private field as well.
> Since these structs are usually embedded in a larger structure, it's
> not clear that we need that field. If we can make that work, it'll mean
> another 8 bytes goes away on 64-bit arches.
>
> IOW, I think there may be some other opportunities for shrinkage in the
> future.

Incremental shrinking works well; we've managed to get btrfs_inode under
1024 bytes recently, but it took several releases of removing, reordering,
or otherwise optimizing members. It's easier to focus on what's left there
than to keep notes on all the other struct members that could eventually
be optimized.