It is common for epoll users to have thousands of epitems, so saving a
cache line on every allocation leads to large memory savings.
Since epitem allocations are cache-aligned, reducing sizeof(struct
epitem) from 136 bytes to 128 bytes will allow it to squeeze under a
cache line boundary on x86_64.
From /sys/kernel/slab/eventpoll_epi, I see the following changes on my
x86_64 Core2 Duo (which has 64-byte cache alignment):
object_size : 192 => 128
objs_per_slab: 21 => 32
I have no access to other 64-bit machines, so I am limiting this to
x86_64-only with EPOLL_PACKED instead of __attribute__((packed))
Signed-off-by: Eric Wong <[email protected]>
Cc: Davide Libenzi <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andrew Morton <[email protected]>
---
fs/eventpoll.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index cfc4b16..06f3d0e 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -107,7 +107,7 @@
struct epoll_filefd {
struct file *file;
int fd;
-};
+} EPOLL_PACKED;
/*
* Structure used to track possible nested calls, for too deep recursions
--
Eric Wong
On Mon, 4 Mar 2013 11:29:41 +0000 Eric Wong <[email protected]> wrote:
> It is common for epoll users to have thousands of epitems, so saving a
> cache line on every allocation leads to large memory savings.
>
> Since epitem allocations are cache-aligned, reducing sizeof(struct
> epitem) from 136 bytes to 128 bytes will allow it to squeeze under a
> cache line boundary on x86_64.
>
> From /sys/kernel/slab/eventpoll_epi, I see the following changes on my
> x86_64 Core2 Duo (which has 64-byte cache alignment):
>
> object_size : 192 => 128
> objs_per_slab: 21 => 32
>
> I have no access to other 64-bit machines, so I am limiting this to
> x86_64-only with EPOLL_PACKED instead of __attribute__((packed))
>
> ...
>
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -107,7 +107,7 @@
> struct epoll_filefd {
> struct file *file;
> int fd;
> -};
> +} EPOLL_PACKED;
>
> /*
> * Structure used to track possible nested calls, for too deep recursions
Yes, I see the same numbers on my gcc, x86_64 allmodconfig.
It's going to be hard to maintain this - someone will change something
sometime and break it. I suppose we could add a runtime check if we
cared enough. Adding a big fat comment to struct epitem might help.
I don't see much additional room to be saved. We could probably remove
epitem.nwait, but that wouldn't actually save anything because nwait
nestles with ffd.fd.
I tested your patch on powerpc and it reduced sizeof(epitem) from 136
to 128 for that arch as well, so I suggest we run with
--- a/fs/eventpoll.c~epoll-trim-epitem-by-one-cache-line-on-x86_64-fix
+++ a/fs/eventpoll.c
@@ -105,7 +105,7 @@
struct epoll_filefd {
struct file *file;
int fd;
-} EPOLL_PACKED;
+} __packed;
/*
* Structure used to track possible nested calls, for too deep recursions
_
Andrew Morton <[email protected]> wrote:
> It's going to be hard to maintain this - someone will change something
> sometime and break it. I suppose we could add a runtime check if we
> cared enough. Adding a big fat comment to struct epitem might help.
Thanks for looking at this patch. I'll send a patch with a comment
about keeping epitem size in check. Also, would adding (with comments):
BUILD_BUG_ON(sizeof(struct epitem) > 128);
...be too heavy-handed? I used that in my testing. I'll check for:
sizeof(void *) <= 8 too; in case 128-bit machines appear...
> I don't see much additional room to be saved. We could probably remove
> epitem.nwait, but that wouldn't actually save anything because nwait
> nestles with ffd.fd.
If we remove nwait, we can move epoll_event up and have event.events
tucked in there. I have more and more depending on epoll, so I'll be
around to comment on future epoll changes as they come up.
> I tested your patch on powerpc and it reduced sizeof(epitem) from 136
> to 128 for that arch as well, so I suggest we run with
>
> --- a/fs/eventpoll.c~epoll-trim-epitem-by-one-cache-line-on-x86_64-fix
> +++ a/fs/eventpoll.c
> @@ -105,7 +105,7 @@
> struct epoll_filefd {
> struct file *file;
> int fd;
> -} EPOLL_PACKED;
> +} __packed;
Thanks for testing on ppc. Looks good to me. For what it's worth:
Acked-by: Eric Wong <[email protected]>
On Thu, 7 Mar 2013 10:32:40 +0000 Eric Wong <[email protected]> wrote:
> Andrew Morton <[email protected]> wrote:
> > It's going to be hard to maintain this - someone will change something
> > sometime and break it. I suppose we could add a runtime check if we
> > cared enough. Adding a big fat comment to struct epitem might help.
>
> Thanks for looking at this patch. I'll send a patch with a comment
> about keeping epitem size in check. Also, would adding (with comments):
>
> BUILD_BUG_ON(sizeof(struct epitem) > 128);
>
> ...be too heavy-handed? I used that in my testing. I'll check for:
> sizeof(void *) <= 8 too; in case 128-bit machines appear...
I guess such a check might avoid accidents in the future. If it
becomes a problem, we can always delete it.
This will prevent us from accidentally introducing a memory bloat
regression here in the future.
Signed-off-by: Eric Wong <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Davide Libenzi <[email protected]>
Cc: Al Viro <[email protected]>
---
Andrew Morton <[email protected]> wrote:
> On Thu, 7 Mar 2013 10:32:40 +0000 Eric Wong <[email protected]> wrote:
>
> > Andrew Morton <[email protected]> wrote:
> > > It's going to be hard to maintain this - someone will change something
> > > sometime and break it. I suppose we could add a runtime check if we
> > > cared enough. Adding a big fat comment to struct epitem might help.
> >
> > Thanks for looking at this patch. I'll send a patch with a comment
> > about keeping epitem size in check. Also, would adding (with comments):
> >
> > BUILD_BUG_ON(sizeof(struct epitem) > 128);
> >
> > ...be too heavy-handed? I used that in my testing. I'll check for:
> > sizeof(void *) <= 8 too; in case 128-bit machines appear...
>
> I guess such a check might avoid accidents in the future. If it
> becomes a problem, we can always delete it.
fs/eventpoll.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 9fec183..55028da 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -128,6 +128,8 @@ struct nested_calls {
/*
* Each file descriptor added to the eventpoll interface will
* have an entry of this type linked to the "rbr" RB tree.
+ * Avoid increasing the size of this struct, there can be many thousands
+ * of these on a server and we do not want this to take another cache line.
*/
struct epitem {
/* RB tree node used to link this structure to the eventpoll RB tree */
@@ -1964,6 +1966,12 @@ static int __init eventpoll_init(void)
/* Initialize the structure used to perform file's f_op->poll() calls */
ep_nested_calls_init(&poll_readywalk_ncalls);
+ /*
+ * We can have many thousands of epitems, so prevent this from
+ * using an extra cache line on 64-bit (and smaller) CPUs
+ */
+ BUILD_BUG_ON(sizeof(void *) <= 8 && sizeof(struct epitem) > 128);
+
/* Allocates slab cache used to allocate "struct epitem" items */
epi_cache = kmem_cache_create("eventpoll_epi", sizeof(struct epitem),
0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
--
Eric Wong