2019-02-15 17:04:08

by Dave Watson

[permalink] [raw]
Subject: [LSF/MM TOPIC] Improve performance of fget/fput

In some of our hottest network services, fget_light + fput overhead
can represent 1-2% of the processes' total CPU usage. I'd like to
discuss ways to reduce this overhead.

One proposal we have been testing is removing the refcount increment
and decrement, and using some sort of safe memory reclamation
instead. The hottest callers include recvmsg, sendmsg, epoll_wait,
etc. - mostly networking calls, often used on non-blocking sockets. We
usually hold the refcount only for a very short time; ideally we
wouldn't adjust it at all unless we know we are going to block.

We could use RCU, but we would have to be particularly careful that
none of these calls ever block, or ensure that we increment the
refcount at the blocking locations. As an alternative to RCU, hazard
pointers have similar overhead to SRCU, and could work equally well on
blocking or nonblocking syscalls without additional changes.

(There were also recent related discussions on SCM_RIGHTS refcount
cycle issues, which is the other half of a file* gc)

There might also be ways to rearrange the file* struct or fd table so
that we're not taking so many cache misses for sockfd_lookup_light,
since for sockets we don't use most of the file* struct at all.



2019-02-15 17:16:52

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [LSF/MM TOPIC] Improve performance of fget/fput

On Fri, Feb 15, 2019 at 04:38:05PM +0000, Dave Watson wrote:
> There might also be ways to rearrange the file* struct or fd table so
> that we're not taking so many cache misses for sockfd_lookup_light,
> since for sockets we don't use most of the file* struct at all.

I don't think there's too much opportunity to rearrange the fd table.
We go from task_struct->files_struct->fdtable->fd[i]. I have a plan
to use the Maple Tree data structure I'm currently working on to change
that to task_struct->files_struct->maple_node->fd[i], but it'll be
the same number of cache misses.