The xdr_get_next_encode_buffer() function is called infrequently.
On a typical build/test workload, it's called about 1 time in 400
calls to xdr_reserve_space() (measured on NFSD).
Force the compiler to remove it from xdr_reserve_space(), which is
a hot path. This change reduces the size of xdr_reserve_space() from
10 cache lines to 4 on my test system.
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xdr.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index b57cf9df4de8..08a85375b311 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -945,8 +945,13 @@ inline void xdr_commit_encode(struct xdr_stream *xdr)
}
EXPORT_SYMBOL_GPL(xdr_commit_encode);
-static __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr,
- size_t nbytes)
+/*
+ * The buffer space to be reserved crosses the boundary between
+ * xdr->buf->head and xdr->buf->pages, or between two pages
+ * in xdr->buf->pages.
+ */
+static noinline __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr,
+ size_t nbytes)
{
__be32 *p;
int space_left;
> On Jun 6, 2022, at 9:03 PM, NeilBrown <[email protected]> wrote:
>
> On Mon, 06 Jun 2022, Chuck Lever wrote:
>> The xdr_get_next_encode_buffer() function is called infrequently.
>> On a typical build/test workload, it's called about 1 time in 400
>> calls to xdr_reserve_space() (measured on NFSD).
>>
>> Force the compiler to remove it from xdr_reserve_space(), which is
>> a hot path. This change reduces the size of xdr_reserve_space() from
>> 10 cache lines to 4 on my test system.
>
> Does this really help at all? Are the instructions that are executed in
> the common case distributed over those 10 cache line, or are they all in
> the first 4?
>
> I would have thought the "unlikely" in xdr_reserve_space() would have
> pushed all the code from xdr_get_next_encode_buffer() to the end of the
> function leaving the remainder in a small contiguous chunk.
Well, granted that I'm compiling with -Os, not -O2. The compiler inlines
xdr_get_next_encode_buffer() right in the middle of xdr_reserve_space().
> NeilBrown
>
>
>>
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>> net/sunrpc/xdr.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
>> index b57cf9df4de8..08a85375b311 100644
>> --- a/net/sunrpc/xdr.c
>> +++ b/net/sunrpc/xdr.c
>> @@ -945,8 +945,13 @@ inline void xdr_commit_encode(struct xdr_stream *xdr)
>> }
>> EXPORT_SYMBOL_GPL(xdr_commit_encode);
>>
>> -static __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr,
>> - size_t nbytes)
>> +/*
>> + * The buffer space to be reserved crosses the boundary between
>> + * xdr->buf->head and xdr->buf->pages, or between two pages
>> + * in xdr->buf->pages.
>> + */
>> +static noinline __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr,
>> + size_t nbytes)
>> {
>> __be32 *p;
>> int space_left;
>>
>>
>>
--
Chuck Lever
On Tue, 07 Jun 2022, Chuck Lever III wrote:
>
> > On Jun 6, 2022, at 9:03 PM, NeilBrown <[email protected]> wrote:
> >
> > On Mon, 06 Jun 2022, Chuck Lever wrote:
> >> The xdr_get_next_encode_buffer() function is called infrequently.
> >> On a typical build/test workload, it's called about 1 time in 400
> >> calls to xdr_reserve_space() (measured on NFSD).
> >>
> >> Force the compiler to remove it from xdr_reserve_space(), which is
> >> a hot path. This change reduces the size of xdr_reserve_space() from
> >> 10 cache lines to 4 on my test system.
> >
> > Does this really help at all? Are the instructions that are executed in
> > the common case distributed over those 10 cache line, or are they all in
> > the first 4?
> >
> > I would have thought the "unlikely" in xdr_reserve_space() would have
> > pushed all the code from xdr_get_next_encode_buffer() to the end of the
> > function leaving the remainder in a small contiguous chunk.
>
> Well, granted that I'm compiling with -Os, not -O2. The compiler inlines
> xdr_get_next_encode_buffer() right in the middle of xdr_reserve_space().
Interesting. I tried with -O2 and it move xdr_get_next_encode_buffer()
to the end, but inlined xdr_commit_encode() in the middle.
I changed xdr_commit_encode() to wrap the "shift==0" test in likely(),
and then it produced a more reasonable result.
With your noinline patch, the "return xdr_get_next_encode_buffer()"
becomes a jump, not a jump-to-subroutine, so there is little cost in it.
Might I suggest:
Move the "xdr->scratch.iov_len" test out of xdr_commit_encode() and
put it in both callers as "unlikely".
Mark both xdr_commit_encode and xdr_get_next_encode_buffer() as
noinline
mention in the commit message that with -Os the "unlikely" in
xdr_reserve_space() doesn't help
??
Thanks,
NeilBrown
>
>
> > NeilBrown
> >
> >
> >>
> >> Signed-off-by: Chuck Lever <[email protected]>
> >> ---
> >> net/sunrpc/xdr.c | 9 +++++++--
> >> 1 file changed, 7 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> >> index b57cf9df4de8..08a85375b311 100644
> >> --- a/net/sunrpc/xdr.c
> >> +++ b/net/sunrpc/xdr.c
> >> @@ -945,8 +945,13 @@ inline void xdr_commit_encode(struct xdr_stream *xdr)
> >> }
> >> EXPORT_SYMBOL_GPL(xdr_commit_encode);
> >>
> >> -static __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr,
> >> - size_t nbytes)
> >> +/*
> >> + * The buffer space to be reserved crosses the boundary between
> >> + * xdr->buf->head and xdr->buf->pages, or between two pages
> >> + * in xdr->buf->pages.
> >> + */
> >> +static noinline __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr,
> >> + size_t nbytes)
> >> {
> >> __be32 *p;
> >> int space_left;
> >>
> >>
> >>
>
> --
> Chuck Lever
>
>
>
>
On Mon, 06 Jun 2022, Chuck Lever wrote:
> The xdr_get_next_encode_buffer() function is called infrequently.
> On a typical build/test workload, it's called about 1 time in 400
> calls to xdr_reserve_space() (measured on NFSD).
>
> Force the compiler to remove it from xdr_reserve_space(), which is
> a hot path. This change reduces the size of xdr_reserve_space() from
> 10 cache lines to 4 on my test system.
Does this really help at all? Are the instructions that are executed in
the common case distributed over those 10 cache line, or are they all in
the first 4?
I would have thought the "unlikely" in xdr_reserve_space() would have
pushed all the code from xdr_get_next_encode_buffer() to the end of the
function leaving the remainder in a small contiguous chunk.
NeilBrown
>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
> net/sunrpc/xdr.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> index b57cf9df4de8..08a85375b311 100644
> --- a/net/sunrpc/xdr.c
> +++ b/net/sunrpc/xdr.c
> @@ -945,8 +945,13 @@ inline void xdr_commit_encode(struct xdr_stream *xdr)
> }
> EXPORT_SYMBOL_GPL(xdr_commit_encode);
>
> -static __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr,
> - size_t nbytes)
> +/*
> + * The buffer space to be reserved crosses the boundary between
> + * xdr->buf->head and xdr->buf->pages, or between two pages
> + * in xdr->buf->pages.
> + */
> +static noinline __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr,
> + size_t nbytes)
> {
> __be32 *p;
> int space_left;
>
>
>