2020-08-25 14:55:52

by David Laight

[permalink] [raw]
Subject: [PATCH 00/13] lib/generic-radix-tree: genradix bug fix and optimisations.

The genradix code is used by SCTP for accessing per-stream data.
This means there are quite a lot of lookups but the code wasn't
really optimised at all.

Patch 1 fixes a rather nasty bug that could cause all sorts of oddities
including having a loop in a tree.
Fortunately the only 2 users of the genradix code in the kernel source
tree and neither can possibly hit the bug.

Patches 2 through 11 all reduce codepath. Especially for the common
case (for both users) where all the data fits in one page.

Patch 12 inlines the common lookup function for 1 page trees.
This also remove the '% constant' calcuation from the lookups.
However it will increase code size (but not codepath) because
of the number of callers.

Patch 13 removes part of the optimisation for 1 page tress
under the assumption that the lookup will have been inlined.
The code still works even if inlining didn't happen.

David Laight (13):
01/13: Fix potentially corrupt tree
02/13: Optimise out ilog2(variable).
03/13: Always use low 8 bits of 'root' for depth.
04/13: Optimise __genradix_ptr()
05/13: Optimise __genradix_ptr_alloc()
06/13: Rename gfp_mask to gfp to shorten lines.
07/13: Optimise __genradix_iter_peek()
08/13: Save number of bits to shift instead of tree level.
09/13: Check sizeof(_type) when defining a tree.
10/13: Simplify offset calculation:
11/13: Pass the root pointer to __genradix_ptr.
12/13: Inline genradix_ptr() for simple trees.
13/13: Simplify __genradix_ptr()

include/linux/generic-radix-tree.h | 54 ++++---
lib/generic-radix-tree.c | 224 ++++++++++++++++-------------
2 files changed, 162 insertions(+), 116 deletions(-)

--
2.25.1

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2020-08-25 15:44:47

by Marcelo Ricardo Leitner

[permalink] [raw]
Subject: Re: [PATCH 00/13] lib/generic-radix-tree: genradix bug fix and optimisations.

On Tue, Aug 25, 2020 at 02:52:34PM +0000, David Laight wrote:
> The genradix code is used by SCTP for accessing per-stream data.
> This means there are quite a lot of lookups but the code wasn't
> really optimised at all.

My test box is down for the moment and will bring it on later today or
tomorrow, so I can't test it yet. What should we expect as performance
gains here?

Marcelo

2020-08-25 16:02:00

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 00/13] lib/generic-radix-tree: genradix bug fix and optimisations.

From: 'Marcelo Ricardo Leitner'
> Sent: 25 August 2020 16:41
>
> On Tue, Aug 25, 2020 at 02:52:34PM +0000, David Laight wrote:
> > The genradix code is used by SCTP for accessing per-stream data.
> > This means there are quite a lot of lookups but the code wasn't
> > really optimised at all.
>
> My test box is down for the moment and will bring it on later today or
> tomorrow, so I can't test it yet. What should we expect as performance
> gains here?

Not sure, probably not much, but it ought to show up :-)
There'll be bigger gains on a cpu that has software ilog2().

I've only checked SCTP still works.
I've requested 32k streams on a listener - to force a level-2 tree.
I've also done at least one check with a massive pad in the sctp
stream structure.

I ought to check one of my M3UA or M2PA 'double-reflect' loopback tests.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-08-25 16:34:17

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 00/13] lib/generic-radix-tree: genradix bug fix and optimisations.

On Tue, Aug 25, 2020 at 04:00:35PM +0000, David Laight wrote:
> From: 'Marcelo Ricardo Leitner'
> > Sent: 25 August 2020 16:41
> >
> > On Tue, Aug 25, 2020 at 02:52:34PM +0000, David Laight wrote:
> > > The genradix code is used by SCTP for accessing per-stream data.
> > > This means there are quite a lot of lookups but the code wasn't
> > > really optimised at all.
> >
> > My test box is down for the moment and will bring it on later today or
> > tomorrow, so I can't test it yet. What should we expect as performance
> > gains here?
>
> Not sure, probably not much, but it ought to show up :-)
> There'll be bigger gains on a cpu that has software ilog2().
>
> I've only checked SCTP still works.
> I've requested 32k streams on a listener - to force a level-2 tree.
> I've also done at least one check with a massive pad in the sctp
> stream structure.

Have you benchmarked at all? Or were you looking at the generated assembly?

2020-08-26 07:39:04

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 00/13] lib/generic-radix-tree: genradix bug fix and optimisations.

From: Kent Overstreet
> Sent: 25 August 2020 17:32
>
> On Tue, Aug 25, 2020 at 04:00:35PM +0000, David Laight wrote:
> > From: 'Marcelo Ricardo Leitner'
> > > Sent: 25 August 2020 16:41
> > >
> > > On Tue, Aug 25, 2020 at 02:52:34PM +0000, David Laight wrote:
> > > > The genradix code is used by SCTP for accessing per-stream data.
> > > > This means there are quite a lot of lookups but the code wasn't
> > > > really optimised at all.
> > >
> > > My test box is down for the moment and will bring it on later today or
> > > tomorrow, so I can't test it yet. What should we expect as performance
> > > gains here?
> >
> > Not sure, probably not much, but it ought to show up :-)
> > There'll be bigger gains on a cpu that has software ilog2().
> >
> > I've only checked SCTP still works.
> > I've requested 32k streams on a listener - to force a level-2 tree.
> > I've also done at least one check with a massive pad in the sctp
> > stream structure.
>
> Have you benchmarked at all? Or were you looking at the generated assembly?

I've been reading a lot of assembly.
(Some of the code generated by modern gcc is crap.)
With horrible casts it is the easiest way to check the code is right!

I'm going to try marking the lookup functions with '__attribute__ ((pure))'.
That should help the sctp code that does repeated SCTP_SI().
In reality I want to mark them __attribute__ ((const)) - including the
inline wrappers, but that isn't allowed if they read memory.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-08-26 10:01:07

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 00/13] lib/generic-radix-tree: genradix bug fix and optimisations.

From: 'Marcelo Ricardo Leitner'
> Sent: 25 August 2020 16:41
>
> On Tue, Aug 25, 2020 at 02:52:34PM +0000, David Laight wrote:
> > The genradix code is used by SCTP for accessing per-stream data.
> > This means there are quite a lot of lookups but the code wasn't
> > really optimised at all.
>
> My test box is down for the moment and will bring it on later today or
> tomorrow, so I can't test it yet. What should we expect as performance
> gains here?

I've just done quick test that shows ~1% improvement on an MTP3+M2PA
'double reflect' test using the 'test' userpart and a test program.
So there is quite a lot of extra protocol stack in the way.

The test is running ~200k very small data chunks/sec (tx and rx)
on a poor little 2.4GHz 8 core atom C2758.

However the 'baseline' kernel just locked up.
(Reports 5.8.0+ so must be late in the merge for rc1).

Annoyingly the Ubuntu 20.04 userspace won't show the kernel messages
(only a graphical login) - so I can see any kernel ooops messages.
I'm going to have to setup a serial console.
That'll need a trip into the office.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-08-26 14:07:51

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 00/13] lib/generic-radix-tree: genradix bug fix and optimisations.

From: David Laight
> Sent: 26 August 2020 08:36
...
> I'm going to try marking the lookup functions with '__attribute__ ((pure))'.
> That should help the sctp code that does repeated SCTP_SI().
> In reality I want to mark them __attribute__ ((const)) - including the
> inline wrappers, but that isn't allowed if they read memory.

Neither pure nor const makes any difference.
Even to code that like:
if (SCTP_SO(...)->ext)
SCTP_SO(...)->ext->foo = 0;

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)