LinuxLists.cc - Re: [PATCH v8 00/70] Introducing the Maple Tree

2022-04-27 08:29:49

Subject: Re: [PATCH v8 00/70] Introducing the Maple Tree

On Tue, 26 Apr 2022 15:06:19 +0000 Liam Howlett <[email protected]> wrote:

> The maple tree is an RCU-safe range based B-tree designed to use modern
> processor cache efficiently. There are a number of places in the kernel

I think it would be helpful to expand on "a number of places".
Specifically which places?

> that a non-overlapping range-based tree would be beneficial, especially
> one with a simple interface. The first user that is covered in this
> patch set is the vm_area_struct, where three data structures are
> replaced by the maple tree: the augmented rbtree, the vma cache, and the
> linked list of VMAs in the mm_struct. The long term goal is to reduce
> or remove the mmap_sem contention.

"mmap_lock" ;)

>
> The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
> nodes. With the increased branching factor, it is significantly shorter than
> the rbtree so it has fewer cache misses. The removal of the linked list
> between subsequent entries also reduces the cache misses and the need to pull
> in the previous and next VMA during many tree alterations.

Do we have any quantitative testing results?

What's the plan on utilizing this to further reduce mmap_lock contention?

2022-04-27 09:50:58

by Matthew Wilcox

[permalink] [raw]

Subject: Re: [PATCH v8 00/70] Introducing the Maple Tree

On Tue, Apr 26, 2022 at 01:08:57PM -0700, Andrew Morton wrote:
> On Tue, 26 Apr 2022 15:06:19 +0000 Liam Howlett <[email protected]> wrote:
> > The maple tree is an RCU-safe range based B-tree designed to use modern
> > processor cache efficiently. There are a number of places in the kernel
>
> I think it would be helpful to expand on "a number of places".
> Specifically which places?

The page cache would be a good place to use it once it has a few more
features to bring it up to par with the radix tree. I can go into more
detail if you want.

In general, anywhere that's using the IDR/Radix Tree/XArray would
benefit. The radix tree has some pretty poor space consumption
properties, particularly for anyone using the cyclic variants.

Many users of the rbtree would have better performance if converted
to the maple tree.

Ultimately, it's going to be incumbent on people who know their own
chunk of the kernel to say "Oh, yes, I can use that data structure",
rather than on Liam to go around converting everybody's code for them.

2022-04-27 14:37:01

by Liam R. Howlett

[permalink] [raw]

Subject: Re: [PATCH v8 00/70] Introducing the Maple Tree

* Andrew Morton <[email protected]> [220426 16:09]:
> On Tue, 26 Apr 2022 15:06:19 +0000 Liam Howlett <[email protected]> wrote:
>
> > The maple tree is an RCU-safe range based B-tree designed to use modern
> > processor cache efficiently. There are a number of places in the kernel
>
> I think it would be helpful to expand on "a number of places".
> Specifically which places?

Matthew answered this, but if you look for users of the interval tree
you will come across many users that do not have overlapping ranges.
The interval tree is being (ab)used because it is easier than the other
options even though it is not necessarily the best choice for the data
being stored. I don't want to be negative about the other options, they
are all really valuable and have their uses. I think where the maple
tree excels is the ease of use and the cache efficiency.

>
> > that a non-overlapping range-based tree would be beneficial, especially
> > one with a simple interface. The first user that is covered in this
> > patch set is the vm_area_struct, where three data structures are
> > replaced by the maple tree: the augmented rbtree, the vma cache, and the
> > linked list of VMAs in the mm_struct. The long term goal is to reduce
> > or remove the mmap_sem contention.
>
> "mmap_lock" ;)

Ah, right. Thanks.

>
> >
> > The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
> > nodes. With the increased branching factor, it is significantly shorter than
> > the rbtree so it has fewer cache misses. The removal of the linked list
> > between subsequent entries also reduces the cache misses and the need to pull
> > in the previous and next VMA during many tree alterations.
>
> Do we have any quantitative testing results?

I was waiting for the mmtests sweep to complete before sending them but
I didn't want to hold up Yu Zhao's testing of the combined tree as it
has proved useful already. Please don't include the results in the first
commit as it wouldn't make much sense. If you intend to put them in a
commit message, please put them in the patch introducing the maple tree.
The benchmarks are around the same as they have always been.

kernbench
rb5.18-rc2 mt5.18-rc2
Amean user-2 862.24 ( 0.00%) 861.45 * 0.09%*
Amean syst-2 136.65 ( 0.00%) 141.58 * -3.61%*
Amean elsp-2 505.38 ( 0.00%) 507.99 * -0.52%*
Amean user-4 890.24 ( 0.00%) 888.34 * 0.21%*
Amean syst-4 140.64 ( 0.00%) 145.32 * -3.33%*
Amean elsp-4 264.34 ( 0.00%) 265.76 * -0.54%*
Amean user-8 952.30 ( 0.00%) 947.57 * 0.50%*
Amean syst-8 145.52 ( 0.00%) 147.79 * -1.56%*
Amean elsp-8 145.02 ( 0.00%) 145.38 * -0.24%*
Amean user-16 920.83 ( 0.00%) 918.95 * 0.20%*
Amean syst-16 135.37 ( 0.00%) 138.99 * -2.67%*
Amean elsp-16 75.03 ( 0.00%) 75.25 * -0.29%*
Amean user-32 970.98 ( 0.00%) 969.01 * 0.20%*
Amean syst-32 144.75 ( 0.00%) 148.58 * -2.64%*
Amean elsp-32 44.10 ( 0.00%) 44.64 * -1.24%*
Amean user-64 1062.19 ( 0.00%) 1060.30 * 0.18%*
Amean syst-64 154.24 ( 0.00%) 157.58 * -2.17%*
Amean elsp-64 28.88 ( 0.00%) 29.10 * -0.76%*
Amean user-128 1609.09 ( 0.00%) 1612.19 * -0.19%*
Amean syst-128 210.05 ( 0.00%) 215.09 * -2.40%*
Amean elsp-128 25.22 ( 0.00%) 25.45 * -0.94%*
Amean user-256 1767.37 ( 0.00%) 1766.86 * 0.03%*
Amean syst-256 215.99 ( 0.00%) 221.56 * -2.58%*
Amean elsp-256 25.20 ( 0.00%) 25.33 * -0.54%*
Amean user-288 1772.73 ( 0.00%) 1772.35 * 0.02%*
Amean syst-288 216.08 ( 0.00%) 221.39 * -2.45%*
Amean elsp-288 25.16 ( 0.00%) 25.44 * -1.13%*

Increase in performance in the following micro-benchmarks in Hmean:
- context_switch1-processes +14.74% to 2.22%

Mixed results in the following micro-benchmarks in Hmean:
- malloc1-threads +34.95% to -30.06%
- malloc1-processes +2.73% to -23.65%
- page_fault3-threads +19.84% to -11.55%
- pthread_mutex1-threads +42.50% to -11.76%

Decrease in performance in the following micro-benchmarks in Hmean:
- brk1-processes -35.35% to -42.69%

brk1-processes decrease is due to the test itself. Since the VMA cannot
be expanded, the test is actually inserting a new VMA.

>
> What's the plan on utilizing this to further reduce mmap_lock contention?

The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers. A single write operation will be
allowed at a time. A reader re-walks if stale data is encountered. VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct. I can go into more details if you want.

2022-04-27 18:00:13

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH v8 00/70] Introducing the Maple Tree

On Wed, 27 Apr 2022 14:08:39 +0000 Liam Howlett <[email protected]> wrote:

> > >
> > > The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
> > > nodes. With the increased branching factor, it is significantly shorter than
> > > the rbtree so it has fewer cache misses. The removal of the linked list
> > > between subsequent entries also reduces the cache misses and the need to pull
> > > in the previous and next VMA during many tree alterations.
> >
> > Do we have any quantitative testing results?
>
> I was waiting for the mmtests sweep to complete before sending them but
> I didn't want to hold up Yu Zhao's testing of the combined tree as it
> has proved useful already. Please don't include the results in the first
> commit as it wouldn't make much sense. If you intend to put them in a
> commit message, please put them in the patch introducing the maple tree.

I did that.

> The benchmarks are around the same as they have always been.

So it's presently a wash.

That makes "the plan" (below) really critical, otherwise there seems
little point in merging this code at this time?

Please send me many very soothing words about how confident we should
be that the plan will be implemented and that it shall be good?

> >
> > What's the plan on utilizing this to further reduce mmap_lock contention?
>
> The plan is to get to the point where we use the maple tree in RCU mode.
> Readers will not block for writers. A single write operation will be
> allowed at a time. A reader re-walks if stale data is encountered. VMAs
> would be RCU enabled and this mode would be entered once multiple tasks
> are using the mm_struct. I can go into more details if you want.

Uh, that's very important information. It's really the whole point
of the patchset! I added this to the [0/n] changelog.

2022-04-27 18:42:19

by Matthew Wilcox

[permalink] [raw]

Subject: Re: [PATCH v8 00/70] Introducing the Maple Tree

On Wed, Apr 27, 2022 at 10:33:31AM -0700, Andrew Morton wrote:
> On Wed, 27 Apr 2022 14:08:39 +0000 Liam Howlett <[email protected]> wrote:
> > The benchmarks are around the same as they have always been.
>
> So it's presently a wash.
>
> That makes "the plan" (below) really critical, otherwise there seems
> little point in merging this code at this time?
>
> Please send me many very soothing words about how confident we should
> be that the plan will be implemented and that it shall be good?

Yes, performance-wise it's a wash. However, Davidlohr was very
impressed that it was a wash because we're actually getting rid of three
data structures here; the linked list, the rbtree and the vmacache.
His opinion was that we should push the maple tree in now, in advance
of the future RCU uses.

We also have other users waiting in the wings. Dave Howells has something
he's working on that uses the maple tree directly. I have a couple of
XArray users that are using it inappropriately that I want to convert
... I just didn't want to do that work before all this lands.

The current LSFMM schedule has very many words about the Maple tree
scheduled for 13:30-15:00 on Monday. Hopefully we'll have a better idea
after that how confident we are that RCU VMA walking is going to work.

2022-04-28 04:36:52

by Liam R. Howlett

[permalink] [raw]

Subject: Re: [PATCH v8 00/70] Introducing the Maple Tree

* Andrew Morton <[email protected]> [220427 13:33]:
> On Wed, 27 Apr 2022 14:08:39 +0000 Liam Howlett <[email protected]> wrote:
>
> > > >
> > > > The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
> > > > nodes. With the increased branching factor, it is significantly shorter than
> > > > the rbtree so it has fewer cache misses. The removal of the linked list
> > > > between subsequent entries also reduces the cache misses and the need to pull
> > > > in the previous and next VMA during many tree alterations.
> > >
> > > Do we have any quantitative testing results?
> >
> > I was waiting for the mmtests sweep to complete before sending them but
> > I didn't want to hold up Yu Zhao's testing of the combined tree as it
> > has proved useful already. Please don't include the results in the first
> > commit as it wouldn't make much sense. If you intend to put them in a
> > commit message, please put them in the patch introducing the maple tree.
>
> I did that.
>
> > The benchmarks are around the same as they have always been.
>
> So it's presently a wash.
>
> That makes "the plan" (below) really critical, otherwise there seems
> little point in merging this code at this time?

There are a number of reasons to merge at this time. First, as Matthew
said, we have more than one user so having the tree upstream would
obviously help them out. Second, we can make sure this much (maple tree
+ vma tracking) works for everyone before we enable VMA RCU and play
even more with the locking scenarios. The third reason is to get more
community input into "the plan" to ensure the maple tree is beneficial
in the most common execution paths and capture corner cases.

>
> Please send me many very soothing words about how confident we should
> be that the plan will be implemented and that it shall be good?

Tea, warmth, fresh bread.

As we know, VMAs are rarely written and mostly read. This plan removes
the majority of the slow down on readers. The only slow down is when a
reader conflicts with a writer. We are actually slower on writes - we
allocate nodes for the tree vs having it embedded in the VMA itself and
we do more work in calculating the gaps, but we are actually fast enough
to hide that on the read side. Once the writers stop blocking the
readers it shall be good. We need this step to get to using the maple
tree in RCU mode to make this happen.

>
> > >
> > > What's the plan on utilizing this to further reduce mmap_lock contention?
> >
> > The plan is to get to the point where we use the maple tree in RCU mode.
> > Readers will not block for writers. A single write operation will be
> > allowed at a time. A reader re-walks if stale data is encountered. VMAs
> > would be RCU enabled and this mode would be entered once multiple tasks
> > are using the mm_struct. I can go into more details if you want.
>
> Uh, that's very important information. It's really the whole point
> of the patchset! I added this to the [0/n] changelog.

Yes, but that's not what the patch set does today. I didn't want to
include The Plan until it exists. I'd like to think that what we have
here is worth while on its own and a good start - but a start to
something big. It's hard enough to get people to look at 70 patches,
doubly so for an RFC of 70. The interest from others in using a tree
with an easy interface seems like a win as well.

I also have some side goals such as the removal of __vma_adjust() for
code readability.

Thanks,
Liam

2022-05-03 00:07:40

by Davidlohr Bueso

[permalink] [raw]

Subject: Re: [PATCH v8 00/70] Introducing the Maple Tree

On Wed, 27 Apr 2022, Matthew Wilcox wrote:

>On Wed, Apr 27, 2022 at 10:33:31AM -0700, Andrew Morton wrote:
>> On Wed, 27 Apr 2022 14:08:39 +0000 Liam Howlett <[email protected]> wrote:
>> > The benchmarks are around the same as they have always been.
>>
>> So it's presently a wash.
>>
>> That makes "the plan" (below) really critical, otherwise there seems
>> little point in merging this code at this time?
>>
>> Please send me many very soothing words about how confident we should
>> be that the plan will be implemented and that it shall be good?
>
>Yes, performance-wise it's a wash. However, Davidlohr was very
>impressed that it was a wash because we're actually getting rid of three
>data structures here; the linked list, the rbtree and the vmacache.
>His opinion was that we should push the maple tree in now, in advance
>of the future RCU uses.

Yes I like the maple tree, and at this stage I don't think we can ask
for more from this series wrt the MM - albeit there seems to still be
some folks reporting breakage. Fundamentally I see Liam's work to (re)move
complexity out of the MM (not to say that the actual maple tree is not
complex) by consolidating the three complimentary data structures very
much worth it considering performance does not take a hit. This was
very much a turn off with the range locking approach, which worst case
scenario incurred in prohibitive overhead. Also as Liam and Matthew
have mentioned, RCU opens up a lot of nice performance opportunities,
and in addition academia[1] has shown outstanding scalability of address
spaces with the foundation of replacing the locked rbtree with RCU
aware trees.

[1] https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf

Thanks,
Davidlohr