* Sven Schnelle <[email protected]> [220516 11:37]:
> Hi Liam,
>
> Liam Howlett <[email protected]> writes:
>
> > * Sven Schnelle <[email protected]> [220515 16:02]:
> >
> > I tried the above on my qemu s390 with kernel 5.18.0-rc6-next-20220513,
> > but it runs without issue, return code is 0. Is there something the VM
> > needs to have for this to trigger?
>
> A coworker said the same. Reason for this seems to be that i've run the
> code in a unittest environment which seems to make a difference. When
> compiling the code above with gcc on my system it also doesn't crash.
> So i have to figure out what makes this unittest binary special.
>
> >> I've added a few debug statements to the maple tree code:
> >>
> >> [ 27.769641] mas_next_entry: offset=14
> >> [ 27.769642] mas_next_nentry: entry = 0e00000000000000, slots=0000000090249f80, mas->offset=15 count=14
> >
> > Where exactly are you printing this?
>
> I added a lot of debug statements to the code trying to understand
> it. I'll attach it to this mail.
Thanks. Can you check to see if that diff you sent was the correct
file? It appears to be the git stats and not the changes themselves.
>
> >>
> >> I see in mas_next_nentry() that there's a while that iterates over the
> >> (used?) slots until count is reached.`
> >
> > Yes, mas_next_nentry() looks for the next non-null entry in the current
> > node.
> >
> >>After that loop mas_next_entry()
> >> just picks the next (unused?) entry, which is slot 15 in that case.
> >
> > mas_next_entry() returns the next non-null entry. If there isn't one
> > returned by mas_next_nentry(), then it will advance to the next node by
> > calling mas_next_node(). There are checks in there for detecting dead
> > nodes for RCU use and limit checking as well.
> >
> >>
> >> What i noticed while scanning over include/linux/maple_tree.h is:
> >>
> >> struct maple_range_64 {
> >> struct maple_pnode *parent;
> >> unsigned long pivot[MAPLE_RANGE64_SLOTS - 1];
> >> union {
> >> void __rcu *slot[MAPLE_RANGE64_SLOTS];
> >> struct {
> >> void __rcu *pad[MAPLE_RANGE64_SLOTS - 1];
> >> struct maple_metadata meta;
> >> };
> >> };
> >> };
> >>
> >> and struct maple_metadata is:
> >>
> >> struct maple_metadata {
> >> unsigned char end;
> >> unsigned char gap;
> >> };
> >>
> >> If i swap the gap and end members 0x0e00000000000000 becomes
> >> 0x000e000000000000. And 0xe matches our msa->offset 14 above.
> >> So it looks like mas_next() in mmap_region returns the meta
> >> data for the node.
> >
> > If this is the case, then I think any task that has more than 14 VMAs
> > would have issues. I also use mas_next_entry() in mas_find() which is
> > used for the mas_for_each() macro/iterator. Can you please enable
> > CONFIG_DEBUG_VM_MAPLE_TREE ? mmap.c tests the tree after pretty much
> > any change and will dump useful information if there is an issue -
> > including the entire tree. See validate_mm_mt() for details.
> >
> > You can find CONFIG_DEBUG_VM_MAPLE_TREE in the config:
> > kernel hacking -> Memory debugging -> Debug VM -> Debug VM maple trees
>
> I have both DEBUG_MAPPLE_TREE and DEBUG_VM_MAPLE_TREE enabled, but don't
> see anything printed.
I suspect that this means the issue is actually in the mmap code and not
the tree. It may be sensitive to the task-specific layout. Do you have
randomization on as well? I'm thinking maybe I return NULL because I'm
asking for the next element after the last VMA and not checking that
correctly or similar.
Does ./scripts/faddr2line work for you? What does it say about
mmap_region+0x19e/0x848 ?
Thanks,
Liam
Liam Howlett <[email protected]> writes:
> * Sven Schnelle <[email protected]> [220516 11:37]:
>> Hi Liam,
>>
>> Liam Howlett <[email protected]> writes:
>>
>> > * Sven Schnelle <[email protected]> [220515 16:02]:
>> >
>> > I tried the above on my qemu s390 with kernel 5.18.0-rc6-next-20220513,
>> > but it runs without issue, return code is 0. Is there something the VM
>> > needs to have for this to trigger?
>>
>> A coworker said the same. Reason for this seems to be that i've run the
>> code in a unittest environment which seems to make a difference. When
>> compiling the code above with gcc on my system it also doesn't crash.
>> So i have to figure out what makes this unittest binary special.
>>
>> >> I've added a few debug statements to the maple tree code:
>> >>
>> >> [ 27.769641] mas_next_entry: offset=14
>> >> [ 27.769642] mas_next_nentry: entry = 0e00000000000000, slots=0000000090249f80, mas->offset=15 count=14
>> >
>> > Where exactly are you printing this?
>>
>> I added a lot of debug statements to the code trying to understand
>> it. I'll attach it to this mail.
>
> Thanks. Can you check to see if that diff you sent was the correct
> file? It appears to be the git stats and not the changes themselves.
No, it wasn't. I did git stash show with -p, which doesn't make sense.
I've attached the correct diff.
>> > If this is the case, then I think any task that has more than 14 VMAs
>> > would have issues. I also use mas_next_entry() in mas_find() which is
>> > used for the mas_for_each() macro/iterator. Can you please enable
>> > CONFIG_DEBUG_VM_MAPLE_TREE ? mmap.c tests the tree after pretty much
>> > any change and will dump useful information if there is an issue -
>> > including the entire tree. See validate_mm_mt() for details.
>> >
>> > You can find CONFIG_DEBUG_VM_MAPLE_TREE in the config:
>> > kernel hacking -> Memory debugging -> Debug VM -> Debug VM maple trees
>>
>> I have both DEBUG_MAPPLE_TREE and DEBUG_VM_MAPLE_TREE enabled, but don't
>> see anything printed.
>
> I suspect that this means the issue is actually in the mmap code and not
> the tree. It may be sensitive to the task-specific layout. Do you have
> randomization on as well? I'm thinking maybe I return NULL because I'm
> asking for the next element after the last VMA and not checking that
> correctly or similar.
> Does ./scripts/faddr2line work for you? What does it say about
> mmap_region+0x19e/0x848 ?
2629 next = mas_next(&mas, ULONG_MAX);
2630 prev = mas_prev(&mas, 0);
2631 if (vm_flags & VM_SPECIAL)
2632 goto cannot_expand;
2633
2634 /* Attempt to expand an old mapping */
2635 /* Check next */
2636 if (next && next->vm_start == end && !vma_policy(next) &&
next above is 0x0e00000000000000 and next->vm_start will trigger the crash.
2637 can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen,
2638 NULL_VM_UFFD_CTX, NULL)) {
2639 merge_end = next->vm_end;
2640 vma = next;
2641 vm_pgoff = next->vm_pgoff - pglen;
2642 }
* Sven Schnelle <[email protected]> [220516 13:10]:
> Liam Howlett <[email protected]> writes:
>
> > * Sven Schnelle <[email protected]> [220516 11:37]:
> >> Hi Liam,
> >>
> >> Liam Howlett <[email protected]> writes:
> >>
> >> > * Sven Schnelle <[email protected]> [220515 16:02]:
> >> >
> >> > I tried the above on my qemu s390 with kernel 5.18.0-rc6-next-20220513,
> >> > but it runs without issue, return code is 0. Is there something the VM
> >> > needs to have for this to trigger?
> >>
> >> A coworker said the same. Reason for this seems to be that i've run the
> >> code in a unittest environment which seems to make a difference. When
> >> compiling the code above with gcc on my system it also doesn't crash.
> >> So i have to figure out what makes this unittest binary special.
> >>
> >> >> I've added a few debug statements to the maple tree code:
> >> >>
> >> >> [ 27.769641] mas_next_entry: offset=14
> >> >> [ 27.769642] mas_next_nentry: entry = 0e00000000000000, slots=0000000090249f80, mas->offset=15 count=14
> >> >
> >> > Where exactly are you printing this?
> >>
> >> I added a lot of debug statements to the code trying to understand
> >> it. I'll attach it to this mail.
> >
> > Thanks. Can you check to see if that diff you sent was the correct
> > file? It appears to be the git stats and not the changes themselves.
>
> No, it wasn't. I did git stash show with -p, which doesn't make sense.
> I've attached the correct diff.
Thanks for this. I think the key is that the offset is beyond the end
of the node (count). What is happening is that we are already at the
last entry in the node and calling next. I had moved the last entry
from the loop to optimize mas_next_nentry() and set it after.
Unfortunately I did not check for this condition on entry to the
function. I have a patch I will send out shortly.