2023-09-16 23:46:05

by Liam R. Howlett

[permalink] [raw]
Subject: Re: Possible 6.5 regression: Huge values for "commited memory"

* Linus Torvalds <[email protected]> [230916 15:32]:
> On Sat, 16 Sept 2023 at 04:43, Bagas Sanjaya <[email protected]> wrote:
> >
> > Thanks for the regression report. Michael had already bisected it [1], so
> > telling regzbot:
> >
> > #regzbot ^introduced: 408579cd627a15
> > #regzbot title: huge committed memory due to returning 0 on do_vmi_align_mmunmap() success
> >
> > [1]: https://lore.kernel.org/linux-parisc/[email protected]/
>
> Funky. That commit isn't actually supposed to change anything, and the
> only locking change was because it incorrectly ended up doing the
> unlock a bit too early (before it did a validate_mm() - fixed in
> commit b5641a5d8b8b ("mm: don't do validate_mm() unnecessarily and
> without mmap locking").
>
> HOWEVER.
>
> Now that I look at it again, I note this change in move_vma().
>
> - if (do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false) < 0) {
> + if (!do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false)) {
>
> and I think that is wrong.
>
> The return value that changed was the old "return 1 if successful
> _and_ lock downgraded".
>
> Now it does "lock is always released on success if requested". So the
> special "1" return went away, but the failure case didn't change.
>
> So that change to "move_vma()" seems to be bogus. It used to do "if
> failed". Now it does "if success".
>
> Does the attached patch fix the problem?
>
> Liam - or am I just crazy? That return value check change really looks
> bogus to me, but it looks *so* bogus that it makes me think I'm
> missing something.

You are correct. I should not have changed that check.

I haven't devised a way to check this on parisc (I finally have a qemu
parisc vm booting), but it should happen on any call to move_vma().

It looks like you have a tester though, so I'll return to dinner.

Thanks,
Liam