MIME-Version: 1.0
References: <20211120201230.920082-1-shakeelb@google.com> <25b36a5c-5bbd-5423-0c67-05cd6c1432a7@redhat.com>
 <CALvZod5L1C1DV_DVs9O3xZm6CJnriunAoj89YLDdCp7ef5yBxA@mail.gmail.com> <1b30d06d-f9c0-1737-13e6-2d1a7d7b8507@redhat.com>
In-Reply-To: <1b30d06d-f9c0-1737-13e6-2d1a7d7b8507@redhat.com>
From:   Shakeel Butt <shakeelb@google.com>
Date:   Mon, 22 Nov 2021 17:20:13 -0800
Message-ID: <CALvZod5sFQbf3t_ZDW6ob+BqVtezn-c7i1UyOeev6Lwch96=7g@mail.gmail.com>
Subject: Re: [PATCH] mm: split thp synchronously on MADV_DONTNEED
To:     David Hildenbrand <david@redhat.com>
Cc:     "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
        Yang Shi <shy828301@gmail.com>, Zi Yan <ziy@nvidia.com>,
        Matthew Wilcox <willy@infradead.org>,
        Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk

On Mon, Nov 22, 2021 at 10:59 AM David Hildenbrand <david@redhat.com> wrote:
>
[...]
>
> Thanks for the details, that makes sense to me. It's essentially like
> another kernel buffer charged to the process, only reclaimed on memory
> reclaim.
>
> (can we add that to the patch description?)
>

Sure.

[...]
> >
> > I did a simple benchmark of madvise(MADV_DONTNEED) on 10000 THPs on
> > x86 for both settings you suggested. I don't see any statistically
> > significant difference with and without the patch. Let me know if you
> > want me to try something else.
>
> Awesome, thanks for benchmarking. I did not check, but I assume on
> re-access, we won't actually re-use pages from the underlying, partially
> unmapped, THP, correct?

Correct.

> So after MADV_DONTNEED, the zapped sub-pages are
> essentially lost until reclaimed by splitting the THP?

Yes.
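
For anyone who wants to poke at this from userspace, here is a minimal
sketch of the partial-zap scenario. It is not the benchmark mentioned
above. The 2 MiB THP size and the helper name partial_dontneed_demo()
are assumptions for illustration, and MADV_HUGEPAGE is only a hint, so
the region may or may not actually be backed by a THP.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: PMD-sized THPs are 2 MiB, as on x86-64. */
#define THP_SIZE (2UL << 20)

/*
 * Hypothetical helper: fault in one THP-aligned region, zap its second
 * half with MADV_DONTNEED, and check what userspace observes.
 * Returns 0 if the kept half still holds its data and the zapped half
 * reads back as zeroes, -1 otherwise.
 */
int partial_dontneed_demo(void)
{
	/* Over-allocate so that a THP_SIZE-aligned window fits inside. */
	size_t len = 2 * THP_SIZE;
	unsigned char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return -1;

	unsigned char *thp = (unsigned char *)
		(((uintptr_t)raw + THP_SIZE - 1) & ~(uintptr_t)(THP_SIZE - 1));

	/* Hint only; ignore failure on kernels built without THP. */
	(void)madvise(thp, THP_SIZE, MADV_HUGEPAGE);

	/* Touch every byte so the whole region is populated. */
	memset(thp, 0xaa, THP_SIZE);

	/* Zap the second half; the first half stays mapped. */
	size_t half = THP_SIZE / 2;
	int rc = -1;
	if (madvise(thp + half, half, MADV_DONTNEED) == 0 &&
	    thp[0] == 0xaa &&	/* untouched half keeps its contents */
	    thp[half] == 0)	/* re-access faults in a fresh zero page */
		rc = 0;

	munmap(raw, len);
	return rc;
}
```

Userspace only sees the zero-filled re-access. Whether the zapped
sub-pages have actually been returned to the system is not visible from
here; something like AnonHugePages in /proc/meminfo or the process RSS
would have to be watched separately.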

> If they could get
> reused, there would be value in the deferred split when partially
> unmapping a THP.
>
>
> I do wonder which purpose the deferred split serves nowadays at all.
> Fortunately, there is documentation: Documentation/vm/transhuge.rst:
>
> "
> Unmapping part of THP (with munmap() or other way) is not going to free
> memory immediately. Instead, we detect that a subpage of THP is not in
> use in page_remove_rmap() and queue the THP for splitting if memory
> pressure comes. Splitting will free up unused subpages.
>
> Splitting the page right away is not an option due to locking context in
> the place where we can detect partial unmap. It also might be
> counterproductive since in many cases partial unmap happens during
> exit(2) if a THP crosses a VMA boundary.
>
> The function deferred_split_huge_page() is used to queue a page for
> splitting. The splitting itself will happen when we get memory pressure
> via shrinker interface.
> "
>
> I do wonder which these locking contexts are exactly, and if we could
> also do the same thing on ordinary munmap -- because I assume it can be
> similarly problematic for some applications.

Good question regarding munmap. One main difference is that munmap
takes mmap_lock in write mode, whereas MADV_DONTNEED only needs it in
read mode, and performance-critical applications usually avoid
operations that take the lock for write.

> The "exit()" case might
> indeed be interesting, but I really do wonder if this is even observable
> in actual numbers: I'm not so sure about the "many cases" but I might be
> wrong, of course.

I am not worried about exit(). The whole THP will get freed and
removed from the deferred list as well. Note that the deferred list
does not hold a reference to the THP; a hook in the THP destructor
takes the page off the list.

thanks,
Shakeel