Date:   Thu, 25 Nov 2021 18:24:16 +0800
From:   Peter Xu <peterx@redhat.com>
To:     Shakeel Butt <shakeelb@google.com>
Cc:     David Hildenbrand <david@redhat.com>,
        "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
        Yang Shi <shy828301@gmail.com>, Zi Yan <ziy@nvidia.com>,
        Matthew Wilcox <willy@infradead.org>,
        Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: split thp synchronously on MADV_DONTNEED
Message-ID: <YZ9kUD5AG6inbUEg@xz-m1.local>
References: <20211120201230.920082-1-shakeelb@google.com>
 <25b36a5c-5bbd-5423-0c67-05cd6c1432a7@redhat.com>
 <CALvZod5L1C1DV_DVs9O3xZm6CJnriunAoj89YLDdCp7ef5yBxA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <CALvZod5L1C1DV_DVs9O3xZm6CJnriunAoj89YLDdCp7ef5yBxA@mail.gmail.com>
Precedence: bulk

On Mon, Nov 22, 2021 at 10:40:54AM -0800, Shakeel Butt wrote:
> > Do we have a performance evaluation how much overhead is added e.g., for
> > a single 4k MADV_DONTNEED call on a THP or on a MADV_DONTNEED call that
> > covers the whole THP?
> 
> I did a simple benchmark of madvise(MADV_DONTNEED) on 10000 THPs on
> x86 for both settings you suggested. I don't see any statistically
> significant difference with and without the patch. Let me know if you
> want me to try something else.

I'm a bit surprised that sync split thp didn't bring any extra overhead.

The "unmap whole thp" result is understandable from that pov: afaict it won't
trigger any thp split at all, not even a deferred one, in the simplest case
where only this process maps the thp and maps it once.

For "unmap 4k upon thp", IIUC that's the worst case.  Zapping 4k itself should
be fast, but what I don't understand is this: a thp split needs all the manual
work of copying the thp flags into the small pages and so on, so I thought
there should be at least some measurable overhead.  Shakeel, could something
have been overlooked in the test, or is it me who overlooked something?
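
To make sure we're talking about the same measurement, here is a minimal
userspace sketch of how I'd expect the two cases to be timed.  This is not
Shakeel's actual test: time_dontneed() is a hypothetical helper, and the 2M
thp size and 4k base page size are x86-64 assumptions.  MADV_HUGEPAGE is only
a hint, so the sketch cannot guarantee that the ranges are really backed by
thps.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define THP_SIZE   (2UL << 20)  /* assumption: 2M thp (x86-64) */
#define SMALL_PAGE 4096UL       /* assumption: 4k base page */

/*
 * Hypothetical helper: populate `nr` thp-aligned 2M ranges, then time one
 * madvise(MADV_DONTNEED) of `len` bytes at the start of each range.
 * Returns the elapsed time in nanoseconds, or -1 on failure.
 */
static long long time_dontneed(size_t nr, size_t len)
{
	size_t map_len = (nr + 1) * THP_SIZE;	/* one extra range for alignment */
	struct timespec t0, t1;
	char *raw, *base;
	size_t i;

	raw = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return -1;

	/* Round up to the next 2M boundary so each range can hold one thp. */
	base = (char *)(((uintptr_t)raw + THP_SIZE - 1) & ~(THP_SIZE - 1));

	/* Best effort: the kernel may ignore or reject the hint. */
	madvise(base, nr * THP_SIZE, MADV_HUGEPAGE);

	/* Fault everything in before the timed section. */
	memset(base, 1, nr * THP_SIZE);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < nr; i++) {
		if (madvise(base + i * THP_SIZE, len, MADV_DONTNEED)) {
			munmap(raw, map_len);
			return -1;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap(raw, map_len);
	return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (t1.tv_nsec - t0.tv_nsec);
}
```

Calling time_dontneed(10000, SMALL_PAGE) would correspond to "unmap 4k upon
thp" and time_dontneed(10000, THP_SIZE) to "unmap whole thp".  The timing
alone doesn't tell whether a split really happened; comparing thp_split_page
and thp_deferred_split_page in /proc/vmstat before and after the run should
help confirm that.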

I have the same concern that Kirill/Matthew raised in the other thread: I'm
worried that proactively splitting just because a single 4k page is zapped
might quickly break up the 2m thps in the system, and I'm not sure whether
it'll exacerbate fragmentation of system memory in general.  I'm also not sure
whether that's ideal for some very common workloads that frequently use
DONTNEED to proactively drop some pages.

To me, the old deferred split has a point in that it's only done when the
system or the cgroup is low on memory.  That means we're already in an extreme
case, where we'd better start worrying about page allocation failures rather
than the number of thps and memory performance.  v2 even takes unmap() into
account, so that'll further amplify the effect, imho.

I'm wondering whether a MADV_SPLIT would make more sense, so as to keep the old
DONTNEED/unmap behaviors.  But before that I think I should understand the test
results first, because besides the loss of 2m pages, that'll be another
important factor in "whether a new interface is more welcome" from a perf pov.

Thanks,

-- 
Peter Xu