MIME-Version: 1.0
References: <20220104202227.2903605-1-yuzhao@google.com> <YdSuSHa/Vjl6bPkg@google.com>
 <YdgKClGAuHlkzVbQ@dhcp22.suse.cz> <YdiKVJlClB3h1Kmg@google.com>
 <YdxTR4+FL08XyFuO@dhcp22.suse.cz> <YdythmxHpSksJiXs@google.com>
In-Reply-To: <YdythmxHpSksJiXs@google.com>
From:   Jesse Barnes <jsbarnes@google.com>
Date:   Mon, 10 Jan 2022 14:46:08 -0800
Message-ID: <CAJmaN=n=kn9-gC8if5wp8Gfj7uN+QVrX0ex=9JPXC7rPvGf1Qg@mail.gmail.com>
Subject: Re: [PATCH v6 0/9] Multigenerational LRU Framework
To:     Yu Zhao <yuzhao@google.com>
Cc:     Michal Hocko <mhocko@suse.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Andi Kleen <ak@linux.intel.com>,
        Catalin Marinas <catalin.marinas@arm.com>,
        Dave Hansen <dave.hansen@linux.intel.com>,
        Hillf Danton <hdanton@sina.com>, Jens Axboe <axboe@kernel.dk>,
        Johannes Weiner <hannes@cmpxchg.org>,
        Jonathan Corbet <corbet@lwn.net>,
        Matthew Wilcox <willy@infradead.org>,
        Mel Gorman <mgorman@suse.de>,
        Michael Larabel <Michael@michaellarabel.com>,
        Rik van Riel <riel@surriel.com>,
        Vlastimil Babka <vbabka@suse.cz>,
        Will Deacon <will@kernel.org>,
        Ying Huang <ying.huang@intel.com>,
        linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        linux-mm@kvack.org, page-reclaim@google.com,
        X86 ML <x86@kernel.org>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk

> > > 2. There have been none that came with the testing/benchmarking
> > >    coverage as this one did. Please point me to some if I'm mistaken,
> > >    and I'll gladly match them.
> >
> > I do appreciate your numbers but you should realize that this is an area
> > that is really hard to get any conclusive testing for.
>
> Fully agreed. That's why we started a new initiative, and we hope more
> people will following these practices:
> 1. All results in this area should be reported with at least standard
>    deviations, or preferably confidence intervals.
> 2. Real applications should be benchmarked (with synthetic load
>    generator), not just synthetic benchmarks (not real applications).
> 3. A wide range of devices should be covered, i.e., servers, desktops,
>    laptops and phones.
>
> I'm very confident to say our benchmark reports were hold to the
> highest standards. We have worked with MariaDB (company), EnterpriseDB
> (Postgres), Redis (company), etc. on these reports. They have copies
> of these reports (PDF version):
> https://linux-mm.googlesource.com/benchmarks/
>
> We welcome any expert in those applications to examine our reports,
> and we'll be happy to run any other benchmarks or same benchmarks with
> different configurations that anybody thinks it's important and we've
> missed.

I really think this gets at the heart of the issue with mm
development, and is one of the reasons it's been extra frustrating to
not have an MM conf for the past couple of years; I think sorting out
how we measure & proceed on changes would be easier done f2f.  E.g.
concluding with a consensus that if something doesn't regress on X, Y,
and Z, and has reasonably maintainable and readable code, we should
merge it and try it out.

But since f2f isn't an option until 2052 at the earliest...

I understand the desire for an "incremental approach that gets us from
A->B".  In the abstract it sounds great.  However, with a change like
this one, I think it's highly likely that such a path would be
littered with regressions both large and small, and would probably be
more difficult to reason about than the relatively clean design of
MGLRU.  On top of that, I don't think we'll get the kind of user
feedback we need for something like this *without* merging it.  Yu has
done a tremendous job collecting data here (and the results are really
incredible), but I think we can all agree that without extensive
testing in the field with all sorts of weird codes, we're not going to
find the problematic behaviors we're concerned about.

So unless we want to eschew big mm changes entirely (we shouldn't!
look at net or scheduling for how important big rewrites are to
progress), I think we should be open to experimenting with new stuff.
We can always revert if things get too unwieldy.

None of this is to say that there may not be lots more comments on the
code or potential fixes/changes to incorporate before merging; I'm
mainly arguing about the mindset we should have to changes like this,
not all the stuff the community is already really good at (i.e.
testing and reviewing code on a nuts & bolts level).

Thanks,
Jesse