From: "Huang\, Ying" <ying.huang@intel.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huang\, Ying" <ying.huang@intel.com>,
        Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
        <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH -mm -v9 2/3] mm, THP, swap: Check whether THP can be split firstly
References: <20170419070625.19776-1-ying.huang@intel.com>
        <20170419070625.19776-3-ying.huang@intel.com>
        <20170419161318.GC3376@cmpxchg.org>
        <87efwnrjfg.fsf@yhuang-dev.intel.com>
        <20170420205035.GA13229@cmpxchg.org>
Date: Fri, 21 Apr 2017 08:34:22 +0800
In-Reply-To: <20170420205035.GA13229@cmpxchg.org> (Johannes Weiner's message
        of "Thu, 20 Apr 2017 16:50:35 -0400")
Message-ID: <87r30mha41.fsf@yhuang-dev.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1759
Lines: 39

Johannes Weiner <hannes@cmpxchg.org> writes:

> On Thu, Apr 20, 2017 at 08:50:43AM +0800, Huang, Ying wrote:
>> Johannes Weiner <hannes@cmpxchg.org> writes:
>> > On Wed, Apr 19, 2017 at 03:06:24PM +0800, Huang, Ying wrote:
>> >> With the patchset, the swap out throughput improves 3.6% (from about
>> >> 4.16GB/s to about 4.31GB/s) in the vm-scalability swap-w-seq test case
>> >> with 8 processes.  The test is done on a Xeon E5 v3 system.  The swap
>> >> device used is a RAM simulated PMEM (persistent memory) device.  To
>> >> test the sequential swapping out, the test case creates 8 processes,
>> >> which sequentially allocate and write to the anonymous pages until the
>> >> RAM and part of the swap device are used up.
>> >> 
>> >> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> >> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> >> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [for can_split_huge_page()]
>> >
>> > How often does this actually happen in practice? Because all that this
>> > protects us from is trying to allocate a swap cluster - which with the
>> > si->free_clusters list really isn't all that expensive - and return it
>> > again. Unless this happens all the time in practice, this optimization
>> > seems misplaced.
>>
>> To my surprise as well, I found that this patch has a measurable
>> impact in my test.  The swap out throughput improves 3.6% in the
>> vm-scalability swap-w-seq test case with 8 processes.  Details are in
>> the original patch description.
>
> Yeah I think that justifies it.
>
> The changelog says "the patchset", I didn't realize this is the gain
> from just this patch alone. Care to update that?

Sorry for the confusion.  I will update it in the next version.

Best Regards,
Huang, Ying

> Thanks!