Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18;
IronPort-SDR: iDyaWnHUrkAo8yQPYdUPC+jbuV27CL2z3GGRs1IqvF9Nt6SbbtjirkhqLrWHPDkB99w8/hoLEV
 voSBNZKvDa1Q==
IronPort-SDR: zmFcrBohW+JG8R86S3YwohkhcQ3wO0XgD/DjX22vbRzGTUGBkhtN8DjTLcT+6x1cm8bn9IsIDT
 Lrmtz//xUydw==
Subject: Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput
 -33.4% regression
To:     Mike Kravetz <mike.kravetz@oracle.com>,
        kernel test robot <rong.a.chen@intel.com>
Cc:     Linus Torvalds <torvalds@linux-foundation.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Michal Hocko <mhocko@kernel.org>,
        Hugh Dickins <hughd@google.com>,
        Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
        Andrea Arcangeli <aarcange@redhat.com>,
        "Kirill A.Shutemov" <kirill.shutemov@linux.intel.com>,
        Davidlohr Bueso <dave@stgolabs.net>,
        Prakash Sangappa <prakash.sangappa@oracle.com>,
        LKML <linux-kernel@vger.kernel.org>, lkp@lists.01.org
References: <20200622005551.GK5535@shao2-debian>
 <e140ec78-1fbd-73e2-7a11-7db3b714874d@oracle.com>
 <718e1653-b273-096b-0ee3-f720cf794612@oracle.com>
From:   Xing Zhengjun <zhengjun.xing@linux.intel.com>
Message-ID: <d945497d-0edb-f540-33e1-8b1ba1e20f62@linux.intel.com>
Date:   Fri, 21 Aug 2020 16:39:11 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101
 Thunderbird/68.11.0
MIME-Version: 1.0
In-Reply-To: <718e1653-b273-096b-0ee3-f720cf794612@oracle.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk


On 6/26/2020 5:33 AM, Mike Kravetz wrote:
> On 6/22/20 3:01 PM, Mike Kravetz wrote:
>> On 6/21/20 5:55 PM, kernel test robot wrote:
>>> Greeting,
>>>
>>> FYI, we noticed a -33.4% regression of vm-scalability.throughput due to commit:
>>>
>>>
>>> commit: c0d0381ade79885c04a04c303284b040616b116e ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> in testcase: vm-scalability
>>> on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
>>> with following parameters:
>>>
>>> 	runtime: 300s
>>> 	size: 8T
>>> 	test: anon-cow-seq-hugetlb
>>> 	cpufreq_governor: performance
>>> 	ucode: 0x11
>>>
>>
>> Some performance regression is not surprising as the change includes acquiring
>> and holding the i_mmap_rwsem (in read mode) during hugetlb page faults.  33.4%
>> seems a bit high.  But, the test is primarily exercising the hugetlb page
>> fault path and little else.
>>
>> The reason for taking the i_mmap_rwsem is to prevent PMD unsharing from
>> invalidating the pmd we are operating on.  This specific test case is operating
>> on anonymous private mappings.  So, PMD sharing is not possible and we can
>> eliminate acquiring the mutex in this case.  In fact, we should check all
>> mappings (even sharable) for the possibly of PMD sharing and only take the
>> mutex if necessary.  It will make the code a bit uglier, but will take care
>> of some of these regressions.  We still need to take the mutex in the case
>> of PMD sharing.  I'm afraid a regression is unavoidable in that case.
>>
>> I'll put together a patch.
> 
> Not acquiring the mutex on faults when sharing is not possible is quite
> straight forward.  We can even use the existing routine vma_shareable()
> to easily check.  However, the next patch in the series 87bf91d39bb5
> "hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race" depends
> on always acquiring the mutex.  If we break this assumption, then the
> code to back out hugetlb reservations needs to be written.  A high level
> view of what needs to be done is in the commit message for 87bf91d39bb5.
> 
> I'm working on the code to back out reservations.
> 

I find that 34ae204f18519f0920bd50a644abd6fefc8dbfcf(hugetlbfs: remove 
call to huge_pte_alloc without i_mmap_rwsem) fixed this regression, I 
test with the patch, the regression reduced to 10.1%, do you have plan 
to continue to improve it? Thanks.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/size/test/cpufreq_governor/ucode:
 
lkp-knm01/vm-scalability/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/8T/anon-cow-seq-hugetlb/performance/0x11

commit:
   49aef7175cc6eb703a9280a7b830e675fe8f2704
   c0d0381ade79885c04a04c303284b040616b116e
   v5.8
   34ae204f18519f0920bd50a644abd6fefc8dbfcf
   v5.9-rc1

49aef7175cc6eb70 c0d0381ade79885c04a04c30328                        v5.8 
34ae204f18519f0920bd50a644a                    v5.9-rc1
---------------- --------------------------- --------------------------- 
--------------------------- ---------------------------
          %stddev     %change         %stddev     %change 
%stddev     %change         %stddev     %change         %stddev
              \          |                \          |                \ 
         |                \          |                \
      38084           -31.1%      26231 ±  2%     -26.6%      27944 ± 
5%      -7.0%      35405            -7.5%      35244 
vm-scalability.median
       9.92 ±  9%     +12.0       21.95 ±  4%      +3.9       13.87 ± 
30%      -5.3        4.66 ±  9%      -6.6        3.36 ±  7% 
vm-scalability.median_stddev%
   12827311           -35.0%    8340256 ±  2%     -30.9%    8865669 ± 
5%     -10.1%   11532087           -10.2%   11513595 ±  2% 
vm-scalability.throughput
  2.507e+09           -22.7%  1.938e+09           -15.3%  2.122e+09 ± 
6%      +8.0%  2.707e+09            +8.0%  2.707e+09 ±  2% 
vm-scalability.workload


-- 
Zhengjun Xing