Date: Thu, 20 Oct 2022 10:16:17 -0700
From: Nathan Chancellor
To: Rik van Riel
Cc: "Huang, Ying", kernel test robot, lkp@lists.01.org, lkp@intel.com,
    Andrew Morton, Yang Shi, Matthew Wilcox, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, feng.tang@intel.com, zhengjun.xing@linux.intel.com,
    fengwei.yin@intel.com
Subject: Re: [mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression

Hi Rik,

On Thu, Oct 20, 2022 at 11:28:16AM -0400, Rik van Riel wrote:
> On Thu, 2022-10-20 at 13:07 +0800, Huang, Ying wrote:
> > Nathan Chancellor writes:
> > >
> > > For what it's worth, I just bisected a massive and visible performance
> > > regression on my Threadripper 3990X workstation to commit f35b5d7d676e
> > > ("mm: align larger anonymous mappings on THP boundaries"), which seems
> > > directly related to this report/analysis.
> > > I initially noticed this because my full set of kernel builds against
> > > mainline went from 2 hours and 20 minutes or so to over 3 hours.
> > > Zeroing in on x86_64 allmodconfig, which I used for the bisect:
> > >
> > > @ 7b5a0b664ebe ("mm/page_ext: remove unused variable in offline_page_ext"):
> > >
> > > Benchmark 1: make -skj128 LLVM=1 allmodconfig all
> > >   Time (mean ± σ):     318.172 s ±  0.730 s    [User: 31750.902 s, System: 4564.246 s]
> > >   Range (min … max):   317.332 s … 318.662 s    3 runs
> > >
> > > @ f35b5d7d676e ("mm: align larger anonymous mappings on THP boundaries"):
> > >
> > > Benchmark 1: make -skj128 LLVM=1 allmodconfig all
> > >   Time (mean ± σ):     406.688 s ±  0.676 s    [User: 31819.526 s, System: 16327.022 s]
> > >   Range (min … max):   405.954 s … 407.284 s    3 runs
> >
> > Have you tried to build with gcc?  Want to check whether is this clang
> > specific issue or not.
>
> This may indeed be something LLVM specific. In previous tests,
> GCC has generally seen a benefit from increased THP usage.
> Many other applications also benefit from getting more THPs.

Indeed, GCC builds actually appear to be slightly faster on my system now,
apologies for not trying that before reporting :/

7b5a0b664ebe:

Benchmark 1: make -skj128 allmodconfig all
  Time (mean ± σ):     355.294 s ±  0.931 s    [User: 33620.469 s, System: 6390.064 s]
  Range (min … max):   354.571 s … 356.344 s    3 runs

f35b5d7d676e:

Benchmark 1: make -skj128 allmodconfig all
  Time (mean ± σ):     347.400 s ±  2.029 s    [User: 34389.724 s, System: 4603.175 s]
  Range (min … max):   345.815 s … 349.686 s    3 runs

> LLVM showing 10% system time before this change, and a whopping
> 30% system time after that change, suggests that LLVM is behaving
> quite differently from GCC in some ways.

The above tests were done with GCC 12.2.0 from Arch Linux. The previous LLVM
tests were done with a self-compiled version of LLVM from the main branch
(16.0.0), optimized with BOLT [1].
To eliminate that as a source of issues, I used my distribution's version of
clang (14.0.6) and saw similar results as before:

7b5a0b664ebe:

Benchmark 1: make -skj128 LLVM=/usr/bin/ allmodconfig all
  Time (mean ± σ):     462.517 s ±  1.214 s    [User: 48544.240 s, System: 5586.212 s]
  Range (min … max):   461.115 s … 463.245 s    3 runs

f35b5d7d676e:

Benchmark 1: make -skj128 LLVM=/usr/bin/ allmodconfig all
  Time (mean ± σ):     547.927 s ±  0.862 s    [User: 47913.709 s, System: 17682.514 s]
  Range (min … max):   547.429 s … 548.922 s    3 runs

> If we can figure out what these differences are, maybe we can
> just fine tune the code to avoid this issue.
>
> I'll try to play around with LLVM compilation a little bit next
> week, to see if I can figure out what might be going on. I wonder
> if LLVM is doing lots of mremap calls or something...

If there is any further information I can provide or patches I can test, I am
more than happy to do so.

[1]: https://github.com/llvm/llvm-project/tree/96552e73900176d65ee6650facae8d669d6f9498/bolt

Cheers,
Nathan
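
For reference, the system-time percentages discussed in this message can be recomputed from the hyperfine figures above. The sketch below is only illustrative arithmetic: the function names and labels are made up for this example and are not part of hyperfine or the kernel tree, and "system share" is assumed to mean system / (user + system) CPU time, which is one plausible reading of the 10% and 30% figures.

```python
# Figures copied from the hyperfine output in this message:
# label -> (user seconds, system seconds, mean wall-clock seconds)
RUNS = {
    "LLVM main (BOLT) @ 7b5a0b664ebe": (31750.902, 4564.246, 318.172),
    "LLVM main (BOLT) @ f35b5d7d676e": (31819.526, 16327.022, 406.688),
    "GCC 12.2.0 @ 7b5a0b664ebe": (33620.469, 6390.064, 355.294),
    "GCC 12.2.0 @ f35b5d7d676e": (34389.724, 4603.175, 347.400),
    "clang 14.0.6 @ 7b5a0b664ebe": (48544.240, 5586.212, 462.517),
    "clang 14.0.6 @ f35b5d7d676e": (47913.709, 17682.514, 547.927),
}


def system_share(user_s, system_s):
    """Fraction of total CPU time spent in the kernel (assumed definition)."""
    return system_s / (user_s + system_s)


def wall_change(before_s, after_s):
    """Relative change in mean wall-clock time; positive means slower."""
    return (after_s - before_s) / before_s


for label, (user_s, system_s, wall_s) in RUNS.items():
    share = system_share(user_s, system_s)
    print(f"{label:34s} wall {wall_s:8.3f} s  system share {share:6.1%}")

# Compare each toolchain before and after f35b5d7d676e.
for tool in ("LLVM main (BOLT)", "GCC 12.2.0", "clang 14.0.6"):
    before = RUNS[f"{tool} @ 7b5a0b664ebe"][2]
    after = RUNS[f"{tool} @ f35b5d7d676e"][2]
    print(f"{tool:18s} wall-clock change {wall_change(before, after):+.1%}")
```

Under that definition, both clang configurations show the system share rising sharply after f35b5d7d676e while the GCC share drops, which matches the direction of the wall-clock changes reported above.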