From: Yu Zhao
Date: Tue, 27 Jun 2023 01:49:17 -0600
Subject: Re: [PATCH v1 00/10] variable-order, large folios for anonymous memory
To: Ryan Roberts
Cc: Andrew Morton, "Matthew Wilcox (Oracle)", "Kirill A. Shutemov",
 Yin Fengwei, David Hildenbrand, Catalin Marinas, Will Deacon,
 Geert Uytterhoeven, Christian Borntraeger, Sven Schnelle,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 "H. Peter Anvin", linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org,
 linux-s390@vger.kernel.org
References: <20230626171430.3167004-1-ryan.roberts@arm.com>

On Mon, Jun 26, 2023 at 9:30 PM Yu Zhao wrote:
>
> On Mon, Jun 26, 2023 at 11:14 AM Ryan Roberts wrote:
> >
> > Hi All,
> >
> > Following on from the previous RFCv2 [1], this series implements variable order,
> > large folios for anonymous memory. The objective of this is to improve
> > performance by allocating larger chunks of memory during anonymous page faults:
> >
> >  - Since SW (the kernel) is dealing with larger chunks of memory than base
> >    pages, there are efficiency savings to be had; fewer page faults, batched PTE
> >    and RMAP manipulation, fewer items on lists, etc. In short, we reduce kernel
> >    overhead. This should benefit all architectures.
> >  - Since we are now mapping physically contiguous chunks of memory, we can take
> >    advantage of HW TLB compression techniques. A reduction in TLB pressure
> >    speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
> >    TLB entries; "the contiguous bit" (architectural) and HPA (uarch).
> >
> > This patch set deals with the SW side of things only and based on feedback from
> > the RFC, aims to be the most minimal initial change, upon which future
> > incremental changes can be added.
> > For this reason, the new behaviour is hidden
> > behind a new Kconfig switch, CONFIG_LARGE_ANON_FOLIO, which is disabled by
> > default. Although the code has been refactored to parameterize the desired order
> > of the allocation, when the feature is disabled (by forcing the order to be
> > always 0) my performance tests measure no regression. So I'm hoping this will be
> > a suitable mechanism to allow incremental submissions to the kernel without
> > affecting the rest of the world.
> >
> > The patches are based on top of v6.4 plus Matthew Wilcox's set_ptes() series
> > [2], which is a hard dependency. I'm not sure of Matthew's exact plans for
> > getting that series into the kernel, but I'm hoping we can start the review
> > process on this patch set independently. I have a branch at [3].
> >
> > I've posted a separate series concerning the HW part (contpte mapping) for arm64
> > at [4].
> >
> >
> > Performance
> > -----------
> >
> > Below results show 2 benchmarks; kernel compilation and speedometer 2.0 (a
> > javascript benchmark running in Chromium). Both cases are running on Ampere
> > Altra with 1 NUMA node enabled, Ubuntu 22.04 and XFS filesystem. Each benchmark
> > is repeated 15 times over 5 reboots and averaged.
> >
> > All improvements are relative to baseline-4k. 'anonfolio-basic' is this series.
> > 'anonfolio' is the full patch set similar to the RFC with the additional changes
> > to the extra 3 fault paths. The rest of the configs are described at [4].
> >
> > Kernel Compilation (smaller is better):
> >
> > | kernel          |   real-time |   kern-time |   user-time |
> > |:----------------|------------:|------------:|------------:|
> > | baseline-4k     |        0.0% |        0.0% |        0.0% |
> > | anonfolio-basic |       -5.3% |      -42.9% |       -0.6% |
> > | anonfolio       |       -5.4% |      -46.0% |       -0.3% |
> > | contpte         |       -6.8% |      -45.7% |       -2.1% |
> > | exefolio        |       -8.4% |      -46.4% |       -3.7% |
> > | baseline-16k    |       -8.7% |      -49.2% |       -3.7% |
> > | baseline-64k    |      -10.5% |      -66.0% |       -3.5% |
> >
> > Speedometer 2.0 (bigger is better):
> >
> > | kernel          |   runs_per_min |
> > |:----------------|---------------:|
> > | baseline-4k     |           0.0% |
> > | anonfolio-basic |           0.7% |
> > | anonfolio       |           1.2% |
> > | contpte         |           3.1% |
> > | exefolio        |           4.2% |
> > | baseline-16k    |           5.3% |
>
> Thanks for pushing this forward!
>
> > Changes since RFCv2
> > -------------------
> >
> >   - Simplified series to bare minimum (on David Hildenbrand's advice)
>
> My impression is that this series still includes many pieces that can
> be split out and discussed separately with followup series.
>
> (I skipped 04/10 and will look at it tomorrow.)

I went through the series twice. Here is what I think a bare minimum
series (easier to review/debug/land) would look like:
1. a new arch-specific function providing a preferred order within (0,
   PMD_ORDER).
2. an extended anon folio alloc API taking that order (02/10, partially).
3. an updated folio_add_new_anon_rmap() covering the large() &&
   !pmd_mappable() case (similar to 04/10).
4. s/folio_test_pmd_mappable/folio_test_large/ in page_remove_rmap()
   (06/10, reviewed-by provided).
5. finally, use the extended anon folio alloc API with the arch
   preferred order in do_anonymous_page() (10/10, partially).

The rest can be split out into separate series and move forward in
parallel with probably a long list of things we need/want to do.
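
To make items 1, 2 and 5 a bit more concrete, here is a rough userspace
sketch, not kernel code. Every name in it (the SKETCH_* constants,
sketch_arch_preferred_order(), sketch_anon_folio_order(),
sketch_folio_nr_pages()) is hypothetical and exists only for
illustration. The 4K base page / 2M PMD geometry and the order the arch
hook returns are assumptions as well. Clamping an out-of-range value is
one possible policy, not something the series prescribes.

```c
/* Assumed geometry for illustration only: 4K base pages, 2M PMD mappings. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SHIFT  21
#define SKETCH_PMD_ORDER  (SKETCH_PMD_SHIFT - SKETCH_PAGE_SHIFT)

/*
 * Hypothetical arch hook (item 1): returns the order the architecture
 * would like anonymous folios to use. The value 4 is arbitrary.
 */
static int sketch_arch_preferred_order(void)
{
	return 4;
}

/*
 * Hypothetical helper for the caller (items 2 and 5): keep the order
 * strictly inside (0, SKETCH_PMD_ORDER). A non-positive request falls
 * back to order 0, i.e. a single base page; a request at or above the
 * PMD order is clamped to the largest non-PMD-mappable order.
 */
static int sketch_anon_folio_order(int preferred)
{
	if (preferred <= 0)
		return 0;
	if (preferred >= SKETCH_PMD_ORDER)
		return SKETCH_PMD_ORDER - 1;
	return preferred;
}

/* Number of base pages covered by a folio of the given order. */
static unsigned long sketch_folio_nr_pages(int order)
{
	return 1UL << order;
}
```

With these assumptions, a fault path would call
sketch_anon_folio_order(sketch_arch_preferred_order()) and allocate
sketch_folio_nr_pages() base pages for the resulting order; with the
feature disabled the order would simply be forced to 0.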