From: Xu Lu
Date: Mon, 27 Nov 2023 16:14:03 +0800
Subject: Re: [External] Re: [RFC PATCH V1 00/11] riscv: Introduce 64K base page
To: Arnd Bergmann
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Ard Biesheuvel, Anup Patel,
    Atish Patra, dengliang.1214@bytedance.com, Xie Yongji,
    lihangjing@bytedance.com, Muchun Song, punit.agrawal@bytedance.com,
    linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org
References: <20231123065708.91345-1-luxu.kernel@bytedance.com>
    <94c2e04c-4c62-4ee1-8ae7-cbd675c5064e@app.fastmail.com>
In-Reply-To: <94c2e04c-4c62-4ee1-8ae7-cbd675c5064e@app.fastmail.com>

Thanks a lot for your reply! And sorry for replying so late.

On Thu, Nov 23, 2023 at 5:30 PM Arnd Bergmann wrote:
>
> On Thu, Nov 23, 2023, at 07:56, Xu Lu wrote:
> > Some existing architectures like ARM supports base page larger than 4K
> > as their MMU supports more page sizes. Thus, besides hugetlb page and
> > transparent huge page, there is another way for these architectures to
> > enjoy the benefits of fewer TLB misses without worrying about cost of
> > splitting and merging huge pages. However, on architectures with only
> > 4K MMU, larger base page is unavailable now.
> >
> > This patch series attempts to break through the limitation of MMU and
> > supports larger base page on RISC-V, which only supports 4K page size
> > now.
> >
> > The key idea to implement larger base page based on 4K MMU is to
> > decouple the MMU page from the base page in view of kernel mm, which we
> > denote as software page. In contrary to software page, we denote the MMU
> > page as hardware page. Below is the difference between these two kinds
> > of pages.
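
To make the software/hardware page split quoted above a bit more concrete,
here is a minimal sketch of the address arithmetic. It assumes that one 64K
software page is simply a naturally aligned run of contiguous 4K hardware
pages. All constants and function names are hypothetical and are not taken
from the patch series.

```python
# Illustrative sketch only. Names and layout are assumptions,
# not code from the patch series.
HW_PAGE_SHIFT = 12                     # 4K hardware (MMU) page
SW_PAGE_SHIFT = 16                     # 64K software (base) page
HW_PAGE_SIZE = 1 << HW_PAGE_SHIFT
SW_PAGE_SIZE = 1 << SW_PAGE_SHIFT
HW_PAGES_PER_SW_PAGE = SW_PAGE_SIZE // HW_PAGE_SIZE


def sw_page_number(addr: int) -> int:
    """Index of the 64K software page that contains addr."""
    return addr >> SW_PAGE_SHIFT


def hw_page_numbers(sw_pfn: int) -> range:
    """Contiguous 4K hardware page numbers assumed to back one software page."""
    first = sw_pfn << (SW_PAGE_SHIFT - HW_PAGE_SHIFT)
    return range(first, first + HW_PAGES_PER_SW_PAGE)
```

Under these assumptions the kernel mm would track one software page while
the MMU still sees HW_PAGES_PER_SW_PAGE separate 4K entries, so
hw_page_numbers(1) covers hardware pages 16 through 31.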
>
> We have played with this on arm32, but the conclusion is that it's
> almost never worth the memory overhead, as most workloads end up
> using several times the amount of physical RAM after each small
> file in the page cache and any sparse populated anonymous memory
> area explodes to up to 16 times the size.
>
> On ppc64, using 64KB pages was way to get around limitations in
> their hashed MMU design, which had a much bigger performance impact
> because any page table access ends up being a cache miss. On arm64,
> there are some CPUs like the Fujitsu A64FX that are really bad at
> 4KB pages and don't support 16KB pages, so this is the only real
> option.
>
> You will see a notable performance benefit in synthetic benchmarks
> like speccpu with 64KB pages, or on specific computational
> workloads that have large densely packed memory chunks, but for
> real workloads, the usual answer is to just use transparent
> hugepages for larger mappings and a page size of no more than
> 16KB for the page cache.

We did find real performance benefits from a 64K page size in actual
business scenarios. On an Ampere ARM server with a 64K base page size,
we saw a 2.5x improvement in both QPS and latency on Redis, a 10~20%
performance improvement on our own NewSQL database, and 50% on object
storage. For MySQL, QPS increases by about 14%, 17.5% and 20% for
read-only, write-only and random read/write workloads respectively,
and latency drops by about 13.7%, 15.8% and 14.5% on average.

This is also why we chose to implement a similar feature on RISC-V in
the first place.

>
> With the work going into using folios in the kernel (see e.g.
> https://lwn.net/Articles/932386/), even the workloads that
> benefit from 64KB base pages should be better off with 4KB
> pages and just using the TLB hints for large folios.

Maybe a 64K page size combined with large folios can achieve more
benefits.
As mentioned in this patch [1], a 64K page size kernel combined with
large folios and THPs via cont pte can achieve a speedup of 10.5x on
some memory-intensive workloads on an arm64 SBSA server.

[1] https://lore.kernel.org/all/c507308d-bdd4-5f9e-d4ff-e96e4520be85@nvidia.com/

>
> Arnd
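
For reference, the memory-overhead concern raised in the quoted reply can be
put in rough numbers. The sketch below is a simplified model, not a
measurement. It assumes the page cache holds each file in whole base pages
and ignores large folios, readahead and metadata. The function name is
hypothetical.

```python
def cached_bytes(file_size: int, page_size: int) -> int:
    """Bytes of memory used to cache a file, rounded up to whole pages."""
    pages = -(-file_size // page_size)  # ceiling division
    return pages * page_size


# A 1 KiB file occupies one page either way, so the cost scales with page size.
small_4k = cached_bytes(1024, 4096)
small_64k = cached_bytes(1024, 65536)

# A 100 KiB file needs 25 pages of 4K but only 2 pages of 64K.
large_4k = cached_bytes(100 * 1024, 4096)
large_64k = cached_bytes(100 * 1024, 65536)
```

In this model the 1 KiB file costs 16 times more memory with 64K pages, which
matches the worst case mentioned in the quote, while the 100 KiB file costs
about 1.28 times more. How close a real workload sits to either end depends
on its file size distribution.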