Date: Thu, 16 Nov 2023 09:34:56 +0000
Subject: Re: [PATCH v2 01/14] mm: Batch-copy PTE ranges during fork()
To: Andrew Morton
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Anshuman Khandual, Matthew Wilcox, Yu Zhao, Mark Rutland, David Hildenbrand, Kefeng Wang, John Hubbard, Zi Yan, linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20231115163018.1303287-1-ryan.roberts@arm.com> <20231115163018.1303287-2-ryan.roberts@arm.com> <20231115133743.674690dc78041768b79fadd9@linux-foundation.org>
From: Ryan Roberts
In-Reply-To: <20231115133743.674690dc78041768b79fadd9@linux-foundation.org>

On 15/11/2023 21:37, Andrew Morton wrote:
> On Wed, 15 Nov 2023 16:30:05 +0000 Ryan Roberts wrote:
>
>> However, the primary motivation for this change is to reduce the number
>> of tlb maintenance operations that the arm64 backend has to perform
>> during fork
>
> Do you have a feeling for how much performance improved due to this?

The commit log for patch 13 (the one which implements ptep_set_wrprotects() for
arm64) has performance numbers for a fork() microbenchmark with/without the
optimization:

---8<---

I see huge performance regression when PTE_CONT support was added, then
the regression is mostly fixed with the addition of this change.
The following shows regression relative to before PTE_CONT was enabled
(bigger negative value is bigger regression):

|   cpus |   before opt |   after opt |
|-------:|-------------:|------------:|
|      1 |       -10.4% |       -5.2% |
|      8 |       -15.4% |       -3.5% |
|     16 |       -38.7% |       -3.7% |
|     24 |       -57.0% |       -4.4% |
|     32 |       -65.8% |       -5.4% |

---8<---

Note that's running on Ampere Altra, where TLBI tends to have high cost.

>
> Are there other architectures which might similarly benefit?  By
> implementing ptep_set_wrprotects(), it appears.  If so, what sort of
> gains might they see?

The rationale for this is to reduce expense for arm64 to manage
contpte-mappings. If other architectures support contpte-mappings then they
could benefit from this API for the same reasons that arm64 benefits. I have a
vague understanding that riscv has a similar concept to the arm64's contiguous
bit, so perhaps they are a future candidate. But I'm not familiar with the
details of the riscv feature so couldn't say whether they would be likely to see
the same level of perf improvement as arm64.

Thanks,
Ryan
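
As a rough reading of the table above, the sketch below computes how many percentage points of the fork() regression the optimization removes at each CPU count. It is only arithmetic on the quoted figures and is not part of the patch series; the names `results` and `recovered_points` are illustrative assumptions.

```python
# Regression figures copied from the table above, in percent relative to the
# kernel before PTE_CONT was enabled: cpus -> (before opt, after opt).
# Names here are illustrative, not taken from the patch series.
results = {
    1: (-10.4, -5.2),
    8: (-15.4, -3.5),
    16: (-38.7, -3.7),
    24: (-57.0, -4.4),
    32: (-65.8, -5.4),
}


def recovered_points(before, after):
    """Percentage points of regression removed by the optimization."""
    return round(after - before, 1)


# Map each CPU count to the number of points recovered.
recovered = {cpus: recovered_points(b, a) for cpus, (b, a) in results.items()}

for cpus, points in recovered.items():
    print(f"{cpus:>2} cpus: {points:.1f} points recovered")
```

On these figures the recovered amount grows with the CPU count, which matches the table: the unoptimized regression worsens sharply with more CPUs while the optimized one stays within a few percent.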