Message-ID: <396cae5d-70e4-449f-af6c-2348b720d3a3@arm.com>
Date: Wed, 20 Dec 2023 15:05:05 +0000
X-Mailing-List: linux-kernel@vger.kernel.org
From: Ryan Roberts
Subject: Re: [PATCH v4 02/16] mm: Batch-copy PTE ranges during fork()
To: David Hildenbrand, Catalin Marinas, Will Deacon, Ard Biesheuvel,
 Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
 Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
 Vincenzo Frascino, Andrew Morton, Anshuman Khandual, Matthew Wilcox,
 Yu Zhao, Mark Rutland, Kefeng Wang, John Hubbard, Zi Yan,
 Barry Song <21cnbao@gmail.com>, Alistair Popple, Yang Shi
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
In-Reply-To: <5fcbf405-7e62-4b38-acc4-a9dd8cc91214@redhat.com>
References: <20231218105100.172635-1-ryan.roberts@arm.com>
 <7c0236ad-01f3-437f-8b04-125d69e90dc0@redhat.com>
 <9a58b1a2-2c13-4fa0-8ffa-2b3d9655f1b6@arm.com>
 <28968568-f920-47ac-b6fd-87528ffd8f77@redhat.com>
 <10b0b562-c1c0-4a66-9aeb-a6bff5c218f6@arm.com>
 <8f8023cb-3c31-4ead-a9e6-03a10e9490c6@redhat.com>
 <699cb1db-51eb-460e-9ceb-1ce08ca03050@redhat.com>
 <2a8c5b6c-f5ae-43b2-99aa-6d10e79b76e1@redhat.com>
 <3194b8a5-3f72-4d9e-a267-fbdad32ad864@redhat.com>
 <9f99a3ca-051e-4b1b-81e9-8456d8e422ad@redhat.com>
 <5fcbf405-7e62-4b38-acc4-a9dd8cc91214@redhat.com>

On 20/12/2023 14:00, David Hildenbrand wrote:
> [...]
>
>>>>
>>>
>>> gcc version 13.2.1 20231011 (Red Hat 13.2.1-4) (GCC)
>>>
>>> From Fedora 38. So "a bit" newer :P
>>>
>>
>> I'll retry with newer toolchain.
>>
>> FWIW, with the code fix and the original compiler:
>>
>> Fork, order-0, Apple M2:
>> | kernel                |   mean_rel |   std_rel |
>> |:----------------------|-----------:|----------:|
>> | mm-unstable           |       0.0% |      0.8% |
>> | hugetlb-rmap-cleanups |       1.3% |      2.0% |
>> | fork-batching         |       4.3% |      1.0% |
>>
>> Fork, order-9, Apple M2:
>> | kernel                |   mean_rel |   std_rel |
>> |:----------------------|-----------:|----------:|
>> | mm-unstable           |       0.0% |      0.8% |
>> | hugetlb-rmap-cleanups |       0.9% |      0.9% |
>> | fork-batching         |     -37.3% |      1.0% |
>>
>> Fork, order-0, Ampere Altra:
>> | kernel                |   mean_rel |   std_rel |
>> |:----------------------|-----------:|----------:|
>> | mm-unstable           |       0.0% |      0.7% |
>> | hugetlb-rmap-cleanups |       3.2% |      0.7% |
>> | fork-batching         |       5.5% |      1.1% |
>>
>> Fork, order-9, Ampere Altra:
>> | kernel                |   mean_rel |   std_rel |
>> |:----------------------|-----------:|----------:|
>> | mm-unstable           |       0.0% |      0.1% |
>> | hugetlb-rmap-cleanups |       0.5% |      0.1% |
>> | fork-batching         |     -10.4% |      0.1% |
>>
>
> I just gave it another quick benchmark run on that Intel system.
>
> hugetlb-rmap-cleanups -> fork-batching
>
> order-0: 0.014114 -> 0.013848
>
> -1.9%
>
> order-9: 0.014262 -> 0.009410
>
> -34%
>
> Note that I disable SMT and turbo, and pin the test to one CPU, to make the
> results as stable as possible. My kernel config has anything related to
> debugging disabled.
>

And with gcc 13.2 on arm64:

Fork, order-0, Apple M2 VM:
| kernel                |   mean_rel |   std_rel |
|:----------------------|-----------:|----------:|
| mm-unstable           |       0.0% |      1.5% |
| hugetlb-rmap-cleanups |      -3.3% |      1.1% |
| fork-batching         |      -3.6% |      1.4% |

Fork, order-9, Apple M2 VM:
| kernel                |   mean_rel |   std_rel |
|:----------------------|-----------:|----------:|
| mm-unstable           |       0.0% |      1.8% |
| hugetlb-rmap-cleanups |      -5.8% |      1.3% |
| fork-batching         |     -38.1% |      2.3% |

Fork, order-0, Ampere Altra:
| kernel                |   mean_rel |   std_rel |
|:----------------------|-----------:|----------:|
| mm-unstable           |       0.0% |      1.3% |
| hugetlb-rmap-cleanups |      -0.1% |      0.4% |
| fork-batching         |      -0.4% |      0.5% |

Fork, order-9, Ampere Altra:
| kernel                |   mean_rel |   std_rel |
|:----------------------|-----------:|----------:|
| mm-unstable           |       0.0% |      0.1% |
| hugetlb-rmap-cleanups |      -0.1% |      0.1% |
| fork-batching         |     -13.9% |      0.1% |

So all looking good. Compiler was the issue. Sorry for the noise.

So please go ahead with your rmap v2 stuff, and I'll wait for you to post the
fork and zap batching patches properly, then rebase my arm64 contpte stuff on
top and remeasure everything.

Thanks,
Ryan
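
For reference, the two percentages quoted for the Intel run look like plain
relative changes between the raw before/after numbers. A minimal sketch of that
arithmetic, assuming the raw figures are times where lower is better (the thread
does not state the units); relative_change is a hypothetical helper, not part of
any benchmark script mentioned here:

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline.

    Negative means the candidate value is smaller than the baseline.
    """
    return (candidate - baseline) / baseline * 100.0


# Raw numbers quoted for the Intel system (hugetlb-rmap-cleanups -> fork-batching).
order0 = relative_change(0.014114, 0.013848)
order9 = relative_change(0.014262, 0.009410)

print(f"order-0: {order0:.1f}%")
print(f"order-9: {order9:.0f}%")
```

The mean_rel columns in the tables presumably follow the same convention with
mm-unstable as the baseline, but that is an assumption and not something the
tables themselves spell out.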