Date: Tue, 9 Apr 2024 15:39:29 +0100
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 0/4] Speed up boot with faster linear map creation
To: David Hildenbrand, Itaru Kitayama
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel, Donald Dutile, Eric Chanudet, Linux ARM, "linux-kernel@vger.kernel.org"
References: <20240404143308.2224141-1-ryan.roberts@arm.com> <533adb77-8c2b-40db-84cb-88de77ab92bb@arm.com> <1d5abb48-08a8-4d83-a681-6915bc7b6907@arm.com>
 <268FBD1C-B102-4726-A7F4-1125123BDA7A@linux.dev> <5e4dc2fe-2945-4fc5-a533-c8b2d04668a0@redhat.com>
From: Ryan Roberts
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On 09/04/2024 15:29, David Hildenbrand wrote:
> On 09.04.24 16:13, Ryan Roberts wrote:
>> On 09/04/2024 12:51, David Hildenbrand wrote:
>>> On 09.04.24 13:29, David Hildenbrand wrote:
>>>> On 09.04.24 13:22, David Hildenbrand wrote:
>>>>> On 09.04.24 12:13, Itaru Kitayama wrote:
>>>>>>
>>>>>>
>>>>>>> On Apr 9, 2024, at 19:04, Ryan Roberts wrote:
>>>>>>>
>>>>>>> On 09/04/2024 01:10, Itaru Kitayama wrote:
>>>>>>>> Hi Ryan,
>>>>>>>>
>>>>>>>>> On Apr 8, 2024, at 16:30, Ryan Roberts wrote:
>>>>>>>>>
>>>>>>>>> On 06/04/2024 11:31, Itaru Kitayama wrote:
>>>>>>>>>> Hi Ryan,
>>>>>>>>>>
>>>>>>>>>> On Sat, Apr 06, 2024 at 09:32:34AM +0100, Ryan Roberts wrote:
>>>>>>>>>>> Hi Itaru,
>>>>>>>>>>>
>>>>>>>>>>> On 05/04/2024 08:39, Itaru Kitayama wrote:
>>>>>>>>>>>> On Thu, Apr 04, 2024 at 03:33:04PM +0100, Ryan Roberts wrote:
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> It turns out that creating the linear map can take a significant proportion of
>>>>>>>>>>>>> the total boot time, especially when rodata=full. And most of the time is spent
>>>>>>>>>>>>> waiting on superfluous tlb invalidation and memory barriers. This series reworks
>>>>>>>>>>>>> the kernel pgtable generation code to significantly reduce the number of those
>>>>>>>>>>>>> TLBIs, ISBs and DSBs. See each patch for details.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The below shows the execution time of map_mem() across a couple of different
>>>>>>>>>>>>> systems with different RAM configurations.
>>>>>>>>>>>>> We measure after applying each patch
>>>>>>>>>>>>> and show the improvement relative to base (v6.9-rc2):
>>>>>>>>>>>>>
>>>>>>>>>>>>>                | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
>>>>>>>>>>>>>                | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
>>>>>>>>>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>>>>>>>>>>                |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
>>>>>>>>>>>>> ---------------|-------------|-------------|-------------|-------------
>>>>>>>>>>>>> base           |  153   (0%) | 2227   (0%) | 8798   (0%) | 17442   (0%)
>>>>>>>>>>>>> no-cont-remap  |   77 (-49%) |  431 (-81%) | 1727 (-80%) |  3796 (-78%)
>>>>>>>>>>>>> batch-barriers |   13 (-92%) |  162 (-93%) |  655 (-93%) |  1656 (-91%)
>>>>>>>>>>>>> no-alloc-remap |   11 (-93%) |  109 (-95%) |  449 (-95%) |  1257 (-93%)
>>>>>>>>>>>>> lazy-unmap     |    6 (-96%) |   61 (-97%) |  257 (-97%) |   838 (-95%)
>>>>>>>>>>>>>
>>>>>>>>>>>>> This series applies on top of v6.9-rc2. All mm selftests pass. I've compile and
>>>>>>>>>>>>> boot tested various PAGE_SIZE and VA size configs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>
>>>>>>>>>>>>> Changes since v1 [1]
>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>
>>>>>>>>>>>>>   - Added Tested-by tags (thanks to Eric and Itaru)
>>>>>>>>>>>>>   - Renamed ___set_pte() -> __set_pte_nosync() (per Ard)
>>>>>>>>>>>>>   - Reordered patches (biggest impact & least controversial first)
>>>>>>>>>>>>>   - Reordered alloc/map/unmap functions in mmu.c to aid reader
>>>>>>>>>>>>>   - pte_clear() -> __pte_clear() in clear_fixmap_nosync()
>>>>>>>>>>>>>   - Reverted generic p4d_index() which caused x86 build error. Replaced with
>>>>>>>>>>>>>     unconditional p4d_index() define under arm64.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://lore.kernel.org/linux-arm-kernel/20240326101448.3453626-1-ryan.roberts@arm.com/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ryan Roberts (4):
>>>>>>>>>>>>>   arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
>>>>>>>>>>>>>   arm64: mm: Batch dsb and isb when populating pgtables
>>>>>>>>>>>>>   arm64: mm: Don't remap pgtables for allocate vs populate
>>>>>>>>>>>>>   arm64: mm: Lazily clear pte table mappings from fixmap
>>>>>>>>>>>>>
>>>>>>>>>>>>>  arch/arm64/include/asm/fixmap.h  |   5 +-
>>>>>>>>>>>>>  arch/arm64/include/asm/mmu.h     |   8 +
>>>>>>>>>>>>>  arch/arm64/include/asm/pgtable.h |  13 +-
>>>>>>>>>>>>>  arch/arm64/kernel/cpufeature.c   |  10 +-
>>>>>>>>>>>>>  arch/arm64/mm/fixmap.c           |  11 +
>>>>>>>>>>>>>  arch/arm64/mm/mmu.c              | 377 +++++++++++++++++++++++--------
>>>>>>>>>>>>>  6 files changed, 319 insertions(+), 105 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> 2.25.1
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I've built and boot tested the v2 on FVP, base is taken from your
>>>>>>>>>>>> linux-rr repo. Running run_vmtests.sh on v2 left some gup longterm not
>>>>>>>>>>>> oks, would you take a look at it? The mm kselftests used is from your
>>>>>>>>>>>> linux-rr repo too.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for taking a look at this.
>>>>>>>>>>>
>>>>>>>>>>> I can't reproduce your issue unfortunately; steps as follows on Apple M2 VM:
>>>>>>>>>>>
>>>>>>>>>>> Config: arm64 defconfig + the following:
>>>>>>>>>>>
>>>>>>>>>>> # Squashfs for snaps, xfs for large file folios.
>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZ4
>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_LZO
>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_XZ
>>>>>>>>>>> ./scripts/config --enable CONFIG_SQUASHFS_ZSTD
>>>>>>>>>>> ./scripts/config --enable CONFIG_XFS_FS
>>>>>>>>>>>
>>>>>>>>>>> # For general mm debug.
>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM
>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_RB
>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
>>>>>>>>>>> ./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
>>>>>>>>>>> ./scripts/config --enable CONFIG_PAGE_TABLE_CHECK
>>>>>>>>>>>
>>>>>>>>>>> # For mm selftests.
>>>>>>>>>>> ./scripts/config --enable CONFIG_USERFAULTFD
>>>>>>>>>>> ./scripts/config --enable CONFIG_TEST_VMALLOC
>>>>>>>>>>> ./scripts/config --enable CONFIG_GUP_TEST
>>>>>>>>>>>
>>>>>>>>>>> Running on VM with 12G memory, split across 2 (emulated) NUMA nodes (needed by
>>>>>>>>>>> some mm selftests), with kernel command line to reserve hugetlbs and other
>>>>>>>>>>> features required by some mm selftests:
>>>>>>>>>>>
>>>>>>>>>>> "
>>>>>>>>>>> transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
>>>>>>>>>>> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
>>>>>>>>>>> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
>>>>>>>>>>> "
>>>>>>>>>>>
>>>>>>>>>>> Ubuntu userspace running off XFS rootfs. Build and run mm selftests from same
>>>>>>>>>>> git tree.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Although I don't think any of this config should make a difference to
>>>>>>>>>>> gup_longterm.
>>>>>>>>>>>
>>>>>>>>>>> Looks like your errors are all "ftruncate() failed". I've seen this problem on
>>>>>>>>>>> our CI system. There it is due to running the tests from NFS file system. What
>>>>>>>>>>> filesystem are you using?
>>>>>>>>>>> Perhaps you are sharing into the FVP using 9p? That
>>>>>>>>>>> might also be problematic.
>>>>>>>>>>
>>>>>>>>>> That was it. This time I booted up the kernel including your series on
>>>>>>>>>> QEMU on my M1 and executed the gup_longterm program without the ftruncate
>>>>>>>>>> failures. When testing your kernel on FVP, I was executing the script
>>>>>>>>>> from the FVP's host filesystem using 9p.
>>>>>>>>>
>>>>>>>>> I'm not sure exactly what the root cause is. Perhaps there isn't enough space on
>>>>>>>>> the disk? It might be worth enhancing the error log to provide the errno in
>>>>>>>>> tools/testing/selftests/mm/gup_longterm.c.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Attached is the strace’d gup_longterm execution log on your
>>>>>>>> pgtable-boot-speedup-v2 kernel.
>>>>>>>
>>>>>>> Sorry are you saying that it only fails with the pgtable-boot-speedup-v2 patch
>>>>>>> set applied? I thought we previously concluded that it was independent of that?
>>>>>>> I was under the impression that it was filesystem related and not something that
>>>>>>> I was planning to investigate.
>>>>>>
>>>>>> No, irrespective of the kernel, if using 9p on FVP the test program fails.
>>>>>> It is indeed 9p filesystem related, as I switched to using NFS all the
>>>>>> issues are gone.
>>>>>
>>>>> Did it never work on 9p? If so, we might have to SKIP that test.
>>>>>
>>>>> openat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
>>>>> unlinkat(AT_FDCWD, "gup_longterm.c_tmpfile_BLboOt", 0) = 0
>>>>> fstatfs(3, 0xffffe505a840)              = -1 EOPNOTSUPP (Operation not supported)
>>>>> ftruncate(3, 4096)                      = -1 ENOENT (No such file or directory)
>>>>
>>>> Note: I'm wondering if the unlinkat here is the problem that makes
>>>> ftruncate() with 9p result in weird errors (e.g., the hypervisor
>>>> unlinked the file and cannot reopen it for the fstatfs/ftruncate. ...
>>>> which gives us weird errors here).
>>>>
>>>> Then, we should look up the fs type in run_with_local_tmpfile() before
>>>> the unlink() and simply skip the test if it is 9p.
>>>
>>> The unlink with 9p most certainly was a known issue in the past:
>>>
>>> https://gitlab.com/qemu-project/qemu/-/issues/103
>>>
>>> Maybe it's still an issue with older hypervisors (QEMU?)? Or it was never
>>> completely resolved?
>>
>> I believe Itaru is running on FVP (Fixed Virtual Platform - "fast model" -
>> Arm's architecture emulator). So QEMU won't be involved here. The FVP emulates
>> a 9p device, so perhaps the bug is in there.
>
> Very likely.
>
>>
>> Note that I see lots of "fallocate() failed" failures in gup_longterm when
>> running on our CI system. This is a completely different setup; Real HW with
>> Linux running bare metal using an NFS rootfs. I'm not sure if this is related.
>> Logs show it failing consistently for the "tmpfile" and "local tmpfile" test
>> configs. I also see a couple of these fails in the cow tests.
>
> What is the fallocate() errno you are getting? strace log would help (to see if
> statfs also fails already)! Likely a similar NFS issue.

Unfortunately this is a system I don't have access to. I've requested some of
this triage to be done, but it's fairly low priority.
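
For reference, the two ideas raised in the quoted thread (look up the fs type
before the unlink() and skip on 9p, and include the errno in the failure log)
could look roughly like the sketch below. This is not the actual
gup_longterm.c code. fd_is_on_9p(), open_unlinked_tmpfile(), report_errno(),
TMPFILE_SKIP and SKETCH_V9FS_MAGIC are hypothetical names, the magic value is
assumed to match V9FS_MAGIC in <linux/magic.h>, and a Linux/glibc userspace is
assumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/vfs.h>
#include <unistd.h>

/* Assumed to match V9FS_MAGIC in <linux/magic.h>; redefined so the sketch stands alone. */
#define SKETCH_V9FS_MAGIC 0x01021997L

/* Hypothetical return code telling the caller to skip the test. */
#define TMPFILE_SKIP (-2)

/* Print which call failed together with its errno, as suggested in the thread. */
static void report_errno(const char *what)
{
	int err = errno;	/* save before any other libc call can change it */

	fprintf(stderr, "%s failed: %s (errno %d)\n", what, strerror(err), err);
}

/* Hypothetical helper: true if fd refers to a file on a 9p filesystem. */
static bool fd_is_on_9p(int fd)
{
	struct statfs fs;

	assert(fd >= 0);
	if (fstatfs(fd, &fs) != 0) {
		report_errno("fstatfs()");
		return false;
	}
	return (long)fs.f_type == SKETCH_V9FS_MAGIC;
}

/*
 * Hypothetical stand-in for the tmpfile setup: create a file from a
 * mkstemp() template, query the filesystem type while the name still
 * exists, then unlink it. Returns an fd, -1 on error, or TMPFILE_SKIP
 * when the file lives on 9p.
 */
static int open_unlinked_tmpfile(char *tmpl)
{
	int fd = mkstemp(tmpl);
	bool on_9p;

	if (fd < 0) {
		report_errno("mkstemp()");
		return -1;
	}

	on_9p = fd_is_on_9p(fd);	/* checked before the unlink() */

	if (unlink(tmpl) != 0) {
		report_errno("unlink()");
		close(fd);
		return -1;
	}
	if (on_9p) {
		close(fd);
		return TMPFILE_SKIP;
	}
	return fd;
}
```

A caller would treat TMPFILE_SKIP as a skipped test rather than a failure, and
the errno in the log would show directly what ftruncate() or fallocate()
returns on NFS without needing an strace run.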