Subject: Re: [PATCH 2/2] riscv: Use PUD/PGDIR entries for linear mapping when possible
From: Alex Ghiti
To: Atish Patra
Cc: Anup Patel, "linux-kernel@vger.kernel.org List", Atish Patra, Palmer Dabbelt, Paul Walmsley, linux-riscv, Alistair Francis
Date: Mon, 29 Jun 2020 10:42:12 -0400
Message-ID: <567064d8-0f23-1629-c40a-606a89cf4b97@ghiti.fr>
References: <20200603153608.30056-1-alex@ghiti.fr> <20200603153608.30056-3-alex@ghiti.fr> <23529a84-44a0-3c45-f16d-5a7ee528610d@ghiti.fr> <2588a00a-b042-4902-1602-7cb8d587ac2b@ghiti.fr>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Atish,

On 6/22/20 at 3:11 PM, Atish Patra wrote:
> On Sun, Jun 21, 2020 at 2:39 AM Alex Ghiti wrote:
>>
>> Hi Atish,
>>
>> On 6/20/20 at 5:04 AM, Alex Ghiti wrote:
>>> Hi Atish,
>>>
>>> On 6/19/20 at 2:16 PM, Atish Patra wrote:
>>>> On Thu, Jun 18, 2020 at 9:28 PM Alex Ghiti wrote:
>>>>> Hi Atish,
>>>>>
>>>>> On 6/18/20 at 8:47 PM, Atish Patra wrote:
>>>>>> On Wed, Jun 3, 2020 at 8:38 AM Alexandre Ghiti wrote:
>>>>>>> Improve best_map_size so that PUD or PGDIR entries are used for
>>>>>>> linear mapping when possible, as that allows better TLB
>>>>>>> utilization.
>>>>>>>
>>>>>>> Signed-off-by: Alexandre Ghiti
>>>>>>> ---
>>>>>>>   arch/riscv/mm/init.c | 45 +++++++++++++++++++++++++++++++++-----------
>>>>>>>   1 file changed, 34 insertions(+), 11 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>>>>>>> index 9a5c97e091c1..d275f9f834cf 100644
>>>>>>> --- a/arch/riscv/mm/init.c
>>>>>>> +++ b/arch/riscv/mm/init.c
>>>>>>> @@ -424,13 +424,29 @@ static void __init create_pgd_mapping(pgd_t *pgdp,
>>>>>>>          create_pgd_next_mapping(nextp, va, pa, sz, prot);
>>>>>>>  }
>>>>>>>
>>>>>>> -static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
>>>>>>> +static bool is_map_size_ok(uintptr_t map_size, phys_addr_t base,
>>>>>>> +                          uintptr_t base_virt, phys_addr_t size)
>>>>>>>  {
>>>>>>> -       /* Upgrade to PMD_SIZE mappings whenever possible */
>>>>>>> -       if ((base & (PMD_SIZE - 1)) || (size & (PMD_SIZE - 1)))
>>>>>>> -               return PAGE_SIZE;
>>>>>>> +       return !((base & (map_size - 1)) || (base_virt & (map_size - 1)) ||
>>>>>>> +                (size < map_size));
>>>>>>> +}
>>>>>>> +
>>>>>>> +static uintptr_t __init best_map_size(phys_addr_t base, uintptr_t base_virt,
>>>>>>> +                                     phys_addr_t size)
>>>>>>> +{
>>>>>>> +#ifndef __PAGETABLE_PMD_FOLDED
>>>>>>> +       if (is_map_size_ok(PGDIR_SIZE, base, base_virt, size))
>>>>>>> +               return PGDIR_SIZE;
>>>>>>> +
>>>>>>> +       if (pgtable_l4_enabled)
>>>>>>> +               if (is_map_size_ok(PUD_SIZE, base, base_virt, size))
>>>>>>> +                       return PUD_SIZE;
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +       if (is_map_size_ok(PMD_SIZE, base, base_virt, size))
>>>>>>> +               return PMD_SIZE;
>>>>>>>
>>>>>>> -       return PMD_SIZE;
>>>>>>> +       return PAGE_SIZE;
>>>>>>>  }
>>>>>>>
>>>>>>>  /*
>>>>>>> @@ -576,7 +592,7 @@ void create_kernel_page_table(pgd_t *pgdir, uintptr_t map_size)
>>>>>>>  asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>>>>>>>  {
>>>>>>>          uintptr_t va, end_va;
>>>>>>> -       uintptr_t map_size = best_map_size(load_pa, MAX_EARLY_MAPPING_SIZE);
>>>>>>> +       uintptr_t map_size;
>>>>>>>
>>>>>>>          load_pa = (uintptr_t)(&_start);
>>>>>>>          load_sz = (uintptr_t)(&_end) - load_pa;
>>>>>>> @@ -587,6 +603,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>>>>>>>
>>>>>>>          kernel_virt_addr = KERNEL_VIRT_ADDR;
>>>>>>>
>>>>>>> +       map_size = best_map_size(load_pa, PAGE_OFFSET, MAX_EARLY_MAPPING_SIZE);
>>>>>>>          va_pa_offset = PAGE_OFFSET - load_pa;
>>>>>>>          va_kernel_pa_offset = kernel_virt_addr - load_pa;
>>>>>>>          pfn_base = PFN_DOWN(load_pa);
>>>>>>> @@ -700,6 +717,8 @@ static void __init setup_vm_final(void)
>>>>>>>
>>>>>>>          /* Map all memory banks */
>>>>>>>          for_each_memblock(memory, reg) {
>>>>>>> +               uintptr_t remaining_size;
>>>>>>> +
>>>>>>>                  start = reg->base;
>>>>>>>                  end = start + reg->size;
>>>>>>>
>>>>>>> @@ -707,15 +726,19 @@ static void __init setup_vm_final(void)
>>>>>>>                          break;
>>>>>>>                  if (memblock_is_nomap(reg))
>>>>>>>                          continue;
>>>>>>> -               if (start <= __pa(PAGE_OFFSET) &&
>>>>>>> -                   __pa(PAGE_OFFSET) < end)
>>>>>>> -                       start = __pa(PAGE_OFFSET);
>>>>>>>
>>>>>>> -               map_size = best_map_size(start, end - start);
>>>>>>> -               for (pa = start; pa < end; pa += map_size) {
>>>>>>> +               pa = start;
>>>>>>> +               remaining_size = reg->size;
>>>>>>> +
>>>>>>> +               while (remaining_size) {
>>>>>>>                          va = (uintptr_t)__va(pa);
>>>>>>> +                       map_size = best_map_size(pa, va, remaining_size);
>>>>>>> +
>>>>>>>                          create_pgd_mapping(swapper_pg_dir, va, pa,
>>>>>>>                                             map_size, PAGE_KERNEL);
>>>>>>> +
>>>>>>> +                       pa += map_size;
>>>>>>> +                       remaining_size -= map_size;
>>>>>>>                  }
>>>>>>>          }
>>>>>>>
>>>>>> This may not work on RV32 with 2G of memory if the map_size is
>>>>>> determined to be PAGE_SIZE for the last memblock. Both pa and
>>>>>> remaining_size will overflow and the loop will try to map memory
>>>>>> from zero again.
>>>>> I'm not sure I understand: if pa starts at 0x8000_0000 and size is
>>>>> 2G, then pa will overflow in the last iteration, but
>>>>> remaining_size will then be equal to 0, right?
>>>>>
>>>> Not unless the remaining_size is at least page size aligned. The
>>>> last remaining size would be "fff", and it will overflow as well
>>>> after subtracting the map_size.
>>
>> While fixing this issue, I noticed that if the size in the device
>> tree is not aligned on PAGE_SIZE, the size is then automatically
>> realigned on PAGE_SIZE: see early_init_dt_add_memory_arch, where the
>> size is and-ed with PAGE_MASK to remove the unaligned part.
>>
> Yes, but the memblock size is not guaranteed to be PAGE_SIZE aligned:
> it is updated in memblock_cap_size:
>
> /* adjust *@size so that (@base + *@size) doesn't overflow, return new size */
> static inline phys_addr_t memblock_cap_size(phys_addr_t base, phys_addr_t *size)
> {
>         return *size = min(*size, PHYS_ADDR_MAX - base);
> }
>

Yes, you're right, I will fix that in a v2.

Thanks,

Alex

> You will not see this issue right away, even if you allocate 2GB of
> memory while running 32-bit Linux in qemu, because the kernel removes
> anything beyond 0xc0400000 for 32 bit in bootmem setup:
>
> [    0.000000][    T0] memblock_remove: [0xc0400000-0xfffffffe]
> setup_bootmem+0x90/0x216
>
> This also restricts the kernel to use only 1GB of memory, even if the
> maximum physical memory supported is 2GB.
>
>> So the issue does not need to be fixed :)
>>
>> Thanks anyway,
>>
>> Alex
>>
>>>>
>>>>> And by the way, I realize that this loop only handles sizes that
>>>>> are aligned on map_size.
>>>>>
>>>> Yeah.
>>>
>>> Thanks for noticing, I'll send a v2.
>>>
>>> Alex
>>>
>>>>>>> --
>>>>>>> 2.20.1
>
> --
> Regards,
> Atish