Date: Tue, 16 Apr 2024 19:19:27 +0300
From: Mike Rapoport
To: Nam Cao
Cc: Björn Töpel, Christian Brauner, Andreas Dilger, Al Viro,
 linux-fsdevel, Jan Kara, Linux Kernel Mailing List,
 linux-riscv@lists.infradead.org, Theodore Ts'o, Ext4 Developers List,
 Conor Dooley, "Matthew Wilcox (Oracle)", Anders Roxell, Alexandre Ghiti
Subject: Re: riscv32 EXT4 splat, 6.8 regression?
References: <20240416-deppen-gasleitung-8098fcfd6bbd@brauner>
 <8734rlo9j7.fsf@all.your.base.are.belong.to.us>
 <20240416171713.7d76fe7d@namcao>
In-Reply-To: <20240416171713.7d76fe7d@namcao>
X-Mailing-List: linux-ext4@vger.kernel.org

On Tue, Apr 16, 2024 at 05:17:13PM +0200, Nam Cao wrote:
> On 2024-04-16 Mike Rapoport wrote:
> > Hi,
> >
> > On Tue, Apr 16, 2024 at 01:02:20PM +0200, Björn Töpel wrote:
> > > Christian Brauner writes:
> > >
> > > > [Adding Mike who's knowledgeable in this area]
> > >
> > > >> > Further, it seems like riscv32 indeed inserts a page like that to the
> > > >> > buddy allocator, when the memblock is free'd:
> > > >> >
> > > >> > | [] __free_one_page+0x2a4/0x3ea
> > > >> > | [] __free_pages_ok+0x158/0x3cc
> > > >> > | [] __free_pages_core+0xe8/0x12c
> > > >> > | [] memblock_free_pages+0x1a/0x22
> > > >> > | [] memblock_free_all+0x1ee/0x278
> > > >> > | [] mem_init+0x10/0xa4
> > > >> > | [] mm_core_init+0x11a/0x2da
> > > >> > | [] start_kernel+0x3c4/0x6de
> > > >> >
> > > >> > Here, a page with VA 0xfffff000 is a added to the freelist. We were just
> > > >> > lucky (unlucky?) that page was used for the page cache.
> > > >>
> > > >> I just educated myself about memory mapping last night, so the below
> > > >> may be complete nonsense. Take it with a grain of salt.
> > > >>
> > > >> In riscv's setup_bootmem(), we have this line:
> > > >> 	max_low_pfn = max_pfn = PFN_DOWN(phys_ram_end);
> > > >>
> > > >> I think this is the root cause: max_low_pfn indicates the last page
> > > >> to be mapped. Problem is: nothing prevents PFN_DOWN(phys_ram_end) from
> > > >> getting mapped to the last page (0xfffff000).
> > > >> If max_low_pfn is mapped to the last page, we get the reported problem.
> > > >>
> > > >> There seems to be some code to make sure the last page is not used
> > > >> (the call to memblock_set_current_limit() right above this line). It is
> > > >> unclear to me why this still lets the problem slip through.
> > > >>
> > > >> The fix is simple: never let max_low_pfn gets mapped to the last page.
> > > >> The below patch fixes the problem for me. But I am not entirely sure if
> > > >> this is the correct fix, further investigation needed.
> > > >>
> > > >> Best regards,
> > > >> Nam
> > > >>
> > > >> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> > > >> index fa34cf55037b..17cab0a52726 100644
> > > >> --- a/arch/riscv/mm/init.c
> > > >> +++ b/arch/riscv/mm/init.c
> > > >> @@ -251,7 +251,8 @@ static void __init setup_bootmem(void)
> > > >>  	}
> > > >>
> > > >>  	min_low_pfn = PFN_UP(phys_ram_base);
> > > >> -	max_low_pfn = max_pfn = PFN_DOWN(phys_ram_end);
> > > >> +	max_low_pfn = PFN_DOWN(memblock_get_current_limit());
> > > >> +	max_pfn = PFN_DOWN(phys_ram_end);
> > > >>  	high_memory = (void *)(__va(PFN_PHYS(max_low_pfn)));
> > > >>
> > > >>  	dma32_phys_limit = min(4UL * SZ_1G, (unsigned long)PFN_PHYS(max_low_pfn));
> > >
> > > Yeah, AFAIU memblock_set_current_limit() only limits the allocation from
> > > memblock. The "forbidden" page (PA 0xc03ff000 VA 0xfffff000) will still
> > > be allowed in the zone.
> > >
> > > I think your patch requires memblock_set_current_limit() is
> > > unconditionally called, which currently is not done.
> > >
> > > The hack I tried was (which seems to work):
> > >
> > > --
> > > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> > > index fe8e159394d8..3a1f25d41794 100644
> > > --- a/arch/riscv/mm/init.c
> > > +++ b/arch/riscv/mm/init.c
> > > @@ -245,8 +245,10 @@ static void __init setup_bootmem(void)
> > >  	 */
> > >  	if (!IS_ENABLED(CONFIG_64BIT)) {
> > >  		max_mapped_addr = __pa(~(ulong)0);
> > > -		if (max_mapped_addr == (phys_ram_end - 1))
> > > +		if (max_mapped_addr == (phys_ram_end - 1)) {
> > >  			memblock_set_current_limit(max_mapped_addr - 4096);
> > > +			phys_ram_end -= 4096;
> > > +		}
> > >  	}
> >
> > You can just memblock_reserve() the last page of the first gigabyte, e.g.
>
> "last page of the first gigabyte" - why first gigabyte? Do you mean
> last page of *last* gigabyte?

With a 3G-1G split, the linear map can only map 1G, from 0xc0000000 to
0xffffffff (or 0x00000000 with 32-bit overflow):

[ 0.000000] lowmem : 0xc0000000 - 0x00000000 (1024 MB)

> > if (!IS_ENABLED(CONFIG_64BIT)
> > 	memblock_reserve(SZ_1G - PAGE_SIZE, PAGE_SIZE);
> >
> > The page will still be mapped, but it will never make it to the page
> > allocator.
> >
> > The nice thing about it is, that memblock lets you to reserve regions that are
> > not necessarily populated, so there's no need to check where the actual RAM
> > ends.
>
> I tried the suggested code, it didn't work. I think there are 2
> mistakes:
> - last gigabyte, not first
> - memblock_reserve() takes physical addresses as arguments, not
>   virtual addresses
>
> The below patch fixes the problem. Is this what you really meant?
>
> Best regards,
> Nam
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index fa34cf55037b..ac7efdd77be8 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -245,6 +245,7 @@ static void __init setup_bootmem(void)
>  	 * be done as soon as the kernel mapping base address is determined.
>  	 */
>  	if (!IS_ENABLED(CONFIG_64BIT)) {
> +		memblock_reserve(__pa(-PAGE_SIZE), __pa(PAGE_SIZE));

__pa(-PAGE_SIZE) is what I meant, yes.

>  		max_mapped_addr = __pa(~(ulong)0);
>  		if (max_mapped_addr == (phys_ram_end - 1))
>  			memblock_set_current_limit(max_mapped_addr - 4096);
>

-- 
Sincerely yours,
Mike.
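
For readers following the address arithmetic in this thread, here is a minimal user-space sketch in C. It assumes a 32-bit address space, a 4 KiB page, and a simplified linear map in which VA 0xc0000000 is backed by PA 0x80400000. That offset is only inferred from the PA 0xc03ff000 / VA 0xfffff000 pair quoted above; it is not taken from kernel source. All demo_* names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed values, for illustration only. The offset between DEMO_PAGE_OFFSET
 * and DEMO_PHYS_BASE is inferred from the PA 0xc03ff000 / VA 0xfffff000 pair
 * quoted in the thread; nothing here is read from a real kernel.
 */
#define DEMO_PAGE_SIZE   UINT32_C(0x1000)      /* 4 KiB page */
#define DEMO_PAGE_OFFSET UINT32_C(0xc0000000)  /* start of lowmem (3G/1G split) */
#define DEMO_PHYS_BASE   UINT32_C(0x80400000)  /* PA backing DEMO_PAGE_OFFSET */

/* Simplified linear-map translations, loosely modelled on __pa()/__va(). */
static inline uint32_t demo_pa(uint32_t va)
{
	return va - DEMO_PAGE_OFFSET + DEMO_PHYS_BASE;
}

static inline uint32_t demo_va(uint32_t pa)
{
	return pa - DEMO_PHYS_BASE + DEMO_PAGE_OFFSET;
}

/* VA of the last page of a 32-bit address space, i.e. (u32)-PAGE_SIZE. */
static inline uint32_t demo_last_page_va(void)
{
	return (uint32_t)0 - DEMO_PAGE_SIZE;
}

/* Exclusive end address of the page starting at va, truncated to 32 bits. */
static inline uint32_t demo_page_end(uint32_t va)
{
	return va + DEMO_PAGE_SIZE;
}

/* Checks the values discussed in the thread under the assumptions above. */
static inline void demo_check(void)
{
	uint32_t va = demo_last_page_va();

	assert(va == UINT32_C(0xfffff000));
	assert(demo_pa(va) == UINT32_C(0xc03ff000));
	assert(demo_va(demo_pa(va)) == va);
	assert(demo_page_end(va) == 0);   /* end of the last page wraps to 0 */
}
```

Under these assumptions the end address of the last page wraps to 0, which is consistent with the "lowmem : 0xc0000000 - 0x00000000" boot line. Note also that memblock_reserve() takes a physical base and a size in bytes, so a reservation of exactly one page would presumably pass PAGE_SIZE as the second argument.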