Date: Thu, 21 Dec 2023 19:50:55 +0000
From: Matthew Wilcox
To: Fangrui Song
Cc: Yang Shi, Andrew Morton, linux-mm@kvack.org, Song Liu, Miaohe Lin, linux-kernel@vger.kernel.org, Zhouyi Zhou
Subject: Re: [PATCH] mm: remove VM_EXEC requirement for THP eligibility

On Wed, Dec 20, 2023 at 08:53:38PM -0800, Fangrui Song wrote:
> Thanks for the comment. Frankly, I am not familiar with huge pages...
> I noticed this VM_EXEC condition when I was writing this
> hugepage-related section in
> https://maskray.me/blog/2023-12-17-exploring-the-section-layout-in-linker-output#transparent-huge-pages-for-mapped-files
> (Thanks to Alexander Monakov's comment about
> CONFIG_READ_ONLY_THP_FOR_FS in
> https://mazzo.li/posts/check-huge-page.html).

CONFIG_READ_ONLY_THP_FOR_FS is a preliminary hack which solves some
problems.
The real solution is using large folios, which at the moment means that
you should test on XFS or AFS; filesystem authors have not been
enthusiastic about adding support to their filesystems so far.

In your blog, you write:

: In -z noseparate-code layouts, the file content starts somewhere at
: the first page, potentially wasting half a huge page on unrelated
: content. Switching to -z separate-code allows reclaiming the benefits
: of the half huge page but increases the file size. Balancing
: these aspects poses a challenge. One potential solution is using
: fallocate(FALLOC_FL_PUNCH_HOLE), which introduces complexity into the
: linker. However, this approach feels like a workaround to address a
: kernel limitation. It would be preferable if a file-backed huge page
: didn't necessitate a file offset aligned to a huge page boundary.

You should distinguish between file size (i.e. st_size in stat(3)) and
the amount of space occupied on storage (st_blocks). The linker should
be fine with creating a sparse file. If it doesn't create one,
cp --sparse=always will do the trick.

Yes, it's a kernel limitation that folios have to be aligned within the
file as well as in both virtual and physical address space. It's a huge
complexity win to do that; I don't think we'd be able to tile the page
cache effectively if we allowed folios to be placed at arbitrary offsets
(I think it turns into a knapsack problem at that point).

> As dTLB for read-only data is also an important optimization of
> file-backed THP, it seems straightforward that we should drop the
> VM_EXEC condition :)

I'm not particularly enthusiastic about making CONFIG_READ_ONLY_THP_FOR_FS
better. Large folios are the future. Indeed, I'd like to see
CONFIG_READ_ONLY_THP_FOR_FS go away in the next year or two once btrfs
and ext4 have support for large folios.
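A minimal sketch of the distinction between file size (st_size) and space occupied on storage (st_blocks), assuming Linux, Python 3 and a filesystem that supports holes (ext4, XFS and tmpfs do). It writes 4 KiB at offset 0 and 4 KiB at a 2 MiB offset with pwrite() and never touches the bytes in between. The file name, the 2 MiB huge page size and the byte contents are hypothetical stand-ins for a linker's output, not what any real linker emits.

```python
import os
import tempfile

# Assumption: 2 MiB PMD-sized huge pages, as on x86-64 with 4 KiB base pages.
HUGE_PAGE = 2 * 1024 * 1024


def write_sparse(path, chunks):
    """Write each (offset, data) chunk with pwrite(); gaps between chunks are never written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for offset, data in chunks:
            os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


def logical_and_allocated(path):
    """Return (st_size, bytes actually allocated on storage)."""
    st = os.stat(path)
    # On Linux, st_blocks is counted in 512-byte units regardless of st_blksize.
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.out")     # hypothetical output file name
    headers = b"\x7fELF" + bytes(4092)    # hypothetical 4 KiB of ELF headers
    text = b"\x90" * 4096                 # hypothetical code placed at the 2 MiB boundary
    write_sparse(path, [(0, headers), (HUGE_PAGE, text)])
    size, allocated = logical_and_allocated(path)
    print(f"st_size={size} allocated={allocated}")
```

On filesystems that support holes, allocated typically comes out at a few KiB while st_size is just over 2 MiB; the exact figure depends on the block size and the filesystem's allocation policy.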
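The alignment limitation described in the reply can be modelled roughly as follows. This is a simplified sketch, not kernel code: it assumes 2 MiB PMD-sized folios and considers alignment only, ignoring VMA flags, sysfs settings and filesystem support. The function name pmd_mappable_bytes and the segment values are hypothetical. A folio has to start at a file offset that is a multiple of its size, and a PMD mapping needs a virtual address aligned to that same size, so only the huge-page-aligned middle of a segment whose file offset and virtual address are congruent modulo 2 MiB is eligible.

```python
# Assumption: 2 MiB PMD-sized folios (x86-64 with 4 KiB base pages).
HUGE = 2 * 1024 * 1024


def pmd_mappable_bytes(file_offset, vaddr, size, huge=HUGE):
    """Simplified model: bytes of a file-backed segment that PMD-sized folios could cover.

    Folios start at file offsets aligned to their size, and a PMD mapping
    places them at virtual addresses aligned to the same size, so vaddr and
    file_offset must be congruent modulo `huge`.
    """
    if (vaddr - file_offset) % huge != 0:
        return 0
    start = -(-file_offset // huge) * huge       # round the segment start up to a huge boundary
    end = (file_offset + size) // huge * huge    # round the segment end down to a huge boundary
    return max(0, end - start)


# Hypothetical segments: (file offset, virtual address, size)
print(pmd_mappable_bytes(0x1000, 0x401000, 3 * 1024 * 1024))    # → 0 (starts mid huge page)
print(pmd_mappable_bytes(0x200000, 0x600000, 3 * 1024 * 1024))  # → 2097152 (starts on a boundary)
print(pmd_mappable_bytes(0x1000, 0x400000, 4 * 1024 * 1024))    # → 0 (offset and vaddr not congruent)
```

The first two segments have the same size; only the one whose file offset sits on a 2 MiB boundary gets a full huge page, which is the trade-off the quoted blog paragraph describes.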
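To see why requiring folios to be aligned within the file keeps tiling the page cache tractable, here is a hypothetical greedy decomposition of a page-index range into naturally aligned power-of-two blocks, the same rule a buddy allocator follows. It illustrates the principle only; it is not how the kernel chooses folio sizes, and max_order=9 (a 2 MiB PMD with 4 KiB pages) is an assumption.

```python
def aligned_tiling(start, end, max_order=9):
    """Greedily split page-index range [start, end) into naturally aligned
    power-of-two blocks, returned as (first_index, order) pairs.

    max_order=9 (512 pages) is an assumption matching a 2 MiB PMD with 4 KiB pages.
    """
    blocks = []
    index = start
    while index < end:
        order = max_order
        # Shrink the block until it starts on a multiple of its own size
        # and does not run past the end of the range.
        while order > 0 and (index % (1 << order) or index + (1 << order) > end):
            order -= 1
        blocks.append((index, order))
        index += 1 << order
    return blocks


print(aligned_tiling(3, 20))   # → [(3, 0), (4, 2), (8, 3), (16, 2)]
print(aligned_tiling(0, 512))  # → [(0, 9)]
```

Naturally aligned power-of-two blocks either nest or are disjoint, so a single greedy pass always yields a valid tiling. If folios could start at arbitrary offsets, that guarantee disappears and choosing sizes and positions becomes a packing problem, which is the knapsack concern raised in the reply.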